LINQ Intersect() 2.7x faster with HashSet

Programer

Active Member
Reputation
0
Sometime one shouldn’t be too optimistic with .NET Fx optimization. I knew that the extension method IEnumerable.Count() was optimized to check if the enumerable was a collection, in such case the Collection.Count property was called instead (see Count() impl at the end of the post).
So when it comes to set operations such as Intersect Union… if one of the sequence involved was an HashSet, I assumed that the obvious optimization of using the O(1) HashSet.Contains() magic, was implemented. But actually, by decompiling I discovered that it was not implemented. A Set was created each time the code achieve a common set operation, no matter the type of sequences. In the case where one wishes to do set operations with an existing HashSet this is unfortunate. The gain one can expect is around 2.7, with variations depending on the sequences collateral size. This comes from the fact that with the HashSet optimization, the operation cost is O(other sequence size).

If the HashSet is bigger than the other sequence, gains factor can be high (like 15x).
If the HashSet is smaller than the other sequence, gains factor tends to 1x.
If the HashSet size is comparable to the other sequence size, the gain factor is around 2.7.

Here are quantitative measures:
nbElemHashsetA=1000  nbElemB=1000  intersectionSize=800  nbLoop=10000
Gain : 283%
nbElemHashsetA=1000  nbElemB=1000  intersectionSize=200  nbLoop=10000
Gain : 270%
nbElemHashsetA=1000  nbElemB=100  intersectionSize=50  nbLoop=10000
Gain : 1539%
nbElemHashsetA=100  nbElemB=1000  intersectionSize=50  nbLoop=10000
Gain : 127%
Here is the code optimization:













internal static IEnumerable MyIntersect(this IEnumerable first, IEnumerable second) {
 
Back
Top