Get the most out of LINQ - We code IT happen

Language-Integrated Query (LINQ) is a powerful tool that allows developers to harness the power of functional programming in C#. However, when used incorrectly, in can result in queries that are needlessly slow and inefficient. To leverage the full potential of LINQ, it is important to keep a few rules in mind. In this article, we will look at the best methods to tune your queries in order to get the fastest possible results.

Language-Integrated Query (LINQ) is the name for a set of technologies based on the integration of query capabilities directly into the C# language. It is used to write queries against strongly typed collections of objects. For a developer who writes queries, the most visible “language-integrated” part of LINQ is the query expression. However, even with the LINQ standard query operators, used to create these queries, it is easy to misuse these operators unless you have a solid understanding of how they work internally.

Such misuse leads to inefficient queries that are slower – in some cases much slower – than the equivalent properly tuned query. This article contains a list of tips that you can use to tune your queries to yield results faster.

1. Avoid projections like ToList() when possible

LINQ offers a deferred execution and lazy evaluation. Deferred execution means that the evaluation of an expression is delayed until its realized value is required. So in some cases that means that iteration over the collection will be done just once.

When you write a method that implements deferred execution, you also must decide whether to implement the method using lazy evaluation or eager evaluation. In lazy evaluation, a single element of the source collection is processed during each call to the iterator. This is the typical way in which iterators are implemented, while in eager evaluation the first call to the iterator will result in the entire collection being processed. A temporary copy of the source collection might also be required. Lazy evaluation usually yields better performance because it distributes overhead processing evenly throughout the evaluation of the collection and minimizes the use of temporary data. Of course, for some operations, there is no other option than to materialize intermediate results.

So, to come back to the tip, these projections force the list values to be evaluated, which is not helpful because that negates the benefits of lazy evaluation and deferred execution and that is why we should avoid that.

NOTE: Please keep in mind that all benchmarks in the below examples were done on the List type object. When working on IQueryable, i.e. when the collection is not materialized, then execution times could be different and more close to each other because the query can be optimized by the provider.

All tests were done on the List of 1 million Person class like here (for some examples it was changed a little bit):

2. Avoid syntax like .Where().First()

Putting .Where() in front of the query forces the provider to fetch all items which meet the conditions. There is no need to fetch all items when we are interested just in the first one. There is a simple way to correct this by replacing the Where() clause with the First() from the end.pp

Benchmarks:

The results:

| WhereFirstPerson | 64.02 ns | 1.329 ns | 1.949 ns |

| FirstPerson | 29.32 ns | 0.623 ns | 0.487 ns |

3. Combine multiple .Where() clauses into a single clause as long as they are done on the same collection

Multiple Where() clauses may be translated in SQL as multiple embedded queries like :
SELECT * FROM (SELECT * FROM PERSONS WHERE LastName = ‘Bond’) WHERE FirstName = ‘James’

The above query can be read like ‘fetch all persons with last name Bond and then, from them take all those with the first name James’. For every Where() clause, it must loop over the set once, which means that the looping process will take place as many times as there are Where() loops on any given collection.

So while refactoring multiple nested if statements together in a Where clause, try putting all of the clauses together:

Now such a query translates into SQL like this:
SELECT * FROM PERSONS WHERE LastName = ‘Bond’ AND FirstName = ‘James’

so: ‘fetch all persons named James Bond’.

Benchmark results:

| WhereTwoCalusesPeople | 7.360 ms | 0.0556 ms | 0.0520 ms |

| WhereOneClausePeople | 5.012 ms | 0.0989 ms | 0.1178 ms |

4. Understand the difference between Single()/SingleOrDefault() and First()/FirstOrDefault()

Let’s consider the following:

Translated into raw SQL it is:

FirstOrDefault() : SELECT TOP 1 FROM …
SingleOrDefault() : SELECT TOP 2 FROM …

What does this mean? Single()/SingleOrDefault() ensures that the item is the only match in the set, whereas the First()/FirstOrDefault() returns just the first match immediately after it was found. In the first case, it has to iterate over all elements until the second match will be hit which is both costly and usually unneeded, unless validating user input or such.

In most cases, you are not interested in ensuring that there is just one match – or often it is logically impossible. See the examples above – if your database has two Person entries with same primary key (ID), you have far bigger problems than using LINQ badly.

A special case is Single() without predicate, which should be used when logically or semantically there should be only a single element in the set.

When the item which we are looking for is in the middle of the collection, the FirstOrDefault() will be twice as fast as the SingleOrDefault() on the same collection, because it does not have to go over the rest of the entries to make sure that that the item is the only match.

| FirstOrDefault | 6.399 ms | 0.0891 ms | 0.0834 ms |

| SingleOrDefault | 12.043 ms | 0.0623 ms | 0.0583 ms |

5. Count vs Count() vs Any()/All()

a. Count() vs Any()/All()

Count() will do the full iteration of the matched items in the set. If you just want to check if there is a matching item in the set, use Any(). It returns true when there is at least one matching item, with an empty Any() just returning true when there are items in your set.

All() is the opposite of Any() – it will return false when encountering the first item which met the condition, without going over all of the set.

The exceptional case is when Count() and Any()/All() methods are an empty-predicate.

b. Count vs Any()

The next tip is to use the public property Count in the Boolean expression list Object.Count > 0 instead of Any().

The Count property is updated whenever a new entry gets added to the collection. Calling the Count property is a constant-time operation, therefore, using Count > 0 to determine whether the collection contains any elements is faster than calling Any(), which has to loop through the collection and update a local variable with the count, as long as the collection remains iterable. Thus, Any() is slower.

For any container that supports a public Count property, prefer .Count == 0 over Any().

c. Count() vs Count

Do not use Enumerable.Count() method for arrays or any collections that implement ICollection/ICollection interfaces, such as List and Dictionary. Use Length for arrays and the Count property for ICollection implementations.

Depending on the platform, there are optimizations in place preventing the O(n) operation of the Count() method, just returning the existing Count value.

Conclusion:

If you are starting with something that has a .Length or .Count (such as ICollection, IList, List, etc) – then this will be the fastest option since it doesn’t need to go through the GetEnumerator()/MoveNext()/Dispose() sequence required by Any() to check for a non-empty IEnumerable sequence.

For just IEnumerable, then Any() will generally be quicker, as it only has to look at one iteration. However, note that the LINQ-to-Objects implementation of Count() does check for ICollection (using .Count as an optimization) – so if your underlying data source is directly a list/collection, there won’t be a huge difference.

Of course, if you have used LINQ to filter it etc (Where etc), you will have an iterator-block-based sequence, and so this ICollection optimization is useless.

In general, with IEnumerable: stick with Any().

Indexing over an array:
Integer indexing over an array or an IList implementation is the fastest performance you can get in terms of indexing over a generic collection in .NET. In contrast, Last() loops through the entire collection as long as it remains iterable. However, whenever a new element is added to a collection, the Count property gets updated. Thus, Count –1 gets evaluated in constant time, regardless of the size of the collection. Therefore, retrieving the last element using [Count – 1] is usually faster than using Last(). The exception is when you are using the overload of Last() that takes a predicate. In that case, the execution time is essentially the same.

6. Use the TrueForAll() method available on the IList implementation instead of All()

For long-running operations and cross-collection membership checks, All<T>() offers worse performance than TrueForAll. TrueForAll() uses a for loop, whereas All<T>() uses a for each loop.

This time the list looks like this:

Benchmarks:

Results:

| All | 48.173 ns | 0.9841 ns | 2.0541 ns |

| TrueForAll | 6.435 ns | 0.0253 ns | 0.0224 ns |

Piotr Pietrzak

Senior Commissioning Engineer

Sources :

https://learn.microsoft.com/en-us/dotnet/csharp/linq/?source=recommendations
Sudipta Mukherjee – Thinking in LINQ: Harnessing the Power of Functional Programming in .NET Applications