IEnumerable Isn’t What You Think It Is—And It’s Breaking Your Code

by Dmitrii Slabko, February 18th, 2025

Too Long; Didn't Read

Let's review in detail the most common mistake made with IEnumerable - repeated enumeration - but this time we will go a bit deeper and review why repeated enumeration is a mistake and what potential problems it may cause, including hard-to-catch and hard-to-reproduce bugs.

What IEnumerable is

First things first - let's review (yet again, as there are quite a few articles on this) what IEnumerable, both generic and non-generic, is. As many interviews and code reviews show, developers often unwittingly view instances of IEnumerable as collections, and this is where we will start.


When we look at the interface definition of IEnumerable, here is what we see:

public interface IEnumerable<out T> : IEnumerable {
    IEnumerator<T> GetEnumerator();
}


We will not go into the details of enumerators and so on; it is enough to state one very important thing: IEnumerable is not a collection. Most collection types do implement IEnumerable, but that does not turn all IEnumerable implementations into collections. Surprisingly, this is what many developers miss when they implement code consuming or producing IEnumerable, and that is where a great potential for problems lies.


So, what is IEnumerable? There are many different implementations of IEnumerable, but for the sake of simplicity we can summarize them into one (rather vague) definition: it is a piece of code that produces elements on iteration. For in-memory collections this code simply reads the current element from the underlying collection and moves its internal pointer to the next element, if one exists. For more sophisticated cases the logic may vary widely and may have any kind of side effects, including modifying shared state or depending on it.
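This 'piece of code that produces elements' idea is easy to see in a minimal sketch (the names and values here are purely illustrative): the iterator body below does not run when the sequence variable is created, and it re-runs in full on every enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int produced = 0;

// The iterator body runs lazily, one step per consumed item,
// and it runs again every time the sequence is enumerated.
IEnumerable<int> Numbers()
{
    for (int i = 1; i <= 3; i++)
    {
        produced++;      // side effect, visible to the caller
        yield return i;
    }
}

var seq = Numbers();
Console.WriteLine(produced); // 0 - nothing has been produced yet

var first = seq.Sum();       // first enumeration: 3 items produced
var second = seq.Sum();      // second enumeration re-runs the body
Console.WriteLine(produced); // 6, not 3
```

Note that merely calling Numbers() produced nothing; every consumption pass paid the full production cost again.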


Now we have a slightly better picture of what IEnumerable is, and that hints that consuming code should not make any assumptions about:

  • the cost required to produce items - that is, whether an item was retrieved from some sort of storage (reused) or created anew;
  • whether the same item can ever be produced again on subsequent iterations;
  • any potential side effects that could affect (or not) subsequent iterations.


As we can see, this is almost the opposite to general conventions when iterating over in-memory collections, for example:

  • a collection cannot be modified during an iteration - if a collection is modified, this will cause an exception when moving to the next element in the collection;
  • iterating over the same collection (containing the same elements) will always produce the same results and will always have the same costs.
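The first of these conventions is easy to demonstrate with List&lt;T&gt; (a minimal sketch; the values are arbitrary): mutating the list inside a foreach makes the enumerator throw on its next step.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int> { 1, 2, 3 };
bool threw = false;

try
{
    foreach (int item in list)
    {
        list.Add(item); // mutating the collection mid-iteration
    }
}
catch (InvalidOperationException)
{
    threw = true; // List<T>'s enumerator detects the modification
}

Console.WriteLine(threw); // True
```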


A safe way to look at IEnumerable is to perceive it as an 'on-demand data producer'. The only guarantee this data producer gives is that, when called, it will either procure another item or signal that there are no more items available. Everything else is an implementation detail of a particular data producer. By the way, we have just described the contract of the IEnumerator interface that allows iterating over an IEnumerable instance.


Another important property of the on-demand data producer is that it produces one item per iteration, and the consuming code may decide whether it wants to exhaust whatever the producer is capable of producing or stop the consumption earlier. Since the on-demand data producer has not even tried to work on any potential 'future' items, this saves resources when the consumption finishes prematurely.
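A minimal sketch of this early-termination property (the item count is illustrative): the producer below could yield a thousand items, but a consumer that takes only two never triggers the remaining work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int produced = 0;

IEnumerable<int> Producer()
{
    for (int i = 0; i < 1000; i++)
    {
        produced++;      // counts how much work was actually done
        yield return i;
    }
}

// The consumer stops after two items; the other 998 are never produced.
var firstTwo = Producer().Take(2).ToList();
Console.WriteLine(produced); // 2
```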


So, when implementing IEnumerable producers, we should never make any assumptions on the consumption patterns. The consumers may initiate and stop consumption at any point.

Potential effects of repeated iterations.

Now, since we defined the proper way to consume IEnumerable, let's review a few examples of repeated iterations and their potential impact.


Before we go to negative examples, it is worth mentioning that when IEnumerable impersonates an in-memory collection - array, list, hashset, etc. - there is no harm in repeated iterations per se. Code that consumes IEnumerable over in-memory collections would in most cases run (almost) as efficiently as code consuming the matching collection types. Of course, there may be differences in certain cases, though not necessarily negative ones, as Linq has seen many major performance boosts that allow it, for example, to use vectorized CPU instructions for in-memory collections or to compact multiple interface method calls into one for complex Linq expressions. Please read these articles for more details: https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-8/#linq and https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-9/#linq


However, from the code quality point of view, having multiple iterations over IEnumerable is considered a bad practice, as we can never be certain what concrete implementation would arrive under the hood.

A side note: since IEnumerable is an interface, using it instead of concrete types forces the compiler to emit virtual method calls (the 'callvirt' IL instruction), even when a concrete underlying class implements the method as non-virtual and a non-virtual call would suffice. Virtual method calls are more expensive, as they always need to go through the instance's method table to resolve the method address; they also prevent potential method inlining. While this may be considered a micro-optimization, there are quite a few code paths that would show different performance metrics if concrete types were used instead of interfaces.

When a repeated iteration is really a poor choice.

A small disclaimer: this example is based on a real-life code piece which has been anonymized and has all real implementation details abstracted away.

This piece of code was retrieving data from a remote endpoint for the incoming parameter list.

async Task<IEnumerable<IData>> RetrieveAndProcessDataAsync(IList<int> ids, CancellationToken ct) {
    var retrievalTasks = ids.Select(id => externalService.QueryForDataAsync(id, ct));
    await Task.WhenAll(retrievalTasks);
    return retrievalTasks.Select(t => t.Result);
}

What may go wrong here? Let's review the simplest example:

var results = await RetrieveAndProcessDataAsync(ids, cancellationToken);
var output = results.ToArray();


Many developers would consider this code safe - because it prevents repeated iterations by materializing the method output into an in-memory collection. But is it?


Before going into the details let's do a test run. We can take a very simple 'externalService' implementation for testing:

interface IData { int Value { get; } }

record Data(int Value) : IData;

class Service {
    private static int counter = 0;

    public async Task<IData> QueryForDataAsync(int id, CancellationToken ct) {
        var timestamp = Stopwatch.GetTimestamp();
        await Task.Delay(TimeSpan.FromMilliseconds(30), ct);
        int cv = Interlocked.Increment(ref counter);
        Console.WriteLine($"QueryForData - id={id} - {cv}; took {Stopwatch.GetElapsedTime(timestamp).TotalMilliseconds:F0} ms");

        return new Data(id);
    }
}


Then we can run the test:

var externalService = new Service();
var results = (await RetrieveAndProcessDataAsync([1, 2, 3], CancellationToken.None)).ToList();
Console.WriteLine("Querying completed");
int count = results.Count();
if (count == 0) {
    Console.WriteLine("No results");
} else {
    var array = results.ToArray();
    Console.WriteLine($"Retrieved {array.Length} elements");
}

Console.WriteLine($"Getting the count again: {results.Count()}");


And get the output:

QueryForData - id=3 - 1; took 41 ms
QueryForData - id=1 - 3; took 43 ms
QueryForData - id=2 - 2; took 42 ms
QueryForData - id=1 - 4; took 33 ms
QueryForData - id=2 - 5; took 30 ms
QueryForData - id=3 - 6; took 31 ms
Querying completed
Retrieved 3 elements
Getting the count again: 3


Something is off here, right? We would have expected to get the 'QueryForData' output only 3 times, since we have only 3 ids in the input argument. However, the output clearly shows that the number of executions doubled even before the ToList() call completed.


To understand why, let's look at the RetrieveAndProcessDataAsync method:

1: var retrievalTasks = ids.Select(id => externalService.QueryForDataAsync(id, ct));
2: await Task.WhenAll(retrievalTasks);
3: return retrievalTasks.Select(t => t.Result);


And let's have a look at this call:

(await RetrieveAndProcessDataAsync([1, 2, 3], CancellationToken.None)).ToList();


When the RetrieveAndProcessDataAsync method is called, the following things happen.


On line 1 we get an IEnumerable<Task<Data>> instance - in our case, it would be 3 tasks, since we submit an input array with 3 elements. Each task gets queued by the thread pool for execution, and as soon as there is a thread available, it starts. The exact point of completion for these tasks is undetermined due to the thread pool scheduling specifics and the concrete hardware this code would run on.


On line 2 the Task.WhenAll call makes sure all tasks from the IEnumerable<Task<Data>> instance have arrived at completion; essentially, at this point we get the first 3 outputs from the QueryForDataAsync method. When line 2 completes, we can be sure that all 3 tasks have completed as well.


However, line 3 is where all the devils laid their ambush. Let's flush them out.


The 'retrievalTasks' variable (on line 1) is an IEnumerable<Task<Data>> instance. Now, let's take a step back and remember that IEnumerable is nothing else but a producer - a piece of code that produces (creates or reuses) instances of a given type. In this case the 'retrievalTasks' variable is a piece of code that would:


  • go over the 'ids' collection;
  • for each element of the collection, call the externalService.QueryForDataAsync method;
  • return the Task instance produced by that call.


We can express all the logic behind our IEnumerable<Task<Data>> instance slightly differently. Please note that while this code piece looks quite distinct from the original ids.Select(id => externalService.QueryForDataAsync(id, ct)) expression, it does exactly the same thing.


IEnumerable<Task<Data>> DataProducer(IList<int> ids, CancellationToken ct) {
    foreach (int id in ids) {
        var task = externalService.QueryForDataAsync(id, ct);
        yield return task;
    }
}


So, we can treat the 'retrievalTasks' variable as a function call with a constant predefined set of inputs. This function gets called each time we resolve the variable's value. We can rewrite the RetrieveAndProcessDataAsync method in a way that fully reflects this idea and works exactly like the initial implementation:


async Task<IEnumerable<Data>> RetrieveAndProcessDataAsync(IList<int> ids, CancellationToken ct) {
    var retrievalFunc = () => DataProducer(ids, ct);
    await Task.WhenAll(retrievalFunc());
    return retrievalFunc().Select(t => t.Result);
}


Now we can see very clearly why our test code output was doubled: the 'retrievalFunc' function gets called twice. If our consuming code keeps going over the same IEnumerable instance, that equals repeated calls to the 'DataProducer' method, which runs its logic over and over again for each re-iteration.
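The same doubling can be reproduced without any tasks at all; here is a minimal sketch (the counter and values are illustrative) showing that Select defers its lambda, so each pass over the same variable re-invokes it.

```csharp
using System;
using System.Linq;

int calls = 0;
var ids = new[] { 1, 2, 3 };

// Mirrors ids.Select(id => externalService.QueryForDataAsync(id, ct)):
// the lambda runs on enumeration, not when 'projected' is assigned.
var projected = ids.Select(id => { calls++; return id * 10; });

var a = projected.ToList(); // first enumeration: 3 selector calls
var b = projected.ToList(); // second enumeration: 3 more calls
Console.WriteLine(calls);   // 6
```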


I hope now the logic behind repeated iterations of IEnumerable is clear.

Further potential implications of repeated iterations.

There is still one thing to mention about this code sample, though.


Let's look at the rewritten implementation again:

IEnumerable<Task<Data>> DataProducer(IList<int> ids, CancellationToken ct) {
    foreach (int id in ids) {
        var task = externalService.QueryForDataAsync(id, ct);
        yield return task;
    }
}

async Task<IEnumerable<Data>> RetrieveAndProcessDataAsync(IList<int> ids, CancellationToken ct) {
    var retrievalFunc = () => DataProducer(ids, ct);
    await Task.WhenAll(retrievalFunc());            // First producer call.
    return retrievalFunc().Select(t => t.Result);   // Second producer call.
}


The producer in this case creates new task instances every time, and we call it twice. This leads to a rather peculiar and not so obvious fact: the task instances that Task.WhenAll and .Select(t => t.Result) operate on are different. The tasks that were awaited (and thus arrived at completion) are not the same tasks that the method returns the results from.


So, here the producer creates two different sets of tasks. The first set of tasks is awaited asynchronously - the Task.WhenAll call - but the second set of tasks is not awaited. Instead, the code calls the Result property getter directly, which is effectively the infamous sync-over-async anti-pattern. I will not go into the details of this anti-pattern, as it is a large subject. This article by Stephen Toub sheds quite a bit of light on it: https://devblogs.microsoft.com/pfxteam/should-i-expose-synchronous-wrappers-for-asynchronous-methods/


However, just for the sake of completeness, here are some potential issues this code may cause:

  • deadlocks, when used in the desktop (WinForms, WPF, MAUI) or the .Net Fx ASP.NET applications;
  • thread pool starvation when under higher loads.


If we abstract from the current code sample, which was producing these simple tasks, we face the fact that repeated iterations may easily cause multiple executions of any operation, and that operation may not be idempotent (that is, subsequent calls with the same inputs may produce different results or even simply fail). For example, account balance changes.
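As a sketch of how a non-idempotent operation goes wrong under repeated iteration (the 'account' here is purely illustrative, not from the article's samples): a lazy sequence of debits, enumerated once to 'validate' and once to 'apply', charges the account twice.

```csharp
using System;
using System.Linq;

decimal balance = 100m;
var charges = new[] { 10m, 20m };

// The debit happens inside the selector, so it is a lazy side effect.
var debits = charges.Select(amount => balance -= amount);

foreach (var _ in debits) { } // "validation" pass: balance drops to 70
foreach (var _ in debits) { } // "apply" pass re-runs the debits: 40

Console.WriteLine(balance);   // 40, not the expected 70
```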


Even if those operations were idempotent, they may have high computation costs, and thus their repeated execution would simply burn our resources in vain. And if we speak about code running in the cloud, these resources may have a cost which we would have to pay for.


Again, because repeated iterations over IEnumerable instances are quite easy to miss, it may be very hard to find out why an application crashes, spends a lot of resources (including money), or does things it is not supposed to be doing.

Spicing things up just a little.

Let's take the original test code and change it slightly:

var externalService = new Service();
var cts = new CancellationTokenSource(); // New line.
var results = (await RetrieveAndProcessDataAsync([1, 2, 3], cts.Token)); // Using cts.Token instead of a default token, and not materializing the IEnumerable.
Console.WriteLine("Querying completed");
int count = results.Count();
if (count == 0) {
    Console.WriteLine("No results");
} else {
    var array = results.ToArray();
    Console.WriteLine($"Retrieved {array.Length} elements");
}

cts.Cancel(); // New line.
Console.WriteLine($"Getting the count again: {results.Count()}");


I will leave it to the reader to try and run this code. It will be a good demonstration of potential side effects the repeated iterations may unexpectedly encounter.

How to fix this code?

Let's have a look:

async Task<IEnumerable<IData>> RetrieveAndProcessDataAsync(IList<int> ids, CancellationToken ct) {
    var retrievalTasks = ids.Select(id => externalService.QueryForDataAsync(id, ct)).ToArray(); // Adding .ToArray() call.
    await Task.WhenAll(retrievalTasks);
    return retrievalTasks.Select(t => t.Result);
}


By adding a single .ToArray() call to the initial IEnumerable<Task<Data>>, we 'materialize' the IEnumerable instance into an in-memory collection, and any subsequent re-iterations over the in-memory collection do exactly what we would expect - simply read the data from memory, without any unexpected side effects caused by repeated code executions.
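A minimal sketch of the materialization effect (with an illustrative counter in place of the real service calls): after ToArray() the producer has run exactly once, and further passes only read memory.

```csharp
using System;
using System.Linq;

int calls = 0;

// Lazy producer; the selector body stands in for QueryForDataAsync.
var lazy = Enumerable.Range(1, 3).Select(i => { calls++; return i; });

var materialized = lazy.ToArray(); // runs the producer exactly once

var sum1 = materialized.Sum();     // reads memory, no producer calls
var sum2 = materialized.Sum();     // same - still no producer calls
Console.WriteLine(calls);          // 3
```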


Essentially, when developers write such code (as in the initial code sample), they normally presume that this data is 'set in stone', and nothing unexpected would ever happen when it is accessed. Though, as we have just seen, this is rather far from the truth.


We could improve the method further, but we will leave this for the next chapter.

On producing an IEnumerable.

We just looked at the issues that may arise from the use of IEnumerable when it is based on misconceptions - when it does not take into account that neither of these assumptions should be made when consuming IEnumerable:


  • the cost required to produce items - that is, whether an item was retrieved from some sort of storage (reused) or created anew;
  • whether the same item can ever be produced again on subsequent iterations;
  • any potential side effects that could affect (or not) subsequent iterations.


Now, let's have a look at the promise IEnumerable producers should (ideally) keep for their consumers:

  • items are produced 'on-demand' - no work is supposed to happen 'in advance';
  • consumers are free to stop iteration at any moment, and this should save the resources that would be required if the consumption continued;
  • if iteration (consumption) has not started, no resources should be used.


Again, let's review our previous code sample from this standpoint.

async Task<IEnumerable<IData>> RetrieveAndProcessDataAsync(IList<int> ids, CancellationToken ct) {
    var retrievalTasks = ids.Select(id => externalService.QueryForDataAsync(id, ct)).ToArray();
    await Task.WhenAll(retrievalTasks);
    return retrievalTasks.Select(t => t.Result);
}


Essentially, this code does not fulfill these promises, as all the heavy lifting is done in the first two lines, before it starts producing the IEnumerable. So, if a consumer decides to stop consumption early, or even not to start it at all, the QueryForDataAsync method will still have been called for all inputs.


Considering the behavior of the first two lines, it would be much better to rewrite the method to produce an in-memory collection, such as:

async Task<IList<IData>> RetrieveAndProcessDataAsync(IList<int> ids, CancellationToken ct) {
    var retrievalTasks = ids.Select(id => externalService.QueryForDataAsync(id, ct)).ToArray();
    await Task.WhenAll(retrievalTasks);
    return retrievalTasks.Select(t => t.Result).ToArray();
}


This implementation does not provide any 'on-demand' guarantees - on the contrary, it is very clear that all the work required to process the given input would be completed, and the matching results would be returned.


However, if we do need the 'on-demand data producer' behavior, the method would have to be rewritten completely to provide it. For example:

async IAsyncEnumerable<IData> RetrieveAndProcessDataAsAsyncEnumerable(IList<int> ids, [EnumeratorCancellation] CancellationToken ct) {
    foreach (int id in ids) {
        var result = await externalService.QueryForDataAsync(id, ct);
        yield return result;
    }
}
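Consumption of a producer shaped like this might look like the following minimal sketch (the names and delays are illustrative): await foreach pulls one item per awaited step, and stopping early means later items are never produced at all.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

int produced = 0;

// Stand-in async producer; the Task.Delay imitates a remote call.
async IAsyncEnumerable<int> ProduceAsync(
    [EnumeratorCancellation] CancellationToken ct = default)
{
    foreach (int id in new[] { 1, 2, 3 })
    {
        await Task.Delay(1, ct);
        produced++;
        yield return id;
    }
}

using var cts = new CancellationTokenSource();

await foreach (int item in ProduceAsync().WithCancellation(cts.Token))
{
    Console.WriteLine(item);
    if (item == 2) break; // stop early: the third item is never produced
}

Console.WriteLine(produced); // 2
```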


While developers usually do not think about these contract specifics of IEnumerable, code consuming it often makes assumptions that match them. So, when the code producing IEnumerable honors these specifics, the whole application works better.

Conclusion.

I hope this article helped the reader see the difference between a collection contract and the IEnumerable contract. Collections generally provide storage for their items (typically in memory) and ways to go over the stored items; non-readonly collections also extend this contract by allowing consumers to modify, add, or remove the stored items. While collections are very consistent about the stored items, IEnumerable essentially declares very high volatility in this regard, since the items are produced when an IEnumerable instance is iterated over.


So, what are the best practices when it comes to IEnumerable? Here is the list:

  • Always avoid repeated iterations - unless this is what you really intend and you understand the consequences. It is safe to chain multiple Linq extension methods on an IEnumerable instance (such as .Where and .Select), but any other call that causes an actual iteration is the thing to avoid. If the processing logic requires multiple passes over an IEnumerable, either materialize it into an in-memory collection or review whether the logic can be changed to a single per-item pass.
  • When producing an IEnumerable involves async code, consider changing it to IAsyncEnumerable, or replace IEnumerable with a 'materialized' representation - for example, when you would prefer to take advantage of parallel execution and return the results after all tasks have completed.
  • Code producing IEnumerable should be built in a way that avoids spending resources if the iteration stops early or never begins at all.
  • Do not use IEnumerable for data types unless you need its specifics. If your code needs some degree of 'generalization', prefer other collection type interfaces that do not imply the 'on-demand data producer' behavior, such as IList or IReadOnlyCollection.