Hi everyone! Today I want to share some .NET 5 performance tips, backed by benchmarks!
My system:
I will present benchmark results as percentages, where 100% is the fastest result.
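As you probably know, strings are immutable. Whenever you concatenate strings, a new string object is allocated, filled with the combined content, and eventually garbage collected. All of that is expensive. That is why StringBuilder, which appends into a reusable internal buffer, usually performs better when a string is built up piece by piece in a loop. The advantage grows with the number of appends; for only a few concatenations the difference is small.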
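To make the immutability point concrete, here is a small standalone sketch. The StringDemo class and its method names are hypothetical and not part of the benchmark. It shows that += leaves the original string untouched and returns a new object, and it builds the same string both ways:

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class StringDemo
{
    // Each += creates a brand-new string; the previous one becomes garbage.
    public static string ConcatLoop(int count)
    {
        string s = "";
        for (int i = 0; i < count; i++)
        {
            s += "a";
        }
        return s;
    }

    // StringBuilder appends into an internal buffer; ToString() allocates the final string.
    public static string BuilderLoop(int count)
    {
        var sb = new StringBuilder(capacity: count); // capacity hint avoids regrowing the buffer
        for (int i = 0; i < count; i++)
        {
            sb.Append('a');
        }
        return sb.ToString();
    }

    // Returns true when += produced a new object and left the original string unchanged.
    public static bool ConcatCreatesNewObject()
    {
        string original = "a";
        string updated = original;
        updated += "a";
        return !ReferenceEquals(original, updated) && original == "a" && updated == "aa";
    }
}
```

Both loops return the same string; the difference is that ConcatLoop allocates a new intermediate string on every iteration, while BuilderLoop only allocates its buffer and the final result.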
Benchmark example:
using System.Text;
using BenchmarkDotNet.Attributes;

public class StringConcatBenchmarks
{
    // Shared builder, cleared and reused by every ExecuteBuilder call
    private static StringBuilder sb = new();

    [Benchmark]
    public string Concat3() => ExecuteConcat(3);

    [Benchmark]
    public string Concat5() => ExecuteConcat(5);

    [Benchmark]
    public string Concat10() => ExecuteConcat(10);

    [Benchmark]
    public string Concat100() => ExecuteConcat(100);

    [Benchmark]
    public string Concat1000() => ExecuteConcat(1000);

    [Benchmark]
    public string Builder3() => ExecuteBuilder(3);

    [Benchmark]
    public string Builder5() => ExecuteBuilder(5);

    [Benchmark]
    public string Builder10() => ExecuteBuilder(10);

    [Benchmark]
    public string Builder100() => ExecuteBuilder(100);

    [Benchmark]
    public string Builder1000() => ExecuteBuilder(1000);

    public string ExecuteConcat(int size)
    {
        string s = "";
        for (int i = 0; i < size; i++)
        {
            s += "a"; // allocates a new string on every iteration
        }
        return s; // returning the result keeps the work observable to the benchmark
    }

    public string ExecuteBuilder(int size)
    {
        sb.Clear();
        for (int i = 0; i < size; i++)
        {
            sb.Append("a");
        }
        // Produce the final string so both approaches deliver the same result
        return sb.ToString();
    }
}
Results:
.NET provides a lot of collections, such as List<T>, Dictionary<TKey, TValue>, and HashSet<T>. All of them have a dynamic capacity: they automatically grow as you add more items.
When a collection reaches its capacity limit, it allocates a new, larger internal buffer (typically about twice the size), copies the existing items into it, and leaves the old buffer for the garbage collector. That means an additional allocation, a copy, and a deallocation every time it grows.
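You can watch this growth directly through List<T>.Capacity. The sketch below (the CapacityDemo class and its methods are hypothetical) counts how many times the backing array is replaced while adding items, with and without an initial capacity, and shows the equivalent capacity constructors for Dictionary<TKey, TValue> and HashSet<T>:

```csharp
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class CapacityDemo
{
    // Adds `count` items and counts how often List<T> had to replace its backing array.
    public static int CountListResizes(int count, bool presize)
    {
        var list = presize ? new List<int>(count) : new List<int>();
        int resizes = 0;
        int lastCapacity = list.Capacity;
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                resizes++; // a larger array was allocated and the old items copied over
                lastCapacity = list.Capacity;
            }
        }
        return resizes;
    }

    // Dictionary and HashSet accept the same kind of capacity hint in their constructors.
    public static Dictionary<int, int> FilledDictionary(int count)
    {
        var map = new Dictionary<int, int>(count);
        for (int i = 0; i < count; i++)
        {
            map.Add(i, i * 2);
        }
        return map;
    }

    public static HashSet<int> FilledHashSet(int count)
    {
        var set = new HashSet<int>(count);
        for (int i = 0; i < count; i++)
        {
            set.Add(i);
        }
        return set;
    }
}
```

On .NET 5, the list without a capacity hint reallocates 9 times on the way to 1000 items (capacity 4, 8, 16, and so on up to 1024), while the pre-sized list never does.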
Benchmark example:
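using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

public class ListCapacityBenchmarks
{
    private const int Size = 1000;

    [Benchmark]
    public void ListDynamicCapacity()
    {
        // Starts with the default capacity and reallocates as items are added
        List<int> list = new List<int>();
        for (int i = 0; i < Size; i++)
        {
            list.Add(i);
        }
    }

    [Benchmark]
    public void ListPlannedCapacity()
    {
        // Allocates the backing array once, already large enough for all items
        List<int> list = new List<int>(Size);
        for (int i = 0; i < Size; i++)
        {
            list.Add(i);
        }
    }
}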
In the first benchmark, the list starts with the default capacity and grows as items are added. In the second, the initial capacity is set to the number of items it is going to hold, so the backing array never has to be replaced.
For 1000 items the results are:
Benchmarks for Dictionary and HashSet:
Allocating arrays, and the inevitable deallocation that follows, can be quite costly. Doing it at high frequency causes GC pressure and hurts performance. An elegant solution is the System.Buffers.ArrayPool<T> class, which is built into .NET 5 (for .NET Framework it is available through the System.Buffers NuGet package).
The idea is similar to the ThreadPool: a shared pool of arrays that you can reuse instead of allocating and deallocating memory each time. The basic usage is to call ArrayPool<T>.Shared.Rent(minimumLength). This returns a regular array that you can use any way you please; keep in mind that it may be longer than requested and may still contain data from its previous user. When finished, call ArrayPool<T>.Shared.Return(array) to give the buffer back to the shared pool, and stop using the array after that.
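A typical rent/return pattern puts Return in a finally block so the buffer goes back to the pool even if an exception is thrown. This is a minimal sketch; the PooledBuffers class and SumOfSquares method are hypothetical names chosen for illustration:

```csharp
using System.Buffers;

// Hypothetical helper class, for illustration only.
public static class PooledBuffers
{
    // Computes 0^2 + 1^2 + ... + (count-1)^2 using a temporary buffer rented from the shared pool.
    public static long SumOfSquares(int count)
    {
        int[] buffer = ArrayPool<int>.Shared.Rent(count);
        try
        {
            // Rent may return a larger array, so only the first `count` slots are used,
            // and each slot is written before it is read (it may hold stale data).
            for (int i = 0; i < count; i++)
            {
                buffer[i] = i * i;
            }

            long sum = 0;
            for (int i = 0; i < count; i++)
            {
                sum += buffer[i];
            }
            return sum;
        }
        finally
        {
            // Always hand the buffer back, even if an exception was thrown.
            ArrayPool<int>.Shared.Return(buffer);
        }
    }
}
```

If the buffer held sensitive data, Return(buffer, clearArray: true) wipes it before the next caller rents it.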
Benchmark example:
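using System.Buffers;
using BenchmarkDotNet.Attributes;

public class ArrayPoolBenchmarks
{
    private const int ArraySize = 1000;

    [Benchmark]
    public void RegularArray()
    {
        int[] array = new int[ArraySize]; // a new heap allocation on every call
    }

    [Benchmark]
    public void SharedArrayPool()
    {
        var pool = ArrayPool<int>.Shared;
        int[] array = pool.Rent(ArraySize); // reuses a pooled array when one is available
        pool.Return(array);
    }
}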
Result for ArraySize = 1000:
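Structs have several benefits when it comes to deallocation: a struct held in a local variable typically lives on the stack and disappears when the method returns, and a struct stored in an array or a class field lives inline inside that object, so the garbage collector has no separate object to track. For example, an array of 10,000 structs is a single heap object, while an array of 10,000 class instances is 10,001 objects.
Decide whether to use a struct based on Microsoft's guidelines ("Choosing Between Class and Struct" in the .NET Framework Design Guidelines): structs suit small values (under 16 bytes) that are ideally immutable and rarely boxed.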
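The difference shows up in how arrays are laid out. In this small sketch (PointClass, PointStruct, and StructDemo are hypothetical names, separate from the benchmark types), a new class array contains only null references, while a new struct array already contains zero-initialized values. It also shows copy-on-assignment, one reason the guidelines favor immutable structs:

```csharp
// Hypothetical types, for illustration only.
public class PointClass
{
    public int X { get; set; }
}

public struct PointStruct
{
    public int X { get; set; }
}

public static class StructDemo
{
    // An array of classes holds references: each element is null until an object is allocated for it.
    public static bool ClassArrayStartsEmpty()
    {
        var points = new PointClass[3];
        return points[0] == null;
    }

    // An array of structs holds the values inline: each element already exists, zero-initialized.
    public static int StructArrayDefaultX()
    {
        var points = new PointStruct[3];
        return points[0].X;
    }

    // Structs are copied on assignment, so changing the copy leaves the array element untouched.
    public static int StructCopySemantics()
    {
        var points = new PointStruct[1];
        PointStruct copy = points[0];
        copy.X = 42;
        return points[0].X; // still 0
    }
}
```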
Benchmark example:
using BenchmarkDotNet.Attributes;

public class StructVsClassBenchmarks
{
    class VectorClass
    {
        public int X { get; set; }
        public int Y { get; set; }
    }

    struct VectorStruct
    {
        public int X { get; set; }
        public int Y { get; set; }
    }

    private const int ITEMS = 10000;

    [Benchmark]
    public void WithClass()
    {
        VectorClass[] vectors = new VectorClass[ITEMS];
        for (int i = 0; i < ITEMS; i++)
        {
            // Each element is a separate heap object that the GC has to track
            vectors[i] = new VectorClass();
            vectors[i].X = 5;
            vectors[i].Y = 10;
        }
    }

    [Benchmark]
    public void WithStruct()
    {
        VectorStruct[] vectors = new VectorStruct[ITEMS];
        // At this point all the vector instances already exist inside the array, with default values
        for (int i = 0; i < ITEMS; i++)
        {
            vectors[i].X = 5;
            vectors[i].Y = 10;
        }
    }
}
Results:
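The stackalloc expression in C# allocates a block of memory on the stack, which is freed automatically when the method returns, so allocation and deallocation are very fast and the GC is not involved. The element type must be an unmanaged type: primitives, enums, pointers, and structs whose fields are all unmanaged are supported, but classes (and structs containing references) are not. Keep these allocations small: stack space is limited, and a StackOverflowException cannot be caught.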
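Since stack space is limited, a common pattern is to use stackalloc only below a size threshold and fall back to a heap array for larger inputs; assigning either one to a Span<T> keeps the rest of the code identical. A minimal sketch, where StackBuffers, Reverse, and the 256-character threshold are all hypothetical choices:

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class StackBuffers
{
    // Hypothetical threshold: above this many chars, fall back to a heap array.
    private const int MaxStackChars = 256;

    public static string Reverse(string input)
    {
        // Small inputs use stack memory (released automatically on return);
        // large inputs use the heap so they cannot overflow the stack.
        Span<char> buffer = input.Length <= MaxStackChars
            ? stackalloc char[input.Length]
            : new char[input.Length];

        for (int i = 0; i < input.Length; i++)
        {
            buffer[i] = input[input.Length - 1 - i];
        }

        return new string(buffer);
    }
}
```

No unsafe context is needed here, because the stack memory is only accessed through Span<char>.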
Benchmark example:
using System;
using BenchmarkDotNet.Attributes;

public class StackAllocBenchmarks
{
    struct VectorStruct
    {
        public int X { get; set; }
        public int Y { get; set; }
    }

    [Benchmark]
    public void WithNew()
    {
        // Heap-allocated array: the GC has to clean it up later
        VectorStruct[] vectors = new VectorStruct[5];
        for (int i = 0; i < 5; i++)
        {
            vectors[i].X = 5;
            vectors[i].Y = 10;
        }
    }

    [Benchmark]
    public unsafe void WithStackAlloc() // Unsafe context required (set AllowUnsafeBlocks in the project file)
    {
        VectorStruct* vectors = stackalloc VectorStruct[5];
        for (int i = 0; i < 5; i++)
        {
            vectors[i].X = 5;
            vectors[i].Y = 10;
        }
    }

    [Benchmark]
    public void WithStackAllocSpan() // With Span<T>, no unsafe context is needed
    {
        Span<VectorStruct> vectors = stackalloc VectorStruct[5];
        for (int i = 0; i < 5; i++)
        {
            vectors[i].X = 5;
            vectors[i].Y = 10;
        }
    }
}
Results:
Never use ConcurrentBag<T> without benchmarking it first. This collection is designed for a very specific use case (when most of the time an item is taken by the same thread that added it) and can perform significantly worse when used otherwise. If you need a general-purpose concurrent collection, prefer ConcurrentQueue<T>.
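For reference, this is what typical cross-thread use of ConcurrentQueue<T> looks like: many threads enqueue, and a single consumer drains the queue with TryDequeue. It is a minimal sketch rather than a benchmark, and QueueDemo and ProduceAndDrain are hypothetical names:

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
public static class QueueDemo
{
    // Several thread-pool threads enqueue concurrently; afterwards one thread drains the queue.
    public static long ProduceAndDrain(int itemCount)
    {
        var queue = new ConcurrentQueue<int>();
        Parallel.For(0, itemCount, i => queue.Enqueue(i));

        long sum = 0;
        while (queue.TryDequeue(out int item))
        {
            sum += item;
        }
        return sum;
    }
}
```

Here a single thread consumes items that were added by many threads, which is the access pattern ConcurrentBag<T> is not designed for.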
Benchmark example:
using System.Collections.Concurrent;
using BenchmarkDotNet.Attributes;

public class ConcurrentCollectionBenchmarks
{
    private static int Size = 1000;

    [Benchmark]
    public void Bag()
    {
        ConcurrentBag<int> bag = new();
        for (int i = 0; i < Size; i++)
        {
            bag.Add(i);
        }
    }

    [Benchmark]
    public void Queue()
    {
        ConcurrentQueue<int> queue = new();
        for (int i = 0; i < Size; i++)
        {
            queue.Enqueue(i);
        }
    }
}
Results:
P.S. Thanks for reading! More benchmarks coming soon!
Special thanks to Michael's Coding Spot for his ideas.