C# Object Pooling Pattern Implementation

C# Object Pooling Pattern implementation


Object Pooling in .NET Core

The dotnet core has an implementation of object pooling added to the base class library (BCL). You can read the original GitHub issue here and view the code for System.Buffers. Currently the ArrayPool is the only type available and is used to pool arrays. There is a nice blog post here.

namespace System.Buffers
{
public abstract class ArrayPool<T>
{
public static ArrayPool<T> Shared { get; internal set; }

public static ArrayPool<T> Create(int maxBufferSize = <number>, int numberOfBuffers = <number>);

public T[] Rent(int size);

public T[] Enlarge(T[] buffer, int newSize, bool clearBuffer = false);

public void Return(T[] buffer, bool clearBuffer = false);
}
}

An example of its usage can be seen in ASP.NET Core. Because it is in the dotnet core BCL, ASP.NET Core can share it's object pool with other objects such as Newtonsoft.Json's JSON serializer. You can read this blog post for more information on how Newtonsoft.Json is doing this.

Object Pooling in Microsoft Roslyn C# Compiler

The new Microsoft Roslyn C# compiler contains the ObjectPool type, which is used to pool frequently used objects which would normally get new'ed up and garbage collected very often. This reduces the amount and size of garbage collection operations which have to happen. There are a few different sub-implementations all using ObjectPool (See: Why are there so many implementations of Object Pooling in Roslyn?).

1 - SharedPools - Stores a pool of 20 objects or 100 if the BigDefault is used.

// Example 1 - In a using statement, so the object gets freed at the end.
using (PooledObject<Foo> pooledObject = SharedPools.Default<List<Foo>>().GetPooledObject())
{
// Do something with pooledObject.Object
}

// Example 2 - No using statement so you need to be sure no exceptions are not thrown.
List<Foo> list = SharedPools.Default<List<Foo>>().AllocateAndClear();
// Do something with list
SharedPools.Default<List<Foo>>().Free(list);

// Example 3 - I have also seen this variation of the above pattern, which ends up the same as Example 1, except Example 1 seems to create a new instance of the IDisposable [PooledObject<T>][4] object. This is probably the preferred option if you want fewer GC's.
List<Foo> list = SharedPools.Default<List<Foo>>().AllocateAndClear();
try
{
// Do something with list
}
finally
{
SharedPools.Default<List<Foo>>().Free(list);
}

2 - ListPool and StringBuilderPool - Not strictly separate implementations but wrappers around the SharedPools implementation shown above specifically for List and StringBuilder's. So this re-uses the pool of objects stored in SharedPools.

// Example 1 - No using statement so you need to be sure no exceptions are thrown.
StringBuilder stringBuilder= StringBuilderPool.Allocate();
// Do something with stringBuilder
StringBuilderPool.Free(stringBuilder);

// Example 2 - Safer version of Example 1.
StringBuilder stringBuilder= StringBuilderPool.Allocate();
try
{
// Do something with stringBuilder
}
finally
{
StringBuilderPool.Free(stringBuilder);
}

3 - PooledDictionary and PooledHashSet - These use ObjectPool directly and have a totally separate pool of objects. Stores a pool of 128 objects.

// Example 1
PooledHashSet<Foo> hashSet = PooledHashSet<Foo>.GetInstance()
// Do something with hashSet.
hashSet.Free();

// Example 2 - Safer version of Example 1.
PooledHashSet<Foo> hashSet = PooledHashSet<Foo>.GetInstance()
try
{
// Do something with hashSet.
}
finally
{
hashSet.Free();
}

Microsoft.IO.RecyclableMemoryStream

This library provides pooling for MemoryStream objects. It's a drop-in replacement for System.IO.MemoryStream. It has exactly the same semantics. It was designed by Bing engineers. Read the blog post here or see the code on GitHub.

var sourceBuffer = new byte[]{0,1,2,3,4,5,6,7}; 
var manager = new RecyclableMemoryStreamManager();
using (var stream = manager.GetStream())
{
stream.Write(sourceBuffer, 0, sourceBuffer.Length);
}

Note that RecyclableMemoryStreamManager should be declared once and it will live for the entire process–this is the pool. It is perfectly fine to use multiple pools if you desire.

Why are there so many implementations of Object Pooling in Roslyn?

I'm the lead for the Roslyn performance v-team. All object pools are designed to reduce the allocation rate and, therefore, the frequency of garbage collections. This comes at the expense of adding long-lived (gen 2) objects. This helps compiler throughput slightly but the major effect is on Visual Studio responsiveness when using the VB or C# IntelliSense.

why there are so many implementations".

There's no quick answer, but I can think of three reasons:

  1. Each implementation serves a slightly different purpose and they are tuned for that purpose.
  2. "Layering" - All the pools are internal and internal details from the Compiler layer may not be referenced from the Workspace layer or vice versa. We do have some code sharing via linked files, but we try to keep it to a minimum.
  3. No great effort has gone into unifying the implementations you see today.

what the preferred implementation is

ObjectPool<T> is the preferred implementation and what the majority of code uses. Note that ObjectPool<T> is used by ArrayBuilder<T>.GetInstance() and that's probably the largest user of pooled objects in Roslyn. Because ObjectPool<T> is so heavily used, this is one of the cases where we duplicated code across the layers via linked files. ObjectPool<T> is tuned for maximum throughput.

At the workspace layer, you'll see that SharedPool<T> tries to share pooled instances across disjoint components to reduce overall memory usage. We were trying to avoid having each component create its own pool dedicated to a specific purpose and, instead share based on the type of element. A good example of this is the StringBuilderPool.

why they picked a pool size of 20, 100 or 128.

Usually, this is the result of profiling and instrumentation under typical workloads. We usually have to strike a balance between allocation rate ("misses" in the pool) and the total live bytes in the pool. The two factors at play are:

  1. The maximum degree of parallelism (concurrent threads accessing the pool)
  2. The access pattern including overlapped allocations and nested allocations.

In the grand scheme of things, the memory held by objects in the pool is very small compared to the total live memory (size of the Gen 2 heap) for a compilation but, we do also take care not to return giant objects (typically large collections) back to the pool - we'll just drop them on the floor with a call to ForgetTrackedObject

For the future, I think one area we can improve is to have pools of byte arrays (buffers) with constrained lengths. This will help, in particular, the MemoryStream implementation in the emit phase (PEWriter) of the compiler. These MemoryStreams require contiguous byte arrays for fast writing but they are dynamically sized. That means they occasionally need to resize - usually doubling in size each time. Each resize is a new allocation, but it would be nice to be able to grab a resized buffer from a dedicated pool and return the smaller buffer back to a different pool. So, for example, you would have a pool for 64-byte buffers, another for 128-byte buffers and so on. The total pool memory would be constrained, but you avoid "churning" the GC heap as buffers grow.

Thanks again for the question.

Paul Harrington.

Object Pooling in c# for dll referenced by asp.net, .Net 4.5.1

ConcurrentBag is the optimal structure for a object pool.

You are misinterpreting the advice of "this works best when a single thread is adding/removing items from the pool". A better way to say it would be "A ConcurrentBag works best when the thread removing from the pool has a high chance of being the same thread that put the object in the pool" which is exactly what your object pool will be doing.

The way ConcurrentBag works is each thread has a thread local collection of objects. When you add to the ConcurrentBag it inserts to that thread local collection, when you remove from the ConcurrentBag it first tries to remove from the thread local collection but if it is empty it goes to another thread and removes it from the other thread's collection.

So the reason it is recommend the same thread add as remove is so you don't tie up two lists with locks instead of a single one.

You can even use a single thread to populate the pool then as workers take items out they will steal from the initialization thread's pool but then return it to their own pool pulling from their own pool from then on.

Generic object pool

You can use the Activator.CreateInstance Method

public static object CreateInstance(
Type type,
params object[] args
)

Like this

return (T)Activator.CreateInstance(typeof(T), id);

There is, however, no way to specify that a type must provide a constructor with an argument; neither in an interface declaration, nor in the generic type constraints, nor with class inheritance.

ObjectPool T or similar for .NET already in a library?

UPDATE:

I'd also put forward BufferBlock<T> from TPL DataFlow. IIRC it's part of .net now. The great thing about BufferBlock<T> is that you can wait asynchronously for items to become available using the Post<T> and ReceiveAsync<T> extension methods. Pretty handy in an async/await world.

ORIGINAL ANSWER

A while back I faced this problem and came up with a lightweight (rough'n'ready) threadsafe (I hope) pool that has proved very useful, reusable and robust:

    public class Pool<T> where T : class
{
private readonly Queue<AsyncResult<T>> asyncQueue = new Queue<AsyncResult<T>>();
private readonly Func<T> createFunction;
private readonly HashSet<T> pool;
private readonly Action<T> resetFunction;

public Pool(Func<T> createFunction, Action<T> resetFunction, int poolCapacity)
{
this.createFunction = createFunction;
this.resetFunction = resetFunction;
pool = new HashSet<T>();
CreatePoolItems(poolCapacity);
}

public Pool(Func<T> createFunction, int poolCapacity) : this(createFunction, null, poolCapacity)
{
}

public int Count
{
get
{
return pool.Count;
}
}

private void CreatePoolItems(int numItems)
{
for (var i = 0; i < numItems; i++)
{
var item = createFunction();
pool.Add(item);
}
}

public void Push(T item)
{
if (item == null)
{
Console.WriteLine("Push-ing null item. ERROR");
throw new ArgumentNullException();
}
if (resetFunction != null)
{
resetFunction(item);
}
lock (asyncQueue)
{
if (asyncQueue.Count > 0)
{
var result = asyncQueue.Dequeue();
result.SetAsCompletedAsync(item);
return;
}
}
lock (pool)
{
pool.Add(item);
}
}

public T Pop()
{
T item;
lock (pool)
{
if (pool.Count == 0)
{
return null;
}
item = pool.First();
pool.Remove(item);
}
return item;
}

public IAsyncResult BeginPop(AsyncCallback callback)
{
var result = new AsyncResult<T>();
result.AsyncCallback = callback;
lock (pool)
{
if (pool.Count == 0)
{
lock (asyncQueue)
{
asyncQueue.Enqueue(result);
return result;
}
}
var poppedItem = pool.First();
pool.Remove(poppedItem);
result.SetAsCompleted(poppedItem);
return result;
}
}

public T EndPop(IAsyncResult asyncResult)
{
var result = (AsyncResult<T>) asyncResult;
return result.EndInvoke();
}
}

In order to avoid any interface requirements of the pooled objects, both the creation and resetting of the objects is performed by user supplied delegates: i.e.

Pool<MemoryStream> msPool = new Pool<MemoryStream>(() => new MemoryStream(2048), pms => {
pms.Position = 0;
pms.SetLength(0);
}, 500);

In the case that the pool is empty, the BeginPop/EndPop pair provide an APM (ish) means of retrieving the object asynchronously when one becomes available (using Jeff Richter's excellent AsyncResult<TResult> implementation).

I can't quite remember why it is constained to T : class... there's probably none.



Related Topics



Leave a reply



Submit