How do I load-balance parallel tasks when some tasks are very costly?

I have a list of objects that need to be processed. Say the list contains all customers, and I need to perform a CPU-intensive calculation on each of them. Before and after this calculation I also need to fetch data from and commit data back to a database, so it's not a purely CPU-bound task.

So what I do is:

Parallel.ForEach(list, action);

where action is literally:

1. Fetch the customer data
2. Run the calculation (a time- and memory-intensive task)
3. Commit the customer data back
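
In code, the action is roughly shaped like the sketch below (simplified; FetchCustomerData, RunCalculation and CommitCustomerData are just placeholders for my real data-access and calculation code):

// Simplified shape of the per-customer work; the helper names are
// placeholders for the real data-access and calculation code.
Action<Customer> action = customer =>
{
    var data = FetchCustomerData(customer);    // 1. read from the database
    var result = RunCalculation(data);         // 2. time/memory intensive calculation
    CommitCustomerData(customer, result);      // 3. write back to the database
};

Parallel.ForEach(list, action);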

The code used to work well, but lately, when several customers with a very high number of records are processed at the same time, we get an out-of-memory exception.

So is there a way to load balance this? Most customers are processed quickly, but a few of them take all the resources away. Can I prevent several of those from running at the same time?

One way I could implement this is to sort the list by size and then pick the first and last items myself, controlling the parallelism manually, but I wanted to see what other options there are.
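
For reference, that manual approach could look roughly like this (only a sketch; GetRecordCount stands in for whatever size approximation I have, and the pairing logic is just one way to keep two large customers out of the same batch):

// Rough sketch of the manual idea: sort by approximate size, then pair the
// largest remaining customer with the smallest one so no two huge customers
// run together. GetRecordCount is a placeholder for my size estimate.
var sorted = list.OrderByDescending(c => GetRecordCount(c)).ToList();
var batches = new List<Customer[]>();
int hi = 0, lo = sorted.Count - 1;
while (hi <= lo)
{
    batches.Add(hi == lo
        ? new[] { sorted[hi] }
        : new[] { sorted[hi], sorted[lo] });  // one big + one small per batch
    hi++;
    lo--;
}

foreach (var batch in batches)
    Parallel.ForEach(batch, action);          // at most one expensive customer at a time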


Since you say you have an approximation of the size of each computation before you actually run it, that simplifies things quite a bit. At that point you simply need a synchronization primitive that doesn't limit the number of operations running, but instead has some total weight and ensures that the combined weight of all currently running operations stays below that total. You can then request that a given operation runs with a given weight, and it won't actually start until there is enough unused weight available for it.

There is no existing primitive that does exactly that (Semaphore comes very close, but isn't quite there). However, you can build one fairly easily out of the existing synchronization primitives.

using System.Threading;

// A semaphore-like primitive that hands out "weight" rather than a fixed
// number of slots: Wait(weight) blocks until at least that much of the
// total weight is free, and Release(weight) gives it back.
public class WeightedSemaphore
{
    public WeightedSemaphore(int totalWeight)
    {
        currentWeight = TotalWeight = totalWeight;
    }

    private readonly ManualResetEvent signal = new ManualResetEvent(false);
    private int currentWeight;
    public int TotalWeight { get; }
    public int CurrentWeight { get { lock (signal) return currentWeight; } }

    public void Wait(int weight)
    {
        while (true)
        {
            lock (signal)
            {
                if (currentWeight >= weight)
                {
                    currentWeight -= weight;
                    return;
                }

                // Reset while still holding the lock so a Release that runs
                // between this check and WaitOne below can't have its Set lost.
                signal.Reset();
            }

            signal.WaitOne();
        }
    }

    public void Release(int weight)
    {
        lock (signal)
        {
            currentWeight += weight;
            // Wake all waiters so they can re-check whether enough weight is now free.
            signal.Set();
        }
    }
}

Now you can go through each of your operations and ensure that, before doing its work, each one waits on the semaphore with its "size" value (and releases that same value when it's done). From there it'll just take some experimentation to figure out what total weight your current system can support.
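
For example, wiring it into the existing loop could look roughly like this (a sketch, assuming the customer's record count is a usable weight; GetRecordCount and the total of 100000 are placeholders to replace with whatever size approximation and memory budget fit your system):

// Sketch only: the total weight (100000) and GetRecordCount are placeholders;
// tune them to whatever size approximation and memory budget your system allows.
var semaphore = new WeightedSemaphore(100000);

Parallel.ForEach(list, customer =>
{
    // Never request more than the total, or Wait would block forever.
    int weight = Math.Min(GetRecordCount(customer), semaphore.TotalWeight);

    semaphore.Wait(weight);          // block until enough weight is free
    try
    {
        action(customer);            // fetch, calculate, commit as before
    }
    finally
    {
        semaphore.Release(weight);   // always give the weight back
    }
});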

Note that a side effect of this is that quicker operations tend to get prioritized. When some weight is freed up, a cheaper operation is much more likely to fit into what's available, so it will reserve that weight before a more expensive operation even gets a shot at running. This is often a desirable property, as average response time goes down when you prioritize the faster operations over the more expensive ones.
