Memory range lookup against a 5M+ record table

I have a database table with 5M+ static records. Simple structure: (starting int, ending int, result int). Given a certain int, I need to find its corresponding result (int). Currently the look-up table lives in a database, but it needs to reside in memory, most likely in an environment without database access.

My solution needs to perform this logic without database access, in memory, and super fast, as I need to process thousands of transactions per second. The set is a little over 50MB, so I could load the whole thing into memory and run range look-ups against it, per this post: Doing a range lookup in C# - how to implement. But I don't know how it will perform at that scale.

  • Do I pre-load the table on startup? That might take a while.
  • Is there a way to dump the table into some .dat file and get a super-efficient look-up at run time?
  • BTW, I am on Azure; not sure whether Table Storage would help with look-ups...
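For reference, the linked range-lookup approach boils down to a sorted list plus List&lt;T&gt;.BinarySearch. A minimal sketch, assuming non-overlapping ranges sorted by start (type and member names here are hypothetical):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Range type; assumes ranges are non-overlapping
// and the list is sorted by Start.
public class Range : IComparable<Range>
{
    public int Start;   // inclusive
    public int End;     // inclusive
    public int Result;
    public int CompareTo(Range other) => Start.CompareTo(other.Start);
}

public static class Table
{
    public static int? Lookup(List<Range> ranges, int value)
    {
        // BinarySearch returns the index of an exact Start match, or the
        // bitwise complement of the insertion point when there is none.
        int idx = ranges.BinarySearch(new Range { Start = value });
        if (idx < 0) idx = ~idx - 1;   // last range with Start <= value
        if (idx >= 0 && value <= ranges[idx].End)
            return ranges[idx].Result;
        return null;                   // value falls in a gap
    }
}
```

Each lookup does one int comparison per probe plus a final bounds check, so 5M records cost roughly log2(5M) ≈ 23 probes.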


    Binary searches are super fast. A binary search on 5M records takes only about 23 comparisons to find the answer. Just load it into memory and use the range lookup you linked.

    If you find it is slow, start optimizing:

  • Change the Range object to a struct instead of a class.
  • Hand-code your own binary search algorithm that (a) performs the comparison directly instead of calling out through an IComparer, and (b) uses pointers and other unsafe tricks to disable array bounds checking while doing the search.
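The two optimizations above can be sketched together: a struct record (one contiguous array, no per-object references) and a hand-rolled binary search that compares ints directly instead of dispatching through a comparer interface. The unsafe/pointer variant is left out, and the names here are hypothetical:

```csharp
// Struct record: 12 contiguous bytes per entry, cache-friendly.
public struct RangeRec
{
    public int Start;   // inclusive
    public int End;     // inclusive
    public int Result;
}

public static class FastLookup
{
    // Hand-coded binary search: find the last record with Start <= value,
    // then verify value actually falls inside that record's range.
    public static int Find(RangeRec[] table, int value, int notFound = -1)
    {
        int lo = 0, hi = table.Length - 1, found = -1;
        while (lo <= hi)
        {
            int mid = lo + ((hi - lo) >> 1);    // overflow-safe midpoint
            if (table[mid].Start <= value) { found = mid; lo = mid + 1; }
            else hi = mid - 1;
        }
        return (found >= 0 && value <= table[found].End)
            ? table[found].Result
            : notFound;
    }
}
```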

  • The range lookup code you link to performs a binary search, so performance will be O(log n). I don't think you can do better than that for a range lookup. A HashSet<T>'s lookup is O(1), but you cannot use that structure for a range lookup.

    5 million records is not really a huge amount. I suggest you implement a proof of concept with the code you link to on the hardware you will use in production, and measure performance.


    You could certainly put that in a sequential file and load it at startup. 50 MB will come off the disk in less than a second. And even if you had to parse it as a text file, you should be able to create the table in another second. 5 million records just isn't that large when you're processing them with a 2 GHz (or faster) processor.
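The sequential-file idea can be sketched as a packed binary format: a record count followed by three ints per record, written once and read back in a single pass at startup. The file layout and names here are assumptions, not an established format:

```csharp
using System.IO;

// Hypothetical record type matching the (start, end, result) schema.
public struct RangeRec
{
    public int Start, End, Result;
}

public static class DatFile
{
    // Write: record count, then Start/End/Result for each record.
    public static void Save(string path, RangeRec[] table)
    {
        using var w = new BinaryWriter(File.Create(path));
        w.Write(table.Length);
        foreach (var r in table)
        {
            w.Write(r.Start);
            w.Write(r.End);
            w.Write(r.Result);
        }
    }

    // Read the whole table back in one sequential pass.
    public static RangeRec[] Load(string path)
    {
        using var r = new BinaryReader(File.OpenRead(path));
        var table = new RangeRec[r.ReadInt32()];
        for (int i = 0; i < table.Length; i++)
            table[i] = new RangeRec
            {
                Start = r.ReadInt32(),
                End = r.ReadInt32(),
                Result = r.ReadInt32()
            };
        return table;
    }
}
```

At 12 bytes per record, 5M records is roughly 60 MB on disk, in line with the ~50 MB estimate above, and binary records skip the text-parsing step entirely.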

    Binary search of the list is O(log n), so the maximum number of probes you'll do per search is about 23. That's gonna be pretty darned quick.

    It should be easy enough to load test something like this. Just spin it up and then see how long it takes to do, say, 1,000,000 lookups. Something like:

    var clock = Stopwatch.StartNew();
    for (int i = 0; i < NumIterations; ++i)
    {
        int val = GetRandomValueToSearchFor(); // however you do that
        Ranges.BinarySearch(val, RangeComparer);
    }
    clock.Stop();
    // time per iteration is clock.Elapsed.TotalMilliseconds / NumIterations
    

    That'll let you figure out the absolute fastest you can query the thing. I suspect you'll be fine with thousands of transactions per second.
