Is there a better implementation for keeping a count for unique integer pairs?

2018-06-25 22:15:36

This is in C++. I need to keep a count for every pair of numbers. The two numbers are of type "int". I sort the two numbers, so the pair (n1, n2) is the same as the pair (n2, n1). I'm using std::unordered_map as the container.

I have been using the elegant pairing function by Matthew Szudzik (Wolfram Research, Inc.). In my implementation, the function gives me a unique number of type "long" (64 bits on my machine) for every pair of two numbers of type "int". I use this long as the key for the unordered_map (std::unordered_map). Is there a better way to keep count of such pairs? By better I mean faster and, if possible, with less memory usage.

Also, I don't need all the bits of a long. Even though you can assume that the two numbers can go up to the maximum 32-bit value, I expect the maximum value of my pairing function to need at most 36 bits. If nothing else, is there at least a way to use just 36 bits as the key for the unordered_map (via some other data type)?

I thought of using std::bitset, but I'm not sure whether std::hash will generate a unique hash for every 36-bit bitset, so that it could be used as the key for the unordered_map.

I would greatly appreciate any thoughts, suggestions etc.

First of all, I think you started from a wrong assumption. For std::unordered_map and std::unordered_set the hash does not have to be unique (and in principle it cannot be for data types like std::string). There should only be a low probability that two different keys produce the same hash value, and if a collision does happen it is not the end of the world; access just gets slower. I would generate a 32-bit hash from the two numbers and, if you have an idea of the typical values, test the probability of hash collisions and choose the hash function accordingly.

For that to work, use a pair of 32-bit numbers as the key in std::unordered_map and provide a proper hash function. Calculating a unique 64-bit key and using it with a hash map is debatable, because the hash map will then calculate another hash of that key, so you may actually be making it slower.

As for a 36-bit key, this is not a good idea unless you have a special CPU that handles 36-bit data. Either your data will be aligned on a 64-bit boundary, in which case you get no memory savings, or it will be packed without alignment and you pay the penalty for unaligned access. In the first case you would just have extra code to extract 36 bits from 64-bit data (if the processor supports it). In the second case your code will be slower than with a 32-bit hash, even if that hash has some collisions.

If the hash map turns out to be a bottleneck, you may consider a different hash map implementation, such as Google sparsehash (goog-sparsehash.sourceforge.net).

Just my two cents: the pairing functions in that article are WAY more complicated than you actually need. Mapping two 32-bit UNSIGNED values uniquely to a 64-bit value is easy. The following does that, treats (a, b) and (b, a) as the same pair, and needs little or no heavy arithmetic (only an addition, a subtraction and shifts).

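#include <cstdint>

uint64_t map(uint32_t a, uint32_t b)
{
    // High 32 bits: the sum; low 32 bits: the absolute difference.
    // Both are symmetric in a and b, so map(a, b) == map(b, a).
    uint64_t x = (uint64_t)a + b;          // must still fit in 32 bits (see NOTE)
    uint64_t y = (a > b) ? a - b : b - a;  // exact |a - b| for any uint32_t values

    uint64_t ans = (x << 32) | y;
    return ans;
}

void unwind(uint64_t key, uint32_t* a, uint32_t* b)
{
    uint64_t x = key >> 32;          // a + b
    uint64_t y = key & 0xFFFFFFFFu;  // |a - b|

    *a = (uint32_t)((x + y) >> 1);   // the larger of the two values
    *b = (uint32_t)(x - *a);         // the smaller of the two values
}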
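Why unwind() recovers the numbers, written out with s for the sum and d for the absolute difference. The key is only valid while s < 2^32, because s occupies the upper 32 bits; d always fits since it never exceeds the larger input. Note that unwind() hands back the larger value first, so the original order is not preserved, which is what the question wants.

```latex
\begin{aligned}
s &= a + b, \qquad d = |a - b|, \qquad \mathrm{key} = s \cdot 2^{32} + d \\
\frac{s + d}{2} &= \frac{(a + b) + |a - b|}{2} = \max(a, b) \\
s - \max(a, b) &= \min(a, b) \\
(a, b) = (3, 5):\quad s &= 8,\; d = 2,\; \mathrm{key} = 8 \cdot 2^{32} + 2 = 34359738370 \;\Rightarrow\; (5, 3)
\end{aligned}
```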

Another alternative:

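#include <cstdint>

uint64_t map(uint32_t a, uint32_t b)
{
    // Put the larger value in the high 32 bits and the smaller one in the
    // low 32 bits, so map(a, b) == map(b, a) for all uint32_t inputs.
    bool bb = a > b;
    uint64_t x = ((uint64_t)a) << (32 * bb);   // shifted up only if a is larger
    uint64_t y = ((uint64_t)b) << (32 * !bb);  // shifted up if b is larger or equal

    uint64_t ans = x | y;
    return ans;
}

void unwind(uint64_t key, uint32_t* a, uint32_t* b)
{
    *a = (uint32_t)(key >> 32);          // the larger value
    *b = (uint32_t)(key & 0xFFFFFFFFu);  // the smaller value
}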

Either version works as a unique key. You can easily turn it into a hash function provider for std::unordered_map, though whether that will be faster than std::map depends on how many values you have.

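NOTE: the first version will fail if a + b does not fit in 32 bits. The second version has no such limit, since it only shifts the two values into separate halves of the 64-bit key.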
