Obscuring database IDs

2018-06-15 22:23:39

I have a table with a primary key that is auto increment. I want to have an image associated with the primary key but I don't want the primary key to be revealed. Would naming the images something like:

$filename = md5($primarykey . $secret_string) . '.jpg';

be a good solution?

I am worried that there could be a collision and a file could be overwritten.

The other option, of course, is to generate a random string, check that it doesn't already exist as a file, and store it in the database, but I'd prefer not to store additional data if it's unnecessary.

Another option is a logical transformation in the style of YouTube URLs, e.g. 1=a, 2=b, but with a randomised order, e.g. 1=x, 2=m. But then there is a chance of it being decoded, and md5 would probably be lighter than any YouTube-style URL function.

I would guess I am dealing with over two million records, so what is the likelihood of a collision? Which option would you pick, or can you think of a better approach?

There are really two options:

Generate something & Verify no collisions

Generate something & Hope for no collisions

You can generally use the following options:

- A hash
- A randomly generated string
- A UUID

Hash

If you're choosing a hash, choose one with a low incidence of collisions. Also consider why you want to obscure DB ids in the first place. If you hash plain numbers, it won't take long for somebody to figure out your hashes, so you absolutely need to salt them. The advantages of a salted hash are quick generation and a low chance of collisions (for small data sets there is practically no need to check for them, so inserts are faster). The downside is that any proper implementation will use SHA256 or better, which means the result is long. You can do some hex conversions (for example, storing the raw bytes instead of the hex string) if you want to save DB/index space, though that may be more than you want.
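A minimal sketch of the salted-hash approach, applied to the image filename from the question. It uses PHP's built-in hash_hmac() with SHA-256 as the "salted hash", which is one reasonable choice and not the only one. The constant IMAGE_NAME_SECRET and the function image_filename() are hypothetical names introduced for illustration.

```php
<?php
// Hypothetical secret: in practice, load a long random value from configuration.
const IMAGE_NAME_SECRET = 'change-me-to-a-long-random-value';

/**
 * Derive a filename from the primary key without exposing the key itself.
 * The same key and secret always produce the same name, so nothing extra
 * needs to be stored in the database.
 */
function image_filename(int $primaryKey, string $secret = IMAGE_NAME_SECRET): string
{
    // HMAC-SHA256 keyed with the secret; yields 64 hex characters.
    $digest = hash_hmac('sha256', (string) $primaryKey, $secret);

    return $digest . '.jpg';
}
```

Passing true as the fourth argument of hash_hmac() returns the 32 raw bytes instead of the hex string, which is the shorter form to store if index space matters.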

Random String

You can generate this at any length that suits you, from any character set, e.g. a-z, A-Z and 0-9. That also means "more" data in a shorter string, which is handy in URIs, REQUEST data, etc. The downside is that you have to check whether it's already in the database.
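The generate-and-check loop could look roughly like this. It is a sketch that assumes PHP 7 or later (for random_bytes()) and, for simplicity, uses a hex alphabet instead of the wider a-z, A-Z, 0-9 set mentioned above. The function names and the $exists callback are hypothetical; the callback stands in for whatever database or filesystem lookup you actually use.

```php
<?php
/** Return a random hex string built from $bytes random bytes (2 characters per byte). */
function random_name(int $bytes = 8): string
{
    return bin2hex(random_bytes($bytes));
}

/**
 * Generate names until one is unused.
 * $exists receives a candidate name and returns true if it is already taken.
 */
function unique_image_name(callable $exists, int $maxAttempts = 10): string
{
    for ($i = 0; $i < $maxAttempts; $i++) {
        $name = random_name();
        if (!$exists($name)) {
            return $name;
        }
    }

    throw new RuntimeException('Could not find an unused name');
}
```

The check and the later insert are two separate steps here, so a unique index on the stored column is still worth having as a backstop.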

A UUID

Like a hash, a UUID is fast to generate, has a fairly low chance of collisions, and can be modified to be "less" ugly than the raw output.
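As an illustration, a random (version 4) UUID can be built from random_bytes() without any extension. This is a sketch assuming PHP 7 or later. uuid_v4() and uuid_short() are hypothetical helper names, and the base64url shortening is just one possible way to make the value "less" ugly.

```php
<?php
/** Generate a random version 4 UUID in the usual 8-4-4-4-12 hex layout. */
function uuid_v4(): string
{
    $bytes = random_bytes(16);

    // Set the version (4) and variant (10xx) bits.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

/** Shorten a UUID to 22 URL-safe characters by base64url-encoding its 16 bytes. */
function uuid_short(string $uuid): string
{
    $raw = hex2bin(str_replace('-', '', $uuid));

    return rtrim(strtr(base64_encode($raw), '+/', '-_'), '=');
}
```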

My Suggestion

Don't do it. I've had to deal with this before on a very large implementation that grew out of a very small one. Eventually you start doing "smart" things like creating totally unique identifiers (e.g. content type + your identifier) and start seeing some value in it, but then you have to deal with scale. Scaling this is very difficult. DBs are optimized for ids as primary keys, and there's a surprisingly large amount of thought you would need to put into this if you wanted it to scale vertically. If you must, only use it for external client interactions.

Use a linear congruential generator. If you choose the values properly, then you will have a pseudorandom sequence with a very large period. No collisions, but note that this is just an obfuscation method and won't provide any real security (but I assume that is not what you are looking for).
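One way to read this suggestion is to apply a single LCG-style step (multiply, add, reduce modulo m) directly to the id, which gives a collision-free and reversible mapping as long as the multiplier is coprime to the modulus. The sketch below takes that reading, which may not be exactly what was meant above. It assumes 64-bit PHP and ids in the range 0 to 2^31 - 2. The constants and function names are hypothetical, and, as noted above, this is obfuscation only.

```php
<?php
// Hypothetical parameters. The modulus is the prime 2^31 - 1, so every
// multiplier in [1, modulus - 1] is coprime to it and the mapping is a bijection.
const OBF_MODULUS    = 2147483647;
const OBF_MULTIPLIER = 1103515245;
const OBF_INCREMENT  = 12345;

/** Modular inverse of $a modulo $m via the extended Euclidean algorithm. */
function mod_inverse(int $a, int $m): int
{
    [$oldR, $r] = [$a, $m];
    [$oldS, $s] = [1, 0];
    while ($r !== 0) {
        $q = intdiv($oldR, $r);
        [$oldR, $r] = [$r, $oldR - $q * $r];
        [$oldS, $s] = [$s, $oldS - $q * $s];
    }

    // Normalise into the range [0, m).
    return (($oldS % $m) + $m) % $m;
}

/** Map an id to its obscured value; products stay below 2^62 on 64-bit PHP. */
function obscure_id(int $id): int
{
    return ($id * OBF_MULTIPLIER + OBF_INCREMENT) % OBF_MODULUS;
}

/** Recover the original id from an obscured value. */
function reveal_id(int $code): int
{
    $diff = ($code - OBF_INCREMENT) % OBF_MODULUS;
    if ($diff < 0) {
        $diff += OBF_MODULUS;
    }

    return ($diff * mod_inverse(OBF_MULTIPLIER, OBF_MODULUS)) % OBF_MODULUS;
}
```

Because reveal_id() undoes obscure_id(), nothing extra has to be stored, but anyone who learns or guesses the three constants can do the same.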

I would guess I am dealing with over two million records, so what is the likelihood of a collision?

According to Wikipedia, you'll need more than 2*10^19 records to get a 50% probability of at least one collision, so I'd say you don't have to worry.
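The figure can be sanity-checked with the usual birthday-problem approximation, assuming a 128-bit hash such as md5 whose outputs behave uniformly at random (an idealisation). The variable names below are illustrative only.

```php
<?php
// Number of possible outputs of a 128-bit hash (as a float).
$hashSpace = pow(2.0, 128);

// Approximate collision probability for n records: p ~ n^2 / (2 * N).
$records    = 2e6;
$pCollision = ($records * $records) / (2 * $hashSpace);

// Approximate number of records for a 50% collision chance: sqrt(2 * ln(2) * N).
$recordsFor50 = sqrt(2 * log(2) * $hashSpace);

printf("p(collision) for 2 million records: %.2e\n", $pCollision);
printf("records needed for 50%%: %.2e\n", $recordsFor50);
```

For two million records the estimate comes out at roughly 6 * 10^-27, and the 50% point at roughly 2.2 * 10^19 records, in line with the number quoted above.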
