How to code a URL shortener?

I want to create a URL shortener service where you can enter a long URL into an input field and the service shortens it to "http://www.example.org/abcdef".

Edit: Due to the ongoing interest in this topic, I've published an efficient solution to GitHub, with implementations for JavaScript, PHP, Python and Java. Add your solutions if you like :)

Instead of "abcdef" there can be any other six-character string containing a-z, A-Z and 0-9. That makes 62^6, or about 56.8 billion, possible strings.

My approach:

I have a database table with three columns:

  • id, integer, auto-increment
  • long, string, the long URL the user entered
  • short, string, the shortened URL (or just the six characters)

    I would then insert the long URL into the table. Then I would select the auto-increment value for "id" and build a hash of it. This hash should then be inserted as "short". But what sort of hash should I build? Hash algorithms like MD5 produce strings that are too long, so I don't think I can use them. A self-built algorithm would work, too.

    My idea:

    For "http://www.google.de/" I get the auto-increment id 239472. Then I do the following steps:

    short = '';
    if divisible by 2, add "a"+the result to short
    if divisible by 3, add "b"+the result to short
    ... until I have divisors for a-z and A-Z.
    

    That could be repeated until the number isn't divisible any more. Do you think this is a good approach? Do you have a better idea?


    I would continue your "convert number to string" approach. However, you will find that your proposed algorithm fails if the ID is a prime greater than 52.

    Theoretical background

    You need a bijective function f. This is necessary so that you can find an inverse function g('abc') = 123 for your f(123) = 'abc' function. This means:

  • There must be no x1, x2 (with x1 ≠ x2) that will make f(x1) = f(x2),
  • and for every y you must be able to find an x so that f(x) = y.

    How to convert the ID to a shortened URL

  • Think of an alphabet we want to use. In your case that's [a-zA-Z0-9]. It contains 62 characters.
  • Take an auto-generated, unique numerical key (the auto-incremented id of a MySQL table for example).

    For this example I will use 125₁₀ (125 in base 10).

  • Now you have to convert 125₁₀ to X₆₂ (base 62).

    125₁₀ = 2×62¹ + 1×62⁰ = [2,1]

    This requires use of integer division and modulo. A pseudo-code example:

    digits = []
    
    while num > 0
      remainder = modulo(num, 62)
      digits.push(remainder)
      num = divide(num, 62)
    
    digits = digits.reverse
    

    Now map the indices 2 and 1 to your alphabet. This is what your mapping (with an array, for example) could look like:

    0  → a
    1  → b
    ...
    25 → z
    26 → A
    ...
    51 → Z
    52 → 0
    ...
    61 → 9
    

    With 2 → c and 1 → b you get cb₆₂ as the shortened URL.

    http://shor.ty/cb
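
The conversion and mapping steps above can be sketched in Java as follows. This is a minimal sketch rather than a definitive implementation: the class name Base62Encoder and the methods toDigits and encode are hypothetical names chosen for illustration, and the alphabet order (a-z, A-Z, 0-9) is assumed from the mapping shown above. Unlike the pseudo-code, it uses a do-while loop so that an ID of 0 yields one digit instead of an empty result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; the class and method names are illustrative only.
final class Base62Encoder {
    // Index 0-25 -> a-z, 26-51 -> A-Z, 52-61 -> 0-9 (assumed alphabet order).
    static final String ALPHABET =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Returns the base-62 digits of id, most significant digit first.
    static List<Integer> toDigits(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        List<Integer> digits = new ArrayList<>();
        do {
            digits.add((int) (id % 62)); // remainder is the next least significant digit
            id /= 62;                    // integer division drops that digit
        } while (id > 0);
        Collections.reverse(digits);
        return digits;
    }

    // Maps each digit to its character in the alphabet.
    static String encode(long id) {
        StringBuilder sb = new StringBuilder();
        for (int digit : toDigits(id)) {
            sb.append(ALPHABET.charAt(digit));
        }
        return sb.toString();
    }
}
```

Under these assumptions, Base62Encoder.toDigits(125) returns [2, 1] and Base62Encoder.encode(125) returns "cb", matching the worked example.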
    
    How to resolve a shortened URL to the initial ID

    The reverse is even easier. You just do a reverse lookup in your alphabet.

  • e9a₆₂ will be resolved to "4th, 61st, and 0th letter in the alphabet".

    e9a₆₂ = [4,61,0] = 4×62² + 61×62¹ + 0×62⁰ = 19158₁₀

  • Now find your database record with WHERE id = 19158 and do the redirect.
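
The reverse lookup described above could look like this in Java. Again this is only a sketch under stated assumptions: Base62Decoder and decode are hypothetical names, the alphabet order (a-z, A-Z, 0-9) is assumed, and overflow for very long codes is not handled.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class Base62Decoder {
    // Index 0-25 -> a-z, 26-51 -> A-Z, 52-61 -> 0-9 (assumed alphabet order).
    static final String ALPHABET =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Converts a short code back to the numeric ID it was built from.
    static long decode(String code) {
        if (code.isEmpty()) {
            throw new IllegalArgumentException("code must not be empty");
        }
        long id = 0;
        for (char c : code.toCharArray()) {
            int digit = ALPHABET.indexOf(c); // reverse lookup in the alphabet
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character: " + c);
            }
            id = id * 62 + digit;            // shift previous digits, add the new one
        }
        return id;
    }
}
```

With this sketch, Base62Decoder.decode("e9a") returns 19158. Note that "a" is the zero digit, so a code with leading "a" characters decodes to the same ID as the code without them; if codes must be unique, only the form without leading "a" should be handed out.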

    Some implementations (provided by commenters)

  • Ruby
  • Python
  • CoffeeScript
  • Haskell
  • Perl
  • C#

    Why would you want to use a hash?
    You can just use a simple translation of your auto-increment value to an alphanumeric value. You can do that easily with a base conversion. Say your character space (A-Z, a-z, 0-9) has 62 characters: convert the ID to a base-62 number and use the characters as the digits.


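    public class UrlShortener {
        private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
        private static final int    BASE     = ALPHABET.length();
    
        // Converts a non-negative ID to its base-62 string.
        // long is used because six base-62 characters can exceed the int range.
        public static String encode(long num) {
            StringBuilder sb = new StringBuilder();
            do {   // do-while so that 0 encodes to "a" rather than ""
                sb.append( ALPHABET.charAt( (int) (num % BASE) ) );
                num /= BASE;
            } while ( num > 0 );
            return sb.reverse().toString();
        }
    
        // Converts a base-62 string back to the ID.
        // Assumes str contains only characters from ALPHABET.
        public static long decode(String str) {
            long num = 0;
            for ( int i = 0; i < str.length(); i++ )
                num = num * BASE + ALPHABET.indexOf(str.charAt(i));
            return num;
        }
    }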
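
To show how the base conversion fits together with the id / long / short table from the question, here is a rough end-to-end sketch. It makes several assumptions: InMemoryShortener, shorten and resolve are hypothetical names, a plain list stands in for the database table, and IDs start at 1 as they typically would with an auto-increment column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical in-memory stand-in for the id / long / short table.
final class InMemoryShortener {
    private static final String ALPHABET =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final int BASE = ALPHABET.length();

    // Position i holds the long URL with id i + 1, mimicking auto-increment.
    private final List<String> longUrls = new ArrayList<>();

    // "Inserts" the long URL and returns the short code derived from its new id.
    String shorten(String longUrl) {
        longUrls.add(longUrl);
        return encode(longUrls.size());
    }

    // Decodes the short code to an id and looks up the stored long URL, if any.
    Optional<String> resolve(String code) {
        long id = decode(code);
        if (id < 1 || id > longUrls.size()) {
            return Optional.empty();
        }
        return Optional.of(longUrls.get((int) (id - 1)));
    }

    private static String encode(long num) {
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(ALPHABET.charAt((int) (num % BASE)));
            num /= BASE;
        } while (num > 0);
        return sb.reverse().toString();
    }

    // Returns -1 for an empty, overlong, or otherwise invalid code.
    private static long decode(String code) {
        // 10 base-62 digits always fit in a long; longer codes are rejected.
        if (code.isEmpty() || code.length() > 10) {
            return -1;
        }
        long num = 0;
        for (int i = 0; i < code.length(); i++) {
            int digit = ALPHABET.indexOf(code.charAt(i));
            if (digit < 0) {
                return -1;
            }
            num = num * BASE + digit;
        }
        return num;
    }
}
```

In this sketch the first URL passed to shorten gets id 1 and therefore the code "b", and resolve("b") returns that URL again. Because the code is computed from the id, the "short" column would not strictly need to be stored.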
    