PHP Short Hash Like Url-Shortening Websites

URL-shortening services typically use an auto-incremented integer (like a database row ID) and encode it with Base64 or a similar encoding, which packs more information into each character (64 possible values instead of the 10 that decimal digits offer).

Designing a Good Hash Function for Set Length URL Shortening in PHP

If I were you, I would make a case-sensitive alphanumeric incrementer. Just increment, and assign the number to a database row. To check for bad words, just check against a blacklist. If it passes, great. If not, just increment again.

This way, instead of a hash algorithm, they're just in order. The first few would look like this:

id   | url
-------------------------
0000 | http://google.com
0001 | http://yahoo.com
0002 | http://example.com
...
000a | http://mail.google.com
000b | http://adobe.com
...
000A | http://microsof.com
...
0010 | http://w3.org
...
00a0 | http://youtube.com
...
00A0 | http://stackoverflow.com

And so on.

Here's a hint on how the function will work:
http://us3.php.net/manual/en/function.ord.php

BTW, my math might be wrong, but I think that gives (10 + 26 + 26) ^ 4 = 14776336 possible four-character IDs.

Edit: Just for the fun and the challenge, I wrote an incrementer function. When the max is reached, it returns false, so just compare it to false (with ===) when using it.

http://pastebin.com/957KPn4p
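
The pastebin code isn't reproduced here, so this is only a rough sketch of the same idea in Python, not the function linked above. It assumes the digit order 0-9, a-z, A-Z shown in the table, returns None (rather than false) when the maximum is reached, and adds a hypothetical next_allowed helper for the blacklist check:

```python
import string

# Assumed digit order: 0-9, then a-z, then A-Z (matches the table above).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def increment(code):
    """Return the ID that follows `code`, or None once the maximum is reached."""
    chars = list(code)
    # Walk from the rightmost character, carrying like an odometer.
    for i in range(len(chars) - 1, -1, -1):
        pos = ALPHABET.index(chars[i])
        if pos + 1 < len(ALPHABET):
            chars[i] = ALPHABET[pos + 1]
            return "".join(chars)
        chars[i] = ALPHABET[0]  # wrap this position and carry to the left
    return None  # every position overflowed, e.g. "ZZZZ"


def next_allowed(code, blacklist):
    """Hypothetical helper: increment until no blacklisted word appears.

    `blacklist` is assumed to hold lowercase words; the match ignores case.
    """
    candidate = increment(code)
    while candidate is not None and any(w in candidate.lower() for w in blacklist):
        candidate = increment(candidate)
    return candidate
```

Compare the result with `is None` before using it, the same way you would compare with === false in PHP.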

Shorter hash strings without compromising security


  1. Generate a random N-character string.
  2. See if anything else already has that string as its shorturl in the database.
  3. If yes, go to 1. If no, store that string as the shorturl for the resource in the database.

There's no need to use hashing for URL shorteners when you have a persistent datastore, because you're not actually encoding the long URL, you're just associating a token with it.
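
The three steps could look roughly like this in Python. This is a sketch under assumptions: a plain dict stands in for the database table, and the names random_token, shorten, and max_attempts are made up for illustration.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def random_token(length=6):
    """Step 1: generate a random N-character string."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def shorten(long_url, store, length=6, max_attempts=100):
    """Associate a fresh token with long_url.

    `store` is a dict (token -> URL) standing in for the database table.
    """
    for _ in range(max_attempts):
        token = random_token(length)
        if token not in store:       # step 2: is the token already taken?
            store[token] = long_url  # step 3: no, so store it
            return token
        # step 3: yes, so loop back to step 1
    raise RuntimeError("no free token found; consider a longer length")
```

With a real database you would probably also want a unique constraint on the token column, so that two requests picking the same token at the same moment cannot both succeed.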

Designing a URL Shortening service like TinyURL

The problem is that you are using the output of MD5 as a string of hexadecimal digits, and then base64 encoding that string. There's no reason to base64 encode that string - base64 encoding is intended for binary data. What you probably wanted to do is base64 the actual 128-bit binary value of the MD5 hash. Here is some Python code that does what I think you are trying to do:

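import hashlib, base64

text = "www.yahoo.com"
text_utf8 = text.encode('utf8')
# digest() returns the raw 16 bytes (128 bits), not the 32-character hex string
md5 = hashlib.md5(text_utf8).digest()
# Base64 of 16 bytes is 24 characters, the last two being "=" padding
b64 = base64.b64encode(md5).decode('ascii').rstrip('=')
print(b64)

This gets the result GwNXftEE8WqtwApjnTPLRA (22 characters once the "==" padding is stripped), which has the length you were expecting.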

How do I create a URL shortener?

I would continue your "convert number to string" approach. However, you will realize that your proposed algorithm fails if your ID is a prime and greater than 52.

Theoretical background

You need a Bijective Function f. This is necessary so that you can find an inverse function g('abc') = 123 for your f(123) = 'abc' function. This means:

  • There must be no x1, x2 (with x1 ≠ x2) that will make f(x1) = f(x2),
  • and for every y you must be able to find an x so that f(x) = y.

How to convert the ID to a shortened URL

  1. Think of an alphabet we want to use. In your case, that's [a-zA-Z0-9]. It contains 62 characters.
  2. Take an auto-generated, unique numerical key (the auto-incremented id of a MySQL table for example).

    For this example, I will use 125₁₀ (125 with a base of 10).

  3. Now you have to convert 125₁₀ to X₆₂ (base 62).

    125₁₀ = 2×62¹ + 1×62⁰ = [2,1]

    This requires the use of integer division and modulo. A pseudo-code example:

    digits = []

    while num > 0
      remainder = modulo(num, 62)
      digits.push(remainder)
      num = divide(num, 62)

    digits = digits.reverse

    Now map the indices 2 and 1 to your alphabet. This is how your mapping (with an array, for example) could look:

    0  → a
    1  → b
    ...
    25 → z
    ...
    52 → 0
    ...
    61 → 9

    With 2 → c and 1 → b, you will receive cb₆₂ as the shortened URL.

    http://shor.ty/cb
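
As a runnable version of the conversion steps, here is a minimal Python sketch. It assumes the alphabet order from the mapping (a-z, then A-Z, then 0-9); the function name encode and the special case for 0 are my own additions, since the loop alone would produce an empty string for an ID of 0.

```python
import string

# Alphabet order assumed from the mapping: a-z, A-Z, 0-9 (62 characters).
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def encode(num):
    """Convert a non-negative integer ID to its base-62 string."""
    if num < 0:
        raise ValueError("ID must be non-negative")
    if num == 0:
        return ALPHABET[0]  # the loop below would otherwise return ""
    digits = []
    while num > 0:
        # Integer division and modulo in one step.
        num, remainder = divmod(num, 62)
        digits.append(ALPHABET[remainder])
    # Remainders come out least significant first, so reverse them.
    return "".join(reversed(digits))


print(encode(125))  # cb
```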

How to resolve a shortened URL to the initial ID

The reverse is even easier. You just do a reverse lookup in your alphabet.

  1. e9a₆₂ will be resolved to "4th, 61st, and 0th letter in the alphabet".

    e9a₆₂ = [4,61,0] = 4×62² + 61×62¹ + 0×62⁰ = 19158₁₀

  2. Now find your database record with WHERE id = 19158 and do the redirect.
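
The reverse lookup can be sketched in Python like this, again assuming the a-z, A-Z, 0-9 alphabet order; the names decode and INDEX are only for illustration.

```python
import string

# Same assumed alphabet order as the mapping: a-z, A-Z, 0-9.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
# Reverse lookup table: character -> its index in the alphabet.
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def decode(code):
    """Convert a base-62 short code back to the integer ID."""
    num = 0
    for ch in code:
        if ch not in INDEX:
            raise ValueError(f"invalid character: {ch!r}")
        # Shift the running value one base-62 place and add the new digit.
        num = num * 62 + INDEX[ch]
    return num


print(decode("e9a"))  # 19158
```

Rejecting unknown characters up front means a malformed short URL fails before any database query is made.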

Example implementations (provided by commenters)

  • C++
  • Python
  • Ruby
  • Haskell
  • C#
  • CoffeeScript
  • Perl

URL Shortening Site

I think you are on the right track.

One thing I would do differently, though, is this part:

then use apache mod_rewrite and create
shorten url and then redirect.

I don't think I'd create an Apache RewriteRule, nor use mod_rewrite.



When receiving a short URL like short.com/MYID, I would:

  • decode the "MYID" part to the ID number in the DB
  • fetch the URL from the database
  • just redirect to that URL from some server code (like PHP, using the header function)

A bit like this, I guess:

// fetch $urlFull from DB (corresponding to the MYID received in GET)
header('HTTP/1.1 301 Moved Permanently');
header('Location: ' . $urlFull);
die;



(edit) If by mod_rewrite you meant "transform short.com/MYID to short.com/id=MYID", then yes, in that case, of course!

I'm using something like this on one of my sites, btw:

RewriteEngine on
RewriteCond %{REQUEST_URI} !^/index\.php
RewriteRule ^(.*)$ /index.php?hash=$1 [L]



Hope this helps :-)


