PHP short hash like URL-shortening websites
URL shortening services typically use an auto-incremented integer (such as a database ID) and encode it with Base64 or a similar encoding, which packs more information into each character (64 possible values instead of the 10 that decimal digits offer).
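To make the density argument concrete, here is a minimal Python sketch that counts how many characters an ID needs in a given base. The helper name `chars_needed` is hypothetical and only serves the illustration.

```python
def chars_needed(n: int, base: int) -> int:
    """Return how many digits a non-negative integer n needs in the given base."""
    if n < 0 or base < 2:
        raise ValueError("n must be >= 0 and base must be >= 2")
    count = 1
    # Each integer division by the base strips one digit.
    while n >= base:
        n //= base
        count += 1
    return count


if __name__ == "__main__":
    one_billion = 1_000_000_000
    print(chars_needed(one_billion, 10))  # characters needed as decimal digits
    print(chars_needed(one_billion, 64))  # characters needed with a 64-symbol alphabet
```

Under these assumptions, an ID of one billion takes ten decimal digits but only five characters from a 64-symbol alphabet.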
Designing a Good Hash Function for Set Length URL Shortening in PHP
If I were you, I would make a case-sensitive alphanumeric incrementer. Just increment, and assign the result to a database row. To check for bad words, just check against a blacklist. If it passes, great. If not, just increment again.
This way, instead of a hash algorithm, they're just in order. The first few would look like this:
id | url
-------------------------
0000 | http://google.com
0001 | http://yahoo.com
0002 | http://example.com
...
000a | http://mail.google.com
000b | http://adobe.com
...
000A | http://microsoft.com
...
0010 | http://w3.org
...
00a0 | http://youtube.com
...
00A0 | http://stackoverflow.com
And so on.
Here's a hint on how the function will work:
http://us3.php.net/manual/en/function.ord.php
BTW, my math might be wrong, but I think the number of possible four-character IDs is (10 + 26 + 26) ^ 4 = 14776336
Edit: Just for the fun and the challenge, I wrote an incrementer function. When the max is reached, it returns false, so just compare it to false (with ===) when using it.
http://pastebin.com/957KPn4p
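The linked function is not reproduced here. As a rough illustration of the idea, the following Python sketch increments a fixed-length code like an odometer, using the ordering from the table above (digits, then lowercase, then uppercase). The names `ALPHABET`, `increment`, and `next_clean_code` are assumptions of this sketch, and it returns `None` rather than `false` when the maximum is reached.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def increment(code: str):
    """Return the next code of the same length, or None when the maximum is reached."""
    chars = list(code)
    # Walk from the rightmost position, carrying to the left like an odometer.
    for i in range(len(chars) - 1, -1, -1):
        idx = ALPHABET.index(chars[i])
        if idx + 1 < len(ALPHABET):
            chars[i] = ALPHABET[idx + 1]
            return "".join(chars)
        chars[i] = ALPHABET[0]  # this position overflowed, so carry left
    return None  # every position overflowed: the code space is exhausted


def next_clean_code(code: str, blacklist: set):
    """Increment until the code contains no blacklisted (lowercase) word."""
    nxt = increment(code)
    while nxt is not None and any(bad in nxt.lower() for bad in blacklist):
        nxt = increment(nxt)
    return nxt


if __name__ == "__main__":
    print(increment("0009"))                   # next code after the last digit
    print(next_clean_code("00ab", {"0ac"}))    # skips a code containing a listed word
    print(len(ALPHABET) ** 4)                  # size of the four-character code space
```

Compare the result against `None` explicitly (with `is None`), in the same spirit as the `===` check suggested above.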
Shorter hash strings without compromising security
1. Generate a random N-character string.
2. See if anything else already has that string as its shorturl in the database.
3. If yes, go to step 1. If no, store that string as the shorturl for the resource in the database.
There's no need to use hashing for url shorteners when you have a persistent datastore, because you're not actually encoding the long url, you're just associating a token with it.
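The generate-and-check loop can be sketched in Python as follows. This is only an illustration under stated assumptions: a plain `dict` named `store` stands in for the database, and the names `random_token`, `shorten`, and the `max_tries` limit are hypothetical.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters


def random_token(length: int) -> str:
    """Generate a random token of the given length from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def shorten(long_url: str, store: dict, length: int = 6, max_tries: int = 100) -> str:
    """Associate long_url with an unused random token and return the token."""
    for _ in range(max_tries):
        token = random_token(length)
        if token not in store:       # step 2: is the token already taken?
            store[token] = long_url  # step 3: no, so store the association
            return token
        # Token was taken: loop back to step 1 and try a new one.
    raise RuntimeError("no free token found; consider a larger length")


if __name__ == "__main__":
    store = {}
    token = shorten("http://example.com", store)
    print(token, "->", store[token])
```

With a real database, the check and the insert would need to be atomic (for example via a unique constraint on the token column), otherwise two concurrent requests could claim the same token.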
Designing a URL Shortening service like TinyURL
The problem is that you are using the output of MD5 as a string of hexadecimal digits, and then base64 encoding that string. There's no reason to base64 encode that string - base64 encoding is intended for binary data. What you probably wanted to do is base64 the actual 128-bit binary value of the MD5 hash. Here is some Python code that does what I think you are trying to do:
import hashlib, base64
text = "www.yahoo.com"
text_utf8 = text.encode('utf8')
md5 = hashlib.md5(text_utf8).digest()
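# b64encode returns bytes ending in '==' padding; decode to text and strip it
b64 = base64.b64encode(md5).decode('ascii').rstrip('=')
print(b64)
This gets the result GwNXftEE8WqtwApjnTPLRA, which has the length you were expecting.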
How do I create a URL shortener?
I would continue your "convert number to string" approach. However, you will realize that your proposed algorithm fails if your ID is a prime and greater than 52.
Theoretical background
You need a bijective function f. This is necessary so that you can find an inverse function g('abc') = 123 for your f(123) = 'abc' function. This means:
- There must be no x1, x2 (with x1 ≠ x2) that will make f(x1) = f(x2),
- and for every y you must be able to find an x so that f(x) = y.
How to convert the ID to a shortened URL
- Think of an alphabet we want to use. In your case, that's [a-zA-Z0-9]. It contains 62 letters.
- Take an auto-generated, unique numerical key (the auto-incremented id of a MySQL table for example).
For this example, I will use 125₁₀ (125 with a base of 10).
- Now you have to convert 125₁₀ to X₆₂ (base 62).
125₁₀ = 2×62¹ + 1×62⁰ = [2,1]
This requires the use of integer division and modulo. A pseudo-code example:
digits = []
while num > 0
  remainder = modulo(num, 62)
  digits.push(remainder)
  num = divide(num, 62)
digits = digits.reverse
- Now map the indices 2 and 1 to your alphabet. This is how your mapping (with an array for example) could look:
0 → a
1 → b
...
25 → z
...
52 → 0
...
61 → 9
With 2 → c and 1 → b, you will receive cb₆₂ as the shortened URL.
http://shor.ty/cb
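The conversion steps above can be sketched in Python as follows. The names `ALPHABET` and `encode` are assumptions of this sketch. Note that the pseudo-code yields an empty digit list for an ID of 0; mapping 0 to the first letter here is an added assumption, not part of the original answer.

```python
# Alphabet order follows the mapping above: a-z, then A-Z, then 0-9.
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def encode(num: int) -> str:
    """Convert a non-negative integer ID to its base-62 string."""
    if num < 0:
        raise ValueError("ID must be non-negative")
    if num == 0:
        return ALPHABET[0]  # assumption: represent 0 as the first letter
    digits = []
    while num > 0:
        num, remainder = divmod(num, 62)  # integer division and modulo in one step
        digits.append(remainder)
    digits.reverse()  # remainders come out least-significant first
    return "".join(ALPHABET[d] for d in digits)


if __name__ == "__main__":
    print(encode(125))  # the worked example: digits [2, 1]
```

For the worked example, `encode(125)` produces the digits [2, 1] and therefore the string `cb`.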
How to resolve a shortened URL to the initial ID
The reverse is even easier. You just do a reverse lookup in your alphabet.
e9a₆₂ will be resolved to "4th, 61st, and 0th letter in the alphabet".
e9a₆₂ = [4,61,0] = 4×62² + 61×62¹ + 0×62⁰ = 19158₁₀
Now find your database record with WHERE id = 19158 and do the redirect.
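The reverse lookup might look like this in Python. Again a sketch only: `ALPHABET`, `INDEX`, and `decode` are hypothetical names, and the alphabet order (a-z, A-Z, 0-9) is taken from the mapping shown earlier.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
# Reverse lookup table: character -> position in the alphabet.
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def decode(short: str) -> int:
    """Convert a base-62 string back to the integer ID."""
    if not short:
        raise ValueError("empty short code")
    num = 0
    for ch in short:
        if ch not in INDEX:
            raise ValueError(f"invalid character: {ch!r}")
        # Shift the accumulated value one base-62 place and add the new digit.
        num = num * 62 + INDEX[ch]
    return num


if __name__ == "__main__":
    print(decode("e9a"))  # digits [4, 61, 0]
```

For the worked example, `decode("e9a")` evaluates (4×62 + 61)×62 + 0 = 19158, matching the ID used in the WHERE clause.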
Example implementations (provided by commenters)
- C++
- Python
- Ruby
- Haskell
- C#
- CoffeeScript
- Perl
URL Shortening Site
I think you are on the right track.
One thing I would not do the way you describe, though, is this part:
then use apache mod_rewrite and create
shorten url and then redirect.
I don't think I'd create an Apache RewriteRule, nor use mod_rewrite.
When receiving a short URL, like short.com/MYID, I would:
- decode the "MYID" part to the id number in DB
- fetch the URL from database
- just redirect to that URL from some server code (like PHP, using the header function)
A bit like this, I guess:
// fetch $urlFull from DB (corresponding to the MYID received in GET)
header('Location: ' . $urlFull, true, 301); // third argument sets the status: 301 Moved Permanently
exit;
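For comparison, the whole decode, fetch, redirect flow can be sketched in Python as a pure function that returns a status code and headers instead of sending them. Everything here is an assumption made for illustration: the names `decode` and `handle_request`, the base-62 alphabet, and the `urls_by_id` dict standing in for the database table.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def decode(code: str) -> int:
    """Turn a base-62 short code into the numeric database ID."""
    num = 0
    for ch in code:
        num = num * 62 + ALPHABET.index(ch)
    return num


def handle_request(path: str, urls_by_id: dict):
    """Return (status, headers) for a request path such as '/cb'."""
    code = path.strip("/")
    # Reject empty codes and characters outside the alphabet.
    if not code or any(ch not in ALPHABET for ch in code):
        return 404, {}
    url = urls_by_id.get(decode(code))  # stand-in for the database lookup
    if url is None:
        return 404, {}
    return 301, {"Location": url}  # permanent redirect to the full URL


if __name__ == "__main__":
    table = {125: "http://example.com"}
    print(handle_request("/cb", table))
    print(handle_request("/zz", table))
```

Unknown or malformed codes return 404 here; the original snippet does not say how to handle that case, so treat it as one possible choice.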
(edit) If by mod_rewrite you meant "transform short.com/MYID to short.com/id=MYID", oh, yes, in this case, of course!
I'm using something like this on one of my sites, btw:
RewriteEngine on
RewriteCond %{REQUEST_URI} !^/index.php
RewriteRule ^(.*)$ /index.php?hash=$1 [L]
Hope this helps :-)