How to Implement a Short URL Like the URLs in Twitter

How to implement a short URL like the URLs in Twitter?

The easiest way is to:

  1. keep a database of all URLs
  2. when you insert a new URL into the database, find out the id of the auto-incrementing integer primary key.
  3. encode that integer into base 36 or 62 (digits + lowercase alpha, or digits + mixed-case alpha). Voila! You have a short URL!
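The three steps above can be sketched roughly as follows. This is only an illustration: the class name `ShortUrlStore` and its methods are hypothetical, and an in-memory array stands in for the database table, with the array position playing the role of the auto-incrementing primary key.

```ruby
# Minimal in-memory stand-in for the URL table. A real service would use
# a database with an auto-incrementing integer primary key instead.
class ShortUrlStore
  def initialize
    @urls = [] # array index + 1 plays the role of the primary key
  end

  # Store the URL and return its short code (the new id in base 36).
  def shorten(long_url)
    @urls << long_url
    id = @urls.length
    id.to_s(36)
  end

  # Decode the short code back to the id and look up the original URL.
  # Returns nil for codes that do not correspond to a stored URL.
  def expand(code)
    id = code.to_i(36)
    return nil if id < 1
    @urls[id - 1]
  end
end

store = ShortUrlStore.new
code = store.shorten("http://example.com/some/very/long/path")
puts code               # the first id is 1, so this prints 1
puts store.expand(code) # prints the original long URL
```

Because the code is just the id in another base, no separate lookup column is needed: decoding the code gives back the primary key directly.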

Encoding to base 36/decoding from base 36 is simple in Ruby:

12341235.to_s(36)
#=> "7cik3"

"7cik3".to_i(36)
#=> 12341235

Encoding to base 62 is a bit trickier. Here's one way to do it:

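module AnyBase
  # Per-alphabet lookup of index => character, built on first use.
  ENCODER = Hash.new do |h,k|
    h[k] = Hash[ k.chars.map.with_index.to_a.map(&:reverse) ]
  end
  # Per-alphabet lookup of character => index, built on first use.
  DECODER = Hash.new do |h,k|
    h[k] = Hash[ k.chars.map.with_index.to_a ]
  end
  def self.encode( value, keys )
    # Zero would otherwise encode to an empty string.
    return keys[0] if value == 0
    ring = ENCODER[keys]
    base = keys.length
    result = []
    until value == 0
      result << ring[ value % base ]
      value /= base
    end
    result.reverse.join
  end
  def self.decode( string, keys )
    ring = DECODER[keys]
    base = keys.length
    # each_char returns an Enumerator, which supports with_index
    # (String#chars returns an Array on Ruby 2.0+, which does not).
    string.reverse.each_char.with_index.inject(0) do |sum,(char,i)|
      sum + ring[char] * base**i
    end
  end
end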

...and here it is in action:

base36 = "0123456789abcdefghijklmnopqrstuvwxyz"
db_id = 12341235
p AnyBase.encode( db_id, base36 )
#=> "7cik3"
p AnyBase.decode( "7cik3", base36 )
#=> 12341235

base62 = [ *0..9, *'a'..'z', *'A'..'Z' ].join
p AnyBase.encode( db_id, base62 )
#=> "PMwb"
p AnyBase.decode( "PMwb", base62 )
#=> 12341235

Edit

If you want to avoid URLs that happen to be English words (for example, four-letter swear words) you can use a set of characters that does not include vowels:

base31 = ([*0..9,*'a'..'z'] - %w[a e i o u]).join
base52 = ([*0..9,*'a'..'z',*'A'..'Z'] - %w[a e i o u A E I O U]).join

However, with this you still have problems like AnyBase.encode(328059,base31) or AnyBase.encode(345055,base31) or AnyBase.encode(450324,base31). You may thus want to avoid vowel-like numbers as well:

base28 = ([*'0'..'9',*'a'..'z'] - %w[a e i o u 0 1 3]).join
base49 = ([*'0'..'9',*'a'..'z',*'A'..'Z'] - %w[a e i o u A E I O U 0 1 3]).join

This will also avoid the problem of "Is that a 0 or an O?" and "Is that a 1 or an I?".
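One trade-off of the smaller alphabets is that codes get longer sooner. The sketch below is self-contained so it can be run on its own; `encode_with` and `decode_with` are hypothetical helper names, not part of the AnyBase module, and the id value is arbitrary.

```ruby
BASE62 = [*'0'..'9', *'a'..'z', *'A'..'Z'].join
BASE49 = ([*'0'..'9', *'a'..'z', *'A'..'Z'] - %w[a e i o u A E I O U 0 1 3]).join

# Compact positional encoder over an arbitrary alphabet.
def encode_with(value, alphabet)
  return alphabet[0] if value.zero?
  digits = []
  while value > 0
    value, remainder = value.divmod(alphabet.length)
    digits << alphabet[remainder]
  end
  digits.reverse.join
end

# Inverse of encode_with; assumes every character is in the alphabet.
def decode_with(string, alphabet)
  string.each_char.reduce(0) do |sum, char|
    sum * alphabet.length + alphabet.index(char)
  end
end

id = 10_000_000
[BASE62, BASE49].each do |alphabet|
  code = encode_with(id, alphabet)
  # For this id: 4 characters in base 62, 5 characters in base 49.
  puts "base #{alphabet.length}: #{code} (#{code.length} characters)"
end
```

For most services one extra character is a small price for codes that cannot spell words and are easier to read aloud.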

Implement short urls (tinyurls) for twitter in c#?

I just published an article about doing this from bit.ly in a C# application.

Note that bit.ly requires a free login key that you will need in order for the code to work.

Is it possible to shorten url from Twitter API?

It isn't possible to shorten links using t.co through any means other than sending status updates or direct messages via Twitter. From the Twitter support site:

The link service at http://t.co is only used on links posted on Twitter and is not available as a general shortening service.

So, yes, you'll need to use some other shortening service.

How to crawl shortened urls and get the actual domain in python?

To extract the domain name from a URL, besides urlparse, you can use the tldextract module:

>>> import tldextract
>>> urls = ['http://news.example.com',
...         'http://blog.example.com/eeaWdada5das',
...         'http://example.com/ewdaD585Jz']
>>> for url in urls:
...     data = tldextract.extract(url)
...     print('{0}.{1}'.format(data.domain, data.suffix))
...
example.com
example.com
example.com

Update (example for com.mx):

>>> data = tldextract.extract('http://example.com.mx')
>>> print('{0}.{1}'.format(data.domain, data.suffix))
example.com.mx

Twitter API 1.1 - render twitter's t.co links

Thank you for your answers.

After analyzing the JSON in the suggested link (https://dev.twitter.com/docs/tweet-entities), I wrote a solution to the problem described:

// ...
$twitter_data = json_decode($json); // last line of the code in: http://stackoverflow.com/questions/12916539

// print the tweets, with the full URLs:
foreach ($twitter_data as $item) {
    $text = $item->text;

    foreach ($item->entities->urls as $url) {
        $text = str_replace($url->url, $url->expanded_url, $text);
    }
    echo $text . '<br /><br />';

    // optionally, here, the code from: http://stackoverflow.com/questions/15610968/
    // can be added, too.
}

Long URLs from Twitter feeds without making additional API calls/HTTP requests

No, Twitter does not offer a urls entity in its RSS responses, nor does the include_entities option appear to work there. You'll have to use a different response format, such as JSON (where the include_entities option adds an entities['urls'][n]['expanded_url'] value for each URL), or "unshorten" the URLs yourself after the fact.
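As a rough illustration of the JSON route, the sketch below swaps each shortened URL in a tweet's text for its expanded_url. The payload is a made-up example shaped like the entities structure described above, and the `expand_urls` helper is a hypothetical name; no API call is made.

```ruby
require "json"

# Hypothetical payload shaped like a tweet list with entities included.
json = <<~JSON
  [
    {
      "text": "Read this: http://t.co/abc123",
      "entities": {
        "urls": [
          { "url": "http://t.co/abc123",
            "expanded_url": "http://example.com/a/long/article" }
        ]
      }
    }
  ]
JSON

# Replace every shortened URL in the tweet text with its expanded form.
# Tweets without a urls entity are returned unchanged.
def expand_urls(tweet)
  urls = tweet.dig("entities", "urls") || []
  urls.reduce(tweet["text"]) do |text, entity|
    # Block form keeps the replacement literal (no backslash escapes).
    text.gsub(entity["url"]) { entity["expanded_url"] }
  end
end

tweets = JSON.parse(json)
tweets.each { |tweet| puts expand_urls(tweet) }
```

Since the expanded URLs arrive in the same response as the tweet text, no extra HTTP request per link is needed.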

Twitter auto shorten URL not working

It won't be visibly shortened in the compose window, but the compose window does detect URLs and adjusts the character count accordingly. Try pasting a huge long URL - it'll only use up 22 characters in the count.

Do note that Twitter shortens all URLs, even when "shortening" actually makes them longer. For example, "http://bit.ly" will use up 22 characters in the count, not the 13 it actually contains.
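The counting rule can be approximated as below. This is a simplified sketch: the constant 22 is the figure quoted in this answer (Twitter has changed it over time), `counted_length` is a hypothetical helper, and the regular expression only recognizes URLs that start with http:// or https://, whereas Twitter's own detection is broader.

```ruby
TCO_LENGTH = 22                 # per-URL length quoted in the answer above
URL_PATTERN = %r{https?://\S+}  # simplistic URL detection, an assumption

# Length of the text as counted when every URL costs TCO_LENGTH characters.
def counted_length(text)
  url_count = text.scan(URL_PATTERN).length
  text.gsub(URL_PATTERN, "").length + url_count * TCO_LENGTH
end

puts counted_length("http://bit.ly") # 22, although the URL is 13 characters
puts counted_length("Look: http://example.com/a/very/long/path?with=query") # 6 + 22 = 28
```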


