How to Programmatically Generate Heroku-Like Subdomain Names

How can I programmatically generate Heroku-like subdomain names?

Engineer at the Heroku API team here: we went with the simplest approach to generate app names, which is basically what you suggested: keep arrays of adjectives and nouns in memory, pick an element from each at random and combine it with a random number from 1000 to 9999.
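For anyone who wants to see it spelled out, here is a minimal sketch of that approach. The word lists and the hyphen separator are placeholders I made up for illustration, not the actual Heroku pool or format.

```ruby
require "securerandom"

# Tiny stand-in word lists (assumption); a real pool would be much larger.
ADJECTIVES = %w[autumn hidden bitter misty silent].freeze
NOUNS      = %w[waterfall river breeze moon rain].freeze

# Builds a name from one random adjective, one random noun
# and a random number between 1000 and 9999 (inclusive).
def generate_name
  adjective = ADJECTIVES[SecureRandom.random_number(ADJECTIVES.size)]
  noun      = NOUNS[SecureRandom.random_number(NOUNS.size)]
  number    = 1000 + SecureRandom.random_number(9000)
  "#{adjective}-#{noun}-#{number}"
end

puts generate_name
```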

Not the most thrilling code I've written, but it's interesting to see what we had to do in order to scale this:

  • At first we were picking a name, trying to INSERT and then rescuing the uniqueness constraint error to pick a different name. This worked fine while we had a large pool of names (and a not-so-large set of apps using them), but at a certain scale we started to notice a lot of collisions during name generation.

    To make it more resilient we decided to pick several names and check which ones are still available with a single query. We obviously still need to check for errors and retry because of race conditions, but with so many apps in the table this is clearly more effective.

    It also has the added benefit of providing an easy hook for us to get an alert if our name pool is low (e.g. if 1/3 of the random names are taken, send an alert).

  • The first time we had issues with collisions we just radically increased the size of our name pool by going from 2 digits to 4. With 61 adjectives and 74 nouns this took us from ~400k to ~40 million names (61 * 74 * 9000).

  • But by the time we were running 2 million apps we started receiving collision alerts again, and at a much higher rate than expected: about half of the names were colliding, which made no sense considering our pool size and the number of apps running.

    The culprit, as you might have guessed, is that rand is a pretty bad pseudorandom number generator. Picking random elements and numbers with SecureRandom instead radically lowered the number of collisions, making it match what we expected in the first place.
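The "pick several names, check them at once, alert when too many are taken" idea could look roughly like the sketch below. The method name pick_available_name, the batch size and the in-memory Set are all assumptions for illustration; in a real app the Set would be the result of a single database lookup, and the INSERT would still need a retry because of race conditions.

```ruby
require "securerandom"
require "set"

# Draws batch_size candidates from the given block and filters out the ones
# already in taken_names (a stand-in for the result of one database query).
# Warns when the share of taken candidates reaches alert_ratio.
# Returns an available name, or nil if the caller should draw another batch.
def pick_available_name(taken_names, batch_size: 10, alert_ratio: 1.0 / 3)
  candidates = Array.new(batch_size) { yield }.uniq
  collisions = candidates.select { |name| taken_names.include?(name) }
  if collisions.size >= candidates.size * alert_ratio
    warn "name pool running low: #{collisions.size}/#{candidates.size} candidates taken"
  end
  (candidates - collisions).first
end

# Usage with a made-up generator block:
taken = Set["misty-river-1234"]
name = pick_available_name(taken) do
  "misty-river-#{1000 + SecureRandom.random_number(9000)}"
end
puts name
```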

With so much work going into scaling this approach we had to ask whether there's a better way to generate names in the first place. Some of the ideas discussed were:

  • Make the name generation a function of the application id. This would be much faster and avoid the issue with collisions entirely, but on the downside it would waste a lot of names with deleted apps (and damn, we have A LOT of apps being created and deleted shortly after as part of different integration tests).

  • Another option to make name generation deterministic is to have the pool of available names in the database. This would make it easy to do things like only reusing a name 2 weeks after the app was deleted.
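To illustrate the first idea (name as a function of the application id), here is one possible mapping: treat the id as a mixed-radix number over adjective, noun and 4-digit number. This is only a sketch; name_for_id and the word lists are hypothetical, and it assumes ids start at 0 and stay below the pool size.

```ruby
NUMBERS_PER_PAIR = 9000 # the numbers 1000..9999

# Maps an integer id to a unique name. Distinct ids always give distinct
# names, so no collision check is needed while the id is inside the pool.
def name_for_id(app_id, adjectives, nouns)
  pool_size = adjectives.size * nouns.size * NUMBERS_PER_PAIR
  raise ArgumentError, "id outside the name pool" unless (0...pool_size).cover?(app_id)

  rest, number_index = app_id.divmod(NUMBERS_PER_PAIR)
  adjective_index, noun_index = rest.divmod(nouns.size)
  "#{adjectives[adjective_index]}-#{nouns[noun_index]}-#{1000 + number_index}"
end

puts name_for_id(9000, %w[autumn hidden], %w[river moon rain])
```

Note that sequential ids give sequential, guessable names here, so a real version would probably scramble the id first. It also shows the downside mentioned above: a deleted app's name is gone for good.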

Excited to see what we'll do next time the collision alert triggers!

Hope this helps anyone working on friendly name generation out there.

Random fake names

As you specifically asked for Heroku, you can read up on their implementation in "How can I programmatically generate Heroku-like subdomain names?".

Implement an efficient slack-like subdomain name suggestion

I would go with the reverse approach: query the database for existing records using LIKE and then generate suggestions, skipping the ones already taken:

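def alternatives(model, column, word, count)
  # model is assumed to be a record instance; the bind parameter avoids SQL injection
  taken = model.class.where("#{column} LIKE ?", "%#{word}%").pluck(column)
  # Enumerator has no map!, so use map to collect one candidate per index
  count.times.map do |i|
    generate_candidates_using_a_certain_strategy(i, taken)
  end
end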

Have generate_candidates_using_a_certain_strategy accept an array of already-taken words to skip. There is one possible glitch: a race condition where two requests take the same name. I don't think it causes any real problems, though, since you are always free to apologize when the actual creation fails.
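One possible strategy, just as an illustration: try the word itself, then the word with a numeric suffix, and keep the first names that are free. The helper suffix_candidates and the "-2", "-3" suffix format are my own assumptions, not part of the snippet above.

```ruby
require "set"

# Returns the first `count` names from "word", "word-2", "word-3", ...
# that do not appear in `taken`.
def suffix_candidates(word, taken, count)
  taken = taken.to_set
  candidates = []
  n = 1
  while candidates.size < count
    name = n == 1 ? word : "#{word}-#{n}"
    candidates << name unless taken.include?(name)
    n += 1
  end
  candidates
end

p suffix_candidates("acme", ["acme", "acme-3"], 3)
```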

How to generate a random string in Ruby

(0...8).map { (65 + rand(26)).chr }.join

I spend too much time golfing.

(0...50).map { ('a'..'z').to_a[rand(26)] }.join

And a last one that's even more confusing, but more flexible and wastes fewer cycles:

o = [('a'..'z'), ('A'..'Z')].map(&:to_a).flatten
string = (0...50).map { o[rand(o.length)] }.join

If you want to generate some random text then use the following:

50.times.map { (0...(rand(10))).map { ('a'..'z').to_a[rand(26)] }.join }.join(" ")

This code generates a string of 50 random words, each fewer than 10 characters long (possibly empty, since rand(10) can return 0), joined with spaces.
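If the strings are used as identifiers where collisions matter (the Heroku answer above reports far fewer collisions after moving from rand to SecureRandom), the same one-liners can draw from SecureRandom instead. A small sketch; the variable names are arbitrary:

```ruby
require "securerandom"

letters = ("a".."z").to_a + ("A".."Z").to_a

# 50 random letters, each index drawn from SecureRandom rather than Kernel#rand
random_letters = Array.new(50) { letters[SecureRandom.random_number(letters.size)] }.join

# Built-in helper for an alphanumeric token (available since Ruby 2.5)
token = SecureRandom.alphanumeric(8)

puts random_letters
puts token
```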

Can I convert the socket.io client id into a slightly shorter one without running into collision risks?

RISKY SOLUTION

You can directly change the id of a socket when it connects:

var socket = io.connect('http://localhost');

socket.on('connect', function() {
  console.log(socket.io.engine.id); // old ID
  socket.io.engine.id = 'new ID';
  console.log(socket.io.engine.id); // new ID
});

SAFE SOLUTION

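Or you can simply keep a lookup from your own shorter id to the socket id in an object on the server side:

var socket = io.connect('http://localhost');
var clients = {};

socket.on('connect', function() {
  // customId is the shorter id you chose; it is not defined in this snippet
  clients[customId] = socket.id;
});

// later, resolve the custom id back to the socket id
var lookup = clients[customId];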

API to find Nouns in Sentence and Nearest Adjective in Meaning in Ruby

I don't know about Ruby, but to determine the part of speech of a word (like whether it's a noun) you need what's called a "part of speech tagger". For the second part, it sounds like WordNet will help you. WordNet is a database of English words (you didn't say what language you're interested in) with relationships like "similar in meaning", "more specific" (like "cat" is more specific than "animal"), "opposite in meaning", etc.


