Quick Selection of a Random Row from a Large Table in MySQL

Grab all the IDs, pick one at random, and retrieve the full row.
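
A minimal Python sketch of those three steps. It assumes a hypothetical in-memory dict (`TABLE`, mapping id to row) in place of the table; against MySQL, steps 1 and 3 would each be a query. Step 1 reads every ID, so the cost of this approach grows with the table.

```python
import random

# Hypothetical stand-in for a table: id -> row. The ids have holes on purpose;
# this approach does not care about gaps.
TABLE = {1: "alpha", 2: "beta", 5: "gamma", 9: "delta"}


def random_row_via_id_list(table, rng=random):
    """Pick one row uniformly at random by first collecting every id."""
    ids = list(table)             # step 1: grab all the ids (SELECT id FROM ...)
    chosen = rng.choice(ids)      # step 2: pick one at random, client-side
    return chosen, table[chosen]  # step 3: fetch the full row by its id
```

Because the choice is made from the actual list of IDs, every row has the same chance regardless of gaps.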

If you know the IDs are sequential without holes, you can just grab the max and calculate a random ID.
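
For the no-holes case, here is a sketch that assumes the IDs run exactly from 1 to MAX(id); the helper name is hypothetical. In MySQL the same draw could be written as FLOOR(1 + RAND() * max_id), since RAND() returns a value from 0 up to but not including 1.

```python
import random


def random_id_without_holes(max_id, rng=random):
    """Return a random id, assuming ids are exactly 1..max_id with no holes.

    random.randint includes both endpoints, so every existing id is possible
    and no value outside the table can come back.
    """
    if max_id < 1:
        raise ValueError("table is empty")
    return rng.randint(1, max_id)
```

Because every integer in the range names a row, the follow-up lookup is a single primary-key fetch.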

If there are holes here and there but the values are mostly sequential, and you don't mind slightly skewed randomness, grab the max value, calculate a random ID, and select the first row with an ID equal to or above the one you calculated. The skew comes from the holes: an ID that directly follows a hole is picked whenever the calculated value falls anywhere inside that hole, so it has a higher chance of being picked than an ID that directly follows another ID.
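
A Python sketch of the gap-tolerant variant that makes the skew measurable. The ID list `IDS` and the helper names are hypothetical. `bisect_left` plays the role of an index lookup for the first ID at or above the calculated value, and the candidate is drawn from 1 to MAX(id), assuming IDs start at 1.

```python
import bisect
import random
from collections import Counter

# Hypothetical ids with one hole: 4 through 7 are missing.
IDS = [1, 2, 3, 8, 9, 10]


def first_id_at_or_above(sorted_ids, target):
    """Mimic 'WHERE id >= target ORDER BY id LIMIT 1' on a sorted id list."""
    i = bisect.bisect_left(sorted_ids, target)
    return sorted_ids[i] if i < len(sorted_ids) else None


def pick_with_holes(sorted_ids, rng=random):
    """Draw a candidate in 1..MAX(id) and take the first id at or above it."""
    target = rng.randint(1, sorted_ids[-1])
    return first_id_at_or_above(sorted_ids, target)


def pick_frequencies(sorted_ids, trials, seed=0):
    """Count how often each id is picked over many draws."""
    rng = random.Random(seed)
    return Counter(pick_with_holes(sorted_ids, rng) for _ in range(trials))
```

Candidates 4 through 8 all land on ID 8. With MAX(id) = 10, ID 8 is therefore picked with probability 5/10, while each of the other IDs has probability 1/10.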

If you order by RAND(), you're going to have a terrible full table scan and sort on your hands, and the word quick doesn't apply to such a solution.

Don't do that, and don't order by a GUID either; it has the same problem.

MySQL - selecting random row from large table

I suspect you are seeing that small range of values because RAND() (in the WHERE clause) is being evaluated for every row in the table, so each row is compared against its own random value. With that many independent draws, some rows with low PhotoID values are likely to get an even lower value from the expression on the right side and pass the filter. As a result, the set the query returns is weighted toward the lower PhotoID values, and with the ORDER BY you're going to get the lowest of those.

To get a more random distribution, you'd need to have RAND() evaluated just one time. Also, I'd prefer not to execute multiple queries (three separate SELECT statements) when I can get the work done in a single statement, and without user-defined variables.
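
A small Python simulation of the difference, assuming a hypothetical gap-free table of 1,000 photo IDs. The question's exact query isn't reproduced here, so the function names are illustrative and only model the behavior described above. With a fresh random threshold per row, the lowest surviving ID stays at the low end of the range. With a single threshold, the pick spreads over the whole range.

```python
import random

# Hypothetical photos table: photoid 1..1000, no gaps.
PHOTO_IDS = list(range(1, 1001))


def pick_rand_per_row(ids, rng=random):
    """RAND() re-evaluated for every row, as in the question's WHERE clause."""
    lo, hi = min(ids), max(ids)
    # Each row is compared against its own freshly drawn threshold.
    survivors = [p for p in ids if p >= lo + rng.random() * (hi - lo)]
    return min(survivors)  # ORDER BY photoid LIMIT 1


def pick_rand_once(ids, rng=random):
    """RAND() evaluated a single time, then the first photoid at or above it."""
    lo, hi = min(ids), max(ids)
    threshold = lo + rng.random() * (hi - lo)  # one draw for the whole query
    return min(p for p in ids if p >= threshold)
```

In both functions the highest ID always passes, because the threshold never reaches it. That guarantees a result.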

To implement the algorithm you appear to be attempting, I'd approach it something like this:

  SELECT t.photoid
       , ...
    FROM photos t
    JOIN ( SELECT m.min_id + RAND() * (m.max_id - m.min_id) AS _rand
             FROM ( SELECT MIN(p.photoid) AS min_id
                         , MAX(p.photoid) AS max_id
                      FROM photos p
                  ) m
         ) r
      ON r._rand <= t.photoid
   ORDER BY t.photoid
   LIMIT 1

In MySQL, the inline views (derived tables, in MySQL parlance) are materialized first, before the outer query runs. Since m returns a single row, the RAND() function in r is evaluated only once, and that single value is then used in the outer query's join condition.

MySQL select 10 random rows from 600K rows fast

A great post that covers several cases: the simple one, IDs with gaps, and non-uniform IDs with gaps.

http://jan.kneschke.de/projects/mysql/order-by-rand/

For the most general case, here is how you do it:

SELECT name
  FROM random AS r1
  JOIN (SELECT CEIL(RAND() *
                    (SELECT MAX(id)
                       FROM random)) AS id)
       AS r2
 WHERE r1.id >= r2.id
 ORDER BY r1.id ASC
 LIMIT 1

This assumes that the IDs are evenly distributed; gaps in the ID list are allowed. See the article for more advanced examples.

Generate random sample from huge table, with conditions

Add a column to your table and populate it with random numbers.

ALTER TABLE `table` ADD COLUMN rando FLOAT DEFAULT NULL;
UPDATE `table` SET rando = RAND() WHERE rando IS NULL;

Then do

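SELECT t.*
FROM `table` AS t
-- The derived table is evaluated once, so every row is compared
-- against the same random starting point.
JOIN (SELECT RAND() * 0.9 AS rand_start) AS r
ON t.rando > r.rand_start
WHERE t.condition = 0
ORDER BY t.rando
LIMIT 5000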

Do it again for condition = 1 and Bob's your uncle. It will pull rows in random order starting from a random row.
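
A Python model of the random-column technique. It assumes the rows are hypothetical dicts with `id`, `condition` and `rando` keys. The sketch draws the random starting point once per query and then walks upward in rando order. Because rando was assigned at random, any run of rows in that order is itself a random sample.

```python
import random


def add_rando(rows, rng=random):
    """Fill the rando column: UPDATE ... SET rando = RAND() WHERE rando IS NULL."""
    for row in rows:
        if row["rando"] is None:
            row["rando"] = rng.random()


def sample(rows, condition, limit, rng=random):
    """Model of the SELECT: one random start, then rows above it in rando order."""
    start = rng.random() * 0.9  # drawn once per query, never above 0.9
    matching = [r for r in rows if r["rando"] > start and r["condition"] == condition]
    matching.sort(key=lambda r: r["rando"])  # ORDER BY rando
    return matching[:limit]                  # LIMIT
```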

A couple of notes:

  • 0.9 is there to improve the chances you'll actually get 5000 rows and not some lesser number.
  • You may have to add LIMIT 1000 to the UPDATE statement and run it a whole bunch of times to populate the complete rando column: trying to update all the rows in a big table can generate a huge transaction and swamp your server for a long time.
  • If you need to generate another random sample, run the UPDATE or UPDATEs again.
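
A client-side loop for the batched UPDATE, sketched in Python with a hypothetical list of row dicts in place of the table. In a real script, each pass would send the UPDATE ... LIMIT statement and stop once the server reports that no rows were affected.

```python
import random


def populate_in_batches(rows, batch_size, rng=random):
    """Fill NULL rando values at most batch_size rows per pass.

    Models running
        UPDATE `table` SET rando = RAND() WHERE rando IS NULL LIMIT 1000;
    repeatedly until it changes no rows. Returns the number of passes that
    changed something; each pass would be its own small transaction.
    """
    passes = 0
    while True:
        pending = [r for r in rows if r["rando"] is None][:batch_size]
        if not pending:
            return passes
        for r in pending:
            r["rando"] = rng.random()
        passes += 1
```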

Select a random row per group in a large table

I found that going through all the entries takes less time than this query, so I added a column computed as rule*max(id)+id and created an index on it. (Should I use a view?)

I run the following query:

SELECT id, rule, temp FROM treenode WHERE temp > ? ORDER BY temp LIMIT 0, 100000;

On the client, I go through all the returned entries and fill a buffer. Whenever the rule changes, I select a random item from the buffer and clear it (set index=0). Then I run the query again, with ? bound to the last temp value returned.
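
A Python model of that client-side loop. It assumes the rows arrive as hypothetical (id, rule, temp) tuples and that `fetch_page` stands in for the query above; the page size is a parameter here rather than 100000. Paging on temp means each query resumes where the previous one stopped. A row is chosen for a rule as soon as the next rule begins.

```python
import random


def fetch_page(rows_by_temp, after, page_size):
    """Model of: SELECT id, rule, temp FROM treenode
    WHERE temp > ? ORDER BY temp LIMIT 0, page_size"""
    return [r for r in rows_by_temp if r[2] > after][:page_size]


def random_row_per_rule(rows, page_size, rng=random):
    """Stream rows ordered by temp and keep one random row per rule.

    rows are hypothetical (id, rule, temp) tuples with temp = rule * max_id + id,
    so ordering by temp groups the rows by rule.
    """
    rows_by_temp = sorted(rows, key=lambda r: r[2])
    picks, buffer, current_rule = {}, [], None
    last_temp = float("-inf")  # below every temp, so the first page starts at the beginning
    while True:
        page = fetch_page(rows_by_temp, last_temp, page_size)
        if not page:
            break
        for row in page:
            if row[1] != current_rule and buffer:
                picks[current_rule] = rng.choice(buffer)  # rule changed: pick, then clear
                buffer = []
            current_rule = row[1]
            buffer.append(row)
        last_temp = page[-1][2]  # bind ? to the last temp returned
    if buffer:  # the last group has no following rule change
        picks[current_rule] = rng.choice(buffer)
    return picks
```

The buffer survives across pages, so a group that spans two queries is still sampled as a whole.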

Select a random record from the mysql database, best method?

You will be totally fine with that many records; even in the thousands you would be fine. SQL queries almost always impress me with their speed, and with so few records, speed won't be a concern.

how to select n random rows in mysql from subset of large table?

"So the question is, is the WHERE clause evaluated and processed before the ORDER BY RAND(), thereby making this an acceptable solution?"

Yes.

ORDER BY only determines the order in which the rows that pass the WHERE clause are returned; the filtering happens first.
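
A Python model of that ordering, with hypothetical names. The filter runs first, and in this model only the rows that survive it receive a random sort key, so the sort works on the subset rather than on the whole table.

```python
import random


def where_then_order_by_rand(rows, predicate, n, rng=random):
    """Model of SELECT ... WHERE predicate ORDER BY RAND() LIMIT n."""
    filtered = [r for r in rows if predicate(r)]              # WHERE is applied first
    shuffled = sorted(filtered, key=lambda _r: rng.random())  # one random key per surviving row
    return shuffled[:n]                                       # LIMIT n
```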


