MySQL Select 10 Random Rows from 600K Rows Fast

MySQL select 10 random rows from 600K rows fast

A great post covers several cases, from the simple one, to ids with gaps, to non-uniform distributions with gaps:

http://jan.kneschke.de/projects/mysql/order-by-rand/

For the most general case, here is how you do it:

SELECT name
  FROM random AS r1
  JOIN (SELECT CEIL(RAND() *
                    (SELECT MAX(id)
                       FROM random)) AS id) AS r2
 WHERE r1.id >= r2.id
 ORDER BY r1.id ASC
 LIMIT 1

This assumes that the ids are distributed roughly evenly; gaps in the id list are tolerated. See the article for more advanced examples.
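The query above returns a single row, while the question asks for ten. One possible way to get there, sketched in Python rather than SQL, is to repeat the single-row draw until ten distinct ids have been collected. The id list and the helper names `pick_one` and `pick_many` are hypothetical stand-ins for the `random` table and the query; this is an illustration under those assumptions, not the method from the linked article.

```python
import bisect
import random


def pick_one(sorted_ids, rng):
    """Mimic the query: draw r in 1..MAX(id), return the first id >= r."""
    r = rng.randint(1, sorted_ids[-1])            # stands in for CEIL(RAND() * MAX(id))
    return sorted_ids[bisect.bisect_left(sorted_ids, r)]


def pick_many(sorted_ids, k, rng):
    """Repeat the single draw until k distinct ids have been collected."""
    if k > len(sorted_ids):
        raise ValueError("the table has fewer than k rows")
    chosen = set()
    while len(chosen) < k:
        chosen.add(pick_one(sorted_ids, rng))     # duplicates are simply redrawn
    return sorted(chosen)


# Hypothetical id column with gaps (assumed data, not from the original post).
ids = [1, 2, 3, 7, 8, 15, 16, 20, 21, 22, 30, 31]
sample = pick_many(ids, 10, random.Random(42))
print(sample)
```

Each draw corresponds to one cheap indexed lookup, so the cost grows with the number of rows requested rather than with the table size. The trade-off is that duplicates cause extra draws when k is close to the row count.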

quick selection of a random row from a large table in mysql

Grab all the ids, pick a random one from them, and retrieve the full row.

If you know the ids are sequential without holes, you can just grab the max and calculate a random id.

If there are holes here and there but mostly sequential values, and you don't care about a slightly skewed randomness, grab the max value, calculate an id, and select the first row with an id equal to or above the one you calculated. The reason for the skew is that ids following such holes have a higher chance of being picked than ones that directly follow another id.
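To make the skew concrete, the following Python sketch computes the exact pick probability of each id when the random value is uniform on 1..max and the first id at or above it is selected. The function name `pick_probabilities` and the sample ids are assumptions made for illustration, and the ids are assumed to be positive integers.

```python
from fractions import Fraction


def pick_probabilities(sorted_ids):
    """Exact chance of each id when r is uniform on 1..MAX(id)
    and the first id >= r is returned."""
    max_id = sorted_ids[-1]
    probs = {}
    previous = 0
    for current in sorted_ids:
        # Every r in (previous, current] lands on `current`,
        # so an id that follows a hole absorbs the hole's share.
        probs[current] = Fraction(current - previous, max_id)
        previous = current
    return probs


# Hypothetical ids with a hole between 3 and 7.
probs = pick_probabilities([1, 2, 3, 7, 8])
for row_id, p in probs.items():
    print(row_id, p)
```

With these ids, the row with id 7 is picked half of the time, while every other row is picked one time in eight. Whether that is acceptable depends on how large the holes are relative to the table.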

If you order by random, you're going to have a terrible table-scan on your hands, and the word quick doesn't apply to such a solution.

Don't do that, and don't order by a GUID either; it has the same problem.

How can I optimize MySQL's ORDER BY RAND() function?

Try this:

SELECT  *
FROM    (
        SELECT  @cnt := COUNT(*) + 1,
                @lim := 10
        FROM    t_random
        ) vars
STRAIGHT_JOIN
        (
        SELECT  r.*,
                @lim := @lim - 1
        FROM    t_random r
        WHERE   (@cnt := @cnt - 1)
                AND RAND(20090301) < @lim / @cnt
        ) i

This is especially efficient on MyISAM (since the COUNT(*) is instant), but even in InnoDB it's 10 times more efficient than ORDER BY RAND().

The main idea here is that we don't sort, but instead keep two variables and calculate the running probability of a row to be selected on the current step.
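The running-probability idea can be sketched outside SQL as well. The Python function below is a minimal illustration, assuming the rows are visited once in order: `needed` plays the role of `@lim`, `remaining` plays the role of `@cnt`, and the name `sample_sequential` is hypothetical.

```python
import random


def sample_sequential(rows, k, rng):
    """Single pass, no sort: keep each row with probability
    (rows still needed) / (rows still to be visited)."""
    remaining = len(rows)   # counterpart of @cnt
    needed = k              # counterpart of @lim
    picked = []
    for row in rows:
        if needed == 0:
            break
        if rng.random() < needed / remaining:
            picked.append(row)
            needed -= 1
        remaining -= 1
    return picked


rows = list(range(1, 601))
picked = sample_sequential(rows, 10, random.Random(20090301))
print(picked)
```

Once the number of rows still needed equals the number of rows left, the ratio reaches 1 and every remaining row is taken, so the pass always yields exactly k rows when the table has at least k. Note that the SQL passes a fixed seed to RAND(), which makes its sequence repeatable; the sketch takes a seeded generator for the same reason.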

See this article in my blog for more detail:

  • Selecting random rows

Update:

If you need to select but a single random record, try this:

SELECT  aco.*
FROM    (
        SELECT  minid + FLOOR((maxid - minid) * RAND()) AS randid
        FROM    (
                SELECT  MAX(ac_id) AS maxid, MIN(ac_id) AS minid
                FROM    accomodation
                ) q
        ) q2
JOIN    accomodation aco
ON      aco.ac_id =
        COALESCE
        (
        (
        SELECT  accomodation.ac_id
        FROM    accomodation
        WHERE   ac_id > randid
                AND ac_status != 'draft'
                AND ac_images != 'b:0;'
                AND NOT EXISTS
                (
                SELECT  NULL
                FROM    accomodation_category
                WHERE   acat_id = ac_category
                        AND acat_slug = 'vendeglatohely'
                )
        ORDER BY
                ac_id
        LIMIT   1
        ),
        (
        SELECT  accomodation.ac_id
        FROM    accomodation
        WHERE   ac_status != 'draft'
                AND ac_images != 'b:0;'
                AND NOT EXISTS
                (
                SELECT  NULL
                FROM    accomodation_category
                WHERE   acat_id = ac_category
                        AND acat_slug = 'vendeglatohely'
                )
        ORDER BY
                ac_id
        LIMIT   1
        )
        )

This assumes your ac_id values are distributed more or less evenly.
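The COALESCE in that query encodes a two-step lookup: first the nearest eligible row above the random id, then, if there is none, the first eligible row overall. A rough Python equivalent is sketched below; `pick_with_fallback` and the `is_eligible` callback are hypothetical names standing in for the query and its status, image, and category filters.

```python
import bisect
import random


def pick_with_fallback(sorted_ids, is_eligible, rng):
    """First eligible id above a random point, else the first eligible id overall."""
    min_id, max_id = sorted_ids[0], sorted_ids[-1]
    # Counterpart of: minid + FLOOR((maxid - minid) * RAND())
    rand_id = min_id + int((max_id - min_id) * rng.random())
    start = bisect.bisect_right(sorted_ids, rand_id)   # index of first id > rand_id
    for candidate in sorted_ids[start:]:
        if is_eligible(candidate):
            return candidate
    # Second COALESCE argument: wrap around to the lowest eligible id.
    for candidate in sorted_ids:
        if is_eligible(candidate):
            return candidate
    return None   # no eligible row at all


ids = list(range(1, 21))
print(pick_with_fallback(ids, lambda i: i % 2 == 0, random.Random(7)))
```

The fallback matters when the random point lands above the last eligible row; without it the lookup would come back empty even though eligible rows exist.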

How to get random rows in mysql

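You could order the table by rand() and limit the results. This is simple, but it assigns a random value to every candidate row, so it is only reasonable for small tables: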

SELECT   id
FROM personel
WHERE id NOT IN (1, 2, 6)
ORDER BY rand()
LIMIT 5
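The cost of this query is easier to see when the same steps are written out by hand. The Python sketch below imitates what the statement asks for, using a hypothetical `order_by_rand_limit` function and made-up ids; it is an emulation for illustration, not a description of how MySQL executes the query internally.

```python
import random


def order_by_rand_limit(ids, excluded, limit, rng):
    """Emulate: WHERE id NOT IN (...) ORDER BY rand() LIMIT n."""
    candidates = [i for i in ids if i not in excluded]   # WHERE id NOT IN (...)
    keyed = [(rng.random(), i) for i in candidates]      # one random key per candidate row
    keyed.sort()                                         # ordering touches every candidate
    return [i for _, i in keyed[:limit]]                 # LIMIT n


# Hypothetical ids for the personel table.
ids = list(range(1, 51))
chosen = order_by_rand_limit(ids, {1, 2, 6}, 5, random.Random(3))
print(chosen)
```

Every candidate row receives a key before the first result can be returned, which is fine for a few thousand rows and increasingly expensive beyond that.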

How can I get random rows in MySQL (NO autoincrement)?

If your ids are truly random, you can just pick a random value and find the first id greater than or equal to it. If the random value happens to be greater than every id in the table, try again.

Ideally you pick the random value in your code, but unhex(md5(rand())) is a quick hack that should produce a random 16-byte string:

select id
from yourtable
where id >= unhex(md5(rand()))
order by id
limit 1
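Picking the random value in application code, as suggested above, could look like the following Python sketch. It assumes the ids are 16-byte binary strings compared byte by byte; the function name `pick_by_random_key`, the `max_attempts` parameter, and the in-memory sorted list (standing in for the indexed id column) are all hypothetical.

```python
import bisect
import secrets


def pick_by_random_key(sorted_ids, max_attempts=100):
    """Draw a random 16-byte probe and return the first id >= probe.
    Retry when the probe is greater than every id."""
    for _ in range(max_attempts):
        probe = secrets.token_bytes(16)                 # random value chosen in code
        index = bisect.bisect_left(sorted_ids, probe)   # first id >= probe
        if index < len(sorted_ids):
            return sorted_ids[index]
    return None   # gave up after max_attempts misses


# Hypothetical table of random 16-byte ids.
table_ids = sorted(secrets.token_bytes(16) for _ in range(50))
print(pick_by_random_key(table_ids).hex())
```

In a real application the bisect step would be the indexed `where id >= ? order by id limit 1` query, with the probe passed as a bound parameter.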

MySQL - selecting random row from large table

I suspect you are seeing that small range of values because RAND() in the WHERE clause is evaluated once for every row in the table. With a fresh random value per row, some row with a low PhotoID will almost always be compared against an even lower value returned by the expression on the right side, and so pass the test. The query therefore returns a set that nearly always includes low PhotoID values, and with the ORDER BY you get the lowest of them.

To get a more random distribution, you'd need to have RAND() evaluated just one time. Also, I'd prefer not to execute multiple queries (three separate SELECT statements) when I can get the work done in a single statement, and without user-defined variables.
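The difference between the two evaluation modes is easy to reproduce in a small simulation. The Python sketch below assumes a predicate of the form `PhotoID >= min + RAND() * (max - min)` followed by `ORDER BY PhotoID LIMIT 1`; the function names and the id range 1..1000 are made up for illustration.

```python
import random


def pick_rand_per_row(ids, rng):
    """RAND() re-evaluated for every row; rows are visited in id order."""
    lo, hi = ids[0], ids[-1]
    for row_id in ids:
        threshold = lo + rng.random() * (hi - lo)   # fresh random value per row
        if row_id >= threshold:
            return row_id                            # lowest passing id wins
    return None


def pick_rand_once(ids, rng):
    """RAND() evaluated a single time, then reused for every row."""
    lo, hi = ids[0], ids[-1]
    threshold = lo + rng.random() * (hi - lo)
    for row_id in ids:
        if row_id >= threshold:
            return row_id
    return None


ids = list(range(1, 1001))
rng = random.Random(1)
trials = 2000
mean_per_row = sum(pick_rand_per_row(ids, rng) for _ in range(trials)) / trials
mean_once = sum(pick_rand_once(ids, rng) for _ in range(trials)) / trials
print(round(mean_per_row), round(mean_once))
```

With per-row evaluation the average picked id stays far below the middle of the range, while the single-evaluation version averages out near the middle, which is what a uniform pick should do.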

To implement the algorithm it looks like you are attempting to achieve, I'd approach it something like this:

SELECT t.photoid
     , ...
  FROM photos t
  JOIN ( SELECT m.min_id + RAND() * (max_id - min_id) AS _rand
           FROM ( SELECT MIN(p.photoid) AS min_id
                       , MAX(p.photoid) AS max_id
                    FROM photos p
                ) m
       ) r
    ON r._rand <= t.photoid
 ORDER BY t.photoid
 LIMIT 1

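In MySQL, the inline views (derived tables, in MySQL parlance) are materialized first, before the outer query runs. Since m returns a single row, the RAND() function in r is evaluated only once, and that single value is then used in the outer query. Note that MySQL 5.7 and later can merge some derived tables into the outer query instead of materializing them, so it is worth checking the plan with EXPLAIN on those versions.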


