Selecting Random Rows in MySQL

MySQL select 10 random rows from 600K rows fast

A great post handling several cases, from simple, to gaps, to non-uniform with gaps.

http://jan.kneschke.de/projects/mysql/order-by-rand/

For the most general case, here is how you do it:

SELECT name
FROM random AS r1 JOIN
(SELECT CEIL(RAND() *
(SELECT MAX(id)
FROM random)) AS id)
AS r2
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 1

This assumes that the ids are uniformly distributed, although the id list may contain gaps. See the article for more advanced examples.

quick selection of a random row from a large table in mysql

Grab all the id's, pick a random one from it, and retrieve the full row.

If you know the id's are sequential without holes, you can just grab the max and calculate a random id.

If there are holes here and there but mostly sequential values, and you don't care about a slightly skewed randomness, grab the max value, calculate an id, and select the first row with an id equal to or above the one you calculated. The reason for the skewing is that id's following such holes will have a higher chance of being picked than ones that follow another id.
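To make the skew concrete, here is a minimal Python sketch that computes the exact chance of each id being returned by the "first id equal to or above the calculated one" rule. It assumes the calculated id is drawn uniformly from 1 to the max id; the function name and the sample id list are made up for illustration.

```python
from fractions import Fraction

def pick_probabilities(ids):
    """Exact chance of each id being chosen by 'first id >= random target'.

    Assumption: the target is an integer drawn uniformly from 1..max(ids).
    """
    ids = sorted(ids)
    max_id = ids[-1]
    probs = {}
    previous = 0
    for current in ids:
        # Every target in (previous, current] resolves to `current`.
        probs[current] = Fraction(current - previous, max_id)
        previous = current
    return probs

# Hypothetical id list with a hole between 3 and 7.
probs = pick_probabilities([1, 2, 3, 7, 8])
for row_id, p in probs.items():
    print(row_id, p)
```

The id right after the hole (7) absorbs the targets 4, 5, 6 and 7, so it is picked four times as often as any of its neighbours. The wider the hole, the stronger the bias.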

If you order by random, you're going to have a terrible table-scan on your hands, and the word quick doesn't apply to such a solution.

Don't do that, and don't order by a GUID either; it has the same problem.

select two random rows in MySQL database

You could use:

SELECT img_id, title, file_loc FROM images order by rand() limit 2

So you'd end up with:

$query = $conn->prepare('SELECT img_id, title, file_loc FROM images ORDER BY RAND() LIMIT 2');
$query->execute();
$result = $query->fetchAll();

foreach ($result as $row) {
    // Print the title, then the image itself.
    echo 'title:' . $row['title'] . '<br /><img src="' . $row['file_loc'] . '" />';
}

Note that ORDER BY RAND() can be especially slow on large tables. See "How can I optimize MySQL's ORDER BY RAND() function?" for ways to optimize it.

How do I select random rows from a table with limit and no duplicates?

One method is to pass an argument to RAND() - that is called the seed. As explained in the documentation:

One implication of this behavior is that for equal argument values, RAND(N) returns the same value each time, and thus produces a repeatable sequence of column values.

So, consider, for example:

SELECT * FROM table_name ORDER BY RAND(123) LIMIT 100, 101;

The sort is repeatable as long as you give the same seed. Just make sure to change the seed when you start a sequence of searches.
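The idea can be sketched outside the database as well. The Python below is only an analogy: it uses Python's own seeded generator rather than MySQL's RAND(N), and `seeded_page` and the id range are hypothetical names. It also assumes the underlying row order is the same on every call, which is what makes the shuffle repeatable.

```python
import random

def seeded_page(ids, seed, offset, size):
    """Return one page of a shuffle that is fully determined by `seed`."""
    ordered = sorted(ids)                  # start from a stable order
    random.Random(seed).shuffle(ordered)   # same seed -> same permutation
    return ordered[offset:offset + size]

ids = range(1, 1001)  # hypothetical primary keys
page1 = seeded_page(ids, seed=123, offset=0, size=100)
page2 = seeded_page(ids, seed=123, offset=100, size=100)
print(len(set(page1) & set(page2)))  # pages of one permutation cannot overlap
```

Because both pages are slices of the same permutation, no row can appear twice. Changing the seed gives a different permutation, which is why the seed should change only when a new sequence of searches starts.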

Selecting random rows with MySQL

The ORDER BY RAND() solution that most people recommend doesn't scale to large tables, as you already know.

SET @r := (SELECT FLOOR(RAND() * (SELECT COUNT(*) FROM mytable)));
SET @sql := CONCAT('SELECT * FROM mytable LIMIT 1 OFFSET ', @r);
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;

I cover this and other solutions in my book, SQL Antipatterns: Avoiding the Pitfalls of Database Programming.


If you want to do this with PHP, you could do something like this (not tested):

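<?php
$mysqli->begin_transaction();
// Count the rows so the random offset stays within range.
$result = $mysqli->query("SELECT COUNT(*) FROM mytable");
$row = $result->fetch_row();
$count = (int) $row[0];
// mt_rand() is inclusive at both ends, so the largest valid offset is $count - 1.
$offset = mt_rand(0, max(0, $count - 1));
$result = $mysqli->query("SELECT * FROM mytable LIMIT 1 OFFSET $offset");
...
$mysqli->commit();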
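For a version you can run without a MySQL server, here is a rough Python sketch of the same COUNT-then-OFFSET idea. It uses the standard library's sqlite3 as a stand-in database, so it is an illustration of the technique rather than MySQL-specific code; `random_row` and the sample rows are hypothetical.

```python
import random
import sqlite3

def random_row(conn, table):
    """Return one uniformly chosen row via COUNT(*) plus a random OFFSET."""
    # `table` is interpolated into the SQL, so it must be a trusted name.
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if count == 0:
        return None  # nothing to pick from
    offset = random.randrange(count)  # 0 .. count - 1, never past the last row
    return conn.execute(
        f"SELECT * FROM {table} LIMIT 1 OFFSET ?", (offset,)
    ).fetchone()

# Hypothetical table with gaps in the ids; gaps do not matter for OFFSET.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
rows = [(1, "a"), (5, "b"), (9, "c"), (10, "d")]
conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
row = random_row(conn, "mytable")
print(row)
```

The offset is drawn from 0 to count - 1 because an offset equal to the row count would skip every row and return nothing.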

MySQL - selecting random row from large table

I suspect you are seeing that small range of values because RAND() (in the WHERE clause) is being evaluated for every row in the table. And it's much more likely that PhotoID on the row is going to be greater than a lower value returned by the expression on the right side. So the query is returning a set that is more weighted to the lower PhotoID values. With the ORDER BY, you're going to get the lowest.

To get a more random distribution, you'd need to have RAND() evaluated just one time. Also, I'd prefer not to execute multiple queries (three separate SELECT statements) when I can get the work done in a single statement, and without user-defined variables.

To implement the algorithm it looks like you are attempting to achieve, I'd approach it something like this:

SELECT t.photoid
     , ...
  FROM photos t
  JOIN ( SELECT m.min_id + RAND() * (m.max_id - m.min_id) AS _rand
           FROM ( SELECT MIN(p.photoid) AS min_id
                       , MAX(p.photoid) AS max_id
                    FROM photos p
                ) m
       ) r
    ON r._rand <= t.photoid
 ORDER BY t.photoid
 LIMIT 1

In MySQL, the inline views (derived tables in the MySQL parlance) will be materialized first, before the outer query. Since m returns a single row, the RAND() function in r will be evaluated only one time. And then the single value from the expression will be used in the outer query.
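To see how much the per-row evaluation distorts the result, here is a small Python sketch that computes the exact distribution instead of sampling. It is a model, not MySQL itself: it assumes contiguous ids, a filter of the form `id >= min + RAND() * (max - min)` that is redrawn for every row, and ORDER BY id LIMIT 1 on top. The function name is made up.

```python
from fractions import Fraction

def per_row_distribution(min_id, max_id):
    """P(id is returned) when the random threshold is redrawn for every row.

    Assumes contiguous ids; the lowest id that passes its own draw wins.
    """
    span = max_id - min_id
    dist = {}
    none_passed_yet = Fraction(1)
    for row_id in range(min_id, max_id + 1):
        # Chance that this row's own draw lands at or below its id.
        passes = Fraction(row_id - min_id, span)
        dist[row_id] = none_passed_yet * passes
        none_passed_yet *= 1 - passes
    return dist

dist = per_row_distribution(1, 100)
low_share = sum(p for row_id, p in dist.items() if row_id <= 30)
print(float(low_share))
```

Under these assumptions nearly all of the probability mass sits on ids of 30 or below, whereas a single draw would land in that range only about 29 times out of 99. That matches the narrow range of low values described above.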

MYSQL select 2 random rows from each categories

Just fetch 2 per category as you described, and one random at the end. It is not one query, but one result-set, which might be what you need:

SELECT * FROM (SELECT * FROM questions WHERE category= 1 ORDER BY rand() limit 0,2) as t1
UNION
SELECT * FROM (SELECT * FROM questions WHERE category= 2 ORDER BY rand() limit 0,2) as t2
UNION
SELECT * FROM (SELECT * FROM questions WHERE category= 3 ORDER BY rand() limit 0,2) as t3
UNION
...

(The nested Select allows you to sort by rand() per category)
Nothing special so far - 2 random questions per category.

The tricky part now is to add the 15th element WITHOUT selecting any of those you already have.

To achieve this with "one" call, you can do the following:

  • Take the subset of 14 questions you have selected as above.
  • Union this with an uncategorized, randomly sorted set of questions from the database (limit 0,15).
  • Select all from this result, limit 0,15.

  • If the first 14 elements of the LAST subquery are already selected, they will be removed by the UNION, and an independent 15th element is guaranteed.

  • If the final inner query selects 15 distinct questions as well, the outer limit 0,15 will only take the first of them into the result.

Something like:

SELECT * FROM (
SELECT * FROM (SELECT * FROM questions WHERE category= 1 ORDER BY rand() limit 0,2) as t1
UNION
SELECT * FROM (SELECT * FROM questions WHERE category= 2 ORDER BY rand() limit 0,2) as t2
UNION
SELECT * FROM (SELECT * FROM questions WHERE category= 3 ORDER BY rand() limit 0,2) as t3
UNION
...
UNION
SELECT * FROM (SELECT * FROM questions ORDER BY rand() LIMIT 0,15) as t8
) AS tx LIMIT 0,15

This is somewhat ugly, but it should do exactly what you need: 2 random questions from EACH category, and finally a random question that has NOT already been selected from ANY category. A total of 15 questions at any time.
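If the UNION logic is hard to follow, the same steps can be sketched in Python. This is an illustration only: `pick_questions` and the (id, category) tuples are hypothetical, and it assumes the de-duplicated rows come back in branch order, which MySQL does not formally guarantee without an ORDER BY.

```python
import random

def pick_questions(questions, per_category=2, total=15, rng=random):
    """questions: list of (id, category) tuples; returns `total` distinct rows."""
    by_category = {}
    for q in questions:
        by_category.setdefault(q[1], []).append(q)

    # Two random questions per category, like the t1..t7 branches.
    picked = []
    for category in sorted(by_category):
        picked.extend(rng.sample(by_category[category], per_category))

    # Counterpart of the final "ORDER BY rand() LIMIT 0,15" branch.
    filler = rng.sample(questions, min(total, len(questions)))

    # UNION semantics: drop rows already seen, keep first-seen order.
    result = []
    seen = set()
    for q in picked + filler:
        if q not in seen:
            seen.add(q)
            result.append(q)
    return result[:total]  # counterpart of the outer LIMIT 0,15

# 21 questions, 3 per category, for 7 categories.
questions = [(i, (i - 1) // 3 + 1) for i in range(1, 22)]
result = pick_questions(questions)
print(len(result))
```

The filler holds 15 distinct rows and only 14 are already picked, so at least one filler row always survives the de-duplication. That is the same argument that guarantees the 15th question in the SQL version.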

(Side note: You could just as well run a second query, using NOT IN () to disallow the already selected questions, after determining the 14 questions for the 7 categories.)

Edit: Unfortunately SQL Fiddle is not working at the moment. Here's some fiddle code:

CREATE TABLE questions (id int(10), category int(10), question varchar(20));

INSERT INTO questions (id, category, question)VALUES(1,1,"Q1");
INSERT INTO questions (id, category, question)VALUES(2,1,"Q2");
INSERT INTO questions (id, category, question)VALUES(3,1,"Q3");
INSERT INTO questions (id, category, question)VALUES(4,2,"Q4");
INSERT INTO questions (id, category, question)VALUES(5,2,"Q5");
INSERT INTO questions (id, category, question)VALUES(6,2,"Q6");
INSERT INTO questions (id, category, question)VALUES(7,3,"Q7");
INSERT INTO questions (id, category, question)VALUES(8,3,"Q8");
INSERT INTO questions (id, category, question)VALUES(9,3,"Q9");
INSERT INTO questions (id, category, question)VALUES(10,4,"Q10");
INSERT INTO questions (id, category, question)VALUES(11,4,"Q11");
INSERT INTO questions (id, category, question)VALUES(12,4,"Q12");
INSERT INTO questions (id, category, question)VALUES(13,5,"Q13");
INSERT INTO questions (id, category, question)VALUES(14,5,"Q14");
INSERT INTO questions (id, category, question)VALUES(15,5,"Q15");
INSERT INTO questions (id, category, question)VALUES(16,6,"Q16");
INSERT INTO questions (id, category, question)VALUES(17,6,"Q17");
INSERT INTO questions (id, category, question)VALUES(18,6,"Q18");
INSERT INTO questions (id, category, question)VALUES(19,7,"Q19");
INSERT INTO questions (id, category, question)VALUES(20,7,"Q20");
INSERT INTO questions (id, category, question)VALUES(21,7,"Q21");

Query

SELECT * FROM (
SELECT * FROM (SELECT * FROM questions WHERE category= 1 ORDER BY rand() limit 0,2) as t1
UNION
SELECT * FROM (SELECT * FROM questions WHERE category= 2 ORDER BY rand() limit 0,2) as t2
UNION
SELECT * FROM (SELECT * FROM questions WHERE category= 3 ORDER BY rand() limit 0,2) as t3
UNION
SELECT * FROM (SELECT * FROM questions WHERE category= 4 ORDER BY rand() limit 0,2) as t4
UNION
SELECT * FROM (SELECT * FROM questions WHERE category= 5 ORDER BY rand() limit 0,2) as t5
UNION
SELECT * FROM (SELECT * FROM questions WHERE category= 6 ORDER BY rand() limit 0,2) as t6
UNION
SELECT * FROM (SELECT * FROM questions WHERE category= 7 ORDER BY rand() limit 0,2) as t7
UNION
SELECT * FROM (SELECT * FROM questions ORDER BY rand() LIMIT 0,15) as t8
) AS tx LIMIT 0,15

The example data contains 3 questions per category, so the 15th question (last row) is ALWAYS the one remaining from a category.


