SQL Populate Table with Random Data

Populate tables with random data

You can use statements like the ones below to create sample data in your tables. The two INSERTs below put 1000 rows in each table; adjust the LEVEL limit in the first one if you want more or fewer records.

-- CONNECT BY LEVEL against DUAL generates one row per level, here 1..1000
INSERT INTO trip
SELECT CAST (ROWNUM AS INTEGER) AS id_trip,
       CAST (ROUND (DBMS_RANDOM.VALUE (1, 100)) AS INTEGER) AS driver_mark
  FROM DUAL
CONNECT BY LEVEL <= 1000;

INSERT INTO facturi
SELECT CAST (ROWNUM AS INTEGER) AS id_factura,
       ROUND (DBMS_RANDOM.VALUE (1, 1000), 2) AS price,
       -- discount stays NULL for roughly half of the rows
       CASE
          WHEN DBMS_RANDOM.VALUE > .5
          THEN ROUND (DBMS_RANDOM.VALUE (1, 1000), 2)
       END AS discount,
       t.id_trip AS id_trip
  FROM DUAL d
  LEFT JOIN (SELECT id_trip
               FROM trip
              ORDER BY DBMS_RANDOM.VALUE) t
    ON (1 = 1);
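
A quick sanity check after running both inserts; each count should come back as 1000:

SELECT COUNT (*) FROM trip;
SELECT COUNT (*) FROM facturi;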

SQL Populate table with random data from a selected list of records

Just find your database's RANDOM function. For SQL Server:

INSERT INTO your_table (position, action, sync)
SELECT
    CAST(RAND() * 3 AS INT) + 1,
    CASE CAST(RAND() * 8 AS INT)
        WHEN 0 THEN 'Weightlifting'
        WHEN 1 THEN 'Shoutout'
        WHEN ...
    END,
    CAST(RAND() * 4 AS INT)

I'm assuming the id column is set to auto-increment and will take care of itself.

The RAND function, like most random number generator functions, returns a number in [0, 1). Multiply, add, and floor the result accordingly to get an integer in the desired range.
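
For example, a minimal SQL Server sketch that maps RAND() onto the inclusive range [@min, @max]. Seeding RAND() with CHECKSUM(NEWID()) is the usual way to force a fresh value per row, because a bare RAND() is evaluated only once per statement:

DECLARE @min INT = 1, @max INT = 8;

-- RAND(seed) is in [0, 1); scale, truncate, and shift into [@min, @max]
SELECT CAST(RAND(CHECKSUM(NEWID())) * (@max - @min + 1) AS INT) + @min AS random_in_range;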

I suggest you consider creating a table with the possible values for the "action" column and replacing the column with a foreign key to that table. It will make this table simpler both to populate and to maintain.
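
A minimal sketch of that design (the actions table and its column names are made up for illustration):

CREATE TABLE actions (
    action_id INT IDENTITY(1, 1) PRIMARY KEY,
    name      VARCHAR(50) NOT NULL
);

INSERT INTO actions (name) VALUES ('Weightlifting'), ('Shoutout');  -- and so on

-- your_table.action then becomes a foreign key to the lookup table:
-- ALTER TABLE your_table ADD CONSTRAINT fk_action
--     FOREIGN KEY (action) REFERENCES actions (action_id);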

Populate table with random data

Assuming your Areas table has 50 records, with row ids 1-50, I would just look into using the RAND function.

Something like this seems to work:

SELECT ROUND(((50 - 1 - 1) * RAND() + 1), 0) AS AreaKey,
       ROUND(((100 - 1 - 1) * RAND() + 1), 1) AS Reading,
       DATEADD(mm, -3, GETDATE()) +
       (
           ABS(
               CAST(
                   CAST(NEWID() AS BINARY(8)) AS INT
               )
           )
           %
           CAST(
               (GETDATE() - DATEADD(mm, -3, GETDATE())) AS INT
           )
       ) AS ReadingDateTime

And here's some SQL Fiddle.
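
If you need bulk rows rather than a single-row preview, here is a sketch (the Readings table and its column names are assumptions; per-row randomness is derived from NEWID(), because a bare RAND() is evaluated only once per statement):

INSERT INTO Readings (AreaKey, Reading, ReadingDateTime)
SELECT TOP (1000)
       ABS(CHECKSUM(NEWID()) % 50) + 1 AS AreaKey,                -- 1..50
       ROUND(RAND(CHECKSUM(NEWID())) * 99 + 1, 1) AS Reading,     -- ~1.0..100.0
       DATEADD(SECOND,
               -ABS(CHECKSUM(NEWID()) % (90 * 24 * 60 * 60)),     -- random second
               GETDATE()) AS ReadingDateTime                      -- within ~3 months back
FROM sys.all_objects a
CROSS JOIN sys.all_objects b;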

Good luck.

Generate 100 rows with random data using SQL query

INSERT INTO tasks
SELECT generate_series(1, 100) AS id,
       md5(random()::text) AS business_name,
       md5(random()::text) AS meta_title,
       md5(random()::text) AS status,
       md5(random()::text) AS title,
       md5(random()::text) AS type;
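
If some of those columns should come from a small fixed set instead of random hashes, one option is to index into an array (a sketch; the status values here are made up):

SELECT g AS id,
       md5(random()::text) AS business_name,
       (ARRAY['draft', 'active', 'done'])[1 + floor(random() * 3)::int] AS status
FROM generate_series(1, 100) AS g;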

How to generate random data for association table?

Simple

To insert every possible combination, each with a 1 percent chance:

INSERT INTO product_atributes(product_id, atribut_id)
SELECT p.product_id, a.atribut_id
FROM product p
JOIN atribute a ON random() >= 0.99;

More sophisticated & faster

This offers more control over the minimum and maximum number of attributes.

And it's a lot faster for your particular setup with gap-less attribute IDs (from 1 to 10000):

INSERT INTO product_atributes(product_id, atribut_id)
SELECT p.product_id, a.atribut_id
FROM  (SELECT product_id, trunc(random() * 10)::int AS ct FROM product) p
CROSS JOIN LATERAL (
   SELECT DISTINCT 1 + trunc(random() * 10000)::int AS atribut_id
   FROM   generate_series(0, p.ct) g
   ) a;

This generates 1 to 10 completely random associations per product.

2 + trunc(random() * 6)::int AS ct would generate 3 to 8 associations. (Note the 0-based generate_series()!)
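
Since the series starts at 0, generate_series(0, ct) produces ct + 1 rows, which is easy to verify:

SELECT count(*) FROM generate_series(0, 3);  -- returns 4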

Generate the number of attributes (ct) in a subquery to avoid a single evaluation of random(), which would result in the same number for all products.

We can optimize performance for gap-less IDs quite a bit. Instead of walking through all 10000 attributes for every product, just generate random numbers in the given range of IDs. Much faster.

Throw in DISTINCT to eliminate (unlikely) duplicates. It hardly matters for 10 out of 10000, but we can't allow duplicates. (So it's possible to get fewer associations in rare cases.)

We could even work with a few gaps. There are quite a few subtleties to selecting random rows. See:

  • Best way to select random rows PostgreSQL

Asides

You could also populate all three tables with random data in a single query using data-modifying CTEs. Simple example for a 1:n relationship:

  • How can I generate fake data that adheres to the PRIMARY KEY-FOREIGN KEY constraints of my schema?

Same principle for your case, just two CTEs ...
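
A minimal sketch of that, assuming product and atribute have generated (serial / identity) keys; the columns product_name and atribut_name are made up:

WITH p AS (
   INSERT INTO product (product_name)
   SELECT 'product ' || g
   FROM   generate_series(1, 100) g
   RETURNING product_id
   )
, a AS (
   INSERT INTO atribute (atribut_name)
   SELECT 'atribut ' || g
   FROM   generate_series(1, 1000) g
   RETURNING atribut_id
   )
INSERT INTO product_atributes(product_id, atribut_id)
SELECT p.product_id, a.atribut_id
FROM   p
JOIN   a ON random() >= 0.99;  -- same 1 percent pairing as above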

Related:

  • How to implement a many-to-many relationship in PostgreSQL?

"Attributes" are spelled like this.

SQL Populate table with random data

I don't know exactly whether this fits the requirement for a "random description", and it's not clear whether you want to generate the full data. But, for example, this generates 10 records with consecutive ids and random texts:

  test=#  SELECT generate_series(1,10) AS id, md5(random()::text) AS descr;

 id |              descr
----+----------------------------------
  1 | 65c141ee1fdeb269d2e393cb1d3e1c09
  2 | 269638b9061149e9228d1b2718cb035e
  3 | 020bce01ba6a6623702c4da1bc6d556e
  4 | 18fad4813efe3dcdb388d7d8c4b6d3b4
  5 | a7859b3bcf7ff11f921ceef58dc1e5b5
  6 | 63691d4a20f7f23843503349c32aa08c
  7 | ca317278d40f2f3ac81224f6996d1c57
  8 | bb4a284e1c53775a02ebd6ec91bbb847
  9 | b444b5ea7966cd76174a618ec0bb9901
 10 | 800495c53976f60641fb4d486be61dc6
(10 rows)

Populate a table column randomly with values from a list in PostgreSQL

Postgres thinks it is doing you a favor by running random() only once -- forgetting that random() is a volatile function.

You can fix this using CASE:

update mytable
    -- floor() keeps the value in 0..4; a bare (random() * 5)::int rounds,
    -- can produce 5, and would leave some rows NULL
set sold_status = (case floor(random() * 5)::int
                        when 0 then 'Available'
                        when 1 then 'On Hold'
                        when 2 then 'Partial Downpayment'
                        when 3 then 'Sold/Unavailable'
                        when 4 then 'Token Received'
                   end);

Here is a db<>fiddle.

Populate random data from another table

SETUP

Let's start by assuming your tables and data are the following ones.
Note that I assume that dataset1 has a primary key (it can be a composite one, but, for the sake of simplicity, let's make it an integer):

CREATE TABLE dataset1
(
    id INTEGER PRIMARY KEY,
    column4 TEXT
) ;

CREATE TABLE dataset2
(
    column1 TEXT
) ;

We fill both tables with sample data:

INSERT INTO dataset1
    (id, column4)
SELECT
    i, 'column 4 for id ' || i
FROM
    generate_series(101, 120) AS s(i);

INSERT INTO dataset2
    (column1)
SELECT
    'SOMETHING ' || i
FROM
    generate_series(1001, 1020) AS s(i) ;

Sanity check:

SELECT count(DISTINCT column4) FROM dataset1 ;

| count |
| ----: |
|    20 |


Case 1: number of rows in dataset1 <= rows in dataset2

We'll perform a complete shuffling. Each value from dataset2 will be used once, and no more than once.

EXPLANATION

In order to make an update that shuffles all the values from column4 in a
random fashion, we need some intermediate steps.

First, for dataset1, we need to create a list (relation) of tuples (id, rn) that
are just:

(id_1,   1),
(id_2,   2),
(id_3,   3),
...
(id_20, 20)

Where id_1, ..., id_20 are the ids present in dataset1.
They can be of any type, they need not be consecutive, and they can be composite.

For dataset2, we need to create another list of (column1, rn) tuples that looks like:

(column1_1,  17),
(column1_2,   3),
(column1_3,  11),
...
(column1_20, 15)

In this case, the second column contains all the values 1 .. 20, but shuffled.

Once we have the two relations, we JOIN them ON ... rn. This, in practice, produces yet another list of tuples with (id, column1), where the pairing has been done randomly. We use these pairs to update dataset1.

THE REAL QUERY

This can all be done (clearly, I hope) by using some CTEs (WITH statements) to hold the intermediate relations:

WITH original_keys AS
(
    -- This creates tuples (id, rn),
    -- where rn increases from 1 to the number of rows
    SELECT
        id,
        row_number() OVER () AS rn
    FROM
        dataset1
)
, shuffled_data AS
(
    -- This creates tuples (column1, rn),
    -- where rn moves between 1 and the number of rows, but is randomly shuffled
    SELECT
        column1,
        -- The next expression is what *shuffles* all the data
        row_number() OVER (ORDER BY random()) AS rn
    FROM
        dataset2
)
-- You update your dataset1
-- with the shuffled data, linking back to the original keys
UPDATE
    dataset1
SET
    column4 = shuffled_data.column1
FROM
    shuffled_data
    JOIN original_keys ON original_keys.rn = shuffled_data.rn
WHERE
    dataset1.id = original_keys.id ;

Note that the trick is performed by means of:

row_number() OVER (ORDER BY random()) AS rn

The row_number() window function produces as many consecutive numbers as there are rows, starting from 1.
These numbers are randomly shuffled because the OVER clause takes all the data and sorts it randomly.
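
You can watch the shuffle in isolation; each run pairs the same values with a different rn:

SELECT column1,
       row_number() OVER (ORDER BY random()) AS rn
FROM dataset2
ORDER BY rn;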

CHECKS

We can check again:

SELECT count(DISTINCT column4) FROM dataset1 ;

| count |
| ----: |
|    20 |

SELECT * FROM dataset1 ;

id | column4
--: | :-------------
101 | SOMETHING 1016
102 | SOMETHING 1009
103 | SOMETHING 1003
...
118 | SOMETHING 1012
119 | SOMETHING 1017
120 | SOMETHING 1011

ALTERNATIVE

Note that this can also be done with subqueries, by simple substitution, instead of CTEs. That might improve performance on some occasions:

UPDATE
    dataset1
SET
    column4 = shuffled_data.column1
FROM
    (SELECT
         column1,
         row_number() OVER (ORDER BY random()) AS rn
     FROM
         dataset2
    ) AS shuffled_data
    JOIN
    (SELECT
         id,
         row_number() OVER () AS rn
     FROM
         dataset1
    ) AS original_keys ON original_keys.rn = shuffled_data.rn
WHERE
    dataset1.id = original_keys.id ;

And again...

SELECT * FROM dataset1;

id | column4
--: | :-------------
101 | SOMETHING 1011
102 | SOMETHING 1018
103 | SOMETHING 1007
...
118 | SOMETHING 1020
119 | SOMETHING 1002
120 | SOMETHING 1016

You can check the whole setup and experiment at dbfiddle here.

NOTE: if you do this with very large datasets, don't expect it to be extremely fast. Shuffling a very big deck of cards is expensive.



Case 2: number of rows in dataset1 > rows in dataset2

In this case, values for column4 can be repeated several times.

The easiest possibility I can think of (probably not an efficient one, but easy to understand) is to create a function random_column1, marked as VOLATILE:

CREATE FUNCTION random_column1()
RETURNS TEXT
VOLATILE -- important!
LANGUAGE SQL
AS
$$
    SELECT
        column1
    FROM
        dataset2
    ORDER BY
        random()
    LIMIT
        1 ;
$$ ;

And use it to update:

UPDATE
    dataset1
SET
    column4 = random_column1();

This way, some values from dataset2 might not be used at all, whereas others will be used more than once.
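
A quick way to see the repetition after this update:

SELECT column4, count(*)
FROM dataset1
GROUP BY column4
ORDER BY count(*) DESC, column4;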

dbfiddle here


