SQL Populate Table with Random Data

Populate tables with random data

You can use statements like the ones below to create sample data in your tables. The two INSERTs below put 1000 rows in each table; adjust the LEVEL limit in the first one if you want more or fewer records.

-- CONNECT BY LEVEL against DUAL generates one row per level, here 1..1000
INSERT INTO trip
SELECT CAST (ROWNUM AS INTEGER) AS id_trip,
       CAST (ROUND (DBMS_RANDOM.VALUE (1, 100)) AS INTEGER) AS driver_mark
  FROM DUAL
CONNECT BY LEVEL <= 1000;

INSERT INTO facturi
SELECT CAST (ROWNUM AS INTEGER) AS id_factura,
       ROUND (DBMS_RANDOM.VALUE (1, 1000), 2) AS price,
       -- discount stays NULL for roughly half of the rows
       CASE
          WHEN DBMS_RANDOM.VALUE > .5
          THEN ROUND (DBMS_RANDOM.VALUE (1, 1000), 2)
       END AS discount,
       t.id_trip AS id_trip
  FROM DUAL d
  LEFT JOIN (SELECT id_trip
               FROM trip
              ORDER BY DBMS_RANDOM.VALUE) t
    ON (1 = 1);
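
A quick sanity check after running both inserts; each count should come back as 1000:

SELECT COUNT (*) FROM trip;
SELECT COUNT (*) FROM facturi;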

SQL Populate table with random data from a selected list of records

Just find your database's RANDOM function. For SQL Server:

INSERT INTO your_table (position, action, sync)
SELECT
    CAST(RAND() * 3 AS INT) + 1,
    CASE CAST(RAND() * 8 AS INT)
        WHEN 0 THEN 'Weightlifting'
        WHEN 1 THEN 'Shoutout'
        WHEN ...
    END,
    CAST(RAND() * 4 AS INT)

I'm assuming the id column is set to auto-increment and will take care of itself.

The RAND function, like most random number generator functions, returns a number in [0, 1). Multiply, add, and floor the result accordingly to get an integer in the desired range.
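
For example, a minimal SQL Server sketch that maps RAND() onto the inclusive range [@min, @max]. Seeding RAND() with CHECKSUM(NEWID()) is the usual way to force a fresh value per row, because a bare RAND() is evaluated only once per statement:

DECLARE @min INT = 1, @max INT = 8;

-- RAND(seed) is in [0, 1); scale, truncate, and shift into [@min, @max]
SELECT CAST(RAND(CHECKSUM(NEWID())) * (@max - @min + 1) AS INT) + @min AS random_in_range;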

I suggest you consider creating a table with the possible values for the "action" column and replacing the column with a foreign key to that table. It will make this table simpler both to populate and to maintain.
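
A minimal sketch of that design (the actions table and its column names are made up for illustration):

CREATE TABLE actions (
    action_id INT IDENTITY(1, 1) PRIMARY KEY,
    name      VARCHAR(50) NOT NULL
);

INSERT INTO actions (name) VALUES ('Weightlifting'), ('Shoutout');  -- and so on

-- your_table.action then becomes a foreign key to the lookup table:
-- ALTER TABLE your_table ADD CONSTRAINT fk_action
--     FOREIGN KEY (action) REFERENCES actions (action_id);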

Populate table with random data

Assuming your Areas table has 50 records, with row ids 1-50, I would just look into using the RAND function.

Something like this seems to work:

SELECT ROUND(((50 - 1 - 1) * RAND() + 1), 0) AS AreaKey,
       ROUND(((100 - 1 - 1) * RAND() + 1), 1) AS Reading,
       DATEADD(mm, -3, GETDATE()) +
       (
           ABS(
               CAST(
                   CAST(NEWID() AS BINARY(8)) AS INT
               )
           )
           %
           CAST(
               (GETDATE() - DATEADD(mm, -3, GETDATE())) AS INT
           )
       ) AS ReadingDateTime

And here's some SQL Fiddle.
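
If you need bulk rows rather than a single-row preview, here is a sketch (the Readings table and its column names are assumptions; per-row randomness is derived from NEWID(), because a bare RAND() is evaluated only once per statement):

INSERT INTO Readings (AreaKey, Reading, ReadingDateTime)
SELECT TOP (1000)
       ABS(CHECKSUM(NEWID()) % 50) + 1 AS AreaKey,                -- 1..50
       ROUND(RAND(CHECKSUM(NEWID())) * 99 + 1, 1) AS Reading,     -- ~1.0..100.0
       DATEADD(SECOND,
               -ABS(CHECKSUM(NEWID()) % (90 * 24 * 60 * 60)),     -- random second
               GETDATE()) AS ReadingDateTime                      -- within ~3 months back
FROM sys.all_objects a
CROSS JOIN sys.all_objects b;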

Good luck.

Generate 100 rows with random data using SQL query

INSERT INTO tasks
SELECT generate_series(1, 100) AS id,
       md5(random()::text) AS business_name,
       md5(random()::text) AS meta_title,
       md5(random()::text) AS status,
       md5(random()::text) AS title,
       md5(random()::text) AS type;
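
If some of those columns should come from a small fixed set instead of random hashes, one option is to index into an array (a sketch; the status values here are made up):

SELECT g AS id,
       md5(random()::text) AS business_name,
       (ARRAY['draft', 'active', 'done'])[1 + floor(random() * 3)::int] AS status
FROM generate_series(1, 100) AS g;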

How to generate random data for association table?

Simple

To insert every possible combination, each with a 1 percent chance:

INSERT INTO product_atributes(product_id, atribut_id)
SELECT p.product_id, a.atribut_id
FROM product p
JOIN atribute a ON random() >= 0.99;

More sophisticated & faster

This offers more control over the minimum and maximum number of attributes.

And it's a lot faster for your particular setup with gap-less attribute IDs (from 1 to 10000):

INSERT INTO product_atributes(product_id, atribut_id)
SELECT p.product_id, a.atribut_id
FROM  (SELECT product_id, trunc(random() * 10)::int AS ct FROM product) p
CROSS JOIN LATERAL (
   SELECT DISTINCT 1 + trunc(random() * 10000)::int AS atribut_id
   FROM   generate_series(0, p.ct) g
   ) a;

This generates 1 to 10 completely random associations per product.

2 + trunc(random() * 6)::int AS ct would generate 3 to 8 associations. (Note the 0-based generate_series()!)
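
Since the series starts at 0, generate_series(0, ct) produces ct + 1 rows, which is easy to verify:

SELECT count(*) FROM generate_series(0, 3);  -- returns 4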

Generate the number of attributes (ct) in a subquery to avoid a single evaluation of random(), which would result in the same number for all products.

We can optimize performance for gap-less IDs quite a bit. Instead of walking through all 10000 attributes for every product, just generate random numbers in the given range of IDs. Much faster.

Throw in DISTINCT to eliminate (unlikely) duplicates. It hardly matters for 10 out of 10000, but we can't allow duplicates. (So it's possible to get fewer associations in rare cases.)

We could even work with a few gaps. There are quite a few subtleties to selecting random rows. See:

  • Best way to select random rows PostgreSQL

Asides

You could also populate all three tables with random data in a single query using data-modifying CTEs. Simple example for a 1:n relationship:

  • How can I generate fake data that adheres to the PRIMARY KEY-FOREIGN KEY constraints of my schema?

Same principle for your case, just two CTEs ...
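
A minimal sketch of that, assuming product and atribute have generated (serial / identity) keys; the columns product_name and atribut_name are made up:

WITH p AS (
   INSERT INTO product (product_name)
   SELECT 'product ' || g
   FROM   generate_series(1, 100) g
   RETURNING product_id
   )
, a AS (
   INSERT INTO atribute (atribut_name)
   SELECT 'atribut ' || g
   FROM   generate_series(1, 1000) g
   RETURNING atribut_id
   )
INSERT INTO product_atributes(product_id, atribut_id)
SELECT p.product_id, a.atribut_id
FROM   p
JOIN   a ON random() >= 0.99;  -- same 1 percent pairing as above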

Related:

  • How to implement a many-to-many relationship in PostgreSQL?

"Attributes" are spelled like this.

SQL Populate table with random data

I don't know exactly whether this fits the requirement for a "random description", and it's not clear whether you want to generate the full data. But, for example, this generates 10 records with consecutive ids and random texts:

  test=#  SELECT generate_series(1,10) AS id, md5(random()::text) AS descr;

 id |              descr
----+----------------------------------
  1 | 65c141ee1fdeb269d2e393cb1d3e1c09
  2 | 269638b9061149e9228d1b2718cb035e
  3 | 020bce01ba6a6623702c4da1bc6d556e
  4 | 18fad4813efe3dcdb388d7d8c4b6d3b4
  5 | a7859b3bcf7ff11f921ceef58dc1e5b5
  6 | 63691d4a20f7f23843503349c32aa08c
  7 | ca317278d40f2f3ac81224f6996d1c57
  8 | bb4a284e1c53775a02ebd6ec91bbb847
  9 | b444b5ea7966cd76174a618ec0bb9901
 10 | 800495c53976f60641fb4d486be61dc6
(10 rows)

Populate a table column randomly with values from a list in PostgreSQL

Postgres thinks it is doing you a favor by running random() only once -- forgetting that random() is a volatile function.

You can fix this using CASE:

update mytable
    -- floor() keeps the value in 0..4; a bare (random() * 5)::int rounds,
    -- can produce 5, and would leave some rows NULL
set sold_status = (case floor(random() * 5)::int
                        when 0 then 'Available'
                        when 1 then 'On Hold'
                        when 2 then 'Partial Downpayment'
                        when 3 then 'Sold/Unavailable'
                        when 4 then 'Token Received'
                   end);

Here is a db<>fiddle.

Populate random data from another table

SETUP

Let's start by assuming your tables and data are the following ones.
Note that I assume that dataset1 has a primary key (it can be a composite one, but, for the sake of simplicity, let's make it an integer):

CREATE TABLE dataset1
(
    id INTEGER PRIMARY KEY,
    column4 TEXT
) ;

CREATE TABLE dataset2
(
    column1 TEXT
) ;

We fill both tables with sample data:

INSERT INTO dataset1
    (id, column4)
SELECT
    i, 'column 4 for id ' || i
FROM
    generate_series(101, 120) AS s(i);

INSERT INTO dataset2
    (column1)
SELECT
    'SOMETHING ' || i
FROM
    generate_series(1001, 1020) AS s(i) ;

Sanity check:

SELECT count(DISTINCT column4) FROM dataset1 ;

| count |
| ----: |
|    20 |


Case 1: number of rows in dataset1 <= rows in dataset2

We'll perform a complete shuffling. Each value from dataset2 will be used once, and no more than once.

EXPLANATION

In order to make an update that shuffles all the values from column4 in a
random fashion, we need some intermediate steps.

First, for dataset1, we need to create a list (relation) of tuples (id, rn) that
are just:

(id_1,   1),
(id_2,   2),
(id_3,   3),
...
(id_20, 20)

Where id_1, ..., id_20 are the ids present in dataset1.
They can be of any type, they need not be consecutive, and they can be composite.

For dataset2, we need to create another list of (column1, rn) tuples that looks like:

(column1_1,  17),
(column1_2,   3),
(column1_3,  11),
...
(column1_20, 15)

In this case, the second column contains all the values 1 .. 20, but shuffled.

Once we have the two relations, we JOIN them ON ... rn. This, in practice, produces yet another list of tuples with (id, column1), where the pairing has been done randomly. We use these pairs to update dataset1.

THE REAL QUERY

This can all be done (clearly, I hope) by using some CTEs (WITH statements) to hold the intermediate relations:

WITH original_keys AS
(
    -- This creates tuples (id, rn),
    -- where rn increases from 1 to the number of rows
    SELECT
        id,
        row_number() OVER () AS rn
    FROM
        dataset1
)
, shuffled_data AS
(
    -- This creates tuples (column1, rn),
    -- where rn moves between 1 and the number of rows, but is randomly shuffled
    SELECT
        column1,
        -- The next expression is what *shuffles* all the data
        row_number() OVER (ORDER BY random()) AS rn
    FROM
        dataset2
)
-- You update your dataset1
-- with the shuffled data, linking back to the original keys
UPDATE
    dataset1
SET
    column4 = shuffled_data.column1
FROM
    shuffled_data
    JOIN original_keys ON original_keys.rn = shuffled_data.rn
WHERE
    dataset1.id = original_keys.id ;

Note that the trick is performed by means of:

row_number() OVER (ORDER BY random()) AS rn

The row_number() window function produces as many consecutive numbers as there are rows, starting from 1.
These numbers are randomly shuffled because the OVER clause takes all the data and sorts it randomly.
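
You can watch the shuffle in isolation; each run pairs the same values with a different rn:

SELECT column1,
       row_number() OVER (ORDER BY random()) AS rn
FROM dataset2
ORDER BY rn;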

CHECKS

We can check again:

SELECT count(DISTINCT column4) FROM dataset1 ;

| count |
| ----: |
|    20 |

SELECT * FROM dataset1 ;

id | column4
--: | :-------------
101 | SOMETHING 1016
102 | SOMETHING 1009
103 | SOMETHING 1003
...
118 | SOMETHING 1012
119 | SOMETHING 1017
120 | SOMETHING 1011

ALTERNATIVE

Note that this can also be done with subqueries, by simple substitution, instead of CTEs. That might improve performance on some occasions:

UPDATE
    dataset1
SET
    column4 = shuffled_data.column1
FROM
    (SELECT
         column1,
         row_number() OVER (ORDER BY random()) AS rn
     FROM
         dataset2
    ) AS shuffled_data
    JOIN
    (SELECT
         id,
         row_number() OVER () AS rn
     FROM
         dataset1
    ) AS original_keys ON original_keys.rn = shuffled_data.rn
WHERE
    dataset1.id = original_keys.id ;

And again...

SELECT * FROM dataset1;

id | column4
--: | :-------------
101 | SOMETHING 1011
102 | SOMETHING 1018
103 | SOMETHING 1007
...
118 | SOMETHING 1020
119 | SOMETHING 1002
120 | SOMETHING 1016

You can check the whole setup and experiment at dbfiddle here.

NOTE: if you do this with very large datasets, don't expect it to be extremely fast. Shuffling a very big deck of cards is expensive.



Case 2: number of rows in dataset1 > rows in dataset2

In this case, values for column4 can be repeated several times.

The easiest possibility I can think of (probably not an efficient one, but easy to understand) is to create a function random_column1, marked as VOLATILE:

CREATE FUNCTION random_column1()
RETURNS TEXT
VOLATILE -- important!
LANGUAGE SQL
AS
$$
    SELECT
        column1
    FROM
        dataset2
    ORDER BY
        random()
    LIMIT
        1 ;
$$ ;

And use it to update:

UPDATE
    dataset1
SET
    column4 = random_column1();

This way, some values from dataset2 might not be used at all, whereas others will be used more than once.
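
A quick way to see the repetition after this update:

SELECT column4, count(*)
FROM dataset1
GROUP BY column4
ORDER BY count(*) DESC, column4;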

dbfiddle here


