Populate tables with random data
You can use statements like the ones below to create sample data in your tables. The two statements below will put 1000 rows in each table. You can adjust the LEVEL in the first INSERT if you want more or fewer records.
INSERT INTO trip
SELECT CAST (ROWNUM AS INTEGER) AS id_trip,
CAST (ROUND (DBMS_RANDOM.VALUE (1, 100)) AS INTEGER) AS driver_mark
FROM DUAL
CONNECT BY LEVEL <= 1000;
INSERT INTO facturi
SELECT CAST (ROWNUM AS INTEGER) AS id_factura,
       ROUND (DBMS_RANDOM.VALUE (1, 1000), 2) AS price,
       CASE
           WHEN DBMS_RANDOM.VALUE > .5
           THEN ROUND (DBMS_RANDOM.VALUE (1, 1000), 2)
       END AS discount,
       t.id_trip AS id_trip
FROM DUAL d
LEFT JOIN (SELECT id_trip
           FROM trip
           ORDER BY DBMS_RANDOM.VALUE) t
    ON (1 = 1);
SQL Populate table with random data from a selected list of records
Just find your database's RANDOM function. For SQL Server:
INSERT INTO your_table (position, action, sync)
SELECT
CAST(RAND() * 3 AS INT) + 1,
CASE CAST(RAND() * 8 AS INT)
WHEN 0 THEN 'Weightlifting'
WHEN 1 THEN 'Shoutout'
WHEN ...
END,
CAST(RAND() * 4 AS INT)
I'm assuming the id column is set to auto-increment and will take care of itself.
The RAND function, as most random number generator functions, will return a number in [0, 1). Multiply, add and floor the result accordingly, to get an integer in the desired range.
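The same floor-and-offset mapping can be sketched in Python (the helper name is mine, not from the answer):

```python
import random

def rand_int_in_range(low, high):
    """Map a uniform float in [0, 1) to an integer in [low, high] inclusive,
    mirroring CAST(RAND() * n AS INT) + offset in SQL."""
    # random.random() returns a float in [0, 1); multiplying by the span
    # and flooring yields each integer in the range with equal probability.
    return low + int(random.random() * (high - low + 1))

# e.g. the "position" column above wants values 1..3
samples = [rand_int_in_range(1, 3) for _ in range(1000)]
```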
I suggest you consider creating a table with the possible values for the "action" column and replacing the column with a FK to that table. It will make it simpler both to populate and maintain this table.
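A minimal sketch of that lookup-table design, using Python's sqlite3 only so the example is self-contained (all table and value names are illustrative):

```python
import sqlite3
import random

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A small lookup table holds the possible actions once; other tables
# reference it by foreign key instead of repeating the text per row.
cur.execute("CREATE TABLE action (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.executemany("INSERT INTO action (name) VALUES (?)",
                [("Weightlifting",), ("Shoutout",), ("Running",)])

cur.execute("""CREATE TABLE your_table (
                   id INTEGER PRIMARY KEY,
                   action_id INTEGER REFERENCES action(id))""")

# Populate with random foreign-key values drawn from the lookup table.
action_ids = [row[0] for row in cur.execute("SELECT id FROM action")]
cur.executemany("INSERT INTO your_table (action_id) VALUES (?)",
                [(random.choice(action_ids),) for _ in range(100)])

count = cur.execute("SELECT COUNT(*) FROM your_table").fetchone()[0]
```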
Populate table with random data
Assuming your Areas table has 50 records, with row ids 1-50, I would just look into using the RAND
function.
Something like this seems to work:
SELECT ROUND(((50 - 1 - 1) * RAND() + 1), 0) as AreaKey,
       ROUND(((100 - 1 - 1) * RAND() + 1), 1) as Reading,
DATEADD(mm,-3,GETDATE()) +
(
ABS(
CAST(
CAST( NewID() AS BINARY(8) ) AS INT
)
)
%
CAST(
(GETDATE() - DATEADD(mm,-3,GETDATE())) AS INT
)
) as ReadingDateTime
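The idea behind the date expression above (add a random offset, bounded by a modulo, to a base date three months back) can be sketched in Python; the function name is mine:

```python
import random
from datetime import datetime, timedelta

def random_datetime_last_3_months(now=None):
    """Pick a roughly uniform random timestamp between three months ago
    and now, analogous to base date + (random value % range) in T-SQL."""
    now = now or datetime.now()
    start = now - timedelta(days=90)  # approximate three months
    span_seconds = (now - start).total_seconds()
    # random.random() is in [0, 1), so the offset never exceeds the span.
    return start + timedelta(seconds=random.random() * span_seconds)

ts = random_datetime_last_3_months()
```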
Good luck.
Generate 100 rows with random data using SQL query
INSERT into tasks SELECT generate_series(1,100) AS id,
md5(random()::text) AS business_name,
md5(random()::text) AS meta_title,
md5(random()::text) AS status,
md5(random()::text) AS title,
md5(random()::text) AS type;
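For reference, the md5(random()::text) trick can be mimicked outside the database; a Python sketch (the helper name is mine):

```python
import hashlib
import random

def random_md5_text():
    """Equivalent of Postgres md5(random()::text): hash the text form of a
    random float to get a 32-character hex string of filler data."""
    return hashlib.md5(str(random.random()).encode()).hexdigest()

# 100 rows of (consecutive id, random-looking text), like the query above
rows = [(i, random_md5_text()) for i in range(1, 101)]
```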
How to generate random data for association table?
Simple
To insert every possible combination, each with an even 1 percent chance:
INSERT INTO product_atributes(product_id, atribut_id)
SELECT p.product_id, a.atribut_id
FROM product p
JOIN atribute a ON random() >= 0.99;
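The same filtered cross join can be simulated in Python to see the expected density; the sizes here are illustrative, not the 10000-attribute setup:

```python
import random

products = range(1, 101)      # stand-ins for product_id values
attributes = range(1, 101)    # stand-ins for atribut_id values

# Each (product, attribute) pair survives the filter with probability 0.01,
# mirroring JOIN ... ON random() >= 0.99 over the full cross product.
pairs = [(p, a) for p in products for a in attributes
         if random.random() >= 0.99]

# Out of 10000 candidate pairs, roughly 100 are kept on average.
```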
More sophisticated & faster
This offers more control over minimum and maximum number of attributes.
And it's a lot faster for your particular setup with gap-less attribute IDs (from 1 to 10000):
INSERT INTO product_atributes(product_id, atribut_id)
SELECT p.product_id, a.atribut_id
FROM (SELECT product_id, trunc(random() * 10)::int AS ct FROM product) p
CROSS JOIN LATERAL (
SELECT DISTINCT 1 + trunc(random() * 10000) AS atribut_id
FROM generate_series (0, p.ct) g
) a;
This generates 1 to 10 completely random associations per product. Using 2 + trunc(random() * 6)::int AS ct instead would generate 3 to 8 associations. (Note the 0-based generate_series()!)
Generate the number of attributes (ct) in a subquery to avoid a single evaluation of random(), which would result in the same number for all products.
We can optimize performance for gap-less IDs quite a bit. Instead of walking through all 10000 attributes for every product, just generate random numbers in the given range of IDs. Much faster.
Throw in DISTINCT to eliminate (unlikely) duplicates. That hardly matters for 10 out of 10000, but we can't allow duplicates. (So it's possible to get fewer associations in rare cases.)
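A Python sketch of why DISTINCT can shrink the result (names and sizes are mine):

```python
import random

def random_attribute_ids(ct, max_id=10000):
    """Draw ct+1 random IDs in 1..max_id and deduplicate, like
    SELECT DISTINCT 1 + trunc(random() * max_id)
    FROM generate_series(0, ct)."""
    # A set collapses duplicate draws, so the result may hold
    # fewer than ct+1 IDs -- the rare case mentioned above.
    return {1 + int(random.random() * max_id) for _ in range(ct + 1)}

ids = random_attribute_ids(9)  # up to 10 random attribute IDs
```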
We could even work with a few gaps. There are quite a few subtleties to selecting random rows. See:
- Best way to select random rows PostgreSQL
Asides
You could also populate all three tables with random data in a single query using data-modifying CTEs. Simple example for a 1:n relationship:
- How can I generate fake data that adheres to the PRIMARY KEY-FOREIGN KEY constraints of my schema?
Same principle for your case, just two CTEs ...
Related:
- How to implement a many-to-many relationship in PostgreSQL?
The table and column names ("atribute", "atribut_id") keep the spelling used in the question's schema.
SQL Populate table with random data
I don't know exactly if this fits the requirement for a "random description", and it's not clear if you want to generate the full data, but, for example, this generates 10 records with consecutive ids and random texts:
test=# SELECT generate_series(1,10) AS id, md5(random()::text) AS descr;
id | descr
----+----------------------------------
1 | 65c141ee1fdeb269d2e393cb1d3e1c09
2 | 269638b9061149e9228d1b2718cb035e
3 | 020bce01ba6a6623702c4da1bc6d556e
4 | 18fad4813efe3dcdb388d7d8c4b6d3b4
5 | a7859b3bcf7ff11f921ceef58dc1e5b5
6 | 63691d4a20f7f23843503349c32aa08c
7 | ca317278d40f2f3ac81224f6996d1c57
8 | bb4a284e1c53775a02ebd6ec91bbb847
9 | b444b5ea7966cd76174a618ec0bb9901
10 | 800495c53976f60641fb4d486be61dc6
(10 rows)
Populate a table column randomly with values from a list in PostgreSQL
Postgres thinks it is doing you a favor by running random() only once -- forgetting that random() is a volatile function. You can fix this using CASE:
update mytable
set sold_status = (case trunc(random() * 5)::int  -- trunc() floors to 0-4; a bare ::int cast rounds, could yield 5 and fall through to NULL
                       when 0 then 'Available'
                       when 1 then 'On Hold'
                       when 2 then 'Partial Downpayment'
                       when 3 then 'Sold/Unavailable'
                       when 4 then 'Token Recieved'
                   end);
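The mapping from a random number to one of five labels can be sketched in Python; note that Python's int() truncates toward zero, whereas a bare ::int cast in Postgres rounds, so the edges need care in SQL:

```python
import random

# Values from the UPDATE above (spelling as in the question).
STATUSES = ['Available', 'On Hold', 'Partial Downpayment',
            'Sold/Unavailable', 'Token Recieved']

def random_status():
    # random.random() * 5 is in [0, 5); int() truncates, so the index
    # is uniform over 0..4 and every draw maps to a valid label.
    return STATUSES[int(random.random() * 5)]

sample = [random_status() for _ in range(1000)]
```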
Populate random data from another table
SETUP
Let's start by assuming your tables and data are the following ones.
Note that I assume that dataset1
has a primary key (it can be a composite one, but, for the sake of simplicity, let's make it an integer):
CREATE TABLE dataset1
(
id INTEGER PRIMARY KEY,
column4 TEXT
) ;
CREATE TABLE dataset2
(
column1 TEXT
) ;
We fill both tables with sample data
INSERT INTO dataset1
(id, column4)
SELECT
i, 'column 4 for id ' || i
FROM
generate_series(101, 120) AS s(i);
INSERT INTO dataset2
(column1)
SELECT
'SOMETHING ' || i
FROM
generate_series (1001, 1020) AS s(i) ;
Sanity check:
SELECT count(DISTINCT column4) FROM dataset1 ;
| count |
| ----: |
| 20 |
Case 1: number of rows in dataset1 <= rows in dataset2
We'll perform a complete shuffling. Each value from dataset2 will be used exactly once.
EXPLANATION
In order to make an update that shuffles all the values from column4
in a
random fashion, we need some intermediate steps.
First, for the dataset1
, we need to create a list (relation) of tuples (id, rn)
, that
are just:
(id_1, 1),
(id_2, 2),
(id_3, 3),
...
(id_20, 20)
Where id_1
, ..., id_20
are the ids present on dataset1
.
They can be of any type, they need not be consecutive, and they can be composite.
For the dataset2
, we need to create another list of (column_1,rn)
, that looks like:
(column1_1, 17),
(column1_2, 3),
(column1_3, 11),
...
(column1_20, 15)
In this case, the second column contains all the values 1 .. 20, but shuffled.
Once we have the two relations, we JOIN
them ON ... rn
. This, in practice, produces yet another list of tuples with (id, column1)
, where the pairing has been done randomly. We use these pairs to update dataset1
.
THE REAL QUERY
This can all be done (clearly, I hope) by using some CTE (WITH
statement) to hold the intermediate relations:
WITH original_keys AS
(
-- This creates tuples (id, rn),
-- where rn increases from 1 to number of rows
SELECT
id,
row_number() OVER () AS rn
FROM
dataset1
)
, shuffled_data AS
(
-- This creates tuples (column1, rn)
-- where rn moves between 1 and number of rows, but is randomly shuffled
SELECT
column1,
-- The next statement is what *shuffles* all the data
row_number() OVER (ORDER BY random()) AS rn
FROM
dataset2
)
-- You update your dataset1
-- with the shuffled data, linking back to the original keys
UPDATE
dataset1
SET
column4 = shuffled_data.column1
FROM
shuffled_data
JOIN original_keys ON original_keys.rn = shuffled_data.rn
WHERE
dataset1.id = original_keys.id ;
Note that the trick is performed by means of:
row_number() OVER (ORDER BY random()) AS rn
The row_number() window function produces as many consecutive numbers as there are rows, starting from 1. These numbers are randomly shuffled because the OVER clause takes all the data and sorts it randomly.
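The shuffle-and-join pairing is equivalent to zipping the keys with a shuffled copy of the values, as this Python sketch (with made-up sample data) shows:

```python
import random

ids = [101, 102, 103, 104, 105]        # keys from dataset1
values = ['A', 'B', 'C', 'D', 'E']     # column1 values from dataset2

# Numbering one list in its original order and the other in random order,
# then joining on that number, pairs each id with one shuffled value:
shuffled = values[:]
random.shuffle(shuffled)
pairing = dict(zip(ids, shuffled))

# Every value is used exactly once; only the assignment is random.
```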
CHECKS
We can check again:
SELECT count(DISTINCT column4) FROM dataset1 ;
| count |
| ----: |
| 20 |
SELECT * FROM dataset1 ;
id | column4
--: | :-------------
101 | SOMETHING 1016
102 | SOMETHING 1009
103 | SOMETHING 1003
...
118 | SOMETHING 1012
119 | SOMETHING 1017
120 | SOMETHING 1011
ALTERNATIVE
Note that this can also be done with subqueries, by simple substitution, instead of CTEs. That might improve performance on some occasions:
UPDATE
dataset1
SET
column4 = shuffled_data.column1
FROM
(SELECT
column1,
row_number() OVER (ORDER BY random()) AS rn
FROM
dataset2
) AS shuffled_data
JOIN
(SELECT
id,
row_number() OVER () AS rn
FROM
dataset1
) AS original_keys ON original_keys.rn = shuffled_data.rn
WHERE
dataset1.id = original_keys.id ;
And again...
SELECT * FROM dataset1;
id | column4
--: | :-------------
101 | SOMETHING 1011
102 | SOMETHING 1018
103 | SOMETHING 1007
...
118 | SOMETHING 1020
119 | SOMETHING 1002
120 | SOMETHING 1016
NOTE: if you do this with very large datasets, don't expect it to be extremely fast. Shuffling a very big deck of cards is expensive.
Case 2: number of rows in dataset1 > rows in dataset2
In this case, values for column4
can be repeated several times.
The easiest possibility I can think of (probably, not an efficient one, but easy to understand) is to create a function random_column1
, marked as VOLATILE
:
CREATE FUNCTION random_column1()
RETURNS TEXT
VOLATILE -- important!
LANGUAGE SQL
AS
$$
SELECT
column1
FROM
dataset2
ORDER BY
random()
LIMIT
1 ;
$$ ;
And use it to update:
UPDATE
dataset1
SET
column4 = random_column1();
This way, some values from dataset2
might not be used at all, whereas others will be used more than once.
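The per-row behavior of the VOLATILE function can be sketched in Python (sample data is made up):

```python
import random

dataset2 = ['SOMETHING 1001', 'SOMETHING 1002', 'SOMETHING 1003']

# Like calling the VOLATILE SQL function once per row: each row gets an
# independent random pick, so values can repeat and some may never appear.
dataset1 = {row_id: random.choice(dataset2) for row_id in range(101, 131)}
```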