Create a Dataframe with Random Numbers in Each Column

Pandas: create new column in df with random integers from range

One solution is to use numpy.random.randint:

import numpy as np
df1['randNumCol'] = np.random.randint(1, 6, df1.shape[0])  # integers 1..5; the upper bound is exclusive

If the candidate numbers are not consecutive, use numpy.random.choice instead (this is slower):

df1['randNumCol'] = np.random.choice([1, 9, 20], df1.shape[0])

To make the results reproducible, set the seed with numpy.random.seed (e.g. np.random.seed(42)) before drawing.
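As a minimal sketch of the seeding advice above: reseeding with the same value replays the same draws. The frame df1 and the column name randChoiceCol are made up here for illustration; only randNumCol comes from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for df1 from the question.
df1 = pd.DataFrame({"name": ["a", "b", "c", "d", "e"]})

np.random.seed(42)                              # fix the global seed
first = np.random.randint(1, 6, df1.shape[0])   # integers 1..5

np.random.seed(42)                              # reseeding replays the sequence
second = np.random.randint(1, 6, df1.shape[0])

df1["randNumCol"] = first
df1["randChoiceCol"] = np.random.choice([1, 9, 20], df1.shape[0])

print(df1)
print((first == second).all())  # same seed, same draws
```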

How to create a DataFrame of random integers with Pandas?

numpy.random.randint accepts a third argument (size), in which you can specify the shape of the output array. You can use this to create your DataFrame:

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

Here, np.random.randint(0,100,size=(100, 4)) creates an output array of shape (100, 4) with random integers in [0, 100).


Demo -

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

which produces:

     A   B   C   D
0   45  88  44  92
1   62  34   2  86
2   85  65  11  31
3   74  43  42  56
4   90  38  34  93
5    0  94  45  10
6   58  23  23  60
..  ..  ..  ..  ..

Create a dataframe with random numbers in each column

In R, you are looking for replicate:

data.frame(replicate(10, sample(0:1, 1000, replace=TRUE)))

These are the top few rows:

  X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1  1  1  0  1  0  0  1  1  1   0
2  0  0  0  1  0  1  0  0  1   0
3  0  1  1  1  1  0  1  1  1   1
4  0  0  0  1  1  1  1  1  1   0
5  1  0  1  0  1  1  0  1  1   0
6  0  1  1  1  1  1  0  1  1   1

If you do the same command without wrapping it in data.frame(), you will have a matrix. Matrices are faster to work with, so you might want to investigate whether they are suitable for your problem.
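For readers working in pandas rather than R, a rough analogue of the replicate call might look like the sketch below. The names arr, cols and df_bits are illustrative, and the X1..X10 labels are added by hand to mimic R's defaults. As in R, the bare array can be kept if labelled columns are not needed.

```python
import numpy as np
import pandas as pd

# 1000 rows, 10 columns of 0/1 draws (the upper bound 2 is exclusive).
arr = np.random.randint(0, 2, size=(1000, 10))

# Wrap the array in a DataFrame only if labelled columns are needed.
cols = [f"X{i}" for i in range(1, 11)]
df_bits = pd.DataFrame(arr, columns=cols)

print(df_bits.head())
```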

How can I create a dataframe with random numbers columns, each with a different range?

import numpy as np
import pandas as pd

n = 3
df = pd.DataFrame(dict(
    A=np.random.randint(1, 6, size=n),   # 1..5
    B=np.random.randint(1, 9, size=n),   # 1..8
    C=np.random.randint(4, 11, size=n)   # 4..10
))

df

   A  B  C
0  3  5  6
1  1  7  6
2  1  1  4

Or

df = pd.DataFrame(
    np.random.rand(3, 3) * [5, 8, 7] + [1, 1, 4],
    columns=list('ABC')
).astype(int)

df

   A  B   C
0  3  6  10
1  3  5   7
2  4  6   7
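When there are many columns, the per-column ranges can be kept in one place. The sketch below assumes a hypothetical ranges dict mapping each column name to a (low, high) pair with an exclusive upper bound. It is the same technique as the first snippet, written as a dict comprehension.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping: column name -> (low, high), with high exclusive.
ranges = {"A": (1, 6), "B": (1, 9), "C": (4, 11)}
n = 3

df_ranges = pd.DataFrame(
    {col: np.random.randint(low, high, size=n) for col, (low, high) in ranges.items()}
)

print(df_ranges)
```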

How to create a dataframe with different random numbers on each column?

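Here is a solution: draw a fresh list of random values on each iteration, so that every column gets its own values.

import random
import pandas as pd

yuju = pd.DataFrame()

for year in range(1990, 2020):
    # a new list of 20 draws for each column
    yuju[year] = [random.uniform(65.5, 140.5) for _ in range(20)]

yuju

Output: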

          1990        1991        1992        1993        1994        1995        1996        1997  ...        
0 73.117785 104.158470 76.704672 136.295814 106.008801 88.129275 96.843800 118.172649 ... 106.08
1 77.146977 131.584449 112.781430 113.071448 118.806880 140.301281 132.196554 136.222878 ... 74.85
2 67.976294 90.571586 137.313729 126.388545 134.941530 119.544528 119.692859 124.883332 ... 82.48
3 76.577618 102.765745 137.014399 84.696234 70.087628 86.180974 121.070030 87.991356 ... 71.67
4 104.675987 134.869611 120.221701 69.652423 105.650834 107.308007 122.372708 80.037225 ... 90.58
5 107.093326 124.649323 138.961846 84.312784 98.964176 87.691698 120.426266 79.888018 ... 97.46
6 97.375159 97.607740 119.027947 77.545403 81.365235 119.204719 75.426836 132.545121 ... 120.15
7 81.099338 94.315767 123.389789 85.734648 134.746295 99.196135 65.963834 72.895016 ... 135.63
8 129.577824 118.482358 137.838454 83.338883 68.603851 138.657750 85.155046 73.311065 ... 91.12
9 129.321333 134.598491 138.810883 119.487502 75.794849 125.314185 118.499014 126.969947 ... 74.86
10 122.704160 118.282868 114.196318 69.668442 112.237553 68.953530 115.395672 114.560736 ... 88.21
11 112.653109 109.635751 78.470715 81.973892 111.413094 76.918852 76.318205 129.423737 ... 103.06
12 80.984595 136.170595 83.258407 112.248942 96.730922 84.922575 104.984614 127.646325 ... 103.24
13 82.658896 97.066191 95.096705 107.757428 93.767250 93.958438 115.113325 98.931509 ... 105.32
14 85.173060 77.257117 72.668875 87.061919 130.088992 80.001858 104.526423 85.237558 ... 87.86
15 68.428850 79.948204 107.060400 92.962859 133.393354 93.806838 99.258857 138.314982 ... 86.80
16 115.105281 110.567551 119.868457 139.482290 103.235046 128.805920 140.131489 107.568099 ... 98.16
17 71.318147 119.965667 97.135972 90.174975 125.738171 115.655945 86.333461 114.574965 ... 134.80
18 134.000260 121.417473 104.832999 129.277671 139.932955 122.623911 92.369881 109.523118 ... 137.47
19 104.444951 111.712214 130.602922 119.446700 88.256841 110.316280 74.611164 88.364896 ... 115.32
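The same table can probably be built without a Python loop by drawing the whole block at once. This sketch swaps in numpy.random.uniform, which samples from [65.5, 140.5), in place of random.uniform. The names years, values and yuju_fast are illustrative.

```python
import numpy as np
import pandas as pd

years = list(range(1990, 2020))   # 30 column labels

# One independent draw per cell: 20 rows by 30 columns.
values = np.random.uniform(65.5, 140.5, size=(20, len(years)))
yuju_fast = pd.DataFrame(values, columns=years)

print(yuju_fast.shape)
```

Because every cell is drawn independently, each column differs from the others without any explicit re-drawing step.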

Column with random, increasing numbers in pandas

Generate all random numbers, slice it properly based on the group sizes, sort each slice, and assign back. First we need to sort the DataFrame so that assignment occurs properly.

import numpy as np
import pandas as pd

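# Sort so that rows with the same RecordID are contiguous.
df = df.sort_values('RecordID')

# Split the random draws at the group boundaries.
arr = np.array_split(np.random.randint(1, 100, len(df)),
                     df.groupby('RecordID').size().cumsum()[:-1])

# Sort within each group; axis=1 assumes all groups have the same size.
df['Random_Value'] = np.sort(arr, axis=1).ravel()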

Output

  RecordID  number_of_days  Random_Value
0      id1               1            19
5      id1               2            41
1      id2               1            53
6      id2               2            56
2      id3               1            33
7      id3               2            68
3      id4               1            57
8      id4               2            67
4      id5               1            39
9      id5               2            49

As always, it's best to avoid groupby.apply(lambda x: ...), as this is a slow loop over the groups.

N = 10000
df = pd.DataFrame({"RecordID": list(range(N))*10,
                   "number_of_days": np.repeat(range(10), N)})

def ALollz(df):
    df = df.sort_values(['RecordID', 'number_of_days'])

    arr = np.array_split(np.random.randint(1, 100, len(df)),
                         df.groupby('RecordID').size().cumsum()[:-1])

    df['Random_Value'] = np.sort(arr, axis=1).ravel()

    return df

%timeit ALollz(df)
#54 ms ± 1.64 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df.assign(random_value=df.groupby('RecordID').transform(lambda x: np.sort(np.random.randint(1,100, len(x))))).sort_values('RecordID')
#15.9 s ± 124 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df.groupby('RecordID').apply(lambda x: pd.Series(np.sort(np.random.randint(1,100, len(x))))).reset_index()
#1.23 s ± 25.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
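The np.sort(arr, axis=1) step relies on every group having the same number of rows. For groups of unequal size, one possible variant (a different technique from the answer above, not the author's method) is to order the draws by group and then by value with numpy.lexsort. The sample data and the names df_groups, codes, values and order are made up for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical data with groups of different sizes (id1: 3, id2: 1, id3: 2).
df_groups = pd.DataFrame({"RecordID": ["id2", "id1", "id1", "id3", "id1", "id3"]})

# Make rows of the same group contiguous, as in the answer above.
df_groups = df_groups.sort_values("RecordID", kind="stable")

# Integer label per group; sort=True keeps the labels in sorted group order.
codes, _ = pd.factorize(df_groups["RecordID"], sort=True)
values = np.random.randint(1, 100, len(df_groups))

# lexsort treats the LAST key as primary: order by group, then by value.
order = np.lexsort((values, codes))
df_groups["Random_Value"] = values[order]

print(df_groups)
```

This still avoids a Python-level loop over the groups, so it should scale in a similar way to the array_split approach, though no timing was measured here.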

Dataframe filled with random numbers based on user input of columns and rows

In your code, data1 is the output of random.seed(56) and does not depend on the user input at all. I think you want:

import numpy as np
import pandas as pd

arg1 = int(input('a number1:'))

arg2 = int(input('a number2:'))

# set the seed
np.random.seed(56)

df = pd.DataFrame(np.random.randint(0,100, size=(arg1, arg2)))
print(df)
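To make the same idea reusable and testable without typing at a prompt, the logic could be wrapped in a small function. The helper make_random_frame and its argument check are hypothetical additions. It uses a local numpy.random.RandomState so the global random state is left untouched.

```python
import numpy as np
import pandas as pd

def make_random_frame(n_rows, n_cols, seed=56):
    """Return an n_rows x n_cols frame of random integers in [0, 100)."""
    if n_rows < 0 or n_cols < 0:
        raise ValueError("row and column counts must be non-negative")
    rng = np.random.RandomState(seed)  # local generator with a fixed seed
    return pd.DataFrame(rng.randint(0, 100, size=(n_rows, n_cols)))

frame = make_random_frame(3, 4)
print(frame)
```

In a script, the two arguments would come from int(input(...)) as in the answer above.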

Add a different random number to every cell in a pandas dataframe

df + np.random.rand(*df.shape) / 10000.0

OR

Let's use applymap (deprecated since pandas 2.1 in favor of DataFrame.map):

df = pd.DataFrame(1.0, index=[1,2,3,4,5], columns=list('ABC') )

df.applymap(lambda x: x + np.random.rand()/10000.0)

With the frame above, each cell of the result is a float slightly above 1.0: the original value plus its own random offset in [0, 0.0001).
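A small sketch of the vectorised form, using the same frame of ones, shows that each cell receives its own offset. The names df_ones, noise and jittered are illustrative.

```python
import numpy as np
import pandas as pd

df_ones = pd.DataFrame(1.0, index=[1, 2, 3, 4, 5], columns=list("ABC"))

# One independent draw per cell, scaled down to [0, 0.0001).
noise = np.random.rand(*df_ones.shape) / 10000.0
jittered = df_ones + noise

print(jittered)
```

The array is added element by element, so no per-cell Python function call is needed.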

How can I generate a random number for each group of values in a column in Python?

First collect the unique values of Row in a separate dataframe, with one row per distinct value:

>>> randomDF = df.drop_duplicates(ignore_index=True)
>>> randomDF
   Row
0    1
1    2

Now that you have the unique rows, create a list of the columns you want, use numpy to generate a random array of the required shape, and assign it to randomDF for those columns.

>>> import numpy as np
>>> probCols = ['Prob A', 'Prob B', 'Prob C']
>>> randomDF[probCols] = np.random.random((randomDF.shape[0], len(probCols)))
>>> randomDF
   Row    Prob A    Prob B    Prob C
0    1  0.152064  0.391139  0.242061
1    2  0.963488  0.020088  0.710162

Now that you have the required dataframe, merge it back into the original dataframe:

df = df.merge(randomDF, on=['Row'])

Output:

   Row    Prob A    Prob B    Prob C
0    1  0.152064  0.391139  0.242061
1    1  0.152064  0.391139  0.242061
2    2  0.963488  0.020088  0.710162
3    2  0.963488  0.020088  0.710162
4    2  0.963488  0.020088  0.710162
5    2  0.963488  0.020088  0.710162

If you only want two digits after the decimal point, wrap the random number generation in numpy's round function:

np.round(np.random.random((randomDF.shape[0], len(probCols))), 2)

In this case, output looks something like this:

   Row  Prob A  Prob B  Prob C
0    1    0.70    0.87    0.89
1    1    0.70    0.87    0.89
2    2    0.37    0.69    0.66
3    2    0.37    0.69    0.66
4    2    0.37    0.69    0.66
5    2    0.37    0.69    0.66
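As an alternative to the drop_duplicates and merge steps, the per-group draws can be broadcast with pandas.factorize. This is a sketch under assumed sample data, not the method from the answer above. The names df_rows, prob_cols, codes, uniques, draws and probs are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical input matching the shape of the example above.
df_rows = pd.DataFrame({"Row": [1, 1, 2, 2, 2, 2]})
prob_cols = ["Prob A", "Prob B", "Prob C"]

# codes[i] is the position of df_rows["Row"][i] within uniques.
codes, uniques = pd.factorize(df_rows["Row"])

# One row of draws per distinct value of Row, rounded to two decimals.
draws = np.round(np.random.random((len(uniques), len(prob_cols))), 2)

# Indexing by codes repeats each group's draws for every member of the group.
probs = pd.DataFrame(draws[codes], columns=prob_cols, index=df_rows.index)
df_rows = df_rows.join(probs)

print(df_rows)
```

Unlike the merge, this keeps the original row order and index of the input frame.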

