Sample Random Rows in Dataframe

Random row selection in Pandas dataframe

Something like this?

import random

def some(x, n):
    # random.sample needs a sequence, so convert the index to a list first
    return x.loc[random.sample(list(x.index), n)]

Note: ix was deprecated in Pandas v0.20.0 in favour of loc for label-based indexing, and it was removed in v1.0.
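
For what it's worth, Pandas 0.16.1 and later also ship DataFrame.sample, which covers the same need without the random module. A minimal sketch, assuming a made-up frame `df` whose column names are purely illustrative:

```python
import pandas as pd

# Hypothetical example frame; any DataFrame works the same way.
df = pd.DataFrame({"a": range(10), "b": list("abcdefghij")})

# Draw 3 distinct rows; random_state makes the draw repeatable.
picked = df.sample(n=3, random_state=0)

# frac draws a proportion of the rows instead of a fixed count.
half = df.sample(frac=0.5, random_state=0)
```

Sampling is without replacement by default; pass replace=True if the same row may be drawn more than once.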

Sample random rows in dataframe

First make some data:

> df = data.frame(matrix(rnorm(20), nrow=10))
> df
           X1         X2
1   0.7091409 -1.4061361
2  -1.1334614 -0.1973846
3   2.3343391 -0.4385071
4  -0.9040278 -0.6593677
5   0.4180331 -1.2592415
6   0.7572246 -0.5463655
7  -0.8996483  0.4231117
8  -1.0356774 -0.1640883
9  -0.3983045  0.7157506
10 -0.9060305  2.3234110

Then select some rows at random:

> df[sample(nrow(df), 3), ]
           X1         X2
9  -0.3983045  0.7157506
2  -1.1334614 -0.1973846
10 -0.9060305  2.3234110

How can I select a sequence of random rows from a pandas DataFrame?

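Pick a random start row n, then take the five consecutive rows n to n+4:

import random

# randint includes both endpoints, so the last valid start is len(dataframe) - 5
n = random.randint(0, len(dataframe) - 5)

five_random_consecutive_rows = dataframe.iloc[n:n+5]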
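
The same idea can be wrapped in a small helper that works for any block length. This is only a sketch: the function name `sample_consecutive` and the example frame are invented for illustration, and it assumes positional (iloc) slicing is what you want.

```python
import random

import pandas as pd


def sample_consecutive(frame, k, rng=random):
    """Return k consecutive rows starting at a random position."""
    if k > len(frame):
        raise ValueError("k is larger than the number of rows")
    # randint is inclusive on both ends, so the last valid start is len - k.
    start = rng.randint(0, len(frame) - k)
    # iloc slices by position, regardless of the index labels.
    return frame.iloc[start:start + k]


# Hypothetical frame with a non-default index to show that labels do not matter.
df = pd.DataFrame({"value": range(100)}, index=range(1000, 1100))
block = sample_consecutive(df, 5)
```

Passing a seeded random.Random instance as rng makes the draw repeatable.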

Random selection of a row from a pandas DataFrame with weights

Scale each target weight by the observed frequency of its label so that sampling follows the expected distribution (unscaled, labels with more rows would be over-represented):

weights = {-1:0.1, 0:0.4, 1:0.5}

scaled_weights = (pd.Series(weights) / df.label.value_counts(normalize=True))

df.sample(n=1, weights=df.label.map(scaled_weights))

Check the distribution with 10,000 samples:

(df.sample(n=10000, replace=True, random_state=1,
           weights=df.label.map(scaled_weights))
   .label.value_counts(normalize=True)
)

Output:

 1    0.5060
 0    0.3979
-1    0.0961
Name: label, dtype: float64
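
To see why dividing by the label frequency works, you can sum the normalised row weights per label without sampling at all. A small sketch on a made-up, unbalanced frame (the 600/300/100 split is an assumption chosen for the example):

```python
import pandas as pd

# Hypothetical frame with an unbalanced label column (600 / 300 / 100 rows).
df = pd.DataFrame({"label": [1] * 600 + [0] * 300 + [-1] * 100})

weights = {-1: 0.1, 0: 0.4, 1: 0.5}  # target probability per label

# Observed share of each label in the data.
freq = df["label"].value_counts(normalize=True)

# Per-row weight: target share divided by observed share.
scaled_weights = pd.Series(weights) / freq
row_weights = df["label"].map(scaled_weights)

# Summing the normalised row weights per label recovers the targets.
prob_per_label = (row_weights / row_weights.sum()).groupby(df["label"]).sum()
```

Each label contributes (number of rows) x (target / frequency), which is proportional to the target, so the row counts cancel out.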

Populate Pandas dataframe with random sample from another dataframe if condition is met, when columns to be assigned are not independent

Here's one approach:

(i) get the sample sizes from df2 with groupby + size.

(ii) use groupby + apply where we use a lambda function to sample items from df1 with the sample sizes obtained from (i) for each unique "B".

(iii) assign the sampled values to df2. Since "B" is not unique, df2 is first sorted by "B" so that its rows line up with the grouped samples, and is restored to its original order afterwards.

cols = ['C','D']
sample_sizes = df2.groupby('B')[cols].size()

df2 = df2.sort_values(by='B')
# Relabel the samples with df2's (sorted) index: assigning a DataFrame aligns
# on index labels, not on row position.
df2[cols] = (df1[df1['B'].isin(sample_sizes.index)]
             .groupby('B')[cols]
             .apply(lambda g: g.sample(sample_sizes[g.name], replace=True))
             .reset_index(drop=True)
             .set_axis(df2.index))
df2 = df2.sort_index()

One sample:

   A  B   C    D
0  5  1   5  0.6
1  5  2  10  0.7
2  6  1  12  0.6
3  6  2  11  0.5
4  6  3   4  0.1
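
If you prefer to avoid groupby + apply, the same three steps can be written as a plain loop. This is a sketch on invented df1 and df2 (their values are assumptions, not the asker's data), and it assumes every "B" in df2 also occurs in df1:

```python
import pandas as pd

# Hypothetical inputs: df1 is the pool to draw from, df2 receives the values.
df1 = pd.DataFrame({"B": [1, 1, 2, 2, 3],
                    "C": [5, 12, 10, 11, 4],
                    "D": [0.6, 0.6, 0.7, 0.5, 0.1]})
df2 = pd.DataFrame({"A": [5, 5, 6, 6, 6], "B": [1, 2, 1, 2, 3]})

cols = ["C", "D"]
parts = []
for key, rows in df2.groupby("B"):
    # Draw as many (C, D) pairs as df2 has rows for this value of B,
    # keeping C and D from the same df1 row.
    drawn = df1.loc[df1["B"] == key, cols].sample(len(rows), replace=True,
                                                  random_state=0)
    # Relabel with df2's row labels so the join below aligns by index.
    parts.append(drawn.set_axis(rows.index))

df2 = df2.join(pd.concat(parts))
```

Because each draw is relabelled with the matching df2 rows, no sorting of df2 is needed here.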

Randomly select rows from Pandas DataFrame based on multiple criteria

I'm not sure I understood the question correctly, but this answer may at least help others give you a better one.
If this is not what you are looking for, please let me know.

import pandas as pd

# Your dataframe
maindf = {'PM Owner': ['A', 'B', 'C', 'A', 'E', 'F'],
          'Risk Tier': [1, 3, 1, 1, 1, 2],
          'sam': ['A0', 'B0', 'C0', 'D0', 'E0', 'F0']}
Maindf = pd.DataFrame(data=maindf)

# The criteria you are looking for
filterdf = {'PM Owner': ['A'], 'Risk Tier': [1]}
Filterdf = pd.DataFrame(data=filterdf)

# Filtering: join the key columns into one string per row and keep the rows
# of Maindf whose string also occurs in Filterdf
keys = ['PM Owner', 'Risk Tier']
NewMaindf = Maindf[Maindf[keys].astype(str).sum(axis=1).isin(
    Filterdf[keys].astype(str).sum(axis=1))]

# Just one random sample
print(NewMaindf.sample())
# Whole dataset after filtering
print(NewMaindf)

Result (the sampled row varies from run to run):

  PM Owner  Risk Tier sam
3        A          1  D0
  PM Owner  Risk Tier sam
0        A          1  A0
3        A          1  D0
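
One caveat with gluing the columns into a string: different combinations can collide (for example 'A1' + '1' and 'A' + '11' both give 'A11'). A possible alternative, sketched here with renamed variables (`main_df`, `filter_df` are just illustrative names), is an inner merge on the key columns, which compares each column separately:

```python
import pandas as pd

main_df = pd.DataFrame({"PM Owner": ["A", "B", "C", "A", "E", "F"],
                        "Risk Tier": [1, 3, 1, 1, 1, 2],
                        "sam": ["A0", "B0", "C0", "D0", "E0", "F0"]})
filter_df = pd.DataFrame({"PM Owner": ["A"], "Risk Tier": [1]})

keys = ["PM Owner", "Risk Tier"]

# An inner merge keeps only rows whose key combination appears in filter_df.
# reset_index/set_index carries the original row labels through the merge.
matched = (main_df.reset_index()
                  .merge(filter_df.drop_duplicates(), on=keys, how="inner")
                  .set_index("index"))

# One random row among the matches.
one_row = matched.sample(n=1, random_state=0)
```

filter_df can hold several rows, one per allowed combination of criteria.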

How to randomly sample multiple consecutive rows of a dataframe in R?

df <- mtcars
df$row_nm <- seq(nrow(df))

set.seed(7)

sample_seq <- function(n, N) {
  i <- sample(seq(N), size = 1)
  
  ifelse(
    test = i + (seq(n) - 1) <= N,
    yes = i + (seq(n) - 1),
    no = i + (seq(n) - 1) - N
  )
}

replica <- replicate(n = 5, sample_seq(n = 10, N = nrow(df)))

# result
lapply(seq(ncol(replica)), function(x) df[replica[, x], ])
#> [[1]]
#>                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb row_nm
#> Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4     10
#> Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4     11
#> Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3     12
#> Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3     13
#> Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3     14
#> Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4     15
#> Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4     16
#> Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4     17
#> Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1     18
#> Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2     19
#> 
#> [[2]]
#>                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb row_nm
#> Honda Civic      30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2     19
#> Toyota Corolla   33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1     20
#> Toyota Corona    21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1     21
#> Dodge Challenger 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2     22
#> AMC Javelin      15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2     23
#> Camaro Z28       13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4     24
#> Pontiac Firebird 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2     25
#> Fiat X1-9        27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1     26
#> Porsche 914-2    26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2     27
#> Lotus Europa     30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2     28
#> 
#> [[3]]
#>                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb row_nm
#> Maserati Bora     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8     31
#> Volvo 142E        21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2     32
#> Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4      1
#> Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4      2
#> Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1      3
#> Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1      4
#> Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2      5
#> Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1      6
#> Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4      7
#> Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2      8
#> 
#> [[4]]
#>                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb row_nm
#> Lotus Europa      30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2     28
#> Ford Pantera L    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4     29
#> Ferrari Dino      19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6     30
#> Maserati Bora     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8     31
#> Volvo 142E        21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2     32
#> Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4      1
#> Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4      2
#> Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1      3
#> Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1      4
#> Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2      5
#> 
#> [[5]]
#>                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb row_nm
#> Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4      7
#> Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2      8
#> Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2      9
#> Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4     10
#> Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4     11
#> Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3     12
#> Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3     13
#> Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3     14
#> Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4     15
#> Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4     16

Created on 2022-01-24 by the reprex package (v2.0.1)

Randomly sample rows based on year-month

Use DataFrame.groupby on the year and month (or on monthly periods) and call DataFrame.sample inside a custom lambda function:

df1 = (df.groupby([df['daate'].dt.year, df['daate'].dt.month], group_keys=False)
         .apply(lambda x: x.sample(n=10)))

Or:

df1 = (df.groupby(df['daate'].dt.to_period('M'), group_keys=False)
         .apply(lambda x: x.sample(n=10)))

Sample:

import numpy as np
import pandas as pd

data = {'daate': pd.date_range('2019-01-01', '2020-01-22'),
        'tweets': np.random.choice(["aaa", "bbb", "ccc", "ddd"], 387)}

df = pd.DataFrame(data)


df1 = (df.groupby([df['daate'].dt.year, df['daate'].dt.month], group_keys=False)
         .apply(lambda x: x.sample(n=10)))
print (df1)
          date tweets      daate
9   2019-01-10    bbb 2019-01-10
29  2019-01-30    ddd 2019-01-30
17  2019-01-18    ccc 2019-01-18
12  2019-01-13    ccc 2019-01-13
20  2019-01-21    ddd 2019-01-21
..         ...    ...        ...
381 2020-01-17    bbb 2020-01-17
375 2020-01-11    aaa 2020-01-11
373 2020-01-09    bbb 2020-01-09
368 2020-01-04    aaa 2020-01-04
382 2020-01-18    bbb 2020-01-18

[130 rows x 3 columns]
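
On Pandas 1.1 or later, the lambda can probably be dropped, since grouped objects have their own sample method. A minimal sketch on the same kind of made-up data (the seed and the generator are assumptions added for repeatability):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2019-01-01", "2020-01-22")
rng = np.random.default_rng(0)
df = pd.DataFrame({"daate": dates,
                   "tweets": rng.choice(["aaa", "bbb", "ccc", "ddd"], len(dates))})

# GroupBy.sample draws n rows from every group and keeps the original index.
df1 = df.groupby(df["daate"].dt.to_period("M")).sample(n=10, random_state=0)
```

Note that n=10 raises an error if any month has fewer than 10 rows, unless replace=True is passed.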