Shuffle Dataframe Rows

The idiomatic way to do this in pandas is to call the DataFrame's .sample method and sample all rows without replacement:

df.sample(frac=1)

The frac keyword argument specifies the fraction of rows to return in the random sample, so frac=1 means to return all rows (in random order).
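As a small illustration of the point above, here is a sketch showing that frac=1 returns every row exactly once. The toy DataFrame and the random_state seed are assumptions added for the example; random_state is an optional argument of .sample that makes the order repeatable.

```python
import pandas as pd

# Toy data, made up for this example
df = pd.DataFrame({"id": [1, 2, 3, 4, 5], "name": list("abcde")})

# frac=1 returns every row exactly once; random_state makes the order repeatable
shuffled = df.sample(frac=1, random_state=0)
again = df.sample(frac=1, random_state=0)

print(shuffled)
print(shuffled.equals(again))  # same seed, same order
```

Note that the original index labels travel with the rows, which is why resetting the index is often the next step.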

Note:
If you wish to replace your DataFrame with its shuffled version and reset the index, you could do, e.g.:

df = df.sample(frac=1).reset_index(drop=True)

Here, specifying drop=True prevents .reset_index from creating a column containing the old index entries.
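To make the effect of drop=True concrete, here is a minimal sketch comparing the two behaviours. The three-row DataFrame and the seed are placeholders chosen for illustration.

```python
import pandas as pd

# Toy data, made up for this example
df = pd.DataFrame({"name": ["bob", "mike", "ana"]})
shuffled = df.sample(frac=1, random_state=1)

kept = shuffled.reset_index()              # old labels move into an "index" column
dropped = shuffled.reset_index(drop=True)  # old labels are discarded

print(kept.columns.tolist())
print(dropped.columns.tolist())
```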

Follow-up note: The above operation is not truly in-place: .sample returns a new object, so id(df_old) is not the same as id(df_new). However, because df is rebound to the result, the original DataFrame can be freed as soon as the statement finishes, so memory usage after the line is essentially unchanged (a temporary copy may still exist while the statement runs). A line-by-line memory profile illustrates this:

$ python3 -m memory_profiler .\test.py
Filename: .\test.py

Line #    Mem usage    Increment   Line Contents
================================================
     5     68.5 MiB     68.5 MiB   @profile
     6                             def shuffle():
     7    847.8 MiB    779.3 MiB       df = pd.DataFrame(np.random.randn(100, 1000000))
     8    847.9 MiB      0.1 MiB       df = df.sample(frac=1).reset_index(drop=True)

Trying to shuffle rows in a pandas DataFrame

Something like this should work: write a function that just returns the shuffled df, then use pd.concat on a list of its results.

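import pandas as pd

sales_to_do = pd.DataFrame({'id': [1, 2], 'name': ['bob', 'mike']})

def randomize(df):
    # return all rows of df in random order
    return df.sample(frac=1)

# stack 15 independently shuffled copies on top of each other
df_shuffled = pd.concat([randomize(sales_to_do) for _ in range(15)])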

df_shuffled.to_excel(r'C:\Users\Alex\Desktop\Output1.xlsx', index=False, header=True)

Shuffle rows in dataframe by specific column value

IIUC, you can shuffle the even indices and then add the odd index that follows each one, using numpy (this assumes an even number of rows):

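import numpy as np

# positions of the first row of each pair: 0, 2, 4, ...
order = np.arange(0, len(df), 2)
np.random.shuffle(order)
# interleave each even position with its successor: [a, a+1, b, b+1, ...]
order = np.vstack([order, order+1]).ravel('F')

df2 = df.iloc[order]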

example output:

    Video      Frames  Feature1  Feature2  Label
2       0  frame2.jpg  feature1  feature2      0
3       0  frame3.jpg  feature1  feature2      0
0       0  frame0.jpg  feature1  feature2      0
1       0  frame1.jpg  feature1  feature2      0
6       1  frame2.jpg  feature1  feature2      1
7       1  frame3.jpg  feature1  feature2      1
8       2  frame0.jpg  feature1  feature2      0
9       2  frame1.jpg  feature1  feature2      0
10      2  frame2.jpg  feature1  feature2      0
11      2  frame3.jpg  feature1  feature2      0
4       1  frame0.jpg  feature1  feature2      1
5       1  frame1.jpg  feature1  feature2      1
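The pair-wise trick above can be generalised to blocks of any fixed size. The following is only a sketch under assumptions: shuffle_blocks, block_size and seed are hypothetical names introduced here, and the input is assumed to be already ordered so that each block occupies consecutive rows.

```python
import numpy as np
import pandas as pd

def shuffle_blocks(df, block_size, seed=None):
    """Shuffle consecutive blocks of `block_size` rows, keeping rows inside a block in order."""
    if len(df) % block_size != 0:
        raise ValueError("len(df) must be a multiple of block_size")
    rng = np.random.default_rng(seed)
    # Shuffle the starting position of every block
    starts = rng.permutation(np.arange(0, len(df), block_size))
    # For each shuffled start s, take positions s, s+1, ..., s+block_size-1
    order = (starts[:, None] + np.arange(block_size)).ravel()
    return df.iloc[order]

# Toy data, made up for this example
frames = pd.DataFrame({"Frames": [f"frame{i}" for i in range(12)]})
out = shuffle_blocks(frames, block_size=2, seed=0)
print(out)
```

With block_size=2 this behaves like the even/odd approach; a larger block_size keeps longer runs of rows together.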

Shuffle rows in a dataframe based on a condition using R

You could try this:

library(purrr)
library(tidyr)
library(dplyr)

df %>% 
  split(f = as.factor(.$ClassNr)) %>% 
  map_dfr(~sample(.x$Name)) %>% 
  pivot_longer(everything(),
               names_to = "ClassNr",
               values_to = "Name")

returning (for example)

# A tibble: 6 x 2
  ClassNr Name
  <chr>   <chr>
1 1       Ana  
2 2       Ella 
3 3       Sarah
4 1       Maria
5 2       Hanne
6 3       Liam

We first split the data into groups based on ClassNr; that's the split part. This gives us a list with three elements (one for every class).
Next we sample the names within each group, which shuffles each group independently, and bind the results together as a data frame.
Finally we bring this data frame into long format.

Note: This approach will most likely fail if there are different numbers of names in each class.
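For comparison, here is a pandas sketch of the same within-group shuffle that does not depend on the groups having equal sizes. It is not a translation of the R code above, just one possible approach; the column names follow the example output, while the grouping of names into classes and the seed are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with unequal group sizes
df = pd.DataFrame({
    "ClassNr": [1, 1, 1, 2, 2, 3],
    "Name": ["Ana", "Maria", "Liam", "Ella", "Hanne", "Sarah"],
})

rng = np.random.default_rng(0)
shuffled = df.copy()

# groups maps each ClassNr to the index labels of its rows
for _, labels in df.groupby("ClassNr").groups.items():
    # Permute the names of this class only and write them back to the same rows
    shuffled.loc[labels, "Name"] = rng.permutation(df.loc[labels, "Name"].to_numpy())

print(shuffled)
```

Each name stays in its own class; only the order inside the class changes.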

How to shuffle a pandas dataframe randomly by row

You can achieve this by calling the sample method with axis=1.
This shuffles the order of the columns, so the elements of every row are rearranged in the same way:

df = df.sample(frac=1, axis=1).reset_index(drop=True)

However, your desired DataFrame looks completely randomised, which can be done by shuffling the columns and then the rows:

df = df.sample(frac=1, axis=1).sample(frac=1).reset_index(drop=True)

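Edit: to shuffle the values within each row independently:

import numpy as np
# result_type='broadcast' keeps the original shape and column labels;
# without it, recent pandas versions return a Series of arrays
df = df.apply(np.random.permutation, axis=1, result_type='broadcast')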
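An alternative to apply for shuffling each row independently is NumPy's Generator.permuted (available in NumPy 1.20 and later). This is a sketch that assumes all columns share one dtype; the toy data and the seed are placeholders.

```python
import numpy as np
import pandas as pd

# Toy numeric data, made up for this example
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("abcd"))

rng = np.random.default_rng(0)

# permuted(..., axis=1) shuffles along the column axis, separately for each row
shuffled = pd.DataFrame(
    rng.permuted(df.to_numpy(), axis=1),
    index=df.index,
    columns=df.columns,
)
print(shuffled)
```

Every row keeps its own values, but each row gets its own random order, unlike sample(axis=1), which reorders whole columns.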

Shuffle rows of a large csv

Because you read in your data using pandas, you can also do the randomisation in a different way using DataFrame.sample:

df = pd.read_csv('sentiment_train.csv', header=0, delimiter=",", usecols=[0, 5])
df.columns=['target', 'text']
df1 = df.sample(n=100000)

If this fails, it may help to check how many unique values there are and how frequently each appears. If the first 1,599,999 targets are 0 and only the last one is 4, chances are that your sample will not contain any 4.
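The frequency check described above, and one possible remedy, could look like the sketch below. The small imbalanced DataFrame is a hypothetical stand-in for the real CSV, and sampling per group is a swapped-in technique rather than part of the original answer; it relies on groupby(...).sample, which requires pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical, heavily imbalanced stand-in for the real CSV
df = pd.DataFrame({
    "target": [0] * 95 + [4] * 5,
    "text": [f"tweet {i}" for i in range(100)],
})

# How often does each target value occur?
print(df["target"].value_counts())

# Draw the same number of rows from each target value
balanced = df.groupby("target").sample(n=5, random_state=0)
# Shuffle again so the classes are not grouped together
balanced = balanced.sample(frac=1, random_state=0).reset_index(drop=True)
print(balanced["target"].value_counts())
```

Sampling per group guarantees that rare values appear in the result, at the cost of no longer reflecting the original class proportions.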

Pandas - How do you randomize the rows of a dataframe

You can shuffle the index if it is a number:

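import numpy as np
import pandas as pd

df = pd.DataFrame(['A','B','C','D','E','F','G','H','I','j'],columns = ['Data'])

arr = np.arange(len(df))
out = np.random.permutation(arr) # random shuffle

# .ix was removed in pandas 1.0; use positional indexing instead
df.iloc[out]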
