How to Randomize (Or Permute) a Dataframe Rowwise and Columnwise

R: Shuffle dataframe columnwise

You might want to just sample the column names. Note that this relabels the columns rather than moving the data. Something like:

names(df) <- names(df)[sample(ncol(df))]
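For readers working in pandas rather than R, here is a rough sketch of the same distinction: shuffling only the labels versus reordering whole columns. The frame `df`, its contents and the seed are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30], "z": [100, 200, 300]})

# Option 1: shuffle only the labels. The data stays where it is,
# so a column may end up under a different name.
relabeled = df.copy()
relabeled.columns = rng.permutation(df.columns.to_numpy())

# Option 2: reorder the columns. Every name keeps its own data.
reordered = df[rng.permutation(df.columns.to_numpy())]

print(relabeled)
print(reordered)
```

Option 1 corresponds to the `names(df) <- ...` line above; Option 2 is what you want if each name should stay attached to its values.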

How to shuffle a dataframe column wise, but independent of rows?

To shuffle the values within each row independently, something like:

t(apply(df1, 1, function(x) { sample(x, length(x)) } ))

This will give you the result in matrix form. If you have factors, or a mix of numeric and character columns, be aware that this will coerce everything to character.
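For comparison, a minimal NumPy sketch of the same per-row shuffle. It assumes NumPy 1.20 or newer (for `Generator.permuted`) and an all-numeric array, in which case no type coercion takes place. The array `values` is invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed for repeatability
values = np.arange(12).reshape(3, 4)    # 3 rows, 4 columns of integers

# permuted(..., axis=1) shuffles every row independently of the others
# and returns a new array; the input is left untouched.
shuffled = rng.permuted(values, axis=1)

print(shuffled)
```

Each row of `shuffled` holds the same values as the matching row of `values`, just in a different order.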

reshuffle the sequence of rows in data frame

If you want to shuffle the order of the rows but keep the values within each row together, you can just sample the row indices.

df <- data.frame(x=1:8, y=1:8, z=1:8)
df[sample(1:nrow(df)),]

which will produce

  x y z
2 2 2 2
3 3 3 3
4 4 4 4
6 6 6 6
5 5 5 5
8 8 8 8
7 7 7 7
1 1 1 1

If the values should instead be shuffled independently within each column, you can do something like the following (note that lapply returns a list, not a data frame)

lapply(df, function(x) { sample(x)})

which results in

$x
[1] 3 1 4 6 5 2 8 7

$y
[1] 2 5 6 3 4 8 7 1

$z
[1] 6 1 8 3 2 7 4 5
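A pandas counterpart of this per-column shuffle might look like the sketch below. It rebuilds the frame from independently permuted columns; the frame mirrors the R example and the seed is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed for repeatability
df = pd.DataFrame({"x": range(1, 9), "y": range(1, 9), "z": range(1, 9)})

# Permute each column on its own, so values in one row no longer
# belong together afterwards.
shuffled = pd.DataFrame(
    {name: rng.permutation(df[name].to_numpy()) for name in df.columns},
    index=df.index,
)

print(shuffled)
```

Unlike the `lapply` result above, this stays a data frame, with the original index kept.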

How to permute a dataframe columnwise with paired columns in R?

We sample the sequence of row numbers and use the result as a row index to overwrite the values of 'C2' and 'C3' together, so each (C2, C3) pair stays intact:

i1 <- sample(seq_len(nrow(df1)))
df1[c("C2", "C3")] <- df1[i1, c("C2", "C3")]

Output:

df1
#    C1 C2 C3 C4
# R1  a  0 27  8
# R2  b  1 15  5
# R3  c  1 39  2
# R4  d  0 30  1
# R5  e  1 10  4
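The same paired shuffle can be sketched in pandas. This is only an illustration: the frame reuses the values from this answer's data, and the seed is arbitrary. The permuted block is passed through `to_numpy()` so that label alignment on assignment cannot put the rows back in their original order.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {
        "C1": list("abcde"),
        "C2": [1, 0, 1, 0, 1],
        "C3": [15, 30, 10, 27, 39],
        "C4": [8, 5, 2, 1, 4],
    },
    index=["R1", "R2", "R3", "R4", "R5"],
)

rng = np.random.default_rng(0)       # arbitrary seed for repeatability
perm = rng.permutation(len(df1))     # one permutation shared by both columns
paired = ["C2", "C3"]

shuffled = df1.copy()
# Plain array on the right-hand side: values are assigned by position.
shuffled[paired] = df1[paired].to_numpy()[perm]

print(shuffled)
```

Because a single permutation is applied to both columns, every (C2, C3) pair from the input still appears in the output, while C1 and C4 are unchanged.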

Data:

df1 <- structure(list(C1 = c("a", "b", "c", "d", "e"), C2 = c(1L, 0L, 
1L, 0L, 1L), C3 = c(15L, 30L, 10L, 27L, 39L), C4 = c(8L, 5L,
2L, 1L, 4L)), class = "data.frame", row.names = c("R1", "R2",
"R3", "R4", "R5"))

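Randomly change the order of rows in a data frame

We can use sample. The line below returns the values of the 'name' column in random order; to reorder whole rows, index the data frame itself, as in df[sample(nrow(df)), ].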

df$name[sample(nrow(df))]

Shuffle DataFrame rows

The idiomatic way to do this with Pandas is to use the .sample method of your data frame to sample all rows without replacement:

df.sample(frac=1)

The frac keyword argument specifies the fraction of rows to return in the random sample, so frac=1 means to return all rows (in random order).
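A small sketch of how `frac` behaves, using a made-up frame. The `random_state` argument is optional and only added here so the result is repeatable.

```python
import pandas as pd

df = pd.DataFrame({"x": range(8), "y": list("abcdefgh")})

# frac=1: every row, in random order. Index labels travel with their rows.
shuffled = df.sample(frac=1, random_state=42)

# frac=0.5: a random half of the rows, again without replacement.
half = df.sample(frac=0.5, random_state=42)

print(shuffled)
print(len(half))
```

The original `df` is not modified by either call; `sample` returns a new frame.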


Note:
If you wish to shuffle your dataframe, reset the index, and assign the result back to the same name, you could do e.g.

df = df.sample(frac=1).reset_index(drop=True)

Here, specifying drop=True prevents .reset_index from creating a column containing the old index entries.

Follow-up note: This is not a true in-place operation. df.sample returns a new object and the name df is rebound to it, so id(df_old) is not the same as id(df_new). In the memory profile below, however, memory usage barely changes after the shuffle line. That is consistent with the old frame being released once df is rebound, although a per-line profile like this may not capture a temporary peak while both frames exist:

$ python3 -m memory_profiler .\test.py
Filename: .\test.py

Line #    Mem usage    Increment   Line Contents
================================================
     5     68.5 MiB     68.5 MiB   @profile
     6                             def shuffle():
     7    847.8 MiB    779.3 MiB       df = pd.DataFrame(np.random.randn(100, 1000000))
     8    847.9 MiB      0.1 MiB       df = df.sample(frac=1).reset_index(drop=True)
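Memory aside, a quick way to see what the reassignment does to the objects involved is to keep a second reference to the original frame. This is a small sketch with invented data; the name `original` is only there for the demonstration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(10, 2), columns=["a", "b"])
original = df  # second reference to the same object

# Rebind the name df to the shuffled result.
df = df.sample(frac=1, random_state=1).reset_index(drop=True)

print(df is original)                          # False: df now names a new object
print(original["a"].is_monotonic_increasing)   # True: the original rows were not reordered
```

As long as another reference such as `original` exists, the unshuffled frame stays alive; once no reference remains, Python is free to reclaim it.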

Randomizing/Shuffling rows in a dataframe in pandas

Edit: I misunderstood the question, which was just to shuffle rows and not all the table (right?)

I think using dataframes does not make much sense here, because the column names become meaningless once values move between columns. So you can just use 2D numpy arrays:

In [1]: A
Out[1]:
array([[11, 'Blue', 'Mon'],
       [8, 'Red', 'Tues'],
       [10, 'Green', 'Wed'],
       [15, 'Yellow', 'Thurs'],
       [11, 'Black', 'Fri']], dtype=object)

In [2]: _ = [np.random.shuffle(i) for i in A] # shuffles each row in place; shuffle itself returns None

In [3]: A
Out[3]:
array([['Mon', 11, 'Blue'],
       [8, 'Tues', 'Red'],
       ['Wed', 10, 'Green'],
       ['Thurs', 15, 'Yellow'],
       [11, 'Black', 'Fri']], dtype=object)

And if you want to keep the dataframe:

In [4]: pd.DataFrame(A, columns=data.columns)
Out[4]:
   Number  color     day
0     Mon     11    Blue
1       8   Tues     Red
2     Wed     10   Green
3   Thurs     15  Yellow
4      11  Black     Fri

Here is a function that shuffles all values across both rows and columns:

import numpy as np
import pandas as pd

def shuffle(df):
    col = df.columns
    val = df.values
    shape = val.shape
    val_flat = val.flatten()      # flatten() returns a 1D copy
    np.random.shuffle(val_flat)   # shuffle every cell, in place
    return pd.DataFrame(val_flat.reshape(shape), columns=col)

In [2]: data
Out[2]:
   Number   color    day
0      11    Blue    Mon
1       8     Red   Tues
2      10   Green    Wed
3      15  Yellow  Thurs
4      11   Black    Fri

In [3]: shuffle(data)
Out[3]:
   Number  color     day
0     Fri    Wed  Yellow
1   Thurs  Black     Red
2   Green   Blue      11
3      11      8      10
4     Mon   Tues      15
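If you need the full-table shuffle to be repeatable, one possible variant takes a seed and uses NumPy's `Generator` API instead of the global random state. The name `shuffle_cells` and the `seed` parameter are made up for this sketch and are not part of the answer above.

```python
import numpy as np
import pandas as pd

def shuffle_cells(df, seed=None):
    """Return a new frame with all cell values shuffled across rows and columns."""
    rng = np.random.default_rng(seed)
    flat = df.to_numpy().ravel()       # 1D sequence of every cell
    mixed = rng.permutation(flat)      # shuffled copy; the input is left untouched
    return pd.DataFrame(mixed.reshape(df.shape), columns=df.columns, index=df.index)

data = pd.DataFrame(
    {
        "Number": [11, 8, 10, 15, 11],
        "color": ["Blue", "Red", "Green", "Yellow", "Black"],
        "day": ["Mon", "Tues", "Wed", "Thurs", "Fri"],
    }
)

result = shuffle_cells(data, seed=0)
print(result)
```

With mixed numbers and strings, every column of the result ends up with object dtype, which is another reason the column names stop being meaningful after this kind of shuffle.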

Hope this helps