R: Shuffle dataframe columnwise
You might want to just sample the column-names. Something like:
names(df) <- names(df)[sample(ncol(df))]
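For readers working in pandas, here is a rough sketch of the same idea. It also shows the difference between shuffling only the labels (what the R line above does) and reordering the columns together with their data. The toy frame, the seed, and the names `relabelled` and `reordered` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# A random ordering of the existing column labels
perm = rng.permutation(df.columns.to_numpy())

# Variant 1: reassign the labels; the data stays where it is, the names move
relabelled = df.copy()
relabelled.columns = perm

# Variant 2: reorder the columns; every label stays attached to its own data
reordered = df[list(perm)]
```

Variant 1 changes which name sits on which values, while variant 2 only changes the left-to-right order.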
How to shuffle a dataframe column wise, but independent of rows?
Something like:
t(apply(df1, 1, function(x) { sample(x, length(x)) } ))
This will give you the result in matrix form. If you have factors, a mix of numeric and characters etc, be aware that this will coerce everything to character.
reshuffle the sequence of rows in data frame
If you want to shuffle the rows while keeping each row intact, you can just sample the row indices.
df <- data.frame(x=1:8, y=1:8, z=1:8)
df[sample(1:nrow(df)),]
which will produce
  x y z
2 2 2 2
3 3 3 3
4 4 4 4
6 6 6 6
5 5 5 5
8 8 8 8
7 7 7 7
1 1 1 1
If the values should instead be shuffled independently within each column, then you can do something like
lapply(df, function(x) { sample(x)})
which results in
$x
[1] 3 1 4 6 5 2 8 7
$y
[1] 2 5 6 3 4 8 7 1
$z
[1] 6 1 8 3 2 7 4 5
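A pandas counterpart of the per-column shuffle might look like the sketch below. It assumes the same 8-row toy frame. The seed and the name `shuffled` are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seed is an arbitrary choice
df = pd.DataFrame({"x": range(1, 9), "y": range(1, 9), "z": range(1, 9)})

# Permute every column on its own. Working on plain arrays (to_numpy)
# avoids pandas realigning the values to their original row labels.
shuffled = pd.DataFrame(
    {name: rng.permutation(col.to_numpy()) for name, col in df.items()},
    index=df.index,
)
```

Unlike the `lapply` call, this returns a data frame rather than a list of vectors.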
How to permute a dataframe columnwise with paired columns in R?
We call sample on the sequence of rows, and use the result as a row index to modify the values of 'C2' and 'C3'
i1 <- sample(seq_len(nrow(df1)))
df1[c("C2", "C3")] <- df1[i1, c("C2", "C3")]
-output
df1
# C1 C2 C3 C4
#R1 a 0 27 8
#R2 b 1 15 5
#R3 c 1 39 2
#R4 d 0 30 1
#R5 e 1 10 4
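For comparison, a minimal pandas sketch of the same paired shuffle, assuming a frame with the same columns as df1. The seed and the name `shuffled` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {
        "C1": ["a", "b", "c", "d", "e"],
        "C2": [1, 0, 1, 0, 1],
        "C3": [15, 30, 10, 27, 39],
        "C4": [8, 5, 2, 1, 4],
    },
    index=["R1", "R2", "R3", "R4", "R5"],
)

rng = np.random.default_rng(1)   # arbitrary seed
i1 = rng.permutation(len(df1))   # one row order shared by both columns

shuffled = df1.copy()
# to_numpy() strips the row labels; assigning a labelled frame instead
# would be realigned on the index and leave the values where they were.
shuffled[["C2", "C3"]] = df1[["C2", "C3"]].to_numpy()[i1]
```

Because both columns are indexed with the same `i1`, each (C2, C3) pair moves as a unit.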
data
df1 <- structure(list(C1 = c("a", "b", "c", "d", "e"), C2 = c(1L, 0L,
1L, 0L, 1L), C3 = c(15L, 30L, 10L, 27L, 39L), C4 = c(8L, 5L,
2L, 1L, 4L)), class = "data.frame", row.names = c("R1", "R2",
"R3", "R4", "R5"))
Randomly change the order of rows in a data frame
We can use sample
df$name[sample(nrow(df))]
Shuffle DataFrame rows
The idiomatic way to do this with Pandas is to use the .sample method of your data frame to sample all rows without replacement:
df.sample(frac=1)
The frac keyword argument specifies the fraction of rows to return in the random sample, so frac=1 means to return all rows (in random order).
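A small sketch of how frac behaves, assuming a toy 8-row frame. The random_state argument is an addition here, used only to make the result repeatable.

```python
import pandas as pd

df = pd.DataFrame({"x": range(8), "y": list("abcdefgh")})

# frac=1: every row exactly once, in random order
shuffled = df.sample(frac=1, random_state=0)

# frac=0.5: a random half of the rows (4 of 8), still without replacement
half = df.sample(frac=0.5, random_state=0)
```

The original index labels travel with the rows, so `shuffled.sort_index()` restores the starting order.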
Note:
If you wish to shuffle your dataframe in-place and reset the index, you could do e.g.
df = df.sample(frac=1).reset_index(drop=True)
Here, specifying drop=True prevents .reset_index from creating a column containing the old index entries.
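To make the effect of drop=True concrete, here is a short sketch on an assumed 4-row frame. The names `with_old_index` and `clean` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30, 40]})
shuffled = df.sample(frac=1, random_state=3)

# Default: the old labels are kept as a new column named "index"
with_old_index = shuffled.reset_index()

# drop=True: the old labels are discarded, leaving a fresh 0..n-1 index
clean = shuffled.reset_index(drop=True)
```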
Follow-up note: Strictly speaking, the above operation is not in-place: the name is rebound to a new object, so id(df_old) is not the same as id(df_new). In practice the memory footprint after the reassignment stays roughly the same, presumably because the old frame can be freed once nothing refers to it any more. A simple memory profiler run shows almost no net increase on the shuffling line:
$ python3 -m memory_profiler .\test.py
Filename: .\test.py
Line #    Mem usage    Increment   Line Contents
================================================
     5     68.5 MiB     68.5 MiB   @profile
     6                             def shuffle():
     7    847.8 MiB    779.3 MiB       df = pd.DataFrame(np.random.randn(100, 1000000))
     8    847.9 MiB      0.1 MiB       df = df.sample(frac=1).reset_index(drop=True)
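A line-by-line memory profile reports usage between lines, so it may not reveal temporary allocations made while a line runs. One way to probe more directly whether the shuffled frame reuses the original buffer is to compare the underlying arrays. This is only a sketch on a small assumed frame, and no particular output is claimed here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df_old = pd.DataFrame(rng.standard_normal((1000, 10)))
df_new = df_old.sample(frac=1, random_state=0).reset_index(drop=True)

# Are the two names bound to the same Python object?
same_object = df_new is df_old

# Do the underlying NumPy buffers overlap?
shares = np.shares_memory(df_old.to_numpy(), df_new.to_numpy())

print(same_object, shares)
```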
Randomizing/Shuffling rows in a dataframe in pandas
Edit: I misunderstood the question, which was just to shuffle rows and not all the table (right?)
I think using dataframes does not make much sense here, because the column names become meaningless. So you can just use 2D numpy arrays:
In [1]: A
Out[1]:
array([[11, 'Blue', 'Mon'],
       [8, 'Red', 'Tues'],
       [10, 'Green', 'Wed'],
       [15, 'Yellow', 'Thurs'],
       [11, 'Black', 'Fri']], dtype=object)
In [2]: _ = [np.random.shuffle(i) for i in A] # shuffle in-place, so return None
In [3]: A
Out[3]:
array([['Mon', 11, 'Blue'],
       [8, 'Tues', 'Red'],
       ['Wed', 10, 'Green'],
       ['Thurs', 15, 'Yellow'],
       [11, 'Black', 'Fri']], dtype=object)
And if you want to keep the dataframe:
In [4]: pd.DataFrame(A, columns=data.columns)
Out[4]:
   Number  color     day
0     Mon     11    Blue
1       8   Tues     Red
2     Wed     10   Green
3   Thurs     15  Yellow
4      11  Black     Fri
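The same row-wise shuffle can also be written with NumPy's Generator API and an explicit loop, which avoids building a throwaway list and makes the result reproducible. This is a sketch: the frame below is rebuilt from the values shown in the example, and the seed is arbitrary.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    {
        "Number": [11, 8, 10, 15, 11],
        "color": ["Blue", "Red", "Green", "Yellow", "Black"],
        "day": ["Mon", "Tues", "Wed", "Thurs", "Fri"],
    }
)

rng = np.random.default_rng(0)

# Work on an explicit object-dtype copy so the original frame is untouched
A = data.to_numpy(dtype=object, copy=True)
for row in A:          # each row is a view into A
    rng.shuffle(row)   # shuffles that row in place

shuffled = pd.DataFrame(A, columns=data.columns)
```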
Here is a function to shuffle rows and columns:
import numpy as np
import pandas as pd

def shuffle(df):
    col = df.columns
    val = df.values
    shape = val.shape
    val_flat = val.flatten()      # flatten() returns a copy, so df is not modified
    np.random.shuffle(val_flat)   # shuffle all cells in place, across rows and columns
    return pd.DataFrame(val_flat.reshape(shape), columns=col)
In [2]: data
Out[2]:
   Number   color    day
0      11    Blue    Mon
1       8     Red   Tues
2      10   Green    Wed
3      15  Yellow  Thurs
4      11   Black    Fri
In [3]: shuffle(data)
Out[3]:
   Number  color     day
0     Fri    Wed  Yellow
1   Thurs  Black     Red
2   Green   Blue      11
3      11      8      10
4     Mon   Tues      15
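A seeded variant of the same whole-table shuffle can be handy when the result needs to be repeatable. The sketch below is an assumption-laden rewrite, not the function above: the name `shuffle_all` and its `seed` parameter are hypothetical, and it uses NumPy's Generator API instead of np.random.shuffle.

```python
import numpy as np
import pandas as pd

def shuffle_all(df, seed=None):
    """Return a new frame with all cells shuffled across rows and columns."""
    rng = np.random.default_rng(seed)
    flat = df.to_numpy(dtype=object).ravel()   # 1-D array of every cell
    mixed = rng.permutation(flat)              # shuffled copy; flat is untouched
    return pd.DataFrame(mixed.reshape(df.shape), columns=df.columns, index=df.index)

data = pd.DataFrame(
    {
        "Number": [11, 8, 10, 15, 11],
        "color": ["Blue", "Red", "Green", "Yellow", "Black"],
        "day": ["Mon", "Tues", "Wed", "Thurs", "Fri"],
    }
)
result = shuffle_all(data, seed=0)
```

Calling `shuffle_all(data, seed=0)` twice gives the same arrangement, while `seed=None` draws fresh randomness each time.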
Hope this helps