How do I split a data frame among columns, say at every nth column?
To split Mydata every 4 columns, we can explicitly use split.default:
split.default(Mydata, rep(1:3, each = 4))
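The "default" method can split a data frame by columns, because it treats the data frame as a list of columns (split.data.frame, which a plain split() call would dispatch to, splits rows instead). Just set the grouping variable to suit your needs.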
For balanced grouping, gl is handy (see ?gl). We can use gl(3, 4) instead of rep(1:3, each = 4) in the above. Because gl returns a factor directly, this avoids the type conversion from "integer" to "factor" that split would otherwise perform.
In general, use gl(ncol(Mydata) / n, n) for "every n columns" (n must divide ncol(Mydata)).
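The grouping rule is just integer arithmetic on column positions. Here is a minimal plain-Python sketch of the same idea, assuming a dict of column name -> values stands in for the data frame; gl_labels and group_columns are hypothetical helpers that mimic gl(ncol / n, n) followed by a split on those labels.

```python
def gl_labels(n_groups, n):
    """Mimic R's gl(n_groups, n): labels 1..n_groups, each repeated n times."""
    return [g for g in range(1, n_groups + 1) for _ in range(n)]


def group_columns(columns, n):
    """Split a dict of column name -> values into consecutive groups of n columns.

    Like gl(ncol / n, n), this requires n to divide the number of columns.
    Dicts keep insertion order (Python 3.7+), so column order is preserved.
    """
    names = list(columns)
    if len(names) % n != 0:
        raise ValueError("n must divide the number of columns")
    groups = {}
    for label, name in zip(gl_labels(len(names) // n, n), names):
        groups.setdefault(label, {})[name] = columns[name]
    return groups


# Hypothetical 12-column table standing in for Mydata (3 values per column).
mydata = {f"V{j}": [j, j + 12, j + 24] for j in range(1, 13)}
parts = group_columns(mydata, 4)
for label, part in parts.items():
    print(label, list(part))
```

Raising an error when n does not divide the column count is a design choice of this sketch; it makes the divisibility requirement explicit instead of silently producing uneven groups.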
Subset dataframe by every n column (R)
Here's a simple way. Say you have a data frame df with 400 columns and you want a list of data frames, each 4 columns wide. First create a vector holding the index of the first df column of each subset, then use lapply to generate the list of subset data frames:
col1 <- seq(1, 397, 4)  # first column of each 4-column block: 1, 5, ..., 397
listdfs <- lapply(col1, function(x) df[ , x:(x+3)])  # one data frame per block
Let me know if it works.
Test case:
m <- matrix(1:120, ncol = 12)
df <- as.data.frame(m)
col1 <- seq(1, 10, 3)
listdfs <- lapply(col1, function(x) df[ , x:(x+2)])
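The same "start index, then slice" approach, sketched in Python under the assumption that a dict of column name -> values stands in for the data frame; split_every is a hypothetical helper, and the data mirrors the R test case above.

```python
def split_every(columns, width):
    """Return a list of sub-tables, each holding `width` consecutive columns.

    Same idea as the R answer: compute the first column of each subset
    (seq(1, ncol - width + 1, width) in R, 0-based here) and slice from it.
    Assumption: leftover columns that cannot fill a whole group are dropped.
    """
    names = list(columns)
    starts = range(0, len(names) - width + 1, width)
    return [{name: columns[name] for name in names[s:s + width]} for s in starts]


# Python stand-in for the R test case: as.data.frame(matrix(1:120, ncol = 12))
# has 10 rows; column Vj holds (j-1)*10 + 1 .. j*10.
df = {f"V{j}": list(range((j - 1) * 10 + 1, j * 10 + 1)) for j in range(1, 13)}
listdfs = split_every(df, 3)
print([list(part) for part in listdfs])
```

Computing the end point from the column count avoids hard-coding values like 397 or 10, which is the main thing that can go wrong when adapting the R version.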
Automatically stack every nth column of a dataframe
Here are some alternatives. No packages are used.
1) aperm. Create a 3d array a, permute its dimensions, reshape it into a matrix m and then convert that to a data frame. This one only works if all values are of the same type (as.matrix coerces everything to a common type); (2) and (3) do not have this limitation.
k <- 3
nr <- nrow(DF)
nc <- ncol(DF)
unames <- unique(names(DF))
a <- array(as.matrix(DF), c(nr, k, nc/k))
m <- matrix(aperm(a, c(1, 3, 2)),, k, dimnames = list(NULL, unames))
as.data.frame(m, stringsAsFactors = FALSE)
giving:
A B C
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
5 a5 b5 c5
6 a6 b6 c6
7 a7 b7 c7
8 a8 b8 c8
If we are in the situation given in the question's EDIT, replace unames with the following, where DF2 is DF with the revised names given in the Note at the end:
unames <- unique(sub("\\d*$", "", names(DF2)))
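The aperm step is only index bookkeeping: every block of k columns becomes a new set of rows underneath the previous one. A plain-Python sketch of that bookkeeping, assuming a list of rows stands in for the data frame (stack_blocks is a hypothetical helper); like the aperm version, it requires the column count to be a multiple of k.

```python
def stack_blocks(rows, k):
    """Stack each block of k columns underneath the previous block.

    rows: list of equal-length rows whose length is a multiple of k.
    Output has len(rows) * (ncol // k) rows of k values, block by block,
    which is the row order the aperm/matrix reshape above produces.
    """
    nc = len(rows[0])
    if nc % k:
        raise ValueError("number of columns must be a multiple of k")
    return [row[b * k:(b + 1) * k] for b in range(nc // k) for row in rows]


# The data from the Note: columns A B C A B C, four rows.
DF = [[f"{c}{i}" for c in "abc"] + [f"{c}{i + 4}" for c in "abc"] for i in range(1, 5)]
for row in stack_blocks(DF, 3):
    print(*row)
```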
2) lapply. This generalizes the code in the question:
L <- lapply(split(as.list(DF), names(DF)), unlist)
as.data.frame(L, stringsAsFactors = FALSE)
giving:
A B C
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
5 a5 b5 c5
6 a6 b6 c6
7 a7 b7 c7
8 a8 b8 c8
With the input shown in the question's EDIT, it could be done like this, where DF2 is given reproducibly in the Note at the end:
names0 <- sub("\\d*$", "", names(DF2)) # names without the trailing digits
L <- lapply(split(as.list(DF2), names0), unlist)
as.data.frame(L, stringsAsFactors = FALSE)
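The split-by-name idea works the same way outside R: group columns by their (base) name and concatenate each group. A Python sketch, assuming (name, values) pairs stand in for the data frame; stack_by_name is a hypothetical helper that handles both duplicated names (as in DF) and trailing digits (as in DF2).

```python
import re


def stack_by_name(columns):
    """Concatenate the values of all columns that share a base name.

    columns: list of (name, values) pairs. Names may repeat (as in DF) or
    carry trailing digits (as in DF2); trailing digits are stripped first,
    like sub("\\d*$", "", names(DF2)) in R.
    Groups come out in first-appearance order, whereas R's split() orders
    them by factor level (alphabetically); for A, B, C the two agree.
    """
    out = {}
    for name, values in columns:
        base = re.sub(r"\d*$", "", name)
        out.setdefault(base, []).extend(values)
    return out


# DF2 from the Note as (name, column) pairs: A1 B1 C1 A2 B2 C2.
DF2 = [(f"{c.upper()}{blk + 1}", [f"{c}{i + 4 * blk}" for i in range(1, 5)])
       for blk in range(2) for c in "abc"]
print(stack_by_name(DF2))
```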
3) reshape. nc and unames are from above. varying is a list with k components such that the ith component contains the index vector c(i, i+k, ...). It seems that reshape does not like duplicated names, so we have given it setNames(DF, 1:nc) as the input. This solution has the advantage of also generating the index vectors time and id, which relate the output to the input data.
varying <- split(1:nc, names(DF))
reshape(setNames(DF, 1:nc), dir = "long", varying = varying, v.names = unames)
giving:
time A B C id
1.1 1 a1 b1 c1 1
2.1 1 a2 b2 c2 2
3.1 1 a3 b3 c3 3
4.1 1 a4 b4 c4 4
1.2 2 a5 b5 c5 1
2.2 2 a6 b6 c6 2
3.2 2 a7 b7 c7 3
4.2 2 a8 b8 c8 4
With the input shown in the question's EDIT, it actually simplifies. We no longer need setNames(DF, 1:nc) but can use the data frame as is. Also, we can use varying = TRUE (see also @thelatemail's comment) instead of calculating a complex argument for varying. The input DF2 is as shown in the Note at the end, and names0 is as in (2) above.
reshape(DF2, dir = "long", varying = TRUE, v.names = unique(names0))
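To make the time and id bookkeeping concrete, here is a Python sketch of the wide-to-long step, assuming a list of rows laid out block by block as in the Note; reshape_long is a hypothetical helper, with time as the block number and id as the original row number, ordered the way reshape prints them.

```python
def reshape_long(rows, v_names):
    """Wide-to-long conversion that also records time and id.

    rows: list of rows laid out as consecutive blocks of len(v_names) values.
    time is the block number and id the original row number (both 1-based);
    label mimics the "id.time" row names reshape() prints.
    Records are ordered by time, then id.
    """
    k = len(v_names)
    n_times = len(rows[0]) // k
    records = []
    for t in range(1, n_times + 1):
        for i, row in enumerate(rows, start=1):
            block = row[(t - 1) * k:t * k]
            records.append({"label": f"{i}.{t}", "time": t,
                            **dict(zip(v_names, block)), "id": i})
    return records


# The data from the Note: columns A B C A B C, four rows.
DF = [[f"{c}{i}" for c in "abc"] + [f"{c}{i + 4}" for c in "abc"] for i in range(1, 5)]
for rec in reshape_long(DF, ["A", "B", "C"]):
    print(rec)
```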
Note:
Lines <- " A B C A B C
1 a1 b1 c1 a5 b5 c5
2 a2 b2 c2 a6 b6 c6
3 a3 b3 c3 a7 b7 c7
4 a4 b4 c4 a8 b8 c8"
DF <- read.table(text = Lines, as.is = TRUE, check.names = FALSE)
DF2 <- setNames(DF, c("A1", "B1", "C1", "A2", "B2", "C2")) # test input
Update: A number of simplifications. Also added DF2 to the Note at the end and discussed in each alternative how to modify the code to handle it. (A general method might be simply to reduce DF2 to DF, as I discussed in the comments below.)
Split column in a Pandas Dataframe into n number of columns
Let's try it with stack + str.split + unstack + join.
The idea is to split each column on ^ and expand the pieces into separate columns. stack lets us do a single str.split on one Series object, and unstack turns the result back into a DataFrame with the same index as the original.
tmp = df.stack().str.split('^', expand=True).unstack(level=1).sort_index(level=1, axis=1)
tmp.columns = [f'{y}_{x+1}' for x, y in tmp.columns]
out = df.join(tmp).dropna(how='all', axis=1).fillna('')
Output:
column_name_1 column_name_2 column_name_1_1 column_name_1_2 column_name_1_3 column_name_1_4 column_name_2_1 column_name_2_2
0 a^b^c^d j a b c d j
1 e^f^g k^l e f g k l
2 h^i m h i m
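The same split-and-spread logic without pandas, as a sketch: rows are dicts, the sample values are read off the output above, and expand_split is a hypothetical helper. Padding short cells with '' plays the role of fillna('').

```python
def expand_split(rows, columns, sep="^"):
    """Split every cell of the given columns on sep and spread the pieces
    over new columns named <column>_1, <column>_2, ...

    rows: list of dicts (one per row). Cells with fewer pieces are padded
    with '' (the role fillna('') plays in the pandas answer). Returns the
    new column order and the widened rows.
    """
    pieces = {col: [row[col].split(sep) for row in rows] for col in columns}
    widths = {col: max(len(p) for p in pieces[col]) for col in columns}
    header = list(columns) + [f"{col}_{i + 1}" for col in columns for i in range(widths[col])]
    out = []
    for r, row in enumerate(rows):
        new = dict(row)
        for col in columns:
            cell = pieces[col][r]
            for i in range(widths[col]):
                new[f"{col}_{i + 1}"] = cell[i] if i < len(cell) else ""
        out.append(new)
    return header, out


rows = [{"column_name_1": "a^b^c^d", "column_name_2": "j"},
        {"column_name_1": "e^f^g", "column_name_2": "k^l"},
        {"column_name_1": "h^i", "column_name_2": "m"}]
header, wide = expand_split(rows, ["column_name_1", "column_name_2"])
print(header)
```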
Split one row after every 3rd column and transport those 3 columns as a new row in r
Try
as.data.frame(matrix(unlist(df, use.names=FALSE),ncol=3, byrow=TRUE))
# V1 V2 V3
#1 1 2 3
#2 4 5 6
#3 7 8 9
#4 10 11 12
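Or you could use matrix directly on df. Since a data frame is a list, this gives a matrix of mode list, so the resulting columns are list columns rather than plain vectors: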
as.data.frame(matrix(df, ncol=3, byrow=TRUE))
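Filling row by row into 3 columns is just cutting the flattened values into consecutive chunks. A Python sketch, assuming the input row holds the values 1 to 12 as suggested by the output above (the question's df is not shown); rewrap is a hypothetical helper.

```python
def rewrap(row, ncol=3):
    """Cut one long row into consecutive rows of ncol values, filling row by
    row as matrix(..., ncol = 3, byrow = TRUE) does in R.

    Unlike R's matrix(), which recycles values (with a warning) when the length
    is not a multiple of ncol, this sketch raises instead.
    """
    if len(row) % ncol:
        raise ValueError("row length must be a multiple of ncol")
    return [row[i:i + ncol] for i in range(0, len(row), ncol)]


print(rewrap(list(range(1, 13))))
```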
How to split the column values of dataframe into multiple columns
Starting from your DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame({'PLUGS\nDESIGN\nGEAR': ['700\nDaewoo 8000 Gearless', '300\nHyundai 4400 Gearless', '600\nSTX 2600 Gearless', '200\nB170 \nGeared', '362 Wenchong 1700 Mk II \nGeared', '252\nRichMax 1550 Gearless'], },
... index = [0, 1, 2, 3, 4, 5])
>>> df
PLUGS\nDESIGN\nGEAR
0 700\nDaewoo 8000 Gearless
1 300\nHyundai 4400 Gearless
2 600\nSTX 2600 Gearless
3 200\nB170 \nGeared
4 362 Wenchong 1700 Mk II \nGeared
5 252\nRichMax 1550 Gearless
You can indeed use the split method with several separators, here \n and space:
>>> df = pd.DataFrame(df['PLUGS\nDESIGN\nGEAR'].str.split('\n| '))
PLUGS\nDESIGN\nGEAR
0 [700, Daewoo, 8000, , Gearless]
1 [300, Hyundai, 4400, , Gearless]
2 [600, STX, 2600, , Gearless]
3 [200, B170, , Geared]
4 [362, Wenchong, 1700, Mk, II, , Geared]
5 [252, RichMax, 1550, , Gearless]
Then you can assign the first and last elements to the corresponding columns, and the rest to the DESIGN column:
>>> df['PLUGS'] = df['PLUGS\nDESIGN\nGEAR'].str[0]
>>> df['DESIGN'] = df['PLUGS\nDESIGN\nGEAR'].str[1:-1]
>>> df['GEAR'] = df['PLUGS\nDESIGN\nGEAR'].str[-1]
>>> df
PLUGS\nDESIGN\nGEAR PLUGS DESIGN GEAR
0 [700, Daewoo, 8000, , Gearless] 700 [Daewoo, 8000, ] Gearless
1 [300, Hyundai, 4400, , Gearless] 300 [Hyundai, 4400, ] Gearless
2 [600, STX, 2600, , Gearless] 600 [STX, 2600, ] Gearless
3 [200, B170, , Geared] 200 [B170, ] Geared
4 [362, Wenchong, 1700, Mk, II, , Geared] 362 [Wenchong, 1700, Mk, II, ] Geared
5 [252, RichMax, 1550, , Gearless] 252 [RichMax, 1550, ] Gearless
The last thing to do is to turn the DESIGN column from a list into a string with the join method, and drop the PLUGS\nDESIGN\nGEAR column (drop returns a new DataFrame, so assign the result if you want to keep it):
>>> df['DESIGN'] = df['DESIGN'].apply(lambda x: ' '.join(map(str, x)))
>>> df.drop(['PLUGS\nDESIGN\nGEAR'], axis=1)
PLUGS DESIGN GEAR
0 700 Daewoo 8000 Gearless
1 300 Hyundai 4400 Gearless
2 600 STX 2600 Gearless
3 200 B170 Geared
4 362 Wenchong 1700 Mk II Geared
5 252 RichMax 1550 Gearless
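The per-cell logic can also be sketched without pandas: split on newline or space, take the first and last pieces, and join the middle. parse_cell is a hypothetical helper; unlike the answer's join, it drops empty pieces so DESIGN has no trailing space.

```python
import re


def parse_cell(cell):
    """Split a cell on newline or space, as str.split('\\n| ') does above:
    the first piece is PLUGS, the last is GEAR, the middle pieces form DESIGN.

    Unlike the answer's ' '.join, empty pieces are dropped before joining,
    so DESIGN has no trailing space.
    """
    parts = re.split(r"\n| ", cell)
    design = " ".join(p for p in parts[1:-1] if p)
    return {"PLUGS": parts[0], "DESIGN": design, "GEAR": parts[-1]}


for cell in ["700\nDaewoo 8000 Gearless", "200\nB170 \nGeared",
             "362 Wenchong 1700 Mk II \nGeared"]:
    print(parse_cell(cell))
```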
Splitting nth elements in a string in a pandas dataframe
You can use .str.split with expand=True:
df[["5th element", "7th element"]] = df["read_name"].str.split(":", expand=True)[[4, 6]].astype(int)
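Note that [[4, 6]] uses 0-based column labels, so it picks the 5th and 7th fields. A plain-Python sketch of the same per-value step; nth_fields is a hypothetical helper and the read name below is a made-up example, since the question's actual values are not shown.

```python
def nth_fields(read_name, positions=(4, 6), sep=":"):
    """Return the fields at the given 0-based positions as ints.

    The default (4, 6) selects the 5th and 7th colon-separated elements,
    matching [[4, 6]] after str.split(":", expand=True).
    """
    fields = read_name.split(sep)
    return tuple(int(fields[p]) for p in positions)


# Hypothetical read name; the question's actual values are not shown.
print(nth_fields("M00123:45:000000000-ABCDE:1:1101:15589:1331"))
```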
How to split data.frame to equal columns
Here's a way, not so pretty, but this is an ugly question :D
library(tibble)
library(dplyr)
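# Pad to a multiple of 6 columns; the outer %% 6 adds no padding (and no empty
# extra row) when ncol(df) is already a multiple of 6
df1 <- matrix(c(names(df), rep('', (6 - ncol(df) %% 6) %% 6)) %>% unlist, ncol = 6, byrow = TRUE) %>% as_tibble %>% rowid_to_column()
df2 <- matrix(c(df, rep('', (6 - ncol(df) %% 6) %% 6)) %>% unlist, ncol = 6, byrow = TRUE) %>% as_tibble %>% rowid_to_column()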
bind_rows(df1,df2) %>% arrange(rowid) %>% select(-1) %>% setNames(.[1,]) %>% slice(-1)
# # A tibble: 3 x 6
# a b c d e f
# <chr> <chr> <chr> <chr> <chr> <chr>
# 1 1 2 3 4 5 6
# 2 g h i j
# 3 7 8 9 10
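The layout above is a row of names followed by its row of values, 6 columns at a time. A Python sketch of that layout, assuming input columns a to j holding 1 to 10 as read off the output; wrap_table is a hypothetical helper that uses the same (6 - n %% 6) %% 6 padding rule.

```python
def wrap_table(names, values, width=6):
    """Lay out a one-row table in chunks of `width` columns: a row of names
    followed by the matching row of values, padding the last chunk with ''.

    The padding is (width - n % width) % width, so nothing is added when the
    column count is already a multiple of width.
    """
    pad = (width - len(names) % width) % width
    names = list(names) + [""] * pad
    values = [str(v) for v in values] + [""] * pad
    rows = []
    for start in range(0, len(names), width):
        rows.append(names[start:start + width])
        rows.append(values[start:start + width])
    return rows


# Assumed input read off the output above: columns a..j holding 1..10.
for row in wrap_table(list("abcdefghij"), range(1, 11)):
    print(row)
```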