How to Split a Data Frame Among Columns, Say at Every Nth Column

How do I split a data frame among columns, say at every nth column?

To split Mydata every 4 columns, we can call split.default explicitly:

split.default(Mydata, rep(1:3, each = 4))

Calling the "default" method directly bypasses split.data.frame (which splits by rows) and splits the data frame by columns. Set the grouping variable to match the split you need.


For balanced grouping, gl is handy (see ?gl). We can use gl(3, 4) instead of rep(1:3, each = 4) in the above, which avoids type conversion from "integer" to "factor".

In general, use gl(ncol(Mydata) / n, n) for "every n columns" (n must divide ncol(Mydata)).

Subset dataframe by every n column (R)

Here's a simple way. Say you have a data frame df with 400 columns and you want a list of data frames, each 4 columns wide. Create a vector holding the first df column number of each subset, then use lapply to generate the list of subset data frames:

col1 <- seq(1, 397, 4)
listdfs <- lapply(col1, function(x) df[ , x:(x+3)])

Let me know if it works.

Test case:

m <- matrix(1:120, ncol = 12)
df <- as.data.frame(m)
col1 <- seq(1, 10, 3)
listdfs <- lapply(col1, function(x) df[ , x:(x+2)])

Automatically stack every nth column of a dataframe

Here are some alternatives. No packages are used.

1) aperm Create a 3d array a, permute the dimensions and reshape into a matrix m and then convert that to a data frame. This one only works if all values are of the same type. (2) and (3) do not have this limitation.

k <- 3
nr <- nrow(DF)
nc <- ncol(DF)
unames <- unique(names(DF))

a <- array(as.matrix(DF), c(nr, k, nc/k))
m <- matrix(aperm(a, c(1, 3, 2)),, k, dimnames = list(NULL, unames))
as.data.frame(m, stringsAsFactors = FALSE)

giving:

   A  B  C
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
5 a5 b5 c5
6 a6 b6 c6
7 a7 b7 c7
8 a8 b8 c8

If we are in the situation given in the question's EDIT then replace unames with the following where DF2 is DF with the revised names as per Note at end:

unames <- unique(sub("\\d*$", "", names(DF2)))

2) lapply This generalizes the code in the question. Columns that share a name are grouped and stacked:

L <- lapply(split(as.list(DF), names(DF)), unlist)
as.data.frame(L, stringsAsFactors = FALSE)

giving:

   A  B  C
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
4 a4 b4 c4
5 a5 b5 c5
6 a6 b6 c6
7 a7 b7 c7
8 a8 b8 c8

With the input shown in the question's EDIT it could be done like this where DF2 is given reproducibly in the Note at the end.

names0 <- sub("\\d*$", "", names(DF2))   # names without the trailing digits
L <- lapply(split(as.list(DF2), names0), unlist)
as.data.frame(L, stringsAsFactors = FALSE)

3) reshape nc and unames are from above. varying is a list with k components such that the ith component contains the index vector c(i, i+k, ...). It seems that reshape does not like duplicated names, so we have given it setNames(DF, 1:nc) as the input. This solution has the advantage of also generating the index vectors time and id, which relate the output to the input data.

varying <- split(1:nc, names(DF))
reshape(setNames(DF, 1:nc), dir = "long", varying = varying, v.names = unames)

giving:

    time  A  B  C id
1.1    1 a1 b1 c1  1
2.1    1 a2 b2 c2  2
3.1    1 a3 b3 c3  3
4.1    1 a4 b4 c4  4
1.2    2 a5 b5 c5  1
2.2    2 a6 b6 c6  2
3.2    2 a7 b7 c7  3
4.2    2 a8 b8 c8  4

With the input shown in the question's EDIT it actually simplifies. We no longer need to use setNames(DF, 1:nc) but can just use the data frame as is as input. Also, we can use varying=TRUE (also see @thelatemail's comment) instead of calculating a complex argument for varying. The input DF2 is as shown in the Note at the end and names0 is as in (2) above.

reshape(DF2, dir = "long", varying = TRUE, v.names = unique(names0))

Note:

Lines <- "      A      B      C      A      B      C 
1 a1 b1 c1 a5 b5 c5
2 a2 b2 c2 a6 b6 c6
3 a3 b3 c3 a7 b7 c7
4 a4 b4 c4 a8 b8 c8"
DF <- read.table(text = Lines, as.is = TRUE, check.names = FALSE)

DF2 <- setNames(DF, c("A1", "B1", "C1", "A2", "B2", "C2")) # test input

Update: A number of simplifications. Also added DF2 in the Note at the end and discussed in each alternative how to modify the code to deal with it. (A general method might be just to reduce DF2 to DF as I discussed in the comments below.)

Split column in a Pandas Dataframe into n number of columns

Let's try it with stack + str.split + unstack + join.

The idea is to split each column on ^ and expand the pieces into separate columns. stack lets us do a single str.split on one Series object, and unstack then rebuilds a DataFrame with the same index as the original.

tmp = df.stack().str.split('^', expand=True).unstack(level=1).sort_index(level=1, axis=1)
tmp.columns = [f'{y}_{x+1}' for x, y in tmp.columns]
out = df.join(tmp).dropna(how='all', axis=1).fillna('')

Output:

  column_name_1 column_name_2 column_name_1_1 column_name_1_2 column_name_1_3 column_name_1_4 column_name_2_1 column_name_2_2
0       a^b^c^d             j               a               b               c               d               j
1         e^f^g           k^l               e               f               g                               k               l
2           h^i             m               h               i                                               m
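
To see what each step of the chain contributes, here is a minimal sketch that runs it piece by piece. The input frame is an assumption, reconstructed from the output shown above, and the intermediate names stacked, pieces and wide are only for illustration.

```python
import pandas as pd

# Assumed input, reconstructed from the output table above.
df = pd.DataFrame({
    "column_name_1": ["a^b^c^d", "e^f^g", "h^i"],
    "column_name_2": ["j", "k^l", "m"],
})

# stack() turns the frame into one long Series indexed by (row, column name).
stacked = df.stack()

# A one-character pattern such as '^' is treated as a literal separator;
# expand=True yields one column per piece, padded with missing values.
pieces = stacked.str.split("^", expand=True)

# unstack(level=1) moves the original column names back onto the column axis,
# giving (piece position, original column) pairs as column labels.
wide = pieces.unstack(level=1).sort_index(level=1, axis=1)

# Flatten the two-level labels into names like 'column_name_1_1'.
wide.columns = [f"{name}_{pos + 1}" for pos, name in wide.columns]

# Attach to the original frame, drop all-empty columns, blank out the gaps.
out = df.join(wide).dropna(how="all", axis=1).fillna("")

print(pieces.shape)        # rows = cells of df, columns = max pieces per cell
print(list(out.columns))
```

Printing stacked, pieces and wide in turn is a quick way to check that the separator and the level passed to unstack match your own data.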

Split one row after every 3rd column and transport those 3 columns as a new row in r

Try

as.data.frame(matrix(unlist(df, use.names=FALSE),ncol=3, byrow=TRUE))
# V1 V2 V3
#1 1 2 3
#2 4 5 6
#3 7 8 9
#4 10 11 12

Or you could directly use matrix on df

 as.data.frame(matrix(df, ncol=3, byrow=TRUE))

How to split the column values of dataframe into multiple columns

Starting from your DataFrame:

>>> import pandas as pd

>>> df = pd.DataFrame({'PLUGS\nDESIGN\nGEAR': ['700\nDaewoo 8000 Gearless', '300\nHyundai 4400 Gearless', '600\nSTX 2600 Gearless', '200\nB170 \nGeared', '362 Wenchong 1700 Mk II \nGeared', '252\nRichMax 1550 Gearless'], },
... index = [0, 1, 2, 3, 4, 5])
>>> df
PLUGS\nDESIGN\nGEAR
0 700\nDaewoo 8000 Gearless
1 300\nHyundai 4400 Gearless
2 600\nSTX 2600 Gearless
3 200\nB170 \nGeared
4 362 Wenchong 1700 Mk II \nGeared
5 252\nRichMax 1550 Gearless

You can indeed use the split method on several separators, here \n and space:

>>> df = pd.DataFrame(df['PLUGS\nDESIGN\nGEAR'].str.split('\n| '))
>>> df
PLUGS\nDESIGN\nGEAR
0 [700, Daewoo, 8000, , Gearless]
1 [300, Hyundai, 4400, , Gearless]
2 [600, STX, 2600, , Gearless]
3 [200, B170, , Geared]
4 [362, Wenchong, 1700, Mk, II, , Geared]
5 [252, RichMax, 1550, , Gearless]

Then you can assign the first and last elements to the PLUGS and GEAR columns, and everything in between to the DESIGN column:

>>> df['PLUGS'] = df['PLUGS\nDESIGN\nGEAR'].str[0]
>>> df['DESIGN'] = df['PLUGS\nDESIGN\nGEAR'].str[1:-1]
>>> df['GEAR'] = df['PLUGS\nDESIGN\nGEAR'].str[-1]
>>> df
PLUGS\nDESIGN\nGEAR PLUGS DESIGN GEAR
0 [700, Daewoo, 8000, , Gearless] 700 [Daewoo, 8000, ] Gearless
1 [300, Hyundai, 4400, , Gearless] 300 [Hyundai, 4400, ] Gearless
2 [600, STX, 2600, , Gearless] 600 [STX, 2600, ] Gearless
3 [200, B170, , Geared] 200 [B170, ] Geared
4 [362, Wenchong, 1700, Mk, II, , Geared] 362 [Wenchong, 1700, Mk, II, ] Geared
5 [252, RichMax, 1550, , Gearless] 252 [RichMax, 1550, ] Gearless

The last step is to turn the DESIGN column from a list into a string with the join method, then drop the PLUGS\nDESIGN\nGEAR column:

>>> df['DESIGN'] = df['DESIGN'].apply(lambda x: ' '.join(map(str, x)))
>>> df.drop(['PLUGS\nDESIGN\nGEAR'], axis=1)
PLUGS DESIGN GEAR
0 700 Daewoo 8000 Gearless
1 300 Hyundai 4400 Gearless
2 600 STX 2600 Gearless
3 200 B170 Geared
4 362 Wenchong 1700 Mk II Geared
5 252 RichMax 1550 Gearless

Splitting nth elements in a string in a pandas dataframe

You can use .str.split with expand=True:

df[["5th element", "7th element"]] = df["read_name"].str.split(":", expand=True)[[4, 6]].astype(int)
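
Here is a small self-contained sketch of the same idea, written out one column at a time. The read_name values are made up for illustration; only the colon-separated layout is taken from the answer above.

```python
import pandas as pd

# Hypothetical read names with seven colon-separated fields.
df = pd.DataFrame({
    "read_name": [
        "A00:12:HXX:1:1101:2031:1000",
        "A00:12:HXX:2:1102:4055:1016",
    ]
})

# expand=True returns a DataFrame whose columns are labelled 0, 1, 2, ...
parts = df["read_name"].str.split(":", expand=True)

# Positions are zero-based, so the 5th and 7th fields are columns 4 and 6.
df["5th element"] = parts[4].astype(int)
df["7th element"] = parts[6].astype(int)

print(df)
```

Note that astype(int) will raise if any row has fewer fields than expected, because the missing pieces cannot be converted to integers.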

How to split data.frame to equal columns

Here's a way, not so pretty, but this is an ugly question :D

library(tibble)
library(dplyr)
df1 <- matrix(c(names(df),rep('',6 - ncol(df)%%6)) %>% unlist, ncol=6,byrow=T) %>% as_tibble %>% rowid_to_column()
df2 <- matrix(c(df ,rep('',6 - ncol(df)%%6)) %>% unlist, ncol=6,byrow=T) %>% as_tibble %>% rowid_to_column()
bind_rows(df1,df2) %>% arrange(rowid) %>% select(-1) %>% setNames(.[1,]) %>% slice(-1)

# # A tibble: 3 x 6
# a b c d e f
# <chr> <chr> <chr> <chr> <chr> <chr>
# 1 1 2 3 4 5 6
# 2 g h i j
# 3 7 8 9 10

