Interleave Columns of Two Data Frames

You got the first step right, which is cbind-ing. Say your data frames are d1 with n columns A1, A2, ..., An and d2 with n columns B1, B2, ..., Bn. Then

d <- cbind(d1, d2)

gives you a data frame containing all the information you wanted, and you just need to reorder its columns. This amounts to generating the vector (1, n+1, 2, n+2, ..., n, 2n) to index the data frame columns. You can do this with

s <- rep(1:n, each = 2) + (0:1) * n

which works because (0:1) * n is recycled as 0, n, 0, n, ... along the repeated indices. Your desired data frame is then d[s].
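
The same index construction can be sketched outside R. The Python snippet below is only an illustration and not part of the original answer. The helper name `interleaved_order` and the column names are made up for the example, and positions are 0-based where the R version is 1-based.

```python
def interleaved_order(n):
    """Return 0-based positions [0, n, 1, n+1, ..., n-1, 2n-1]."""
    order = []
    for i in range(n):
        order.append(i)      # i-th column of the first frame
        order.append(i + n)  # i-th column of the second frame
    return order


# Column names as they would appear after binding d1 and d2 side by side.
bound_columns = ["A1", "A2", "A3", "B1", "B2", "B3"]
interleaved = [bound_columns[i] for i in interleaved_order(3)]
print(interleaved)  # ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
```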

Interleave 2 Dataframes on certain columns

If the column names are the same and the only difference is that one of the DataFrames may have some extra columns, you can use:

import pandas as pd
from toolz import interleave

df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=df1.columns)
print (df3)
StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType \
0 DD1 Switch1 P1 AD1 Switch2 P2 MTP
1 AB11 RU15 P1 AJ11 RU25 P2 None
2 DD2 Switch2 P3 AD2 Switch3 P2 MTP
3 AB12 RU18 P2 AB11 RU35 P2 None
4 DD3 Switch3 P5 AD3 Switch4 P6 MTP
5 AB13 RU19 P3 AB11 RU40 P4 None

Speed
0 1000.0
1 NaN
2 1000.0
3 NaN
4 1000.0
5 NaN

A more general solution that works for any column names is to call DataFrame.align first, so that the columns of both DataFrames line up:

df1, df2 = df1.align(df2, axis=1)
print (df1)
EndDevice EndLocation EndPort LinkType Speed StartDevice StartLocation \
0 Switch2 AD1 P2 MTP 1000 Switch1 DD1
1 Switch3 AD2 P2 MTP 1000 Switch2 DD2
2 Switch4 AD3 P6 MTP 1000 Switch3 DD3

StartPort
0 P1
1 P3
2 P5

print (df2)
EndDevice EndLocation EndPort LinkType Speed StartDevice StartLocation \
0 RU25 AJ11 P2 NaN NaN RU15 AB11
1 RU35 AB11 P2 NaN NaN RU18 AB12
2 RU40 AB11 P4 NaN NaN RU19 AB13

StartPort
0 P1
1 P2
2 P3

df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=df1.columns)
print (df3)
EndDevice EndLocation EndPort LinkType Speed StartDevice StartLocation \
0 Switch2 AD1 P2 MTP 1000.0 Switch1 DD1
1 RU25 AJ11 P2 NaN NaN RU15 AB11
2 Switch3 AD2 P2 MTP 1000.0 Switch2 DD2
3 RU35 AB11 P2 NaN NaN RU18 AB12
4 Switch4 AD3 P6 MTP 1000.0 Switch3 DD3
5 RU40 AB11 P4 NaN NaN RU19 AB13

StartPort
0 P1
1 P1
2 P3
3 P2
4 P5
5 P3

Another idea with Index.union and DataFrame.reindex:

cols = df1.columns.union(df2.columns, sort=False)

df1 = df1.reindex(cols, axis=1)
df2 = df2.reindex(cols, axis=1)
print (df1)
StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType \
0 DD1 Switch1 P1 AD1 Switch2 P2 MTP
1 DD2 Switch2 P3 AD2 Switch3 P2 MTP
2 DD3 Switch3 P5 AD3 Switch4 P6 MTP

Speed
0 1000
1 1000
2 1000

print (df2)
StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType \
0 AB11 RU15 P1 AJ11 RU25 P2 NaN
1 AB12 RU18 P2 AB11 RU35 P2 NaN
2 AB13 RU19 P3 AB11 RU40 P4 NaN

Speed
0 NaN
1 NaN
2 NaN

df3 = pd.DataFrame(interleave([df1.values, df2.values]), columns=cols)
print (df3)
StartLocation StartDevice StartPort EndLocation EndDevice EndPort LinkType \
0 DD1 Switch1 P1 AD1 Switch2 P2 MTP
1 AB11 RU15 P1 AJ11 RU25 P2 NaN
2 DD2 Switch2 P3 AD2 Switch3 P2 MTP
3 AB12 RU18 P2 AB11 RU35 P2 NaN
4 DD3 Switch3 P5 AD3 Switch4 P6 MTP
5 AB13 RU19 P3 AB11 RU40 P4 NaN

Speed
0 1000.0
1 NaN
2 1000.0
3 NaN
4 1000.0
5 NaN
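
For readers who want to try the union/reindex idea end to end, here is a minimal self-contained sketch. It is not the answer's exact code: the two small frames are made up, and `itertools` from the standard library stands in for `toolz.interleave`, on the assumption that both frames have the same number of rows.

```python
from itertools import chain

import pandas as pd

# Small made-up frames; df2 lacks the "Speed" column.
df1 = pd.DataFrame({"StartPort": ["P1", "P3"], "Speed": [1000, 1000]})
df2 = pd.DataFrame({"StartPort": ["P2", "P4"]})

# Give both frames the same columns in the same order.
cols = df1.columns.union(df2.columns, sort=False)
a = df1.reindex(cols, axis=1)
b = df2.reindex(cols, axis=1)

# Alternate rows: a[0], b[0], a[1], b[1], ...
# zip stops at the shorter frame, so equal row counts are assumed.
rows = list(chain.from_iterable(zip(a.values.tolist(), b.values.tolist())))
df3 = pd.DataFrame(rows, columns=cols)
print(df3)
```

The missing "Speed" values in the second frame show up as NaN in every other row, as in the output above.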

R interleave two data frames with same column names

One option is to cbind the two tables, use setcolorder to sort the concatenated column names, and then apply make.unique if the intention is to tell apart the before/after copies of the duplicated column names:

library(data.table)
out <- setcolorder(cbind(dt1, dt2), order(c(names(dt1), names(dt2))))[]
setnames(out, make.unique(names(out)))[]
out[, setdiff(names(dt1), names(dt2)) := NULL][]
# a.before a.after c.before c.after d
#1: 1 a 1 a 1
#2: 2 b 2 b 2
#3: 3 c 3 c 3

If we need to specifically use before/after

out <- setcolorder(cbind(dt1, dt2), order(c(names(dt1), names(dt2))))[]    
out[, setdiff(names(dt1), names(dt2)) := NULL][]
i1 <- duplicated(names(out), fromLast = TRUE)
i2 <- duplicated(names(out))
names(out)[i1] <- paste0(names(out)[i1], ".before")
names(out)[i2] <- paste0(names(out)[i2], ".after")

out
# a.before a.after c.before c.after d
#1: 1 a 1 a 1
#2: 2 b 2 b 2
#3: 3 c 3 c 3

How can I interleave rows from 2 data frames together?

Assign row numbers to each data frame independently, then bind the rows and sort/arrange by row number and data frame id. In this example, row numbers are trivial since the ids are sequential and act as row number. But in the general case, row numbers should be used.

Here's an example using dplyr:

df1 %>%
  mutate(row_number = row_number()) %>%
  bind_rows(df2 %>% mutate(row_number = row_number())) %>%
  arrange(row_number, df)

Output:

      df    id     chr row_number
   (dbl) (int)   (chr)      (int)
1      1     1 puppies          1
2      2     1 kitties          1
3      1     2 puppies          2
4      2     2 kitties          2
5      1     3 puppies          3
6      2     3 kitties          3
7      1     4 puppies          4
8      2     4 kitties          4
9      1     5 puppies          5
10     2     5 kitties          5
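
The same recipe (number the rows of each frame, bind, then sort by row number and frame id) carries over to pandas. The sketch below is an assumed translation rather than the original answer: the frames are made up to resemble the example, and the column name `row_number` is simply reused from the dplyr code.

```python
import pandas as pd

# Made-up frames mirroring the example; "df" identifies the source frame.
df1 = pd.DataFrame({"df": 1, "id": [1, 2, 3], "chr": "puppies"})
df2 = pd.DataFrame({"df": 2, "id": [1, 2, 3], "chr": "kitties"})

# Number the rows of each frame independently, starting at 1.
df1 = df1.assign(row_number=range(1, len(df1) + 1))
df2 = df2.assign(row_number=range(1, len(df2) + 1))

# Bind the rows, then order by row number and, within it, by frame id.
interleaved = (
    pd.concat([df1, df2])
    .sort_values(["row_number", "df"])
    .reset_index(drop=True)
)
print(interleaved)
```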

Interleaving two data.frames of data.frames in R

We need to use Map to interleave the corresponding elements of both lists:

library(gdata)
out_lst <- Map(interleave, new_list, second_list)

Or another option is map2 from purrr

library(purrr)
out_lst <- map2(new_list, second_list, interleave)

Interweave two dataframes

Use pd.concat to combine the DataFrames and toolz.interleave to reorder the columns:

import pandas as pd
from toolz import interleave

pd.concat([d1, d2], axis=1)[list(interleave([d1, d2]))]

The resulting output is as expected:

   0  3  1  4  2
a  1  0  1  0  1
b  1  0  1  0  1
c  1  0  1  0  1
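
This works because iterating over a DataFrame yields its column labels, so interleave sees two sequences of labels and alternates between them. To make the behaviour with unequal column counts concrete, here is a plain-Python sketch of a round-robin helper. `round_robin` is a hypothetical stand-in written for illustration, not toolz's actual implementation.

```python
def round_robin(sequences):
    """Yield items from each sequence in turn, skipping exhausted ones."""
    iterators = [iter(seq) for seq in sequences]
    while iterators:
        still_active = []
        for it in iterators:
            try:
                yield next(it)
            except StopIteration:
                continue  # this sequence has run out; drop it
            still_active.append(it)
        iterators = still_active


# Column labels of a 3-column frame and a 2-column frame.
labels = list(round_robin([[0, 1, 2], [3, 4]]))
print(labels)  # [0, 3, 1, 4, 2]
```

The resulting label order matches the column order in the output above.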

Combining two dataframes with alternating column position

We can take the matrix route: arrange the column names into a matrix (a dim structure) filled by row, then flatten it with c(), which reads the matrix column by column and so alternates the names:

library(dplyr)
bind_cols(df1, df2) %>%
  dplyr::select(all_of(c(matrix(names(.), ncol = 3, byrow = TRUE))))

-output

# A tibble: 4 × 6
      b   b_B     a   a_A     c   c_C
  <int> <int> <int> <int> <int> <int>
1     1     1     1     1     1     1
2     2     2     2     2     2     2
3     3     3     3     3     3     3
4     4     4     4     4     4     4
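
To see why the fill-by-row, read-by-column trick alternates the names, the reshaping can be mimicked with plain lists. This Python sketch is only an illustration. The name order is inferred from the output above, on the assumption that the first frame holds b, a, c and the second holds b_B, a_A, c_C.

```python
# Names as they would appear after binding the two frames side by side.
names = ["b", "a", "c", "b_B", "a_A", "c_C"]
ncol = 3

# Fill row by row: row 0 holds the first frame's names, row 1 the second's.
rows = [names[i:i + ncol] for i in range(0, len(names), ncol)]

# Read column by column, which is how c() flattens an R matrix.
interleaved = [name for column in zip(*rows) for name in column]
print(interleaved)  # ['b', 'b_B', 'a', 'a_A', 'c', 'c_C']
```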

Interleave two columns of a data.frame

First let's create some data:

dd = data.frame(x = 1:10, y = LETTERS[1:10])

Next, we need to make sure the y column is a character and not a factor (otherwise, it will be converted to a numeric)

dd$y = as.character(dd$y)

Then we transpose the data frame and convert to a vector:

as.vector(t(dd))
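
The transpose-and-flatten step amounts to walking the rows and emitting x, then y, for each one. A rough Python analogue, given only as a sketch, is shown here. Values are converted to strings to mirror R coercing everything to character, though the exact string formatting R produces may differ.

```python
x = list(range(1, 11))
y = [chr(ord("A") + i) for i in range(10)]  # 'A'..'J', like LETTERS[1:10]

# For each row emit x then y, converting every value to text.
interleaved = [str(value) for pair in zip(x, y) for value in pair]
print(interleaved[:6])  # ['1', 'A', '2', 'B', '3', 'C']
```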

However, a more pertinent question is why you would want to do this.

Pandas - Interleave / Zip two DataFrames by row

You can sort the index after concatenating and then reset it, i.e.:

import pandas as pd

df1 = pd.DataFrame([['a','b','c'], ['d','e','f']])
df2 = pd.DataFrame([['A','B','C'], ['D','E','F']])

concat_df = pd.concat([df1,df2]).sort_index().reset_index(drop=True)

Output :

   0  1  2
0  a  b  c
1  A  B  C
2  d  e  f
3  D  E  F
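
One detail worth checking: after concatenation every index label appears twice, so the interleaved order depends on how ties are broken. sort_index defaults to kind='quicksort', which is not documented as a stable sort. A cautious variant, sketched here as a suggestion rather than as part of the original answer, asks for a stable algorithm explicitly so that rows from df1 stay ahead of rows from df2 within each label.

```python
import pandas as pd

df1 = pd.DataFrame([["a", "b", "c"], ["d", "e", "f"]])
df2 = pd.DataFrame([["A", "B", "C"], ["D", "E", "F"]])

# mergesort is stable: equal index labels keep their concatenation order.
concat_df = (
    pd.concat([df1, df2])
    .sort_index(kind="mergesort")
    .reset_index(drop=True)
)
print(concat_df)
```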

EDIT (OmerB): To keep the original row order regardless of the index values:

import pandas as pd
df1 = pd.DataFrame([['a','b','c'], ['d','e','f']]).reset_index()
df2 = pd.DataFrame([['A','B','C'], ['D','E','F']]).reset_index()

concat_df = pd.concat([df1,df2]).sort_index().set_index('index')

