How to Obtain All Combinations of the Columns of a Data Frame Taken by 2

How to get all combinations of 2 from a grouped column in a data frame

You can do:

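library(dplyr)

# One row per pair of col2 values within each col1 group (needs >= 2 values per group).
# In dplyr >= 1.1.0, multi-row summarise() is deprecated; reframe() replaces it.
data <- input %>%
  group_by(col1) %>%
  summarise(col2 = t(combn(col2, 2)))

# Split the two-column matrix in col2 into columns X1 and X2
cbind(data[1], data.frame(data$col2))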

# col1 X1 X2
# <dbl> <chr> <chr>
#1 1 A B
#2 1 A C
#3 1 B C
#4 2 E F
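For readers working outside R, the same grouped-pairs idea can be sketched in plain Python with itertools.combinations. The sample rows are hypothetical, chosen only to reproduce the output above; unlike combn(), a group with fewer than two values simply contributes no pairs instead of raising an error.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input rows (col1, col2), chosen to match the output above
rows = [(1, "A"), (1, "B"), (1, "C"), (2, "E"), (2, "F")]

# Collect col2 values per col1 group, preserving their order of appearance
groups = defaultdict(list)
for key, value in rows:
    groups[key].append(value)

# Every unordered pair within each group; a group with fewer than two
# values yields no pairs
pairs = [
    (key, a, b)
    for key, values in groups.items()
    for a, b in combinations(values, 2)
]

for key, a, b in pairs:
    print(key, a, b)
```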

How to get all possible combinations of columns from a data frame?

I have written a function for this, which comes in handy whenever I need it:

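# Return a list whose k-th element holds every (k + 1)-way combination of x:
# element 1 = all pairs, element 2 = all triples, ..., last = all of x
make_combinations <- function(x) {
  l <- length(x)
  # seq_len(l)[-1] equals 2:l, but is empty when l < 2 (2:l would count down)
  lapply(seq_len(l)[-1], function(y) {
    combn(x, y, simplify = FALSE)
  })
}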

results <- make_combinations(colnames(data))
results[[1]]
# [[1]]
# [1] "s1" "s2"
#
# [[2]]
# [1] "s1" "s3"
#
# [[3]]
# [1] "s1" "s4"
#
# [[4]]
# [1] "s1" "s5"
#
# [[5]]
# [1] "s1" "s6"
#
# [[6]]
# [1] "s1" "s7"
#and so on...

The function returns a list in which each element is itself a list: results[[1]] holds all the 2-way combinations, results[[2]] all the 3-way ones, and so on. With your 10 columns it has 9 elements, from the 2-way combinations up to the single 10-way combination.
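A Python counterpart can be sketched with itertools.combinations; the function name is reused purely for illustration, and the column names s1 to s10 are hypothetical, matching the output above. It also confirms the count: 10 columns give 9 elements.

```python
from itertools import combinations


def make_combinations(columns):
    """Return a list whose element i holds every (i + 2)-way combination.

    Element 0 has the 2-way combinations, element 1 the 3-way ones, and so
    on up to the single combination containing all columns.
    """
    return [list(combinations(columns, r)) for r in range(2, len(columns) + 1)]


# Hypothetical column names s1..s10
cols = [f"s{i}" for i in range(1, 11)]
results = make_combinations(cols)

print(len(results))               # 9: the 2-way through 10-way combinations
print(results[0][:3])             # first few 2-way combinations
print([len(r) for r in results])  # how many combinations of each size
```

Unlike 2:l in R, range(2, len(columns) + 1) is simply empty when there are fewer than two columns, so the function returns an empty list in that case.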

Create column from all possible combinations of two columns in dataframe based on groupby in Python

This is close to what you need: it creates all pairwise combinations within each group, and when a group has only one element it creates a tuple with that value repeated:

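from itertools import combinations

# Per (id, group): all 2-combinations of 'log'; a single-value group
# gets one tuple with its value repeated, e.g. (15.0, 15.0)
df = (df.groupby(['id','group'])['log']
        .apply(lambda x: list(combinations(x, 2)) if len(x) > 1 else [(*x, *x)])
        .explode()                  # one row per tuple
        .reset_index(name='comb'))
print(df)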
id group comb
0 10 UU1Q (23.0, 12.0)
1 10 UU2Q (15.0, 15.0)
2 11 UU1Q (29.8, 33.0)
3 11 UU1Q (29.8, 44.0)
4 11 UU1Q (33.0, 44.0)
5 11 UU2Q (17.0, 17.0)
6 11 UU3Q (35.6, 35.6)
7 13 UU2Q (17.77, 19.9)
8 13 UU2Q (17.77, 55.0)
9 13 UU2Q (19.9, 55.0)
10 14 UU3Q (33.0, 33.0)
11 15 UU3Q (22.0, 22.0)
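To check the per-group rule in isolation, the lambda can be pulled out into a plain function. The name pairs_or_self is hypothetical, and the inputs are values read off the output above rather than the asker's full data.

```python
from itertools import combinations


def pairs_or_self(values):
    """All 2-combinations of values, or a single (v, v) tuple when there is
    only one value, following the same rule as the lambda passed to .apply."""
    values = list(values)
    if len(values) > 1:
        return list(combinations(values, 2))
    return [(*values, *values)]


# Sample groups taken from the output above (id 11: UU1Q and UU2Q)
print(pairs_or_self([29.8, 33.0, 44.0]))
print(pairs_or_self([17.0]))
```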

Alternatively, you can build a same-value tuple from the first row of each ['id','group'] group and concatenate it with DataFrame df1, which holds the combinations:

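import pandas as pd
from itertools import combinations

# All 2-combinations per (id, group); explode() leaves NaN for
# single-value groups (empty lists), which dropna() removes
df1 = (df.groupby(['id','group'])['log']
         .apply(lambda x: list(combinations(x, 2)))
         .explode()
         .dropna()
         .reset_index(name='comb'))

# First row of each (id, group), turned into a same-value tuple
df2 = df.groupby(['id','group']).head(1).copy()
df2['comb'] = df2.pop('log').map(lambda x: (x, x))

# Stack both frames and sort by (id, group)
df = pd.concat([df2, df1]).sort_values(['id','group'], ignore_index=True)
print(df)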
id group comb
0 10 UU1Q (23.0, 23.0)
1 10 UU1Q (23.0, 12.0)
2 10 UU2Q (15.0, 15.0)
3 11 UU1Q (29.8, 29.8)
4 11 UU1Q (29.8, 33.0)
5 11 UU1Q (29.8, 44.0)
6 11 UU1Q (33.0, 44.0)
7 11 UU2Q (17.0, 17.0)
8 11 UU3Q (35.6, 35.6)
9 13 UU2Q (17.77, 17.77)
10 13 UU2Q (17.77, 19.9)
11 13 UU2Q (17.77, 55.0)
12 13 UU2Q (19.9, 55.0)
13 14 UU3Q (33.0, 33.0)
14 15 UU3Q (22.0, 22.0)
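Per group, this second approach amounts to a (first, first) tuple followed by all 2-combinations, so, unlike the first approach, groups with several values also get a self-pair (compare (23.0, 23.0) above). A minimal sketch of that rule on values from the output; the function name is hypothetical.

```python
from itertools import combinations


def self_pair_then_combinations(values):
    """Mimic the concat approach for one group: a (first, first) tuple,
    followed by every 2-combination of the group's values."""
    values = list(values)
    return [(values[0], values[0])] + list(combinations(values, 2))


print(self_pair_then_combinations([23.0, 12.0]))        # id 10, UU1Q
print(self_pair_then_combinations([15.0]))              # id 10, UU2Q
print(self_pair_then_combinations([17.77, 19.9, 55.0])) # id 13, UU2Q
```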

How to create multiple combinations of columns from a pandas dataframe?

If you can separate the column names into lists according to their type, the problem becomes one of finding the Cartesian product of these lists. Once you have the Cartesian product, you can iterate over it and select the DataFrame columns for each combination of column names; there are (3 choose 1) * 1 * (2 choose 1) * (2 choose 1) * (4 choose 1) * 1 = 48 of them.

import pandas as pd

# Column names grouped by type; one column is picked from each group
A_cols = ['A1','A2','A3']
B_cols = ['B1']
C_cols = ['C1','C2']
D_cols = ['D1','D2']
E_cols = ['E1','E2','E3','E4']
F_cols = ['F1']

# column_combos has length 3 * 1 * 2 * 2 * 4 * 1 = 48
column_combos = pd.MultiIndex.from_product([A_cols,B_cols,C_cols,D_cols,E_cols,F_cols])
# out is a dictionary of 48 DataFrames, keyed like 'A1;B1;C1;D1;E1;F1'
out = {';'.join(cols): df[[*cols]] for cols in column_combos}
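The MultiIndex is used here only as a Cartesian-product generator; itertools.product from the standard library yields the same combinations without pandas. A sketch that also checks the count of 48; each tuple could then be used as df[list(cols)] in the same way as the dictionary comprehension above.

```python
from itertools import product
from math import prod

# Same column groups as above
groups = [
    ['A1', 'A2', 'A3'],
    ['B1'],
    ['C1', 'C2'],
    ['D1', 'D2'],
    ['E1', 'E2', 'E3', 'E4'],
    ['F1'],
]

# One column picked from each group: the Cartesian product of the lists
column_combos = list(product(*groups))

# Its size is the product of the group sizes: 3 * 1 * 2 * 2 * 4 * 1
assert len(column_combos) == prod(len(g) for g in groups) == 48

print(column_combos[0])
print(';'.join(column_combos[-1]))
```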

