How to get all combinations of 2 from a grouped column in a data frame
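You can group by col1 and build the pairs with combn():
library(dplyr)

# t(combn(col2, 2)) gives one row per pair, stored as a matrix column
data <- input %>%
  group_by(col1) %>%
  summarise(col2 = t(combn(col2, 2)))

# Split the matrix column into separate columns X1 and X2
cbind(data[1], data.frame(data$col2))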
#    col1 X1    X2
#   <dbl> <chr> <chr>
# 1     1 A     B
# 2     1 A     C
# 3     1 B     C
# 4     2 E     F
How to get all possible combinations of columns from a data frame?
I have made a function to do this which comes in handy whenever I need it:
make_combinations <- function(x) {
  l <- length(x)
  # For each size y from 2 to l, collect all y-way combinations of x
  mylist <- lapply(2:l, function(y) {
    combn(x, y, simplify = FALSE)
  })
  mylist
}
results <- make_combinations(colnames(data))
results[[1]]
# [[1]]
# [1] "s1" "s2"
#
# [[2]]
# [1] "s1" "s3"
#
# [[3]]
# [1] "s1" "s4"
#
# [[4]]
# [1] "s1" "s5"
#
# [[5]]
# [1] "s1" "s6"
#
# [[6]]
# [1] "s1" "s7"
#and so on...
The function returns a list in which each element is itself a list holding all the 2-way, 3-way, 4-way, ... combinations. In your case it has 9 elements, from the 2-way combinations up to the single 10-way combination.
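For readers working in Python rather than R, the same idea can be sketched with itertools.combinations. This is only an illustration, not the answer's own code: the name make_combinations is reused for readability, and the ten column names s1 to s10 are an assumption (only s1 to s7 appear in the output above).

```python
from itertools import combinations


def make_combinations(items):
    """Return a list whose k-th element holds all (k + 2)-way combinations."""
    items = list(items)
    # range(2, len(items) + 1) is empty for fewer than two items,
    # so the result is an empty list in that case.
    return [list(combinations(items, k)) for k in range(2, len(items) + 1)]


# Hypothetical column names, assumed for illustration only.
columns = [f"s{i}" for i in range(1, 11)]
results = make_combinations(columns)

print(len(results))    # how many sizes are covered (2-way up to 10-way)
print(results[0][:3])  # first few 2-way combinations
```

With ten names the outer list has nine elements, matching the description above; the last element holds the single 10-way combination.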
Create a column from all possible combinations of two columns in a DataFrame based on groupby in Python
This is close to what you need: it adds all pairwise combinations, and if a group has only one element it creates a tuple that repeats that value:
from itertools import combinations

df = (df.groupby(['id','group'])['log']
        .apply(lambda x: list(combinations(x, 2)) if len(x) > 1 else [(*x, *x)])
        .explode()
        .reset_index(name='comb'))
print(df)
    id group           comb
0   10  UU1Q   (23.0, 12.0)
1   10  UU2Q   (15.0, 15.0)
2   11  UU1Q   (29.8, 33.0)
3   11  UU1Q   (29.8, 44.0)
4   11  UU1Q   (33.0, 44.0)
5   11  UU2Q   (17.0, 17.0)
6   11  UU3Q   (35.6, 35.6)
7   13  UU2Q  (17.77, 19.9)
8   13  UU2Q  (17.77, 55.0)
9   13  UU2Q   (19.9, 55.0)
10  14  UU3Q   (33.0, 33.0)
11  15  UU3Q   (22.0, 22.0)
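To try this approach end to end, here is a small self-contained sketch. The input frame is a hypothetical sample (the original input is not shown), the column names id, group, and log are taken from the code above, and pairs_or_self is an illustrative helper name standing in for the lambda.

```python
from itertools import combinations

import pandas as pd

# Hypothetical sample input; the original data is not shown in the answer.
df = pd.DataFrame({
    "id":    [10, 10, 10, 11, 11, 11],
    "group": ["UU1Q", "UU1Q", "UU2Q", "UU1Q", "UU1Q", "UU1Q"],
    "log":   [23.0, 12.0, 15.0, 29.8, 33.0, 44.0],
})


def pairs_or_self(values):
    """All 2-way combinations, or one same-value tuple for a one-row group."""
    values = list(values)
    if len(values) > 1:
        return list(combinations(values, 2))
    return [(values[0], values[0])]


out = (df.groupby(["id", "group"])["log"]
         .apply(pairs_or_self)   # one list of tuples per group
         .explode()              # one row per tuple
         .reset_index(name="comb"))

print(out)
```

Naming the helper instead of using a lambda makes the one-row special case easier to read and to test on its own.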
Alternatively, you can create same-value tuples from the first row of each ['id','group'] group and concatenate them with df1, which holds the combinations:
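from itertools import combinations
import pandas as pd

# All pairwise combinations per group; one-row groups yield an empty list,
# which explode() turns into NaN and dropna() removes
df1 = (df.groupby(['id','group'])['log']
         .apply(lambda x: list(combinations(x, 2)))
         .explode()
         .dropna()
         .reset_index(name='comb'))

# First row of each group, turned into a same-value tuple
df2 = df.groupby(['id','group']).head(1).copy()
df2['comb'] = df2.pop('log').map(lambda x: (x,x))

df = pd.concat([df2, df1]).sort_values(['id','group'], ignore_index=True)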
print(df)
    id group            comb
0   10  UU1Q    (23.0, 23.0)
1   10  UU1Q    (23.0, 12.0)
2   10  UU2Q    (15.0, 15.0)
3   11  UU1Q    (29.8, 29.8)
4   11  UU1Q    (29.8, 33.0)
5   11  UU1Q    (29.8, 44.0)
6   11  UU1Q    (33.0, 44.0)
7   11  UU2Q    (17.0, 17.0)
8   11  UU3Q    (35.6, 35.6)
9   13  UU2Q  (17.77, 17.77)
10  13  UU2Q   (17.77, 19.9)
11  13  UU2Q   (17.77, 55.0)
12  13  UU2Q    (19.9, 55.0)
13  14  UU3Q    (33.0, 33.0)
14  15  UU3Q    (22.0, 22.0)
How to create multiple combinations of columns from a pandas dataframe?
If you can separate the column names into lists by type, the problem reduces to finding the Cartesian product of those lists. Once you have the Cartesian product, you can iterate over it and select from your DataFrame with each combination of column names (there are (3 choose 1) * 1 * (2 choose 1) * (2 choose 1) * (4 choose 1) * 1 = 48 of them).
import pandas as pd

A_cols = ['A1','A2','A3']
B_cols = ['B1']
C_cols = ['C1','C2']
D_cols = ['D1','D2']
E_cols = ['E1','E2','E3','E4']
F_cols = ['F1']
# column_combos is length 48; iterating over it yields tuples of column names
column_combos = pd.MultiIndex.from_product([A_cols,B_cols,C_cols,D_cols,E_cols,F_cols])
# out is a dictionary of 48 DataFrames, keyed by the joined column names
out = {';'.join(cols): df[[*cols]] for cols in column_combos}
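As a quick check of the count, the same Cartesian product can also be built with itertools.product from the standard library. The sketch below swaps that in for pd.MultiIndex.from_product and uses a hypothetical one-row DataFrame purely for illustration; the names column_groups and all_cols are not from the original answer.

```python
from itertools import product

import pandas as pd

column_groups = [
    ['A1', 'A2', 'A3'],
    ['B1'],
    ['C1', 'C2'],
    ['D1', 'D2'],
    ['E1', 'E2', 'E3', 'E4'],
    ['F1'],
]

# Hypothetical data: a single row where each column holds its position.
all_cols = [col for group in column_groups for col in group]
df = pd.DataFrame([list(range(len(all_cols)))], columns=all_cols)

# One sub-DataFrame per combination, keyed by the joined column names.
out = {';'.join(cols): df[list(cols)] for cols in product(*column_groups)}

print(len(out))
print(next(iter(out)))
```

Each key picks exactly one column from every group, so the dictionary size equals the product of the group sizes.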