Finding unique combinations irrespective of position
Maybe something like this:
indx <- !duplicated(t(apply(df, 1, sort))) # flags non-duplicates among the sorted rows
df[indx, ] # keeps only the non-duplicate rows according to that index
# a b c
# 1 1 2 3
# 3 3 1 4
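For comparison, the same sort-each-row-then-deduplicate idea can be sketched in pandas (the data frame here is hypothetical, reconstructed from the printed output above):

```python
import pandas as pd

# Hypothetical frame mirroring the R example's rows
df = pd.DataFrame({"a": [1, 3, 3], "b": [2, 2, 1], "c": [3, 1, 4]})

# Sort each row's values, then keep only the first occurrence of each sorted row
sorted_rows = df.apply(lambda row: tuple(sorted(row)), axis=1)
result = df[~sorted_rows.duplicated()]
print(result)
#    a  b  c
# 0  1  2  3
# 2  3  1  4
```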
Get Unique List of Combinations of Strings, regardless of Order
df$grp <- interaction(do.call(pmin, df[1:2]), do.call(pmax, df[1:2]))
df
# col1 col2 grp
# 1 a b a.b
# 2 c d c.d
# 3 g h g.h
# 4 d c c.d
# 5 e f e.f
# 6 b a a.b
# 7 f e e.f
# 8 h g g.h
If you want numbers, you can then do
df$grp <- as.integer(df$grp)
df
# col1 col2 grp
# 1 a b 1
# 2 c d 6
# 3 g h 16
# 4 d c 6
# 5 e f 11
# 6 b a 1
# 7 f e 11
# 8 h g 16
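The pmin/pmax trick translates directly to Python by taking the elementwise min and max of the two columns; a pandas sketch with hypothetical data matching the example above (note the integer codes are assigned by first appearance, so they differ from R's factor levels, but duplicate pairs still share a code):

```python
import pandas as pd

# Hypothetical frame matching the printed example
df = pd.DataFrame({"col1": list("acgdebfh"), "col2": list("bdhcfaeg")})

# Pairwise sort plays the role of pmin/pmax: the smaller value first
pairs = [tuple(sorted(p)) for p in zip(df["col1"], df["col2"])]
df["grp"] = [f"{a}.{b}" for a, b in pairs]

# Integer codes, analogous to as.integer() on the factor
df["code"] = pd.factorize(df["grp"])[0]
print(df)
```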
Count unique combinations regardless of column order
Another solution, using .groupby:
x = (
df1.groupby(df1.apply(lambda x: tuple(sorted(x)), axis=1))
.agg(A=("A", "first"), B=("B", "first"), count=("B", "size"))
.reset_index(drop=True)
)
print(x)
Prints:
A B count
0 cat bunny 1
1 bunny mouse 2
2 dog cat 3
3 mouse dog 1
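If you only need the counts and not the first-seen A/B columns, the same sorted-tuple key works with value_counts alone; a sketch with hypothetical data chosen to reproduce the counts printed above:

```python
import pandas as pd

# Hypothetical input reconstructed from the printed counts
df1 = pd.DataFrame({
    "A": ["cat", "bunny", "dog", "dog", "mouse", "bunny", "dog"],
    "B": ["bunny", "mouse", "cat", "cat", "dog", "mouse", "cat"],
})

# Count each order-insensitive pair directly
counts = df1.apply(lambda r: tuple(sorted(r)), axis=1).value_counts()
print(counts)
```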
Counting unique combinations of values across multiple columns regardless of order?
Assuming the character / doesn't show up in any of the offer names, you can do:
select count(distinct offer_combo) as distinct_offers
from (
select listagg(offer, '/') within group (order by offer) as offer_combo
from (
select customer_id, offer_1 as offer from t
union all select customer_id, offer_2 from t
union all select customer_id, offer_3 from t
) x
group by customer_id
) y
Result:
DISTINCT_OFFERS
---------------
2
See running example at db<>fiddle.
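The same unpivot-then-aggregate logic can be sketched in pandas: melt replaces the UNION ALL step and a sorted join replaces LISTAGG. The table t below is hypothetical, invented for illustration (two customers share the same offer set, one differs):

```python
import pandas as pd

# Hypothetical table t with three offer columns per customer
t = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "offer_1": ["A", "B", "A"],
    "offer_2": ["B", "A", "B"],
    "offer_3": ["C", "C", "D"],
})

# Unpivot the offer columns (the UNION ALL step)
long = t.melt(id_vars="customer_id", value_name="offer")

# Build each customer's sorted offer string (the LISTAGG step)
combos = long.groupby("customer_id")["offer"].apply(lambda s: "/".join(sorted(s)))

# Count distinct combinations
print(combos.nunique())
```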
Get unique combinations of elements from a python list
You need itertools.combinations:
>>> from itertools import combinations
>>> L = [1, 2, 3, 4]
>>> [",".join(map(str, comb)) for comb in combinations(L, 3)]
['1,2,3', '1,2,4', '1,3,4', '2,3,4']
How to get every unique digit combination regardless of digit placement
Here's some code I wrote once:
function kPn(k, values, repetition) {
  var retVal = [];
  // `values` may be an array of items or just a count n
  var n = Array.isArray(values) ? values.length : values;
  var list = [];
  for (var i = 0; i < n; i++) {
    list.push(i);
    retVal.push([i]);
  }
  // grow the arrangements one position at a time up to length k
  for (var i = 2; i <= k; i++) {
    var tempRetVal = [];
    for (var rv = 0; rv < retVal.length; rv++)
      for (var l = 0; l < list.length; l++) {
        if (repetition || !retVal[rv].includes(list[l])) {
          var retValItem = retVal[rv].slice();
          retValItem.push(list[l]);
          tempRetVal.push(retValItem);
        }
      }
    retVal = tempRetVal;
  }
  if (!Array.isArray(values)) values = list;
  // map index permutations back to the actual values
  var permutations = retVal;
  retVal = [];
  for (var i = 0; i < permutations.length; i++) {
    var tempSet = [];
    for (var j = 0; j < permutations[i].length; j++)
      tempSet.push(values[permutations[i][j]]);
    retVal.push(tempSet);
  }
  return retVal;
}
k: how many values you want,
values: array of values, and
repetition: true|false.
example:
kPn(3, ["a","b","c"], false);
returns:
[["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"],
 ["b", "c", "a"], ["c", "a", "b"], ["c", "b", "a"]]
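For comparison, the same k-length arrangements (with or without repetition) are available directly from the standard library in Python; a sketch, not part of the original answer:

```python
from itertools import permutations, product

def kPn(k, values, repetition):
    # k-length arrangements of `values`, with or without repeated elements
    if repetition:
        return [list(p) for p in product(values, repeat=k)]
    return [list(p) for p in permutations(values, k)]

print(kPn(3, ["a", "b", "c"], False))
# [['a', 'b', 'c'], ['a', 'c', 'b'], ['b', 'a', 'c'],
#  ['b', 'c', 'a'], ['c', 'a', 'b'], ['c', 'b', 'a']]
```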
Creating a df of unique combinations of columns in R where order doesn't matter
A base R method is to create all combinations of political_spectrum_values taken 3 at a time using expand.grid, sort them by row, and select the unique rows.
df <- expand.grid(first_person = political_spectrum_values,
second_person = political_spectrum_values,
third_person = political_spectrum_values)
df[] <- t(apply(df, 1, sort))
unique(df)
If needed as a single string:
unique(apply(df, 1, function(x) paste0(sort(x), collapse = "_")))
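In Python, itertools.combinations_with_replacement produces exactly these unique sorted triples directly, with no need to generate the full Cartesian product first; a sketch with a hypothetical stand-in for political_spectrum_values:

```python
from itertools import combinations_with_replacement

# Hypothetical values standing in for political_spectrum_values
political_spectrum_values = ["left", "centre", "right"]

# Unique 3-person combinations where order doesn't matter, repeats allowed
combos = list(combinations_with_replacement(political_spectrum_values, 3))

# As single underscore-joined strings, like the R one-liner
strings = ["_".join(c) for c in combos]
print(strings)
```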
Create unique combinations regardless of subset size
You could use a recursive function to "brute force" the packing combinations and get the best fit out of those:
def pack(sizes,bound,subset=[]):
if not sizes: # all sizes used
yield [subset] # return current subset
return
if sizes and not subset: # start new subset
i,m = max(enumerate(sizes),key=lambda s:s[1])
subset = [m] # using largest size
sizes = sizes[:i]+sizes[i+1:] # (to avoid repeats)
used = sum(subset)
for i,size in enumerate(sizes): # add to current subset
if subset and size>subset[-1]: # non-increasing order
continue # (to avoid repeats)
if used + size <= bound:
yield from pack(sizes[:i]+sizes[i+1:],bound,subset+[size])
if sizes:
for p in pack(sizes,bound): # add more subsets
yield [subset,*p]
def bestFit(sizes,bound):
packs = pack(sizes,bound)
return min(packs,key = lambda p : bound*len(p)-sum(sizes))
output:
for p in pack([1,2,3,4,5],8):
print(p,8*len(p)-sum(map(sum,p)))
[[5, 1], [4], [3, 2]] 9
[[5, 2, 1], [4, 3]] 1
[[5, 2], [4, 3, 1]] 1
[[5, 2], [4], [3, 1]] 9
[[5, 3], [4, 2, 1]] 1
[[5, 3], [4], [2, 1]] 9
[[5], [4, 1], [3, 2]] 9
[[5], [4, 2], [3, 1]] 9
[[5], [4, 3], [2, 1]] 9
[[5], [4], [3, 2, 1]] 9
[[5], [4], [3], [2, 1]] 17
print(*bestFit([1,2,3,4,5],8))
# [5, 2, 1] [4, 3]
print(*bestFit([1,2,3,4,5,6,7,8,9],18))
# [9, 1] [8, 4, 3, 2] [7, 6, 5]
This will take exponentially longer as your list of sizes gets larger, but it may be enough if you only have very small inputs.