Create unique identifier from the interchangeable combination of two variables
You could do:
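# sort each row so that (a, b) and (b, a) become the same pair (apply returns one column per row of df)
labels <- apply(df[, c("col1", "col2")], 1, sort)
# paste each sorted pair into a key and number the keys; the separator keeps
# e.g. "ab" + "c" and "a" + "bc" from producing the same key
df$id <- as.numeric(factor(apply(labels, 2, function(x) paste(x, collapse = "_"))))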
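To make the idea concrete outside R, here is a minimal Python sketch of the same technique. It assumes the two columns arrive as plain lists; pair_ids, col1 and col2 are illustrative names, and the letters are the example data used in the next answer. Each pair is sorted so that (a, b) and (b, a) give the same key, and the distinct keys are numbered in sorted order, much as factor() numbers its levels.

```python
# Sketch: order-insensitive integer IDs for pairs of values.
# Assumes two parallel lists; the names col1/col2 mirror the R example.

def pair_ids(col1, col2):
    # Sort each pair so that (a, b) and (b, a) yield the same key.
    keys = [tuple(sorted(pair)) for pair in zip(col1, col2)]
    # Number the distinct keys in sorted order, as factor() numbers its levels.
    levels = sorted(set(keys))
    code = {key: i + 1 for i, key in enumerate(levels)}  # 1-based, like R
    return [code[key] for key in keys]

col1 = ["a", "c", "g", "d", "e", "b", "f", "h"]
col2 = ["b", "d", "h", "c", "f", "a", "e", "g"]
print(pair_ids(col1, col2))  # [1, 2, 4, 2, 3, 1, 3, 4]
```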
Get Unique List of Combinations of Strings, regardless of Order
df$grp <- interaction(do.call(pmin, df[1:2]), do.call(pmax, df[1:2]))
df
#   col1 col2 grp
# 1    a    b a.b
# 2    c    d c.d
# 3    g    h g.h
# 4    d    c c.d
# 5    e    f e.f
# 6    b    a a.b
# 7    f    e e.f
# 8    h    g g.h
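If you want numbers, you can then take the integer codes of the factor. (These are positions among all possible level combinations, so they need not be consecutive; add drop = TRUE to the interaction() call if you want consecutive codes.)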
df$grp <- as.integer(df$grp)
df
#   col1 col2 grp
# 1    a    b   1
# 2    c    d   6
# 3    g    h  16
# 4    d    c   6
# 5    e    f  11
# 6    b    a   1
# 7    f    e  11
# 8    h    g  16
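Why 1, 6, 11 and 16? By default interaction() creates every combination of the two factors' levels, with the first factor varying fastest, and as.integer() returns each row's position in that full list. The Python sketch below reproduces that arithmetic on the same letters; interaction_codes is a hypothetical name, and this illustrates the numbering rather than R's actual implementation.

```python
# Sketch of how interaction(f1, f2) (default lex.order = FALSE) numbers
# combinations: code = index(f1 level) + n_f1_levels * index(f2 level) + 1.

def interaction_codes(col1, col2):
    lo = [min(a, b) for a, b in zip(col1, col2)]  # like pmin()
    hi = [max(a, b) for a, b in zip(col1, col2)]  # like pmax()
    lo_levels = sorted(set(lo))  # factor levels are the sorted unique values
    hi_levels = sorted(set(hi))
    n_lo = len(lo_levels)
    # The first factor varies fastest, so the second factor's index is scaled by n_lo.
    return [lo_levels.index(x) + n_lo * hi_levels.index(y) + 1
            for x, y in zip(lo, hi)]

col1 = ["a", "c", "g", "d", "e", "b", "f", "h"]
col2 = ["b", "d", "h", "c", "f", "a", "e", "g"]
print(interaction_codes(col1, col2))  # [1, 6, 16, 6, 11, 1, 11, 16]
```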
R - make a unique list of two columns interchangeably
Here is a base R way.
inx <- !duplicated(t(apply(df, 1, sort)))
df[inx, ]
One-liner:
df[!duplicated(t(apply(df, 1, sort))), ]
# col1 col2
#1 a 1
#3 bar foo
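The same keep-the-first-occurrence logic can be sketched in Python, assuming rows are tuples (the rows below are hypothetical, chosen to mirror the printed output): a row is kept only if its sorted version has not been seen before.

```python
# Sketch: keep the first row of each unordered pair, like
# df[!duplicated(t(apply(df, 1, sort))), ] in R. Rows are assumed to be tuples.

def unique_unordered(rows):
    seen = set()
    kept = []
    for row in rows:
        key = tuple(sorted(row))  # same key for (x, y) and (y, x)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

rows = [("a", "1"), ("1", "a"), ("bar", "foo"), ("foo", "bar")]
print(unique_unordered(rows))  # [('a', '1'), ('bar', 'foo')]
```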
SQL to select unique combination of two columns having interchangeable values
Something like this usually works:
Select distinct case when dep < arr then dep else arr end as col1,
case when dep < arr then arr else dep end as col2
From flights
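A quick way to check the query is to run it against an in-memory SQLite database from Python. The flights rows are hypothetical sample data, and the ORDER BY is an addition so that the output is deterministic.

```python
# Sketch: run the CASE-based query against an in-memory SQLite table.
# The flights table and its rows are hypothetical sample data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (dep TEXT, arr TEXT)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [("LAX", "JFK"), ("JFK", "LAX"), ("SFO", "ORD"), ("ORD", "SFO"), ("BOS", "SEA")],
)

query = """
SELECT DISTINCT
       CASE WHEN dep < arr THEN dep ELSE arr END AS col1,
       CASE WHEN dep < arr THEN arr ELSE dep END AS col2
FROM flights
ORDER BY col1, col2
"""
pairs = conn.execute(query).fetchall()
print(pairs)  # [('BOS', 'SEA'), ('JFK', 'LAX'), ('ORD', 'SFO')]
conn.close()
```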
is there a way to group by two variables which interchange in R
We can create two new variables based on pmin/pmax and use them to get the group_indices:
library(dplyr)
df %>%
  mutate(ID_new = pmin(ID, ID2), ID2_new = pmax(ID, ID2)) %>%
  mutate(group = group_indices(., ID_new, ID2_new)) %>%
  select(-ends_with('new'))
# ID ID2 group
#1 102 167 1
#2 102 167 1
#3 167 102 1
#4 143 148 2
#5 143 148 2
#6 148 143 2
#7 148 143 2
In dplyr 1.0.0 and later (the devel version at the time of writing), we can use cur_group_id() after creating a group:
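library(dplyr)
library(stringr)
df %>%
  # the separator keeps pairs such as (12, 34) and (1, 234) from producing the same key
  group_by(grp = str_c(pmin(ID, ID2), pmax(ID, ID2), sep = "_")) %>%
  mutate(group = cur_group_id()) %>%
  ungroup() %>%
  select(-grp)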
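Pasting the two IDs together without a separator, as a plain str_c(pmin(ID, ID2), pmax(ID, ID2)) does, can make different pairs collide: (12, 34) and (1, 234) both become "1234". The Python sketch below (group_ids and concat_key are hypothetical helpers) shows the collision and numbers groups by a (min, max) tuple key instead, assigning ids in sorted key order, as dplyr does for its group ids. The IDs are those from the output above.

```python
# Sketch: order-insensitive group numbers using a tuple key instead of a
# pasted string. group_ids and concat_key are hypothetical helpers.

def concat_key(a, b):
    # Mirrors str_c(pmin(a, b), pmax(a, b)) with no separator.
    return f"{min(a, b)}{max(a, b)}"

def group_ids(ids, ids2):
    keys = [(min(a, b), max(a, b)) for a, b in zip(ids, ids2)]
    ordered = sorted(set(keys))  # number groups in sorted key order, like dplyr
    number = {key: i + 1 for i, key in enumerate(ordered)}
    return [number[key] for key in keys]

print(concat_key(12, 34), concat_key(1, 234))  # 1234 1234 -> collision

ID = [102, 102, 167, 143, 143, 148, 148]
ID2 = [167, 167, 102, 148, 148, 143, 143]
print(group_ids(ID, ID2))  # [1, 1, 1, 2, 2, 2, 2]
```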
PostgreSQL enforce unique two-way combination of columns
A variation on Neil's solution which doesn't need an extension is:
create table friendz (
from_id int,
to_id int
);
create unique index ifriendz on friendz(greatest(from_id,to_id), least(from_id,to_id));
Neil's solution lets you use an arbitrary number of columns though.
We're both relying on using expressions to build the index, which is documented at
https://www.postgresql.org/docs/current/indexes-expressional.html
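The same behaviour can be tried out in SQLite from Python. This is an analogue rather than PostgreSQL itself: it assumes SQLite's two-argument scalar max() and min() stand in for greatest() and least(), and it needs a SQLite build with expression index support (3.9.0 or later).

```python
# Sketch: SQLite analogue of the PostgreSQL expression index above.
# Assumption: two-argument scalar max()/min() stand in for greatest()/least().
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friendz (from_id INTEGER, to_id INTEGER)")
conn.execute(
    "CREATE UNIQUE INDEX ifriendz ON friendz (max(from_id, to_id), min(from_id, to_id))"
)

conn.execute("INSERT INTO friendz VALUES (1, 2)")  # accepted
try:
    conn.execute("INSERT INTO friendz VALUES (2, 1)")  # same pair, reversed
    reversed_rejected = False
except sqlite3.IntegrityError:
    reversed_rejected = True  # the unique index sees (2, 1) as a duplicate

print(reversed_rejected)  # True
conn.close()
```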
Add unique constraint to combination of two columns
Once you have removed your duplicate(s):
ALTER TABLE dbo.yourtablename
ADD CONSTRAINT uq_yourtablename UNIQUE(column1, column2);
or
CREATE UNIQUE INDEX uq_yourtablename
ON dbo.yourtablename(column1, column2);
Of course, it can often be better to check for this violation first, rather than letting SQL Server try to insert the row and return an exception (exceptions are expensive). See:
- Performance impact of different error handling techniques
- Checking for potential constraint violations before entering TRY/CATCH
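The check-first approach can be sketched with an INSERT ... SELECT ... WHERE NOT EXISTS statement. This sketch uses SQLite via Python rather than SQL Server; insert_if_absent is a hypothetical helper, and the table and column names are the placeholders used above. Keep the unique constraint as well: with concurrent writers, a check-then-insert can still race.

```python
# Sketch: check for a duplicate before inserting, instead of relying on the
# constraint error. SQLite stands in for SQL Server here.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtablename (column1 INTEGER, column2 INTEGER, "
    "UNIQUE (column1, column2))"
)

def insert_if_absent(conn, c1, c2):
    # Insert only when no row with this (column1, column2) combination exists;
    # returns True if a row was inserted.
    cur = conn.execute(
        "INSERT INTO yourtablename (column1, column2) "
        "SELECT ?, ? WHERE NOT EXISTS ("
        "  SELECT 1 FROM yourtablename WHERE column1 = ? AND column2 = ?)",
        (c1, c2, c1, c2),
    )
    return cur.rowcount == 1

results = [
    insert_if_absent(conn, 22, 1),  # new combination: inserted
    insert_if_absent(conn, 22, 1),  # duplicate: skipped, no exception
    insert_if_absent(conn, 22, 0),  # different combination: inserted
]
print(results)  # [True, False, True]
```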
If you want to prevent exceptions from bubbling up to the application, without making changes to the application, you can use an INSTEAD OF trigger:
CREATE TRIGGER dbo.BlockDuplicatesYourTable
ON dbo.YourTable
INSTEAD OF INSERT
AS
BEGIN
  SET NOCOUNT ON;

  -- Note: this check covers the whole batch; if any inserted row
  -- duplicates an existing one, none of the rows are inserted.
  IF NOT EXISTS (SELECT 1 FROM inserted AS i
    INNER JOIN dbo.YourTable AS t
    ON i.column1 = t.column1
    AND i.column2 = t.column2
  )
  BEGIN
    INSERT dbo.YourTable(column1, column2, ...)
    SELECT column1, column2, ... FROM inserted;
  END
  ELSE
  BEGIN
    PRINT 'Did nothing.';
  END
END
GO
But if you don't tell the user they didn't perform the insert, they're going to wonder why the data isn't there and no exception was reported.
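A rough SQLite analogue of this trigger, sketched in Python, also illustrates the silent-skip problem. SQLite only allows INSTEAD OF triggers on views, so this sketch assumes a BEFORE INSERT trigger that calls RAISE(IGNORE) when the (column1, column2) combination already exists; the insert is dropped and no error reaches the caller.

```python
# Sketch: SQLite analogue of the T-SQL trigger, using BEFORE INSERT with
# RAISE(IGNORE) to silently skip an insert whose combination already exists.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE YourTable (column1 INTEGER, column2 INTEGER);
CREATE TRIGGER BlockDuplicatesYourTable
BEFORE INSERT ON YourTable
WHEN EXISTS (SELECT 1 FROM YourTable AS t
             WHERE t.column1 = NEW.column1 AND t.column2 = NEW.column2)
BEGIN
    SELECT RAISE(IGNORE);
END;
""")

conn.execute("INSERT INTO YourTable VALUES (1, 2)")
conn.execute("INSERT INTO YourTable VALUES (1, 2)")  # ignored, no error raised
rows = conn.execute("SELECT * FROM YourTable").fetchall()
print(rows)  # [(1, 2)]
```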
EDIT: Here is an example that does exactly what you're asking for, even using the same names as in your question, and proves it. You should try it out before assuming that the ideas above treat only one column or the other rather than the combination.
USE tempdb;
GO
CREATE TABLE dbo.Person
(
ID INT IDENTITY(1,1) PRIMARY KEY,
Name NVARCHAR(32),
Active BIT,
PersonNumber INT
);
GO
ALTER TABLE dbo.Person
ADD CONSTRAINT uq_Person UNIQUE(PersonNumber, Active);
GO
-- succeeds:
INSERT dbo.Person(Name, Active, PersonNumber)
VALUES(N'foo', 1, 22);
GO
-- succeeds:
INSERT dbo.Person(Name, Active, PersonNumber)
VALUES(N'foo', 0, 22);
GO
-- fails:
INSERT dbo.Person(Name, Active, PersonNumber)
VALUES(N'foo', 1, 22);
GO
Data in the table after all of this:
ID Name Active PersonNumber
---- ------ ------ ------------
1 foo 1 22
2 foo 0 22
Error message on the last insert:
Msg 2627, Level 14, State 1, Line 3
Violation of UNIQUE KEY constraint 'uq_Person'. Cannot insert duplicate key in object 'dbo.Person'.
The statement has been terminated.
I also blogged more recently about applying a unique constraint to two columns in either order:
- Enforce a Unique Constraint Where Order Does Not Matter
permutations with unique values
class unique_element:
    def __init__(self, value, occurrences):
        self.value = value
        self.occurrences = occurrences

def perm_unique(elements):
    eset = set(elements)
    listunique = [unique_element(i, elements.count(i)) for i in eset]
    u = len(elements)
    return perm_unique_helper(listunique, [0] * u, u - 1)

def perm_unique_helper(listunique, result_list, d):
    if d < 0:
        yield tuple(result_list)
    else:
        for i in listunique:
            if i.occurrences > 0:
                result_list[d] = i.value
                i.occurrences -= 1
                for g in perm_unique_helper(listunique, result_list, d - 1):
                    yield g
                i.occurrences += 1

a = list(perm_unique([1, 1, 2]))
print(a)
result:
[(2, 1, 1), (1, 2, 1), (1, 1, 2)]
EDIT (how this works):
I rewrote the above program to be longer but more readable.
I usually have a hard time explaining how something works, but let me try.
In order to understand how this works, you have to understand a similar but simpler program that would yield all permutations with repetitions.
def permutations_with_replacement(elements, n):
    return permutations_helper(elements, [0] * n, n - 1)  # this is a generator

def permutations_helper(elements, result_list, d):
    if d < 0:
        yield tuple(result_list)
    else:
        for i in elements:
            result_list[d] = i
            all_permutations = permutations_helper(elements, result_list, d - 1)  # this is a generator
            for g in all_permutations:
                yield g
This program is obviously much simpler:
d stands for depth in permutations_helper and serves two purposes: it drives the stopping condition of our recursive algorithm (recursion stops once d < 0), and it is the index into the result list that is passed around.
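Because d counts down, the helper fills the last slot in its outermost loop and slot 0 in its innermost loop. As a cross-check, the same sequence can be produced with itertools.product by reversing each tuple, since product varies its last position fastest. A sketch (permutations_with_replacement_product is a hypothetical name, not part of the original answer):

```python
# Sketch: an itertools-based equivalent of permutations_with_replacement.
from itertools import product

def permutations_with_replacement_product(elements, n):
    # product() varies the last position fastest; the recursive helper varies
    # position 0 fastest, so reversing each tuple reproduces its order.
    for p in product(elements, repeat=n):
        yield tuple(reversed(p))

print(list(permutations_with_replacement_product([1, 2], 2)))
# [(1, 1), (2, 1), (1, 2), (2, 2)]
```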
Instead of returning each result, we yield it. Without yield, we would have to push each result into some queue at the point of the stopping condition. With yield, once the stopping condition is met, the result is propagated through all the stack frames up to the caller. That is the purpose of
for g in perm_unique_helper(listunique,result_list,d-1): yield g
so that each result is passed up to the caller.
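In Python 3.3 and later, the for g in ...: yield g loop can be written as yield from, which delegates to the nested generator and, for plain iteration, produces the same values. The sketch below (paths and paths_yield_from are hypothetical examples) shows both forms passing results up through the recursion:

```python
# Sketch: a recursive generator that passes results up explicitly, and the
# same generator written with "yield from".

def paths(depth, prefix=()):
    if depth == 0:
        yield prefix  # stopping condition: a finished result
    else:
        for step in "LR":
            for g in paths(depth - 1, prefix + (step,)):
                yield g  # pass the result up one stack frame

def paths_yield_from(depth, prefix=()):
    if depth == 0:
        yield prefix
    else:
        for step in "LR":
            yield from paths_yield_from(depth - 1, prefix + (step,))

print(list(paths(2)))
# [('L', 'L'), ('L', 'R'), ('R', 'L'), ('R', 'R')]
print(list(paths(2)) == list(paths_yield_from(2)))  # True
```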
Back to the original program: we have a list of unique elements. Before we can use an element, we have to check how many copies of it are still available to put into result_list. Otherwise the program works very much like permutations_with_replacement. The difference is that each element cannot be used more times than it occurs in the input list; perm_unique_helper tracks this with each element's occurrences counter.
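The same bookkeeping can be sketched more compactly with collections.Counter holding the remaining copies of each value. This is a variant written for illustration, not the author's code; perm_unique_counter is a hypothetical name.

```python
# Sketch: perm_unique with a Counter tracking how many copies of each
# distinct value are still available.
from collections import Counter

def perm_unique_counter(elements):
    remaining = Counter(elements)
    n = len(elements)
    result = [None] * n

    def helper(d):
        if d < 0:
            yield tuple(result)  # every slot is filled
            return
        for value in list(remaining):
            if remaining[value] > 0:
                result[d] = value
                remaining[value] -= 1  # use one copy
                yield from helper(d - 1)
                remaining[value] += 1  # give it back before trying the next value

    return helper(n - 1)

print(list(perm_unique_counter([1, 1, 2])))
# [(2, 1, 1), (1, 2, 1), (1, 1, 2)]
```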