Making pairs of words based on one column
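This can be done in two steps: sort the file on its second column, then join the sorted file with itself on that column and let awk keep each unordered pair only once.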
$ sort -k2 file > file.s
$ join -j2 file.s{,} | awk '!(a[$2,$3]++ + a[$3,$2]++){print $2,$3,$1}'
A C ID.1
A D ID.1
C D ID.1
B E ID.2
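The same grouping idea can be sketched in Python. The input file is not shown above, so this sketch assumes it has two whitespace-separated columns (a word, then an ID); the function name `pairs_by_key` and the sample lines are hypothetical. Unlike the awk filter, which remembers pairs across the whole file, this version removes duplicates within each ID only.

```python
from collections import defaultdict
from itertools import combinations


def pairs_by_key(lines):
    """Return (word_a, word_b, key) for every unordered pair of words sharing a key."""
    groups = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, key = parts
        groups[key].append(word)

    result = []
    for key in sorted(groups):
        # sorted(set(...)) drops repeated words and fixes the order inside each pair
        for a, b in combinations(sorted(set(groups[key])), 2):
            result.append((a, b, key))
    return result


# Assumed sample input: word in column 1, ID in column 2
sample = ["A ID.1", "C ID.1", "D ID.1", "B ID.2", "E ID.2"]
for a, b, key in pairs_by_key(sample):
    print(a, b, key)
```

With the assumed sample, this prints the same four lines as the shell pipeline.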
How to select sub-strings based on the presence of word pairs? Python
I believe there is a bug in your code:
    else:
        return ''
This means that if the first comparison is not a match, 'func' returns immediately. That might be why the code does not return any matches.
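Working sample code is below:
# The original function seems to loop over all r's but only over the first b.
# Here every candidate in first_twos is checked before giving up.
def func(sentence, first_twos=b):
    for first_two in first_twos:
        if first_two in sentence:
            # Keep the sentence from the first matching phrase onwards
            s = sentence[sentence.index(first_two):]
            return s
    # No candidate was found in the sentence
    return ''
df['Segments'] = a.apply(func)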
And the output:
df:
{
'First2': ['can I', 'should it', 'what does'],
'Segments': ['what does it say? ', 'should it say more?', ''],
'Sentence': ['If this is a string what does it say? ', 'And this is a string, should it say more?', 'This is yet another string. ' ]
}
Combining columns and count combinations (pairs)
If we always write the letter that comes first in the alphabet first in the pair := assignment, the code will produce the desired result. We'll use ifelse() to decide whether to write V1 before V2, as follows.
library(data.table)
set.seed(126)
dt <- data.table(V1 = sample(LETTERS[1:4], 30, replace = TRUE),
                 V2 = sample(LETTERS[1:4], 30, replace = TRUE))
# Adjusted version where the first letter is always < the second letter
# Exclude rows with the same name
dt <- dt[V1 != V2]
# Create pairs by combining V1 and V2
dt[, pair := ifelse(V1 < V2, paste(V1, V2, sep = "_"), paste(V2, V1, sep = "_"))]
# Count the pairs
dt[, .N, by = .(pair)]
...and the output:
> # Count the pairs
> dt[, .N, by=.(pair)]
   pair N
1:  A_C 3
2:  B_C 9
3:  C_D 5
4:  A_B 4
5:  B_D 3
6:  A_D 1
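For comparison, the same "smaller letter first" trick can be sketched in plain Python with collections.Counter. The function name `count_unordered_pairs` and the sample rows are made up for illustration; they are not the random data generated above.

```python
from collections import Counter


def count_unordered_pairs(rows):
    """Count (V1, V2) rows so that A/C and C/A land in the same bucket."""
    counts = Counter()
    for v1, v2 in rows:
        if v1 == v2:
            continue  # exclude rows with the same name, as in dt[V1 != V2]
        first, second = sorted((v1, v2))  # alphabetically smaller value first
        counts[f"{first}_{second}"] += 1
    return counts


# Hypothetical sample rows
rows = [("A", "C"), ("C", "A"), ("B", "B"), ("D", "B"), ("B", "D"), ("B", "D")]
counts = count_unordered_pairs(rows)
for pair, n in counts.items():
    print(pair, n)
```

Sorting the two values before building the key plays the role of the ifelse() call: the order in which a pair was recorded no longer matters.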
Trying to match strings from multiple columns and create pair list where matches are found
Based on the update, we can split the column in 'df1', filter to the rows whose values all appear in df2$values, then create a sequence index and reshape to 'long' format.
library(dplyr)
library(tidyr)
df1 %>%
  separate(values, into = c('values1', 'values2')) %>%
  filter(if_all(everything(), ~ .x %in% df2$values)) %>%
  mutate(paired = row_number()) %>%
  pivot_longer(cols = -paired, values_to = 'value', names_to = NULL) %>%
  select(value, paired)
-output
# A tibble: 6 × 2
  value   paired
  <chr>    <int>
1 apples       1
2 x            1
3 oranges      2
4 z            2
5 bananas      3
6 y            3
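The split, filter, index, reshape sequence can also be sketched in plain Python. The contents of 'df1' and 'df2' are not shown here, so the sample lists, the whitespace separator and the name `paired_matches` are assumptions for illustration only (tidyr's separate() splits on any non-alphanumeric character by default, which is broader than the split() used here).

```python
def paired_matches(raw_pairs, allowed_values):
    """Keep pairs whose members all appear in allowed_values; emit (value, pair_index) rows."""
    allowed = set(allowed_values)
    rows = []
    index = 0
    for raw in raw_pairs:
        members = raw.split()  # assumed whitespace-separated members
        if members and all(member in allowed for member in members):
            index += 1  # sequence index, counted over the kept pairs only
            rows.extend((member, index) for member in members)  # long format
    return rows


# Hypothetical stand-ins for df1$values and df2$values
df1_values = ["apples x", "oranges z", "grapes q", "bananas y"]
df2_values = ["apples", "x", "oranges", "z", "bananas", "y", "grapes"]

for value, paired in paired_matches(df1_values, df2_values):
    print(value, paired)
```

In this sample, "grapes q" is dropped because "q" is not among the allowed values, so the remaining pairs are numbered 1 to 3.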
How generate all pairs of values, from the result of a groupby, in a pandas dataframe
It's simple: use itertools.combinations inside apply, then stack, i.e.
import pandas as pd
from itertools import combinations
# The chain is wrapped in parentheses so it can span several lines
ndf = (df.groupby('ID')['words']
         .apply(lambda x: list(combinations(x.values, 2)))
         .apply(pd.Series).stack().reset_index(level=0, name='words'))
ID words
0 1 (word1, word2)
1 1 (word1, word3)
2 1 (word2, word3)
0 2 (word4, word5)
0 3 (word6, word7)
1 3 (word6, word8)
2 3 (word6, word9)
3 3 (word7, word8)
4 3 (word7, word9)
5 3 (word8, word9)
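To match your exact output, one more step is needed:
# axis is passed by keyword; in pandas 1.0 and later set_axis returns a new object by default
sdf = pd.concat([ndf['ID'], ndf['words'].apply(pd.Series)], axis=1).set_axis(['ID', 'WordsA', 'WordsB'], axis=1)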
ID WordsA WordsB
0 1 word1 word2
1 1 word1 word3
2 1 word2 word3
0 2 word4 word5
0 3 word6 word7
1 3 word6 word8
2 3 word6 word9
3 3 word7 word8
4 3 word7 word9
5 3 word8 word9
To do all of this in a single chained expression:
combo = df.groupby('ID')['words'].apply(combinations, 2)\
    .apply(list).apply(pd.Series)\
    .stack().apply(pd.Series)\
    .set_axis(['WordsA', 'WordsB'], axis=1)\
    .reset_index(level=0)
Creating a 'Rough Match' Function
You don't need VBA for this. Enter this in D1 as an array formula with Ctrl+Shift+Enter:
=SUM(COUNTIF(A1,"*"&B1:C1&"*"))>0
The asterisks are wildcards, and the array formula, in effect, loops through each cell in B1:C1. So the formula says to count the instances of B1 or C1, preceded and followed by any text, found in A1.
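The logic of the formula can be sketched in Python as a check for whether any of the search terms occurs somewhere inside the text. The function name `rough_match` is hypothetical. The comparison is lower-cased because COUNTIF ignores case; skipping empty terms is an assumption of this sketch, and wildcard characters inside the terms are treated as plain text here.

```python
def rough_match(text, terms):
    """Return True if any non-empty term occurs anywhere in text, ignoring case."""
    haystack = text.lower()
    # any() mirrors SUM(COUNTIF(...)) > 0: one hit is enough
    return any(term.lower() in haystack for term in terms if term)


print(rough_match("The quick brown fox", ["QUICK", "cat"]))  # True: "quick" is found
print(rough_match("The quick brown fox", ["dog", "cat"]))    # False: neither term is found
```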
I need to create unique word pairs either in R or excel
(You don't really need a second list for that; one is enough.)
cities <- list("London", "Paris", "Kyiv", "Geneva", "Tokyo")
combn(cities, 2, paste, collapse = "-")
# [1] "London-Paris" "London-Kyiv" "London-Geneva" "London-Tokyo" "Paris-Kyiv"
# [6] "Paris-Geneva" "Paris-Tokyo" "Kyiv-Geneva" "Kyiv-Tokyo" "Geneva-Tokyo"
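As a cross-check on the result, a short Python sketch can build the same pairs and confirm that n items give n(n-1)/2 unique pairs, which is 10 for five cities. It uses only the standard library (math.comb needs Python 3.8 or later); the variable names are illustrative.

```python
from itertools import combinations
from math import comb

cities = ["London", "Paris", "Kyiv", "Geneva", "Tokyo"]

# Each unordered pair appears once, in the order the cities are listed
pairs = ["-".join(pair) for pair in combinations(cities, 2)]

# The number of pairs should equal "5 choose 2"
assert len(pairs) == comb(len(cities), 2)

print(len(pairs))
print(pairs)
```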