Making Pairs of Words Based on One Column

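This can be done in two steps: sort the file on the second column, then join it with itself on that column and keep each unordered pair only once.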

$ sort -k2 file > file.s
$ join -j2 file.s{,} | awk '!(a[$2,$3]++ + a[$3,$2]++){print $2,$3,$1}'

A C ID.1
A D ID.1
C D ID.1
B E ID.2
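
For comparison, the same pairing can be sketched in Python. The input file is not shown above, so the sample rows here are inferred from the output (word in the first column, ID in the second), and `pairs_by_key` is a hypothetical helper name. Unlike the awk filter, this sketch does not deduplicate repeated words within one ID.

```python
from collections import defaultdict
from itertools import combinations

def pairs_by_key(rows):
    """Group words by key and emit each unordered pair once per key."""
    groups = defaultdict(list)
    for word, key in rows:
        groups[key].append(word)
    result = []
    # dicts keep insertion order, so keys come out in first-seen order
    for key, words in groups.items():
        for left, right in combinations(words, 2):
            result.append((left, right, key))
    return result

# Assumed sample input, reconstructed from the output shown above
rows = [("A", "ID.1"), ("C", "ID.1"), ("D", "ID.1"), ("B", "ID.2"), ("E", "ID.2")]
for left, right, key in pairs_by_key(rows):
    print(left, right, key)
```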

How to select sub-strings based on the presence of word pairs? Python

I believe there is a bug in your code.

else:
    return ''

This means that if the first comparison is not a match, 'func' returns immediately without trying the remaining candidates. That might be why the code does not return any matches.

A working sample is below:

# Try every candidate in first_twos; return '' only after none of them matched:
def func(sentence, first_twos=b):
    for first_two in first_twos:
        if first_two in sentence:
            s = sentence[sentence.index(first_two):]
            return s
    return ''

df['Segments'] = a.apply(func)

And the output:

df:
{
'First2': ['can I', 'should it', 'what does'],
'Segments': ['what does it say? ', 'should it say more?', ''],
'Sentence': ['If this is a string what does it say? ', 'And this is a string, should it say more?', 'This is yet another string.  ']
}
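
To see the effect of the early return without a DataFrame, here is a self-contained sketch using plain lists. The buggy variant is reconstructed from the `else` fragment quoted above, so its exact shape is an assumption; the data is taken from the output shown.

```python
def func_buggy(sentence, first_twos):
    # Assumed shape of the original: returns on the very first candidate,
    # whether or not it matched.
    for first_two in first_twos:
        if first_two in sentence:
            return sentence[sentence.index(first_two):]
        else:
            return ''
    return ''

def func_fixed(sentence, first_twos):
    # Gives up only after every candidate has been tried.
    for first_two in first_twos:
        if first_two in sentence:
            return sentence[sentence.index(first_two):]
    return ''

first_twos = ['can I', 'should it', 'what does']
sentences = ['If this is a string what does it say? ',
             'And this is a string, should it say more?',
             'This is yet another string.  ']

buggy = [func_buggy(s, first_twos) for s in sentences]
fixed = [func_fixed(s, first_twos) for s in sentences]
print(buggy)
print(fixed)
```

With this data the first candidate, 'can I', appears in none of the sentences, so the buggy variant returns an empty string for every row.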

Combining columns and count combinations (pairs)

If we always write the alphabetically earlier letter first when assigning pair with :=, both orderings of a pair collapse to the same label and the code produces the desired result. We'll use ifelse() to decide whether to write V1 before V2, as follows.

library(data.table)
set.seed(126)
dt <- data.table(V1 = sample(LETTERS[1:4], 30, replace = TRUE),
                 V2 = sample(LETTERS[1:4], 30, replace = TRUE))

# Adjusted version where the first letter is always < the second letter

# Exclude rows with the same name
dt <- dt[V1 != V2]

# Create pairs by combining V1 and V2
dt[, pair := ifelse(V1 < V2, paste(V1, V2, sep = "_"), paste(V2, V1, sep = "_"))]

# Count the pairs
dt[, .N, by = .(pair)]

...and the output:

> #Count the pairs 
> dt[, .N, by=.(pair)]
   pair N
1:  A_C 3
2:  B_C 9
3:  C_D 5
4:  A_B 4
5:  B_D 3
6:  A_D 1
>
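
The same idea (normalise each pair so the smaller value comes first, then count) can be sketched in Python. The rows below are made-up illustration data, not the sampled data from the R example, and `count_unordered_pairs` is a hypothetical name.

```python
from collections import Counter

def count_unordered_pairs(rows):
    """Count pairs so that (A, C) and (C, A) land in the same bucket."""
    counts = Counter()
    for v1, v2 in rows:
        if v1 == v2:
            continue  # skip rows where both columns hold the same value
        counts[tuple(sorted((v1, v2)))] += 1
    return counts

# Hypothetical input rows (V1, V2)
rows = [("A", "C"), ("C", "A"), ("B", "B"), ("D", "B"), ("B", "D"), ("A", "C")]
counts = count_unordered_pairs(rows)
for (first, second), n in counts.items():
    print(f"{first}_{second}", n)
```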

Trying to match strings from multiple columns and create pair list where matches are found

Based on the update, we may split the column in 'df1', keep only the rows where every part is found in 'df2', then create a sequence index and reshape to 'long' format:

library(dplyr)
library(tidyr)
df1  %>% 
  separate(values, into = c('values1', 'values2')) %>% 
  filter(if_all(everything(), ~ .x %in% df2$values)) %>%
  mutate(paired = row_number()) %>% 
  pivot_longer(cols = -paired, values_to = 'value', names_to = NULL) %>%
  select(value, paired)

-output

# A tibble: 6 × 2
  value   paired
  <chr>    <int>
1 apples       1
2 x            1
3 oranges      2
4 z            2
5 bananas      3
6 y            3
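
As a rough cross-check of the logic, here is a plain-Python sketch of the same steps. The contents of 'df1' and 'df2' are not shown above, so the input lists below are hypothetical, and `paired_matches` is an illustrative name. The split on non-alphanumeric characters imitates the default behaviour of separate().

```python
import re

def paired_matches(pair_strings, allowed):
    """Split each string in two, keep it if both parts are allowed, number the kept pairs."""
    allowed = set(allowed)
    result = []
    pair_id = 0
    for text in pair_strings:
        parts = [p for p in re.split(r"[^A-Za-z0-9]+", text) if p]
        if len(parts) == 2 and all(p in allowed for p in parts):
            pair_id += 1
            # one output row per value, sharing the pair number
            result.extend((p, pair_id) for p in parts)
    return result

# Hypothetical stand-ins for df1$values and df2$values
df1_values = ["apples,x", "oranges,z", "grapes,q", "bananas,y"]
df2_values = ["apples", "x", "oranges", "z", "bananas", "y"]

matches = paired_matches(df1_values, df2_values)
for value, paired in matches:
    print(value, paired)
```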

How to generate all pairs of values, from the result of a groupby, in a pandas dataframe

It's simple: use itertools.combinations inside apply, then stack, i.e.

import pandas as pd
from itertools import combinations

# Build all 2-combinations per ID, then expand them to one row per pair
ndf = (df.groupby('ID')['words']
         .apply(lambda x: list(combinations(x.values, 2)))
         .apply(pd.Series).stack().reset_index(level=0, name='words'))

 ID           words
0   1  (word1, word2)
1   1  (word1, word3)
2   1  (word2, word3)
0   2  (word4, word5)
0   3  (word6, word7)
1   3  (word6, word8)
2   3  (word6, word9)
3   3  (word7, word8)
4   3  (word7, word9)
5   3  (word8, word9)

To match your exact output, we further have to do:

sdf = pd.concat([ndf['ID'], ndf['words'].apply(pd.Series)], axis=1).set_axis(['ID', 'WordsA', 'WordsB'], axis=1)

   ID WordsA WordsB
0   1  word1  word2
1   1  word1  word3
2   1  word2  word3
0   2  word4  word5
0   3  word6  word7
1   3  word6  word8
2   3  word6  word9
3   3  word7  word8
4   3  word7  word9
5   3  word8  word9

To do it all in a single statement:

combo = df.groupby('ID')['words'].apply(combinations,2)\
                     .apply(list).apply(pd.Series)\
                     .stack().apply(pd.Series)\
                     .set_axis(['WordsA','WordsB'], axis=1)\
                     .reset_index(level=0)

Creating a 'Rough Match' Function

You don't need VBA for this. Enter this in D1 as an array formula with Ctrl+Shift+Enter:

=SUM(COUNTIF(A1,"*"&B1:C1&"*"))>0

The asterisks are wildcards, and the array formula, in effect, loops through each cell in B1:C1. So the formula says to count the instances of B1 or C1, preceded and followed by any text, found in A1.
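
For readers more comfortable with code than with array formulas, the same check can be sketched in Python. `rough_match` is a hypothetical name; the lower-casing is there because COUNTIF compares text without regard to case.

```python
def rough_match(text, candidates):
    """Return True if any candidate occurs anywhere inside text, ignoring case."""
    haystack = text.lower()
    return any(candidate.lower() in haystack for candidate in candidates)

# text plays the role of A1, the candidates play the role of B1:C1
print(rough_match("The quick brown fox", ["cat", "FOX"]))
print(rough_match("The quick brown fox", ["cat", "dog"]))
```

Note that an empty candidate matches any text here, so blank cells would need to be filtered out first if that is not wanted.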

I need to create unique word pairs either in R or Excel

(You don't really need a second list to do that; one is enough.)

cities  <- list("London", "Paris", "Kyiv", "Geneva", "Tokyo")

combn(cities, 2, paste, collapse = "-")

# [1] "London-Paris"  "London-Kyiv"   "London-Geneva" "London-Tokyo"  "Paris-Kyiv"   
# [6] "Paris-Geneva"  "Paris-Tokyo"   "Kyiv-Geneva"   "Kyiv-Tokyo"    "Geneva-Tokyo"
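
A Python counterpart, as a minimal sketch: itertools.combinations yields the unordered pairs in the same order as combn() does here.

```python
from itertools import combinations

cities = ["London", "Paris", "Kyiv", "Geneva", "Tokyo"]

# Each unordered pair appears exactly once, in input order
pairs = ["-".join(pair) for pair in combinations(cities, 2)]
print(pairs)
```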
