Selecting Multiple Columns in Data Frame Using Partial Column Name

insert multiple columns based on column name with partial match

If you are on Python 3.8+ (for the walrus operator :=), then

result = pd.concat([df1[col]
                    if (candidate := df2.loc[:, df2.columns.str.startswith(col)]).empty
                    else candidate
                    for col in df1],
                   axis=1)

For each column of df1, we look for candidate columns in df2 whose names start with that column name. If such column(s) exist, we put the candidate into the result; otherwise we keep the column from df1.

to get

   id  ab? op   ab? 1  xy  cd  efab? cba      efab? 1  efab? 2  lm  fab? 4   fab? po
0   1   green     red   1   L    husband          son     None   1       9   England
1   2     red  yellow   2  XL       wife  grandparent      son   2      10  Scotland
2   3    blue    None   3   M    husband          son     None   3       5     Wales
3   4    None    None   4   L       None         None     None   4       3        NA

If you are on Python < 3.8:

cols = []
for col in df1:
    candidate = df2.loc[:, df2.columns.str.startswith(col)]
    cols.append(df1[col] if candidate.empty else candidate)

result = pd.concat(cols, axis=1)
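
For reference, here is a minimal, self-contained sketch of the same approach with made-up frames (the column names below are illustrative, not the asker's data):

import pandas as pd

# Hypothetical frames: df1 holds the base column names, df2 the expanded ones.
df1 = pd.DataFrame({'id': [1, 2], 'ab': ['x', 'y']})
df2 = pd.DataFrame({'ab? 1': ['green', 'red'], 'ab? 2': ['L', 'XL']})

cols = []
for col in df1:
    # columns of df2 whose names start with the df1 column name
    candidate = df2.loc[:, df2.columns.str.startswith(col)]
    cols.append(df1[col] if candidate.empty else candidate)

result = pd.concat(cols, axis=1)
print(result.columns.tolist())   # ['id', 'ab? 1', 'ab? 2']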

How to select DataFrame columns based on partial matching?

Your solution using map is very good. If you really want to use str.contains, it is possible to convert Index objects to Series (which have the str.contains method):

In [1]: df
Out[1]:
   x  y  z
0  0  0  0
1  1  1  1
2  2  2  2
3  3  3  3
4  4  4  4
5  5  5  5
6  6  6  6
7  7  7  7
8  8  8  8
9  9  9  9

In [2]: df.columns.to_series().str.contains('x')
Out[2]:
x     True
y    False
z    False
dtype: bool

In [3]: df[df.columns[df.columns.to_series().str.contains('x')]]
Out[3]:
   x
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9

UPDATE: I just read your last paragraph. According to the documentation, str.contains interprets the pattern as a regular expression by default (e.g. str.contains('^myregex')).
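
For example (with a small made-up frame), an anchored pattern keeps only the columns whose names start with x:

import pandas as pd

df = pd.DataFrame({'x': range(3), 'xy': range(3), 'y': range(3)})

# The pattern is treated as a regex by default, so '^x' anchors the match
# to the start of each column name.
mask = df.columns.to_series().str.contains('^x')
print(df[df.columns[mask]].columns.tolist())   # ['x', 'xy']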

how to choose columns based on specific names of the columns in a dataframe

You can use grep/grepl to match column names by a pattern. If your dataframe is called df:

df[grepl('mean|std', names(df))]

Or, in dplyr, you can use select():

library(dplyr)
df %>% select(matches('mean|std'))
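
For comparison, a rough pandas analogue of the same pattern match (frame and column names made up) would use filter(regex=...):

import pandas as pd

# Illustrative column names; filter(regex=...) keeps those matching the pattern.
df = pd.DataFrame(columns=['x_mean', 'x_std', 'x_max'])
print(df.filter(regex='mean|std').columns.tolist())   # ['x_mean', 'x_std']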

Select columns by multiple partial string match from a pandas DataFrame

You can use this one-line expression:

   recharge_cols = [i for i in list(df) if 'rech' in i and '6' in i]
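
A quick illustration with made-up column names (only names containing both 'rech' and '6' survive):

import pandas as pd

# Hypothetical columns from a telecom-style dataset.
df = pd.DataFrame(columns=['total_rech_amt_6', 'total_rech_amt_7', 'arpu_6'])

recharge_cols = [i for i in list(df) if 'rech' in i and '6' in i]
print(recharge_cols)   # ['total_rech_amt_6']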

Selecting multiple columns in data frame using partial column name

You can use "|" for "or" in grep

grep("red|blue", DF, value=T)
# [1] "red_balloons" "red_balls" "blue_balls" "red_horses"

Subset Columns based on partial matching of column names in the same data frame

You could try:

v <- unique(substr(names(eatable), 0, 5))
lapply(v, function(x) eatable[grepl(x, names(eatable))])

Or using map() + select_()

library(tidyverse)
map(v, ~select_(eatable, ~matches(.)))

Which gives:

#[[1]]
#  fruits_area fruits_production
#1          12               100
#2          33               250
#3         660               510
#
#[[2]]
#  vegetables_area vegetable_production
#1              26                  324
#2              40                  580
#3              43                  581

Should you want to make it into a function:

checkExpression <- function(df, l = 5) {
  v <- unique(substr(names(df), 0, l))
  lapply(v, function(x) df[grepl(x, names(df))])
}

Then simply use:

checkExpression(eatable, 5)
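
For comparison, a rough pandas sketch of the same idea, grouping columns by the first five characters of their names (frame reconstructed from the output above):

import pandas as pd

eatable = pd.DataFrame({'fruits_area': [12, 33, 660],
                        'fruits_production': [100, 250, 510],
                        'vegetables_area': [26, 40, 43],
                        'vegetable_production': [324, 580, 581]})

prefixes = pd.unique(eatable.columns.str[:5])
groups = [eatable.loc[:, eatable.columns.str.startswith(p)] for p in prefixes]
# groups[0] holds the 'fruit...' columns, groups[1] the 'veget...' columns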

Get multiple column value based on partial matching with another column value for pandas dataframe

Give this a try; I think it should be able to handle a few million rows.

def list_check(emails_list, email_match):
    match_indexes = [i for i, s in enumerate(emails_list) if email_match in s]
    return [emails_list[index] for index in match_indexes]

# Parse main_url to get domain column
df['domain'] = list(map(lambda x: x.split('//')[1], df['main_url']))

# Apply list_check to your dataframe using emails and domain columns
df['emails'] = list(map(lambda x, y: list_check(x, y), df['emails'], df['domain']))

# Drop domain column
df.drop(columns=['domain'], inplace=True)

The list_check function checks whether your match string is contained in each entry of the emails list, collects the indexes of the matches, then pulls those entries out of the emails list by index and returns them as a list.
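
A small end-to-end illustration with made-up rows (column names main_url and emails as in the answer above):

import pandas as pd

def list_check(emails_list, email_match):
    match_indexes = [i for i, s in enumerate(emails_list) if email_match in s]
    return [emails_list[index] for index in match_indexes]

# Hypothetical input: each row has a main_url and a list of scraped emails.
df = pd.DataFrame({
    'main_url': ['https://example.com', 'https://other.org'],
    'emails': [['info@example.com', 'spam@elsewhere.net'], ['hello@other.org']],
})

df['domain'] = list(map(lambda x: x.split('//')[1], df['main_url']))
df['emails'] = list(map(lambda x, y: list_check(x, y), df['emails'], df['domain']))
df.drop(columns=['domain'], inplace=True)

print(df['emails'].tolist())   # [['info@example.com'], ['hello@other.org']]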


subset pandas df columns with partial string match OR match before ? using lists of names

You can form a dynamic regex for each of the lists:

df_lists = [df1_lst, df2_lst, df3_lst]

result = [df.filter(regex=fr"\b({'|'.join(names)})\??") for names in df_lists]

For example, for the first list the regex is \b(ab|cd)\??, i.e., look for either ab or cd, but the match must be standalone on the left side (\b), with an optional ? allowed afterwards.

The desired entries are in the result list, e.g.

>>> result[1]

  efab? cba      efab? 1  efab? 2
0   husband          son     None
1      wife  grandparent      son
2   husband          son     None
3      None         None     None
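
Here is a tiny self-contained illustration of the word-boundary behaviour (column names made up to mirror the question):

import pandas as pd

# 'ab?' and 'cd' match \b(ab|cd)\??, but 'efab? 1' does not: there is no
# word boundary immediately before the 'ab' inside 'efab'.
df = pd.DataFrame(columns=['ab?', 'cd', 'efab? 1', 'lm'])
print(df.filter(regex=r"\b(ab|cd)\??").columns.tolist())   # ['ab?', 'cd']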

R subset data.frame by column names using partial string match from another list

# Specify `interesting.list` items manually
df[,grep("P3170|C453", x=names(df))]
#>   P3170.Tp2 C453.Tn7 P3170.Tn10
#> 1         1        3          5

# Use paste to create pattern from lots of items in `interesting.list`
il <- c("P3170", "C453")
df[,grep(paste(il, collapse = "|"), x=names(df))]
#>   P3170.Tp2 C453.Tn7 P3170.Tn10
#> 1         1        3          5

Example data:

n <- c("P3170.Tp2" , "P3189.Tn10" ,"C453.Tn7" ,"F678.Tc23" ,"P3170.Tn10")
df <- data.frame(1,2,3,4,5)
names(df) <- n

Created on 2021-10-20 by the reprex package (v2.0.1)
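
For comparison, a rough pandas counterpart of building the pattern with paste(il, collapse = "|") is to join the list into one regex for filter (using the same example names):

import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5]],
                  columns=['P3170.Tp2', 'P3189.Tn10', 'C453.Tn7', 'F678.Tc23', 'P3170.Tn10'])

il = ['P3170', 'C453']
print(df.filter(regex='|'.join(il)).columns.tolist())
# ['P3170.Tp2', 'C453.Tn7', 'P3170.Tn10']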

R: find number of columns 0 per row for a group of column names with a partial string match

First filter the data to keep only the numeric columns.

Use split.default to divide the data into groups so that you have all the 'A' columns in one group, the 'B' columns in another, and so on. Within each group, return TRUE if the row has at least one value greater than 0; then sum these indicators across all the groups to get the final count.

tmp <- Filter(is.numeric, df)

rowSums(sapply(split.default(tmp, sub('_.*', '', names(tmp))),
               function(x) rowSums(x) > 0))

#[1] 0 1 3 3
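
For comparison, a rough pandas sketch of the same computation (the column naming pattern 'A_1', 'B_2', ... is an assumption about the data's shape):

import pandas as pd

# Hypothetical numeric data with prefixed column names plus one non-numeric column.
df = pd.DataFrame({'A_1': [0, 1, 2], 'A_2': [0, 0, 3],
                   'B_1': [0, 4, 0], 'id': ['x', 'y', 'z']})

tmp = df.select_dtypes('number')                       # keep only numeric columns
prefixes = tmp.columns.str.replace('_.*', '', regex=True)
group_sums = tmp.T.groupby(prefixes.values).sum().T    # per-row sums within each prefix group
print(group_sums.gt(0).sum(axis=1).tolist())           # [0, 2, 1]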

