insert multiple columns based on column name with partial match
With Python 3.8+ (which supports the := assignment expression):
result = pd.concat([df1[col]
                    if (candidate := df2.loc[:, df2.columns.str.startswith(col)]).empty
                    else candidate
                    for col in df1],
                   axis=1)
For each column of df1, we look for candidate columns in df2 whose names start with the column name in df1. If such column(s) exist, put the candidate in the result; otherwise keep the column from df1.
This gives:
   id  ab? op   ab? 1  xy  cd  efab? cba      efab? 1  efab? 2  lm  fab? 4   fab? po
0   1   green     red   1   L    husband          son     None   1       9   England
1   2     red  yellow   2  XL       wife  grandparent      son   2      10  Scotland
2   3    blue    None   3   M    husband          son     None   3       5     Wales
3   4    None    None   4   L       None         None     None   4       3        NA
For Python versions before 3.8:
cols = []
for col in df1:
    candidate = df2.loc[:, df2.columns.str.startswith(col)]
    cols.append(df1[col] if candidate.empty else candidate)
result = pd.concat(cols, axis=1)
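To try the loop above end to end, here is a minimal self-contained sketch. The two frames and their column names are made up for illustration and are not the data from the question.

```python
import pandas as pd

# Hypothetical sample frames; the column names are invented for this sketch.
df1 = pd.DataFrame({"id": [1, 2], "ab?": ["x", "y"], "xy": [10, 20]})
df2 = pd.DataFrame({
    "ab? op": ["green", "red"],
    "ab? 1": ["red", "yellow"],
    "efab? 1": ["son", "grandparent"],
})

cols = []
for col in df1:
    # str.startswith is a literal prefix test, so "?" is not treated as regex.
    candidate = df2.loc[:, df2.columns.str.startswith(col)]
    # Keep the df1 column when df2 has no column with that prefix.
    cols.append(df1[col] if candidate.empty else candidate)

result = pd.concat(cols, axis=1)
print(list(result.columns))  # ['id', 'ab? op', 'ab? 1', 'xy']
```

Note that "efab? 1" is not picked up for "ab?" because the match is anchored at the start of the name.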
How to select DataFrame columns based on partial matching?
Your solution using map is very good. If you really want to use str.contains, it is possible to convert Index objects to Series (which have the str.contains method):
In [1]: df
Out[1]:
   x  y  z
0  0  0  0
1  1  1  1
2  2  2  2
3  3  3  3
4  4  4  4
5  5  5  5
6  6  6  6
7  7  7  7
8  8  8  8
9  9  9  9
In [2]: df.columns.to_series().str.contains('x')
Out[2]:
x     True
y    False
z    False
dtype: bool
In [3]: df[df.columns[df.columns.to_series().str.contains('x')]]
Out[3]:
   x
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9
UPDATE: I just read your last paragraph. From the documentation, str.contains allows you to pass a regex by default (str.contains('^myregex')).
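In recent pandas versions the .str accessor is also available on the Index itself, so the conversion to a Series can be skipped. A minimal sketch, assuming a small hypothetical frame with an extra "xy" column added to make the regex anchor visible:

```python
import pandas as pd

# Hypothetical frame; "xy" is added here only to illustrate the patterns.
df = pd.DataFrame({"x": range(3), "xy": range(3), "y": range(3), "z": range(3)})

# The pattern is a regex by default: column names that start with "x".
mask = df.columns.str.contains("^x")
starts_with_x = df.loc[:, mask]

# DataFrame.filter(like=...) does a plain substring match on the labels.
contains_y = df.filter(like="y")

print(list(starts_with_x.columns))  # ['x', 'xy']
print(list(contains_y.columns))     # ['xy', 'y']
```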
how to choose columns based on specific names of the columns in a dataframe
You can use grep/grepl to match column names by a pattern. If your data frame is called df:
df[grepl('mean|std', names(df))]
Or in dplyr you can use select:
library(dplyr)
df %>% select(matches('mean|std'))
Select columns by multiple partial string match from a pandas DataFrame
You can use this one-line expression, which keeps the column names that contain both 'rech' and '6':
recharge_cols = [i for i in list(df) if 'rech' in i and '6' in i]
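A short sketch of how the resulting list can be used to subset the frame. The column names here are hypothetical stand-ins, since the original frame is not shown.

```python
import pandas as pd

# Hypothetical column names, chosen only to exercise both conditions.
df = pd.DataFrame(
    columns=["total_rech_amt_6", "total_rech_amt_7", "arpu_6", "max_rech_data_6"]
)

# Keep names that contain both substrings (plain substring tests, no regex).
recharge_cols = [c for c in df.columns if "rech" in c and "6" in c]
subset = df[recharge_cols]

print(recharge_cols)  # ['total_rech_amt_6', 'max_rech_data_6']
```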
Selecting multiple columns in data frame using partial column name
You can use "|" for "or" in the grep pattern:
grep("red|blue", DF, value = TRUE)
# [1] "red_balloons" "red_balls" "blue_balls" "red_horses"
Subset Columns based on partial matching of column names in the same data frame
You could try:
v <- unique(substr(names(eatable), 0, 5))
lapply(v, function(x) eatable[grepl(x, names(eatable))])
Or using map() + select_() (note that select_() is deprecated in current dplyr versions):
library(tidyverse)
map(v, ~select_(eatable, ~matches(.)))
Which gives:
#[[1]]
# fruits_area fruits_production
#1 12 100
#2 33 250
#3 660 510
#
#[[2]]
# vegetables_area vegetable_production
#1 26 324
#2 40 580
#3 43 581
Should you want to make it into a function:
checkExpression <- function(df, l = 5) {
  v <- unique(substr(names(df), 0, l))
  lapply(v, function(x) df[grepl(x, names(df))])
}
Then simply use:
checkExpression(eatable, 5)
Get multiple column value based on partial matching with another column value for pandas dataframe
Give this a try; I think it should be able to handle a few million rows.
def list_check(emails_list, email_match):
    match_indexes = [i for i, s in enumerate(emails_list) if email_match in s]
    return [emails_list[index] for index in match_indexes]
# Parse main_url to get domain column
df['domain'] = list(map(lambda x: x.split('//')[1], df['main_url']))
# Apply list_check to your dataframe using emails and domain columns
df['emails'] = list(map(lambda x, y: list_check(x, y), df['emails'], df['domain']))
# Drop domain column
df.drop(columns=['domain'], inplace=True)
The list_check function finds the indexes of the entries in the emails list that contain your match string, then uses those indexes to pull the matching values from the list and returns them as a new list.
source for getting matched indexes
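The same idea can be sketched as a runnable example. The URLs and e-mail addresses below are invented, and this variant keeps the domains in a local list instead of a temporary column, which is a small departure from the code above.

```python
import pandas as pd

def list_check(emails_list, email_match):
    """Return the entries of emails_list that contain email_match."""
    return [s for s in emails_list if email_match in s]

# Hypothetical data: one URL and one list of e-mails per row.
df = pd.DataFrame({
    "main_url": ["https://example.com", "http://sample.org"],
    "emails": [
        ["a@example.com", "b@other.net"],
        ["c@elsewhere.io", "d@sample.org"],
    ],
})

# Everything after "//" is treated as the domain (assumes each URL has a scheme).
domains = [url.split("//")[1] for url in df["main_url"]]

# Keep only the e-mails that mention the row's own domain.
df["emails"] = [list_check(e, d) for e, d in zip(df["emails"], domains)]

print(df["emails"].tolist())  # [['a@example.com'], ['d@sample.org']]
```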
subset pandas df columns with partial string match OR match before ? using lists of names
You can build a regex dynamically for each list of names:
df_lists = [df1_lst, df2_lst, df3_lst]
result = [df.filter(regex=fr"\b({'|'.join(names)})\??") for names in df_lists]
e.g., for the first list, the regex is \b(ab|cd)\??, i.e. look for either ab or cd, but they should be standalone on the left side (\b), and there might be an optional ? afterwards.
The desired entries are in the result list, e.g.:
>>> result[1]
   efab? cba      efab? 1  efab? 2
0    husband          son     None
1       wife  grandparent      son
2    husband          son     None
3       None         None     None
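A self-contained sketch of the same filter on an empty frame, to show which names each regex selects. The column names and the three name lists are assumptions made up for this example. If the names could contain regex metacharacters, passing them through re.escape first would be a sensible precaution.

```python
import pandas as pd

# Hypothetical columns and name lists, for illustration only.
df = pd.DataFrame(
    columns=["id", "ab? op", "ab? 1", "cd", "efab? cba", "efab? 1", "fab? 4"]
)
df_lists = [["ab", "cd"], ["efab"], ["fab"]]

# One regex per list: any of the names at a word boundary, optional "?" after.
result = [df.filter(regex=fr"\b({'|'.join(names)})\??") for names in df_lists]

for part in result:
    print(list(part.columns))
# ['ab? op', 'ab? 1', 'cd']
# ['efab? cba', 'efab? 1']
# ['fab? 4']
```

The \b is what stops "ab" from matching inside "efab? 1" and "fab" from matching inside "efab? cba".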
R subset data.frame by column names using partial string match from another list
# Specify `interesting.list` items manually
df[,grep("P3170|C453", x=names(df))]
#> P3170.Tp2 C453.Tn7 P3170.Tn10
#> 1 1 3 5
# Use paste to create pattern from lots of items in `interesting.list`
il <- c("P3170", "C453")
df[,grep(paste(il, collapse = "|"), x=names(df))]
#> P3170.Tp2 C453.Tn7 P3170.Tn10
#> 1 1 3 5
Example data:
n <- c("P3170.Tp2" , "P3189.Tn10" ,"C453.Tn7" ,"F678.Tc23" ,"P3170.Tn10")
df <- data.frame(1,2,3,4,5)
names(df) <- n
Created on 2021-10-20 by the reprex package (v2.0.1)
R: find number of columns 0 per row for a group of column names with a partial string match
First, filter the data to keep only the numeric columns. Use split.default to divide the data into groups, so that all the 'A' columns are in one group, the 'B' columns in another, and so on. Within each group, return TRUE for a row if its values sum to more than 0 (that is, at least one value is non-zero, assuming no negative values), then sum these flags across all groups to get the final count.
tmp <- Filter(is.numeric, df)
rowSums(sapply(split.default(tmp, sub('_.*', '', names(tmp))),
               function(x) rowSums(x) > 0))
#[1] 0 1 3 3
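For readers working in pandas rather than R, a rough analogue of the steps above might look like the following. This is a swapped-in pandas sketch, not the answer's method, and the data is hypothetical (the original data frame is not shown).

```python
import pandas as pd

# Hypothetical data: one text column plus numeric columns named <group>_<n>.
df = pd.DataFrame({
    "name": ["r1", "r2", "r3"],
    "A_1": [0, 1, 1],
    "A_2": [0, 0, 2],
    "B_1": [0, 0, 3],
    "C_1": [0, 0, 4],
})

# Keep only the numeric columns (the counterpart of Filter(is.numeric, df)).
tmp = df.select_dtypes(include="number")

# Group columns by the part of the name before the first underscore,
# summing within each group; transposing lets groupby work on the labels.
group_sums = tmp.T.groupby(lambda name: name.split("_")[0]).sum().T

# Count, per row, how many groups have a sum greater than 0.
counts = (group_sums > 0).sum(axis=1)

print(counts.tolist())  # [0, 1, 3]
```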