Find Column Whose Name Contains a Specific String

Find column whose name contains a specific string

Iterate over DataFrame.columns. The following example builds a list of the column names that match:

import pandas as pd

data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)

spike_cols = [col for col in df.columns if 'spike' in col]
print(list(df.columns))
print(spike_cols)

Output:

['spike-2', 'hey spke', 'spiked-in', 'no']
['spike-2', 'spiked-in']

Explanation:

  1. df.columns returns an Index holding the column names; iterating over it yields each name in turn.
  2. [col for col in df.columns if 'spike' in col] iterates over df.columns with the variable col and adds col to the resulting list if it contains 'spike'. This syntax is called a list comprehension.
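The same comprehension can be made case-insensitive and safe for non-string labels. This is a minimal sketch; the column names below (including the integer label 0) are made up for illustration:

```python
import pandas as pd

# Hypothetical frame: mixed-case names plus one non-string label (0)
df = pd.DataFrame({'Spike-2': [1], 'hey spke': [4], 'spiked-in': [7], 0: [10]})

needle = 'spike'
# Skip labels that are not strings, then compare in lower case
matches = [col for col in df.columns
           if isinstance(col, str) and needle in col.lower()]
print(matches)  # ['Spike-2', 'spiked-in']
```

The isinstance check matters because `'spike' in 0` would raise a TypeError.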

If you only want a DataFrame restricted to the matching columns, you can use DataFrame.filter:

df2 = df.filter(regex='spike')
print(df2)

Output:

   spike-2  spiked-in
0        1          7
1        2          8
2        3          9
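DataFrame.filter also accepts like= for a plain substring match, and the regex= pattern can be anchored. A short sketch reusing the example frame from above (the specific patterns are only illustrations):

```python
import pandas as pd

df = pd.DataFrame({'spike-2': [1, 2, 3], 'hey spke': [4, 5, 6],
                   'spiked-in': [7, 8, 9], 'no': [10, 11, 12]})

by_substring = df.filter(like='spike')   # plain substring, no regex syntax
by_prefix = df.filter(regex=r'^spike')   # name must start with 'spike'
by_suffix = df.filter(regex=r'in$')      # name must end with 'in'

print(list(by_substring.columns))  # ['spike-2', 'spiked-in']
print(list(by_prefix.columns))     # ['spike-2', 'spiked-in']
print(list(by_suffix.columns))     # ['spiked-in']
```

like= is the safer choice when the search string may contain regex metacharacters such as . or +.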

Find column whose name contains a specific value that is in a fixed column

It is not clear whether you want the column names that contain the string, or the names of the columns that have at least one value containing the string.

Suppose the DataFrame is:

In [1]: import pandas as pd
   ...: df = pd.DataFrame({'a_1': ['b_1', 'b_2'], 'b_1': ['a_1', 'a_2']})
In [2]: df
Out[2]:
   a_1  b_1
0  b_1  a_1
1  b_2  a_2

For the first case, to find all column names that match the pattern a_.*:

In [3]: import re
In [4]: columns = [col for col in df.columns if isinstance(col, str) and re.match('a_.*', col)]
In [5]: columns
Out[5]: ['a_1']

For the second case, to find all columns in which at least one value matches a_.*:

In [6]: columns = [col for col, ser in df.items() if ser.str.match('a_.*').any()]
In [7]: columns
Out[7]: ['b_1']

where:

DataFrame.items returns an iterator of (column name, column values as a Series) pairs. It replaces DataFrame.iteritems, which was removed in pandas 2.0.

Series.any returns True if any value in the Series is True.
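The .str accessor raises an error on numeric columns, so a frame with mixed dtypes needs a guard. One possible sketch converts each column to strings first; the extra numeric column n is an assumption added for illustration:

```python
import pandas as pd

# Same frame as above plus a hypothetical numeric column 'n'
df = pd.DataFrame({'a_1': ['b_1', 'b_2'], 'b_1': ['a_1', 'a_2'], 'n': [1, 2]})

columns = [
    col for col, ser in df.items()
    # astype(str) lets .str.match run on non-string columns as well
    if ser.astype(str).str.match(r'a_.*').any()
]
print(columns)  # ['b_1']
```

Note that astype(str) also turns missing values into the text 'nan', which could match some patterns.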

Creating new column from columns whose name contains a specific string

You can use:

# get columns with "Time" in the name
cols = list(df.filter(like='Time'))
# ['Run_Time', 'Rest_Time']

# add the value of df['Temp']
df[cols] = df[cols].add(df['Temp'], axis=0)

Output:

   Run_Time  Temp  Rest_Time
0        70    10         15
1        40    20         25
2        60    30         35
3        95    50         55
4       130    60         65
5       200   100        105
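The input frame is not shown in this answer. As a runnable sketch, the input below is an assumption reconstructed by subtracting Temp from the output values; it is not the original data:

```python
import pandas as pd

# Hypothetical input, inferred from the output (output minus Temp)
df = pd.DataFrame({
    'Run_Time': [60, 20, 30, 45, 70, 100],
    'Temp': [10, 20, 30, 50, 60, 100],
    'Rest_Time': [5, 5, 5, 5, 5, 5],
})

# Column names containing "Time"
cols = list(df.filter(like='Time'))

# Add Temp row-wise (axis=0 aligns on the index) to each selected column
df[cols] = df[cols].add(df['Temp'], axis=0)
print(df)
```

axis=0 is needed because the default for DataFrame.add with a Series aligns the Series index against the column labels instead of the rows.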

Select columns based on columns names containing a specific string in pandas

Alternative methods:

In [13]: df.loc[:, df.columns.str.startswith('alp')]
Out[13]:
       alp1      alp2
0  0.357564  0.108907
1  0.341087  0.198098
2  0.416215  0.644166
3  0.814056  0.121044
4  0.382681  0.110829
5  0.130343  0.219829
6  0.110049  0.681618
7  0.949599  0.089632
8  0.047945  0.855116
9  0.561441  0.291182

In [14]: df.loc[:, df.columns.str.contains('alp')]
Out[14]:
       alp1      alp2
0  0.357564  0.108907
1  0.341087  0.198098
2  0.416215  0.644166
3  0.814056  0.121044
4  0.382681  0.110829
5  0.130343  0.219829
6  0.110049  0.681618
7  0.949599  0.089632
8  0.047945  0.855116
9  0.561441  0.291182
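The two calls give the same result here only because every matching name starts with alp. A small sketch of where they differ; the columns beta_alp and a.b are hypothetical, added to show the difference:

```python
import pandas as pd

df = pd.DataFrame({'alp1': [0.1], 'alp2': [0.2], 'beta_alp': [0.3], 'a.b': [0.4]})

starts = df.loc[:, df.columns.str.startswith('alp')]   # prefix only
contains = df.loc[:, df.columns.str.contains('alp')]   # anywhere in the name
# str.contains treats the pattern as a regex by default;
# regex=False makes '.' a literal dot instead of "any character"
literal_dot = df.loc[:, df.columns.str.contains('.', regex=False)]

print(list(starts.columns))       # ['alp1', 'alp2']
print(list(contains.columns))     # ['alp1', 'alp2', 'beta_alp']
print(list(literal_dot.columns))  # ['a.b']
```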

Find column name containing a string after a punctuation

To get a list of the column names in your DataFrame df that contain mango after an underscore _, you can either do

mango_list = [word for word in df.columns if '_mango' in word]

or

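mango_list = [word for word in df.columns if word.split("_")[1:2] == ["mango"]]

The slice [1:2] is used instead of [1] so that column names without an underscore are skipped rather than raising an IndexError.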
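The two approaches are not equivalent. The sketch below compares them, together with a stricter suffix check, on some made-up column names:

```python
import pandas as pd

# Hypothetical column names chosen to show where the checks disagree
df = pd.DataFrame(columns=['a_mango', 'mango', 'b_mango_x', 'red_mangoes', 'plain'])

# '_mango' anywhere in the name (also matches 'red_mangoes')
substring = [w for w in df.columns if '_mango' in w]
# second underscore-separated part is exactly 'mango'
second_part = [w for w in df.columns if w.split('_')[1:2] == ['mango']]
# name ends with '_mango'
suffix = [w for w in df.columns if w.endswith('_mango')]

print(substring)    # ['a_mango', 'b_mango_x', 'red_mangoes']
print(second_part)  # ['a_mango', 'b_mango_x']
print(suffix)       # ['a_mango']
```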

Pandas Find name of column in which a string is found

Definitely not the most elegant answer, but it does the trick:

word = 'Giraffe'
df.columns[df[df==word].notna().sum()>0][0]

This returns 'Animal' as a string.

It only works under the assumption that exactly one column can contain the word: the trailing [0] keeps just the first matching column and raises an IndexError if no column matches.
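If several columns may contain the word, or none at all, a variant along these lines returns every match. The frame below is an assumption, since the original data is not shown:

```python
import pandas as pd

# Hypothetical data: 'Giraffe' appears in two columns
df = pd.DataFrame({'Animal': ['Giraffe', 'Lion'],
                   'Plant': ['Oak', 'Fern'],
                   'Toy': ['Ball', 'Giraffe']})

word = 'Giraffe'
# (df == word) is a boolean frame; .any() reduces it to one flag per column
mask = (df == word).any()
matching = df.columns[mask].tolist()
first = matching[0] if matching else None  # None instead of IndexError

print(matching)  # ['Animal', 'Toy']
print(first)     # Animal
```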


