Find column whose name contains a specific string
Just iterate over DataFrame.columns. The example below ends up with a list of the column names that match:
import pandas as pd
data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)
spike_cols = [col for col in df.columns if 'spike' in col]
print(list(df.columns))
print(spike_cols)
Output:
['spike-2', 'hey spke', 'spiked-in', 'no']
['spike-2', 'spiked-in']
Explanation:
df.columns returns the column names as an Index, which can be iterated over like a list.
[col for col in df.columns if 'spike' in col] iterates over df.columns with the variable col and adds it to the resulting list if col contains 'spike'. This syntax is a list comprehension.
If you only want the resulting data set restricted to the columns that match, you can do this:
df2 = df.filter(regex='spike')
print(df2)
Output:
   spike-2  spiked-in
0        1          7
1        2          8
2        3          9
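Note that filter(regex=...) treats its argument as a regular expression and keeps a column if the pattern matches anywhere in the name, while filter(like=...) does a plain substring test. A small sketch of the difference, using made-up column names chosen only to show where the two diverge:

```python
import re
import pandas as pd

# Hypothetical column names, picked so that literal and regex matching differ.
df = pd.DataFrame({'a.b': [1], 'axb': [2], 'spike-2': [3]})

# like= keeps columns whose name contains the literal substring.
literal_cols = list(df.filter(like='a.b').columns)

# regex= searches with a regular expression, so '.' matches any character.
regex_cols = list(df.filter(regex='a.b').columns)

# Escaping the pattern makes regex= behave like a literal match again.
escaped_cols = list(df.filter(regex=re.escape('a.b')).columns)

print(literal_cols)
print(regex_cols)
print(escaped_cols)
```

If the search string may contain characters such as '.', '(' or '+', like= or re.escape is probably the safer choice.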
Find column whose name contains a specific value that is in a fixed column
I don't know whether you want the column names that contain the string, or the names of the columns that have at least one value containing the string.
If the dataframe is:
In [1]: import pandas as pd
...: df = pd.DataFrame({'a_1': ['b_1', 'b_2'], 'b_1': ['a_1', 'a_2']})
In [2]: df
Out[2]:
   a_1  b_1
0  b_1  a_1
1  b_2  a_2
For the first case, if you want to find all the column names that match a_.*:
In [3]: import re
In [4]: columns = [col for col in df.columns if isinstance(col, str) and re.match('a_.*', col)]
In [5]: columns
Out[5]: ['a_1']
For the second case, if you want to find all the columns in which at least one value matches a_.*:
In [6]: columns = [col for col, ser in df.items() if ser.str.match('a_.*').any()]
In [7]: columns
Out[7]: ['b_1']
in which:
df.items (called df.iteritems before it was removed in pandas 2.0): returns an iterator of (column name, column values (Series)) pairs.
Series.any: returns True if any value in the series is True.
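The two cases can be wrapped in small helper functions. This is only a sketch: the names columns_named_like and columns_with_value_like are made up for illustration, and casting values to str is an assumption about how non-string columns should be treated.

```python
import re
import pandas as pd

df = pd.DataFrame({'a_1': ['b_1', 'b_2'], 'b_1': ['a_1', 'a_2']})

def columns_named_like(frame, pattern):
    """Hypothetical helper: column names that match `pattern` from the start."""
    return [col for col in frame.columns
            if isinstance(col, str) and re.match(pattern, col)]

def columns_with_value_like(frame, pattern):
    """Hypothetical helper: columns with at least one value matching `pattern`."""
    matches = []
    for col, ser in frame.items():
        # Cast to str so that numeric columns do not break the .str accessor
        # (assumption: missing values, which become 'nan', should not match).
        if ser.astype(str).str.match(pattern).any():
            matches.append(col)
    return matches

by_name = columns_named_like(df, r'a_.*')
by_value = columns_with_value_like(df, r'a_.*')
print(by_name)
print(by_value)
```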
Creating new column from columns whose name contains a specific string
You can use:
# get columns with "Time" in the name
cols = list(df.filter(like='Time'))
# ['Run_Time', 'Rest_Time']
# add the value of df['Temp']
df[cols] = df[cols].add(df['Temp'], axis=0)
Output:
   Run_Time  Temp  Rest_Time
0        70    10         15
1        40    20         25
2        60    30         35
3        95    50         55
4       130    60         65
5       200   100        105
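The snippet above overwrites the matching columns in place. If new columns are wanted instead, something like the following might work. The input values are an assumption, back-computed from the output shown above, and the '_plus_Temp' suffix is a made-up name.

```python
import pandas as pd

# Assumed input, reconstructed from the printed result; the original
# question's data is not included here.
df = pd.DataFrame({
    'Run_Time': [60, 20, 30, 45, 70, 100],
    'Temp': [10, 20, 30, 50, 60, 100],
    'Rest_Time': [5, 5, 5, 5, 5, 5],
})

# Column names that contain "Time".
cols = list(df.filter(like='Time'))

# Add Temp row by row, then rename the results so they become new columns.
summed = df[cols].add(df['Temp'], axis=0).add_suffix('_plus_Temp')

# Join the new columns onto the untouched original frame.
result = df.join(summed)
print(result)
```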
Select columns based on column names containing a specific string in pandas
Alternative methods:
In [13]: df.loc[:, df.columns.str.startswith('alp')]
Out[13]:
       alp1      alp2
0  0.357564  0.108907
1  0.341087  0.198098
2  0.416215  0.644166
3  0.814056  0.121044
4  0.382681  0.110829
5  0.130343  0.219829
6  0.110049  0.681618
7  0.949599  0.089632
8  0.047945  0.855116
9  0.561441  0.291182
In [14]: df.loc[:, df.columns.str.contains('alp')]
Out[14]:
       alp1      alp2
0  0.357564  0.108907
1  0.341087  0.198098
2  0.416215  0.644166
3  0.814056  0.121044
4  0.382681  0.110829
5  0.130343  0.219829
6  0.110049  0.681618
7  0.949599  0.089632
8  0.047945  0.855116
9  0.561441  0.291182
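Both calls return the same columns here only because every matching name begins with 'alp'. A short sketch with hypothetical column names shows where str.startswith and str.contains differ, and how the case and regex options of str.contains change the result:

```python
import pandas as pd

# Hypothetical column names; only the names matter for this comparison.
df = pd.DataFrame(columns=['alp1', 'alp2', 'bet1', 'Alpha', 'scalp'])

# Names that begin with 'alp'.
starts = df.columns[df.columns.str.startswith('alp')].tolist()

# Names that contain 'alp' anywhere (the pattern is a regex by default).
contains = df.columns[df.columns.str.contains('alp')].tolist()

# regex=False treats the pattern as a literal string; case=False ignores case.
contains_nocase = df.columns[
    df.columns.str.contains('alp', case=False, regex=False)
].tolist()

print(starts)
print(contains)
print(contains_nocase)
```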
Find column name containing a string after a punctuation
To get a list of the column names containing mango after an underscore _ in your dataframe df, you can either do
mango_list = [word for word in df.columns if '_mango' in word]
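or, to require that the piece right after the first underscore is exactly mango (the [1:2] slice avoids an IndexError for names without an underscore):
mango_list = [word for word in df.columns if word.split("_")[1:2] == ["mango"]]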
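The two approaches do not always select the same names. A sketch comparing a substring test with an exact test on the piece after the first underscore, using a hypothetical list of column names in place of df.columns:

```python
# Hypothetical column names for illustration.
columns = ['red_mango', 'mango', 'green_mango_ripe', 'red_mangoes', 'apple']

# Substring test: matches any name that contains '_mango', including '_mangoes'.
substring_hits = [word for word in columns if '_mango' in word]

# Exact test on the second underscore-separated piece. The [1:2] slice is
# empty for names without an underscore, so no IndexError is raised.
exact_hits = [word for word in columns if word.split('_')[1:2] == ['mango']]

print(substring_hits)
print(exact_hits)
```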
Pandas Find name of column in which a string is found
Definitely not the best or most elegant answer, but it does the trick:
word = 'Giraffe'
df.columns[df[df==word].notna().sum()>0][0]
This returns 'Animal' as a string.
It only works if we assume that only one column can contain the word.
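If more than one column may contain the word, a variation along these lines could return all of them. The data is hypothetical, since the original frame is not shown, and this is a sketch rather than the answer's own method:

```python
import pandas as pd

# Hypothetical data; the original question's frame is not shown here.
df = pd.DataFrame({
    'Animal': ['Giraffe', 'Lion'],
    'Food': ['Leaves', 'Meat'],
    'Nickname': ['Stretch', 'Giraffe'],
})

word = 'Giraffe'

# df.eq(word) is an elementwise equality mask; .any() reduces it per column.
mask = df.eq(word).any()

# Keep the labels of the columns where the mask is True.
matching_cols = mask[mask].index.tolist()

# Fall back to None when no column contains the word.
first_match = matching_cols[0] if matching_cols else None

print(matching_cols)
print(first_match)
```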