Get column name where value is something in pandas dataframe
Here is one, perhaps inelegant, way to do it:
df_result = pd.DataFrame(ts, columns=['value'])
Set up a function which grabs the column name containing the value (from ts):
def get_col_name(row):
    b = (df.loc[row.name] == row['value'])  # .ix is removed in modern pandas; use .loc
    return b.index[b.argmax()]
For each row, this tests which elements equal the value and extracts the column name of a True. And apply it (row-wise):
In [3]: df_result.apply(get_col_name, axis=1)
Out[3]:
1979-01-01 00:00:00 col5
1979-01-01 06:00:00 col3
1979-01-01 12:00:00 col1
1979-01-01 18:00:00 col1
i.e. use df_result['Column'] = df_result.apply(get_col_name, axis=1).
Note: there is quite a lot going on in get_col_name, so perhaps it warrants some further explanation:
In [4]: row = df_result.iloc[0] # an example row to pass to get_col_name (.irow is removed; use .iloc)
In [5]: row
Out[5]:
value 1181.220328
Name: 1979-01-01 00:00:00
In [6]: row.name # use to get rows of df
Out[6]: <Timestamp: 1979-01-01 00:00:00>
In [7]: df.loc[row.name]
Out[7]:
col5 1181.220328
col4 912.154923
col3 648.848635
col2 390.986156
col1 138.185861
Name: 1979-01-01 00:00:00
In [8]: b = (df.loc[row.name] == row['value'])
# checks whether each element equals row['value'] = 1181.220328
In [9]: b
Out[9]:
col5 True
col4 False
col3 False
col2 False
col1 False
Name: 1979-01-01 00:00:00
In [10]: b.argmax() # position of the first True value
Out[10]: 0
In [11]: b.index[b.argmax()] # the index value (column name)
Out[11]: 'col5'
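On current pandas, where .ix and .irow no longer exist, the whole row-wise apply can be collapsed into a single vectorised call. A minimal sketch with made-up data (the df / df_result names mirror the ones above, but the values here are assumptions):

```python
import pandas as pd

# hypothetical stand-ins for the df / df_result frames above
df = pd.DataFrame({'col1': [1.0, 5.0], 'col2': [2.0, 3.0]},
                  index=pd.to_datetime(['1979-01-01', '1979-01-02']))
df_result = pd.DataFrame({'value': [2.0, 5.0]}, index=df.index)

# eq(..., axis=0) compares every column of df against the value Series,
# aligned on the index; idxmax(axis=1) picks the first matching column name
df_result['Column'] = df.eq(df_result['value'], axis=0).idxmax(axis=1)
print(df_result['Column'].tolist())  # ['col2', 'col1']
```

Like the original approach, this silently returns the first column if no value matches, so it is only safe when a match is guaranteed.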
There may well be a more efficient way to do this...
Pandas selecting the column name based on row information
General solution - works even if there is no matching row or val:
val = 70
row = 10
s = df.reindex(index=[row]).eq(val).squeeze()
col = next(iter(s.index[s]), 'no match')
print (col)
no match
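A self-contained sketch of this reindex/squeeze pattern, wrapped in a function. The frame and its column names here are assumptions, since the original df is not shown in full:

```python
import pandas as pd

# hypothetical frame standing in for the answer's df
df = pd.DataFrame({'name1': [5, 50], 'name2': [7, 70]}, index=[1, 2])

def find_col(row, val):
    # reindex returns an all-NaN row when `row` is missing,
    # so eq() yields all False and next() falls back to the default
    s = df.reindex(index=[row]).eq(val).squeeze()
    return next(iter(s.index[s]), 'no match')

print(find_col(1, 7))    # name2
print(find_col(10, 7))   # no match
print(find_col(1, 999))  # no match
```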
Another general solution:
def get_col(row, val):
try:
a = df.loc[row].eq(val)
c = a.index[a][0]
except KeyError:
c = 'not matched row'
except IndexError:
c = 'not matched value'
return c
print (get_col(1, 7))
name3
print (get_col(10, 7))
not matched row
print (get_col(1, 70))
not matched value
print (get_col(10, 70))
not matched row
Solution if val and row values always exist in the DataFrame - because if they don't, df.loc[row].eq(val) returns all False and idxmax returns the first False, i.e. the first column name.
val = 7
row = 1
col = df.loc[row].eq(val).idxmax()
#to select by position, use iloc
#col = df.iloc[row].eq(val).idxmax()
print (col)
name3
Explanation:
First select row by DataFrame.loc
:
print (df.loc[row])
name1 5
name2 6
name3 7
name4 8
Name: 1, dtype: int64
Then compare with eq:
print (df.loc[row].eq(val))
name1 False
name2 False
name3 True
name4 False
Name: 1, dtype: bool
And finally get the index value of the first True with idxmax:
print (df.loc[row].eq(val).idxmax())
name3
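Putting the three steps together as a runnable script (the frame is reconstructed from the outputs shown above):

```python
import pandas as pd

# frame rebuilt from the printed df.loc[row] output above
df = pd.DataFrame({'name1': [5], 'name2': [6], 'name3': [7], 'name4': [8]},
                  index=[1])

val = 7
row = 1
# select the row, compare elementwise, take the first True's label
col = df.loc[row].eq(val).idxmax()
print(col)  # name3
```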
How to get the column name in pandas based on row values?
Using dot - this assumes df holds boolean (or 0/1) indicator columns:
df.dot(df.columns+',').str[:-1]
Out[168]:
0 id_0,id_2
1 id_0
2 id_1
3 id_0,id_1
4 id_2
5
dtype: object
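A runnable sketch of the dot trick with a made-up 0/1 indicator frame (the id_* column names mirror the output above). Multiplying a 0/1 value by a string repeats it zero or one times, so the dot product concatenates the names of the "on" columns:

```python
import pandas as pd

# hypothetical indicator frame
df = pd.DataFrame({'id_0': [1, 1, 0],
                   'id_1': [0, 0, 1],
                   'id_2': [1, 0, 0]})

# dot concatenates the names of the 1-valued columns per row;
# str[:-1] trims the trailing comma
out = df.dot(df.columns + ',').str[:-1]
print(out.tolist())  # ['id_0,id_2', 'id_0', 'id_1']
```

A row that is all zeros yields an empty string, matching row 5 in the output above.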
Pandas get column value based on row value
Using factorize and a reindex lookup:
idx, cols = pd.factorize(df['flag'])
df['COl_VAL'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
Output:
>>> df
flag col1 col2 col3 col4 COl_VAL
index
A col3 1 5 6 0 6
B col2 3 2 3 4 2
C col2 2 4 6 4 4
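A self-contained sketch of the factorize lookup, with the frame rebuilt from the output table above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'flag': ['col3', 'col2', 'col2'],
                   'col1': [1, 3, 2], 'col2': [5, 2, 4],
                   'col3': [6, 3, 6], 'col4': [0, 4, 4]},
                  index=pd.Index(['A', 'B', 'C'], name='index'))

# factorize maps each flag to an integer code; reindex puts the flagged
# columns in code order, so plain integer indexing picks one cell per row
idx, cols = pd.factorize(df['flag'])
df['COl_VAL'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
print(df['COl_VAL'].tolist())  # [6, 2, 4]
```

This avoids a Python-level loop entirely, which matters on large frames.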
Find column name in Pandas that contains a specific value in the row from another column
I want to search only columns A through F and find the column name for the first instance (leftmost) the value exists
You can use idxmax on axis=1 after comparing the Value column with the slice of the dataframe (using .loc[]):
df['Value_Col'] = df.loc[:,'A':'F'].isin(df['Value']).idxmax(1)
print(df)
Date Time A B C D E F Value Value_Col
0 Jan1 1245 3.0 3.2 4.6 5.7 2.1 8.0 5.7 D
1 Jan2 1045 4.5 8.4 3.9 2.2 9.4 8.3 3.9 C
2 Jan3 1350 1.4 3.3 4.5 8.9 1.4 0.4 1.4 A
If there is a chance that none of the columns contains the df['Value'] value, you can use:
m = df.loc[:,'A':'F']
df['Value_Col'] = m.isin(df['Value']).dot(m.columns).str[0]
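A compact sketch of the isin-plus-idxmax approach with a smaller made-up frame (columns A:C instead of A:F). Note that DataFrame.isin with a Series aligns on the index, so each row is compared against its own Value:

```python
import pandas as pd

# hypothetical frame with fewer columns than the original
df = pd.DataFrame({'A': [3.0, 4.5], 'B': [3.2, 8.4], 'C': [4.6, 3.9],
                   'Value': [3.2, 3.9]})

# isin(Series) compares row-wise against Value; idxmax(axis=1)
# returns the leftmost matching column name
df['Value_Col'] = df.loc[:, 'A':'C'].isin(df['Value']).idxmax(axis=1)
print(df['Value_Col'].tolist())  # ['B', 'C']
```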
Pandas transform dataframe to get column names based on row condition
Just use DataFrame.apply on axis=1, then join with , the names of the columns whose value is below, or at/above, the given threshold.
df.assign(normal_speed_t=df.apply(lambda x:','.join(x[x<100].index), axis=1),
high_speed_t=df.apply(lambda x:','.join(x[x>=100].index), axis=1)
)
OUTPUT:
speed_t1 speed_t2 speed_t3 speed_t4 normal_speed_t high_speed_t
car id
1.0 90 80 120 34 speed_t1,speed_t2,speed_t4 speed_t3
2.0 110 130 140 99 speed_t4 speed_t1,speed_t2,speed_t3
3.0 40 110 20 110 speed_t1,speed_t3 speed_t2,speed_t4
Break-Down:
assign just lets you assign a new column with the given values.
apply applies a function to the dataframe column-wise for axis=0, and row-wise for axis=1.
x[x<100].index filters the values that are less than 100 and gets their index, i.e. the column names.
','.join(...) joins the column names coming from the step above.
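The steps above can be reproduced as a self-contained script, with the frame rebuilt from the output table:

```python
import pandas as pd

df = pd.DataFrame({'speed_t1': [90, 110, 40], 'speed_t2': [80, 130, 110],
                   'speed_t3': [120, 140, 20], 'speed_t4': [34, 99, 110]},
                  index=pd.Index([1.0, 2.0, 3.0], name='car id'))

# per row, join the names of the columns below / at-or-above 100
out = df.assign(
    normal_speed_t=df.apply(lambda x: ','.join(x[x < 100].index), axis=1),
    high_speed_t=df.apply(lambda x: ','.join(x[x >= 100].index), axis=1),
)
print(out.loc[1.0, 'high_speed_t'])    # speed_t3
print(out.loc[3.0, 'normal_speed_t'])  # speed_t1,speed_t3
```

apply with a Python lambda is convenient but not vectorised; for very large frames the dot trick shown earlier scales better.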
Get column names of a data frame based on values from a list in pandas python
General solution for multiple rows - tests whether at least one value, or all values, per column come from val.
You can test membership by DataFrame.isin
and then test by DataFrame.any
or DataFrame.all
:
#added new row to see the difference
print (df)
col1 col2 col3 col4 col5
0 a1 b1 c_d d1 e10
1 a1 d1 c_e f1 e10
val = ['a1', 'c_d', 'e10']
#test membership
print (df.isin(val))
col1 col2 col3 col4 col5
0 True False True False True
1 True False False False True
#test if at least one True per column
print (df.isin(val).any())
col1 True
col2 False
col3 True
col4 False
col5 True
dtype: bool
#test if all Trues per column
print (df.isin(val).all())
col1 True
col2 False
col3 False
col4 False
col5 True
dtype: bool
names = df.columns[df.isin(val).any()]
print (names)
Index(['col1', 'col3', 'col5'], dtype='object')
names = df.columns[df.isin(val).all()]
print (names)
Index(['col1', 'col5'], dtype='object')
If the DataFrame has only one row, it is possible to select the first row as a Series with DataFrame.iloc and then test membership with Series.isin:
names = df.columns[df.iloc[0].isin(val)]
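The any/all variants above can be run end to end as follows (frame rebuilt from the printed df):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a1', 'a1'], 'col2': ['b1', 'd1'],
                   'col3': ['c_d', 'c_e'], 'col4': ['d1', 'f1'],
                   'col5': ['e10', 'e10']})
val = ['a1', 'c_d', 'e10']

# columns where at least one value is in val, vs. where every value is
any_cols = df.columns[df.isin(val).any()].tolist()
all_cols = df.columns[df.isin(val).all()].tolist()
print(any_cols)  # ['col1', 'col3', 'col5']
print(all_cols)  # ['col1', 'col5']
```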
EDIT: If upgrading to the latest version of pandas does not help, here is one solution to replace all non-string values in object columns with missing values:
data = [
{'id': 1, 'content': [{'values': 3}]},
{'id': 2, 'content': 'a1'},
{'id': 3, 'content': 'c_d'},
{'id': 4, 'content': np.array([4,5])}
]
df = pd.DataFrame(data)
mask1 = ~df.columns.isin(df.select_dtypes(object).columns)
mask2 = df.applymap(lambda x: isinstance(x, str)) # use DataFrame.map in pandas >= 2.1
df = df.where(mask2 | mask1)
print (df)
id content
0 1 NaN
1 2 a1
2 3 c_d
3 4 NaN
val = ['a1', 'c_d', 'e10']
print (df.isin(val))
id content
0 False False
1 False True
2 False True
3 False False