How to Get the Column Name in Pandas Based on Row Values

Get column name where value is something in pandas dataframe

Here is one, perhaps inelegant, way to do it:

df_result = pd.DataFrame(ts, columns=['value'])

Set up a function which grabs the column name which contains the value (from ts):

def get_col_name(row):
    # .ix is long removed; look the row up by its index label instead
    b = (df.loc[row.name] == row['value'])
    return b.index[b.argmax()]

For each row, this tests which elements equal the value and extracts the column name of a True.

And apply it (row-wise):

In [3]: df_result.apply(get_col_name, axis=1)
1979-01-01 00:00:00 col5
1979-01-01 06:00:00 col3
1979-01-01 12:00:00 col1
1979-01-01 18:00:00 col1

i.e. use df_result['Column'] = df_result.apply(get_col_name, axis=1).


Note: there is quite a lot going on in get_col_name so perhaps it warrants some further explanation:

In [4]: row = df_result.iloc[0] # an example row to pass to get_col_name

In [5]: row
value 1181.220328
Name: 1979-01-01 00:00:00

In [6]: row.name # use to get rows of df
Out[6]: <Timestamp: 1979-01-01 00:00:00>

In [7]: df.loc[row.name]
col5 1181.220328
col4 912.154923
col3 648.848635
col2 390.986156
col1 138.185861
Name: 1979-01-01 00:00:00

In [8]: b = (df.loc[row.name] == row['value'])
# checks whether each element equals row['value'] = 1181.220328

In [9]: b
col5 True
col4 False
col3 False
col2 False
col1 False
Name: 1979-01-01 00:00:00

In [10]: b.argmax() # index of a True value
Out[10]: 0

In [11]: b.index[b.argmax()] # the index value (column name)
Out[11]: 'col5'

There may be a more efficient way to do this...
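Since .ix and .irow were removed in modern pandas, here is a hedged, self-contained sketch of the same approach using .loc and idxmax (the DataFrame, timestamps and values below are made up for illustration):

```python
import pandas as pd

# Made-up data: each entry of ts is one value taken from the matching row of df
df = pd.DataFrame({'col1': [1.0, 5.0],
                   'col2': [2.0, 6.0],
                   'col3': [3.0, 7.0]},
                  index=pd.to_datetime(['1979-01-01', '1979-01-02']))
ts = pd.Series([3.0, 5.0], index=df.index)
df_result = ts.to_frame('value')

def get_col_name(row):
    # Look up the full df row by this row's index label, compare it to the
    # single value, and return the first matching column name
    b = df.loc[row.name] == row['value']
    return b.idxmax()

df_result['Column'] = df_result.apply(get_col_name, axis=1)
```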

Pandas selecting the column name based on row information

A general solution that also works when the row or val has no match:

val = 70
row = 10

mask = df.reindex(index=[row]).eq(val).squeeze()
col = next(iter(mask.index[mask]), 'no match')
print (col)
no match
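A runnable version of this pattern on a small made-up frame (reindex avoids a KeyError for a missing row, squeeze turns the one-row result into a Series, and next(..., 'no match') supplies the default):

```python
import pandas as pd

df = pd.DataFrame({'name1': [5], 'name2': [6],
                   'name3': [7], 'name4': [8]}, index=[1])

def first_match(df, row, val):
    # reindex returns a NaN row (instead of raising) when the label is absent
    mask = df.reindex(index=[row]).eq(val).squeeze()
    return next(iter(mask.index[mask]), 'no match')

first_match(df, 1, 7)    # existing row and value
first_match(df, 10, 70)  # neither exists
```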

Another general solution:

def get_col(row, val):
    try:
        a = df.loc[row].eq(val)
        c = a.index[a][0]
    except KeyError:
        c = 'not matched row'
    except IndexError:
        c = 'not matched value'
    return c

print (get_col(1, 7))
name3
print (get_col(10, 7))
not matched row
print (get_col(1, 70))
not matched value
print (get_col(10, 70))
not matched row

A simpler solution if val and row always exist in the DataFrame - if they don't, df.loc[row].eq(val) returns all Falses, and idxmax then returns the index of the first False, i.e. the first column name.

val = 7
row = 1
col = df.loc[row].eq(val).idxmax()
#if you want to select by position use iloc
#col = df.iloc[row].eq(val).idxmax()
print (col)
name3


First select row by DataFrame.loc:

print (df.loc[row])
name1 5
name2 6
name3 7
name4 8
Name: 1, dtype: int64

Then compare by eq

print (df.loc[row].eq(val))
name1 False
name2 False
name3 True
name4 False
Name: 1, dtype: bool

And finally get the index value of the first True by idxmax:

print (df.loc[row].eq(val).idxmax())
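To see the caveat from above in action - when the value is absent, idxmax over an all-False row silently returns the first column name - consider this small sketch (same made-up frame as in the walkthrough):

```python
import pandas as pd

df = pd.DataFrame({'name1': [5], 'name2': [6],
                   'name3': [7], 'name4': [8]}, index=[1])

hit = df.loc[1].eq(7).idxmax()    # value present -> the real match
miss = df.loc[1].eq(70).idxmax()  # all False -> first column, not an error
```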

How to get the column name in pandas based on row values?

Using dot, then stripping the trailing separator:

df.dot(df.columns + ',').str[:-1]
0 id_0,id_2
1 id_0
2 id_1
3 id_0,id_1
4 id_2
dtype: object
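For instance, with a boolean frame whose columns are id_0 through id_2 (reconstructed here to match the output above), bool-times-string multiplication repeats each column name zero or one times and the dot product concatenates them per row:

```python
import pandas as pd

df = pd.DataFrame({'id_0': [True, True, False, True, False],
                   'id_1': [False, False, True, True, False],
                   'id_2': [True, False, False, False, True]})

# True * 'id_0,' -> 'id_0,', False * 'id_0,' -> ''; the dot product sums
# (concatenates) per row, and .str[:-1] strips the trailing comma
out = df.dot(df.columns + ',').str[:-1]
```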

Pandas get column value based on row value

Per this page:

import numpy as np

idx, cols = pd.factorize(df['flag'])
df['COl_VAL'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]


>>> df
flag col1 col2 col3 col4 COl_VAL
A col3 1 5 6 0 6
B col2 3 2 3 4 2
C col2 2 4 6 4 4
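A self-contained version of this lookup, with data made up to match the table above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'flag': ['col3', 'col2', 'col2'],
                   'col1': [1, 3, 2], 'col2': [5, 2, 4],
                   'col3': [6, 3, 6], 'col4': [0, 4, 4]},
                  index=['A', 'B', 'C'])

# factorize maps each flag to an integer code; reordering the columns to
# match those codes lets one fancy index pick a single value per row
idx, cols = pd.factorize(df['flag'])
df['COl_VAL'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
```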

Find column name in Pandas that contains a specific value in the row from another column

I want to search only columns A through F and find the column name for the first instance (leftmost) the value exists

You can use idxmax on axis=1 after comparing the Value column with the slice of the DataFrame (using .loc[])

df['Value_Col'] = df.loc[:,'A':'F'].isin(df['Value']).idxmax(1)

   Date  Time    A    B    C    D    E    F  Value Value_Col
0 Jan1 1245 3.0 3.2 4.6 5.7 2.1 8.0 5.7 D
1 Jan2 1045 4.5 8.4 3.9 2.2 9.4 8.3 3.9 C
2 Jan3 1350 1.4 3.3 4.5 8.9 1.4 0.4 1.4 A
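A runnable sketch with data reconstructed from the table above (isin with a Series aligns on the row index, so each cell is compared against that row's Value; idxmax(axis=1) then picks the leftmost True):

```python
import pandas as pd

df = pd.DataFrame({'A': [3.0, 4.5, 1.4], 'B': [3.2, 8.4, 3.3],
                   'C': [4.6, 3.9, 4.5], 'D': [5.7, 2.2, 8.9],
                   'E': [2.1, 9.4, 1.4], 'F': [8.0, 8.3, 0.4],
                   'Value': [5.7, 3.9, 1.4]})

# leftmost column in A..F equal to that row's Value
df['Value_Col'] = df.loc[:, 'A':'F'].isin(df['Value']).idxmax(axis=1)
```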

If there is a chance that none of the columns contains the df['Value'] value, you can use:

m = df.loc[:,'A':'F']
#dot concatenates the matching (here single-character) column names; .str[0] keeps the first, or NaN if there is none
df['Value_Col'] = m.isin(df['Value']).dot(m.columns).str[0]

Pandas transform dataframe to get column names based on row condition

Just use DataFrame.apply on axis=1, then join (by ',') the names of the columns whose value is less than, or greater than or equal to, the given threshold.

df.assign(normal_speed_t=df.apply(lambda x: ','.join(x[x < 100].index), axis=1),
          high_speed_t=df.apply(lambda x: ','.join(x[x >= 100].index), axis=1))


        speed_t1  speed_t2  speed_t3  speed_t4              normal_speed_t                high_speed_t
car id
1.0 90 80 120 34 speed_t1,speed_t2,speed_t4 speed_t3
2.0 110 130 140 99 speed_t4 speed_t1,speed_t2,speed_t3
3.0 40 110 20 110 speed_t1,speed_t3 speed_t2,speed_t4


  • assign just lets you assign a new column with given values
  • .apply applies a function to the DataFrame column-wise for axis=0, and row-wise for axis=1
  • x[x<100].index filters the values that are less than 100 and gets their index, i.e. the column names
  • ','.join(...) joins the column names coming from the step above
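The steps above, as one runnable sketch on data reconstructed from the table:

```python
import pandas as pd

df = pd.DataFrame({'speed_t1': [90, 110, 40],
                   'speed_t2': [80, 130, 110],
                   'speed_t3': [120, 140, 20],
                   'speed_t4': [34, 99, 110]},
                  index=pd.Index([1.0, 2.0, 3.0], name='car id'))

# each row x is a Series indexed by the column names, so x[x < 100].index
# is exactly the list of qualifying column names
out = df.assign(
    normal_speed_t=df.apply(lambda x: ','.join(x[x < 100].index), axis=1),
    high_speed_t=df.apply(lambda x: ','.join(x[x >= 100].index), axis=1),
)
```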

Get column names of a data frame based on values from a list in pandas python

General solution for multiple rows - it tests whether at least one value, or all values, per column are in val.

You can test membership by DataFrame.isin and then test by DataFrame.any or DataFrame.all:

#added new row for see difference
print (df)
col1 col2 col3 col4 col5
0 a1 b1 c_d d1 e10
1 a1 d1 c_e f1 e10

val = ['a1', 'c_d', 'e10']

#tested membership
print (df.isin(val))
col1 col2 col3 col4 col5
0 True False True False True
1 True False False False True

#test if at least one True per column
print (df.isin(val).any())
col1 True
col2 False
col3 True
col4 False
col5 True
dtype: bool

#test if all Trues per column
print (df.isin(val).all())
col1 True
col2 False
col3 False
col4 False
col5 True
dtype: bool

names = df.columns[df.isin(val).any()]
print (names)
Index(['col1', 'col3', 'col5'], dtype='object')

names = df.columns[df.isin(val).all()]
print (names)
Index(['col1', 'col5'], dtype='object')

If the DataFrame has only one row, it is possible to select the first row as a Series by DataFrame.iloc and then test membership by Series.isin:

names = df.columns[df.iloc[0].isin(val)]
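The membership tests above as one runnable sketch (frame reconstructed from the printed output):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a1', 'a1'], 'col2': ['b1', 'd1'],
                   'col3': ['c_d', 'c_e'], 'col4': ['d1', 'f1'],
                   'col5': ['e10', 'e10']})
val = ['a1', 'c_d', 'e10']

any_names = df.columns[df.isin(val).any()]  # at least one match per column
all_names = df.columns[df.isin(val).all()]  # every value matches
```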

EDIT: If upgrading to the latest version of pandas does not help, here is one solution that replaces all non-string values in object columns with missing values:

import numpy as np
import pandas as pd

data = [
{'id': 1, 'content': [{'values': 3}]},
{'id': 2, 'content': 'a1'},
{'id': 3, 'content': 'c_d'},
{'id': 4, 'content': np.array([4,5])}]

df = pd.DataFrame(data)

mask1 = ~df.columns.isin(df.select_dtypes(object).columns)
mask2 = df.applymap(lambda x: isinstance(x, str))

df = df.where(mask2 | mask1)
print (df)
id content
0 1 NaN
1 2 a1
2 3 c_d
3 4 NaN

val = ['a1', 'c_d', 'e10']
print (df.isin(val))
id content
0 False False
1 False True
2 False True
3 False False
