Populate a pandas DataFrame column based on another column and dictionary value
You can explode the "DIAGNOSES" column, take the first character of each string with str[0], map it through the diagnoses dictionary to get the types, then group by the index and aggregate back into a list:
df['DIAGNOSES_TYPE'] = df['DIAGNOSES'].explode().str[0].map(diagnoses).groupby(level=0).apply(list)
Output:
              DIAGNOSES                         DIAGNOSES_TYPE
0                 [A03]                            [Arbitrary]
1            [A03, B23]                    [Arbitrary, Brutal]
2  [A30, B54, D65, C60]  [Arbitrary, Brutal, Dropped, Cluster]
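For reference, here is a self-contained sketch of the same chain that can be run as-is. The diagnoses dictionary and the sample rows are assumptions reconstructed from the output above, not data taken from the question.

```python
import pandas as pd

# Assumed lookup table, inferred from the output shown above.
diagnoses = {"A": "Arbitrary", "B": "Brutal", "C": "Cluster", "D": "Dropped"}

# Assumed sample data: each cell holds a list of diagnosis codes.
df = pd.DataFrame(
    {"DIAGNOSES": [["A03"], ["A03", "B23"], ["A30", "B54", "D65", "C60"]]}
)

df["DIAGNOSES_TYPE"] = (
    df["DIAGNOSES"]
    .explode()          # one row per code; the original index is repeated
    .str[0]             # first character of each code
    .map(diagnoses)     # letter -> type name (NaN if the letter is unknown)
    .groupby(level=0)   # regroup by the original row index
    .apply(list)        # collect the types back into one list per row
)
print(df)
```

Because the result is assigned back by index, rows keep their original order even though explode() temporarily repeats index values.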
Populate Pandas Dataframe column from other columns based on a condition and previous row value
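import numpy as np
df['Hlv'] = np.nan  # np.NaN was removed in NumPy 2.0
df.loc[df.Close > df.SMA_High, 'Hlv'] = 1   # close above the upper average
df.loc[df.Close < df.SMA_Low, 'Hlv'] = -1   # close below the lower average
df['Hlv'] = df['Hlv'].ffill()  # otherwise carry the previous row's value forward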
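A small runnable sketch of this pattern, with made-up prices (the numbers are assumptions for illustration only). Rows that meet neither condition inherit the previous row's value, and the first row stays NaN because there is nothing before it to carry forward.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({
    "Close":    [10, 12, 11, 8, 9, 13],
    "SMA_High": [11, 11, 11, 11, 11, 11],
    "SMA_Low":  [9, 9, 9, 9, 9, 9],
})

df["Hlv"] = np.nan
df.loc[df["Close"] > df["SMA_High"], "Hlv"] = 1    # above the upper band
df.loc[df["Close"] < df["SMA_Low"], "Hlv"] = -1    # below the lower band
df["Hlv"] = df["Hlv"].ffill()                      # in between: reuse the previous row

print(df)
```

Filling only the Hlv column, rather than calling fillna on the whole DataFrame, avoids overwriting genuine gaps in other columns.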
populating pandas columns based on values in other columns
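First clean up the column names, then use groupby + first:
df = df.replace('', np.nan)  # treat empty strings as missing so first() skips them
df.columns = df.columns.str.replace(r'\d+', '', regex=True)  # strip digits; regex=True is required in pandas 2.0+
df.columns = df.columns.str.split('-').str[-1]  # keep the part after the last '-'
# note: groupby(axis=1) is deprecated since pandas 2.1
newdf = df.groupby(level=0, axis=1).first()  # first non-null value among same-named columns
newdf.loc[df.iloc[:, 1].isnull(), :] = df.groupby(level=0, axis=1).last()  # rows with a missing second column take the last value
newdf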
Output:
       Address      City  ID  State
0   6th street      Mpls   1     MN
1      15th St     Flint   2     MI
2    Essexb St  New York   3     NY
3  7 street SE      Mpls   4     MN
How to populate a column based on values from multiple columns in python?
Here is one way to do it with a regex; it is probably not the most efficient. It assumes there is always one Sxx value in each row and that your DataFrame is named data_df:
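import pandas as pd
import re

last_col = []
for index, row in data_df.iterrows():
    for cell in row.to_list():
        # str() guards against non-string cells such as NaN
        if re.match(r'S[0-9]+', str(cell)):
            last_col.append(cell)
            break
    else:
        # no Sxx value in this row; keep the list aligned with the DataFrame
        last_col.append(None)
data_df['last_col'] = last_col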
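The same idea can be written without the explicit double loop by applying a small helper to each row. This is only a sketch: the sample data and the helper name first_code are made up for illustration, and it is still row-wise under the hood, so it will not be dramatically faster.

```python
import re
import pandas as pd

# Hypothetical sample data: exactly one "Sxx" value somewhere in each row.
data_df = pd.DataFrame({
    "c1": ["S12", "x", "y"],
    "c2": ["a", "S7", "b"],
    "c3": ["b", "c", "S301"],
})

pattern = re.compile(r"S[0-9]+")

def first_code(row):
    # Return the first cell that starts with "S" followed by digits, else None.
    return next((cell for cell in row if pattern.match(str(cell))), None)

data_df["last_col"] = data_df.apply(first_code, axis=1)
print(data_df)
```

Returning None for rows without a match keeps the new column the same length as the DataFrame instead of raising an error.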
Populate two columns based on different values of other two columns
This should do what your question asks:
import pandas as pd
import numpy as np
df = pd.DataFrame({'ID':[1,1,2,2,2,2,3,4,4], 'CURRENT':list('ABCDEFGHI')})
print(df)
from collections import defaultdict
valById = defaultdict(list)
df.apply(lambda x: valById[x['ID']].append(x['CURRENT']), axis = 1)
df = pd.DataFrame([{'ID':k, 'PREVIOUS': v[i-1] if i else np.nan, 'CURRENT': v[i], 'NEXT': v[i+1] if i+1 < len(v) else np.nan} for k, v in valById.items() for i in range(len(v))])
print(df)
Output:
   ID CURRENT
0   1       A
1   1       B
2   2       C
3   2       D
4   2       E
5   2       F
6   3       G
7   4       H
8   4       I

   ID PREVIOUS CURRENT NEXT
0   1      NaN       A    B
1   1        A       B  NaN
2   2      NaN       C    D
3   2        C       D    E
4   2        D       E    F
5   2        E       F  NaN
6   3      NaN       G  NaN
7   4      NaN       H    I
8   4        H       I  NaN
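As an alternative to building the rows by hand, groupby with shift should give the same PREVIOUS and NEXT columns in a couple of lines. This is a different technique from the one above, sketched on the same sample data; it assumes the rows are already in the desired order within each ID.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2, 2, 2, 2, 3, 4, 4], "CURRENT": list("ABCDEFGHI")})

grouped = df.groupby("ID")["CURRENT"]
df["PREVIOUS"] = grouped.shift(1)    # value from the previous row within the same ID
df["NEXT"] = grouped.shift(-1)       # value from the next row within the same ID

# Reorder the columns to match the layout shown above.
df = df[["ID", "PREVIOUS", "CURRENT", "NEXT"]]
print(df)
```

shift never crosses group boundaries, so the first row of each ID has no PREVIOUS and the last row has no NEXT.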
How to populate a dataframe column based on condition met in another column
Your solution works if you fix the parentheses:
df['B'] = np.where(df['A'] > 2, 'A1',
          np.where(df['A'].between(0, 2), 'A2',
          np.where(df['A'].between(-2, 0), 'A3',
          np.where(df['A'] < -2, 'A4', ''))))
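Alternative with pd.cut (with the default right=True, the boundary values -2 and 0 fall into the lower bin, so they are labelled A4 and A3 here but A3 and A2 by the np.where version):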
df['B1'] = pd.cut(df['A'], bins=(-np.inf,-2,0,2,np.inf), labels=('A4','A3','A2','A1'))
print(df)
     A   B  B1
0 -4.0  A4  A4
1 -3.5  A4  A4
2 -2.5  A4  A4
3 -1.0  A3  A3
4  1.0  A2  A2
5  1.5  A2  A2
6  2.0  A2  A2
7  2.5  A1  A1
8  3.5  A1  A1
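If the nested np.where calls are hard to read, np.select is a flatter way to express the same conditions. This is a swapped-in technique, not part of the original answer; the sample values are taken from the output above. Conditions are checked in order and the first match wins, which mirrors the nesting.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [-4.0, -3.5, -2.5, -1.0, 1.0, 1.5, 2.0, 2.5, 3.5]})

# Same conditions, in the same order, as the nested np.where version.
conditions = [
    df["A"] > 2,
    df["A"].between(0, 2),    # inclusive on both ends
    df["A"].between(-2, 0),   # inclusive on both ends; 0 is already taken by the line above
    df["A"] < -2,
]
choices = ["A1", "A2", "A3", "A4"]

df["B"] = np.select(conditions, choices, default="")
print(df)
```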
How to populate a dataframe column based on a lookup of other columns?
Here's one way:
df['Parent_age'] = df.Parent.map(dict(df[['Child' , 'Age']].values))
# when Parent is not in Child column, then apply get_parent_age
cond = df['Parent_age'].isnull()
df.loc[cond, 'Parent_age'] = df.loc[cond, 'Parent'].map(get_parent_age)
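A runnable sketch of the two-step lookup. The sample rows and the body of get_parent_age are hypothetical stand-ins, since the question's own fallback function is not shown here.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Child":  ["Ann", "Bob", "Cat"],
    "Parent": ["Bob", "Dan", "Ann"],
    "Age":    [10, 40, 8],
})

def get_parent_age(name):
    # Hypothetical fallback for parents that never appear in the Child column.
    fallback = {"Dan": 70}
    return fallback.get(name)

# Step 1: look each parent up among the children (Child -> Age).
child_age = df.set_index("Child")["Age"]
df["Parent_age"] = df["Parent"].map(child_age)

# Step 2: for parents that were not found, use the fallback function.
cond = df["Parent_age"].isna()
df.loc[cond, "Parent_age"] = df.loc[cond, "Parent"].map(get_parent_age)
print(df)
```

Mapping through a Series indexed by Child does the same job as the dict built from .values, but it requires the Child names to be unique.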
populate column using loop based on value in row index 0
You can use pd.date_range:
# set your date column as index
df.set_index('date', inplace=True)
# generate one date per row, starting at the first date and stepping back 7 days each time
df.index = pd.date_range(start=df.index[0], freq='-7D', periods=df.shape[0])
This also works without setting the column as the index:
df['date'] = pd.date_range(start=df['date'].iloc[0], freq='-7D', periods=df.shape[0])
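A minimal sketch with made-up data, assuming only the first row has a date and every following row should be one week earlier. The column name value and the start date are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data: only row 0 carries a date.
df = pd.DataFrame({
    "date": ["2021-03-29", None, None, None],
    "value": [1, 2, 3, 4],
})

start = pd.to_datetime(df["date"].iloc[0])

# A negative frequency walks backwards in time, one step per row.
df["date"] = pd.date_range(start=start, freq="-7D", periods=len(df))
print(df)
```

No loop is needed because date_range generates the whole sequence from the value in the first row.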