Create New Column Based on 4 Values in Another Column

pandas create new column based on values from other columns / apply a function of multiple columns, row-wise

OK, there are two steps to this. The first is to write a function that does the translation you want; I've put an example together based on your pseudo-code:

def label_race(row):
    if row['eri_hispanic'] == 1:
        return 'Hispanic'
    if row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian'] + row['eri_nat_amer'] + row['eri_white'] > 1:
        return 'Two Or More'
    if row['eri_nat_amer'] == 1:
        return 'A/I AK Native'
    if row['eri_asian'] == 1:
        return 'Asian'
    if row['eri_afr_amer'] == 1:
        return 'Black/AA'
    if row['eri_hawaiian'] == 1:
        return 'Haw/Pac Isl.'
    if row['eri_white'] == 1:
        return 'White'
    return 'Other'

You may want to double-check the logic, but it seems to do the trick. Notice that the parameter going into the function is a Series object labelled "row" (one row of the DataFrame, indexed by column name).

Next, use the pandas apply function to apply it, e.g.:

df.apply(label_race, axis=1)

Note the axis=1 specifier: it means the function is applied to each row rather than to each column. The results are here:

0           White
1        Hispanic
2           White
3           White
4           Other
5           White
6     Two Or More
7           White
8    Haw/Pac Isl.
9           White

If you're happy with those results, then run it again, saving the results into a new column in your original dataframe.

df['race_label'] = df.apply(label_race, axis=1)

The resulting dataframe looks like this (the new race_label column is on the far right):

      lname   fname  rno_cd  eri_afr_amer  eri_asian  eri_hawaiian  eri_hispanic  eri_nat_amer  eri_white  rno_defined    race_label
0      MOST    JEFF       E             0          0             0             0             0          1        White         White
1    CRUISE     TOM       E             0          0             0             1             0          0        White      Hispanic
2      DEPP  JOHNNY     NaN             0          0             0             0             0          1      Unknown         White
3     DICAP     LEO     NaN             0          0             0             0             0          1      Unknown         White
4    BRANDO  MARLON       E             0          0             0             0             0          0        White         Other
5     HANKS     TOM     NaN             0          0             0             0             0          1      Unknown         White
6    DENIRO  ROBERT       E             0          1             0             0             0          1        White   Two Or More
7    PACINO      AL       E             0          0             0             0             0          1        White         White
8  WILLIAMS   ROBIN       E             0          0             1             0             0          0        White  Haw/Pac Isl.
9  EASTWOOD   CLINT       E             0          0             0             0             0          1        White         White
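Row-wise apply calls a Python function once per row, which can be slow on large frames. As an alternative (a swapped-in technique, not the one used in this answer), numpy.select checks a list of conditions in order and takes the label of the first one that is True, mirroring the if-chain in label_race. This is a minimal sketch, assuming the eri_* columns hold 0/1 integers; the data is just the ten rows from the table above:

```python
import numpy as np
import pandas as pd

# 0/1 indicator columns for the ten rows shown in the table above
df = pd.DataFrame({
    "eri_afr_amer": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "eri_asian":    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    "eri_hawaiian": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    "eri_hispanic": [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "eri_nat_amer": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "eri_white":    [1, 0, 1, 1, 0, 1, 1, 1, 0, 1],
})

non_hispanic = ["eri_afr_amer", "eri_asian", "eri_hawaiian", "eri_nat_amer", "eri_white"]

# Conditions are checked in order, like the if-chain in label_race
conditions = [
    df["eri_hispanic"] == 1,
    df[non_hispanic].sum(axis=1) > 1,
    df["eri_nat_amer"] == 1,
    df["eri_asian"] == 1,
    df["eri_afr_amer"] == 1,
    df["eri_hawaiian"] == 1,
    df["eri_white"] == 1,
]
labels = ["Hispanic", "Two Or More", "A/I AK Native", "Asian",
          "Black/AA", "Haw/Pac Isl.", "White"]

# Rows matching no condition fall through to the default label
df["race_label"] = np.select(conditions, labels, default="Other")
print(df["race_label"].tolist())
```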

A new column in pandas which value depends on other columns

To improve upon the other answer, I would use pandas apply to iterate over the rows and calculate the new column.

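def calc_new_col(row):
    # Use `and`, not `&`: `&` binds more tightly than `<=`, so
    # `row['col2'] <= 50 & row['col3'] <= 50` would not test both conditions
    if row['col2'] <= 50 and row['col3'] <= 50:
        return row['col1']
    else:
        return max(row['col1'], row['col2'], row['col3'])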

df["state"] = df.apply(calc_new_col, axis=1)
# axis=1 makes sure that function is applied to each row

print(df)
datetime             col1  col2  col3  state
2021-04-10 01:00:00    25    50    50     25
2021-04-10 02:00:00    25    50    50     25
2021-04-10 03:00:00    25   100    50    100
2021-04-10 04:00:00    50    50   100    100
2021-04-10 05:00:00   100   100   100    100

Using apply with a named function keeps the code cleaner and more reusable.
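For larger frames, a vectorized version of the same rule may be worth considering (a swapped-in technique, not this answer's). This is a minimal sketch, assuming only the three numeric columns from the output above (the datetime column is left out): numpy.where picks col1 where both conditions hold and the row-wise maximum elsewhere. Each comparison needs its own parentheses, because `&` binds more tightly than `<=`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [25, 25, 25, 50, 100],
    "col2": [50, 50, 100, 50, 100],
    "col3": [50, 50, 50, 100, 100],
})

# Element-wise AND of the two conditions; parentheses are required
both_small = (df["col2"] <= 50) & (df["col3"] <= 50)
# Largest of the three columns, computed per row
row_max = df[["col1", "col2", "col3"]].max(axis=1)

df["state"] = np.where(both_small, df["col1"], row_max)
print(df)
```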

Python Pandas create new column based on another column value

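Use DataFrame.lookup, building the column names by adding the col prefix to the values of the val_1 column (note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so this only works on older versions):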

df['mycol'] = df.lookup(df.index, 'col' + df['val_1'].astype(str))
print(df)
   id  col0  col1  col2  col3  col4  col5  col6  col7  col8  col9  col10  \
0   1     0     5    -5     5    -5     0     0     1     4     3     -3
1   2     0     0     0     0     0     0     0     4    -4     0      0
2   3     0     0     1     2     3     0     0     0     5     6      0

   val_1  mycol
0      1      5
1      7      4
2      9      6
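On pandas 2.x, where DataFrame.lookup no longer exists, one possible replacement (a different technique, not this answer's) is NumPy fancy indexing: find the position of each target column, then pick one value per row. This is a minimal sketch, assuming all columns are numeric so that to_numpy() returns a plain integer array; the data is rebuilt from the output above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "col0": [0, 0, 0], "col1": [5, 0, 0], "col2": [-5, 0, 1], "col3": [5, 0, 2],
    "col4": [-5, 0, 3], "col5": [0, 0, 0], "col6": [0, 0, 0], "col7": [1, 4, 0],
    "col8": [4, -4, 5], "col9": [3, 0, 6], "col10": [-3, 0, 0],
    "val_1": [1, 7, 9],
})

# Integer position of the column named 'col' + val_1 for each row
col_pos = df.columns.get_indexer("col" + df["val_1"].astype(str))
# get_indexer returns -1 for a missing label, which would silently pick the last column
assert (col_pos >= 0).all(), "some val_1 has no matching column"

# Row i takes the value at column position col_pos[i]
df["mycol"] = df.to_numpy()[np.arange(len(df)), col_pos]
print(df[["val_1", "mycol"]])
```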

How to create a new column based on a condition in another column

I was a little confused by your row numbering: if we calculate B_i from the condition A_(i+1) - A_(i), the missing value should be on the last row instead of the first, because the first row has both A_(i) and A_(i+1) while the last row is missing A_(i+1).

Anyway, based on your example, I assumed that we calculate B_(i+1).

import pandas as pd

df = pd.DataFrame(columns=["A"], data=[5, 12, 14, 22, 20, 33])
df['shifted_A'] = df['A'].shift(1)  # Optional: added only to show how shift works in the final dataframe
df['B'] = ''
# Set B to 1 where either condition holds
df.loc[((df['A'] - df['A'].shift(1)) > 5) + (df['A'].shift(1) <= 10), 'B'] = 1
# Set B to 0 where the difference is at most 5
df.loc[(df['A'] - df['A'].shift(1)) <= 5, 'B'] = 0
# The first row has no previous value, so set it explicitly
df.loc[df.index == 0, 'B'] = 1
print(df)

That prints:

    A  shifted_A  B
0   5        NaN  1
1  12        5.0  1
2  14       12.0  0
3  22       14.0  1
4  20       22.0  0
5  33       20.0  1

I am not sure this is the fastest way, but I think it is one of the easiest to understand.

A brief explanation:

df.loc[mask, columnname] = newvalue updates the value in the given column for every row where the condition (mask) is True.

((df['A'] - df['A'].shift(1)) > 5) + (df['A'].shift(1) <= 10)
Each condition here returns a Series of True/False values. Adding them gives True wherever at least one of them is True, which is simply OR. If we needed AND instead, we could multiply the conditions. The more idiomatic pandas operators are | for OR and & for AND, with each condition in its own parentheses.
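Because later .loc assignments overwrite earlier ones, the code above effectively gives priority to the first-row override, then to the "difference <= 5" check, then to the OR condition. For example, a row whose previous A is at most 10 but whose difference is at most 5 ends up as 0. This is a minimal sketch (a swapped-in technique, not the answer's method) that makes that priority explicit with numpy.select, which returns the choice of the first matching condition; reorder the list if a different priority is intended.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [5, 12, 14, 22, 20, 33]})
prev = df["A"].shift(1)
diff = df["A"].diff()  # same values as df['A'] - df['A'].shift(1)

# Listed from highest to lowest priority, i.e. the reverse of the .loc order,
# where the last assignment wins
conditions = [
    df.index == 0,              # first row has no predecessor
    diff <= 5,
    (diff > 5) | (prev <= 10),
]
# default=0 is never used here: every later row satisfies diff <= 5 or diff > 5
df["B"] = np.select(conditions, [1, 0, 1], default=0)
print(df)
```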

Create a new column based on values in another column and another table

Using pandas, you can merge the two dataframes like so:

df3 = df2.merge(df1, on='COMMUNITY', how='left')

If you want to read more, check out the pandas documentation for DataFrame.merge.
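This is a minimal sketch of the merge, assuming df1 is the lookup table; apart from COMMUNITY, the column names (REGION, VALUE) and all values are hypothetical. The optional validate="many_to_one" makes pandas raise if df1 contains a COMMUNITY more than once, which would otherwise duplicate rows of df2.

```python
import pandas as pd

# Hypothetical lookup table: one REGION per COMMUNITY
df1 = pd.DataFrame({"COMMUNITY": ["North", "South"], "REGION": ["R1", "R2"]})
# Hypothetical table that needs the new column
df2 = pd.DataFrame({"COMMUNITY": ["South", "North", "East"], "VALUE": [10, 20, 30]})

# how='left' keeps every row of df2 in its original order;
# communities missing from df1 get NaN in REGION
df3 = df2.merge(df1, on="COMMUNITY", how="left", validate="many_to_one")
print(df3)
```

When only a single column is needed, df2['COMMUNITY'].map(df1.set_index('COMMUNITY')['REGION']) is a similar option that avoids bringing in extra columns.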

Add a column based on the value of another column in a dataframe

Generalized solution using pandas tools

OK, it took me some time to figure out, but I wanted to find a slick answer and I rather like this one:

import pandas as pd

data = {'A': ['Emo/3', 'Emo/4', 'Emo/1', 'Emo/3', '', 'Emo/3', 'Emo/4', 'Emo/1', 'Emo/3', '', 'Neu/5', 'Neu/2', 'Neu/5', 'Neu/2', '', 'Neu/5', 'Neu/2', 'Neu/5', 'Neu/2'],
        'Pos': ["repeat3", "repeat3", "repeat3", "repeat3", '', "repeat1", "repeat1", "repeat1", "repeat1", '', "repeat2", "repeat2", "repeat2", "repeat2", '', "repeat2", "repeat2", "repeat2", "repeat2"],
        }
df = pd.DataFrame(data)

# First we create column B and set every value marked as repeat3 in the 'Pos' column to zero
df['B'] = df['Pos'].apply(lambda x: 0 if x == "repeat3" else x)

# Then we create a boolean mask for the rows where 'Pos' is equal to repeat1
mask1 = df['B'] == "repeat1"
# and count how many blocks of four repeat1 rows we have
number_of_repeat1_blocks = int(mask1.sum() // 4)

# We build the same kind of mask for the rows where 'Pos' is equal to repeat2
mask2 = df['B'] == "repeat2"
# and count how many blocks of four repeat2 rows we have
number_of_repeat2_blocks = int(mask2.sum() // 4)

# We define the number sequence to use in each case
# For rows matching repeat1
repl1 = [1, 2, 3, 4] * number_of_repeat1_blocks
# For rows matching repeat2
repl2 = [4, 3, 2, 1] * number_of_repeat2_blocks

# Finally we replace the matched patterns
df.loc[mask1, 'B'] = repl1
df.loc[mask2, 'B'] = repl2

print(df)

Results:

        A      Pos  B
0   Emo/3  repeat3  0
1   Emo/4  repeat3  0
2   Emo/1  repeat3  0
3   Emo/3  repeat3  0
4
5   Emo/3  repeat1  1
6   Emo/4  repeat1  2
7   Emo/1  repeat1  3
8   Emo/3  repeat1  4
9
10  Neu/5  repeat2  4
11  Neu/2  repeat2  3
12  Neu/5  repeat2  2
13  Neu/2  repeat2  1
14
15  Neu/5  repeat2  4
16  Neu/2  repeat2  3
17  Neu/5  repeat2  2
18  Neu/2  repeat2  1
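The answer above assumes each pattern comes in blocks of exactly four rows. A more general sketch (a different technique from the answer's) treats every run of identical consecutive 'Pos' values as a block, numbers the rows inside each block with groupby(...).cumcount(), and counts down from the block size for repeat2. It assumes repeat1 counts up from 1, repeat2 counts down to 1, and uses -1 as a hypothetical placeholder for rows that match no pattern.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Pos": ["repeat3"] * 4 + [""] + ["repeat1"] * 4 + [""]
           + ["repeat2"] * 4 + [""] + ["repeat2"] * 4,
})

# A new block id starts whenever 'Pos' differs from the previous row
block = (df["Pos"] != df["Pos"].shift()).cumsum()
# 0, 1, 2, ... within each block
pos_in_block = df.groupby(block).cumcount()
# Number of rows in the block each row belongs to
block_size = block.map(block.value_counts())

df["B"] = np.select(
    [df["Pos"] == "repeat3", df["Pos"] == "repeat1", df["Pos"] == "repeat2"],
    [0, pos_in_block + 1, block_size - pos_in_block],
    default=-1,  # hypothetical placeholder for rows matching no pattern
)
print(df)
```

Because block sizes are measured rather than assumed, blocks of other lengths are numbered without changing the code.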

Creating a new column based on the difference of another column

groupby + diff:

df.groupby('col1').col3.transform('diff')

0     NaN
1    10.0
2     5.0
3     NaN
4     2.0
5     2.0
Name: col3, dtype: float64
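This is a minimal sketch of storing the result as a new column. The data and the column name col3_diff are hypothetical, chosen only so that the output has the same shape as the one above. SeriesGroupBy.diff() gives the same values as transform('diff').

```python
import pandas as pd

# Hypothetical data: two groups in col1
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col3": [0, 10, 15, 3, 5, 7],
})

# Difference from the previous row within the same col1 group;
# the first row of each group has no predecessor, so it is NaN
df["col3_diff"] = df.groupby("col1")["col3"].diff()
print(df)
```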

Create new column based on a value of another column in a data-frame

If I understand your conditions correctly, you just need to extract the part of the parentID column to the left of the dot:

pid = df.loc[df['depth'] == 2, 'parentID'].str.split('.').str[0].values

df['receiveAReply'] = 0
df.loc[df['commentID'].isin(pid), 'receiveAReply'] = 1

Output:

>>> df
  commentID commentType  depth    parentID  receiveAReply
0  58b61d1d     comment    1.0         0.0              1
1  58b6393b   userReply    2.0  58b61d1d.0              0
2  58b6556e     comment    1.0         0.0              0
3  58b657fa   userReply    3.0  58b61d1d.0              0
4  58b657fa     comment    1.0         0.0              0
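A self-contained sketch, assuming parentID is stored as strings and using the values from the output above. Since isin returns a boolean Series, .astype(int) turns it directly into the 1/0 column, replacing the two assignment lines with one:

```python
import pandas as pd

# Data taken from the output table above (parentID assumed to be strings)
df = pd.DataFrame({
    "commentID": ["58b61d1d", "58b6393b", "58b6556e", "58b657fa", "58b657fa"],
    "commentType": ["comment", "userReply", "comment", "userReply", "comment"],
    "depth": [1.0, 2.0, 1.0, 3.0, 1.0],
    "parentID": ["0.0", "58b61d1d.0", "0.0", "58b61d1d.0", "0.0"],
})

# IDs of comments that got a direct (depth 2) reply, with the ".0" suffix stripped
pid = df.loc[df["depth"] == 2, "parentID"].str.split(".").str[0]

# True/False from isin becomes 1/0
df["receiveAReply"] = df["commentID"].isin(pid).astype(int)
print(df)
```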
