Creating a New Column Based on If-Elif-Else Condition

To formalize some of the approaches laid out above:

Create a function that operates on the rows of your dataframe like so:

def f(row):
    if row['A'] == row['B']:
        val = 0
    elif row['A'] > row['B']:
        val = 1
    else:
        val = -1
    return val

Then apply it to your dataframe passing in the axis=1 option:

In [1]: df['C'] = df.apply(f, axis=1)

In [2]: df
Out[2]:
   A  B  C
a  2  2  0
b  3  1  1
c  1  3 -1

Of course, this is not vectorized, so performance may suffer on a large number of records. Still, I think it is much more readable, especially coming from a SAS background.

Edit

Here is the vectorized version:

import numpy as np

df['C'] = np.where(
    df['A'] == df['B'], 0,
    np.where(df['A'] > df['B'], 1, -1))
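
Nested np.where calls get harder to read as branches are added. As an alternative that is not part of the original answer, here is a minimal sketch of the same if-elif-else logic with np.select, which takes a list of conditions and a matching list of values. The small dataframe is rebuilt from the output shown above so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Same data as in the example output above
df = pd.DataFrame({'A': [2, 3, 1], 'B': [2, 1, 3]}, index=['a', 'b', 'c'])

# Conditions are checked in order; the first one that is True wins,
# just like if / elif. `default` plays the role of the final else.
conditions = [df['A'] == df['B'], df['A'] > df['B']]
choices = [0, 1]
df['C'] = np.select(conditions, choices, default=-1)

print(df)
```

Each extra elif becomes one more entry in conditions and choices, so the nesting does not grow.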

Creating a new column based on if-elif-else condition with np.where in python

You can use pandas.cut to map your values.

NB. for clarity, I divided all values by 1 million.

# list of bins used to categorize the values
bins = [0, 50, 500, 2000, 4000, 6000, float('inf')]
# matching factors: (0, 50] -> 1; (50, 500] -> 0.9, etc.
factors = [1, 0.9, 0.8, 0.5, 0.25, 0]

# get the factors and convert to float
f = pd.cut(df['Quantity Total Price'], bins=bins, labels=factors).astype(float)

# use the factors in numerical operation
df['Result'] = df['Rate']*f+df['Rate1']*(1-f)

output:

   Quantity Total Price  Rate  Rate1  Result
0                    20    15   14.5  15.000
1                   100    15   14.5  14.950
2                   700    15   14.5  14.900
3                  3000    15   14.5  14.750
4                  5000    15   14.5  14.625
5                  7000    15   14.5  14.500

used input:

df = pd.DataFrame({'Quantity Total Price': [20, 100, 700, 3000, 5000, 7000],
                   'Rate': [15]*6, 'Rate1': [14.5]*6})
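
To make the pd.cut step above easier to follow, here is a small sketch that shows which interval each value lands in and which factor it receives. The prices series is made up for illustration and is not from the original question. It includes the edge values 0, 50 and 51, because pd.cut by default uses intervals that are open on the left and closed on the right.

```python
import pandas as pd

# Illustrative values only (assumption, not from the original data)
prices = pd.Series([0, 20, 50, 51, 7000])

bins = [0, 50, 500, 2000, 4000, 6000, float('inf')]
factors = [1, 0.9, 0.8, 0.5, 0.25, 0]

# Without labels, pd.cut returns the interval each value falls into
binned = pd.cut(prices, bins=bins)

# With labels, each interval is replaced by its factor
f = pd.cut(prices, bins=bins, labels=factors).astype(float)

print(pd.DataFrame({'price': prices, 'bin': binned, 'factor': f}))
```

Note that 0 falls outside the first interval (0, 50], so its factor is NaN. If zero prices can occur, passing include_lowest=True to pd.cut is one way to include the left edge of the first bin.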

Error while creating a new column based on if-elif-else condition

In your code, conditions is not a list of boolean conditions: because each mask is wrapped in boolean indexing (df_sup[...]), you get a list of DataFrames instead.

So, to get a list of conditions, remove the leading df_sup[ and the trailing ] from each entry:

conditions = [
    (df_sup['Date']>='2016-11-14') & (df_sup['Date']<'2016-11-21'),
    (df_sup['Date']>='2016-11-21') & (df_sup['Date']<'2016-11-28'),
    (df_sup['Date']>='2016-11-28') & (df_sup['Date']<'2016-12-05'),
    (df_sup['Date']>='2016-12-05') & (df_sup['Date']<'2016-12-12'),
    (df_sup['Date']>='2016-12-12') & (df_sup['Date']<'2016-12-19')
]
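
The difference between a mask and a filtered DataFrame can be seen in a short sketch. The original question's code and data are not shown here, so the df_sup contents, the Week column and the week labels below are all hypothetical, and the use of np.select is an assumption about what the conditions were meant for.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the real df_sup is not shown in the answer
df_sup = pd.DataFrame(
    {'Date': pd.to_datetime(['2016-11-15', '2016-11-22', '2017-01-01'])})

# A condition is a boolean Series with one True/False per row ...
mask = (df_sup['Date'] >= '2016-11-14') & (df_sup['Date'] < '2016-11-21')
# ... while df_sup[mask] is a DataFrame holding only the matching rows
filtered = df_sup[mask]

conditions = [
    (df_sup['Date'] >= '2016-11-14') & (df_sup['Date'] < '2016-11-21'),
    (df_sup['Date'] >= '2016-11-21') & (df_sup['Date'] < '2016-11-28'),
]
choices = ['week 1', 'week 2']  # hypothetical labels

# np.select needs the boolean masks, not the filtered DataFrames
df_sup['Week'] = np.select(conditions, choices, default='other')
print(df_sup)
```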

Create new column in pandas dataframe based on if/elif/and functions

The issue may be due to the way you are using it. I don't know whether this will help, but I have rewritten the code, to the best of my knowledge, in a way that works.

import pandas as pd

rate_2006, rate_2007 = 100, 200

c = {
    'region': ["a", "a", "a", "a", "a", "b", "b", "b", "b", "a", "b"],
    'year': [2006, 2007, 2007, 2006, 2006, 2006, 2007, 2007, 2007, 2006, 2007],
    'sales': [500, 100, 2990, 15, 5000, 2000, 150, 300, 250, 1005, 600]
}

df1 = pd.DataFrame(c)
print(df1)

def new_col(value):
    if df1.loc[value,"region"] == "a" and df1.loc[value,"year"] == 2006:
        df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2006
    elif df1.loc[value,"region"] == "a" and df1.loc[value,"year"] == 2007:
        df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2007
    elif df1.loc[value,"region"] == "b" and df1.loc[value,"year"] == 2006:
        df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2006
    else:
        df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2007

for value in range(len(df1)):
    new_col(value)
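
The row-by-row loop above works, but the same branches can also be written without a loop. The following is only a sketch of one possible vectorized equivalent, not the answer author's method. It uses a shortened copy of the data, and the use_2006 name is introduced here for illustration.

```python
import numpy as np
import pandas as pd

rate_2006, rate_2007 = 100, 200

# Shortened copy of the data above, one row per region/year combination
df1 = pd.DataFrame({
    'region': ["a", "a", "b", "b"],
    'year': [2006, 2007, 2006, 2007],
    'sales': [500, 100, 2000, 150],
})

# Rows in region a or b with year 2006 use rate_2006; every other row
# falls through to rate_2007, mirroring the final else branch of new_col.
use_2006 = df1['region'].isin(['a', 'b']) & (df1['year'] == 2006)
df1['Dollars'] = df1['sales'] * np.where(use_2006, rate_2006, rate_2007)

print(df1)
```

One visible difference: the loop creates Dollars as a float column, while this version keeps integers because sales and both rates are integers.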

Creating New Column based on condition on Other Column in Pandas DataFrame

import pandas as pd 

# initialize list of lists
data = [[1, 'High School', 7.884], [2, 'Bachelors', 6.952],
        [3, 'High School', 8.185], [4, 'High School', 6.556],
        [5, 'Bachelors', 6.347], [6, 'Master', 6.794]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['ID', 'Education', 'Score'])

df['Labels'] = ['Bad' if x<7.000 else 'Good' if 7.000<=x<8.000 else 'Very Good' for x in df['Score']]
df

   ID    Education  Score     Labels
0   1  High School  7.884       Good
1   2    Bachelors  6.952        Bad
2   3  High School  8.185  Very Good
3   4  High School  6.556        Bad
4   5    Bachelors  6.347        Bad
5   6       Master  6.794        Bad
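
The list comprehension above is fine for a few thresholds. As an alternative sketch that the original answer does not use, pd.cut with right=False can produce the same labels, since it makes each bin closed on the left and open on the right, matching x < 7 and 7 <= x < 8. The sample scores below include 7.0 and 8.0 to show the edge behaviour and are otherwise illustrative.

```python
import pandas as pd

# A few scores from the example plus the two boundary values
df = pd.DataFrame({'Score': [7.884, 6.952, 8.185, 7.0, 8.0]})

# right=False gives bins [-inf, 7), [7, 8), [8, inf)
df['Labels'] = pd.cut(
    df['Score'],
    bins=[float('-inf'), 7, 8, float('inf')],
    labels=['Bad', 'Good', 'Very Good'],
    right=False,
)

print(df)
```

The result is a categorical column. Call .astype(str) on it if plain strings are needed.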

