Replace a Value in a Data Frame Based on a Conditional ('If') Statement

It is easier to convert nm to character first and then make the change:

junk$nm <- as.character(junk$nm)
junk$nm[junk$nm == "B"] <- "b"

EDIT: If you do need to keep nm as a factor, add this at the end:

junk$nm <- as.factor(junk$nm)

How to conditionally replace values in an R data frame using an if/then statement

You can use ifelse, like this:

df$customer_id <- ifelse(df$customer %in% c('paramount', 'pixar'), 99, df$customer_id)

The syntax is simple:

ifelse(condition, result if TRUE, result if FALSE)

This is vectorized, so you can use it on a dataframe column.

Replace a value in a data frame based on a conditional statement

Use mutate() together with ifelse(), and assign the result back so the change is kept:

library(dplyr)
gm <- gm %>%
  mutate(continent = ifelse(country == "Bahamas", "S America", continent))

Conditional replacement of a string in data frame

You need to use the ifelse() function.

DF$ID <- ifelse(DF$INT == 1, gsub("^9", "8", DF$ID), DF$ID)

Using dplyr:

DF %>%
  mutate(ID = ifelse(INT == 1, gsub("^9", "8", ID), ID))

This replaces a leading 9 with 8 only in the rows where DF$INT == 1; every other row keeps its original ID.

The if() function that you used:

if(DF$INT == "1") { }

is not intended to work on data.frames. It is not vectorized: if() expects a single TRUE or FALSE value, so it cannot test every row of a column at once. It is meant for checking one condition, for example:

if (use_new_function == "on") {
  run_new_function()
}

Replace column value of Dataframe based on a condition on another Dataframe

You can also try with map:

df_student['student_Id'] = (
    df_student['student_Id'].map(df_updated_id.set_index('old_id')['new_id'])
    .fillna(df_student['student_Id'])
)
print(df_student)

# Output
     Name  gender  math score student_Id
0    John    male          50       1234
1     Jay    male         100       6788
2  sachin    male          70        xyz
3  Geetha  female          80       abcd
4  Amutha  female          75       83ko
5  ganesh    male          40       v432

Update

I believe the updated_id isn't unique, so I need to further pre-process the data.

In this case, you could drop the duplicates first, assuming that the last value (keep='last') is the most recent one for a given old_id:

sr = (
    df_updated_id.drop_duplicates('old_id', keep='last')
    .set_index('old_id')['new_id']
)

df_student['student_Id'] = (
    df_student['student_Id'].map(sr)
    .fillna(df_student['student_Id'])
)

Note: this is exactly what @BENY's answer does. Because it builds a dict, only the last occurrence of each old_id is kept. If you want to keep the first occurrence instead, that code does not work, whereas with drop_duplicates you can adjust the keep parameter.
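To make the effect of the keep parameter concrete, here is a minimal sketch. The data and the helper name remap_ids are hypothetical and only for illustration; the column names follow the answer above.

```python
import pandas as pd

# Hypothetical data: old_id "a1" appears twice in the lookup table.
df_student = pd.DataFrame({"student_Id": ["a1", "b2", "c3"]})
df_updated_id = pd.DataFrame({
    "old_id": ["a1", "a1", "b2"],
    "new_id": ["first", "last", "B2"],
})


def remap_ids(ids, lookup, keep):
    """Map old IDs to new ones; IDs without a match are left unchanged."""
    sr = (
        lookup.drop_duplicates("old_id", keep=keep)
        .set_index("old_id")["new_id"]
    )
    return ids.map(sr).fillna(ids)


ids_last = remap_ids(df_student["student_Id"], df_updated_id, keep="last")
ids_first = remap_ids(df_student["student_Id"], df_updated_id, keep="first")

print(ids_last.tolist())   # ['last', 'B2', 'c3']
print(ids_first.tolist())  # ['first', 'B2', 'c3']
```

The ID "c3" has no entry in the lookup table, so map returns a missing value for it and fillna restores the original ID.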

Replace NaN Values with the Means of other Cols based on Condition

You could implement the function like this:

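def replace_missing_with_conditional_mean(df, condition_cols, cols):
    # Per-group mean of each target column, aligned to the original rows
    s = df.groupby(condition_cols)[cols].transform('mean')
    # Fill each missing value with the mean of its own group
    return df.fillna(s.to_dict('series'))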


res = replace_missing_with_conditional_mean(df, ['Col1', 'Col2'], ['Col3'])
print(res)

Output

  Col1 Col2  Col3
0    A    c   1.0
1    A    c   3.0
2    B    c   5.0
3    A    d   6.0
4    A    c   2.0
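For a single column, the same idea can be written without the helper function. This is only a sketch: the input frame is an assumption, since the question's data is not shown here, and was chosen so that one value in the (A, c) group is missing.

```python
import numpy as np
import pandas as pd

# Assumed input: the last row's Col3 is missing.
df = pd.DataFrame({
    "Col1": ["A", "A", "B", "A", "A"],
    "Col2": ["c", "c", "c", "d", "c"],
    "Col3": [1.0, 3.0, 5.0, 6.0, np.nan],
})

# Mean of Col3 within each (Col1, Col2) group, one value per original row.
group_mean = df.groupby(["Col1", "Col2"])["Col3"].transform("mean")

# Only the missing entries are replaced; existing values are untouched.
df["Col3"] = df["Col3"].fillna(group_mean)

print(df["Col3"].tolist())  # [1.0, 3.0, 5.0, 6.0, 2.0]
```

The mean ignores missing values, so the gap in the (A, c) group is filled with the mean of 1.0 and 3.0. A group whose values are all missing would stay missing.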

How to use str.contains() in a conditional statement to apply a function to some elements of a dataframe column?

You are using pandas.Series.apply, so your function (the lambda) receives each element (a str) itself. You can therefore simply use the in operator, as follows:

from urllib.parse import urlparse

df['URL'] = df['URL'].apply(
    lambda x: urlparse(x).netloc if "http" in x else x
)
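If you would rather keep str.contains(), one option is to build a boolean mask with it and apply the function only to the matching rows. A minimal sketch follows; the sample URLs are made up, and na=False is used so that missing values count as non-matches.

```python
from urllib.parse import urlparse

import pandas as pd

# Made-up sample data, including one missing value.
df = pd.DataFrame({"URL": ["https://example.com/page", "example.org", None]})

# Boolean mask: True where the string contains "http" (literal match).
mask = df["URL"].str.contains("http", regex=False, na=False)

# Apply urlparse only to the rows selected by the mask.
df.loc[mask, "URL"] = df.loc[mask, "URL"].apply(lambda x: urlparse(x).netloc)

print(df["URL"].tolist()[:2])  # ['example.com', 'example.org']
```

Unlike the lambda with the in operator, this version does not fail on missing values, because rows that are not selected by the mask are never passed to urlparse.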

Dataframe change column value on if statement and keeps the new value to next row

If I understood correctly, something like this should do what you want. Because the logic runs row by row and depends on two values, it is not easy to vectorize, though someone else may be able to.

order = []
have_found_plus_3 = False

for i, row in df.iterrows():
    # A 3 switches the flag on, a -3 switches it off
    if row['condition'] == 3:
        have_found_plus_3 = True
    elif row['condition'] == -3:
        have_found_plus_3 = False
    # Any other value keeps the state from the previous row
    if have_found_plus_3:
        order.append(1)
    else:
        order.append(0)
df['order'] = order
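One possible vectorized alternative is to keep only the two switch values, forward-fill them, and test which one was seen last. This is a sketch with made-up data, assuming that 3 turns the flag on, -3 turns it off, and every other value carries the previous state forward.

```python
import pandas as pd

# Made-up sample data.
df = pd.DataFrame({"condition": [0, 3, 1, -3, 2, 3, 0]})

# Keep only the switch values (3 and -3), blank out everything else,
# then carry the most recent switch value forward.
last_switch = df["condition"].where(df["condition"].isin([3, -3])).ffill()

# The flag is on wherever the most recent switch value was 3.
df["order"] = last_switch.eq(3).astype(int)

print(df["order"].tolist())  # [0, 1, 1, 0, 0, 1, 1]
```

Rows before the first switch value have nothing to carry forward, so they compare unequal to 3 and get 0, which matches the loop's initial state of False.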

