Replace All Particular Values in a Data Frame


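To turn every occurrence of a value into NA (here, the empty string), index the data frame with a logical comparison and assign: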

> df[df == ""] <- NA
> df
     A    B
1 <NA>   12
2  xyz <NA>
3  jkl  100
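For readers doing the same thing in pandas rather than R, a rough counterpart is sketched below. The frame is rebuilt by hand from the printed output above, and treating column B as strings is an assumption.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the R data frame above (B is assumed to hold strings).
df = pd.DataFrame({"A": ["", "xyz", "jkl"], "B": ["12", "", "100"]})

# Replace every empty string, in every column, with a missing value.
df = df.replace("", np.nan)

print(df)
```

Unlike the R version, replace returns a new frame, so the result has to be assigned back.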

Replace all specific values in data.frame with values from another data.frame sequentially in R

With base R, max.col(df2[-1] != ".", "last") returns, for each row, the index of the last 'Age' column whose value is not ".". Binding that with the sequence of rows via cbind gives a row/column index matrix, which extracts the matching elements from 'df2'. Those elements then replace the 'Age' values in 'df1' wherever 'Age' is ".".

df1$Age <- ifelse(df1$Age == ".",
                  df2[-1][cbind(seq_len(nrow(df2)),
                                max.col(df2[-1] != ".", "last"))],
                  df1$Age)

df1 <- type.convert(df1, as.is = TRUE)

-output

df1
# Sample Age
#1 1 50
#2 2 49
#3 3 30

Or, using the tidyverse, reshape 'df2' into 'long' format, keep the last row of each 'Sample' group with slice_tail, and then join the result to 'df1':

library(dplyr)
library(tidyr)
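df2 %>%
  # "." cannot be parsed as an integer, so it becomes NA (with a coercion warning)
  mutate(across(starts_with('Age'), as.integer)) %>%
  # long format; dropping NA leaves only the real ages, in column order
  pivot_longer(cols = starts_with('Age'), values_drop_na = TRUE) %>%
  group_by(Sample) %>%
  slice_tail(n = 1) %>%
  ungroup() %>%
  select(-name) %>%
  right_join(df1, by = 'Sample') %>%
  # keep the 'df1' age where it exists, otherwise fall back to the value from 'df2'
  transmute(Sample, Age = coalesce(as.integer(Age), value))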

-output

# A tibble: 3 x 2
# Sample Age
# <int> <int>
#1 1 50
#2 2 49
#3 3 30

data

df1 <- structure(list(Sample = 1:3, Age = c("50", ".", ".")), 
class = "data.frame",
row.names = c(NA,
-3L))

df2 <- structure(list(Sample = 1:3, Age_1 = c(40L, 35L, 30L), Age_2 = c("42",
"49", "."), Age_3 = c("44", ".", ".")), class = "data.frame",
row.names = c(NA,
-3L))

Replacing all values based on specific value in column dataframe

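This combines 1) my initial idea of multiplying instead of using replace, 2) @piRSquared's syntax, and 3) a modification that excludes the first column from the operation. Because the error columns hold only 0 and 1 above the last row, multiplying each of those rows by the last ('result') row has the same effect as replacing every 1 with that column's result value. You can use: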

df.iloc[:-1, 1:] *= df.iloc[-1, 1:]

Test run:

import pandas as pd

data = {'number': {0: '1', 1: '2', 2: '3', 3: 'result'},
        'error1': {0: 0.0, 1: 1.0, 2: 0.0, 3: 0.5},
        'error2': {0: 0.0, 1: 1.0, 2: 1.0, 3: 0.6},
        'error2040': {0: 1.0, 1: 1.0, 2: 0.0, 3: 0.001}}

df = pd.DataFrame(data)
print(df)

   number  error1  error2  error2040
0       1     0.0     0.0      1.000
1       2     1.0     1.0      1.000
2       3     0.0     1.0      0.000
3  result     0.5     0.6      0.001


df.iloc[:-1, 1:] *= df.iloc[-1, 1:]

print(df)

   number error1 error2 error2040
0       1    0.0    0.0     0.001
1       2    0.5    0.6     0.001
2       3    0.0    0.6       0.0
3  result    0.5    0.6     0.001
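The same multiplication can be written with an explicit DataFrame.mul call, which makes the column alignment visible. This is only a sketch under assumptions: the names value_cols, last_row, body and out are mine, and casting the last row to float is my addition, intended to keep the error columns numeric.

```python
import pandas as pd

df = pd.DataFrame({
    "number": ["1", "2", "3", "result"],
    "error1": [0.0, 1.0, 0.0, 0.5],
    "error2": [0.0, 1.0, 1.0, 0.6],
    "error2040": [1.0, 1.0, 0.0, 0.001],
})

value_cols = df.columns[1:]                              # everything except 'number'
last_row = df.loc[df.index[-1], value_cols].astype(float)  # the 'result' row
body = df.loc[df.index[:-1], value_cols]                 # all rows above it

# axis=1 matches last_row's index against the column names of body,
# so each column is scaled by its own result value.
out = df.copy()
out.loc[body.index, value_cols] = body.mul(last_row, axis=1)

print(out)
```

Working on a copy leaves the original frame untouched, which is convenient when comparing before and after.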

Replace specific values in a dataframe column using Pandas

A clean syntax for this kind of "find and replace" uses a dict that maps each old value to its replacement:

df.Num_of_employees = df.Num_of_employees.replace({"10-Jan": "1-10",
"Nov-50": "11-50"})
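A self-contained version of the dict-based replace might look like this. The sample frame is hypothetical (only the two mapped values come from the answer), and "51-200" is included to show that unmapped values pass through unchanged.

```python
import pandas as pd

# Hypothetical sample data; "51-200" is not in the mapping.
df = pd.DataFrame({"Num_of_employees": ["10-Jan", "Nov-50", "51-200"]})

# Each key is an old value, each dict value its replacement.
mapping = {"10-Jan": "1-10", "Nov-50": "11-50"}
df["Num_of_employees"] = df["Num_of_employees"].replace(mapping)

result = df["Num_of_employees"].tolist()
print(result)
```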

I'm trying to replace a specific value in my dataframe

You assigned only the 'day of week' column back to sales, so it makes sense that you ended up with a single column. Assign the result to that column instead:

sales["day of week"]=sales["day of week"].replace(0, "Thru")

If that doesn't work (because 'day of week' is an object-dtype column holding strings), replace the string '0' instead:

sales["day of week"]=sales["day of week"].replace('0', "Thru")
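As a small illustration of why the assignment target matters, the sketch below uses a hypothetical sales frame with a string-valued 'day of week' column and an invented 'amount' column. Assigning the result back to the column keeps the rest of the frame in place.

```python
import pandas as pd

# Hypothetical frame; the 'amount' column is invented for illustration.
sales = pd.DataFrame({"day of week": ["0", "1", "0"], "amount": [10, 20, 30]})

# Assign back to the column, not to `sales` itself, so 'amount' survives.
sales["day of week"] = sales["day of week"].replace("0", "Thru")

print(sales)
```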

Pandas DataFrame: replace all values in a column, based on condition

You need to select that column:

In [41]:
df.loc[df['First Season'] > 1990, 'First Season'] = 1
df

Out[41]:
Team First Season Total Games
0 Dallas Cowboys 1960 894
1 Chicago Bears 1920 1357
2 Green Bay Packers 1921 1339
3 Miami Dolphins 1966 792
4 Baltimore Ravens 1 326
5 San Franciso 49ers 1950 1003

So the syntax here is:

df.loc[<mask>, <optional column(s)>]

Here the mask is a boolean Series that picks out the row labels to index. The pandas indexing documentation and the "10 minutes to pandas" guide both describe these semantics.
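The mask-plus-column form can be tried on a small frame like the one below. The data is a cut-down, partly assumed version of the table above: the answer only shows the frame after the replacement, so the 1996 value is an assumption.

```python
import pandas as pd

# Cut-down sample; 1996 is an assumed pre-replacement value.
df = pd.DataFrame({
    "Team": ["Dallas Cowboys", "Chicago Bears", "Baltimore Ravens"],
    "First Season": [1960, 1920, 1996],
    "Total Games": [894, 1357, 326],
})

mask = df["First Season"] > 1990      # boolean Series, one entry per row
df.loc[mask, "First Season"] = 1      # only masked rows of that one column change

print(df)
```

Leaving out the column argument would assign 1 to every column of the masked rows, which is why the column has to be selected.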

EDIT

If you want a boolean indicator instead, use the condition itself to build a boolean Series and cast its dtype to int, which converts True and False to 1 and 0 respectively:

In [43]:
df['First Season'] = (df['First Season'] > 1990).astype(int)
df

Out[43]:
Team First Season Total Games
0 Dallas Cowboys 0 894
1 Chicago Bears 0 1357
2 Green Bay Packers 0 1339
3 Miami Dolphins 0 792
4 Baltimore Ravens 1 326
5 San Franciso 49ers 0 1003
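If the original years should be kept, the indicator can go into a separate column rather than overwriting 'First Season'. A minimal sketch, assuming a hypothetical column name 'After 1990' and an assumed 1996 value for the Ravens:

```python
import pandas as pd

# Cut-down sample; 1996 is an assumed value, 'After 1990' a hypothetical name.
df = pd.DataFrame({
    "Team": ["Dallas Cowboys", "Chicago Bears", "Baltimore Ravens"],
    "First Season": [1960, 1920, 1996],
})

# The comparison yields True/False; astype(int) turns that into 1/0.
df["After 1990"] = (df["First Season"] > 1990).astype(int)

print(df)
```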

Replace values from rows with specific values from another dataframe in R

You could use dplyr:

library(dplyr)

df %>%
  group_by(ID) %>%
  # keep NA where Range is missing, otherwise use the group's minimum
  mutate(Min_Range_New = ifelse(is.na(Range), NA, min(Range, na.rm = TRUE)))

which returns

      ID Range Min_Range Min_Range_New
   <dbl> <dbl>     <dbl>         <dbl>
 1     1    10        10            10
 2     1    15        10            10
 3     1    20        10            10
 4     2    30        30            30
 5     2    35        30            30
 6     3    40        40            40
 7     3    45        40            40
 8     3    50        40            40
 9     3    NA        NA            NA
10     4    NA        NA            NA
11     4    NA        NA            NA

