How to replace NaN values with zeros in a column of a pandas DataFrame?
I believe DataFrame.fillna() will do this for you.
See the docs for DataFrame.fillna and for Series.fillna.
Example:
In [7]: df
Out[7]:
0 1
0 NaN NaN
1 -0.494375 0.570994
2 NaN NaN
3 1.876360 -0.229738
4 NaN NaN
In [8]: df.fillna(0)
Out[8]:
0 1
0 0.000000 0.000000
1 -0.494375 0.570994
2 0.000000 0.000000
3 1.876360 -0.229738
4 0.000000 0.000000
To fill the NaNs in only one column, select just that column. In this case I'm using inplace=True to actually change the contents of df.
In [12]: df[1].fillna(0, inplace=True)
Out[12]:
0 0.000000
1 0.570994
2 0.000000
3 -0.229738
4 0.000000
Name: 1
In [13]: df
Out[13]:
0 1
0 NaN 0.000000
1 -0.494375 0.570994
2 NaN 0.000000
3 1.876360 -0.229738
4 NaN 0.000000
EDIT:
To avoid a SettingWithCopyWarning, use the built-in column-specific functionality:
df.fillna({1:0}, inplace=True)
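As far as I know, recent pandas releases (2.2 and later, with Copy-on-Write) warn that the chained df[1].fillna(0, inplace=True) form will stop updating df, so here is a minimal sketch of two non-chained alternatives. It assumes a small frame with integer column labels 0 and 1 like the one above; the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame with integer column labels 0 and 1 (values are made up)
df = pd.DataFrame({0: [np.nan, -0.5, np.nan], 1: [np.nan, 0.6, np.nan]})

# Option 1: fill one column and assign it back explicitly (no chained inplace)
df_a = df.copy()
df_a[1] = df_a[1].fillna(0)

# Option 2: pass a {column_label: fill_value} dict; unlisted columns keep their NaNs
df_b = df.fillna({1: 0})

print(df_b)
```

Both leave column 0 untouched and return a new object, so the original df still contains its NaNs unless you reassign it.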
Replace NA with 0 in a data frame column
Since nobody so far has seen fit to point out why what you're trying doesn't work:
1. NA == NA doesn't return TRUE, it returns NA (since comparing to undefined values should yield an undefined result).
2. You're trying to call apply on an atomic vector. You can't use apply to loop over the elements in a column.
3. Your subscripts are off: you're trying to give two indices into a$x, which is just the column (an atomic vector).
Fixing 3. gets you to a$x[is.na(a$x)] <- 0
Python Pandas replace multiple columns zero to NaN
I think you need replace with a dict:
import numpy as np
cols = ["Weight","Height","BootSize","SuitSize","Type"]
# replace both the string '0' and the numeric 0 with NaN, in these columns only
df2[cols] = df2[cols].replace({'0':np.nan, 0:np.nan})
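A self-contained sketch of the same replace call, assuming a small hypothetical df2 where some zeros are stored as numbers and some as the string '0'. The extra "Name" column is not in cols, so its '0' should survive.

```python
import numpy as np
import pandas as pd

cols = ["Weight", "Height", "BootSize", "SuitSize", "Type"]

# Hypothetical data: numeric 0s in some columns, string '0's in others
df2 = pd.DataFrame({
    "Weight": [0, 70, 80],
    "Height": [180, 0, 175],
    "BootSize": [42, 43, 0],
    "SuitSize": ["0", "M", "L"],
    "Type": ["A", "0", "B"],
    "Name": ["x", "y", "0"],  # not listed in cols, so left alone
})

# Both keys are needed: '0' matches the strings, 0 matches the numbers
df2[cols] = df2[cols].replace({'0': np.nan, 0: np.nan})
print(df2)
```

Note that integer columns such as Weight become float64 after the replacement, because NaN is a float.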
How do I replace NA values with zeros in an R dataframe?
See my comment on @gsk3's answer. A simple example:
> m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
> d <- as.data.frame(m)
> d
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 4 3 NA 3 7 6 6 10 6 5
2 9 8 9 5 10 NA 2 1 7 2
3 1 1 6 3 6 NA 1 4 1 6
4 NA 4 NA 7 10 2 NA 4 1 8
5 1 2 4 NA 2 6 2 6 7 4
6 NA 3 NA NA 10 2 1 10 8 4
7 4 4 9 10 9 8 9 4 10 NA
8 5 8 3 2 1 4 5 9 4 7
9 3 9 10 1 9 9 10 5 3 3
10 4 2 2 5 NA 9 7 2 5 5
> d[is.na(d)] <- 0
> d
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 4 3 0 3 7 6 6 10 6 5
2 9 8 9 5 10 0 2 1 7 2
3 1 1 6 3 6 0 1 4 1 6
4 0 4 0 7 10 2 0 4 1 8
5 1 2 4 0 2 6 2 6 7 4
6 0 3 0 0 10 2 1 10 8 4
7 4 4 9 10 9 8 9 4 10 0
8 5 8 3 2 1 4 5 9 4 7
9 3 9 10 1 9 9 10 5 3 3
10 4 2 2 5 0 9 7 2 5 5
There's no need to apply apply. =)
EDIT
You should also take a look at the norm package. It has a lot of nice features for missing data analysis. =)
Replace NA in DataFrame for multiple columns with mean per country
Note: the example is based on the additional data source from your comments.
To replace the NA values in multiple columns with the mean, you can combine the following three methods:
- fillna() (iterating per column, so axis should be 0, which is the default value of fillna())
- groupby()
- transform()
Create the data frame from your example:
import pandas as pd
df = pd.read_excel('https://happiness-report.s3.amazonaws.com/2021/DataPanelWHR2021C2.xls')
Country name | year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Positive affect | Negative affect |
---|---|---|---|---|---|---|---|---|---|---|
Canada | 2005 | 7.41805 | 10.6518 | 0.961552 | 71.3 | 0.957306 | 0.25623 | 0.502681 | 0.838544 | 0.233278 |
Canada | 2007 | 7.48175 | 10.7392 | nan | 71.66 | 0.930341 | 0.249479 | 0.405608 | 0.871604 | 0.25681 |
Canada | 2008 | 7.4856 | 10.7384 | 0.938707 | 71.84 | 0.926315 | 0.261585 | 0.369588 | 0.89022 | 0.202175 |
Canada | 2009 | 7.48782 | 10.6972 | 0.942845 | 72.02 | 0.915058 | 0.246217 | 0.412622 | 0.867433 | 0.247633 |
Canada | 2010 | 7.65035 | 10.7165 | 0.953765 | 72.2 | 0.933949 | 0.230451 | 0.41266 | 0.878868 | 0.233113 |
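A minimal sketch of how the three methods fit together, using a tiny made-up frame in the same shape instead of downloading the spreadsheet (the countries, years and values below are illustrative, not taken from the report). groupby() plus transform("mean") yields a frame aligned row for row with df, holding each row's country mean, and fillna() takes NaNs from it by matching index and column labels.

```python
import numpy as np
import pandas as pd

# Made-up data in the same shape as the World Happiness panel
df = pd.DataFrame({
    "Country name": ["Canada", "Canada", "Canada", "Chile", "Chile"],
    "year": [2005, 2007, 2008, 2005, 2007],
    "Social support": [0.96, np.nan, 0.94, 0.80, np.nan],
    "Generosity": [0.26, 0.25, np.nan, np.nan, 0.10],
})

cols = ["Social support", "Generosity"]

# Same shape as df[cols]: each row gets its own country's column mean (NaNs skipped)
country_means = df.groupby("Country name")[cols].transform("mean")

# fillna with a DataFrame aligns on index and column labels
df[cols] = df[cols].fillna(country_means)
print(df)
```

If a country has no values at all in a column, its group mean is NaN and those cells stay missing.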
How to replace all NA values in numerical columns only with median values and update the dataframe
Based on your screenshots, it looks like you're just going back to the RStudio viewer window to look at the data frame again. If so, the issue is this:
When you write test2 %>% mutate_if(...), you're telling R to change something in test2 and return the result (roughly meaning, in this context, to just print the result and show it to you). What you're not telling it to do is to save that result anywhere.
You would want something like test2 <- test2 %>% mutate_if(...) to overwrite the existing test2 data frame in your global environment, or something like test3 <- test2 %>% mutate_if(...) to give it a new name and store the modified thing as a separate object while retaining the old one.
Lastly, I would echo Andrea M's concern that you might not want to do this at all. Imputing missing data with averages is, on a good day, risky.
Replacing NA with 0 in columns that contain a substring in the column name
Update
If the goal is to select columns that have 'keyword' as a substring of their names, use contains() to select those columns inside across():
library(dplyr)
library(tidyr)
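# col2_keyword is character, so the replacement must be the string "0";
# passing extra arguments through across()'s ... is deprecated, so use a lambda
df1 <- df1 %>%
mutate(across(contains('keyword'), ~ replace_na(.x, "0")))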
-output
df1
# A tibble: 5 × 4
col1 col2_keyword col3 col4
<int> <chr> <chr> <dbl>
1 1 a a 1
2 2 b b 3
3 3 0 c NA
4 4 c d 5
5 5 d <NA> 6
Assuming the OP wants to replace NA only in columns that contain a specific element 'keyword', use where() with a logical expression to select the columns containing 'keyword', loop across() those columns and use replace_na() to replace NA with 0:
# only character columns containing 'keyword' are selected, so replace with "0"
df <- df %>%
mutate(across(where(~ is.character(.x) && 'keyword' %in% .x), ~ replace_na(.x, "0")))
-output
df
# A tibble: 5 × 4
col1 col2 col3 col4
<int> <chr> <chr> <dbl>
1 1 a a 1
2 2 b b 3
3 3 keyword c NA
4 4 0 d 5
5 5 c <NA> 6
data
df <- tibble(col1 = 1:5, col2 = c("a", "b", "keyword", NA, 'c'),
col3 = c('a', 'b', 'c', 'd', NA), col4 = c(1, 3, NA, 5, 6))
df1 <- tibble(col1 = 1:5, col2_keyword = c("a", "b", NA, 'c', 'd'),
col3 = c('a', 'b', 'c', 'd', NA), col4 = c(1, 3, NA, 5, 6))
Replace NA values in data frame with the column mean
library(tidyverse)
df1 <- tibble(x = seq(3), y = c(1, NA, 2))
df1 %>% mutate(y = y %>% replace_na(mean(df1$y, na.rm = TRUE)))
#> # A tibble: 3 × 2
#> x y
#> <int> <dbl>
#> 1 1 1
#> 2 2 1.5
#> 3 3 2
Created on 2022-03-10 by the reprex package (v2.0.0)