Change the Blank Cells to NA
I'm assuming you are talking about row 5, column "sex". It could be that in the data2.csv file the cell contains a space and hence is not considered empty by R.
Also, I noticed that in row 5, columns "axles" and "door", the original values read from data2.csv are the string "NA". You probably want to treat those as na.strings as well. To do this:
dat2 <- read.csv("data2.csv", header=TRUE, na.strings=c("","NA"))
EDIT:
I downloaded your data2.csv. Yes, there is a space in row 5 column "sex". So you want
na.strings=c(""," ","NA")
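To see the effect of each entry in na.strings without the original file, here is a minimal sketch. The CSV text is a made-up stand-in for data2.csv (the column values are assumptions, not the OP's data), read through the text argument of read.csv.

```r
# Hypothetical CSV content standing in for data2.csv (not the OP's actual file)
csv_text <- "id,sex,axles
1,M,2
2, ,NA
3,,4"

# "" catches truly empty cells, " " catches a single space, "NA" the literal string
dat <- read.csv(text = csv_text, header = TRUE,
                na.strings = c("", " ", "NA"),
                stringsAsFactors = FALSE)

is.na(dat$sex)    # which rows of "sex" became NA
is.na(dat$axles)  # which rows of "axles" became NA
```

Note that " " only matches a cell holding exactly one space; cells with several spaces or tabs would need their own entries or a post-read cleanup.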
How to replace empty string with NA in R dataframe?
I'm not sure why df[df==""]<-NA would not have worked for the OP. Let's take a sample data.frame and investigate the options.
Option#1: Base-R
df[df==""]<-NA
df
# One Two Three Four
# 1 A A <NA> AAA
# 2 <NA> B BA <NA>
# 3 C <NA> CC CCC
Option#2: dplyr::mutate_all and na_if, or mutate_if if the data frame has multiple types of columns
library(dplyr)
mutate_all(df, list(~na_if(.,"")))
OR
# if the data frame has columns of types other than character, then
df %>% mutate_if(is.character, list(~na_if(.,"")))
# One Two Three Four
# 1 A A <NA> AAA
# 2 <NA> B BA <NA>
# 3 C <NA> CC CCC
Toy Data:
df <- data.frame(One=c("A","","C"),
Two=c("A","B",""),
Three=c("","BA","CC"),
Four=c("AAA","","CCC"),
stringsAsFactors = FALSE)
df
# One Two Three Four
# 1 A A AAA
# 2 B BA
# 3 C CC CCC
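For a data frame that mixes column types, the base-R replacement can also be limited to character columns, much like mutate_if does. A minimal sketch follows; df2 and the helper name blank_to_na are hypothetical and only serve the illustration.

```r
# Hypothetical mixed-type data frame (not from the question)
df2 <- data.frame(id = 1:3,
                  name = c("A", "", "C"),
                  score = c(1.5, NA, 3),
                  stringsAsFactors = FALSE)

# Hypothetical helper: replace "" with NA in character columns only
blank_to_na <- function(d) {
  is_chr <- vapply(d, is.character, logical(1))
  d[is_chr] <- lapply(d[is_chr], function(x) {
    x[!is.na(x) & x == ""] <- NA   # skip values that are already NA
    x
  })
  d
}

out <- blank_to_na(df2)
out
```

Numeric and integer columns are passed through untouched, so their types are preserved.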
Replace blank with NA in R
What type of variable are we talking about? Numeric? Character?
A better formulated question makes it easier to give a better answer.
This could help:
DT[DT == ""] <- NA
Do not try so hard. R should be fun!
How can I replace empty cells with NA in R?
The result you get from readHTMLTable is a list of two tables, so you need to work on each list element, which can be done using lapply:
table <- lapply(table, function(x){
x[x == ""] <- NA
return(x)
})
table$team_stats
Player PF Yds Ply Y/P TO FL 1stD Cmp Att Yds TD Int NY/A 1stD Att Yds TD Y/A 1stD Pen Yds 1stPy
1 Team Stats 442 6268 1021 6.1 25 14 350 339 483 4302 35 11 8.1 209 493 1966 14 4.0 124 109 922 17
2 Opp. Stats 253 4618 979 4.7 37 16 283 316 564 3235 15 21 5.3 178 372 1383 9 3.7 76 75 581 29
3 Lg Rank Offense 1 1 <NA> <NA> 2 10 1 <NA> 20 2 1 1 1 <NA> 13 10 12 13 <NA> <NA> <NA> <NA>
4 Lg Rank Defense 3 4 <NA> <NA> 11 9 9 <NA> 25 11 3 9 5 <NA> 1 3 3 8 <NA> <NA> <NA> <NA>
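The same pattern can be tried without scraping anything. The sketch below builds a small list of data frames by hand as a stand-in for the readHTMLTable result; the element name team_conversions and all cell values are assumptions made up for the example.

```r
# Hypothetical stand-in for the list returned by readHTMLTable
tables <- list(
  team_stats = data.frame(Player = c("Team Stats", "Lg Rank Offense"),
                          Ply = c("1021", ""),
                          stringsAsFactors = FALSE),
  team_conversions = data.frame(Player = c("Team Stats", ""),
                                stringsAsFactors = FALSE)
)

# Apply the replacement to every data frame in the list
tables <- lapply(tables, function(x) {
  x[x == ""] <- NA
  x
})

tables$team_stats
```

lapply keeps the names of the list elements, so each table can still be accessed with $ afterwards.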
Replacing blank values (white space) with NaN in pandas
I think df.replace() does the job, since pandas 0.13:
import numpy as np
import pandas as pd
df = pd.DataFrame([
[-0.532681, 'foo', 0],
[1.490752, 'bar', 1],
[-1.387326, 'foo', 2],
[0.814772, 'baz', ' '],
[-0.222552, ' ', 4],
[-1.176781, 'qux', ' '],
], columns='A B C'.split(), index=pd.date_range('2000-01-01','2000-01-06'))
# replace field that's entirely space (or empty) with NaN
print(df.replace(r'^\s*$', np.nan, regex=True))
Produces:
A B C
2000-01-01 -0.532681 foo 0
2000-01-02 1.490752 bar 1
2000-01-03 -1.387326 foo 2
2000-01-04 0.814772 baz NaN
2000-01-05 -0.222552 NaN 4
2000-01-06 -1.176781 qux NaN
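As Temak pointed out, keep the pattern anchored with ^ and $ so that valid values which merely contain white space are left alone. If empty strings should be kept and only whitespace-only fields replaced, use df.replace(r'^\s+$', np.nan, regex=True) instead.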
How to replace blank strings with NA?
To show that the code works:
data <- data.frame( col1= c("", letters[1:4]), col2=c(letters[1:4], ""))
is.na(data) <- data==''
data
# col1 col2
#1 <NA> a
#2 a b
#3 b c
#4 c d
#5 d <NA>
Suppose you have '' along with spaces ' '; then this won't work:
data <- data.frame( col1= c("", letters[1:4]), col2=c(letters[1:4], " "))
data1 <- data
is.na(data) <- data==''
data
# col1 col2
#1 <NA> a
#2 a b
#3 b c
#4 c d
#5 d
In such cases, you could use str_trim
library(stringr)
data1[] <- lapply(data1, str_trim)
is.na(data1) <- data1==''
data1
# col1 col2
#1 <NA> a
#2 a b
#3 b c
#4 c d
#5 d <NA>
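If adding a stringr dependency is not wanted, base R's trimws can play the same role as str_trim. A minimal sketch, using a fresh copy of the example data under the hypothetical name data2:

```r
data2 <- data.frame(col1 = c("", letters[1:4]),
                    col2 = c(letters[1:4], " "),
                    stringsAsFactors = FALSE)

data2[] <- lapply(data2, trimws)   # strip leading/trailing white space in every column
is.na(data2) <- data2 == ""        # cells that are now empty become NA
data2
```

As with str_trim, every column is returned as character, so this is best applied to data frames (or a subset of columns) that hold text.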
Fast way to replace all blanks with NA in R data.table
Here's probably the generic data.table way of doing this. I'm also going to use your regex, which handles several types of blanks (I haven't seen other answers doing this). You probably shouldn't run this over all your columns, but only over the factor or character ones, because other classes won't accept blank values.
For factors
indx <- which(sapply(data, is.factor))
for (j in indx) set(data, i = grep("^$|^ $", data[[j]]), j = j, value = NA_integer_)
For characters
indx2 <- which(sapply(data, is.character))
for (j in indx2) set(data, i = grep("^$|^ $", data[[j]]), j = j, value = NA_character_)
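To make the character-column loop reproducible, here is a small self-contained sketch. The table DT is made up for the example, and the pattern "^\\s*$" is swapped in as an assumption in place of the OP's regex: it treats empty strings and cells of any amount of white space as blank.

```r
library(data.table)

# Hypothetical data.table with blanks of different kinds
DT <- data.table(id = 1:3,
                 x = c("a", "", "  "),
                 y = c(" ", "b", "c"))

# Update character columns by reference; other columns are left untouched
chr_cols <- which(sapply(DT, is.character))
for (j in chr_cols) {
  set(DT, i = grep("^\\s*$", DT[[j]]), j = j, value = NA_character_)
}
DT
```

Because set modifies DT in place, no copy of the table is made, which is the point of this approach for large data.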