Remove the Rows That Have Non-Numeric Characters in One Column in R

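In R versions before 4.0.0, importing data into a data.frame converts any column that is not entirely numeric to a factor by default (since 4.0.0 the default is stringsAsFactors = FALSE, so such columns stay character). For a factor column, you have to convert to character first and then to numeric.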

dat <- data.frame(A=c(letters[1:5],1:5), stringsAsFactors=TRUE)

str(dat)
'data.frame': 10 obs. of 1 variable:
$ A: Factor w/ 10 levels "1","2","3","4",..: 6 7 8 9 10 1 2 3 4 5

as.numeric(as.character(dat$A))
[1] NA NA NA NA NA 1 2 3 4 5
Warning message:
NAs introduced by coercion

Notice that the non-numeric strings become NA. Use that to filter the rows:

dat <- dat[!is.na(as.numeric(as.character(dat$A))),]

In words: keep the rows of dat whose value in A is not NA after conversion from factor to numeric.

Second Issue:

> dat <- data.frame(A=c(letters[1:5],1:5))
> dat <- dat[!is.na(as.numeric(as.character(dat$A))),]
Warning message:
In `[.data.frame`(dat, !is.na(as.numeric(as.character(dat$A))), :
NAs introduced by coercion
> dat <- dat[!is.na(as.numeric(as.character(dat$A))),]
Error in dat$A : $ operator is invalid for atomic vectors
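
The error in the second call arises because dat has a single column: indexing a one-column data frame with [rows, ] drops the result to a plain vector, so the next dat$A fails. A minimal sketch of one way around this, assuming the same dat as above, is to pass drop = FALSE and to silence the expected coercion warning:

```r
# Same example data as above (a single text column).
dat <- data.frame(A = c(letters[1:5], 1:5))

# Convert once; non-numeric entries become NA (warning suppressed on purpose).
num <- suppressWarnings(as.numeric(as.character(dat$A)))

# drop = FALSE keeps the result a data frame even with a single column.
dat <- dat[!is.na(num), , drop = FALSE]

# The column still holds text, so store the numeric version explicitly.
dat$A <- num[!is.na(num)]
```

Because dat stays a data frame, running the filter a second time no longer fails.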

Is there any way to delete the rows of data which don't have all numeric values?

One base R option could be:

data[!is.na(Reduce(`+`, lapply(data, as.numeric))), ]

  a b
2 2 2
3 3 3

When importing the data, use stringsAsFactors = FALSE: calling as.numeric() directly on a factor returns its integer level codes, not the original values.

Or using sapply():

data[!is.na(rowSums(sapply(data, as.numeric))), ]
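
Neither snippet above defines data, so here is a small sketch with a made-up data frame (the values are an assumption, chosen only for illustration) that shows how the rowSums() variant decides which rows to keep:

```r
# Hypothetical data: row 1 has a non-numeric value in column a.
data <- data.frame(a = c("x", "2", "3"),
                   b = c("1", "2", "3"),
                   stringsAsFactors = FALSE)

# Convert every column; any failed conversion turns into NA.
num <- suppressWarnings(sapply(data, as.numeric))

# A row sum is NA as soon as one cell in that row is NA.
keep <- !is.na(rowSums(num))
result <- data[keep, ]
result
```

Note that the kept columns are still character; convert them afterwards if numeric columns are needed.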

Replacing all non-numeric characters in certain columns in R

You could use across() (within mutate()) to apply this to every column except a, and a regex (within str_extract()) to extract the first run of digits from each value and convert it to numeric.

library(tidyverse)

d |>
  mutate(across(-a, ~ . |> str_extract("\\d+") |> as.numeric()))

Output:

# A tibble: 6 × 3
  a         b     c
  <chr> <dbl> <dbl>
1 Tom       8     2
2 Mary      3    12
3 Ben       6     6
4 Jane      7     7
5 Lucas     5     1
6 Mark      1     9
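
One thing to keep in mind is that "\\d+" matches only the first run of digits, so decimals are cut at the point. The sketch below illustrates this with a base-R stand-in for the str_extract() step; first_digits() is a hypothetical helper written for this example, not part of stringr, and the input strings are made up:

```r
# Hypothetical helper: base-R stand-in for str_extract(x, "\\d+") |> as.numeric()
first_digits <- function(x) {
  m <- regexpr("[0-9]+", x)          # position of the first run of digits
  hit <- !is.na(m) & m > 0           # TRUE where a run was found
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)       # matched text, only for the hits
  as.numeric(out)
}

first_digits(c("8 pts", "n=12", "3.5", "none"))
# → 8 12 3 NA
```

If the columns can hold decimals, a pattern such as "\\d+\\.?\\d*" may be a better fit.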

Removing data with a non-numeric column value in R

If you just want to filter out rows with NA values, you can use complete.cases():

> df
   id age   fev height male smoke
1   1  72 1.284   66.5    1     1
2   2  81 2.553   67.0    0     0
3   3  90 2.383   67.0    1     0
4   4  72 2.699   71.5    1     0
5   5  70 2.031   62.5    0     0
6   6  72 2.410   67.5    1     0
7   7  75 3.586   69.0    1     0
8   8  75 2.958   67.0    1     0
9   9  67 1.916   62.5    0     0
10 10  70    NA   66.0    0     1
> df[complete.cases(df), ]
  id age   fev height male smoke
1  1  72 1.284   66.5    1     1
2  2  81 2.553   67.0    0     0
3  3  90 2.383   67.0    1     0
4  4  72 2.699   71.5    1     0
5  5  70 2.031   62.5    0     0
6  6  72 2.410   67.5    1     0
7  7  75 3.586   69.0    1     0
8  8  75 2.958   67.0    1     0
9  9  67 1.916   62.5    0     0

How to delete all non-numeric rows in R?

Subset to numeric IDs:

subset(df, grepl('^\\d+$', df$ID))

The pattern matches values of ID that consist only of digits: ^ and $ anchor the match to the start and end of the value.
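
To see why the anchors matter, here is a short sketch with made-up IDs (the data frame is an assumption for illustration) comparing the anchored pattern with an unanchored one:

```r
# Hypothetical IDs to show what the anchored pattern accepts.
df <- data.frame(ID = c("101", "A12", "45b", "7", ""),
                 stringsAsFactors = FALSE)

anchored <- grepl("^\\d+$", df$ID)   # whole value must be digits
anchored
# → TRUE FALSE FALSE TRUE FALSE

unanchored <- grepl("\\d+", df$ID)   # any digit anywhere is enough
unanchored
# → TRUE TRUE TRUE TRUE FALSE

numeric_ids <- subset(df, grepl("^\\d+$", ID))
```

Without ^ and $, values such as "A12" would slip through the filter.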

How to delete a row in R that doesn't have a number

Example data.frame:

df <- data.frame(a=1:10, b=1:10, FRQ=c(rnorm(8), '.', 'rabbit'), stringsAsFactors=FALSE)

To check the class of all your columns try: lapply(df, class)

If the FRQ column is character, you can drop the rows with non-numeric values and then convert the column to numeric. Like this:

library(stringr)
df <- df[!str_detect(df$FRQ, '([A-Za-z])'), ]
df <- df[!str_detect(df$FRQ, '\\.$'), ]
df$FRQ <- as.numeric(df$FRQ)
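
An alternative that needs no pattern matching is to let as.numeric() decide what counts as a number. This is a sketch on the same kind of example data, not the approach used above; set.seed() is added only to make it reproducible:

```r
set.seed(1)  # only to make the example reproducible
df <- data.frame(a = 1:10, b = 1:10,
                 FRQ = c(rnorm(8), ".", "rabbit"),
                 stringsAsFactors = FALSE)

# Anything as.numeric() cannot parse becomes NA.
frq <- suppressWarnings(as.numeric(df$FRQ))

# Keep the parsable rows and store the numeric values.
df <- df[!is.na(frq), ]
df$FRQ <- frq[!is.na(frq)]
```

One difference in behavior: values in scientific notation such as "1e-05" survive this version, whereas the letter-based filter would drop them.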

Remove Non Numeric values (*Unknown*) in my data frame

We could avoid this problem by specifying na.strings in read.csv/read.table:

dataL <- read.csv("file.csv", stringsAsFactors = FALSE,
                  na.strings = c("NA", "N/A", "Unknown*", "NULL", ".P"))

The problem with the current approach is that these are factor columns, and replacing those levels with NA still leaves the unused levels in place. So we need droplevels to remove them:

dataS <- droplevels(na.omit(dataL))
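
The effect of na.strings can be tried without a file by passing inline text to read.csv(). The CSV content and column names below are made up for illustration:

```r
# Hypothetical CSV content standing in for "file.csv".
csv_text <- "id,value
1,10
2,Unknown*
3,N/A
4,25"

# The listed strings are matched literally and read as NA.
dataL <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                  na.strings = c("NA", "N/A", "Unknown*"))

str(dataL$value)          # numeric column, markers already NA
dataS <- na.omit(dataL)   # keep complete rows only
```

Because the markers become NA at import time, the column is read as numeric directly and no factor levels are left to clean up.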

Remove non numeric values from vector in r

A simple solution is to use Filter over vec <- list(1, 2, T, 'x', 'abc', '6', 7, F, F, 10), i.e.,

> unlist(Filter(is.numeric,vec))
[1] 1 2 7 10
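
Filter(is.numeric, ...) works here because vec is a list, where each element keeps its own type. If the values sit in an atomic vector built with c(), everything is coerced to character and is.numeric() is FALSE throughout. A sketch for that case, using a made-up vector:

```r
# c() coerces mixed input to character, so is.numeric() no longer helps.
vec <- c(1, 2, TRUE, "x", "abc", "6", 7, FALSE, 10)

# Parse what can be parsed; the rest becomes NA and is dropped.
num <- suppressWarnings(as.numeric(vec))
cleaned <- num[!is.na(num)]
cleaned
# → 1 2 6 7 10
```

Unlike the list version, the string "6" is kept here because it parses as a number.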

Removing rows from dataframe that contains string in a particular column

There are multiple ways you can do this:

Convert to numeric and remove the NA values (as.numeric() warns about the coercion, which is expected here):

subset(df, !is.na(as.numeric(Score)))

# ID Score
#1 1001 4
#2 1002 20
#5 1005 30

Or use grepl to find values that contain any non-digit character and remove them:

subset(df, !grepl('\\D', Score))

This can be done with grep as well.

df[grep('\\D', df$Score, invert = TRUE), ]

data

df <- structure(list(ID = 1001:1005, Score = c("4", "20", "h", "v", 
"30")), class = "data.frame", row.names = c(NA, -5L))
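
The two approaches agree on this data but not in general: "\\D" treats a decimal point or a minus sign as a non-digit. A short sketch with made-up scores shows the difference:

```r
# Hypothetical scores that include a decimal and a negative value.
score <- c("4", "20", "h", "4.5", "-3")

by_coercion <- !is.na(suppressWarnings(as.numeric(score)))
by_coercion
# → TRUE TRUE FALSE TRUE TRUE

by_regex <- !grepl("\\D", score)
by_regex
# → TRUE TRUE FALSE FALSE FALSE
```

So the regex variants suit non-negative whole numbers, while the as.numeric() variant accepts anything R can parse as a number.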

Convert non-numeric rows and columns to zero

library(ISLR)
data("Hitters")
d = head(Hitters)

library(dplyr)

d %>%
  mutate_if(function(x) !is.numeric(x), function(x) 0) %>% # if column is non numeric add zeros
  mutate_all(function(x) ifelse(is.na(x), 0, x)) # if there is an NA element replace it with 0

# AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun CRuns CRBI CWalks League Division PutOuts Assists Errors Salary NewLeague
# 1 293 66 1 30 29 14 1 293 66 1 30 29 14 0 0 446 33 20 0.0 0
# 2 315 81 7 24 38 39 14 3449 835 69 321 414 375 0 0 632 43 10 475.0 0
# 3 479 130 18 66 72 76 3 1624 457 63 224 266 263 0 0 880 82 14 480.0 0
# 4 496 141 20 65 78 37 11 5628 1575 225 828 838 354 0 0 200 11 3 500.0 0
# 5 321 87 10 39 42 30 2 396 101 12 48 46 33 0 0 805 40 4 91.5 0
# 6 594 169 4 74 51 35 11 4408 1133 19 501 336 194 0 0 282 421 25 750.0 0

If you want to avoid function(x) you can use this

d %>%
  mutate_if(Negate(is.numeric), ~0) %>%
  mutate_all(~ifelse(is.na(.), 0, .))
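
The same two steps can also be written without dplyr. The following is a base-R sketch on a small made-up data frame (not the Hitters data), looping over the columns with lapply():

```r
# Small hypothetical data frame instead of ISLR::Hitters.
d <- data.frame(hits = c(66, NA, 130),
                league = c("A", "N", "A"),
                stringsAsFactors = FALSE)

# d[] <- ... replaces the columns but keeps d a data frame.
d[] <- lapply(d, function(x) {
  if (is.numeric(x)) {
    ifelse(is.na(x), 0, x)     # numeric column: NA -> 0
  } else {
    rep(0, length(x))          # non-numeric column: all zeros
  }
})
```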

