Remove the Rows That Have Non-Numeric Characters in One Column in R

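In versions of R before 4.0.0, importing data into a data.frame converts any column that is not entirely numeric to a factor by default (since R 4.0.0, stringsAsFactors defaults to FALSE and such columns stay character). For a factor column you therefore have to convert to character first and then to numeric.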

dat <- data.frame(A=c(letters[1:5],1:5))

str(dat)
'data.frame':   10 obs. of  1 variable:
 $ A: Factor w/ 10 levels "1","2","3","4",..: 6 7 8 9 10 1 2 3 4 5

as.numeric(as.character(dat$A))
 [1] NA NA NA NA NA  1  2  3  4  5
Warning message:
NAs introduced by coercion

Notice that the non-numeric values become NA. Combining this with is.na():

dat <- dat[!is.na(as.numeric(as.character(dat$A))),]

In words: keep the rows of dat whose value in A is not NA after conversion from factor to numeric.

Second Issue:

> dat <- data.frame(A=c(letters[1:5],1:5))
> dat <- dat[!is.na(as.numeric(as.character(dat$A))),]
Warning message:
In `[.data.frame`(dat, !is.na(as.numeric(as.character(dat$A))),  :
  NAs introduced by coercion
> dat <- dat[!is.na(as.numeric(as.character(dat$A))),]
Error in dat$A : $ operator is invalid for atomic vectors
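
The error in the second call appears to come from the first subset: dat has a single column, so dat[rows, ] drops the result to a plain vector, and dat$A is no longer valid afterwards. A minimal sketch of one way around this, assuming the goal is to keep dat as a data frame, is to pass drop = FALSE. The suppressWarnings() call is an addition here and only silences the expected coercion warning.

```r
# One-column data frame; stringsAsFactors = TRUE mimics the pre-4.0.0 default
dat <- data.frame(A = c(letters[1:5], 1:5), stringsAsFactors = TRUE)

# TRUE for rows whose value converts cleanly to a number
keep <- !is.na(suppressWarnings(as.numeric(as.character(dat$A))))

# drop = FALSE keeps the result a data frame even with a single column
dat <- dat[keep, , drop = FALSE]

# Running the same filter again now works, because dat$A still exists
keep <- !is.na(suppressWarnings(as.numeric(as.character(dat$A))))
dat <- dat[keep, , drop = FALSE]
```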

Is there any way to delete the rows of data which don't have all numeric values?

One base R option could be:

data[!is.na(Reduce(`+`, lapply(data, as.numeric))), ]

  a b
2 2 2
3 3 3

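When importing the data, use stringsAsFactors = FALSE (the default since R 4.0.0), so that as.numeric() sees the original strings rather than the integer codes of a factor.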

Or using sapply():

data[!is.na(rowSums(sapply(data, as.numeric))), ]
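
Since the input data is not shown above, here is a small self-contained sketch of the same idea. The data frame below is hypothetical, and the per-column logical test is a variation on the rowSums() approach rather than the answer's exact code.

```r
# Hypothetical data: both columns are character and contain stray letters
data <- data.frame(a = c("x", "2", "3", "4"),
                   b = c("1", "2", "3", "y"),
                   stringsAsFactors = FALSE)

# Logical matrix: TRUE where a cell converts cleanly to a number
is_num <- sapply(data, function(col) !is.na(suppressWarnings(as.numeric(col))))

# Keep only rows in which every column is numeric
clean <- data[rowSums(is_num) == ncol(data), , drop = FALSE]
```

With this input, only the rows named 2 and 3 remain.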

Replacing all non-numeric characters in certain columns in R

You could use across() (within mutate()) to apply this to every column except a, using a regex (within str_extract()) to pull out the first run of digits in each value and then converting the result to numeric.

library(tidyverse)

d |> 
  mutate(across(-a, ~ . |> str_extract("\\d+") |> as.numeric()))

Output:

# A tibble: 6 × 3
  a         b     c
  <chr> <dbl> <dbl>
1 Tom       8     2
2 Mary      3    12
3 Ben       6     6
4 Jane      7     7
5 Lucas     5     1
6 Mark      1     9
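
If loading the tidyverse is not an option, a base R sketch of the same extraction might look like the following. The helper name extract_first_number and the sample data are hypothetical; like str_extract("\\d+"), it takes only the first run of digits and returns NA where there is none.

```r
# Hypothetical helper: first run of digits in each string, as a number
extract_first_number <- function(x) {
  m <- regexpr("[0-9]+", x)
  out <- rep(NA_character_, length(x))
  hit <- !is.na(m) & m > 0          # positions that actually matched
  out[hit] <- regmatches(x, m)      # regmatches() returns matches only
  as.numeric(out)
}

# Hypothetical data in the same shape as the example above
d <- data.frame(a = c("Tom", "Mary", "Ben"),
                b = c("8 apples", "3kg", "none"),
                c = c("x2", "12", "6"),
                stringsAsFactors = FALSE)

# Apply to every column except a
cols <- setdiff(names(d), "a")
d[cols] <- lapply(d[cols], extract_first_number)
```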

Removing data with a non-numeric column value in R

If you just want to filter out rows with NA values, you can use complete.cases():

> df
   id age   fev height male smoke
1   1  72 1.284   66.5    1     1
2   2  81 2.553   67.0    0     0
3   3  90 2.383   67.0    1     0
4   4  72 2.699   71.5    1     0
5   5  70 2.031   62.5    0     0
6   6  72 2.410   67.5    1     0
7   7  75 3.586   69.0    1     0
8   8  75 2.958   67.0    1     0
9   9  67 1.916   62.5    0     0
10 10  70    NA   66.0    0     1
> df[complete.cases(df), ]
  id age   fev height male smoke
1  1  72 1.284   66.5    1     1
2  2  81 2.553   67.0    0     0
3  3  90 2.383   67.0    1     0
4  4  72 2.699   71.5    1     0
5  5  70 2.031   62.5    0     0
6  6  72 2.410   67.5    1     0
7  7  75 3.586   69.0    1     0
8  8  75 2.958   67.0    1     0
9  9  67 1.916   62.5    0     0
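
complete.cases() can also be restricted to the columns you care about. The following is a small sketch with hypothetical data, assuming only fev should decide whether a row is kept:

```r
# Hypothetical data with NAs in two different columns
df <- data.frame(id = 1:4,
                 fev = c(1.284, NA, 2.383, 2.699),
                 note = c("a", "b", NA, "d"),
                 stringsAsFactors = FALSE)

# Drop rows with NA in fev only; the NA in note is tolerated
by_fev <- df[complete.cases(df[, "fev", drop = FALSE]), ]

# Drop rows with NA in any column
all_cols <- df[complete.cases(df), ]
```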

How to delete all non-numeric rows in R?

Subset to numeric IDs:

subset(df, grepl('^\\d+$', df$ID))

The pattern matches values of ID that consist entirely of digits: ^ and $ anchor the match to the whole string, and \\d+ requires one or more digits in between.
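
To see what the pattern accepts and rejects, it can be tried on a few sample strings. The values below are made up for illustration:

```r
# Hypothetical ID values
ids <- c("123", "12a", "", "4.5", " 7")

# TRUE only for strings made up entirely of digits
is_digits <- grepl("^\\d+$", ids)
```

Here only "123" matches; the empty string, the decimal, and the value with a leading space are all rejected.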

How to delete a row in R that doesn't have a number

Example data.frame:

df <- data.frame(a=1:10, b=1:10, FRQ=c(rnorm(8), '.', 'rabbit'), stringsAsFactors=FALSE)

To check the class of all your columns try: lapply(df, class)

If the FRQ column is character, you can drop the rows whose values are not numbers and then convert the column to numeric. Like this:

library(stringr)
df <- df[!str_detect(df$FRQ, '([A-Za-z])'), ]
df <- df[!str_detect(df$FRQ, '\\.$'), ]
df$FRQ <- as.numeric(df$FRQ)
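
An alternative sketch, assuming any value that as.numeric() cannot parse should be dropped, lets the coercion itself decide. Unlike the letter-based filter, this keeps values in scientific notation such as "1e-04". The sample values are hypothetical and fixed so the result is reproducible.

```r
# Hypothetical data: FRQ holds numbers as text plus two junk values
df <- data.frame(a = 1:5,
                 FRQ = c("0.25", "-1.3", "1e-04", ".", "rabbit"),
                 stringsAsFactors = FALSE)

# Coerce once; unparseable values become NA (warning silenced on purpose)
frq_num <- suppressWarnings(as.numeric(df$FRQ))

# Keep the parseable rows and store the numeric values
df <- df[!is.na(frq_num), ]
df$FRQ <- frq_num[!is.na(frq_num)]
```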

Remove Non Numeric values (Unknown) in my data frame

We could avoid this problem by specifying na.strings in read.csv/read.table:

dataL <- read.csv("file.csv", stringsAsFactors = FALSE,
   na.strings = c("NA", "N/A", "Unknown*", "NULL", ".P"))

The problem with the original approach is that the columns are factors, and replacing levels with NA still leaves the unused levels in place. So we need droplevels() to remove them:

dataS <- droplevels(na.omit(dataL))
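
As a self-contained illustration, the same idea can be tried on inline text instead of a file. The column names and values below are hypothetical. Note that na.strings entries are compared against whole field values, not treated as patterns.

```r
# Hypothetical CSV content supplied inline via the text argument
csv_text <- "id,score\n1,4\n2,Unknown\n3,N/A\n4,30"

dataL <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                  na.strings = c("NA", "N/A", "Unknown"))

# score is read as a numeric column because the placeholders became NA
dataS <- na.omit(dataL)
```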

Remove non numeric values from vector in r

A simple solution is to use Filter over vec <- list(1, 2, T, 'x', 'abc', '6', 7, F, F, 10), i.e.,

> unlist(Filter(is.numeric,vec))
[1]  1  2  7 10
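
Note that '6' is dropped above because it is stored as a character string. If numeric-looking strings should be kept as well, one possible sketch (an extension, not part of the original answer) is:

```r
vec <- list(1, 2, TRUE, "x", "abc", "6", 7, FALSE, FALSE, 10)

# Keep numbers, plus strings that parse cleanly as numbers
keep <- Filter(function(x) {
  is.numeric(x) ||
    (is.character(x) && !is.na(suppressWarnings(as.numeric(x))))
}, vec)

# unlist() yields character here because of "6", so convert at the end
nums <- as.numeric(unlist(keep))
```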

Removing rows from dataframe that contains string in a particular column

There are multiple ways to do this:

Convert to numeric and remove NA values

subset(df, !is.na(as.numeric(Score)))

#    ID Score
#1 1001     4
#2 1002    20
#5 1005    30

Or use grepl to find the values that contain any non-digit character and remove those rows:

subset(df, !grepl('\\D', Score))

This can be done with grep as well.

df[grep('\\D', df$Score, invert = TRUE), ]

data

df <- structure(list(ID = 1001:1005, Score = c("4", "20", "h", "v", 
"30")), class = "data.frame", row.names = c(NA, -5L))
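
The two approaches differ once the column contains negative or decimal numbers, since '-' and '.' are non-digit characters. A short sketch with hypothetical scores shows the difference:

```r
# Hypothetical scores including a negative and a decimal value
df <- data.frame(ID = 1:5,
                 Score = c("4", "-2", "3.5", "h", "30"),
                 stringsAsFactors = FALSE)

# Regex approach: rejects anything with a non-digit, including "-2" and "3.5"
by_regex <- subset(df, !grepl("\\D", Score))

# Coercion approach: keeps everything as.numeric() can parse
by_coercion <- subset(df, !is.na(suppressWarnings(as.numeric(Score))))
```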

Convert non-numeric rows and columns to zero

library(ISLR)
data("Hitters")
d = head(Hitters)

library(dplyr)

d %>% 
  mutate_if(function(x) !is.numeric(x), function(x) 0) %>%   # if column is non numeric add zeros
  mutate_all(function(x) ifelse(is.na(x), 0, x))             # if there is an NA element replace it with 0

#   AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun CRuns CRBI CWalks League Division PutOuts Assists Errors Salary NewLeague
# 1   293   66     1   30  29    14     1    293    66      1    30   29     14      0        0     446      33     20    0.0         0
# 2   315   81     7   24  38    39    14   3449   835     69   321  414    375      0        0     632      43     10  475.0         0
# 3   479  130    18   66  72    76     3   1624   457     63   224  266    263      0        0     880      82     14  480.0         0
# 4   496  141    20   65  78    37    11   5628  1575    225   828  838    354      0        0     200      11      3  500.0         0
# 5   321   87    10   39  42    30     2    396   101     12    48   46     33      0        0     805      40      4   91.5         0
# 6   594  169     4   74  51    35    11   4408  1133     19   501  336    194      0        0     282     421     25  750.0         0

If you want to avoid function(x), you can use the formula shorthand:

d %>% 
  mutate_if(Negate(is.numeric), ~0) %>%  
  mutate_all(~ifelse(is.na(.), 0, .))
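
For comparison, a base R sketch of the same two steps, without dplyr, could look like this. The small data frame is hypothetical and stands in for head(Hitters).

```r
# Hypothetical stand-in for the Hitters subset
d <- data.frame(AtBat = c(293, 315, 479),
                League = c("A", "N", "A"),
                Salary = c(NA, 475, 480),
                stringsAsFactors = FALSE)

# d[] <- keeps the data frame structure while replacing every column
d[] <- lapply(d, function(x) {
  if (!is.numeric(x)) return(rep(0, length(x)))  # non-numeric column: all zeros
  ifelse(is.na(x), 0, x)                         # numeric column: NA becomes 0
})
```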
