Subsetting R Data Frame Results in Mysterious NA Rows

Subsetting Rows in R producing NAs, but there are no NAs in Data Frame

We can select the rows with a logical expression, select the columns by name (as strings), keep the unique rows, and extract the distance column

unique(x[x$component ==1, c("ObjectID", "distance")])$distance
#[1] 2 4

If the intention is only to get the 'distance' based on the 'unique' values of 'ObjectID', we can use duplicated

with(subset(x, component == 1, select = c(ObjectID, distance)),
     distance[!duplicated(ObjectID)])
#[1] 2 4

Or more compactly, join two conditions with &

subset(x, !duplicated(ObjectID) & component == 1)$distance
#[1] 2 4

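The issue in OP's code is that the unique values of 'ObjectID' are used as a row index. A character row index is matched against the row names, not against the 'ObjectID' column, so the values that match nothing come back as NA rows. The row index needs to be logical or numeric here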

unique(x[x$component==1,]$ObjectID)
#[1] "11AD1234" "11DA354"

If we have to convert this to logical, we can use %in%
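
The %in% step itself is not shown above, so here is a minimal sketch of it. The data frame x below is a hypothetical stand-in, since the OP's data is not reproduced in this answer; only the column names and the two ObjectID values are taken from the output above.

```r
# Hypothetical stand-in for the OP's data frame (assumption, not the real data)
x <- data.frame(
  component = c(1, 1, 1, 2),
  ObjectID  = c("11AD1234", "11AD1234", "11DA354", "22ZZ999"),
  distance  = c(2, 2, 4, 7),
  stringsAsFactors = FALSE
)

ids <- unique(x[x$component == 1, ]$ObjectID)

# A character index is matched against the row names ("1", "2", ...),
# so nothing matches and every returned row is NA
bad <- x[ids, ]

# %in% returns a logical vector with one element per row of x
good <- x[x$ObjectID %in% ids, ]
unique(good$distance)
```

Because %in% never returns NA, the logical index built this way cannot introduce NA rows by itself.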

Subsetting rows in R generates mysterious NA row [Version 2.0]

Using your example (which doesn't show any NAs because you forgot to reassign the variable):

iris
iris$Petal.Width <- gsub(1.8, NA, iris$Petal.Width)
iris[!is.na(iris$Petal.Width) & iris$Petal.Width == 2.0,]

This also works:

iris[complete.cases(iris$Petal.Width) & iris$Petal.Width == 2, ]

Both give the following output:

    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
111          6.5         3.2          5.1           2 virginica
114          5.7         2.5          5.0           2 virginica
122          5.6         2.8          4.9           2 virginica
123          7.7         2.8          6.7           2 virginica
132          7.9         3.8          6.4           2 virginica
148          6.5         3.0          5.2           2 virginica
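
To see where the all-NA rows come from in the first place, here is a small sketch on a copy of iris (the names d and idx are just for illustration). Comparing NA with == yields NA rather than FALSE, and each NA in a logical row index produces a row filled with NA.

```r
d <- iris
# Plant some NAs while keeping the column numeric
d$Petal.Width[d$Petal.Width == 1.8] <- NA

idx <- d$Petal.Width == 2.0          # NA wherever Petal.Width is NA

with_na_rows <- d[idx, ]             # each NA in idx becomes an all-NA row
clean        <- d[!is.na(idx) & idx, ]  # NA is turned into FALSE before indexing
also_clean   <- d[which(idx), ]      # which() keeps only the TRUE positions

nrow(with_na_rows)
nrow(clean)
```

The difference between the two row counts equals the number of NA values in idx.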

Read these links for an introduction to NAs in R:
http://www.statmethods.net/input/missingdata.html
http://www.ats.ucla.edu/stat/r/faq/missing.htm

Subsetting R data frame with NAs in index variable

I think this does what you wanted:

> a[(a$Diab != 0) | is.na(a$Diab),]
  Diab INF HYP
2   NA   1   0
3    1   1   1
4    1   1   0
5    1   1  NA
8   NA   0   1
9   NA   1   1

You need to find entries in Diab which are either not equal to zero (!= 0) or equal to NA (is.na). The boolean operator | means OR.
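
Why the is.na() term is needed can be checked directly on a small vector. The vector diab below is made up for illustration and is not the OP's data.

```r
diab <- c(0, NA, 1)

cmp      <- diab != 0                  # FALSE NA TRUE: comparing NA gives NA
keep     <- (diab != 0) | is.na(diab)  # FALSE TRUE TRUE: NA | TRUE is TRUE
keep_alt <- !(diab %in% 0)             # FALSE TRUE TRUE: %in% never returns NA

cmp
keep
keep_alt
```

Either keep or keep_alt can be used as a row index without producing all-NA rows, since neither contains NA.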

Subset in R producing na

This is a workaround, a response to your #2

Looking at your code, there is a much easier way of subsetting data. Try this and check if it solves your issue.

library(dplyr)

active <- clinic %>%
  filter(Days.since.injury.physio > 20,
         Days.since.injury.physio < 35,
         Days.since.injury.F.U.1 > 27,
         Days.since.injury.F.U.1 < 63)

dplyr does wonders when it comes to subsetting and manipulation of data.

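The %>% operator passes the data frame on its left as the first argument to the function on its right, and filter() looks up the column names inside that data frame, so you don't have to use the $ symbol.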

If, for some bizarre reason, you don't like this, you should look at the subset function in R.
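
For comparison, here is a base R sketch of the same filter with subset(). The clinic data frame below is a made-up stand-in that only borrows the two column names from the code above. Both subset() and dplyr's filter() drop rows where the condition evaluates to NA, whereas plain bracket indexing turns each NA into an all-NA row.

```r
# Made-up stand-in for the OP's data (assumption)
clinic <- data.frame(
  Days.since.injury.physio = c(25, NA, 30, 10),
  Days.since.injury.F.U.1  = c(40, 50, NA, 45)
)

# subset() treats an NA condition as FALSE
active <- subset(
  clinic,
  Days.since.injury.physio > 20 & Days.since.injury.physio < 35 &
    Days.since.injury.F.U.1 > 27 & Days.since.injury.F.U.1 < 63
)

# The same condition with [ keeps one all-NA row per NA in the condition
by_bracket <- clinic[
  clinic$Days.since.injury.physio > 20 & clinic$Days.since.injury.physio < 35 &
    clinic$Days.since.injury.F.U.1 > 27 & clinic$Days.since.injury.F.U.1 < 63, ]

nrow(active)
nrow(by_bracket)
```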

subsetting !is.na for multiple conditions unexpected results

I don't know why the initial approach didn't work, but I guess there is some fault in the chaining that I cannot see. Taking the opposite approach (removing the rows that fulfill the condition) seems to produce the desired output.

tmp <- data.frame(state = c(1,  1, 2,  2, 3, 3, 4, 5),
                  reg   = c(NA, 3, 6, NA, 9, 1, NA, 7),
                  gas   = c(NA, 5, NA, 9, 1, 3, NA, 1),
                  other = c(1, 2, 4, 2, 6, 8, 1, 1))

res = tmp[-which(is.na(tmp$reg) & is.na(tmp$gas)),]

res
#>   state reg gas other
#> 2     1   3   5     2
#> 3     2   6  NA     4
#> 4     2  NA   9     2
#> 5     3   9   1     6
#> 6     3   1   3     8
#> 8     5   7   1     1

Created on 2020-12-24 by the reprex package (v0.3.0)
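
One caveat with the -which(...) pattern, shown as a sketch on a made-up data frame (full is not from the question): when no row matches, which() returns an empty vector, and negative indexing with an empty vector selects zero rows instead of all rows. Negating the logical condition avoids that.

```r
# Made-up data in which no row has both columns missing
full <- data.frame(reg = c(3, 6, 9), gas = c(5, NA, 1))

both_missing <- is.na(full$reg) & is.na(full$gas)  # all FALSE here

via_which <- full[-which(both_missing), ]  # -integer(0) selects nothing
via_not   <- full[!both_missing, ]         # keeps every row

nrow(via_which)
nrow(via_not)
```

Since is.na() only ever returns TRUE or FALSE, the negated condition contains no NA and is safe to use as a row index.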


