Subset a Data Frame Based on Value Pairs Stored in Independent Ordered Vectors

You could try match with an appropriate nomatch argument. The two nomatch values differ so that a row matching neither list does not compare as equal:

sub <- match(DATA$A, AList, nomatch=-1) == match(DATA$B, BList, nomatch=-2)
sub
# [1] TRUE FALSE TRUE FALSE FALSE FALSE

DATA[sub,]
# A B Value
#1 1 6 9
#3 3 8 2

A paste-based approach would also be possible:

sub <- paste(DATA$A, DATA$B, sep=":") %in% paste(AList, BList, sep=":")
sub
# [1] TRUE FALSE TRUE FALSE FALSE FALSE

DATA[sub,]
# A B Value
#1 1 6 9
#3 3 8 2
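The answer does not show how DATA, AList and BList were built, so the following is only a minimal sketch with made-up values, chosen to reproduce the logical vector printed above. It runs both approaches on the same input and checks that they agree.

```r
# Hypothetical inputs: the original DATA, AList and BList are not shown,
# so these values are assumptions made for illustration only.
DATA <- data.frame(A     = c(1, 1, 3, 3, 5, 2),
                   B     = c(6, 8, 8, 6, 6, 9),
                   Value = c(9, 4, 2, 7, 5, 1))
AList <- c(1, 3)   # wanted pairs are (AList[i], BList[i])
BList <- c(6, 8)

# match approach: a row is kept when A and B are found at the same position
sub_match <- match(DATA$A, AList, nomatch = -1) ==
             match(DATA$B, BList, nomatch = -2)

# paste approach: a row is kept when its "A:B" key is one of the wanted keys
sub_paste <- paste(DATA$A, DATA$B, sep = ":") %in%
             paste(AList, BList, sep = ":")

identical(sub_match, sub_paste)
DATA[sub_paste, ]
```

One design point: match only reports the first position at which a value occurs, so if AList or BList contained repeated values the match comparison could miss a valid pair. In that situation the paste-based test is probably the safer choice.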

How to subset columns based on the value in another column in R

I made a subset of your data and came up with the following (could be more elegant but this works):

library(dplyr)    # select(), mutate(), rename(), bind_rows()
library(stringr)  # str_replace_all()

Individual <- c("A","B","C","D","E")
Age2010 <- c(53,22,33,NA,NA)
`weight 2010` <- c(50,NA,65,70,NA)
Age2011 <- c(85,23,34,28,64)
Weight2011 <- c(100,75,64,NA,90)
# data.frame() keeps the numeric columns numeric; as.data.frame(cbind(...)) would turn them all into character
df <- data.frame(Individual, Age2010, `weight 2010`, Age2011, Weight2011, check.names = FALSE)
colnames(df) <- str_replace_all(colnames(df), " ", "") # remove spaces

# create a data frame for each year (this could probably be done with lapply)
df2010 <- df %>% select(Individual, contains("2010")) %>% mutate(year = 2010) %>% rename(weight = weight2010, age = Age2010)
df2011 <- df %>% select(Individual, contains("2011")) %>% mutate(year = 2011) %>% rename(weight = Weight2011, age = Age2011)

final <- bind_rows(df2010, df2011)

Of course, you can extend this for the remaining years in your dataset. You will then have a year variable to perform your analyses.
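Extending this to more years by hand gets repetitive, so here is one possible sketch in base R that builds the per-year data frames in a loop. The helper name one_year and the years vector are hypothetical, and the sketch assumes every year has exactly one age column and one weight column whose names differ only in letter case.

```r
# Same toy data as above, with the space already removed from "weight 2010"
df <- data.frame(
  Individual = c("A", "B", "C", "D", "E"),
  Age2010    = c(53, 22, 33, NA, NA),
  weight2010 = c(50, NA, 65, 70, NA),
  Age2011    = c(85, 23, 34, 28, 64),
  Weight2011 = c(100, 75, 64, NA, 90)
)

years <- c(2010, 2011)  # assumption: add the remaining years here

# Hypothetical helper: pull the age and weight columns for one year,
# ignoring case so that "weight2010" and "Weight2011" are both found.
one_year <- function(yr) {
  cols <- names(df)
  data.frame(
    Individual = df$Individual,
    age    = df[[grep(paste0("^age", yr, "$"), cols, ignore.case = TRUE)]],
    weight = df[[grep(paste0("^weight", yr, "$"), cols, ignore.case = TRUE)]],
    year   = yr
  )
}

# Stack the per-year pieces into one long data frame
final <- do.call(rbind, lapply(years, one_year))
final
```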

How to create a loop which creates multiple subset dataframes from a larger data frame?

Your code works fine. Just remove list so you create a vector of color names and not a list. If you only want distinct values, use unique.

mydata <- data.frame(x = c(1,2,3), y = c('a','b','c'), z = c('red','red','yellow'))

colors <- unique(mydata$z)

for (i in seq_along(colors)) {  # seq_along() is safe even if colors is empty
  assign(paste0("mydata_", i), subset(mydata, z == colors[[i]]))
}
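As an alternative to creating numbered variables with assign, the subsets can be kept together in a named list. This is a sketch of a different technique (base R split), not what the question's code does; the name subsets is made up for the example.

```r
mydata <- data.frame(x = c(1, 2, 3),
                     y = c("a", "b", "c"),
                     z = c("red", "red", "yellow"))

# split() returns a named list with one data frame per distinct value of z
subsets <- split(mydata, mydata$z)

names(subsets)
subsets[["red"]]
```

A list like this can be passed to lapply directly, which tends to be easier to manage than a set of separately named objects in the workspace.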

Purging data frame of unwanted rows based on two vector sets

It is best to share a reproducible example (see "How to make a great R reproducible example?").

Using the mtcars dataset that ships with base R (load it with data(mtcars)), create two condition vectors:

vectorsetforcol1 <- mtcars$mpg[mtcars$mpg < 15]
vectorsetforcol2 <- unique(mtcars$carb[mtcars$carb == 2])

Output condition 1: (mpg < 15)

> mtcars[mtcars$mpg %in% vectorsetforcol1,]
                     mpg cyl disp  hp drat    wt  qsec vs am gear carb
Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4

Output condition 2: (carb == 2)

> mtcars[mtcars$carb %in% vectorsetforcol2,]
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Honda Civic       30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Dodge Challenger  15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin       15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Pontiac Firebird  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Porsche 914-2     26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa      30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Volvo 142E        21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Combine conditions 1 and 2

> cond.df <- mtcars[mtcars$mpg %in% vectorsetforcol1 | mtcars$carb %in% vectorsetforcol2, ]
> cond.df
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Test condition 1: (NOT(mpg < 15))

The rows that violate condition 1 are present because they satisfy condition 2 (carb == 2):

> cond.test.col1<-cond.df[!cond.df$mpg %in% vectorsetforcol1, ]
> cond.test.col1
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Honda Civic       30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Dodge Challenger  15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin       15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Pontiac Firebird  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Porsche 914-2     26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa      30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Volvo 142E        21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Test condition 2: (NOT(carb == 2))

The rows that violate condition 2 are present because they satisfy condition 1 (mpg < 15):

> cond.test.col2<-cond.df[!cond.df$carb %in% vectorsetforcol2, ]
> cond.test.col2
                     mpg cyl disp  hp drat    wt  qsec vs am gear carb
Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4

This is the same approach as yours. If you had provided a working example, someone could have pointed out the issue.
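The walkthrough above keeps the rows that meet either condition. Since the question is about purging unwanted rows, the following sketch shows the same test negated to drop those rows instead. The names low_mpg, two_carb, kept and purged are made up for the example, and plain logical comparisons stand in for the %in% lookups.

```r
data(mtcars)

# Logical flags for the two conditions used in the walkthrough
low_mpg  <- mtcars$mpg < 15
two_carb <- mtcars$carb == 2

# Rows meeting either condition (same rows as cond.df above)
kept <- mtcars[low_mpg | two_carb, ]

# Negating the combined test removes those rows instead
purged <- mtcars[!(low_mpg | two_carb), ]

nrow(kept)
nrow(purged)
```

If the intent was to keep only rows that satisfy both conditions at once, & would replace | in the combined test; in mtcars no row does, because every car with mpg below 15 has carb equal to 4.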

Make a table showing the 10 largest values of a variable in R?

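This should do it: sort by Score in descending order, then keep the first 10 rows.

data <- data[with(data, order(-Score)), ]

data <- head(data, 10) # unlike data[1:10, ], this adds no NA rows when there are fewer than 10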
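To try this without the original data, here is a small self-contained sketch. The data frame scores and its values are invented for illustration; only the column name Score comes from the answer.

```r
# Hypothetical data: 15 rows with a numeric Score column
scores <- data.frame(
  id    = 1:15,
  Score = c(12, 45, 7, 88, 23, 67, 34, 90, 5, 56, 71, 19, 3, 40, 62)
)

# Sort by Score, largest first, then keep the first 10 rows
top10 <- head(scores[order(scores$Score, decreasing = TRUE), ], 10)
top10
```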


