In R, Why Does Selecting Rows from a Data Frame Return Data as a Vector If The Data Frame Has Only One Column

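It's because the default argument to [ is drop=TRUE. For data frames, the default is to drop if only one column is left, but not if only one row is left.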

From ?"["

drop

For matrices and arrays. If TRUE the result is coerced to the
lowest possible dimension (see the examples). This only works for
extracting elements, not for the replacement. See drop for further
details.

> dat1 <- data.frame(x=letters[1:3])
> dat2 <- data.frame(x=letters[1:3], y=LETTERS[1:3])

The default behaviour (dat here appears to be the data from the question, not dat1 or dat2):

> dat[1, ]
row sessionId scenarionName stepName duration
[1,] 1 1001 A start 0

> dat[2, ]
row sessionId scenarionName stepName duration
[1,] 2 1001 A step1 2.2

Using drop=FALSE:

> dat1[1, , drop=FALSE]
  x
1 a

> dat2[1, , drop=FALSE]
  x y
1 a A
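
To make the contrast concrete, here is a minimal sketch that reuses dat1 and dat2 from above and checks the class of each result rather than relying on printed output. The variable names r1, r2 and r1_kept are only for illustration.

```r
dat1 <- data.frame(x = letters[1:3])
dat2 <- data.frame(x = letters[1:3], y = LETTERS[1:3])

r1 <- dat1[1, ]                     # only one column left: dropped to a vector
r2 <- dat2[1, ]                     # two columns left: stays a data frame
r1_kept <- dat1[1, , drop = FALSE]  # drop = FALSE keeps the data frame

is.data.frame(r1)       # FALSE
is.data.frame(r2)       # TRUE
is.data.frame(r1_kept)  # TRUE
```

Passing drop = FALSE explicitly is the safer habit in code that may receive a data frame with any number of columns.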

R: avoid turning one-row data frames into a vector when using apply functions

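You can solve your problem by using lapply instead of sapply, and then combine the result using do.call as follows:

# Clean every column except the first; lapply always returns a list
new_df <- as.data.frame(lapply(mydf[, -1, drop = FALSE], function(x) gsub("\\s+", "_", x)))
# Bind the cleaned columns together (the result is a matrix)
new_df <- do.call(cbind, new_df)
new_df
#      value1 value2
# [1,] "A_1"  "Z_1"

# Put the untouched first column back in front
new_df <- cbind(mydf[, 1, drop = FALSE], new_df)
new_df
#   ID value1 value2
# 1  A    A_1    Z_1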
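
A variant of the same idea, sketched under assumptions: mydf below is a hypothetical one-row stand-in for the question's data (the original is not shown here), and cols is an illustrative helper name. Assigning the lapply result back into the selected columns keeps the object a data frame throughout, so no cbind step is needed.

```r
# Hypothetical stand-in for the question's data (not the original)
mydf <- data.frame(ID = "A", value1 = "A 1", value2 = "Z 1",
                   stringsAsFactors = FALSE)

# Every column except the ID column
cols <- setdiff(names(mydf), "ID")

# lapply returns a list with one element per column; assigning it back
# with [<- replaces those columns and leaves the data frame class intact
new_df <- mydf
new_df[cols] <- lapply(mydf[cols], function(x) gsub("\\s+", "_", x))

is.data.frame(new_df)  # TRUE, even with a single row
```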

As for your question about the unpredictable behavior of sapply: the s in sapply stands for simplification, and the simplified result is never a data frame. It can be a vector, a matrix, or (when simplification is not possible) a list.

According to the documentation of sapply:

sapply is a user-friendly version and wrapper of lapply by default
returning a vector, matrix or, if simplify = "array", an array if
appropriate, by applying simplify2array().

On the simplify argument:

logical or character string; should the result be simplified
to a vector, matrix or higher dimensional array if possible? For
sapply it must be named and not abbreviated. The default value, TRUE,
returns a vector or matrix if appropriate, whereas if simplify =
"array" the result may be an array of “rank” (=length(dim(.))) one
higher than the result of FUN(X[[i]]).

The Details section explains this behavior, which looks similar to what you experienced:

Simplification in sapply is only attempted if X has length greater
than zero and if the return values from all elements of X are all of
the same (positive) length. If the common length is one the result is
a vector, and if greater than one is a matrix with a column
corresponding to each element of X.
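
The rule quoted above can be seen in a small sketch. The data frames one_row and two_rows and the helper clean are hypothetical, made up for illustration. The same call returns a different shape depending only on the number of rows.

```r
# Hypothetical helper and data, for illustration only
clean <- function(x) gsub("\\s+", "_", x)

one_row <- data.frame(value1 = "A 1", value2 = "Z 1",
                      stringsAsFactors = FALSE)
two_rows <- data.frame(value1 = c("A 1", "B 2"), value2 = c("Z 1", "Y 2"),
                       stringsAsFactors = FALSE)

res_one <- sapply(one_row, clean)   # common length 1: a named vector
res_two <- sapply(two_rows, clean)  # common length 2: a 2 x 2 matrix
res_list <- lapply(one_row, clean)  # lapply: always a list

dim(res_one)  # NULL
dim(res_two)  # 2 2
```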

Hadley Wickham also recommends against using sapply:

I recommend that you avoid sapply() because it tries to simplify the
result, so it can return a list, a vector, or a matrix. This makes it
difficult to program with, and it should be avoided in non-interactive
settings

He also recommends against using apply with a data frame. See Advanced R for further explanation.

How to get each column as data.frame (instead of a vector) from a data.frame?

Instead of selecting the desired column with a comma, i.e. data.frame[,i], use data.frame[i] to preserve the data.frame class and also retain the row names.

data.frame[,i] #As a vector
data.frame[i] #As a data.frame
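
As a runnable sketch of the two forms, assuming a small hypothetical data frame df and column index i (neither comes from the original question):

```r
# Hypothetical data, for illustration only
df <- data.frame(A = c(10, 20, 30), B = c("x", "y", "z"))
i <- 1

as_vector <- df[, i]                # matrix-style indexing: dropped to a vector
as_frame <- df[i]                   # list-style indexing: stays a data frame
as_frame2 <- df[, i, drop = FALSE]  # matrix-style indexing with drop disabled

is.data.frame(as_vector)  # FALSE
is.data.frame(as_frame)   # TRUE
is.data.frame(as_frame2)  # TRUE
```

The single-bracket form without a comma treats the data frame as a list of columns, which is why it never drops to a vector.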

How do I extract a single column from a data.frame as a data.frame?

Use drop=FALSE

> x <- df[,1, drop=FALSE]
> x
   A
1 10
2 20
3 30

From the documentation (see ?"[") you can find:

If drop=TRUE the result is coerced to the lowest possible dimension.

Subset dataframe rows based on character vector when %in% and which are not working

(Just adding my comment as an answer since it was posted before the other ones)

The problem is that in vec you have dots, whereas in df$Specimen.Label you have hyphens, so your first commands do not return anything. If you write instead

df[df$Specimen.Label %in% gsub("\\.", "-", vec),]

you obtain

#     PCC Participant.ID                    Specimen.Label
# 3  PNNL        01CO008    8cc7e656-0152-4359-8566-0581c3
# 6  PNNL        05CO002 f635496c-0046-4ecd-89bc-7a4f33_D2
# 8  PNNL        11CO051 b3696374-c6c0-49dd-833e-596e26_D2
# 10 PNNL        11CO053 e1cd3d70-132b-452f-ba10-026721_D2

Another base R option is to use the function subset

subset(df, Specimen.Label %in% gsub("\\.", "-", vec))
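
The mismatch can be reproduced with a small sketch. The data below is hypothetical (short made-up labels, not the question's specimen IDs), and vec_fixed and hits are illustrative names. Using fixed = TRUE is an alternative to escaping the dot, since it makes gsub treat the pattern as a literal string.

```r
# Hypothetical data, for illustration only
df <- data.frame(PCC = "PNNL",
                 Specimen.Label = c("aa-01", "bb-02", "cc-03"),
                 stringsAsFactors = FALSE)
vec <- c("aa.01", "cc.03")  # dots where the data frame has hyphens

any(df$Specimen.Label %in% vec)  # FALSE: nothing matches as-is

# Replace literal dots with hyphens before matching
vec_fixed <- gsub(".", "-", vec, fixed = TRUE)
hits <- df[df$Specimen.Label %in% vec_fixed, ]
```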

Getting a row from a data frame as a vector in R

Data frames created by importing data from an external source will have their character data converted to factors by default in R versions before 4.0.0. If you do not want this, set stringsAsFactors=FALSE.

In this case to extract a row or a column as a vector you need to do something like this:

as.numeric(as.vector(DF[1,]))

or like this

as.character(as.vector(DF[1,]))
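
Another option worth considering is unlist(), which flattens a one-row data frame into a named atomic vector. The sketch below assumes a hypothetical all-numeric DF. If the columns have mixed types, unlist() coerces them to a common type, so check the result in that case.

```r
# Hypothetical all-numeric data, for illustration only
DF <- data.frame(a = c(1, 2), b = c(4, 5))

# DF[1, ] is still a one-row data frame; unlist() turns it into a vector
row_vec <- unlist(DF[1, ])

is.numeric(row_vec)  # TRUE
names(row_vec)       # "a" "b"
```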

How do I delete rows from a data frame when the DF only has one column

The problem is that R is trying to be "helpful", and simplifying your data for you. The solution is to do the following (note two commas, not one):

df[-1,, drop = FALSE]

This will remove the specified row, and leave your data.frame otherwise untouched.
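
A short sketch of the difference, assuming a hypothetical one-column data frame df (the names dropped and kept are only for illustration):

```r
# Hypothetical one-column data frame
df <- data.frame(x = c(10, 20, 30))

dropped <- df[-1, ]              # simplified to a plain numeric vector
kept <- df[-1, , drop = FALSE]   # still a data frame with one column

is.data.frame(dropped)  # FALSE
is.data.frame(kept)     # TRUE
nrow(kept)              # 2
```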

Select rows from a data frame based on values in a vector

Have a look at ?"%in%".

dt[dt$fct %in% vc,]
   fct X
1    a 2
3    c 3
5    c 5
7    a 7
9    c 9
10   a 1
12   c 2
14   c 4

You could also use ?is.element:

dt[is.element(dt$fct, vc),]
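
For a self-contained version of both approaches, here is a sketch with hypothetical data (dt and vc below are made up and are not the data behind the output above):

```r
# Hypothetical data, for illustration only
dt <- data.frame(fct = c("a", "b", "c", "b", "c"), X = 1:5,
                 stringsAsFactors = FALSE)
vc <- c("a", "c")

by_in <- dt[dt$fct %in% vc, ]               # keep rows whose fct is in vc
by_element <- dt[is.element(dt$fct, vc), ]  # same test, function form

identical(by_in, by_element)  # TRUE
```

Both forms build a logical vector with one entry per row, so the trailing comma is needed to select rows rather than columns.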

