In R, why does selecting rows from a data frame return data as a vector if the data frame has only one column?
It's because the drop argument of [ defaults to TRUE whenever only one column is left in the result.
From ?"[":
drop
For matrices and arrays. If TRUE the result is coerced to the
lowest possible dimension (see the examples). This only works for
extracting elements, not for the replacement. See drop for further
details.
> dat1 <- data.frame(x=letters[1:3])
> dat2 <- data.frame(x=letters[1:3], y=LETTERS[1:3])
The default behaviour, shown here on a separate object dat whose definition is not included:
> dat[1, ]
row sessionId scenarionName stepName duration
[1,] 1 1001 A start 0
> dat[2, ]
row sessionId scenarionName stepName duration
[1,] 2 1001 A step1 2.2
Using drop=FALSE:
> dat1[1, , drop=FALSE]
x
1 a
> dat2[1, , drop=FALSE]
x y
1 a A
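As a minimal sketch of the difference, the snippet below applies both forms to the dat1 and dat2 defined above. stringsAsFactors is set explicitly (an addition not in the original) so that the result does not depend on the R version; the names r1, r2 and r3 are only illustrative.

```r
# Same toy data frames as above, with stringsAsFactors fixed explicitly
dat1 <- data.frame(x = letters[1:3], stringsAsFactors = FALSE)
dat2 <- data.frame(x = letters[1:3], y = LETTERS[1:3], stringsAsFactors = FALSE)

r1 <- dat1[1, ]                # only one column left: dropped to a vector
r2 <- dat2[1, ]                # two columns left: stays a data frame
r3 <- dat1[1, , drop = FALSE]  # dropping disabled: stays a data frame

class(r1)  # "character"
class(r2)  # "data.frame"
class(r3)  # "data.frame"
```

Only the one-column case changes class, which is why the problem tends to appear only after a data frame has been reduced to a single column.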
R: avoid turning one-row data frames into a vector when using apply functions
You can solve your problem by using lapply instead of sapply, and then combining the result using do.call as follows:
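# Clean every column except the first; drop=FALSE keeps a data frame even if only one column remains
new_df <- as.data.frame(lapply(mydf[, -1, drop=FALSE], function(x) gsub("\\s+", "_", x)))
# Binding the columns together yields a character matrix
new_df <- do.call(cbind, new_df)
new_df
# value1 value2
#[1,] "A_1" "Z_1"
# Put the ID column back; the result is a data frame again
new_df <- cbind(mydf[, 1, drop=FALSE], new_df)
new_df
# ID value1 value2
#1 A A_1 Z_1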
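A shorter variant of the same idea, offered only as a sketch: assign the list returned by lapply back into the selected columns, so the object never stops being a data frame. The one-row mydf below is hypothetical (the asker's data is not shown here), and cleaned is an illustrative name.

```r
# Hypothetical one-row input standing in for the asker's mydf
mydf <- data.frame(ID = "A", value1 = "A 1", value2 = "Z 1",
                   stringsAsFactors = FALSE)

# lapply() always returns a list, one element per column;
# assigning it into cleaned[-1] replaces those columns in place
cleaned <- mydf
cleaned[-1] <- lapply(mydf[-1], function(x) gsub("\\s+", "_", x))

cleaned
```

Because nothing is simplified along the way, the same code should behave identically for one row or many.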
As for your question about the unpredictable behavior of sapply: the s in sapply stands for simplification, but the simplified result is never a data frame. It can be a list, a vector, or a matrix, depending on the lengths of the individual results.
According to the documentation of sapply:
sapply is a user-friendly version and wrapper of lapply by default
returning a vector, matrix or, if simplify = "array", an array if
appropriate, by applying simplify2array().
On the simplify argument:
logical or character string; should the result be simplified
to a vector, matrix or higher dimensional array if possible? For
sapply it must be named and not abbreviated. The default value, TRUE,
returns a vector or matrix if appropriate, whereas if simplify =
"array" the result may be an array of “rank” (=length(dim(.))) one
higher than the result of FUN(X[[i]]).
The Details section explains this behavior, which looks similar to what you experienced:
Simplification in sapply is only attempted if X has length greater
than zero and if the return values from all elements of X are all of
the same (positive) length. If the common length is one the result is
a vector, and if greater than one is a matrix with a column
corresponding to each element of X.
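The rule quoted above can be seen on a small list. This is a minimal sketch with made-up data; x, s1, s2, s3 and l1 are illustrative names.

```r
x <- list(a = 1:3, b = 4:6)

s1 <- sapply(x, sum)                    # every result has length 1: named vector
s2 <- sapply(x, range)                  # every result has length 2: 2 x 2 matrix
s3 <- sapply(x, function(v) v[v > 2])   # lengths differ (1 and 3): plain list
l1 <- lapply(x, sum)                    # lapply: always a list

is.null(dim(s1))  # TRUE
dim(s2)           # 2 2
is.list(s3)       # TRUE
is.list(l1)       # TRUE
```

The same function call therefore yields three different shapes depending only on the data, which is the unpredictability described in the answer.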
Hadley Wickham also recommends not using sapply:
I recommend that you avoid sapply() because it tries to simplify the
result, so it can return a list, a vector, or a matrix. This makes it
difficult to program with, and it should be avoided in non-interactive
settings
He also recommends not using apply with a data frame. See Advanced R for further explanation.
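One base R alternative, not mentioned in the answer itself, is vapply, which takes a template for each result (FUN.VALUE) and fails instead of silently changing shape. The sketch below uses made-up data and illustrative names.

```r
x <- list(a = 1:3, b = 4:6)

# Each result must be a single double, otherwise vapply() raises an error
v <- vapply(x, mean, FUN.VALUE = numeric(1))

# A function returning two values does not match the template
bad <- tryCatch(vapply(x, range, FUN.VALUE = numeric(1)),
                error = function(e) "error")

# On empty input vapply() still returns the declared type,
# whereas sapply() falls back to an empty list
empty_v <- vapply(list(), mean, FUN.VALUE = numeric(1))
empty_s <- sapply(list(), mean)
```

The extra argument is the price for a return type that is known before the code runs.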
How to get each column as data.frame (instead of a vector) from a data.frame?
Instead of selecting the desired column with a comma, i.e. data.frame[,i], use data.frame[i] to preserve the data.frame class and retain the row names.
data.frame[,i] #As a vector
data.frame[i] #As a data.frame
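A small sketch of the two forms side by side, on a made-up data frame (df, i, as_vector and as_frame are illustrative names):

```r
df <- data.frame(A = c(10, 20, 30), B = c("x", "y", "z"),
                 stringsAsFactors = FALSE)
i <- 1

as_vector <- df[, i]  # matrix-style indexing: single column is dropped to a vector
as_frame  <- df[i]    # list-style indexing: always a data frame

class(as_vector)      # "numeric"
class(as_frame)       # "data.frame"
rownames(as_frame)    # "1" "2" "3"
```

List-style indexing has no drop argument, so there is nothing to remember to switch off.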
How do I extract a single column from a data.frame as a data.frame?
Use drop=FALSE:
> x <- df[,1, drop=FALSE]
> x
A
1 10
2 20
3 30
From the documentation (see ?"[") you can find:
If drop=TRUE the result is coerced to the lowest possible dimension.
Subset dataframe rows based on character vector when %in% and which are not working
(Just adding my comment as an answer since it was posted before the other ones)
The problem is that in vec you have dots, whereas in df$Specimen.Label you have hyphens, so your first commands do not return anything. If you write instead
df[df$Specimen.Label %in% gsub("\\.", "-", vec),]
you obtain
# PCC Participant.ID Specimen.Label
# 3 PNNL 01CO008 8cc7e656-0152-4359-8566-0581c3
# 6 PNNL 05CO002 f635496c-0046-4ecd-89bc-7a4f33_D2
# 8 PNNL 11CO051 b3696374-c6c0-49dd-833e-596e26_D2
# 10 PNNL 11CO053 e1cd3d70-132b-452f-ba10-026721_D2
Another base R option is to use the function subset:
subset(df, Specimen.Label %in% gsub("\\.", "-", vec))
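To make the mismatch concrete, here is a self-contained sketch with invented labels (the values below are hypothetical and much shorter than the real specimen labels):

```r
# Hypothetical stand-ins for the asker's df and vec
df <- data.frame(PCC = c("PNNL", "PNNL", "PNNL"),
                 Specimen.Label = c("aa-11", "bb-22", "cc-33"),
                 stringsAsFactors = FALSE)
vec <- c("aa.11", "cc.33")   # dots instead of hyphens

# Without the conversion nothing matches
no_match <- df[df$Specimen.Label %in% vec, ]

# "\\." is a literal dot; an unescaped "." would match any character
hits <- df[df$Specimen.Label %in% gsub("\\.", "-", vec), ]

nrow(no_match)  # 0
nrow(hits)      # 2
```

gsub(".", "-", vec, fixed = TRUE) would be an equivalent way to replace the literal dots without a regular expression.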
Getting a row from a data frame as a vector in R
In R versions before 4.0.0, data frames created by importing data from an external source have their character columns converted to factors by default. If you do not want this, set stringsAsFactors=FALSE (the default since R 4.0.0).
In this case to extract a row or a column as a vector you need to do something like this:
as.numeric(as.vector(DF[1,]))
or like this
as.character(as.vector(DF[1,]))
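Another way to flatten a row, shown here only as a sketch on made-up data, is unlist, which keeps the column names. DF, mixed, row_vec and mixed_vec are illustrative names.

```r
DF <- data.frame(a = 1.5, b = 2, c = 3.25)

# A one-row data frame is a list of length-one columns;
# unlist() flattens it into a named vector
row_vec <- unlist(DF[1, ])

# With mixed column types, everything is coerced to a common type
mixed <- data.frame(n = 1, s = "a", stringsAsFactors = FALSE)
mixed_vec <- unlist(mixed[1, ])

row_vec    # named numeric vector: a = 1.5, b = 2, c = 3.25
mixed_vec  # named character vector: n = "1", s = "a"
```

The coercion in the second case is worth keeping in mind: a row can only become an atomic vector if all its values share one type.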
How do I delete rows from a data frame when the DF only has one column
The problem is that R is trying to be "helpful", and simplifying your data for you. The solution is to do the following (note two commas, not one):
df[-1,, drop = FALSE]
This will remove the specified row, and leave your data.frame otherwise untouched.
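A quick check of both forms on a made-up one-column data frame (df, dropped and kept are illustrative names):

```r
df <- data.frame(v = c(10, 20, 30))

dropped <- df[-1, ]                # simplified to a plain numeric vector
kept    <- df[-1, , drop = FALSE]  # still a one-column data frame

class(dropped)   # "numeric"
class(kept)      # "data.frame"
rownames(kept)   # "2" "3"
```

Note that the remaining rows keep their original row names rather than being renumbered.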
Select rows from a data frame based on values in a vector
Have a look at ?"%in%".
dt[dt$fct %in% vc,]
fct X
1 a 2
3 c 3
5 c 5
7 a 7
9 c 9
10 a 1
12 c 2
14 c 4
You could also use is.element (see ?is.element):
dt[is.element(dt$fct, vc),]
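The two forms select the same rows. A minimal sketch with invented data (the dt and vc below are hypothetical, not the asker's objects):

```r
# Hypothetical stand-ins for the asker's dt and vc
dt <- data.frame(fct = c("a", "b", "c", "d"), X = 1:4,
                 stringsAsFactors = FALSE)
vc <- c("a", "c")

by_in   <- dt[dt$fct %in% vc, ]         # logical vector, one entry per row
by_elem <- dt[is.element(dt$fct, vc), ] # same test written as a function call

identical(by_in, by_elem)  # TRUE
```

Both produce a logical vector with one entry per row of dt, so either can be used as a row index.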