Convert a row of a data frame to vector
When you extract a single row from a data frame you get a one-row data frame. Convert it to a numeric vector:
as.numeric(df[1,])
As @Roland suggests, unlist(df[1,]) will convert the one-row data frame to a numeric vector without dropping the names. Therefore unname(unlist(df[1,])) is another, slightly more explicit way to get to the same result.
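As @Josh comments below, if you have a not-completely-numeric (alphabetic, factor, mixed ...) data frame, you need as.character(df[1,]) instead. Be aware that for factor columns this returns the underlying integer codes as strings rather than the labels.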
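As a minimal sketch of the difference between these calls, assuming a small all-numeric data frame df invented here for illustration (its column names x and y are not from the original answer):

```r
# Hypothetical two-column numeric data frame, for illustration only
df <- data.frame(x = c(1.5, 2.5), y = c(10, 20))

v_plain   <- as.numeric(df[1, ])       # plain numeric vector, names dropped
v_named   <- unlist(df[1, ])           # numeric vector that keeps the column names
v_unnamed <- unname(unlist(df[1, ]))   # same values, names removed explicitly

names(v_named)
identical(v_plain, v_unnamed)
```

The named version can be handy when you still need to look up values by column name afterwards.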
Convert a dataframe to a vector (by rows)
You can try as.vector(t(test)). Please note that if you want to do it by columns, you should use unlist(test).
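To make the row-wise versus column-wise ordering concrete, here is a small sketch. The data frame test below is a made-up 2 x 2 example, not the one from the original question:

```r
# Hypothetical 2 x 2 integer data frame, for illustration only
test <- data.frame(a = 1:2, b = 3:4)

# Transposing first means the matrix is read out row by row
by_row <- as.vector(t(test))

# unlist walks the columns one after another
by_col <- unlist(test, use.names = FALSE)

by_row   # 1 3 2 4
by_col   # 1 2 3 4
```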
Getting a row from a data frame as a vector in R
Data frames created by importing data from an external source had their character columns converted to factors by default in R versions before 4.0.0. If you do not want this, set stringsAsFactors=FALSE.
In this case, to extract a row or a column as a vector you need to do something like this:
as.numeric(as.vector(DF[1,]))
or like this
as.character(as.vector(DF[1,]))
Convert data.frame column to a vector?
I'm going to attempt to explain this without making any mistakes, but I'm betting this will attract a clarification or two in the comments.
A data frame is a list. When you subset a data frame using the name of a column and [, what you're getting is a sublist (or a sub data frame). If you want the actual atomic column, you could use [[, or somewhat confusingly (to me) you could do aframe[,2], which returns a vector, not a sublist.
So try running this sequence and maybe things will be clearer:
avector <- as.vector(aframe['a2'])
class(avector)
avector <- aframe[['a2']]
class(avector)
avector <- aframe[,2]
class(avector)
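The sequence above needs an aframe to run against. The following is a self-contained sketch with a hypothetical aframe (columns a1 and a2 are assumptions made for this example), and it adds drop = FALSE to show how to keep the data frame shape when indexing with a comma:

```r
# Hypothetical data frame, for illustration only
aframe <- data.frame(a1 = c(1.5, 2.5), a2 = c("x", "y"),
                     stringsAsFactors = FALSE)

sub_frame <- aframe["a2"]               # single bracket: still a data frame
col_vec   <- aframe[["a2"]]             # double bracket: the atomic column
col_vec2  <- aframe[, 2]                # matrix-style indexing drops to a vector
col_df    <- aframe[, 2, drop = FALSE]  # drop = FALSE keeps a data frame

class(sub_frame)
class(col_vec)
class(col_df)
```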
Using R convert data.frame to simple vector
see ?unlist
Given a list structure x, unlist simplifies it to produce a vector
which contains all the atomic components which occur in x.
unlist(v.row)
[1] 177 165 177 177 177 177 145 132 126 132 132 132 126 120 145 167 167 167
167 165 177 177 177 177
EDIT
You can also do it with as.vector, but you need to provide the correct mode:
as.vector(v.row,mode='numeric')
[1] 177 165 177 177 177 177 145 132 126 132 132 132 126 120 145 167 167
167 167 165 177 177 177 177
Get a row in data.frame as a vector where each element is a string
Use unlist and then as.character:
as.character(unlist(test[1, ]))
#[1] "no" "no" "no" "yes" "no" "no" "yes"
test[1, ] is still a data frame, and applying as.character directly to a data frame does not give the labels: factor columns come back as their integer codes. We use unlist to turn the data frame into a vector and then as.character to convert it to character.
data
test <- structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "no", class = "factor"),
T = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "no", class = "factor"),
L = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "no", class = "factor"),
B = structure(c(2L, 1L, 1L, 2L, 1L, 1L), .Label = c("no",
"yes"), class = "factor"), E = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = "no", class = "factor"), X = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "no", class = "factor"), D = structure(c(2L,
1L, 1L, 2L, 1L, 1L), .Label = c("no", "yes"), class = "factor")),
class = "data.frame", row.names = c("4", "7", "11", "12", "17", "27"))
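A reduced sketch of the same idea, assuming a hypothetical one-row, all-factor data frame named row_df (smaller than the test data above, but with the same "no"/"yes" levels):

```r
# Hypothetical one-row data frame of factors, for illustration only
row_df <- data.frame(
  A = factor("no",  levels = c("no", "yes")),
  B = factor("yes", levels = c("no", "yes"))
)

# Applied directly, as.character works on the underlying list and
# tends to expose the integer codes instead of the labels
codes <- as.character(row_df[1, ])

# unlist first merges the factors into one factor, so the labels survive
labels <- as.character(unlist(row_df[1, ]))

codes
labels
```

This relies on every column being a factor. When factors are mixed with other column types, unlist treats the factors as integers, so convert the columns to character beforehand in that case.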
Convert a row into a combine, c() as a vector in r and then use vectors to calculate the cosine similarity
Another approach would be to use apply over each row, which allows you to set the environment directly:
apply(df, 1, function(x) assign(x[1], tail(x, -1), envir = globalenv()))
However I agree with @danlooo's comment: I can't think of any reason that you would want to do this.
Edit: how to calculate cosine similarity matrix (following comment)
If you want to calculate a cosine similarity matrix it's better to start off with a matrix than to clutter up your global environment, and then have to do a potentially large combination of pairwise calculations.
First get the data into the right format, a numeric matrix with column names which are the first column of your data frame:
data_matrix <- tail(t(df), -1) |>
    sapply(as.numeric) |>
    matrix(
        nrow = ncol(df) - 1,
        ncol = nrow(df),
        dimnames = list(
            seq_len(ncol(df) - 1), # rows
            df[, 1] # columns
        )
    )
data_matrix
#     i1  i10   i11
# 1 0.11 0.07 0.114
# 2 0.07 0.08 0.030
Then it is straightforward to calculate the cosine similarity:
library(lsa)
cosine(data_matrix)
#            i1       i10       i11
# i1  1.0000000 0.9595950 0.9525148
# i10 0.9595950 1.0000000 0.8283488
# i11 0.9525148 0.8283488 1.0000000
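If you would rather not depend on the lsa package, the same column-wise cosine similarity can be sketched in base R. The helper name cosine_sim is hypothetical, and the matrix m below simply retypes the values printed for data_matrix above:

```r
# Values copied from the data_matrix output shown above
m <- matrix(
  c(0.11, 0.07,     # column i1
    0.07, 0.08,     # column i10
    0.114, 0.030),  # column i11
  nrow = 2,
  dimnames = list(NULL, c("i1", "i10", "i11"))
)

# Hypothetical helper: cosine similarity between the columns of a numeric matrix
cosine_sim <- function(m) {
  cp <- crossprod(m)         # t(m) %*% m, all pairwise dot products
  norms <- sqrt(diag(cp))    # Euclidean norm of each column
  cp / outer(norms, norms)   # divide each dot product by the product of norms
}

sim <- cosine_sim(m)
round(sim, 7)
```

Working on the whole matrix at once avoids looping over pairs of vectors, which is the same reason the answer starts from a matrix rather than from separate variables.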
convert a row of a data frame to a simple vector in R
Example from mtcars
data
mydata<-mtcars
k<-mydata[1,]
          mpg cyl disp  hp drat   wt  qsec vs am gear carb
Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4
names(k)<-NULL
unlist(c(k))
[1] 21.00 6.00 160.00 110.00 3.90 2.62 16.46 0.00 1.00 4.00 4.00
Updated as per @Ananda: unlist(mydata[1, ], use.names = FALSE)
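As a quick check of what use.names changes, here is a short sketch using the built-in mtcars data set (the variable names with_names and no_names are chosen for this example):

```r
# mtcars ships with R, so no extra data is needed
with_names <- unlist(mtcars[1, ])                     # keeps column names
no_names   <- unlist(mtcars[1, ], use.names = FALSE)  # bare numeric vector

names(with_names)
no_names
```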
R: avoid turning one-row data frames into a vector when using apply functions
You can solve your problem by using lapply instead of sapply, and then combining the result using do.call as follows:
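# Replace runs of whitespace with "_" in every column except the first;
# stringsAsFactors = FALSE keeps the results as character on older R versions
new_df <- as.data.frame(lapply(mydf[, -1, drop = FALSE], function(x) gsub("\\s+", "_", x)),
                        stringsAsFactors = FALSE)
new_df <- do.call(cbind, new_df)
new_df
#     value1 value2
#[1,] "A_1"  "Z_1"
# Put the untouched ID column back in front
new_df <- cbind(mydf[, 1, drop = FALSE], new_df)
new_df
#  ID value1 value2
#1  A    A_1    Z_1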
As for your question about the unpredictable behavior of sapply, it is because the s in sapply stands for simplification, but the simplified result is not guaranteed to have the shape you expect. It can be a list, a vector, or a matrix.
According to the documentation of sapply:
sapply is a user-friendly version and wrapper of lapply by default
returning a vector, matrix or, if simplify = "array", an array if
appropriate, by applying simplify2array().
On the simplify argument:
logical or character string; should the result be simplified
to a vector, matrix or higher dimensional array if possible? For
sapply it must be named and not abbreviated. The default value, TRUE,
returns a vector or matrix if appropriate, whereas if simplify =
"array" the result may be an array of “rank” (=length(dim(.))) one
higher than the result of FUN(X[[i]]).
The Details section explains this behavior, which looks similar to what you experienced:
Simplification in sapply is only attempted if X has length greater
than zero and if the return values from all elements of X are all of
the same (positive) length. If the common length is one the result is
a vector, and if greater than one is a matrix with a column
corresponding to each element of X.
Hadley Wickham also recommends not using sapply:
I recommend that you avoid sapply() because it tries to simplify the
result, so it can return a list, a vector, or a matrix. This makes it
difficult to program with, and it should be avoided in non-interactive
settings
He also recommends not using apply with a data frame. See Advanced R for further explanation.
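The simplification rules quoted above can be seen in a small sketch. The inputs are invented for illustration, and vapply is shown only as one possible type-stable alternative, not as something the original answer prescribes:

```r
# Every result has length one: sapply returns a plain vector
one_each <- sapply(1:3, function(i) i * 2)

# Every result has length two: sapply returns a matrix, one column per input
two_each <- sapply(1:3, function(i) c(i, i * 2))

# Results have different lengths: no simplification, a list comes back
ragged <- sapply(1:3, function(i) seq_len(i))

# vapply states the expected shape up front and errors if a result deviates
typed <- vapply(1:3, function(i) i * 2, numeric(1))

class(one_each)
dim(two_each)
class(ragged)
```

The same function call can therefore return three different shapes depending only on the data, which is the behavior that makes sapply awkward inside functions.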