In R: Joining vector elements by row, converting vector rows to strings
Yes, there is. It is called "apply" ;-)
apply(d,1,paste,collapse=" ")
[1] "Data 2" "Data 73"
# convert to matrix using as.matrix to get exactly your solution
See ?apply and ?paste.
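One caveat is worth a sketch: apply() first coerces a data frame to a matrix, and with mixed column types the numeric columns may be formatted to a common width, which can leave extra spaces in the result. The example below pastes the columns directly with do.call() and so avoids the matrix step. The data frame d and its column names are hypothetical stand-ins for the asker's data.

```r
# Hypothetical data: one character column and one numeric column
d <- data.frame(label = c("Data", "Data"), value = c(2, 73))

# paste() is vectorised over its arguments, so passing the columns as
# arguments joins each row without converting the data frame to a matrix
joined <- do.call(paste, c(d, sep = " "))
print(joined)
```

Here c(d, sep = " ") simply appends the separator to the list of columns before they are handed to paste().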
Combine row values into character vector by condition
One option would be the tidyverse, where you can accomplish this a little more succinctly. The basic idea is the same:
library(tidyverse)
new.result <- df %>%
  group_by(col1) %>%
  summarize(
    col2 = ifelse(n() == 1, as.character(col2), paste(min(col2), max(col2), sep = '-'))
  )
col1 col2
<chr> <chr>
1 A 1995-1997
2 B 1999-2000
3 C 2005
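The same grouping idea can be sketched in base R without the explicit n() == 1 check. This is only an illustration under assumptions: the input df is reconstructed from the output shown above, and year_span is a hypothetical helper name.

```r
# Hypothetical input, reconstructed from the output shown above
df <- data.frame(
  col1 = c("A", "A", "B", "B", "C"),
  col2 = c(1995, 1997, 1999, 2000, 2005),
  stringsAsFactors = FALSE
)

# range() returns c(min, max); unique() collapses it to a single value
# when a group has only one distinct year
year_span <- function(years) paste(unique(range(years)), collapse = "-")

# Apply the helper to each group of years, keyed by col1
spans <- vapply(split(df$col2, df$col1), year_span, character(1))
result <- data.frame(col1 = names(spans), col2 = unname(spans),
                     stringsAsFactors = FALSE)
print(result)
```

Note that a group whose years are all identical also collapses to a single value, which may or may not be what you want.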
A different (but possibly overcomplicated) approach assumes that you have at most two years per grouping. We can pivot the start and end years into their own columns, and then paste them together directly. This requires a little more data transformation but avoids having to check explicitly for groups with 1 year:
df %>%
  group_by(col1) %>%
  mutate(n = row_number()) %>%
  pivot_wider(names_from = n, values_from = col2) %>%
  rowwise() %>%
  mutate(
    vec = list(c(`1`, `2`)),
    col2 = paste(vec[!is.na(vec)], collapse = '-')
  ) %>%
  select(col1, col2)
concatenate vector of strings into a single string - for each row in df
You are close to the right code. Just add collapse and work on rows by setting MARGIN = 1:
apply(data, 1, paste,collapse=" ")
[1] "abc fghi m" " j " "de kl "
From the documentation (?paste):
collapse: an optional character string to separate the results.
To integrate the output in your dataset:
data$pasted<-apply(data, 1, paste,collapse=" ")
> data
x1 x2 x3 pasted
1 abc fghi m abc fghi m
2 j j
3 de kl de kl
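The output above keeps a separator for every empty cell, so rows with blanks end up with leading or trailing spaces. If that is unwanted, one possible variation is to drop the empty strings before pasting. The data frame below is reconstructed from the printed output and paste_non_empty is a hypothetical helper name.

```r
# Data reconstructed from the output shown above (empty cells are "")
data <- data.frame(
  x1 = c("abc", "", "de"),
  x2 = c("fghi", "j", "kl"),
  x3 = c("m", "", ""),
  stringsAsFactors = FALSE
)

# Keep only the non-empty cells of a row, then join them with a space
paste_non_empty <- function(row) paste(row[row != ""], collapse = " ")

pasted <- unname(apply(data, 1, paste_non_empty))
print(pasted)
```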
Convert a row into a combine, c() as a vector in r and then use vectors to calculate the cosine similarity
Another approach would be to use apply over each row, which allows you to set the environment directly:
apply(df, 1, function(x) assign(x[1], tail(x, -1), envir = globalenv()))
However I agree with @danlooo's comment: I can't think of any reason that you would want to do this.
Edit: how to calculate cosine similarity matrix (following comment)
If you want to calculate a cosine similarity matrix it's better to start off with a matrix than to clutter up your global environment, and then have to do a potentially large combination of pairwise calculations.
First get the data into the right format, a numeric matrix with column names which are the first column of your data frame:
data_matrix <- tail(t(df), -1) |>
  sapply(as.numeric) |>
  matrix(
    nrow = ncol(df) - 1,
    ncol = nrow(df),
    dimnames = list(
      seq_len(ncol(df) - 1), # rows
      df[, 1]                # columns
    )
  )
data_matrix
# i1 i10 i11
# 1 0.11 0.07 0.114
# 2 0.07 0.08 0.030
Then it is straightforward to calculate the cosine similarity:
library(lsa)
cosine(data_matrix)
# i1 i10 i11
# i1 1.0000000 0.9595950 0.9525148
# i10 0.9595950 1.0000000 0.8283488
# i11 0.9525148 0.8283488 1.0000000
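If you would rather not depend on lsa, the column-wise cosine similarity can also be sketched in base R. The matrix below is typed in by hand from the data_matrix printed above, so treat it as an illustration rather than part of the original answer.

```r
# Same values as the data_matrix printed above, one column per item
data_matrix <- matrix(
  c(0.11, 0.07,     # i1
    0.07, 0.08,     # i10
    0.114, 0.030),  # i11
  nrow = 2,
  dimnames = list(c("1", "2"), c("i1", "i10", "i11"))
)

# crossprod(M) is t(M) %*% M, i.e. all pairwise dot products of the columns
dot_products <- crossprod(data_matrix)

# The diagonal holds the squared length of each column
norms <- sqrt(diag(dot_products))

# Divide each dot product by the product of the two column lengths
cosine_sim <- dot_products / outer(norms, norms)
print(round(cosine_sim, 7))
```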
Get a row in data.frame as a vector where each element is a string
Use unlist and then as.character:
as.character(unlist(test[1, ]))
#[1] "no" "no" "no" "yes" "no" "no" "yes"
test[1, ] is still a data frame, and applying as.character directly to a data frame does not work as intended. We use unlist to turn the data frame row into a vector and then as.character to convert it to character.
data
test <- structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "no", class = "factor"),
T = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "no", class = "factor"),
L = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "no", class = "factor"),
B = structure(c(2L, 1L, 1L, 2L, 1L, 1L), .Label = c("no",
"yes"), class = "factor"), E = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = "no", class = "factor"), X = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "no", class = "factor"), D = structure(c(2L,
1L, 1L, 2L, 1L, 1L), .Label = c("no", "yes"), class = "factor")),
class = "data.frame", row.names = c("4", "7", "11", "12", "17", "27"))
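When a row mixes factor columns with other types, unlist may fall back to the factors' internal codes, so a per-column conversion can be safer. The sketch below uses a small made-up data frame (test_small is hypothetical, not the data above) and converts each cell of the first row separately with vapply.

```r
# Hypothetical data: two factor columns and one numeric column
test_small <- data.frame(
  A = factor(c("no", "no")),
  B = factor(c("yes", "no"), levels = c("no", "yes")),
  n = c(1.5, 2)
)

# Convert each cell of the first row on its own, so factor labels
# (not their integer codes) are used
first_row <- vapply(test_small[1, ], as.character, character(1))
print(unname(first_row))
```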
R: Convert vectors of arbitrary concatenated variable names and values to single data frame
We can do this easily with bind_rows:
library(dplyr)
bind_rows(do.call(Map, c(f = setNames, lapply(unname(data)[2:1], strsplit, ","))))
# A tibble: 3 x 8
# a b c e j d f k
#* <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 212 12 sfd 3 1 <NA> <NA> <NA>
#2 23 <NA> <NA> <NA> <NA> fds g <NA>
#3 w w2 <NA> <NA> <NA> <NA> df sdf
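To see what the one-liner is doing, the steps can be spelled out in base R. This is a sketch under assumptions: the input data frame is reconstructed from the printed result, and the column names var_names and values are taken from the tidyr variant of this answer.

```r
# Assumed input: comma-separated names and values, one pair per row
data <- data.frame(
  var_names = c("a,b,c,e,j", "a,d,f", "a,b,f,k"),
  values    = c("212,12,sfd,3,1", "23,fds,g", "w,w2,df,sdf"),
  stringsAsFactors = FALSE
)

# Step 1: split both columns on commas
keys <- strsplit(data$var_names, ",")
vals <- strsplit(data$values, ",")

# Step 2: name each vector of values by its matching variable names
rows <- Map(setNames, vals, keys)

# Step 3: align every row on the full set of names; missing ones become NA
all_names <- unique(unlist(keys))
aligned <- lapply(rows, function(r) unname(r[all_names]))

# Step 4: stack the aligned rows into a data frame
result <- as.data.frame(do.call(rbind, aligned), stringsAsFactors = FALSE)
names(result) <- all_names
print(result)
```

bind_rows and rbindlist(fill = TRUE) perform the alignment in steps 3 and 4 for you.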
Or, converting each split vector to a list first:
bind_rows(do.call(Map, c(f = function(x, y)
setNames(as.list(x), y), lapply(unname(data)[2:1], strsplit, ","))))
Or another option is unnest_wider from tidyr:
library(tidyr)
library(purrr)
data %>%
mutate_all(strsplit, ",") %>%
transmute(new = map2(values, var_names, ~ set_names(as.list(.x), .y))) %>%
unnest_wider(c(new))
# A tibble: 3 x 8
# a b c e j d f k
# <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 212 12 sfd 3 1 <NA> <NA> <NA>
#2 23 <NA> <NA> <NA> <NA> fds g <NA>
#3 w w2 <NA> <NA> <NA> <NA> df sdf
Or using rbindlist from data.table:
library(data.table)
rbindlist(do.call(Map, c(f = function(x, y)
setNames(as.list(x), y), lapply(unname(data)[2:1], strsplit, ","))),
fill = TRUE)
# a b c e j d f k
#1: 212 12 sfd 3 1 <NA> <NA> <NA>
#2: 23 <NA> <NA> <NA> <NA> fds g <NA>
#3: w w2 <NA> <NA> <NA> <NA> df sdf
Concatenate a vector of strings/character
Try using an empty collapse argument within the paste function:
paste(sdata, collapse = '')
Thanks to http://twitter.com/onelinetips/status/7491806343
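A small illustration of why collapse, rather than sep, is the argument that matters here. The contents of sdata are made up for the example.

```r
# Hypothetical input vector
sdata <- c("a", "b", "c")

# sep only separates the arguments passed to paste(); with a single
# vector argument the result is still one string per element
unchanged <- paste(sdata, sep = "")

# collapse joins the elements of the result into a single string
joined <- paste(sdata, collapse = "")
print(joined)
```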
Convert a row of a data frame to vector
When you extract a single row from a data frame you get a one-row data frame. Convert it to a numeric vector:
as.numeric(df[1,])
As @Roland suggests, unlist(df[1,]) will convert the one-row data frame to a numeric vector without dropping the names. Therefore unname(unlist(df[1,])) is another, slightly more explicit way to get to the same result.
As @Josh comments below, if you have a not-completely-numeric (alphabetic, factor, mixed ...) data frame, you need as.character(df[1,]) instead. Be aware that for factor columns this returns the underlying integer codes rather than the labels.
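For the all-numeric case, the difference between the two conversions can be sketched as follows. The data frame is a made-up example.

```r
# Hypothetical all-numeric data frame
df <- data.frame(a = c(1, 4), b = c(2.5, 5), c = c(3, 6))

# as.numeric() drops the column names
plain <- as.numeric(df[1, ])

# unlist() keeps the column names on the resulting vector
named <- unlist(df[1, ])

print(plain)
print(named)
```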
Row-wise flatten_chr() or unlist() to convert string to vector
You can convert the text column into a character vector and then see if the code is within that vector. The benefit of this is that the discharge_codes are now available for other uses, if needed.
library(dplyr)
library(purrr)
library(stringr)
mre %>%
  mutate(discharge_codes = str_split(discharge_codes, "_"),
         match = map2_lgl(discharge_codes, follow_up_code, ~ .y %in% .x))
You can see that discharge_codes is now a list column with character vectors.
# A tibble: 3 x 4
patient_id discharge_codes follow_up_code match
<dbl> <list> <chr> <lgl>
1 1234 <chr [3]> A TRUE
2 4567 <chr [3]> C FALSE
3 7890 <chr [3]> E TRUE
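The same row-wise membership test can be sketched in base R with strsplit and mapply. The discharge code strings below are invented to reproduce the match column shown above; only the column names come from the answer.

```r
# Hypothetical data; the code strings are made up for illustration
mre <- data.frame(
  patient_id      = c(1234, 4567, 7890),
  discharge_codes = c("A_B_C", "D_E_F", "E_F_G"),
  follow_up_code  = c("A", "C", "E"),
  stringsAsFactors = FALSE
)

# Split each string on "_" to get one character vector per patient
code_list <- strsplit(mre$discharge_codes, "_", fixed = TRUE)

# For each row, check whether the follow-up code is among that row's codes
mre$match <- mapply(function(code, codes) code %in% codes,
                    mre$follow_up_code, code_list, USE.NAMES = FALSE)
print(mre$match)
```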