In R: Joining Vector Elements by Row, Converting Vector Rows to Strings

Yes, there is. It is called "apply" ;-)

apply(d,1,paste,collapse=" ")
[1] "Data 2" "Data 73"
# convert to matrix using as.matrix to get exactly your solution

See ?apply and ?paste
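
As a minimal sketch, assuming a small two-column data frame (the d below is a hypothetical stand-in, since the asker's data is not shown in full), the row-wise paste can be written with apply or with do.call. The do.call form skips the conversion to a matrix, which may matter because as.matrix can pad numeric columns to a common width before the values are pasted.

```r
# Hypothetical data; the original question's `d` is not shown in full
d <- data.frame(label = c("Data", "Data"), value = c("2", "73"),
                stringsAsFactors = FALSE)

# Row-wise paste: MARGIN = 1 walks over rows, collapse joins each row
by_apply <- apply(d, 1, paste, collapse = " ")

# Same idea without the intermediate matrix: paste the columns in parallel
by_docall <- do.call(paste, c(d, sep = " "))

# With a numeric column, paste() does the conversion itself
d_num <- data.frame(label = c("Data", "Data"), value = c(2, 73))
by_docall_num <- do.call(paste, c(d_num, sep = " "))
```

Both forms give one string per row; which one reads better is largely a matter of taste.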

Combine row values into character vector by condition

One option would be the tidyverse, where you can accomplish this a little more succinctly. The basic idea is the same:

library(tidyverse)

new.result <- df %>%
  group_by(col1) %>%
  summarize(
    col2 = ifelse(n() == 1, as.character(col2), paste(min(col2), max(col2), sep = '-'))
  )

  col1  col2
  <chr> <chr>
1 A     1995-1997
2 B     1999-2000
3 C     2005
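
For comparison, here is a dependency-free sketch of the same grouping idea in base R. The input df is an assumption, built to match the output shown above, and year_span is a name introduced only for this illustration.

```r
# Hypothetical input, chosen to match the output shown above
df <- data.frame(
  col1 = c("A", "A", "A", "B", "B", "C"),
  col2 = c(1995, 1996, 1997, 1999, 2000, 2005),
  stringsAsFactors = FALSE
)

# range() gives c(min, max); unique() drops the repeat when both are equal,
# so single-year groups come out as one value with no separator
year_span <- vapply(
  split(df$col2, df$col1),
  function(years) paste(unique(range(years)), collapse = "-"),
  character(1)
)

new.result <- data.frame(col1 = names(year_span), col2 = unname(year_span),
                         stringsAsFactors = FALSE)
```

One difference from the n() == 1 check: a group that holds the same year several times also collapses to a single year here.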

A different (but possibly overcomplicated) approach assumes that you have at most two years per grouping. We can pivot the start and end years into their own columns, and then paste them together directly. This requires a little more data transformation but avoids having to check explicitly for groups with 1 year:

df %>%
  group_by(col1) %>%
  mutate(n = row_number()) %>%
  pivot_wider(names_from = n, values_from = col2) %>%
  rowwise() %>%
  mutate(
    vec = list(c(`1`, `2`)),
    col2 = paste(vec[!is.na(vec)], collapse = '-')
  ) %>%
  select(col1, col2)

concatenate vector of strings into a single string - for each row in df

You are close to the right code. Just add collapse and work on rows by setting MARGIN = 1:

apply(data, 1, paste,collapse=" ")
[1] "abc fghi m" " j " "de kl "

From the documentation (?paste):

collapse: an optional character string to separate the results.

To integrate the output in your dataset:

data$pasted<-apply(data, 1, paste,collapse=" ")
> data
   x1   x2 x3     pasted
1 abc fghi  m abc fghi m
2        j           j
3  de   kl        de kl
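
Empty cells leave stray spaces in the pasted strings above. A possible refinement, sketched here with the input reconstructed from the output (an assumption) and a hypothetical name pasted_clean, is to drop empty cells before joining:

```r
# Assumed input, reconstructed from the output shown above
data <- data.frame(
  x1 = c("abc", "", "de"),
  x2 = c("fghi", "j", "kl"),
  x3 = c("m", "", ""),
  stringsAsFactors = FALSE
)

# Keep only the non-empty cells of each row, then join them with a space
pasted_clean <- apply(data, 1, function(row) {
  paste(row[nzchar(row)], collapse = " ")
})
```

Whether the padding spaces should be kept depends on how the pasted column is used later.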

Convert a row into a c() vector in R, then use the vectors to calculate cosine similarity

Another approach would be to use apply over each row, which allows you to set the environment directly:

apply(df, 1, function(x) assign(x[1], tail(x, -1), envir = globalenv()))

However, I agree with @danlooo's comment: I can't think of any reason that you would want to do this.

Edit: how to calculate cosine similarity matrix (following comment)

If you want to calculate a cosine similarity matrix it's better to start off with a matrix than to clutter up your global environment, and then have to do a potentially large combination of pairwise calculations.

First get the data into the right format, a numeric matrix with column names which are the first column of your data frame:

data_matrix <- tail(t(df), -1) |>
  sapply(as.numeric) |>
  matrix(
    nrow = ncol(df) - 1,
    ncol = nrow(df),
    dimnames = list(
      seq_len(ncol(df) - 1), # rows
      df[, 1] # columns
    )
  )

data_matrix
# i1 i10 i11
# 1 0.11 0.07 0.114
# 2 0.07 0.08 0.030

Then it is straightforward to calculate the cosine similarity:


library(lsa)
cosine(data_matrix)

# i1 i10 i11
# i1 1.0000000 0.9595950 0.9525148
# i10 0.9595950 1.0000000 0.8283488
# i11 0.9525148 0.8283488 1.0000000
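
If installing lsa is not an option, the same column-wise cosine similarity can be sketched in base R. The matrix m below retypes the values printed above, and the names m, norms and cos_sim are introduced only for this illustration.

```r
# Values copied from the data_matrix printed above; columns are the items
m <- matrix(
  c(0.11, 0.07,    # i1
    0.07, 0.08,    # i10
    0.114, 0.030), # i11
  nrow = 2,
  dimnames = list(NULL, c("i1", "i10", "i11"))
)

# Euclidean norm of every column
norms <- sqrt(colSums(m^2))

# crossprod(m) holds all pairwise dot products between columns;
# dividing by the outer product of the norms gives the cosines
cos_sim <- crossprod(m) / outer(norms, norms)
```

This treats columns as the vectors being compared, which is the same orientation cosine() uses for a matrix argument.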

Get a row in data.frame as a vector where each element is a string

Use unlist and then as.character

as.character(unlist(test[1, ]))
#[1] "no" "no" "no" "yes" "no" "no" "yes"

test[1, ] is still a data frame, and applying as.character to a data frame doesn't work. We use unlist to turn the data frame into a vector and then as.character to convert it to character.

data

test <- structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "no", class = "factor"), 
T = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "no", class = "factor"),
L = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "no", class = "factor"),
B = structure(c(2L, 1L, 1L, 2L, 1L, 1L), .Label = c("no",
"yes"), class = "factor"), E = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = "no", class = "factor"), X = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "no", class = "factor"), D = structure(c(2L,
1L, 1L, 2L, 1L, 1L), .Label = c("no", "yes"), class = "factor")),
class = "data.frame", row.names = c("4", "7", "11", "12", "17", "27"))
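
A smaller, self-contained sketch of the same conversion, using a hypothetical two-column data frame (test_small) instead of the full data above. The vapply variant is an alternative added here, not part of the original answer; it converts column by column and keeps the column names.

```r
# Hypothetical two-column data frame of factors
test_small <- data.frame(
  A = factor("no", levels = c("no", "yes")),
  B = factor("yes", levels = c("no", "yes"))
)

# unlist() merges the factor columns into one factor;
# as.character() then returns the labels
via_unlist <- as.character(unlist(test_small[1, ]))

# Alternative: convert each column separately, keeping the column names
via_vapply <- vapply(test_small[1, ], as.character, character(1))
```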

R: Convert vectors of arbitrary concatenated variable names and values to single data frame

We can do this with bind_rows easily

library(dplyr)
bind_rows(do.call(Map, c(f = setNames, lapply(unname(data)[2:1], strsplit, ","))))
# A tibble: 3 x 8
# a b c e j d f k
#* <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 212 12 sfd 3 1 <NA> <NA> <NA>
#2 23 <NA> <NA> <NA> <NA> fds g <NA>
#3 w w2 <NA> <NA> <NA> <NA> df sdf

Or it can be

bind_rows(do.call(Map, c(f = function(x, y)
  setNames(as.list(x), y), lapply(unname(data)[2:1], strsplit, ","))))

Or another option is unnest_wider from tidyr

library(tidyr)
library(purrr)
data %>%
  mutate_all(strsplit, ",") %>%
  transmute(new = map2(values, var_names, ~ set_names(as.list(.x), .y))) %>%
  unnest_wider(c(new))
# A tibble: 3 x 8
# a b c e j d f k
# <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 212 12 sfd 3 1 <NA> <NA> <NA>
#2 23 <NA> <NA> <NA> <NA> fds g <NA>
#3 w w2 <NA> <NA> <NA> <NA> df sdf

Or using rbindlist from data.table

library(data.table)
rbindlist(do.call(Map, c(f = function(x, y)
  setNames(as.list(x), y), lapply(unname(data)[2:1], strsplit, ","))),
  fill = TRUE)
# a b c e j d f k
#1: 212 12 sfd 3 1 <NA> <NA> <NA>
#2: 23 <NA> <NA> <NA> <NA> fds g <NA>
#3: w w2 <NA> <NA> <NA> <NA> df sdf
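
For readers who prefer to see the steps spelled out, here is a base R sketch of the same reshaping. The input data is an assumption, reconstructed from the output above with columns named var_names and values; keys, vals, all_names and result are names introduced for this illustration.

```r
# Assumed input, reconstructed from the output shown above
data <- data.frame(
  var_names = c("a,b,c,e,j", "a,d,f", "a,b,f,k"),
  values = c("212,12,sfd,3,1", "23,fds,g", "w,w2,df,sdf"),
  stringsAsFactors = FALSE
)

# Split both columns into per-row character vectors
keys <- strsplit(data$var_names, ",", fixed = TRUE)
vals <- strsplit(data$values, ",", fixed = TRUE)

# Every variable name, in order of first appearance
all_names <- unique(unlist(keys))

# Name each row's values, then index by all_names so that
# variables missing from a row become NA
rows <- Map(function(k, v) setNames(v, k)[all_names], keys, vals)

mat <- do.call(rbind, rows)
colnames(mat) <- all_names
result <- as.data.frame(mat, stringsAsFactors = FALSE)
```

This is more verbose than bind_rows or rbindlist, but it has no package dependencies.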

Concatenate a vector of strings/character

Try using an empty collapse argument within the paste function:

paste(sdata, collapse = '')

Thanks to http://twitter.com/onelinetips/status/7491806343
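
A short sketch of the difference between collapse and sep, with a hypothetical sdata:

```r
sdata <- c("a", "b", "c")  # hypothetical input

# collapse joins the elements of one vector into a single string
joined <- paste(sdata, collapse = "")

# sep only separates the arguments of paste(), so a single
# vector argument comes back with the same length
unchanged <- paste(sdata, sep = "")
```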

Convert a row of a data frame to vector

When you extract a single row from a data frame you get a one-row data frame. Convert it to a numeric vector:

as.numeric(df[1,])

As @Roland suggests, unlist(df[1,]) will convert the one-row data frame to a numeric vector without dropping the names. Therefore unname(unlist(df[1,])) is another, slightly more explicit way to get to the same result.

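As @Josh comments below, if your data frame is not completely numeric (alphabetic, factor, mixed ...), you need as.character(df[1,]) instead. Note that for factor columns this returns the underlying integer codes rather than the labels; as.character(unlist(df[1,])) returns the labels when every column is a factor.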
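
A minimal sketch of the numeric case, assuming a hypothetical all-numeric data frame:

```r
df <- data.frame(x = 1, y = 2.5, z = 4)  # hypothetical numeric data frame

# Plain numeric vector, names dropped
row_plain <- as.numeric(df[1, ])

# Named numeric vector, names taken from the columns
row_named <- unlist(df[1, ])
```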

Row-wise flatten_chr() or unlist() to convert string to vector

You can convert the text column into a character vector and then see if the code is within that vector. The benefit of this is that the discharge_codes are now available for other uses, if needed.

library(dplyr)
library(purrr)
library(stringr)

mre %>%
  mutate(discharge_codes = str_split(discharge_codes, "_"),
         match = map2_lgl(discharge_codes, follow_up_code, ~ .y %in% .x))

You can see that discharge_codes is now a list column with character vectors.

# A tibble: 3 x 4
  patient_id discharge_codes follow_up_code match
       <dbl> <list>          <chr>          <lgl>
1       1234 <chr [3]>       A              TRUE
2       4567 <chr [3]>       C              FALSE
3       7890 <chr [3]>       E              TRUE
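
The same row-wise membership test can be sketched in base R. The contents of mre below are an assumption: the discharge codes are invented so that the match column agrees with the output above.

```r
# Hypothetical data; the code strings are invented for illustration
mre <- data.frame(
  patient_id = c(1234, 4567, 7890),
  discharge_codes = c("A_B_C", "A_B_D", "C_D_E"),
  follow_up_code = c("A", "C", "E"),
  stringsAsFactors = FALSE
)

# One character vector of codes per row
code_list <- strsplit(mre$discharge_codes, "_", fixed = TRUE)

# Pair each follow-up code with its own row's vector of codes
mre$match <- mapply(function(code, codes) code %in% codes,
                    mre$follow_up_code, code_list, USE.NAMES = FALSE)
```

Here code_list stays outside the data frame; it could be stored as a list column with mre$discharge_codes <- code_list if the split codes are needed later.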

