Unlist Data Frame Column Preserving Information from Other Column

Unlist data frame column preserving information from other column

The idea is to get the length of each list element with sapply, then use rep to repeat each value of col1 that many times:

l1 <- sapply(myDataFrame$col2, length)
unlist.col1 <- rep(myDataFrame$col1, l1)
unlist.col1
#[1] "A" "A" "A" "A" "B" "B" "B" "C" "C" "C" "C" "C" "D" "D"

Or, as suggested by @Ananda Mahto, the same can be done with vapply, which also checks that every length comes back as a single integer:

with(myDataFrame, rep(col1, vapply(col2, length, 1L)))
#[1] "A" "A" "A" "A" "B" "B" "B" "C" "C" "C" "C" "C" "D" "D"
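
As a minimal end-to-end sketch, assuming a hypothetical myDataFrame whose col2 is a list column with 4, 3, 5 and 2 elements (the original data are not shown here), the replicated col1 can be paired with the unlisted col2 to build a long data frame. lengths() is a base R shortcut for sapply(x, length).

```r
# Hypothetical data: the original myDataFrame is not shown, so this is an assumption
myDataFrame <- data.frame(col1 = c("A", "B", "C", "D"), stringsAsFactors = FALSE)
myDataFrame$col2 <- list(1:4, 5:7, 8:12, 13:14)

# Number of elements in each list entry
n <- lengths(myDataFrame$col2)

# Repeat col1 once per element and flatten col2 alongside it
long <- data.frame(
  col1 = rep(myDataFrame$col1, n),
  col2 = unlist(myDataFrame$col2),
  stringsAsFactors = FALSE
)
long$col1
```

Because rep and unlist both walk the list in order, each value of col2 stays on the same row as the col1 value it came from.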

Unlist a data.frame column

To unlist a list-column, you can use unnest from the tidyr package, where dataframe and nameofcolumns stand for your own data frame and list-column:

unnest(dataframe, nameofcolumns)
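
A small sketch of that call, assuming tidyr 1.0.0 or later (where the list-column is named through the cols argument) and a made-up two-row data frame; the column names col1 and col2 are illustrative only.

```r
library(tidyr)

# Hypothetical data frame with a list-column (names are illustrative only)
dataframe <- tibble(
  col1 = c("A", "B"),
  col2 = list(1:2, 3:5)
)

# cols names the list-column to expand; col1 is repeated to match
long <- unnest(dataframe, cols = col2)
long
```

Each element of col2 becomes its own row, and the other columns are carried along automatically.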

Is it possible to unlist() listed data.frame while keeping other columns from data.frame?

The issue is that V3 is numeric in df3 but character in df6, so the two columns cannot be combined into one. You can either:

  1. Skip importing either df3$V3 or df6$V3
  2. Rename one of those variables

Also, to get rid of the warnings, you could create your data frames with stringsAsFactors = FALSE (the default since R 4.0.0), or use tibble() instead of data.frame(), since tibbles never convert strings to factors.
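
The type clash described above can be reproduced on a much smaller scale. This is only a sketch, assuming tidyr 1.0.0 or later and a hypothetical two-row nested tibble; the file names and values are invented for illustration.

```r
library(tidyr)

# Hypothetical nested data: V3 is integer in one element and character in the other
nested <- tibble(
  files = c("File1", "File2"),
  data = list(
    data.frame(V3 = 1:2),
    data.frame(V3 = c("x", "y"), stringsAsFactors = FALSE)
  )
)

# Capture the condition instead of stopping the script
result <- tryCatch(unnest(nested, cols = data), error = function(e) e)
inherits(result, "error")

# Renaming one of the clashing columns (option 2) lets the rows combine
names(nested$data[[1]]) <- "V4"
fixed <- unnest(nested, cols = data)
names(fixed)
```

After the rename, rows from the first element have NA in V3 and rows from the second have NA in V4.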

Edit: to make option 2 easier, you can use the code below to add a type prefix to each variable name.

library(dplyr)

# Prefix every column name with its type so that same-named columns
# of different types no longer clash when the list is unnested
my.list2 <- lapply(my.list, function(x) {
  x %>%
    rename_if(is.numeric, ~paste0('num', .x)) %>%
    rename_if(is.character, ~paste0('char', .x)) %>%
    rename_if(is.factor, ~paste0('fact', .x))
})
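
In dplyr 1.0.0 and later the scoped verbs such as rename_if are superseded, and the same prefixing can be written with rename_with. The sketch below assumes a hypothetical helper named add_type_prefix and a made-up data frame; neither comes from the original answer.

```r
library(dplyr)

# Hypothetical example data (not from the original question)
df_a <- data.frame(V1 = 1:3, V2 = c("a", "b", "c"), stringsAsFactors = FALSE)

# Hypothetical helper: prefix each column name with a tag for its type
add_type_prefix <- function(x) {
  x %>%
    rename_with(~ paste0("num", .x), where(is.numeric)) %>%
    rename_with(~ paste0("char", .x), where(is.character)) %>%
    rename_with(~ paste0("fact", .x), where(is.factor))
}

prefixed <- add_type_prefix(df_a)
names(prefixed)
```

The helper can then be applied to every element of a list with lapply(my.list, add_type_prefix).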

Here is option 2 applied by hand (V3 in df3 renamed to V4); it runs with only the factor warnings:

library(tidyr)

df1<-data.frame(V1=sample(900:970,6),
V2=sample(LETTERS[1:6],6))

df2<-data.frame(V1=sample(750:780,6),
V2=sample(LETTERS[8:16],6))

df3<-data.frame(V1=sample(200:250,6),
V2=sample(LETTERS[10:20],6),
V4=sample(2300:5821,6)) #used to be V3

df4<-data.frame(V1=sample(396:480,6),
V2=sample(LETTERS,6))

df5<-data.frame(V1=sample(50:100,6),
V2=sample(LETTERS,6))

df6<-data.frame(V1=sample(200:250,6),
V2=sample(LETTERS,6),
V3=sample(letters,6))

my.list <- list(df1,df2,df3,df4,df5,df6)

mydf<-data.frame(
files=c("C:/Folder1/Data/File1.xlsx","C:/Folder1/Data/File2.xlsx",
"C:/Folder1/Data/File3.xlsx","C:/Folder2/Data/File1.xlsx",
"C:/Folder2/Data/File2.xlsx","C:/Folder2/Data/File3.xlsx"))

mydf$data<-my.list

unnest(mydf, data)

files V1 V2 V4 V3
1 C:/Folder1/Data/File1.xlsx 951 A NA <NA>
2 C:/Folder1/Data/File1.xlsx 932 F NA <NA>
3 C:/Folder1/Data/File1.xlsx 908 B NA <NA>
4 C:/Folder1/Data/File1.xlsx 953 C NA <NA>
5 C:/Folder1/Data/File1.xlsx 929 E NA <NA>
6 C:/Folder1/Data/File1.xlsx 928 D NA <NA>
7 C:/Folder1/Data/File2.xlsx 778 K NA <NA>
8 C:/Folder1/Data/File2.xlsx 771 H NA <NA>
9 C:/Folder1/Data/File2.xlsx 757 M NA <NA>
10 C:/Folder1/Data/File2.xlsx 773 P NA <NA>
11 C:/Folder1/Data/File2.xlsx 759 N NA <NA>
12 C:/Folder1/Data/File2.xlsx 765 O NA <NA>
13 C:/Folder1/Data/File3.xlsx 236 M 3964 <NA>
14 C:/Folder1/Data/File3.xlsx 214 O 5241 <NA>
...truncated

Unlist list of data frames to one data frame with change structure

library(tidyverse)

names(df) <- paste0('outcome', year)

df %>%
purrr::map_df(as.data.frame, .id = 'name') %>%
tidyr::pivot_wider(names_from = name, values_from = outcome) %>%
dplyr::arrange(id)

# A tibble: 20 x 12
id outcome1950 outcome1951 outcome1952 outcome1953 outcome1954 outcome1955 outcome1956 outcome1957 outcome1958 outcome1959 outcome1960
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 NA 0 NA 0 NA 0 NA NA 0 NA NA
2 2 1 NA 0 1 NA 1 NA 1 NA 0 NA
3 3 1 0 NA NA 1 NA NA 0 0 NA NA
4 4 0 0 NA NA NA 0 NA 1 0 1 NA
5 5 0 NA 0 NA NA NA 1 NA NA NA NA
6 6 0 NA 0 1 NA 1 1 NA 0 1 NA
7 7 NA 1 0 0 1 NA 1 0 NA NA 1
8 8 NA 0 0 0 1 NA NA 1 1 0 0
9 9 NA 0 0 NA 1 NA NA 1 NA 0 0
10 10 0 0 0 NA NA 0 1 NA NA 1 1
11 11 0 NA NA 1 NA NA 1 NA 1 0 NA
12 12 NA NA NA NA 0 1 NA NA 1 0 0
13 13 NA NA 0 NA 1 NA NA 1 0 1 0
14 14 0 1 NA NA 0 NA 0 NA NA 0 0
15 15 1 NA 1 0 NA 0 1 NA 1 NA NA
16 16 NA NA NA 0 0 0 NA NA 0 NA 0
17 17 NA NA NA NA NA NA NA 1 NA NA NA
18 18 NA NA NA NA NA NA 1 NA NA NA 1
19 19 1 0 NA 0 0 1 0 1 NA NA NA
20 20 NA 1 1 1 0 0 0 1 NA NA 0

Unlist data frame column and pasting them together

We can use stri_extract_all_regex from the stringi package to extract all the words that match the pattern.

library(stringi)
med_pattern <- "NOVOMIX|MIXTARD|METFORMIN|ASPART"
df$MEDICATION2 <- stri_extract_all_regex(df$MEDICATION, pattern = med_pattern)

As mentioned by @mt1022, the new column is a list. We may paste its elements together with:

df$MEDICATION2<-paste(stri_extract_all_regex(df$MEDICATION,pattern = med_pattern)) 

However, this leaves unwanted characters in elements with more than one match, because each such element is deparsed into text like c("MIXTARD", "NOVOMIX"). Collapsing each element separately should give the expected output:

chars <- stri_extract_all_regex(df$MEDICATION, pattern = med_pattern)
df$MEDICATION2 <- sapply(chars, paste, collapse = "-")
df$MEDICATION2

#[1] "NA" "NOVOMIX" "NOVOMIX" "NOVOMIX"
#[5] "MIXTARD" "MIXTARD" "MIXTARD" "MIXTARD"
#[9] "MIXTARD" "MIXTARD" "MIXTARD" "NOVOMIX"
#[13] "MIXTARD" "NA" "MIXTARD" "NOVOMIX"
#[17] "MIXTARD-NOVOMIX" "METFORMIN" "ASPART"

You can also do this in a single line:

df$MEDICATION2 <- sapply(stri_extract_all_regex(df$MEDICATION, 
pattern = med_pattern), paste, collapse = "-")
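
To see the difference between the two paste calls side by side, here is a small sketch on invented medication strings (the original df is not shown, so the input vector is an assumption). It also converts the "NA" strings produced for rows without a match back into real missing values, which the original answer does not do.

```r
library(stringi)

# Hypothetical free-text entries; the original df$MEDICATION is not shown
medication <- c("NOVOMIX 30 FLEXPEN", "MIXTARD 30 and NOVOMIX", "PARACETAMOL")
med_pattern <- "NOVOMIX|MIXTARD|METFORMIN|ASPART"

matches <- stri_extract_all_regex(medication, pattern = med_pattern)

# paste() on the whole list deparses multi-element entries
naive <- paste(matches)

# Collapsing each element separately gives one clean string per row
collapsed <- vapply(matches, paste, character(1), collapse = "-")

# Rows without a match come back as the string "NA"; make them truly missing
collapsed[collapsed == "NA"] <- NA_character_
collapsed
```

vapply is used here instead of sapply only so that the result is guaranteed to be a character vector with one value per row.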

R - Unlist Data_frame column of lists in tidy manner

If you want to keep your data in "long" format, you can do:

example_data %>% unnest(observations)

   ID location observations
1   1        A            e
2   1        A            x
3   1        A            w
...
44  5        E            u
45  5        E            o
46  5        E            z

To spread the data to "wide" format, as in your example, you can do:

library(stringr)

example_data %>% unnest(observations) %>%
  group_by(location) %>%
  mutate(counter=paste0("Obs_", str_pad(1:n(),2,"left","0"))) %>%
  spread(counter, observations)

     ID location Obs_01 Obs_02 Obs_03 Obs_04 Obs_05 Obs_06 Obs_07 Obs_08 Obs_09 Obs_10 Obs_11
* <int>   <fctr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>
1     1        A      e      x      w      c      s      j      k      t      z   <NA>   <NA>
2     2        B      k      u      d      h      z      x   <NA>   <NA>   <NA>   <NA>   <NA>
3     3        C      v      z      m      o      s      f      n      c      r      u      b
4     4        D      z      i      m      s      a      v      n      r      e      t      x
5     5        E      f      b      g      h      a      d      u      o      z   <NA>   <NA>
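
In tidyr 1.0.0 and later, spread is superseded by pivot_wider. The following sketch repeats the long-to-wide step with pivot_wider on a made-up two-row example_data (the original data are not shown, so the values are assumptions) and uses sprintf in place of str_pad to build the zero-padded counter.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for example_data
example_data <- tibble(
  ID = 1:2,
  location = c("A", "B"),
  observations = list(c("e", "x", "w"), c("k", "u"))
)

wide <- example_data %>%
  unnest(cols = observations) %>%
  group_by(location) %>%
  # Number the observations within each location: Obs_01, Obs_02, ...
  mutate(counter = sprintf("Obs_%02d", row_number())) %>%
  ungroup() %>%
  pivot_wider(names_from = counter, values_from = observations)
wide
```

Locations with fewer observations are padded with NA in the trailing Obs columns, as in the output above.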
