Unlist data frame column preserving information from other column
Here, the idea is to first get the length of each list element using sapply and then use rep to replicate col1 by those lengths.
l1 <- sapply(myDataFrame$col2, length)
unlist.col1 <- rep(myDataFrame$col1, l1)
unlist.col1
#[1] "A" "A" "A" "A" "B" "B" "B" "C" "C" "C" "C" "C" "D" "D"
Or, as suggested by @Ananda Mahto, the same can be done with vapply:
with(myDataFrame, rep(col1, vapply(col2, length, 1L)))
#[1] "A" "A" "A" "A" "B" "B" "B" "C" "C" "C" "C" "C" "D" "D"
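As a minimal sketch of how the replicated labels line up with the flattened values, the example below builds a small data frame and combines rep with unlist. The contents of col2 are hypothetical (only the element lengths 4, 3, 5, 2 follow the output above), and lengths() is used as a base R shortcut for sapply(x, length).

```r
# Hypothetical data: col1 holds labels, col2 is a list-column of varying lengths
myDataFrame <- data.frame(col1 = c("A", "B", "C", "D"), stringsAsFactors = FALSE)
myDataFrame$col2 <- list(1:4, 5:7, 8:12, 13:14)

# lengths() returns the length of each list element as an integer vector
n <- lengths(myDataFrame$col2)

# Repeat each label once per list element and flatten the list-column alongside it
long <- data.frame(
  col1 = rep(myDataFrame$col1, n),
  col2 = unlist(myDataFrame$col2),
  stringsAsFactors = FALSE
)
```

Each row of long now pairs one unlisted value with the col1 label it came from.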
Unlist a data.frame column
To unlist list-columns, you can call unnest from the tidyr package.
unnest(dataframe, nameofcolumns)
Best,
Colin
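A small sketch of the unnest call on concrete data may help. The data frame and its column names (id, values) are hypothetical, and the cols argument assumes tidyr 1.0.0 or later.

```r
library(tidyr)

# Hypothetical data frame with a list-column named `values`
dataframe <- data.frame(id = c("a", "b"), stringsAsFactors = FALSE)
dataframe$values <- list(1:3, 4:5)

# Each list element is expanded into its own rows; `id` is repeated to match
long <- unnest(dataframe, cols = values)
```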
Is it possible to unlist() listed data.frame while keeping other columns from data.frame?
The issue is that V3 is numeric in df3 but character in df6. You can:
- Skip importing either df3$V3 or df6$V3
- Rename one of those variables
Also, to get rid of the warnings, you could create your data.frames with stringsAsFactors = FALSE, or you could use tibble() instead of data.frame(), as that's the default behavior of a tibble.
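One way to spot this kind of clash before combining the data frames is to tabulate the class of every column across the list. The sketch below uses only base R; the two small data frames are hypothetical stand-ins, and the helper names (col_classes, n_classes, conflicts) are made up for illustration.

```r
# Hypothetical stand-ins for the data frames discussed above
df3 <- data.frame(V1 = 1:2, V3 = c(10, 20))
df6 <- data.frame(V1 = 3:4, V3 = c("x", "y"), stringsAsFactors = FALSE)
my.list <- list(df3 = df3, df6 = df6)

# For every data frame, record the class of each column in one long table
col_classes <- do.call(rbind, lapply(names(my.list), function(nm) {
  d <- my.list[[nm]]
  data.frame(
    frame  = nm,
    column = names(d),
    class  = unname(vapply(d, function(col) class(col)[1], character(1))),
    stringsAsFactors = FALSE
  )
}))

# Columns that appear with more than one class cannot be combined as they are
n_classes <- tapply(col_classes$class, col_classes$column,
                    function(cl) length(unique(cl)))
conflicts <- names(n_classes)[n_classes > 1]
```

Here conflicts contains "V3", the column to skip or rename.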
Edit: for option 2, you can use the code below to add a prefix to each variable name based on its type.
library(dplyr)
# Prefix each column name with its type so same-named columns of different types no longer clash
my.list2 <- lapply(my.list, function(x)
{
x %>%
rename_if(is.numeric, ~paste0('num', .x)) %>%
rename_if(is.character, ~paste0('char', .x)) %>%
rename_if(is.factor, ~paste0('fact', .x))
}
)
This is option 2 and it works with only the factor warnings:
df1<-data.frame(V1=c(sample(900:970,6)),
V2=c(sample(LETTERS[1:6],6)))
df2<-data.frame(V1=sample(750:780,6),
V2=sample(LETTERS[8:16],6))
df3<-data.frame(V1=sample(200:250,6),
V2=sample(LETTERS[10:20],6),
V4=sample(2300:5821,6)) #used to be V3
df4<-data.frame(V1=sample(396:480,6),
V2=sample(LETTERS,6))
df5<-data.frame(V1=sample(50:100,6),
V2=sample(LETTERS,6))
df6<-data.frame(V1=sample(200:250,6),
V2=sample(LETTERS,6),
V3=sample(letters,6))
my.list <- list(df1,df2,df3,df4,df5,df6)
mydf<-data.frame(
files=c("C:/Folder1/Data/File1.xlsx","C:/Folder1/Data/File2.xlsx",
"C:/Folder1/Data/File3.xlsx","C:/Folder2/Data/File1.xlsx",
"C:/Folder2/Data/File2.xlsx","C:/Folder2/Data/File3.xlsx"))
mydf$data<-my.list
library(tidyr) # provides unnest()
unnest(mydf, data)
files V1 V2 V4 V3
1 C:/Folder1/Data/File1.xlsx 951 A NA <NA>
2 C:/Folder1/Data/File1.xlsx 932 F NA <NA>
3 C:/Folder1/Data/File1.xlsx 908 B NA <NA>
4 C:/Folder1/Data/File1.xlsx 953 C NA <NA>
5 C:/Folder1/Data/File1.xlsx 929 E NA <NA>
6 C:/Folder1/Data/File1.xlsx 928 D NA <NA>
7 C:/Folder1/Data/File2.xlsx 778 K NA <NA>
8 C:/Folder1/Data/File2.xlsx 771 H NA <NA>
9 C:/Folder1/Data/File2.xlsx 757 M NA <NA>
10 C:/Folder1/Data/File2.xlsx 773 P NA <NA>
11 C:/Folder1/Data/File2.xlsx 759 N NA <NA>
12 C:/Folder1/Data/File2.xlsx 765 O NA <NA>
13 C:/Folder1/Data/File3.xlsx 236 M 3964 <NA>
14 C:/Folder1/Data/File3.xlsx 214 O 5241 <NA>
...truncated
Unlist list of data frames to one data frame with change structure
library(tidyverse)
names(df) <- paste0('outcome', year)
df %>%
purrr::map_df(as.data.frame, .id = 'name') %>%
tidyr::pivot_wider(names_from = name, values_from = outcome) %>%
dplyr::arrange(id)
# A tibble: 20 x 12
id outcome1950 outcome1951 outcome1952 outcome1953 outcome1954 outcome1955 outcome1956 outcome1957 outcome1958 outcome1959 outcome1960
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 NA 0 NA 0 NA 0 NA NA 0 NA NA
2 2 1 NA 0 1 NA 1 NA 1 NA 0 NA
3 3 1 0 NA NA 1 NA NA 0 0 NA NA
4 4 0 0 NA NA NA 0 NA 1 0 1 NA
5 5 0 NA 0 NA NA NA 1 NA NA NA NA
6 6 0 NA 0 1 NA 1 1 NA 0 1 NA
7 7 NA 1 0 0 1 NA 1 0 NA NA 1
8 8 NA 0 0 0 1 NA NA 1 1 0 0
9 9 NA 0 0 NA 1 NA NA 1 NA 0 0
10 10 0 0 0 NA NA 0 1 NA NA 1 1
11 11 0 NA NA 1 NA NA 1 NA 1 0 NA
12 12 NA NA NA NA 0 1 NA NA 1 0 0
13 13 NA NA 0 NA 1 NA NA 1 0 1 0
14 14 0 1 NA NA 0 NA 0 NA NA 0 0
15 15 1 NA 1 0 NA 0 1 NA 1 NA NA
16 16 NA NA NA 0 0 0 NA NA 0 NA 0
17 17 NA NA NA NA NA NA NA 1 NA NA NA
18 18 NA NA NA NA NA NA 1 NA NA NA 1
19 19 1 0 NA 0 0 1 0 1 NA NA NA
20 20 NA 1 1 1 0 0 0 1 NA NA 0
Unlist data frame column and pasting them together
We can use stri_extract_all_regex from the stringi package to extract all the words that match the pattern.
library(stringi)
med_pattern <- c("NOVOMIX|MIXTARD|METFORMIN|ASPART")
df$MEDICATION2 <- stri_extract_all_regex(df$MEDICATION, pattern = med_pattern)
As mentioned by @mt1022, the new column is a list. We may paste them together with
df$MEDICATION2<-paste(stri_extract_all_regex(df$MEDICATION,pattern = med_pattern))
However, this will give some unwanted characters for list elements with more than one match. The following should give you the expected output.
chars <- stri_extract_all_regex(df$MEDICATION, pattern = med_pattern)
df$MEDICATION2 <- sapply(chars, paste, collapse = "-")
df$MEDICATION2
#[1] "NA" "NOVOMIX" "NOVOMIX" "NOVOMIX"
#[5] "MIXTARD" "MIXTARD" "MIXTARD" "MIXTARD"
#[9] "MIXTARD" "MIXTARD" "MIXTARD" "NOVOMIX"
#[13] "MIXTARD" "NA" "MIXTARD" "NOVOMIX"
#[17] "MIXTARD-NOVOMIX" "METFORMIN" "ASPART"
You can also do this in a single line:
df$MEDICATION2 <- sapply(stri_extract_all_regex(df$MEDICATION,
pattern = med_pattern), paste, collapse = "-")
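To see why paste on the list itself misbehaves while collapsing per element works, here is a base R sketch that skips the regex step. The chars list is a hypothetical extraction result, and flat_wrong and flat_right are illustrative names.

```r
# Hypothetical extraction result: one character vector per row
chars <- list(NA_character_, "NOVOMIX", c("MIXTARD", "NOVOMIX"))

# paste() on the list converts each element to a single string,
# so an element with several matches becomes a deparsed c(...) call
flat_wrong <- paste(chars)

# Collapsing each element separately gives one clean string per row
flat_right <- sapply(chars, paste, collapse = "-")
```

Note that a missing match becomes the string "NA" rather than a real NA, as in the output shown above.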
R - Unlist Data_frame column of lists in tidy manner
If you want to keep your data in "long" format, you can do:
example_data %>% unnest(observations)
ID location observations
1 1 A e
2 1 A x
3 1 A w
...
44 5 E u
45 5 E o
46 5 E z
To spread the data to "wide" format, as in your example, you can do:
library(stringr)
example_data %>% unnest(observations) %>%
group_by(location) %>%
mutate(counter=paste0("Obs_", str_pad(1:n(),2,"left","0"))) %>%
spread(counter, observations)
ID location Obs_01 Obs_02 Obs_03 Obs_04 Obs_05 Obs_06 Obs_07 Obs_08 Obs_09 Obs_10 Obs_11
* <int> <fctr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 A e x w c s j k t z <NA> <NA>
2 2 B k u d h z x <NA> <NA> <NA> <NA> <NA>
3 3 C v z m o s f n c r u b
4 4 D z i m s a v n r e t x
5 5 E f b g h a d u o z <NA> <NA>
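In current tidyr, spread is superseded by pivot_wider, so the same reshaping can be sketched as below. This assumes tidyr 1.0.0 or later and uses a small hypothetical stand-in for example_data; sprintf replaces str_pad for the zero-padded counter.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for example_data with a list-column of observations
example_data <- tibble(
  ID = 1:2,
  location = c("A", "B"),
  observations = list(c("e", "x", "w"), c("k", "u"))
)

wide <- example_data %>%
  unnest(observations) %>%
  group_by(location) %>%
  # Number the observations within each location: Obs_01, Obs_02, ...
  mutate(counter = sprintf("Obs_%02d", row_number())) %>%
  ungroup() %>%
  pivot_wider(names_from = counter, values_from = observations)
```

Locations with fewer observations are padded with NA in the trailing Obs_ columns.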