Combining Vectors of Unequal Length into a Data Frame

Merge 2 vectors with different lengths into a data frame

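Perhaps this, which indexes both vectors up to the length of the longer one so that the shorter one is padded with NA: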

sq <- seq(max(length(n), length(s)))
data.frame(n[sq], s[sq])

#   n.sq. s.sq.
# 1     2    aa
# 2     3    bb
# 3     5    cc
# 4     6  <NA>
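The default column names n.sq. and s.sq. are awkward to work with. A minimal sketch of the same idea with explicit column names follows; the values of n and s are assumptions, chosen only to match the output above, and idx and padded are illustrative names.

```r
# Example inputs (assumed; chosen to match the output shown above)
n <- c(2, 3, 5, 6)
s <- c("aa", "bb", "cc")

# seq_len() also behaves sensibly if the longest vector has length 0
idx <- seq_len(max(length(n), length(s)))

# Naming the arguments gives the columns the names n and s
padded <- data.frame(n = n[idx], s = s[idx], stringsAsFactors = FALSE)
padded
```

Indexing past the end of s is what produces the NA in the last row.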

Combining vectors of unequal length into a data frame

I think that you may be approaching this the wrong way:

If you have time series of unequal length, then the best thing to do is to keep them as time series and merge them. Most time series packages allow this. You end up with a multivariate time series in which each value is properly associated with its date.

So put your time series into zoo objects, merge them, then use my qplot.zoo function to plot them. That will deal with switching from zoo into a long data frame.

Here's an example:

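> library(zoo)
> z1 <- zoo(1:8, 1:8)
> z2 <- zoo(2:8, 2:8)
> z3 <- zoo(4:8, 4:8)
> nm <- list("z1", "z2", "z3")
> z <- zoo()
> for (i in seq_along(nm)) z <- merge(z, get(nm[[i]]))
> names(z) <- unlist(nm)
> z
  z1 z2 z3
1  1 NA NA
2  2  2 NA
3  3  3 NA
4  4  4  4
5  5  5  5
6  6  6  6
7  7  7  7
8  8  8  8
>
> # Convert the merged series to a long data frame and plot it.
> # melt() is assumed here to come from reshape2.
> library(reshape2)
> library(ggplot2)
> x.df <- data.frame(dates = index(z), coredata(z))
> x.df <- melt(x.df, id.vars = "dates", variable.name = "val")
> # theme() replaces opts(), which is no longer available in ggplot2
> ggplot(na.omit(x.df), aes(x = dates, y = value, group = val, colour = val)) +
+   geom_line() + theme(legend.position = "none")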
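If zoo is not available, the same align-by-index idea can be sketched in base R by keeping each series in a data frame keyed by its time index and merging on that key. This is only an illustration, not the zoo approach itself; the names s1, s2, s3, wide and long are made up for the example.

```r
# Three series of unequal length, each keyed by its own time index
s1 <- data.frame(time = 1:8, z1 = 1:8)
s2 <- data.frame(time = 2:8, z2 = 2:8)
s3 <- data.frame(time = 4:8, z3 = 4:8)

# Full outer join on the time index; missing observations become NA
wide <- Reduce(function(a, b) merge(a, b, by = "time", all = TRUE),
               list(s1, s2, s3))

# Long format (one row per time/series pair), dropping the NA padding
long <- data.frame(time = rep(wide$time, times = 3),
                   stack(wide[c("z1", "z2", "z3")]))
long <- na.omit(long)
```

Merging on the key rather than padding by position keeps every value attached to the time at which it was observed.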

How to write two vectors of different length into one data frame by writing same values into same row?

One option is match(): build the union of all values, then look each value up in both vectors.

(tmp <- unique(c(ef1, ef2)))
# [1] "A1" "A2" "B0" "B1" "C1" "C2" "D1" "D2"

out <- data.frame(ef1 = ef1[match(tmp, ef1)],
                  ef2 = ef2[match(tmp, ef2)])

Result

out
#    ef1  ef2
# 1   A1   A1
# 2   A2   A2
# 3   B0 <NA>
# 4   B1 <NA>
# 5   C1   C1
# 6   C2   C2
# 7 <NA>   D1
# 8 <NA>   D2
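The match() idea extends to any number of vectors. The sketch below wraps it in a hypothetical helper, align_by_value, which is not part of any package; the inputs ef1 and ef2 are assumptions inferred from the output above. It also assumes each vector has no repeated values, since match() only finds the first occurrence.

```r
# Example inputs (assumed; inferred from the output shown above)
ef1 <- c("A1", "A2", "B0", "B1", "C1", "C2")
ef2 <- c("A1", "A2", "C1", "C2", "D1", "D2")

# Hypothetical helper: one row per distinct value, one column per input vector
align_by_value <- function(...) {
  vecs <- list(...)
  keys <- unique(unlist(vecs, use.names = FALSE))
  # match() returns NA where a key is absent, so v[NA] yields NA in that row
  aligned <- lapply(vecs, function(v) v[match(keys, v)])
  as.data.frame(aligned, stringsAsFactors = FALSE)
}

out <- align_by_value(ef1 = ef1, ef2 = ef2)
out
```

Rows follow the order in which values first appear across the inputs, which is why D1 and D2 come last.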

combine vectors of different length into data frame in R

There's a non-exported function charMat in my "splitstackshape" package that might be useful for something like this.

Here, I've used it in conjunction with mget:

## library(splitstackshape) # not required since you'll be using ::: anyway...
data.frame(t(splitstackshape:::charMat(mget(ls(pattern = "x\\d")), mode = "value")))
#   X1   X2   X3   X4
# a  a    a    a    a
# b  b    b    b    b
# c  c    c    c <NA>
# d  d <NA>    d <NA>
# e  e <NA> <NA>    e
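For readers who prefer not to call a non-exported function, a table of the same shape can be sketched in base R. This is not charMat itself, only an approximation of the output above; the vectors x1 to x4 are assumptions inferred from that output, and vecs, vals and res are illustrative names.

```r
# Example inputs (assumed; inferred from the output shown above)
x1 <- c("a", "b", "c", "d", "e")
x2 <- c("a", "b", "c")
x3 <- c("a", "b", "d")
x4 <- c("a", "b", "e")
vecs <- list(X1 = x1, X2 = x2, X3 = x3, X4 = x4)

# One row per distinct value; a cell holds the value if the vector contains it
vals <- sort(unique(unlist(vecs)))
res <- data.frame(
  lapply(vecs, function(v) ifelse(vals %in% v, vals, NA_character_)),
  row.names = vals,
  stringsAsFactors = FALSE
)
res
```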

Combining vectors of unequal length and non-unique values

I maintain that your problem might be solved in terms of the shortest common supersequence. It assumes that your two vectors each represent one sequence. Please give the code below a try.

If it still does not solve your problem, you'll have to explain exactly what you mean by "my vector contains not one but many sequences": define what you mean by a sequence and tell us how sequences can be identified by scanning through your two vectors.

Part I: given two sequences, find the longest common subsequence

LongestCommonSubsequence <- function(X, Y) {
  m <- length(X)
  n <- length(Y)
  # C[i + 1, j + 1] holds the LCS length of X[1:i] and Y[1:j]
  C <- matrix(0, 1 + m, 1 + n)
  for (i in seq_len(m)) {
    for (j in seq_len(n)) {
      if (X[i] == Y[j]) {
        C[i + 1, j + 1] <- C[i, j] + 1
      } else {
        C[i + 1, j + 1] <- max(C[i + 1, j], C[i, j + 1])
      }
    }
  }

  # Walk back from the bottom-right corner, collecting the matched positions
  backtrack <- function(C, X, Y, i, j) {
    if (i == 1 || j == 1) {
      return(data.frame(I = c(), J = c(), LCS = c()))
    } else if (X[i - 1] == Y[j - 1]) {
      return(rbind(backtrack(C, X, Y, i - 1, j - 1),
                   data.frame(LCS = X[i - 1], I = i - 1, J = j - 1)))
    } else if (C[i, j - 1] > C[i - 1, j]) {
      return(backtrack(C, X, Y, i, j - 1))
    } else {
      return(backtrack(C, X, Y, i - 1, j))
    }
  }

  return(backtrack(C, X, Y, m + 1, n + 1))
}

Part II: given two sequences, find the shortest common supersequence

ShortestCommonSupersequence <- function(X, Y) {
  LCS <- LongestCommonSubsequence(X, Y)[c("I", "J")]
  X.df <- data.frame(X = X, I = seq_along(X), stringsAsFactors = FALSE)
  Y.df <- data.frame(Y = Y, J = seq_along(Y), stringsAsFactors = FALSE)
  ALL <- merge(LCS, X.df, by = "I", all = TRUE)
  ALL <- merge(ALL, Y.df, by = "J", all = TRUE)
  ALL <- ALL[order(pmax(ifelse(is.na(ALL$I), 0, ALL$I),
                        ifelse(is.na(ALL$J), 0, ALL$J))), ]
  ALL$SCS <- ifelse(is.na(ALL$X), ALL$Y, ALL$X)
  ALL
}

Your Example:

ShortestCommonSupersequence(X = c("a","g","b","h","a","g","c"),
                            Y = c("a","g","b","a","g","b","h","c"))
#    J  I    X    Y SCS
# 1  1  1    a    a   a
# 2  2  2    g    g   g
# 3  3  3    b    b   b
# 9 NA  4    h <NA>   h
# 4  4  5    a    a   a
# 5  5  6    g    g   g
# 6  6 NA <NA>    b   b
# 7  7 NA <NA>    h   h
# 8  8  7    c    c   c

(where the two updated vectors are in columns X and Y.)
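One way to sanity-check a result like this is to confirm that both inputs really are subsequences of the SCS column. The sketch below uses a hypothetical helper, is_subsequence, which is not part of the answer's code; the SCS vector is copied from the output above.

```r
# Hypothetical helper: TRUE if `sub` can be obtained from `full`
# by deleting elements without reordering the rest
is_subsequence <- function(sub, full) {
  pos <- 1
  for (el in full) {
    if (pos <= length(sub) && el == sub[pos]) pos <- pos + 1
  }
  pos > length(sub)
}

X   <- c("a", "g", "b", "h", "a", "g", "c")
Y   <- c("a", "g", "b", "a", "g", "b", "h", "c")
SCS <- c("a", "g", "b", "h", "a", "g", "b", "h", "c")  # SCS column from above

x_ok <- is_subsequence(X, SCS)
y_ok <- is_subsequence(Y, SCS)

# Six rows in the output have both I and J set, so the LCS has length 6
# and the supersequence should have length |X| + |Y| - 6
len_ok <- length(SCS) == length(X) + length(Y) - 6
```

All three checks should come out TRUE for the example.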

How to convert a list consisting of vector of different lengths to a usable data frame in R?

Try this:

word.list <- list(letters[1:4], letters[1:5], letters[1:2], letters[1:6])
n.obs <- sapply(word.list, length)
seq.max <- seq_len(max(n.obs))
mat <- t(sapply(word.list, "[", i = seq.max))

The trick is that indexing past the end of a vector pads the result with NA, so

c(1:2)[1:4]

returns the vector followed by two NAs.
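The code above stops at a matrix. A short sketch of the remaining step to a data frame follows; the name df and the use of as.data.frame() are additions for illustration rather than part of the original answer.

```r
word.list <- list(letters[1:4], letters[1:5], letters[1:2], letters[1:6])
seq.max <- seq_len(max(lengths(word.list)))

# Out-of-range indices yield NA, so every list element is padded to full length
padded <- c(1:2)[1:4]

# One row per list element, one column per position
mat <- t(sapply(word.list, "[", i = seq.max))
df <- as.data.frame(mat, stringsAsFactors = FALSE)
df
```

Because the matrix has no column names, as.data.frame() labels the columns V1 to V6.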

The simplest way to convert a list with various length vectors to a data.frame in R

We can use

data.frame(lapply(aa, "length<-", max(lengths(aa))))

Or using tidyverse

library(dplyr)
library(tibble)
library(tidyr)
enframe(aa) %>%
  unnest(value)
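The two snippets produce different shapes: the first gives a wide data frame padded with NA, while the tidyverse version gives a long one with no padding. A base-R sketch of both shapes follows; the list aa is a made-up example (the original does not show its contents), and stack() stands in as a rough base-R analogue of the enframe/unnest step.

```r
# Hypothetical input: a named list of vectors with different lengths
aa <- list(A = c(1, 3, 4), B = c(3, 5, 7, 7, 8))

# Wide: `length<-` extends each vector to the maximum length, padding with NA
wide <- data.frame(lapply(aa, `length<-`, max(lengths(aa))))

# Long: one row per value, with the list name as a grouping column
long <- stack(aa)
```

The long form is usually easier to plot or summarise by group; the wide form is closer to what the question title asks for.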

