Create a Data Frame of Unequal Lengths

Sorry, this isn't exactly what you asked, but I think there may be another way to get what you want.

First, if the vectors are different lengths, the data isn't really tabular, is it? How about just saving each vector to its own CSV file? You might also try a text format that can store multiple objects, such as JSON or XML.

If you feel the data really is tabular, you could pad the shorter vectors with NAs:

> x = 1:5
> y = 1:12
> max.len = max(length(x), length(y))
> x = c(x, rep(NA, max.len - length(x)))
> y = c(y, rep(NA, max.len - length(y)))
> x
[1] 1 2 3 4 5 NA NA NA NA NA NA NA
> y
[1] 1 2 3 4 5 6 7 8 9 10 11 12
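
For readers doing the same thing in Python, a rough pandas counterpart of this padding is sketched below. It assumes pandas is installed and reuses the values of x and y from the R example. Wrapping each sequence in a pd.Series lets the DataFrame constructor align the columns on their index and fill the missing positions with NaN, so no manual padding is needed.

```python
import pandas as pd

x = list(range(1, 6))    # same values as x = 1:5 in the R example
y = list(range(1, 13))   # same values as y = 1:12

# Each Series gets its own 0..n-1 index; the constructor takes the union
# of the indexes and fills the gaps with NaN.
df = pd.DataFrame({"x": pd.Series(x), "y": pd.Series(y)})
print(df)
```

Note that the padded column becomes floating point, because NaN cannot be stored in an integer column.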

If you absolutely must make a data.frame with unequal columns, you could subvert the check, at your own peril:

> x = 1:5
> y = 1:12
> df = list(x=x, y=y)
> attributes(df) = list(names = names(df),
row.names=1:max(length(x), length(y)), class='data.frame')
> df
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 <NA> 6
7 <NA> 7
[ reached getOption("max.print") -- omitted 5 rows ]]
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
corrupt data frame: columns will be truncated or padded with NAs

How do you make a data frame with unequal row lengths?

This is a better version of your function that does not remove any NA values from your data:

(However, the function will still fail on non-numeric values of x, and it returns NULL when scale and center are both FALSE. One might also ask why a scaling function needs a parameter to switch scaling on or off.)

MyScale <- function(x, scale, center) {
  meanofdata <- mean(x, na.rm = TRUE)
  stdofdata <- sd(x, na.rm = TRUE)

  if (scale == TRUE) {
    calcvec <- (x - meanofdata) / stdofdata
    return(calcvec)
  } else if (center == TRUE) {
    centervec <- x - meanofdata
    return(centervec)
  }
}

Create dataframe with columns of unequal length from other dataframes

You could also index each data frame past its last row, as in df[1:(nrow(df) + n), ], so that R fills the extra rows with NAs:

#dataframes of different rows
long <- data.frame(accepted = rnorm(15, 2000), cost = rnorm(15,5000))
long2 <- data.frame(accepted = rnorm(10, 2000), cost = rnorm(10,5000))
long3 <- data.frame(accepted = rnorm(12, 2000), cost = rnorm(12,5000))

#insert all dataframes in list to manipulate
myls <- list(long, long2, long3)

#maximum number of rows
max.rows <- max(nrow(long), nrow(long2), nrow(long3))

#insert the needed `NA`s to each dataframe
new_myls <- lapply(myls, function(x) { x[1:max.rows,] })

#create wanted dataframe
do.call(cbind, lapply(new_myls, `[`, "accepted"))

# accepted accepted accepted
#1 2001.581 1999.014 2001.810
#2 2000.071 2000.033 2000.588
#3 1999.931 2000.188 2000.833
#4 1998.467 1999.891 1997.645
#5 2000.682 2000.144 1999.639
#6 1999.693 1999.341 1998.959
#7 2000.222 1998.939 2002.271
#8 1999.104 1998.530 1997.600
#9 1998.435 2001.496 2001.129
#10 1998.160 2000.729 2001.602
#11 1999.267 NA 1999.733
#12 2000.048 NA 2001.431
#13 1999.504 NA NA
#14 2000.660 NA NA
#15 2000.160 NA NA

How to create a data frame from two lists of unequal length

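Use np.repeat to repeat each element of the shorter list until both columns have the same length (this assumes len(column1) is an exact multiple of len(column2)):

import numpy as np
import pandas as pd

df = pd.DataFrame({'table1': column1,
                   'table2': np.repeat(column2, len(column1) // len(column2))})
print(df)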

# Output:
table1 table2
0 30 bat
1 40 bat
2 50 bat
3 60 ball
4 90 ball
5 20 ball
6 30 tent
7 20 tent
8 30 tent
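
A self-contained sketch of the same idea follows. The contents of column1 and column2 are not given in the answer, so the values here are inferred from the printed output, and repeat_to_match is a hypothetical helper name. It adds a check that the longer list really is a whole multiple of the shorter one, since np.repeat would otherwise produce a column of the wrong length.

```python
import numpy as np
import pandas as pd

# Values inferred from the output shown above (an assumption; the original
# lists are not given in the answer).
column1 = [30, 40, 50, 60, 90, 20, 30, 20, 30]
column2 = ["bat", "ball", "tent"]


def repeat_to_match(short, long):
    """Repeat each element of `short` so the result is as long as `long`."""
    reps, remainder = divmod(len(long), len(short))
    if remainder:
        raise ValueError("length of the long list is not a multiple of the short one")
    return np.repeat(short, reps)


df = pd.DataFrame({"table1": column1, "table2": repeat_to_match(column2, column1)})
print(df)
```

If the lengths do not divide evenly, the helper raises instead of letting the DataFrame constructor fail with a less specific length-mismatch error.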

Create pandas dataframe from list of lists with unequal lengths and NaN values

Just wrap every element that is not already a list in a list:

pd.DataFrame([x if isinstance(x, list) else [x] for x in a])

To rename columns as given in the original example you could use:

pd.DataFrame([x if isinstance(x, list) else [x] for x in a]).rename(columns = lambda x: f"col_{x+1}")

Which gives:

   col_1  col_2  col_3  col_4  col_5
0    1.0    NaN    NaN    NaN    NaN
1    1.0    2.0    NaN    NaN    NaN
2    1.0    2.0    3.0    NaN    NaN
3    NaN    NaN    NaN    NaN    NaN
4    1.0    2.0    3.0    4.0    NaN
5    NaN    NaN    NaN    NaN    NaN
6    1.0    2.0    3.0    4.0    5.0
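
For a runnable version, the sketch below defines an input list a that would produce the table above. The original question's data is not shown here, so this a is an assumption reconstructed from the printed result: lists of growing length mixed with bare NaN scalars.

```python
import numpy as np
import pandas as pd

# Hypothetical input inferred from the printed result: lists of growing
# length mixed with bare NaN scalars.
a = [[1], [1, 2], [1, 2, 3], np.nan, [1, 2, 3, 4], np.nan, [1, 2, 3, 4, 5]]

# Wrap scalars so every row is a list; pandas pads short rows with NaN.
rows = [x if isinstance(x, list) else [x] for x in a]
df = pd.DataFrame(rows).rename(columns=lambda i: f"col_{i + 1}")
print(df)
```

The isinstance check matters because a bare scalar in the outer list has no length, so it must become a one-element row before pandas can pad it like the others.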

Create a pandas dataframe from nested lists of unequal lengths

The zip_longest function from itertools does this:

>>> import itertools, pandas
>>> pandas.DataFrame((_ for _ in itertools.zip_longest(*nest)), columns=['aa', 'bb', 'cc'])
aa bb cc
0 aa1 bb1 cc1
1 aa2 bb2 cc2
2 aa3 bb3 cc3
3 aa4 bb4 None
4 aa5 None None

If you have an older version of pandas, you may need to wrap zip_longest in a list constructor. On Python 2, the function is called izip_longest instead of zip_longest.
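
A self-contained sketch for Python 3 is shown below. The contents of nest are an assumption inferred from the printed result, and the fillvalue argument of zip_longest is used to show how the padding value can be changed from the default None.

```python
from itertools import zip_longest

import pandas as pd

# Hypothetical input inferred from the printed result above.
nest = [
    ["aa1", "aa2", "aa3", "aa4", "aa5"],
    ["bb1", "bb2", "bb3", "bb4"],
    ["cc1", "cc2", "cc3"],
]

# zip_longest transposes the nested lists into rows and pads the short
# ones; list() materialises the iterator before pandas sees it.
rows = list(zip_longest(*nest, fillvalue="missing"))
df = pd.DataFrame(rows, columns=["aa", "bb", "cc"])
print(df)
```

Leaving out fillvalue reproduces the None padding shown in the answer.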

Combining vectors of unequal length into a data frame

I think that you may be approaching this the wrong way:

If you have time series of unequal length, then the best thing to do is to keep them as time series and merge them. Most time series packages allow this. You end up with a multivariate time series in which each value is properly associated with its date.

So put your time series into zoo objects, merge them, and then use my qplot.zoo function to plot them. That function handles the conversion from zoo into a long data frame.

Here's an example:

> z1 <- zoo(1:8, 1:8)
> z2 <- zoo(2:8, 2:8)
> z3 <- zoo(4:8, 4:8)
> nm <- list("z1", "z2", "z3")
> z <- zoo()
> for(i in 1:length(nm)) z <- merge(z, get(nm[[i]]))
> names(z) <- unlist(nm)
> z
z1 z2 z3
1 1 NA NA
2 2 2 NA
3 3 3 NA
4 4 4 4
5 5 5 5
6 6 6 6
7 7 7 7
8 8 8 8
>
> # convert the merged zoo object z to a long data frame and plot it
> x.df <- data.frame(dates=index(z), coredata(z))
> x.df <- melt(x.df, id="dates", variable="val")
> ggplot(na.omit(x.df), aes(x=dates, y=value, group=val, colour=val)) + geom_line() + theme(legend.position = "none")
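
The same merge-then-reshape idea can be sketched in pandas for comparison. This is not the zoo approach itself but an analogue under the assumption that each series carries its own index: pd.concat with axis=1 does an outer join on that index, and melt produces the long data frame. The variable names mirror the R example, and plotting is left out.

```python
import pandas as pd

# Same toy series as z1, z2, z3 above: the index plays the role of the date.
z1 = pd.Series(range(1, 9), index=range(1, 9))
z2 = pd.Series(range(2, 9), index=range(2, 9))
z3 = pd.Series(range(4, 9), index=range(4, 9))

# axis=1 performs an outer join on the index, so values stay attached to
# their own index label and gaps become NaN.
merged = pd.concat([z1, z2, z3], axis=1, keys=["z1", "z2", "z3"])

# Long format, one row per (date, series) pair, with the gaps dropped.
long_df = (
    merged.rename_axis("dates")
    .reset_index()
    .melt(id_vars="dates", var_name="val")
    .dropna()
)
print(merged)
```

Because alignment happens on the index rather than on position, the shorter series line up with the right dates without any manual padding.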

