Create a Data Frame of Unequal Lengths
Sorry this isn't exactly what you asked, but I think there may be another way to get what you want.
First, if the vectors are different lengths, the data isn't really tabular, is it? How about just saving each vector to its own CSV file? You might also try text formats that allow storing multiple objects (JSON, XML).
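As a rough illustration of the "format that stores multiple objects" idea, here is a minimal Python sketch using the standard json module. The dictionary name vectors and its contents are assumptions chosen to mirror the x and y vectors used in this answer, not something from the original question.

```python
import json

# Hypothetical data: two vectors of different lengths.
vectors = {"x": list(range(1, 6)), "y": list(range(1, 13))}

# JSON stores each vector at its own length, so no padding is needed.
text = json.dumps(vectors)
restored = json.loads(text)

print(len(restored["x"]), len(restored["y"]))  # prints "5 12"
```

The round trip keeps both vectors intact, which is the point of picking a non-tabular format here.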
If you feel the data really is tabular, you could pad the shorter vectors with NAs:
> x = 1:5
> y = 1:12
> max.len = max(length(x), length(y))
> x = c(x, rep(NA, max.len - length(x)))
> y = c(y, rep(NA, max.len - length(y)))
> x
[1] 1 2 3 4 5 NA NA NA NA NA NA NA
> y
[1] 1 2 3 4 5 6 7 8 9 10 11 12
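The same padding idea can be sketched in plain Python, with None standing in for NA. The helper name pad_to_common_length is hypothetical, and this is only one possible way to write it.

```python
def pad_to_common_length(columns, fill=None):
    """Return a copy of `columns` with every list padded to the longest length."""
    max_len = max(len(values) for values in columns.values())
    return {
        name: list(values) + [fill] * (max_len - len(values))
        for name, values in columns.items()
    }


padded = pad_to_common_length({"x": list(range(1, 6)), "y": list(range(1, 13))})
print(padded["x"])  # [1, 2, 3, 4, 5, None, None, None, None, None, None, None]
```

Once every column has the same length, the result can be handed to any tabular constructor.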
If you absolutely must make a data.frame with unequal columns, you could subvert the check, at your own peril:
> x = 1:5
> y = 1:12
> df = list(x=x, y=y)
> attributes(df) = list(names = names(df),
row.names=1:max(length(x), length(y)), class='data.frame')
> df
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 <NA> 6
7 <NA> 7
[ reached getOption("max.print") -- omitted 5 rows ]]
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
corrupt data frame: columns will be truncated or padded with NAs
How do you make a data frame with unequal row lengths?
This is a better version of your function that does not remove any NA from your data:
(However, the function will still trip on non-numeric values for x, and it returns NULL when scale and center are both FALSE. One could also ask why a scale function needs a yes-or-no scale parameter at all.)
MyScale <- function(x, scale, center) {
  meanofdata <- mean(x, na.rm = TRUE)
  stdofdata <- sd(x, na.rm = TRUE)
  if (scale == TRUE) {
    calcvec <- (x - meanofdata) / stdofdata
    return(calcvec)
  } else if (center == TRUE) {
    centervec <- x - meanofdata
    return(centervec)
  }
}
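For comparison, here is a hedged Python sketch of the same logic that also covers the case where both flags are off. The name my_scale and the choice to treat None and NaN as missing are assumptions. As in the R function, scale=True also subtracts the mean.

```python
import math
import statistics


def my_scale(values, scale=True, center=True):
    """Center and/or scale `values`, skipping None/NaN when computing statistics."""

    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    present = [v for v in values if not is_missing(v)]
    mean = statistics.mean(present)
    # statistics.stdev is the sample standard deviation, like R's sd().
    std = statistics.stdev(present)

    def transform(v):
        if is_missing(v):
            return v  # missing entries stay in place
        if scale:
            return (v - mean) / std
        if center:
            return v - mean
        return v  # both flags off: value is returned unchanged

    return [transform(v) for v in values]


print(my_scale([1.0, 2.0, None, 3.0], scale=False, center=True))
# [-1.0, 0.0, None, 1.0]
```

Missing entries are kept in their original positions, so the output always has the same length as the input.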
Create dataframe with columns of unequal length from other dataframes
You could also "subset" each data frame past its last row, like df[nrow(df) + n, ], in order to insert NAs:
#dataframes of different rows
long <- data.frame(accepted = rnorm(15, 2000), cost = rnorm(15,5000))
long2 <- data.frame(accepted = rnorm(10, 2000), cost = rnorm(10,5000))
long3 <- data.frame(accepted = rnorm(12, 2000), cost = rnorm(12,5000))
#insert all dataframes in list to manipulate
myls <- list(long, long2, long3)
#maximum number of rows
max.rows <- max(nrow(long), nrow(long2), nrow(long3))
#insert the needed `NA`s to each dataframe
new_myls <- lapply(myls, function(x) { x[1:max.rows,] })
#create wanted dataframe
do.call(cbind, lapply(new_myls, `[`, "accepted"))
# accepted accepted accepted
#1 2001.581 1999.014 2001.810
#2 2000.071 2000.033 2000.588
#3 1999.931 2000.188 2000.833
#4 1998.467 1999.891 1997.645
#5 2000.682 2000.144 1999.639
#6 1999.693 1999.341 1998.959
#7 2000.222 1998.939 2002.271
#8 1999.104 1998.530 1997.600
#9 1998.435 2001.496 2001.129
#10 1998.160 2000.729 2001.602
#11 1999.267 NA 1999.733
#12 2000.048 NA 2001.431
#13 1999.504 NA NA
#14 2000.660 NA NA
#15 2000.160 NA NA
How to create a data frame using two lists of unequal length
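Use np.repeat like this. It repeats each element of column2 the same number of times, so the columns only line up when len(column1) is an exact multiple of len(column2):
import numpy as np
import pandas as pd
df = pd.DataFrame({'table1': column1,
'table2': np.repeat(column2, len(column1) // len(column2))})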
print(df)
# Output:
table1 table2
0 30 bat
1 40 bat
2 50 bat
3 60 ball
4 90 ball
5 20 ball
6 30 tent
7 20 tent
8 30 tent
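To make the length requirement explicit, here is a small pure-Python sketch of the element-wise repetition that np.repeat performs with an integer count. The helper repeat_each is hypothetical, and the column1 and column2 values are read off the output shown above rather than taken from the original question.

```python
def repeat_each(values, target_len):
    """Repeat each element equally often so the result has `target_len` items."""
    if not values or target_len % len(values) != 0:
        raise ValueError(
            "values must be non-empty and target_len a multiple of len(values)"
        )
    times = target_len // len(values)
    return [v for v in values for _ in range(times)]


# Values inferred from the printed output above (an assumption).
column1 = [30, 40, 50, 60, 90, 20, 30, 20, 30]
column2 = ["bat", "ball", "tent"]

print(repeat_each(column2, len(column1)))
# ['bat', 'bat', 'bat', 'ball', 'ball', 'ball', 'tent', 'tent', 'tent']
```

If the longer list had, say, ten items, integer division would leave the repeated column one item short, so the sketch raises an error instead of silently producing mismatched lengths.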
Create pandas dataframe from list of lists with unequal lengths and NaN values
Just wrap every element that is not already a list in a list:
pd.DataFrame([x if isinstance(x, list) else [x] for x in a])
To rename columns as given in the original example you could use:
pd.DataFrame([x if isinstance(x, list) else [x] for x in a]).rename(columns = lambda x: f"col_{x+1}")
Which gives:
col_1 col_2 col_3 col_4 col_5
0 1.0 NaN NaN NaN NaN
1 1.0 2.0 NaN NaN NaN
2 1.0 2.0 3.0 NaN NaN
3 NaN NaN NaN NaN NaN
4 1.0 2.0 3.0 4.0 NaN
5 NaN NaN NaN NaN NaN
6 1.0 2.0 3.0 4.0 5.0
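For readers who want to see what the wrapping and padding amount to without pandas, here is a minimal sketch in plain Python. The function to_rows and the sample input a are hypothetical; None plays the role of NaN.

```python
def to_rows(items):
    """Wrap scalars in a list, pad rows to the widest row, and name the columns."""
    rows = [x if isinstance(x, list) else [x] for x in items]
    width = max(len(row) for row in rows)
    return [
        {f"col_{i + 1}": (row[i] if i < len(row) else None) for i in range(width)}
        for row in rows
    ]


a = [[1], [1, 2], None, [1, 2, 3]]  # hypothetical input with one scalar entry
table = to_rows(a)
print(table[1])  # {'col_1': 1, 'col_2': 2, 'col_3': None}
```

The scalar entry becomes a one-element row, which is why it ends up as a row of missing values rather than raising an error.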
Create a pandas dataframe from nested lists of unequal lengths
The zip_longest function from itertools does this:
>>> import itertools, pandas
>>> pandas.DataFrame((_ for _ in itertools.zip_longest(*nest)), columns=['aa', 'bb', 'cc'])
aa bb cc
0 aa1 bb1 cc1
1 aa2 bb2 cc2
2 aa3 bb3 cc3
3 aa4 bb4 None
4 aa5 None None
If you have an older version of pandas, you may need to wrap zip_longest in a list constructor. On Python 2 the function is called izip_longest instead of zip_longest.
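To see what zip_longest produces before it reaches pandas, here is a standalone sketch. The contents of nest are an assumption reconstructed from the output shown above.

```python
from itertools import zip_longest

# Assumed input, reconstructed from the printed table above.
nest = [
    ["aa1", "aa2", "aa3", "aa4", "aa5"],
    ["bb1", "bb2", "bb3", "bb4"],
    ["cc1", "cc2", "cc3"],
]

# Each tuple is one row; shorter lists are padded with None.
rows = list(zip_longest(*nest))
print(rows[3])  # ('aa4', 'bb4', None)
print(rows[4])  # ('aa5', None, None)

# A different padding value can be chosen with fillvalue.
blank_rows = list(zip_longest(*nest, fillvalue=""))
print(blank_rows[4])  # ('aa5', '', '')
```

Materializing the rows with list() is also what the "wrap it in a list constructor" advice for older pandas versions amounts to.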
Combining vectors of unequal length into a data frame
I think that you may be approaching this the wrong way:
If you have time series of unequal length, then the absolute best thing to do is to keep them as time series and merge them. Most time series packages allow this. You will end up with a multivariate time series in which each value is properly associated with its date.
So put your time series into zoo objects, merge them, then use my qplot.zoo function to plot them. That will deal with switching from zoo into a long data frame.
Here's an example:
> library(zoo)
> z1 <- zoo(1:8, 1:8)
> z2 <- zoo(2:8, 2:8)
> z3 <- zoo(4:8, 4:8)
> nm <- list("z1", "z2", "z3")
> z <- zoo()
> for(i in 1:length(nm)) z <- merge(z, get(nm[[i]]))
> names(z) <- unlist(nm)
> z
z1 z2 z3
1 1 NA NA
2 2 2 NA
3 3 3 NA
4 4 4 4
5 5 5 5
6 6 6 6
7 7 7 7
8 8 8 8
>
> library(reshape2)
> library(ggplot2)
> x.df <- data.frame(dates=index(z), coredata(z))
> x.df <- melt(x.df, id.vars="dates", variable.name="val")
> ggplot(na.omit(x.df), aes(x=dates, y=value, group=val, colour=val)) + geom_line() + theme(legend.position = "none")
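The merge-by-index idea does not depend on zoo. As a rough sketch in plain Python, each series can be a dict keyed by its index, and the merge is an outer join over the union of keys. The helper merge_series is hypothetical and only illustrates the alignment step, not plotting.

```python
def merge_series(**series):
    """Outer-join dicts keyed by index; missing observations become None."""
    index = sorted(set().union(*(s.keys() for s in series.values())))
    return {key: {name: s.get(key) for name, s in series.items()} for key in index}


# Same shapes as z1, z2 and z3 in the example above.
z1 = {i: i for i in range(1, 9)}
z2 = {i: i for i in range(2, 9)}
z3 = {i: i for i in range(4, 9)}

merged = merge_series(z1=z1, z2=z2, z3=z3)
print(merged[1])  # {'z1': 1, 'z2': None, 'z3': None}
print(merged[4])  # {'z1': 4, 'z2': 4, 'z3': 4}
```

Because alignment is driven by the index rather than by position, each value stays attached to its own date however short the series is.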