R - Store a Matrix into a Single Dataframe Cell

I think the trick may be to insert it in as a list:

set.seed(123)
dat <- data.frame(women, m=I(replicate(nrow(women), matrix(rnorm(4), 2, 2),
simplify=FALSE)))

str(dat)
'data.frame': 15 obs. of 3 variables:
$ height: num 58 59 60 61 62 63 64 65 66 67 ...
$ weight: num 115 117 120 123 126 129 132 135 139 142 ...
$ m :List of 15
..$ : num [1:2, 1:2] -0.5605 -0.2302 1.5587 0.0705
..$ : num [1:2, 1:2] 0.129 1.715 0.461 -1.265
...
..$ : num [1:2, 1:2] -1.549 0.585 0.124 0.216
..- attr(*, "class")= chr "AsIs"

dat[[1, "m"]]
           [,1]       [,2]
[1,] -0.5604756 1.55870831
[2,] -0.2301775 0.07050839

dat[[2, "m"]]
          [,1]       [,2]
[1,] 0.1292877  0.4609162
[2,] 1.7150650 -1.2650612

EDIT: So the question really is about initialising and then assigning. Given that, you should be able to define a data.frame like the one in your question like so:

dat <- data.frame(i=1:5, m=I(vector(mode="list", length=5)))

You can then assign to it like so:

dat[[2, "m"]] <- matrix(rnorm(9), 3, 3)
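
Putting the two steps together, here is a minimal base R sketch of the initialise-then-assign pattern. The column names i and m follow the snippet above; the matrix contents are arbitrary and only there for illustration.

```r
# Create a data.frame with an empty list column; I() keeps the list as one column
dat <- data.frame(i = 1:5, m = I(vector(mode = "list", length = 5)))

# Assign matrices of different shapes to individual cells
dat[[2, "m"]] <- matrix(1:9, 3, 3)
dat[[4, "m"]] <- diag(2)

# [[row, column]] returns the stored matrix itself
dim(dat[[2, "m"]])        # 3 3
is.null(dat[[1, "m"]])    # TRUE: untouched cells are still NULL
```

Because each cell is just a list element, the matrices do not need to share dimensions.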

How to place a matrix as an element of a data.frame in R?

We can wrap the matrix in a list and then assign it to the cell.

dataframe$out[1] <- list(matrixObj)
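
A self-contained sketch of that assignment follows. The names dataframe, out and matrixObj come from the line above; the data are made up, and the sketch assumes the out column is created as a list column before anything is assigned to it.

```r
matrixObj <- matrix(1:4, 2, 2)

dataframe <- data.frame(id = 1:3)
# Start with a list column so each cell can hold an arbitrary object
dataframe$out <- vector(mode = "list", length = nrow(dataframe))

# Single-bracket assignment needs a list on the right-hand side
dataframe$out[1] <- list(matrixObj)

# Double brackets pull the matrix back out
identical(dataframe$out[[1]], matrixObj)   # TRUE
```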

Creating data.frames where one column contains matrices

You have two issues:

  • To store a matrix in a data.frame (tibble), you simply have to put it in a list.
  • To create 2 x 2 matrices (instead of repeating the same matrix in each cell), you need to work row by row. Currently, matrix(c(disp, hp, gear, carb)) is evaluated on the whole columns, so c() concatenates all 4 x 32 = 128 values into a single matrix. You want only the four values of one row, reshaped to 2 x 2.

Working with pmap allows you to process the rows one by one, but alternatively you can use rowwise which groups by row:

library(tidyverse)
df <-
  mtcars %>%
  as_tibble() %>%
  rowwise() %>%
  mutate(mat = list(matrix(c(disp, hp, gear, carb), 2, 2)))
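
For comparison, here is a sketch of the pmap route mentioned above. It is one possible way to write it, not necessarily what the original question used; the name df_pmap is made up for this example.

```r
library(dplyr)
library(purrr)

df_pmap <- mtcars %>%
  as_tibble() %>%
  # pmap() walks the four columns in parallel, one row at a time,
  # so each call to matrix() only sees the four values of a single row
  mutate(mat = pmap(list(disp, hp, gear, carb),
                    function(...) matrix(c(...), 2, 2)))

# First row of mtcars: disp = 160, hp = 110, gear = 4, carb = 4
df_pmap$mat[[1]]
```

Since pmap returns a list, mat is already a list-column and no extra list() wrapper is needed.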

Edit: Now how do you actually use those? Let's take the example of a fisher.test. Note that a test is a complex object, with components (like p.value) and attributes, so we'll have to store them in a list-column.

You can either keep working rowwise, in which case the list is automagically "unlist-ed":

df %>%
  # keep in mind df is still grouped by row so 'mat' is only one matrix.
  # A test is a complex object so we need to store it in a list-column
  mutate(test = list(fisher.test(mat)),
         # test is just one test so we can extract p-value directly
         pval = test$p.value)

Or if you stop working row by row (for which you simply need to ungroup), then mat is a list of matrices onto which you can map functions. We use the map functions from purrr.

library("purrr")

df %>%
  ungroup() %>%
  # Apply the test to each mat using `map` from `purrr`
  # `map` returns a list so `test` is a list-column
  mutate(test = map(mat, fisher.test),
         # Now `test` is a list of tests... so you need to map operations onto it
         # Extract the p-values from each test, into a numeric column rather than a list-column
         pval = map_dbl(test, pluck, "p.value"))

Which one you prefer is a matter of taste :)

How to store mean vectors and covariance matrices in cells of a data table?

We can use mget instead of get: get returns the value of a single object, whereas mget returns a list with the values of one or more objects.

data[, lapply(mget(bd), function(x) mean(x)), by = a]

If we need a list column

data[, .(mu = .(as.list(lapply(mget(bd), function(x) mean(x))))), by = a]

If we want both columns, i.e. the covariance as well:

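# [2] picks the off-diagonal element of the 2 x 2 covariance matrix,
# i.e. the covariance between the two columns
data[, .(mu = .(sapply(mget(bd), function(x) mean(x))),
         sigma = .(cov(do.call(cbind, mget(bd)))[2])), by = a]
   a                  mu     sigma
1: 1 0.2353046,2.2000000 -2.131663
2: 2 0.1876238,3.3333333  2.062627
3: 3 0.9299794,1.5000000 0.1445644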
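
The example above keeps only a single covariance value per group. If the full covariance matrix should live in the cell, one option is to wrap it in list(). The following is a sketch under assumptions: the data are simulated, and only the names data, bd and a are taken from the code above.

```r
library(data.table)

set.seed(1)
# Hypothetical data: grouping column `a` and two numeric columns named in `bd`
data <- data.table(a = rep(1:3, each = 4), b = rnorm(12), d = rnorm(12))
bd <- c("b", "d")

res <- data[, .(mu    = list(colMeans(.SD)),  # mean vector per group
                sigma = list(cov(.SD))),      # full covariance matrix per group
            by = a, .SDcols = bd]

res$mu[[1]]      # named numeric vector of length 2
res$sigma[[1]]   # 2 x 2 covariance matrix
```

Here .SD together with .SDcols replaces the mget(bd) lookup; both approaches select the same columns.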

Is it possible to store a vector inside a dataframe cell?

I agree with Stephen Henderson's comment that you shouldn't use list-columns unless you are absolutely sure that they are the best way to solve your specific problem. That being said, if you do decide to use list columns, you might want to consider using tibbles instead of data frames. Tibbles are an 'upgrade' to regular data frames. They are part of the tidyverse and come in the tibble package.

Tibbles make it easy to create list columns:

library(tibble)

tibble(x = 1:3, y = list(1:5, 1:10, 1:20))

#> # A tibble: 3 x 2
#>       x y
#>   <int> <list>
#> 1     1 <int [5]>
#> 2     2 <int [10]>
#> 3     3 <int [20]>

Moreover, you can "pack" and "unpack" list-columns using the commands nest and unnest from the tidyr package. For example:

library(tibble)
library(tidyr)

df <- tibble(
  x = 1:3,
  y = c("a", "d,e,f", "g,h")
)
df %>%
  transform(y = strsplit(y, ",")) %>%
  unnest(y)

For more information about tibbles you can consult the tibble package vignette.
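
The example above only shows the unpacking direction. As a rough sketch of the packing direction, with made-up data and the hypothetical names long, packed and unpacked:

```r
library(tibble)
library(tidyr)

# Hypothetical long data: one row per (group, value) pair
long <- tibble(g = c("a", "a", "b"), v = c(1, 2, 3))

# nest() packs the `v` values of each group into a list-column of tibbles
packed <- nest(long, data = v)

# unnest() unpacks them again, one row per value
unpacked <- unnest(packed, data)
```

This assumes the nest(data = ...) syntax of tidyr 1.0.0 or later.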

How to save multiple numbers in one cell in a matrix/dataframe?

This is what I ended up doing

stn[1,1] <- toString(temp_warnings$row)
stn[2,1] <- toString((subset(temp_warnings, row <= 31))$day)
stn[3,1] <- toString((subset(temp_warnings, 31 < row & row <= 61))$day)
stn[4,1] <- toString((subset(temp_warnings, 61 < row & row <= 92))$day)
stn[5,1] <- toString((subset(temp_warnings, 92 < row & row <= 123))$day)
stn[6,1] <- toString((subset(temp_warnings, 123 < row))$day)
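
Note that toString stores the numbers as a single piece of text, so they have to be parsed again before they can be used in calculations. A small base R sketch of that round trip, with made-up values:

```r
days <- c(3, 17, 28)       # hypothetical values for one cell

cell <- toString(days)     # "3, 17, 28" (a single character string)

# Split on the ", " separator that toString() inserts, then convert back
recovered <- as.numeric(strsplit(cell, ", ", fixed = TRUE)[[1]])
```

If the values are needed as numbers later on, a list column avoids the conversion in both directions.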

Is there a way to turn a DataFrame/correlation matrix into a DataFrame with one column per cell combination?

Column names containing [ and ] are problematic, so I've used a slightly different naming convention to yours, but I believe this gives you the structure you want.

First generate some test data

library(tidyverse)

d <- tibble(
  a=c(1, 0.2, 0.4, 0.6),
  b=c(0.2, 1, 0.2, 0.4),
  c=c(0.4, 0.2, 1, 0.2),
  d=c(0.6, 0.4, 0.2, 1)
)
d
# A tibble: 4 × 4
      a     b     c     d
  <dbl> <dbl> <dbl> <dbl>
1   1     0.2   0.4   0.6
2   0.2   1     0.2   0.4
3   0.4   0.2   1     0.2
4   0.6   0.4   0.2   1

Then do what you want

d %>%
  mutate(row=letters[1:nrow(.)]) %>%
  pivot_longer(-row) %>%
  pivot_wider(
    names_from=c(row, name),
    values_from=value
  )
    a_a   a_b   a_c   a_d   b_a   b_b   b_c   b_d   c_a   c_b   c_c   c_d   d_a   d_b   d_c   d_d
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1   0.2   0.4   0.6   0.2     1   0.2   0.4   0.4   0.2     1   0.2   0.6   0.4   0.2     1

Edit

d %>%
  mutate(row=letters[1:nrow(.)]) %>%
  pivot_wider(
    names_from=row,
    values_from=-row
  )

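This gives the same result here and is slightly shorter. The names are now built as column_row rather than row_column, which makes no difference for a symmetric correlation matrix.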
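
If the correlations are held in a plain matrix rather than a tibble, a base R sketch along the following lines produces the same one-row layout. The object names m, nms and flat are made up for this example; the values are the test data from above.

```r
# Same correlation values as above, as a matrix with row and column names
m <- matrix(c(1,   0.2, 0.4, 0.6,
              0.2, 1,   0.2, 0.4,
              0.4, 0.2, 1,   0.2,
              0.6, 0.4, 0.2, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(letters[1:4], letters[1:4]))

# Build row_column names, then flatten names and values row by row
nms  <- as.vector(t(outer(rownames(m), colnames(m), paste, sep = "_")))
flat <- as.data.frame(t(setNames(as.vector(t(m)), nms)))

dim(flat)   # 1 16
```

The transpose before as.vector is what gives the row-by-row order (a_a, a_b, a_c, ...), since R stores matrices column by column.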

Storing vectors in a dataframe element

Not sure if I understood you correctly, but here's an example:

# your initial data.frame 
data <- data.frame(job_id = c('abc','abc1','jsdf'), usetime = c(2345,4353,34985))

# initialize runtime_excluded with an empty list
data$runtime_excluded <- vector(mode = "list",length=nrow(data))

# > data
#   job_id usetime runtime_excluded
# 1    abc    2345             NULL
# 2   abc1    4353             NULL
# 3   jsdf   34985             NULL

# example of initialization in a for-loop
for(i in 1:3){
  data$runtime_excluded[[i]] <- 1:i
  # or, similarly :
  # data[['runtime_excluded']][[i]] <- 1:i
}

# > data
#   job_id usetime runtime_excluded
# 1    abc    2345                1
# 2   abc1    4353             1, 2
# 3   jsdf   34985          1, 2, 3
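
The same column can also be built in one step instead of initialising it and filling it in a loop. This is a sketch of an alternative, not a correction of the loop above:

```r
data <- data.frame(job_id = c("abc", "abc1", "jsdf"),
                   usetime = c(2345, 4353, 34985))

# lapply() returns a list with one element per row, which becomes the list column;
# seq_len() also behaves sensibly for a data.frame with zero rows
data$runtime_excluded <- lapply(seq_len(nrow(data)), function(i) seq_len(i))

data$runtime_excluded[[3]]       # 1 2 3
lengths(data$runtime_excluded)   # 1 2 3
```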

EDIT:

Here's a working version of your code:

data <- data.frame(job_id = c('abc','abc1','jsdf'),
                   starttime = c(1,2,3),
                   endtime = c(24,24,23),
                   endtime_modified = c(22,20,23),
                   usetime = c(22,22,9)
)
# > data
#   job_id starttime endtime endtime_modified usetime
# 1    abc         1      24               22      22
# 2   abc1         2      24               20      22
# 3   jsdf         3      23               23       9

# initialize runtime_excluded with an empty list
data$runtime_excluded <- vector(mode = "list",length=nrow(data))

k <- nrow(data)
for(i in 1:k)
{
  indices_peak <- which((data[i,"endtime"] >= data$starttime) & (data[i,"endtime"] <= data$endtime))
  indices_peak95 <- which((data[i,"endtime_modified"] >= data$starttime) & (data[i,"endtime_modified"] <= data$endtime_modified))

  indices_excluded <- indices_peak[!indices_peak %in% indices_peak95]
  data[i,"peak"] <- length(indices_peak)
  data[i,"peak_95"] <- length(indices_peak95)
  # when nothing is excluded, vect is an empty vector (numeric(0)), not NULL
  vect <- data[indices_excluded, "usetime"]
  # this guard only skips NULL; an empty vector passes and is stored as an empty cell
  if(!is.null(vect)){
    data$runtime_excluded[[i]] <- vect
  }
}

# > data
#   job_id starttime endtime endtime_modified usetime runtime_excluded peak peak_95
# 1    abc         1      24               22      22               22    2       2
# 2   abc1         2      24               20      22                     2       3
# 3   jsdf         3      23               23       9           22, 22    3       1

data.frame with a column containing a matrix in R

I find data.frames containing matrices mind-bendingly weird, but: the only way I know to achieve this is hidden in stats:::simulate.lm

Try this, poke through and see what's happening:

d <- data.frame(y=1:5,n=5)
g0 <- glm(cbind(y,n-y)~1,data=d,family=binomial)
debug(stats:::simulate.lm)
s <- simulate(g0,n=5)

This is the weird, back-door solution. Create a list, change its class to data.frame, and then (this is required) set the names and row.names manually (if you don't do those final steps the data will still be in the object, but it will print out as though it had zero rows ...)

m1 <- matrix(1:10,ncol=2)
m2 <- matrix(5:14,ncol=2)
dd <- list(m1,m2)
class(dd) <- "data.frame"
names(dd) <- LETTERS[1:2]
row.names(dd) <- 1:5
dd
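
For comparison, base R also keeps a matrix as a single column when it is assigned with $<- or wrapped in I() inside data.frame(). The sketch below uses made-up data and assumes the matrix has as many rows as the data frame:

```r
d <- data.frame(y = 1:5)
m <- matrix(1:10, ncol = 2)

# Assigning a matrix with nrow(d) rows via $<- keeps it as one matrix column
d$m <- m

ncol(d)     # 2: `y` and the matrix column `m`
dim(d$m)    # 5 2

# Inside data.frame(), I() prevents the matrix from being split into columns
d2 <- data.frame(y = 1:5, m = I(m))
ncol(d2)    # 2
```

This stores one matrix across all rows (row k of the matrix belongs to row k of the data frame), which is different from putting a whole matrix into each cell with a list column.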

