How to Save a Data.Frame in R

How to save a data.frame in R?

There are several ways. One is to use save() to store the object exactly as it is. For example, for a data frame foo:

save(foo,file="data.Rda")

Then load it with:

load("data.Rda")

You could also use write.table() or a similar function such as write.csv() to save the table as plain text, or dput() to obtain R code that reproduces the table.
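The three options above can be compared side by side. This is only a minimal sketch: the contents of foo and the file locations (a temporary directory) are assumptions made for illustration.

```r
# Hypothetical example data; `foo` stands in for any data frame.
foo <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)
original <- foo  # keep a copy so the round trips can be compared

# Assumed locations: a temporary directory, so nothing is overwritten.
rda_path <- file.path(tempdir(), "data.Rda")
txt_path <- file.path(tempdir(), "data.txt")

# 1. save()/load(): the object comes back under its original name.
save(foo, file = rda_path)
rm(foo)
loaded_names <- load(rda_path)  # load() returns the names of the restored objects
print(loaded_names)

# 2. write.table()/read.table(): plain text that other tools can read too.
write.table(foo, file = txt_path, sep = "\t", row.names = FALSE, quote = FALSE)
foo_txt <- read.table(txt_path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# 3. dput(): prints R code that rebuilds the object when evaluated.
dput(foo)
foo_code <- eval(parse(text = deparse(foo)))
```

The plain-text route loses R-specific attributes such as factor levels, so save() or dput() is usually the safer choice when the exact object matters.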

How to save a data frame in R

You might want to take a look at this question here: R data formats: RData, Rda, Rds etc.

When you load an .rda file, all the objects it contains are restored under their original names in the current environment (the global environment when load() is called at the top level). You can't assign the objects to new names with load(), as you tried to do.

If you want to save objects that can be loaded under different names later, use the .rds format (saveRDS() and readRDS()). To save more than one object in an .rds file, the simplest solution is to put them all in a list and save only the list. If, after reading the .rds file, you want to put the list's elements into the global environment, you can use list2env().
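A minimal sketch of that workflow follows. The objects df_a and df_b and the file path are made up for illustration, and the list is unpacked into a fresh environment rather than the global one so the example does not overwrite anything.

```r
# Hypothetical objects to store.
df_a <- data.frame(x = 1:3)
df_b <- data.frame(y = c("u", "v"))
rds_path <- file.path(tempdir(), "objects.rds")  # assumed location

# A single object: the name is chosen when reading, not when saving.
saveRDS(df_a, rds_path)
renamed <- readRDS(rds_path)

# Several objects: bundle them in a named list and save only the list.
saveRDS(list(df_a = df_a, df_b = df_b), rds_path)
bundle <- readRDS(rds_path)

# Unpack the list elements into an environment as separate objects.
# Use envir = globalenv() to place them in the global environment instead.
target_env <- new.env()
list2env(bundle, envir = target_env)
ls(target_env)
```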

How to save a large dataframe and quickly load it in R?

You can serialize it easily with:

readr::write_rds(pageInfo_df, "pageInfo_df.Rds")

and then deserialize it like so:

readr::read_rds("pageInfo_df.Rds")

This should handle any valid R object, however complex.
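If readr is not installed, base R offers the same round trip: write_rds() and read_rds() are wrappers around saveRDS() and readRDS(), and write_rds() does not compress by default. The sketch below assumes a made-up pageInfo_df and a temporary file path.

```r
# Hypothetical stand-in for the large data frame.
set.seed(1)
pageInfo_df <- data.frame(id = seq_len(1000), score = rnorm(1000))
rds_path <- file.path(tempdir(), "pageInfo_df.Rds")  # assumed location

# compress = FALSE trades disk space for faster writing and reading.
saveRDS(pageInfo_df, rds_path, compress = FALSE)

# Assign the result: readRDS() returns the object, it does not restore a name.
restored <- readRDS(rds_path)
identical(restored, pageInfo_df)
```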

Saving a DataFrame to .txt-file in R (every value in new line)

I'm not certain I've fully grasped what you're trying to do, but it looks like you want to print the data row by row to a text file, one value per line. Here's a possible solution using the tidyverse. I don't know what your data looks like, so I've used a slightly longer tibble to show that the approach does what I understand you to be asking for.

To create some data for the example:

## if you need to install tidyverse
# install.packages("tidyverse")
library(tidyverse)

dat <-
  tibble(
    w = c("First", "Fourth", "Seventh"),
    x = c("Second", "Fifth", "Eighth"),
    y = c("Third", "Sixth", "Ninth"),
    z = c("do", "not", "want")
  )

The data looks like this:

w       x      y     z
First   Second Third do
Fourth  Fifth  Sixth not
Seventh Eighth Ninth want

Here we're manipulating the data to the format you want printed.

dat_to_print <-
  dat %>%
  ## whatever columns you do not want printed would go here
  ## you could also select(w, x, y) instead of dropping the unwanted columns
  select(-z) %>%
  rowwise() %>%
  ## whatever columns you want printed would go here... you can also provide it as c(w, x, y)
  pivot_longer(w:y) %>%
  ## pivot_longer will come up with two columns:
  ## the first is 'name', which holds the former name of the variable (i.e. w, x, or y)
  ## the second is 'value', which is what you want to print as I've understood the problem
  ## it doesn't look like you care about the old column names, so we remove them here
  select(-name)

Then create the text file:

write.table(dat_to_print,
            file = "C:\\your\\folder\\location\\dat.txt",
            col.names = FALSE,
            row.names = FALSE,
            quote = FALSE)

dat.txt will look like this:

First
Second
Third
Fourth
Fifth
Sixth
Seventh
Eighth
Ninth
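For comparison, the same file can probably be produced in base R without any packages. This is a sketch under the same reading of the question; it swaps the tidyverse pipeline for t() plus writeLines() and writes to an assumed temporary path.

```r
# Same example data as above, as a plain data frame.
dat <- data.frame(
  w = c("First", "Fourth", "Seventh"),
  x = c("Second", "Fifth", "Eighth"),
  y = c("Third", "Sixth", "Ninth"),
  z = c("do", "not", "want"),
  stringsAsFactors = FALSE
)

# t() gives a character matrix with one column per original row.
# as.vector() reads a matrix column by column, i.e. row by row of `dat`.
values <- as.vector(t(dat[, c("w", "x", "y")]))

txt_path <- file.path(tempdir(), "dat.txt")  # assumed location
writeLines(values, txt_path)                 # one value per line
readLines(txt_path)
```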

How do I save a dataframe with a list column of same-columned dataframes to parquet with arrow?

I've found that even if you define a schema (my_schema) that includes the structure of the list column, write_parquet(df, schema=my_schema) will still fail if some rows of the list column do not have the same structure as the others (for example, if some of the rows are NA).

For example, if dat is a data.table with five columns, one of which is a list column holding data.tables:

     grp                data         a          b          c
   <num>              <list>     <num>      <num>      <num>
1:     1 <data.table[100x3]> 0.6142948 -1.0359482 -0.3782694
2:     2                  NA 0.1192991  0.1889432  0.2735809
3:     3 <data.table[100x3]> 0.4198558  0.6189989 -0.8201980

Then write_parquet(dat, schema=my_schema) will fail with "Error: Invalid: Can only convert data frames to Struct type".

I think the approach of placing a 0-row table of the same structure as the other tables in that list column is a good idea:

library(data.table)
library(arrow)

# get a null table of same structure
null_table = dat[!is.na(data)]$data[[1]][0,]

# replace the NA with the null_table
dat[is.na(data),data:=list(null_table)]

# write the parquet file
write_parquet(dat, "dat.pqt")

This is easily retrieved:

# Read the file
dat = read_parquet("dat.pqt")

# Convert the arrow list to data.table
dat$data= lapply(dat$data, data.table)

# Convert the data.tables with 0 rows back to NA
dat[sapply(dat$data,nrow)==0,data:=NA][]

     grp                data         a          b          c
   <num>              <list>     <num>      <num>      <num>
1:     1 <data.table[100x3]> 0.6142948 -1.0359482 -0.3782694
2:     2                  NA 0.1192991  0.1889432  0.2735809
3:     3 <data.table[100x3]> 0.4198558  0.6189989 -0.8201980
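The placeholder idea itself does not depend on arrow or data.table. The sketch below shows only that step, on a plain list of made-up data frames in base R; the parquet write and read are left out, so it does not test the arrow behaviour described above.

```r
# Hypothetical list column: two tables and one missing entry.
tables <- list(
  data.frame(u = 1:2, v = c("a", "b")),
  NA,
  data.frame(u = 3L, v = "c")
)

# Flag the entries that are not tables.
is_missing <- !vapply(tables, is.data.frame, logical(1))

# Zero-row table with the same columns as the first real table.
null_table <- tables[!is_missing][[1]][0, ]

# Replace each missing entry with the zero-row placeholder,
# so every element now has the same structure.
filled <- tables
filled[is_missing] <- list(null_table)

# ... writing to and reading from parquet would happen here ...

# Turn the zero-row placeholders back into NA.
is_empty <- vapply(filled, nrow, integer(1)) == 0L
restored <- filled
restored[is_empty] <- list(NA)
```

One limitation of this design: a table that was genuinely empty before writing would also come back as NA, so the approach assumes zero-row tables do not occur in the real data.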

How to save t-test output with names of columns into a dataframe in R?

If you don't mind using an external package, then:

library(matrixTests)
col_t_welch(df[df$group=="cluster2",-1], df[df$group=="cluster1",-1])

      obs.x obs.y obs.tot      mean.x     mean.y  mean.diff     var.x      var.y    stderr       df  statistic       pvalue   conf.low   conf.high alternative mean.null conf.level
One       7     3      10 -0.03035821 0.16533806 -0.1956963 0.4347748 0.01569194 0.2595021 6.906193 -0.7541221 0.4756968552 -0.8110149  0.41962235   two.sided         0       0.95
two       7     3      10 -0.06497898 0.03928572 -0.1042647 0.7347812 2.39802096 0.9509517 2.545136 -0.1096425 0.9207496910 -3.4608429  3.25231355   two.sided         0       0.95
three     7     3      10 -0.48970882 0.39769370 -0.8874025 0.3385390 0.63615343 0.5103076 2.964909 -1.7389561 0.1815091371 -2.5223572  0.74755220   two.sided         0       0.95
four      7     3      10 -0.52964750 0.86171745 -1.3913649 0.3785659 0.06283704 0.2739097 7.963842 -5.0796483 0.0009668828 -2.0235014 -0.75922852   two.sided         0       0.95
five      7     3      10 -0.29465530 0.48897708 -0.7836324 0.4417576 0.15588465 0.3392194 6.575237 -2.3101050 0.0565252579 -1.5963836  0.02911888   two.sided         0       0.95
six       7     3      10 -0.61128484 0.76991659 -1.3812014 0.6884335 0.09377073 0.3600063 7.996676 -3.8366033 0.0049749882 -2.2114376 -0.55096530   two.sided         0       0.95
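If you would rather stay in base R, a similar table can be assembled by running t.test() on each column and binding the results. This is only a sketch: the data frame df below is simulated, its column names are assumptions, and only some of the fields shown above are collected.

```r
# Simulated data: a grouping column followed by numeric columns.
set.seed(42)
df <- data.frame(
  group = rep(c("cluster1", "cluster2"), times = c(3, 7)),
  One   = rnorm(10),
  two   = rnorm(10),
  three = rnorm(10)
)

value_cols <- setdiff(names(df), "group")

# One Welch t-test per column (t.test() uses var.equal = FALSE by default),
# with the pieces of each result collected into a one-row data frame.
results <- do.call(rbind, lapply(value_cols, function(col) {
  x  <- df[df$group == "cluster2", col]
  y  <- df[df$group == "cluster1", col]
  tt <- t.test(x, y)
  data.frame(
    mean.x    = unname(tt$estimate[1]),
    mean.y    = unname(tt$estimate[2]),
    statistic = unname(tt$statistic),
    df        = unname(tt$parameter),
    pvalue    = tt$p.value,
    conf.low  = tt$conf.int[1],
    conf.high = tt$conf.int[2]
  )
}))

# Label each row with the column it was computed from.
rownames(results) <- value_cols
print(results)
```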

Saving dataframe with separated column in R

If we need to update the original object automatically, we can use the magrittr compound assignment pipe operator (%<>%):

library(magrittr)
library(tidyr)  # provides separate()
four_rows %<>%
  separate(Datetime, c('Date', 'Time'), sep=" ")

Now we check the result:

four_rows
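Writing x %<>% f() is shorthand for x <- x %>% f(), so the point is simply that the result has to be assigned back to four_rows. A base R sketch of the same overwrite follows; the contents of four_rows are made up, and unlike separate() it appends the new columns at the end rather than in place of Datetime.

```r
# Hypothetical data with a combined date-time column.
four_rows <- data.frame(
  Datetime = c("2021-01-01 10:00:00", "2021-01-01 11:30:00",
               "2021-01-02 09:15:00", "2021-01-02 18:45:00"),
  value = 1:4,
  stringsAsFactors = FALSE
)

# Split each string at the space and stack the pieces into a two-column matrix.
parts <- do.call(rbind, strsplit(four_rows$Datetime, " ", fixed = TRUE))

# Assign back into the original object so the change is saved.
four_rows$Date <- parts[, 1]
four_rows$Time <- parts[, 2]
four_rows$Datetime <- NULL  # drop the combined column

four_rows
```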

