How to Increase the Number of Columns Using R in Linux

Here is a function I have in my ~/.Rprofile file:

wideScreen <- function(howWide = Sys.getenv("COLUMNS")) {
  options(width = as.integer(howWide))
}

Calling the function without the howWide argument sets R's output width to the width of your terminal, as reported by the COLUMNS environment variable. You can optionally pass the argument to set the width to an arbitrary number of your choosing.
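As a quick illustration of both call styles, here is a minimal sketch. The widths 120 and 100 are arbitrary values chosen for the example. Many shells do not export COLUMNS to child processes by default, in which case the no-argument call may fail, so the sketch passes the width explicitly.

```r
# Same function as above, repeated so the snippet runs on its own.
wideScreen <- function(howWide = Sys.getenv("COLUMNS")) {
  options(width = as.integer(howWide))
}

old <- options(width = 80)   # set a known width; 'old' keeps the previous value
wideScreen(120)              # explicit numeric width
w1 <- getOption("width")
wideScreen("100")            # a string works too, since it goes through as.integer()
w2 <- getOption("width")
options(old)                 # restore the previous width
```

If you rely on the no-argument form, running `export COLUMNS` in the shell before starting R is one way to make the variable visible.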

Almost like Josh's suggestion, but less magic :-)

Create variable number of columns in data.table using function

Here is one way that avoids copying the original data.table object:

library(data.table)
# Create a temporary object
tmp <- dta[, my_function(a)]
# Create column names
cols <- paste0("cols", seq_along(tmp))
# Add the temporary object with new column names
dta[, (cols) := tmp]
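To try the pattern on something small, here is a self-contained sketch. The four-row `dta` and this simplified `my_function()` are hypothetical stand-ins made up for the example, not the original poster's data.

```r
library(data.table)

# Hypothetical stand-in for my_function(): returns two logical columns.
my_function <- function(x) data.table(x == 1, x == 2)

dta <- data.table(a = c(1L, 2L, 3L, 1L))

tmp  <- dta[, my_function(a)]           # temporary two-column data.table
cols <- paste0("cols", seq_along(tmp))  # one new name per returned column
dta[, (cols) := tmp]                    # add the columns by reference

new_names <- names(dta)
```

Because `:=` modifies `dta` in place, no second copy of the existing columns is made; only the temporary result is allocated.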

Benchmark added by OP

Below is the code I used to benchmark the solutions:

library(data.table)
my_function <- function(x) {
  name <- deparse1(substitute(x))
  res <- data.table(x == 1, x == 2)
  names(res) <- paste0(name, "==", 1:2)
  res
}
set.seed(1)
N <- 2E7
x <- sample(1:10, N, replace = TRUE)
dta <- data.table()
dta[, (letters[1:24]) := x]

t <- system.time({
  tmp <- dta[, my_function(a)]
  cols <- names(tmp)
  dta[, (cols) := tmp]
})
# t <- system.time({
#   dta <- cbind(dta, dta[, my_function(a)])
# })
print(t)

The command was run under Linux (Ubuntu 20.04) using /bin/time -v Rscript bench.R. time reports max memory use in the field Maximum resident set size (kbytes).

For the cbind solution the reported user time was 1.362 seconds and max memory 4206072 kbytes.

For the solution above the reported user time was 0.339 seconds and max memory 2486996 kbytes.

The solution above is therefore faster and uses less memory than the cbind version.

R: How to add a row that has different number of columns than the rest of the data frame?

This should do it

set <- data.frame("id" = c("one", "two", "three"),
                  "line_number" = c("1", "2", "3"),
                  "content_type" = c("paragraph", "paragraph", "paragraph"),
                  "text" = c("this is a sample", "first batch is:", "second batch is:"),
                  "section" = c("introduction", "content", "summary"),
                  stringsAsFactors = FALSE)
x <- data.frame(text = "Sample Report", stringsAsFactors = FALSE)
dplyr::bind_rows(set, x)
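If dplyr is not available, roughly the same result can be sketched in base R by adding the missing columns to the new row as NA before calling rbind(). This is a swapped-in alternative, not the method used above, and the shortened two-column `set` is made up for the example.

```r
set <- data.frame(id = c("one", "two"),
                  text = c("this is a sample", "first batch is:"),
                  stringsAsFactors = FALSE)
x <- data.frame(text = "Sample Report", stringsAsFactors = FALSE)

# Add every column that 'set' has and 'x' lacks, filled with NA.
x[setdiff(names(set), names(x))] <- NA

# Reorder the columns to match 'set', then append the row.
out <- rbind(set, x[names(set)])
```

The appended row keeps its text value and has NA in every column it did not supply.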

Increase the width of matrix printout

Andrie's answer is good, though sometimes one uses a super duper monitor and 9999 is not enough. ;-)

Here's my function for setting the display width:

setWidth <- function(width = NULL) {
  if (is.null(width)) {
    columns <- as.numeric(Sys.getenv("COLUMNS"))
    if (!is.na(columns)) {
      options(width = columns)
    } else {
      options(width = 100)
    }
  } else {
    options(width = width)
  }
}

This has been addressed previously, though.

So, to improve on just changing the width, here is another trick that I recommend: reduce the number of digits used in numeric output by setting options(digits = ...) to a smaller value. See ?options for more info.
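To see the effect of the digits option, here is a small sketch. The value `pi * 1000` is arbitrary, and format() is used because it honors the same digits setting as printing.

```r
old <- options(digits = 7)   # start from the usual default; 'old' keeps the previous value
x <- pi * 1000

wide <- format(x)            # formatted with 7 significant digits
options(digits = 3)
narrow <- format(x)          # formatted with 3 significant digits

options(old)                 # restore the previous setting
```

Fewer significant digits means narrower columns, so more of them fit within a given width.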

Count columns with certain values in data frame

Your reprex:

temp <- as.data.frame(
  cbind(
    c("x3", "x2", "x1", NA),
    c("x5", "x2", "x1", NA),
    c("x2", "x3", "x1", NA),
    c("x3", "x2", "x1", "x4"),
    c("x1", "x2", NA, NA)
  )
)
target <- c("x3", "x2", "x1")

Then if you want to check that the column only contains those 3 levels:

sum(sapply(temp, function(x) setequal(target, levels(x))))

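setequal() checks whether two sets are equal regardless of order. levels() tells you every value that occurs in the column; it works here because you didn't set stringsAsFactors = FALSE, so the columns are factors. (Since R 4.0.0 the default is stringsAsFactors = FALSE, in which case the columns are character vectors and levels() returns NULL.)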

This will do the same thing:

sum(sapply(temp, function(x) setequal(target, na.omit(x))))

If you want to check that each element occurs the same number of times, try identical(), along with as.character() to turn the factors back into character vectors.

sum(sapply(temp, function(x) {
  identical(sort(target), sort(as.character(na.omit(x))))
}))

(Or just set stringsAsFactors = FALSE in your original dataset and you won't have to use as.character() here.)
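For reference, here is a sketch of that character-column variant, using the same values as the reprex. The column names V1 to V5 and the result variable names are chosen for the example. Both checks skip levels() entirely, so they do not depend on the columns being factors.

```r
temp <- data.frame(
  V1 = c("x3", "x2", "x1", NA),
  V2 = c("x5", "x2", "x1", NA),
  V3 = c("x2", "x3", "x1", NA),
  V4 = c("x3", "x2", "x1", "x4"),
  V5 = c("x1", "x2", NA, NA),
  stringsAsFactors = FALSE
)
target <- c("x3", "x2", "x1")

# Columns whose non-missing values form the same set as 'target'.
n_same_set <- sum(sapply(temp, function(x) setequal(target, na.omit(x))))

# Columns that match 'target' element for element; sort() drops NA by default.
n_same_counts <- sum(sapply(temp, function(x) identical(sort(target), sort(x))))
```

With this data both counts come out as 2: only the first and third columns contain exactly x1, x2 and x3.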

Add recursive number with condition in dataframe R

The description is not entirely clear. Based on the expected output, one option is to create a list column by looping over 'id': multiply each value by 100, generate a sequence of length 4 starting there, and then unnest the list column.

library(dplyr)
library(purrr)
library(tidyr)
df1 %>%
  mutate(id = map(id * 100, seq, length.out = 4)) %>%
  unnest(c(id))
# A tibble: 8 x 3
#      id word  count
#   <dbl> <chr> <int>
# 1   100 aa        2
# 2   101 aa        2
# 3   102 aa        2
# 4   103 aa        2
# 5   200 bb        3
# 6   201 bb        3
# 7   202 bb        3
# 8   203 bb        3

Another option is to replicate each row four times (uncount), group by 'word', and then rebuild 'id' as a sequence within each group:

df1 %>%
  uncount(4) %>%
  group_by(word) %>%
  mutate(id = seq(100 * first(id), length.out = n()))
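For readers without the tidyverse, the same expected output can be sketched in base R. This is a swapped-in alternative rather than part of the original answer; it assumes every row is expanded to exactly 4 rows, and `out` is a name chosen for the example.

```r
df1 <- data.frame(id = 1:2, word = c("aa", "bb"), count = 2:3)

# Repeat each row 4 times.
out <- df1[rep(seq_len(nrow(df1)), each = 4), ]

# Rebuild id as 100 * id plus an offset of 0..3 within each original row.
out$id <- out$id * 100 + rep(0:3, times = nrow(df1))
rownames(out) <- NULL
```

The resulting `id` column runs 100 to 103 for "aa" and 200 to 203 for "bb", matching the output shown earlier.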

data

df1 <- structure(list(id = 1:2, word = c("aa", "bb"), count = 2:3),
                 class = "data.frame", row.names = c("1", "2"))

R View() does not display all columns of data frame

I also see this problem with x <- matrix(1:200,nrow=1); View(x) in RStudio, but not in vanilla R. It is a known limitation and they're working on it. You can contact the devs on their forum to give your feedback (I see you have already done so).

How to rowbind two datasets with different number of columns using R

One way is to create a new matrix ("m1") with the appropriate dimensions: its number of rows is the sum of the rows of "one" and "two", and its number of columns is the number of unique column names across both datasets. Create name indexes for the columns present in only one dataset ('onenm', 'twonm'), for the columns common to both ('nm1'), and for all unique column names ('nm2'). Using the appropriate row/column indexes, we can then assign the elements of 'one' and 'two' to the newly created xts object ("xt1", created from "m1").

library(xts)
# Column names shared by both datasets
nm1 <- intersect(colnames(one), colnames(two))
# Column names found in only one of them
onenm <- setdiff(colnames(one), colnames(two))
twonm <- setdiff(colnames(two), colnames(one))
# All unique column names
nm2 <- union(colnames(one), colnames(two))
m1 <- matrix(0, nrow = nrow(one) + nrow(two), ncol = length(nm2),
             dimnames = list(NULL, nm2))
xt1 <- xts(m1, order.by = c(index(one), index(two)))
xt1[index(one), onenm] <- one[, onenm]
xt1[index(two), twonm] <- two[, twonm]
xt1[, nm1] <- rbind(one[, nm1], two[, nm1])
dim(xt1)
#[1] 12 11

Update

You could also use rbindlist from data.table (or bind_rows from dplyr). Convert the xts objects to data.frames, place them in a list, and use rbindlist with the fill=TRUE option. Then convert the output ('dt1') to xts ('xt2') and change the NA values to 0.

library(data.table)
dt1 <- rbindlist(list(as.data.frame(one),
                      as.data.frame(two)), fill = TRUE)
# or
# library(dplyr)
# dt1 <- bind_rows(list(as.data.frame(one), as.data.frame(two)))
xt2 <- xts(dt1, order.by = c(index(one), index(two)))
xt2[is.na(xt2)] <- 0
identical(xt1, xt2)
#[1] TRUE

