How to increase the number of columns using R in Linux
Here is a function I have in my ~/.Rprofile file:
wideScreen <- function(howWide=Sys.getenv("COLUMNS")) {
  options(width=as.integer(howWide))
}
Calling the function without the howWide argument sets the output width to the width of your terminal. You can optionally pass in the argument to set the width to an arbitrary number of your choosing.
Almost like Josh's suggestion, but less magic :-)
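One caveat: COLUMNS is often a shell variable that is not exported, so Sys.getenv("COLUMNS") can return an empty string, and as.integer("") is NA. A minimal sketch of a more defensive variant follows. The name wideScreenSafe and the fallback of 80 are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical variant of wideScreen() with a fallback width (assumed: 80).
wideScreenSafe <- function(howWide = Sys.getenv("COLUMNS"), fallback = 80L) {
  # as.integer("") is NA; non-numeric strings would warn, so suppress that
  width <- suppressWarnings(as.integer(howWide))
  # options(width = ) rejects NA and values below 10
  if (is.na(width) || width < 10L) {
    width <- fallback
  }
  invisible(options(width = width))
}

wideScreenSafe(120)
getOption("width")
```

Calling wideScreenSafe("") exercises the fallback path and leaves the width at 80.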
Create variable number of columns in data.table using function
Here is one way which avoids copying the original data.table object:
library(data.table)
#Create a temporary object
tmp <- dta[,my_function(a)]
#Create column names
cols <- paste0('cols', seq_along(tmp))
#Add the temporary object with new column names
dta[, (cols) := tmp]
Benchmark added by OP
Below is the script I used to benchmark the solutions:
library(data.table)
my_function <- function(x) {
  name <- deparse1(substitute(x))
  res <- data.table(x == 1, x == 2)
  names(res) <- paste0(name, "==", 1:2)
  res
}
set.seed(1)
N <- 2E7
x <- sample(1:10, N, replace = TRUE)
dta <- data.table()
dta[, (letters[1:24]) := x]
t <- system.time({
  tmp <- dta[, my_function(a)]
  cols <- names(tmp)
  dta[, (cols) := tmp]
})
#t <- system.time({
# dta <- cbind(dta, dta[, my_function(a)])
#})
print(t)
The script was run under Linux (Ubuntu 20.04) using /bin/time -v Rscript bench.R. time reports the maximum memory use in the field Maximum resident set size (kbytes).
For the cbind solution the reported user time was 1.362 seconds and max memory 4206072 kbytes.
For the solution above the reported user time was 0.339 seconds and max memory 2486996 kbytes.
The solution above is therefore faster and uses less memory than the cbind version.
R: How to add a row that has different number of columns than the rest of the data frame?
This should do it. bind_rows() matches columns by name and fills the ones missing from the new row with NA:
set <- data.frame("id"=c("one", "two","three"), "line_number"=c("1", "2", "3"),
"content_type"=c("paragraph", "paragraph","paragraph"),
"text"=c("this is a sample","first batch is:", "second batch is:"),
"section"=c("introduction","content","summary"), stringsAsFactors = FALSE)
x <- data.frame(text = "Sample Report", stringsAsFactors = FALSE)
dplyr::bind_rows(set, x)
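If you would rather not depend on dplyr, the same idea can be sketched in base R: add the missing columns to the short data frame as NA, then rbind. The data below is a cut-down version of the example, and the names missing_cols and combined are only illustrative.

```r
# Base-R sketch: pad the shorter data frame with NA columns, then rbind.
set <- data.frame(id = c("one", "two"), text = c("a", "b"),
                  stringsAsFactors = FALSE)
x <- data.frame(text = "Sample Report", stringsAsFactors = FALSE)

# Columns present in 'set' but absent from 'x'
missing_cols <- setdiff(names(set), names(x))
x[missing_cols] <- NA

# Reorder the columns of 'x' to match 'set' before binding
combined <- rbind(set, x[names(set)])
combined
```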
Increase the width of matrix printout
Andrie's answer is good, though sometimes one uses a super duper monitor and 9999 is not enough. ;-)
Here's my function for setting the display width:
setWidth <- function (width = NULL)
{
  if (is.null(width)) {
    columns <- as.numeric(Sys.getenv("COLUMNS"))
    if (!is.na(columns)) {
      options(width = columns)
    }
    else {
      options(width = 100)
    }
  }
  else {
    options(width = width)
  }
}
This has been addressed previously, though.
So, to improve on just changing the width, here is another trick I recommend: reduce the number of digits used in numeric output by setting options(digits = ...) to a smaller value. See ?options for more info.
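A small sketch of the digits trick, saving and restoring the previous setting so the change stays local. The value 4 is an arbitrary choice for illustration.

```r
# options() returns the previous values, so they can be restored later
old <- options(digits = 4)

shown <- format(pi)   # formatted with 4 significant digits
print(shown)

m <- matrix(c(pi, exp(1), sqrt(2), 1 / 3), nrow = 1)
print(m)              # narrower columns, so more of them fit per line

options(old)          # restore the earlier digits setting
```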
Count columns with certain values in data frame
Your reprex:
temp <- as.data.frame(
cbind(
c("x3", "x2", "x1", NA ),
c("x5", "x2", "x1", NA ),
c("x2", "x3", "x1", NA ),
c("x3", "x2", "x1", "x4"),
c("x1", "x2", NA , NA )
)
)
target <- c("x3", "x2", "x1")
Then if you want to check that the column only contains those 3 levels:
sum(sapply(temp, function(x) setequal(target, levels(x))))
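setequal() checks whether two sets are equal regardless of order. levels() (which works here because you didn't set stringsAsFactors = FALSE) tells you which distinct values are in the column. Note that since R 4.0.0 the default is stringsAsFactors = FALSE, so the columns are character vectors, levels() returns NULL, and this comparison is always FALSE.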
This will do the same thing:
sum(sapply(temp, function(x) setequal(target, na.omit(x))))
If you also want to check that each target value occurs exactly once (setequal() ignores duplicates), use identical() on the sorted vectors, along with as.character() to turn your factors back into characters:
sum(sapply(temp, function(x) {
  identical(sort(target), sort(as.character(na.omit(x))))
}))
(Or just set stringsAsFactors = FALSE in your original dataset and you won't have to use as.character() here.)
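For reference, here is a sketch of the set comparison with character columns, which is what you get by default in R 4.0.0 and later. The column names V1 to V5 and the variable matches are assumptions for illustration, and x[!is.na(x)] is used in place of na.omit() simply to drop the missing values.

```r
temp <- data.frame(
  V1 = c("x3", "x2", "x1", NA),
  V2 = c("x5", "x2", "x1", NA),
  V3 = c("x2", "x3", "x1", NA),
  V4 = c("x3", "x2", "x1", "x4"),
  V5 = c("x1", "x2", NA, NA),
  stringsAsFactors = FALSE
)
target <- c("x3", "x2", "x1")

# TRUE for columns whose non-missing values form exactly the target set
matches <- vapply(temp, function(x) setequal(target, x[!is.na(x)]), logical(1))
matches
sum(matches)
```

Only V1 and V3 match: V2 contains x5, V4 has the extra value x4, and V5 is missing x3.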
add recursive number with condition in dataframe R
It is not entirely clear from the description. Based on the expected output, one option is to create a list column by looping over 'id', generating a sequence of length 4 that starts at 'id' multiplied by 100, and then unnest the list column:
library(dplyr)
library(purrr)
library(tidyr)
df1 %>%
  mutate(id = map(id*100, seq, length.out = 4)) %>%
  unnest(c(id))
# A tibble: 8 x 3
# id word count
# <dbl> <chr> <int>
#1 100 aa 2
#2 101 aa 2
#3 102 aa 2
#4 103 aa 2
#5 200 bb 3
#6 201 bb 3
#7 202 bb 3
#8 203 bb 3
Another option is to replicate the rows (uncount) and then, grouped by 'word', modify the 'id':
df1 %>%
  uncount(4) %>%
  group_by(word) %>%
  mutate(id = seq(100 * first(id), length.out = n()))
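The same expansion can be sketched in base R without the tidyverse. The multiplier of 100 and the 4 rows per 'id' are read off the expected output, so treat them as assumptions; n_rep and expanded are illustrative names.

```r
df1 <- data.frame(id = 1:2, word = c("aa", "bb"), count = 2:3)
n_rep <- 4  # rows per original row (assumed from the expected output)

# Repeat every row n_rep times
expanded <- df1[rep(seq_len(nrow(df1)), each = n_rep), ]

# Offsets 0..(n_rep - 1) repeat for every original row
expanded$id <- expanded$id * 100 + rep(seq_len(n_rep) - 1, times = nrow(df1))
rownames(expanded) <- NULL
expanded
```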
data
df1 <- structure(list(id = 1:2, word = c("aa", "bb"), count = 2:3),
class = "data.frame", row.names = c("1",
"2"))
R View() does not display all columns of data frame
I also see this problem with x <- matrix(1:200,nrow=1); View(x) in RStudio, but not in vanilla R. It is a known limitation and they're working on it. You can contact the devs on their forum to give your feedback (and you have done so, I see).
How to rowbind two datasets with different number of columns using R
One way would be to create a new matrix ("m1") with appropriate dimensions, i.e. the nrow of "m1" is the sum of the rows of "one" and "two", and the ncol is the number of unique column names across both datasets. Create name indexes for the columns that are present in only one dataset ('onenm', 'twonm'), for all unique column names in both datasets ('nm2'), and for the names common to both ('nm1'). Using the appropriate row/column index, we can assign the elements from the 'one' and 'two' datasets to the newly created xts dataset ("xt1", created from "m1").
library(xts)
nm1 <- intersect(colnames(one), colnames(two))
onenm <- setdiff(colnames(one), colnames(two))
twonm <- setdiff(colnames(two), colnames(one))
nm2 <- union(colnames(one), colnames(two))
m1 <- matrix(0, nrow=nrow(one)+nrow(two), ncol=length(nm2),
dimnames=list(NULL, nm2))
xt1 <- xts(m1, order.by=c(index(one), index(two)))
xt1[index(one), onenm] <- one[,onenm]
xt1[index(two), twonm] <- two[,twonm]
xt1[,nm1] <- rbind(one[,nm1], two[,nm1])
dim(xt1)
#[1] 12 11
Update
You could also use rbindlist from data.table (or bind_rows from dplyr). Convert the xts objects to data.frame, place them in a list, and use rbindlist with the fill=TRUE option. Convert the output ('dt1') to xts ('xt2'), then change the NA values to 0.
library(data.table)
dt1 <- rbindlist(list(as.data.frame(one),
as.data.frame(two)), fill=TRUE)
#or
#library(dplyr)
#dt1 <- bind_rows(list(as.data.frame(one), as.data.frame(two)))
xt2 <- xts(dt1, order.by=c(index(one), index(two)))
xt2[is.na(xt2)] <- 0
identical(xt1, xt2)
#[1] TRUE
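To see the fill-by-name indexing in isolation, here is a sketch on two small plain matrices rather than xts objects. The matrices and the names all_cols and out are made up for illustration; with xts you would index rows by time, as in the code above, instead of by position.

```r
one <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("a", "b")))
two <- matrix(5:8, nrow = 2, dimnames = list(NULL, c("b", "c")))

# Zero-filled result with one column per unique column name
all_cols <- union(colnames(one), colnames(two))
out <- matrix(0, nrow = nrow(one) + nrow(two), ncol = length(all_cols),
              dimnames = list(NULL, all_cols))

# Each input fills only its own rows and its own named columns
out[seq_len(nrow(one)), colnames(one)] <- one
out[nrow(one) + seq_len(nrow(two)), colnames(two)] <- two
out
```

Cells that neither input covers (column "c" for the first two rows, column "a" for the last two) keep the initial 0.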