Merge error : negative length vectors are not allowed
You are getting this error because the data.frame / data.table created by the join has more than 2^31 - 1 rows (2,147,483,647).
Historically, because of the way vectors are constructed internally by R, the maximum length of any vector was 2^31 - 1 elements (see: https://stackoverflow.com/a/5234293/2341679). R 3.0.0 and later support longer atomic vectors on 64-bit builds, but a data.frame / data.table, which is really a list() of vectors, is still limited to 2^31 - 1 rows.
As other people have commented and answered, you unfortunately won't be able to construct this data.table, and it's likely there are that many rows because of duplicate matches between your two data.tables (these may or may not be intentional on your part).
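The arithmetic behind "duplicate matches" can be sketched outside R: every key contributes (rows with that key on the left) x (rows with that key on the right) matched rows, so a few heavily duplicated keys are enough to pass 2^31 - 1. The standalone C++ below is only an illustration under assumptions: the Key type stands in for (GVKEY, YEAR), the helper name join_row_count is hypothetical, and only matched rows are counted.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Stand-in for the join key (GVKEY, YEAR); an assumption for illustration.
using Key = std::pair<int, int>;

// Hypothetical helper: number of matched rows a join on Key would produce,
// i.e. the sum over keys of (count on the left) * (count on the right).
// The total is accumulated in 64 bits so it cannot wrap at 2^31 - 1.
std::int64_t join_row_count(const std::vector<Key>& left,
                            const std::vector<Key>& right) {
    // Count how often each key occurs on the left side.
    std::map<Key, std::int64_t> left_counts;
    for (const Key& k : left) {
        ++left_counts[k];
    }

    // Each right row pairs with every left row that shares its key.
    std::int64_t rows = 0;
    for (const Key& k : right) {
        auto it = left_counts.find(k);
        if (it != left_counts.end()) {
            rows += it->second;
        }
    }
    return rows;
}
```

If the returned count exceeds 2,147,483,647, the join result cannot be materialised as a single table.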
The good news is that if the duplicate matches are not errors and you still want to perform the join, there is a way around it: you just need to do whatever computation you wanted to do on the resulting data.table in the same call as the join, using the data.table[] operator, e.g.:
dt_left[dt_right, on = .(GVKEY, YEAR),
        j = .(sum(firm_related_wealth), mean(fracdirafterindep)),
        by = .EACHI]
If you're not familiar with the data.table syntax, you can perform calculations on columns within a data.table as shown above using the j argument. When performing a join using this syntax, computation in j is performed on the data.table created by the join.
The key here is the by = .EACHI argument. This breaks the join (and subsequent computation in j) down into smaller components: one data.table for each row in dt_right and its matches in dt_left, avoiding the problem of creating a data.table with > 2^31 - 1 rows.
Rcpp R vector size limit (negative length vectors are not allowed)
The problem is multiplication overflow. When you do
size * (size - 1) / 2
order of operations bites you, because
size * (size - 1)
can overflow even if the overall expression doesn't: for size = 47000 the product is 2,208,953,000, which exceeds the int maximum of 2,147,483,647.
We can see this by adding a printing statement:
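#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
IntegerVector test(int size) {
    // size * (size - 1) is evaluated first, in int, and can overflow
    int veclen = size * (size - 1) / 2;
    Rcpp::Rcout << veclen << std::endl;
    IntegerVector vec(veclen);
    return vec;
}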
vec <- test(47000)
# -1043007148
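The overflow can also be confirmed without relying on the wrapped value, by repeating the multiplication in 64-bit arithmetic and comparing the result with the int limits. This is a minimal standalone sketch in plain C++ (no Rcpp); the helper name product_overflows_int is hypothetical and not part of the original answer.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true if size * (size - 1) does not fit in a 32-bit int.
bool product_overflows_int(int size) {
    // Widen before multiplying so the product itself cannot overflow.
    std::int64_t n = size;
    std::int64_t product = n * (n - 1);
    return product > std::numeric_limits<std::int32_t>::max() ||
           product < std::numeric_limits<std::int32_t>::min();
}
```

For size = 47000 the helper returns true, because the product 2,208,953,000 is larger than 2,147,483,647.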
So, we can fix it by changing up how we do that operation:
// [[Rcpp::export]]
IntegerVector test(int size) {
    // Divide first so the intermediate value stays within int range
    int veclen = (size / 2) * (size - 1);
    Rcpp::Rcout << veclen << std::endl;
    IntegerVector vec(veclen);
    return vec;
}
which gives no issue
vec <- test(47000)
# 1104476500
str(vec)
# int [1:1104476500] 0 0 0 0 0 0 0 0 0 0 ...
Update: The problem with odd numbers
Eli Korvigo brings up an excellent point in the comments about integer division behavior with odd numbers. To illustrate, consider calling the function with the even number 4 and the odd number 5:
even <- 4
odd <- 5
even * (even - 1) / 2
# [1] 6
odd * (odd - 1) / 2
# [1] 10
It should create vectors of length 6 and 10 respectively.
But, what happens?
test(4)
# 6
# [1] 0 0 0 0 0 0
test(5)
# 8
# [1] 0 0 0 0 0 0 0 0
Oh no! 5 / 2 in integer division is 2, not 2.5, so this does not quite do what we want in the odd case.
However, luckily we can easily address this with some simple flow control:
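// [[Rcpp::export]]
IntegerVector test2(int size) {
    int veclen;
    if (size % 2 == 0) {
        // size is even, so size / 2 is exact
        veclen = (size / 2) * (size - 1);
    } else {
        // size is odd, so size - 1 is even and (size - 1) / 2 is exact
        veclen = size * ((size - 1) / 2);
    }
    Rcpp::Rcout << veclen << std::endl;
    IntegerVector vec(veclen);
    return vec;
}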
We can see this handles the odd and even cases both just fine:
test2(4)
# 6
# [1] 0 0 0 0 0 0
test2(5)
# 10
# [1] 0 0 0 0 0 0 0 0 0 0
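An alternative to branching on parity, offered here only as a sketch and not as part of the original answer, is to do the multiplication in a 64-bit type and divide afterwards. Because one of size and size - 1 is always even, the division is exact for both odd and even inputs. The standalone C++ below uses no Rcpp, and the names pair_count and fits_in_int are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: size * (size - 1) / 2 computed with a 64-bit
// intermediate, so the product cannot overflow for any 32-bit size.
std::int64_t pair_count(int size) {
    std::int64_t n = size;
    return n * (n - 1) / 2;  // exact: either n or n - 1 is even
}

// Hypothetical helper: true if len is non-negative and fits in a 32-bit int,
// i.e. it is safe to narrow back to int before using it as a vector length.
bool fits_in_int(std::int64_t len) {
    return len >= 0 && len <= std::numeric_limits<std::int32_t>::max();
}
```

With Rcpp, a value that passes the check could be narrowed to int and passed to the IntegerVector constructor as in test2 above. A value that fails the check could be reported as an error instead of producing a negative length.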
R - joining more than 2^31 rows with data.table
Update
- If you just want to query the common neighbors, I don't suggest you build up a huge look-up table. Instead, you can use the following code to get the result for your query:
find_common_neighbors <- function(g, Vs) {
  which(colSums(distances(g, Vs) == 1) == length(Vs))
}
such that
> find_common_neighbors(g, c(4, 8))
integer(0)
> find_common_neighbors(g, c(4, 5))
[1] 8
- If you need a look-up table, an alternative is to use Neighbours as the key to search its associated nodes, e.g.,
res <- transform(
  data.frame(Neighbours = which(degree(g) >= 2)),
  Nodes = sapply(
    Neighbours,
    function(x) toString(neighbors(g, x))
  )
)
Previous Answer
I think you can use ego over g directly to generate res, e.g.,
setNames(
  data.frame(
    t(do.call(
      cbind,
      lapply(
        Filter(function(x) length(x) > 2, ego(g, 1)),
        function(x) {
          rbind(combn(x[-1], 2), x[1])
        }
      )
    ))
  ),
  c("V1", "V2", "Neighbours")
)
which gives
  V1 V2 Neighbours
1  4  5          8
2  4 10          8
3  5 10          8
Not enough memory to row bind two large datasets
If the combined dataset can fit into memory, you could try combining the tables in a CSV via fwrite with append = TRUE.
library(data.table)
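# Write B's rows first, using the combined columns of A and B;
# the single row read from A only supplies A's columns and is dropped by [-1]
fwrite(
  rbindlist(
    list(
      fread("A.csv", nrows = 1L),
      fread("B.csv")
    ),
    fill = TRUE
  )[-1],
  "AB.csv"
)
# Append all of A's rows to the same file
fwrite(
  fread("A.csv"),
  "AB.csv",
  append = TRUE
)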
# maybe restart R here
AB <- fread("AB.csv", fill = TRUE)
iterate through 2 big dataframes with different length (if, else)
We can do a join with data.table to efficiently create the column 'Location':
library(data.table)
setDT(df1)[df2, Location := Location, on = .(ID)]