Error - replacement has [x] rows, data has [y]
You could use cut
df$valueBin <- cut(df$value, c(-Inf, 250, 500, 1000, 2000, Inf),
labels=c('<=250', '250-500', '500-1,000', '1,000-2,000', '>2,000'))
data
set.seed(24)
df <- data.frame(value= sample(0:2500, 100, replace=TRUE))
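As a quick check of how the bins behave at their edges, here is a small sketch. The values in edge_values are made up for illustration and are not part of the original data. With cut's default right = TRUE each interval is closed on the right, so 250 should land in '<=250' and 251 in '250-500'. Since cut returns one value per input element, the result has the same length as the column, so the assignment does not raise the row-count error.

```r
# Same breaks and labels as in the answer above
breaks <- c(-Inf, 250, 500, 1000, 2000, Inf)
labels <- c('<=250', '250-500', '500-1,000', '1,000-2,000', '>2,000')

# Hypothetical test values sitting on and just past the bin boundaries
edge_values <- c(0, 250, 251, 2000, 2001)

bins <- cut(edge_values, breaks, labels = labels)

# One bin per input value, so lengths always agree
length(bins) == length(edge_values)
as.character(bins)
```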
Error in `$<-.data.frame`: replacement has X rows, data has Y
Try a dplyr approach to replacing the strange values:
library(dplyr)
dat %>%
dplyr::mutate(
FixedGender = dplyr::case_when(Gender == "¦ֳ«ֳ" ~ "Male",
Gender == "°ֳ·ֳ¡ֳ₪" ~ "Female",
TRUE ~ as.character(Gender))) %>%
select(Gender, FixedGender) # This line is just to compare the two side-by-side
How to fix Error in `$<-.data.frame`: replacement has x rows, data has y?
In your code the subset function $ looks for a column named i instead of evaluating i. You can choose to subset the data.frame differently, either with [, i] or [[i]]:
x <- data.frame(x = c(10,20,30), y = c("yes", "no", "no"), z = c("Big", "Small", "Average"))
# here is a vector that we are going to use inside our if statement
column_factor_names <- c("y", "z")
# for each column in x
for (i in names(x)) {
  print(i)
  # if the column is listed in column_factor_names, convert it into a factor, else convert it into an integer
  if (i %in% column_factor_names) {
    print("it's a factor")
    x[[i]] <- as.factor(x[[i]])
  } else {
    print("it's an integer")
    x[[i]] <- as.integer(x[[i]])
  }
}
See help("$") for more information.
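To make the difference concrete, here is a minimal sketch with a cut-down version of the data frame above. The object names via_dollar, via_brackets and msg are only for illustration. x$i looks for a column literally called i and returns NULL, while x[[i]] uses the value stored in i. Converting that NULL gives a zero-length factor, which is where the "replacement has 0 rows" message comes from.

```r
x <- data.frame(x = c(10, 20, 30), y = c("yes", "no", "no"))
i <- "y"

# `$` does not evaluate i: it looks for a column named "i", which does not exist
via_dollar <- x$i
is.null(via_dollar)

# `[[` evaluates i and returns column "y"
via_brackets <- x[[i]]
via_brackets

# Assigning the converted NULL back reproduces the error; capture its message
msg <- tryCatch({
  x$i <- as.factor(x$i)
  "no error"
}, error = function(e) conditionMessage(e))
msg
```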
If you don't mind losing the status messages, you could also convert the factor columns without a loop:
x[column_factor_names] <- lapply(x[column_factor_names], as.factor)
Adding column to df: Error in `$<-.data.frame`: replacement has x rows, data has 153
You guessed correctly that the problem is that lm removes the missing values, so the result vector is the wrong length and R doesn't know how to add it back into the data frame.
You have a few options:
(1) use a modelling function that can live with missing values, such as xgboost
(2) impute a value for the missing data
(3) leave the model as is, but then the predictions are undefined where there is missing data.
(1) and (2) you could write a whole book about, but to achieve (3) you can do the following:
df$result <- NA ## actually, this line is not necessary
df$result[complete.cases(df[,c("Ozone","Temp")])] <- ozone.ols$residuals
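A self-contained sketch of option (3) follows. It assumes that df is the built-in airquality data set (which has 153 rows and the columns Ozone and Temp) and that ozone.ols was fitted as lm(Ozone ~ Temp); the question's actual model call is not shown here, so treat both as assumptions.

```r
# Assumption: the data frame in the question is the built-in airquality data
df <- airquality

# Assumption: the model was fitted like this; lm drops rows with missing values
ozone.ols <- lm(Ozone ~ Temp, data = df)

# Fewer residuals than rows, which is what triggers the error on direct assignment
n_resid <- length(ozone.ols$residuals)
n_resid < nrow(df)

# Rows that lm actually used
keep <- complete.cases(df[, c("Ozone", "Temp")])

# Fill only those rows; the others stay NA
df$result <- NA
df$result[keep] <- ozone.ols$residuals
```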
Error message in R: replacement has (x) rows, data has (y)
The error you are encountering is in relation to your subset operation: db$Type["Main Session"] = "Main Training".
Using the iris dataset in R we can reproduce this error:
str(iris)
#> 'data.frame': 150 obs. of 5 variables:
#> $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
class(iris$Species)
#> [1] "factor"
iris$Species<- as.character(iris$Species)
iris$Species["setosa"] <- "new name"
#> Error in `$<-.data.frame`(`*tmp*`, Species, value = structure(c("setosa", : replacement has 151 rows, data has 150
Created on 2018-09-03 by the reprex package (v0.2.0).
Inside the square brackets you need to subset the vector using a logical operation (i.e. one that evaluates to TRUE or FALSE).
str(iris)
#> 'data.frame': 150 obs. of 5 variables:
#> $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
iris$Species<- as.character(iris$Species)
unique(iris$Species)
#> [1] "setosa" "versicolor" "virginica"
iris$Species[iris$Species == "setosa"] <- "new name"
unique(iris$Species)
#> [1] "new name" "versicolor" "virginica"
Created on 2018-09-03 by the reprex package (v0.2.0).
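The "151 rows" in the error comes from how R treats a character index on an unnamed vector: it does not match on values, it appends a new element under that name. A minimal sketch with a short stand-in vector (species is a made-up name, not the iris column):

```r
species <- c("setosa", "versicolor", "virginica")

# "setosa" is used as an element NAME here, not matched against the values,
# so a fourth element is appended and the original values are untouched
species["setosa"] <- "new name"

length(species)
names(species)
unname(species)
```

With the 150-element Species column the same thing happens, giving a 151-element vector that no longer fits the data frame.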
Getting R error replacement has x rows, data has y when creating a new boolean column in dataframe based on matches with a different dataframe
In base R
transform(g0, g_eGenes_nsPre = apply(g0, 1, function(x)
as.integer(x["gene_id_name"] %in% nsPre$gene_id_name)))
# gene_id_name pLI g_eGenes_general g_eGenes_nsPre
#1 ENSG00000005020|SKAP2 0.008230 0 1
#2 ENSG00000039319|ZFYVE16 0.121040 0 1
#3 ENSG00000087884|AAMDC 0.135390 1 0
#4 ENSG00000027869|SH2D2A 0.002489 1 1
#5 ENSG00000124608|AARS2 0.325000 0 0
Instead of as.integer you can also use the unary + operator.
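Since %in% is already vectorised, the row-wise apply is not strictly needed in base R. The sketch below rebuilds the sample data with data.frame() so it runs on its own, and assigns the whole column in one step. This is an alternative spelling of the same idea, not the only way to do it.

```r
nsPre <- data.frame(gene_id_name = c("ENSG00000005020|SKAP2",
                                     "ENSG00000017260|ATP2C1",
                                     "ENSG00000027869|SH2D2A",
                                     "ENSG00000039319|ZFYVE16"))

g0 <- data.frame(gene_id_name = c("ENSG00000005020|SKAP2",
                                  "ENSG00000039319|ZFYVE16",
                                  "ENSG00000087884|AAMDC",
                                  "ENSG00000027869|SH2D2A",
                                  "ENSG00000124608|AARS2"),
                 pLI = c(0.00823, 0.12104, 0.13539, 0.002489, 0.32500),
                 g_eGenes_general = c(0, 0, 1, 1, 0))

# %in% returns one TRUE/FALSE per row of g0, so the lengths always match
g0$g_eGenes_nsPre <- as.integer(g0$gene_id_name %in% nsPre$gene_id_name)
g0
```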
Or using dplyr
library(dplyr)
g0 %>%
mutate(g_eGenes_nsPre = +(gene_id_name %in% nsPre$gene_id_name))
# gene_id_name pLI g_eGenes_general g_eGenes_nsPre
#1 ENSG00000005020|SKAP2 0.008230 0 1
#2 ENSG00000039319|ZFYVE16 0.121040 0 1
#3 ENSG00000087884|AAMDC 0.135390 1 0
#4 ENSG00000027869|SH2D2A 0.002489 1 1
#5 ENSG00000124608|AARS2 0.325000 0 0
Or using data.table
library(data.table)
setDT(g0)[, g_eGenes_nsPre := +(gene_id_name %in% nsPre$gene_id_name)]
Sample data
nsPre <- read.table(text =
"gene_id_name
ENSG00000005020|SKAP2
ENSG00000017260|ATP2C1
ENSG00000027869|SH2D2A
ENSG00000039319|ZFYVE16", header = T)
g0 <- read.table(text =
"gene_id_name pLI g_eGenes_general
ENSG00000005020|SKAP2 0.00823 0
ENSG00000039319|ZFYVE16 0.12104 0
ENSG00000087884|AAMDC 0.13539 1
ENSG00000027869|SH2D2A 0.002489 1
ENSG00000124608|AARS2 0.32500 0", header = T)
How to fix 'Replacement has [x] rows, data has [y]' error within custom ggplot2 function?
You don't need to pass the whole data frame and vectors separately (see comment above). If you want to be flexible on variable names, the quickest way to fix this might be:
niceViolin <- function (Group, Response, ManualColour=F, ylabel, compare=F, comp1=NULL, comp2=NULL) {
Data <- data.frame(Group, Response)
And then call the function as follows:
niceViolin(Group = Dataset$Condition, Response = Dataset$Outcome, ManualColour = F, ylabel = "Dependent Variable", compare = T, comp1 = 1, comp2 = 2)
Replacement has 0 rows, data has 25 error
There are several things that appear wrong with your functions.
makeCounts is referencing pswd, but Final_DF has Pswd and pswd_length. R is doing a partial match for pswd, and I'm guessing that it is not the one you want. Let's prove what it is using, first by setting an option [1]:
options(warnPartialMatchDollar = TRUE) # see ?options
worst.ct <- sapply(worst.pass, makeCounts, simplify=FALSE)
# Warning in Final_DF$pswd : partial match of 'pswd' to 'pswd_length'
# Warning: partial match of 'pswd' to 'pswd_length'
# Warning: partial match of 'pswd' to 'pswd_length'
# Warning: partial match of 'pswd' to 'pswd_length'
# Warning: partial match of 'pswd' to 'pswd_length'
### ...repeated...
Worse, if you look at this variable (part of troubleshooting your problem is to check the variables you are making and using), you'll see that it is effectively empty/useless, where all values are 0:
str(worst.ct)
# List of 25
# $ password :List of 1
# ..$ count: int 0
# $ 123456 :List of 1
# ..$ count: int 0
# $ 12345678 :List of 1
# ..$ count: int 0
# $ qwerty :List of 1
# ..$ count: int 0
### ...truncated...
If you change your function to use the correct column name, it provides no such warning, and it does contain some non-zero elements:
makeCounts <- function(x) {
return(x=list("count"=sum(grepl(x, Final_DF$Pswd, ignore.case=TRUE))))
}
table(unlist(worst.ct))
# 0 1
# 19 6
str(worst.ct)
# List of 25
# $ password :List of 1
# ..$ count: int 1
# $ 123456 :List of 1
# ..$ count: int 0
# $ 12345678 :List of 1
# ..$ count: int 0
# $ qwerty :List of 1
# ..$ count: int 0
### ...truncated...
Within your printCounts function, you are referencing nrow(Final_DF$Pswd), which is always going to produce NULL. Have you tried this?
nrow(Final_DF$Pswd)
# NULL
nrow(Final_DF)
# [1] 50
Instead, rewrite that line to be
tmp$Percent <- sprintf("%3.2f%%", ((tmp$Count / nrow(Final_DF) * 100)))
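The chain from nrow() returning NULL to the "replacement has 0 rows" error can be traced with a small sketch. The data frame tmp and the vector passwords below are made-up stand-ins, not the objects from the question.

```r
# Hypothetical stand-ins for tmp and Final_DF$Pswd
tmp <- data.frame(Term = c("a", "b", "c"), Count = c(1, 0, 2))
passwords <- c("hunter2", "letmein", "qwerty", "dragon")

# nrow() of a plain vector is NULL; length() is what gives the element count
n_rows <- nrow(passwords)
is.null(n_rows)

# Dividing by NULL yields a zero-length numeric, so sprintf returns character(0)
pct <- sprintf("%3.2f%%", tmp$Count / n_rows * 100)
length(pct)

# Assigning a zero-length vector as a column reproduces the error
msg <- tryCatch({
  tmp$Percent <- pct
  "no error"
}, error = function(e) conditionMessage(e))
msg
```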
Not a syntax error, but your function relying on a variable that is neither defined within it nor passed to it is bad practice: it means the function can behave differently when the same parameters are passed to it, which breaks reproducibility (and it can make troubleshooting rather difficult).
I suggest making Final_DF an argument for the function, and passing it every time.
printCounts <- function(ct, Final_DF) {
tmp <- data.frame(Term=names(ct), Count=as.numeric(unlist(ct)))
tmp$Percent <- sprintf("%3.2f%%", ((tmp$Count / nrow(Final_DF) * 100)))
print(tmp[order(-tmp$Count),], row.names=FALSE)
}
printCounts(worst.ct)
# Error in nrow(Final_DF) : argument "Final_DF" is missing, with no default
printCounts(worst.ct, Final_DF) # no error here
For this case, I'm recommending that you do not provide a default value for it. This also enables you to use the same function with different "final" frames of passwords, in case you are testing (unit-testing) or testing (train/test sampling) or testing (troubleshooting).
After those changes, I get this:
printCounts(worst.ct, Final_DF)
# Term Count Percent
# password 1 2.00%
# monkey 1 2.00%
# dragon 1 2.00%
# iloveyou 1 2.00%
# superman 1 2.00%
# sunshine 1 2.00%
# 123456 0 0.00%
# 12345678 0 0.00%
# qwerty 0 0.00%
# abc123 0 0.00%
# 1234567 0 0.00%
# Qwertyuiop 0 0.00%
# 123 0 0.00%
# 000000 0 0.00%
# 1111111 0 0.00%
# 1234 0 0.00%
# 12345 0 0.00%
# 1234567890 0 0.00%
# 1q2w3e4r5t 0 0.00%
# ashely 0 0.00%
# shadow 0 0.00%
# 123123 0 0.00%
# 654321 0 0.00%
# tinkle 0 0.00%
# football 0 0.00%
Note:
I have options(warnPartialMatchDollar=TRUE, warnPartialMatchAttr=TRUE) set in my ~/.Rprofile (and any project-specific .Rprofile init file) for just this reason: the $ silently does partial matching, and this can be very problematic. With the warning, at least you can see what R is inferring in the background. There is a third option, warnPartialMatchArgs, that has the same intent ... but way too many package authors out there are inadvertently relying on this behavior, so lacking the time/ability to fix them all, I have chosen to muffle this noise-maker.
Especially if this partial-matching behavior is a surprise to you, I strongly encourage you to set the first two options yourself. In the best-case, it produces no warnings and you have the comfort of knowing that you are taking steps to produce more resilient code; at worst, it is noisy and you eventually get tired of the noise and fix the lazy code.
See ?options for these three among many other available options. (Packages can set their own options as well; an option is similar in concept to Windows' registry, for better or worse, in that it is global to R, and can have arbitrary keys and values.)
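For readers who have not seen $ partial matching before, here is a minimal sketch. The two-row Final_DF below is a made-up stand-in with the same column names as in the question; its values are invented.

```r
# Hypothetical stand-in: same column names as in the question, invented values
Final_DF <- data.frame(Pswd = c("password1", "monkey"),
                       pswd_length = c(9L, 6L))

# There is no column called "pswd". Matching is case-sensitive, so `$` silently
# falls back to the only column that starts with "pswd": pswd_length
via_dollar <- Final_DF$pswd
via_dollar

# `[[` matches exactly by default and returns NULL for the misspelt name
via_brackets <- Final_DF[["pswd"]]
is.null(via_brackets)
```

Using [[ ]] with the full column name is one way to make such typos visible right away.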