Error - Replacement Has [X] Rows, Data Has [Y]

Error - replacement has [x] rows, data has [y]

You could use cut

 df$valueBin <- cut(df$value, c(-Inf, 250, 500, 1000, 2000, Inf), 
    labels=c('<=250', '250-500', '500-1,000', '1,000-2,000', '>2,000'))

data

 set.seed(24)
 df <- data.frame(value= sample(0:2500, 100, replace=TRUE))
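
For context, the error appears whenever the value assigned to a column has a length that is neither 1 nor a divisor of the number of rows. The sketch below uses a made-up data frame named demo (not from the question) to trigger the error and then shows why cut avoids it: cut returns exactly one label per input value.

```r
# Hypothetical three-row data frame, for illustration only
demo <- data.frame(value = c(100, 600, 2500))

# Assigning two values to a three-row data frame triggers the error
msg <- tryCatch({
  demo$bin <- c("low", "high")
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# cut() returns one label per input value, so the lengths always match
demo$bin <- cut(demo$value, c(-Inf, 250, 500, 1000, 2000, Inf),
                labels = c("<=250", "250-500", "500-1,000", "1,000-2,000", ">2,000"))
print(demo)
```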

Error in `$<-.data.frame` ..... replacement has X rows, data has Y

Try a dplyr approach to replacing the strange values:

library(dplyr)

dat %>%
    dplyr::mutate(
        FixedGender = dplyr::case_when(Gender == "¦ֳ«ֳ" ~ "Male",
                                       Gender == "°ֳ·ֳ¡ֳ₪" ~ "Female",
                                       TRUE ~ as.character(Gender))) %>%
    select(Gender, FixedGender) # This line is just to compare the two side-by-side
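
If dplyr is not available, roughly the same recoding can be sketched in base R with a named lookup vector. This is a different technique from case_when, shown only as an alternative. The data frame and the codes "code_m" and "code_f" are hypothetical stand-ins for the mis-encoded strings in the question.

```r
# Hypothetical data: "code_m" / "code_f" stand in for the mis-encoded values
dat <- data.frame(Gender = c("code_m", "code_f", "Other", "code_m"),
                  stringsAsFactors = FALSE)

# Named vector: names are the strange values, elements are the replacements
lookup <- c(code_m = "Male", code_f = "Female")

# Look up each value; entries without a match come back as NA
fixed <- unname(lookup[dat$Gender])

# Keep the original value wherever no replacement was found
dat$FixedGender <- ifelse(is.na(fixed), dat$Gender, fixed)
print(dat)
```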

How to fix Error in `$<-.data.frame` replacement has x rows, data has y?

In your code the subset function $ looks for a column named i instead of evaluating i. You can choose to subset the data.frame differently either with [, i] or [[i]]:

x <- data.frame(x = c(10,20,30), y = c("yes", "no", "no"), z = c("Big", "Small", "Average"))

# here is a vector that we are going to use inside our if statement
column_factor_names <- c("y", "z")

# for each column in x
for (i in names(x)) {

  print(i)

  # if it's a factor, convert into factor, else convert it into integer

  if (i %in% column_factor_names) {
    print("it's a factor")
    x[[i]] <- as.factor(x[[i]])
  } else {
    print("it's an integer")
    x[[i]] <- as.integer(x[[i]])
  }
}

See help("$") for more information.
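
A minimal sketch of the difference, using a small made-up data frame: with i holding a column name, `$` looks for a column literally called "i", finds nothing, and returns NULL, while `[[` evaluates i first. Converting that NULL and assigning it back is what produces the zero-row replacement.

```r
x <- data.frame(x = c(10, 20, 30), y = c("yes", "no", "no"))
i <- "y"

# `$` does not evaluate i: there is no column named "i", so this is NULL
print(is.null(x$i))

# `[[` and `[, ]` evaluate i and return column "y"
print(identical(x[[i]], x[, i]))

# as.factor(NULL) has length 0, so the assignment fails
msg <- tryCatch({
  x$i <- as.factor(x$i)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```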

If you don't mind losing the status messages, you could also convert the factor columns without a loop:

x[column_factor_names] <- lapply(x[column_factor_names], as.factor)

Adding column to df: Error in `$<-.data.frame`: replacement has x rows, data has 153

You guessed correctly that the problem is that lm removes the missing values, so the result vector is the wrong length and R doesn't know how to add it back into the data frame.

You have a few options:
(1) use a modelling function that can live with missing values, such as xgboost
(2) impute a value for the missing data
(3) leave the model as is, but then the predictions are undefined where there is missing data.

(1) and (2) you could write a whole book about, but to achieve (3) you can do the following:

df$result <- NA ## actually, this line is not necessary
df$result[complete.cases(df[,c("Ozone","Temp")])] <- ozone.ols$residuals
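
Another way to get to (3), sketched here under the assumption that the data is the built-in airquality set (153 rows, with missing Ozone values) and that the model is Ozone ~ Temp, is to fit with na.action = na.exclude. The residuals() extractor then pads the dropped rows with NA, so the result has one entry per row. The names fit, res and aq are illustrative.

```r
# Built-in airquality data: Ozone contains missing values
fit <- lm(Ozone ~ Temp, data = airquality, na.action = na.exclude)

# residuals() pads the rows dropped from the fit with NA;
# fit$residuals is NOT padded and stays shorter than the data
res <- residuals(fit)
print(length(res) == nrow(airquality))
print(length(fit$residuals))

# The padded vector can be added as a column directly
aq <- airquality
aq$result <- res
print(head(aq))
```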

Error message in R: replacement has (x) rows, data has (y)

The error you are encountering is in relation to your subset operation: db$Type["Main Session"] = "Main Training".

Using the iris dataset in R we can reproduce this error:

str(iris)
#> 'data.frame':    150 obs. of  5 variables:
#>  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#>  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#>  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
class(iris$Species)
#> [1] "factor"
iris$Species<- as.character(iris$Species)
iris$Species["setosa"] <- "new name"
#> Error in `$<-.data.frame`(`*tmp*`, Species, value = structure(c("setosa", : replacement has 151 rows, data has 150

Created on 2018-09-03 by the reprex package (v0.2.0).

Inside the square brackets you need to subset the vector using a logical operation (i.e. one that evaluates to TRUE or FALSE).

str(iris)
#> 'data.frame':    150 obs. of  5 variables:
#>  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#>  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#>  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
iris$Species<- as.character(iris$Species)
unique(iris$Species)
#> [1] "setosa"     "versicolor" "virginica"
iris$Species[iris$Species == "setosa"] <- "new name"
unique(iris$Species)
#> [1] "new name"   "versicolor" "virginica"

Created on 2018-09-03 by the reprex package (v0.2.0).
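
The extra row in "replacement has 151 rows" comes from how a character index works: it selects by element name, and when no element has that name, a new element is appended. A small sketch on a made-up two-element vector shows both behaviours side by side.

```r
v <- c("setosa", "versicolor")   # unnamed character vector

# Character index: no element is NAMED "setosa", so a new one is appended
v_name <- v
v_name["setosa"] <- "new name"
print(length(v_name))

# Logical index: matching elements are replaced in place
v_lgl <- v
v_lgl[v_lgl == "setosa"] <- "new name"
print(v_lgl)
```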

Getting R error replacement has x rows, data has y when creating a new boolean column in dataframe based on matches with a different dataframe

In base R

transform(g0, g_eGenes_nsPre = apply(g0, 1, function(x)
    as.integer(x["gene_id_name"] %in% nsPre$gene_id_name)))
#             gene_id_name      pLI g_eGenes_general g_eGenes_nsPre
#1   ENSG00000005020|SKAP2 0.008230                0              1
#2 ENSG00000039319|ZFYVE16 0.121040                0              1
#3   ENSG00000087884|AAMDC 0.135390                1              0
#4  ENSG00000027869|SH2D2A 0.002489                1              1
#5   ENSG00000124608|AARS2 0.325000                0              0

Instead of as.integer you can also use the unary + operator.
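
As a quick check of that equivalence, here is a sketch with made-up ids. It also relies on %in% being vectorised, so the whole column can be tested in one call without apply.

```r
ids   <- c("a", "b", "c")   # hypothetical ids to look up
known <- c("a", "c")        # hypothetical reference set

# %in% is vectorised: one logical value per element of ids
hit <- ids %in% known

# Unary + converts the logical vector to integer, just like as.integer()
print(+hit)
print(identical(+hit, as.integer(hit)))
```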

Or using dplyr

library(dplyr)
g0 %>%
    mutate(g_eGenes_nsPre = +(gene_id_name %in% nsPre$gene_id_name))
#             gene_id_name      pLI g_eGenes_general g_eGenes_nsPre
#1   ENSG00000005020|SKAP2 0.008230                0              1
#2 ENSG00000039319|ZFYVE16 0.121040                0              1
#3   ENSG00000087884|AAMDC 0.135390                1              0
#4  ENSG00000027869|SH2D2A 0.002489                1              1
#5   ENSG00000124608|AARS2 0.325000                0              0

Or using data.table

library(data.table)
setDT(g0)[, g_eGenes_nsPre := +(gene_id_name %in% nsPre$gene_id_name)]

Sample data

nsPre <- read.table(text =
    "gene_id_name
ENSG00000005020|SKAP2
ENSG00000017260|ATP2C1
ENSG00000027869|SH2D2A
ENSG00000039319|ZFYVE16", header = TRUE)

g0 <- read.table(text =
    "gene_id_name            pLI       g_eGenes_general
ENSG00000005020|SKAP2   0.00823   0
ENSG00000039319|ZFYVE16 0.12104   0
ENSG00000087884|AAMDC   0.13539   1
ENSG00000027869|SH2D2A  0.002489  1
ENSG00000124608|AARS2   0.32500   0", header = TRUE)

How to fix 'Replacement has [x] rows, data has [y]' error within custom ggplot2 function?

You don't need to pass the whole data frame and vectors separately (see comment above). If you want to be flexible on variable names, the quickest way to fix this might be:

niceViolin <- function (Group, Response, ManualColour=F, ylabel, compare=F, comp1=NULL, comp2=NULL) {
  Data <- data.frame(Group, Response)

And then call the function as follows:

niceViolin(Group = Dataset$Condition, Response = Dataset$Outcome, ManualColour = F, ylabel = "Dependent Variable", compare = T, comp1 = 1, comp2 = 2)

Replacement has 0 rows, data has 25 error

There are several things that appear wrong with your functions.

makeCounts is referencing pswd, but Final_DF has Pswd and pswd_length. R is doing a partial match for pswd, and I'm guessing that the column it picks is not the one you want. Let's prove which one it is using, first by setting an option[1]:

options(warnPartialMatchDollar = TRUE) # see ?options
worst.ct <- sapply(worst.pass, makeCounts, simplify=FALSE)
# Warning in Final_DF$pswd : partial match of 'pswd' to 'pswd_length'
# Warning: partial match of 'pswd' to 'pswd_length'
# Warning: partial match of 'pswd' to 'pswd_length'
# Warning: partial match of 'pswd' to 'pswd_length'
# Warning: partial match of 'pswd' to 'pswd_length'
### ...repeated...
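
The partial match can be reproduced on a tiny stand-in for Final_DF. The two rows below are made up; only the column names follow the question. Note that `$` is case-sensitive, so pswd is a prefix of pswd_length but not of Pswd, whereas `[[` matches exactly by default and returns NULL.

```r
# Hypothetical rows; only the column names mirror the question
Final_DF <- data.frame(Pswd = c("password1", "dragon99"),
                       pswd_length = c(9L, 8L))

# `$` partially matches "pswd" to "pswd_length"
print(Final_DF$pswd)

# `[[` uses exact matching by default, so there is no such column
print(is.null(Final_DF[["pswd"]]))
```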

Worse, if you look at this variable (part of troubleshooting your problem is to check the variables you are making and using), you'll see that it is effectively empty/useless, where all values are 0:

str(worst.ct)
# List of 25
#  $ password  :List of 1
#   ..$ count: int 0
#  $ 123456    :List of 1
#   ..$ count: int 0
#  $ 12345678  :List of 1
#   ..$ count: int 0
#  $ qwerty    :List of 1
#   ..$ count: int 0
### ...truncated...

If you change your function to use the correct column name, it provides no such warning, and it does contain some non-zero elements:

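makeCounts <- function(x) {
  list(count = sum(grepl(x, Final_DF$Pswd, ignore.case = TRUE)))
}
# re-run with the corrected function before inspecting the result
worst.ct <- sapply(worst.pass, makeCounts, simplify=FALSE)
table(unlist(worst.ct))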
#  0  1 
# 19  6 

str(worst.ct)
# List of 25
#  $ password  :List of 1
#   ..$ count: int 1
#  $ 123456    :List of 1
#   ..$ count: int 0
#  $ 12345678  :List of 1
#   ..$ count: int 0
#  $ qwerty    :List of 1
#   ..$ count: int 0
### ...truncated...

Within your printCounts function, you are referencing nrow(Final_DF$Pswd), which is always going to produce NULL. Have you tried this?
```
nrow(Final_DF$Pswd)
# NULL
nrow(Final_DF)
# [1] 50
```
Instead, rewrite that line to be
```
  tmp$Percent <- sprintf("%3.2f%%", ((tmp$Count / nrow(Final_DF) * 100)))
```
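
To see how the NULL from nrow turns into "replacement has 0 rows", here is a sketch with made-up counts: arithmetic with NULL yields a zero-length vector, sprintf then returns character(0), and assigning that to a column fails. The names tmp and passwords are illustrative.

```r
# Hypothetical counts, standing in for the tmp data frame
tmp <- data.frame(Term = c("a", "b", "c"), Count = c(1, 0, 2))
passwords <- c("x", "y")   # a plain vector, like Final_DF$Pswd

# nrow() of a vector is NULL because a vector has no dimensions
n <- nrow(passwords)
print(is.null(n))

# Dividing by NULL gives a zero-length result, and so does sprintf()
pct <- sprintf("%3.2f%%", tmp$Count / n * 100)
print(length(pct))

# Assigning the zero-length vector to a column raises the error
msg <- tryCatch({
  tmp$Percent <- pct
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```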
This is not a syntax error, but a function that relies on a variable neither defined within it nor passed to it is bad practice: the function can behave differently when called with the same arguments, which breaks reproducibility and can make troubleshooting rather difficult.
I suggest making Final_DF an argument of the function and passing it every time.
```
printCounts <- function(ct, Final_DF) {
  tmp <- data.frame(Term=names(ct), Count=as.numeric(unlist(ct)))
  tmp$Percent <- sprintf("%3.2f%%", ((tmp$Count / nrow(Final_DF) * 100)))
  print(tmp[order(-tmp$Count),], row.names=FALSE)
}

printCounts(worst.ct)
# Error in nrow(Final_DF) : argument "Final_DF" is missing, with no default

printCounts(worst.ct, Final_DF) # no error here
```
For this case, I recommend that you do not provide a default value for it. This also enables you to use the same function with different "final" frames of passwords, in case you are testing (unit-testing) or testing (train/test sampling) or testing (troubleshooting).

After those changes, I get this:

printCounts(worst.ct, Final_DF)
#        Term Count Percent
#    password     1   2.00%
#      monkey     1   2.00%
#      dragon     1   2.00%
#    iloveyou     1   2.00%
#    superman     1   2.00%
#    sunshine     1   2.00%
#      123456     0   0.00%
#    12345678     0   0.00%
#      qwerty     0   0.00%
#      abc123     0   0.00%
#     1234567     0   0.00%
#  Qwertyuiop     0   0.00%
#         123     0   0.00%
#      000000     0   0.00%
#     1111111     0   0.00%
#        1234     0   0.00%
#       12345     0   0.00%
#  1234567890     0   0.00%
#  1q2w3e4r5t     0   0.00%
#      ashely     0   0.00%
#      shadow     0   0.00%
#      123123     0   0.00%
#      654321     0   0.00%
#      tinkle     0   0.00%
#    football     0   0.00%

Note:

I have options(warnPartialMatchDollar=TRUE, warnPartialMatchAttr=TRUE) set in my ~/.Rprofile (and any project-specific .Rprofile init file) for just this reason: the $ silently does partial matching, and this can be very problematic. With the warning, at least you can see what R is inferring in the background. There is a third option, warnPartialMatchArgs, that has the same intent ... but way too many package authors out there are inadvertently relying on this behavior, so lacking the time/ability to fix them all, I have chosen to muffle this noise-maker.
Especially if this partial-matching behavior is a surprise to you, I strongly encourage you to set the first two options yourself. In the best case, it produces no warnings and you have the comfort of knowing that you are taking steps to produce more resilient code; at worst, it is noisy and you eventually get tired of the noise and fix the lazy code.
See ?options for these three among many other available options. (Packages can set their own options as well; an option is similar in concept to Windows' registry, for better or worse, in that it is global to R, and can have arbitrary keys and values.)
