R Function for Returning All Factors

R Function for returning ALL factors

To follow up on my comment (thanks to @Ramnath for catching my typo), the brute-force method seems to work reasonably well here on my 64-bit machine with 8 GB of RAM:

FUN <- function(x) {
    x <- as.integer(x)
    div <- seq_len(abs(x))
    factors <- div[x %% div == 0L]
    factors <- list(neg = -factors, pos = factors)
    return(factors)
}

A few examples:

> FUN(100)
$neg
[1]   -1   -2   -4   -5  -10  -20  -25  -50 -100

$pos
[1]   1   2   4   5  10  20  25  50 100

> FUN(-42)
$neg
[1]  -1  -2  -3  -6  -7 -14 -21 -42

$pos
[1]  1  2  3  6  7 14 21 42

# and a big number

> system.time(FUN(1e8))
   user  system elapsed 
   1.95    0.18    2.14
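
If the brute-force scan over every candidate ever gets too slow, one possible refinement is to test divisors only up to the square root and derive the matching cofactors. This is a minimal sketch, not part of the original answer; the name `all_factors` is made up for illustration.

```r
# Hypothetical helper: same output shape as FUN(), but only
# tests candidate divisors up to sqrt(|x|).
all_factors <- function(x) {
  x <- abs(as.integer(x))
  if (x == 0L) {
    # seq_len(0) is empty, so mirror FUN(0) and return empty vectors
    return(list(neg = integer(0), pos = integer(0)))
  }
  small <- seq_len(floor(sqrt(x)))
  small <- small[x %% small == 0L]            # divisors <= sqrt(x)
  pos <- sort(unique(c(small, x %/% small)))  # add the paired cofactors
  list(neg = -pos, pos = pos)
}

all_factors(100)$pos
all_factors(-42)$pos
```

Every divisor `d` below the square root has a partner `x / d` above it, which is why scanning the lower half is enough.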

Return factor level from a function, not an integer in R

ifelse() drops the factor class and returns the underlying integer codes, so you can use those codes to index the levels of the variable classes and rebuild the factor, as follows:

data <- data.frame(a = 1:10)

find_class <- function(i) {

  classes <- factor(c('A', 'B', 'C'))

  idx <- ifelse(i %in% c(1, 3, 5), classes[1],
                ifelse(i %in% c(2, 4, 9), classes[2], classes[3]))

  res <- levels(classes)[idx]
  factor(res, levels(classes))
}

data$class <- find_class(data$a)

data$class
# [1] A B A B A C C C B C
# Levels: A B C

data
#     a class
# 1   1     A
# 2   2     B
# 3   3     A
# 4   4     B
# 5   5     A
# 6   6     C
# 7   7     C
# 8   8     C
# 9   9     B
# 10 10     C
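
To make the integer-code behavior concrete, here is a small sketch. The function name `find_class2` is hypothetical; it shows one possible alternative that builds character labels first and converts them to a factor at the end, so no indexing into `levels()` is needed.

```r
classes <- factor(c("A", "B", "C"))

# ifelse() strips attributes, so a factor element comes back as its code
code <- ifelse(TRUE, classes[2], classes[3])
code            # an integer code, not the label "B"

# Hypothetical alternative: work with labels, then convert once
find_class2 <- function(i) {
  lab <- ifelse(i %in% c(1, 3, 5), "A",
                ifelse(i %in% c(2, 4, 9), "B", "C"))
  factor(lab, levels = c("A", "B", "C"))
}

find_class2(1:10)
```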

Making list of factors in a function but return warning error

Solution is simple, you just need double brackets around "Factors" :)

In the function

result[["Factors"]] <- unique(dataset[[3]])

That should be the line.

Double brackets return the element itself; single brackets return the selection as a list.

Sounds silly, but try this:

test <- list()

class(test["Factors"])
class(test[["Factors"]])

The first class is 'list'; the second is 'NULL'. This is because single brackets return a subset as a list, while double brackets return the element itself. Which one is useful depends on the scenario. The element in this case is NULL because nothing has been assigned to it.

The message "number of items to replace is not a multiple of replacement length" appears because you asked R to put 3 things into a single slot: with single brackets, result["Factors"] is a list of length one, so only one item fits. With double brackets you assign to the element itself, and a list element can hold a whole vector with multiple values, so it works!
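
The difference can be reproduced in isolation. In this sketch, `vals` is a stand-in for `unique(dataset[[3]])` (the real data is not shown), and the handler only records that the warning fired.

```r
vals <- c("a", "b", "c")   # stand-in for unique(dataset[[3]])

# Single brackets: three values offered to a one-element slice
res_single <- list()
warned <- FALSE
withCallingHandlers(
  res_single["Factors"] <- vals,
  warning = function(w) {
    warned <<- TRUE                  # remember that R complained
    invokeRestart("muffleWarning")   # then carry on with the assignment
  }
)
res_single[["Factors"]]   # only the first value was kept

# Double brackets: the whole vector becomes one list element
res_double <- list()
res_double[["Factors"]] <- vals
res_double[["Factors"]]
```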

Hope that makes sense!

Factor function in R returning NAs

We can't see your data, but this behavior indicates that Data$Month contains a value that is not included in xnames, the vector you pass as levels. Is anything misspelled? I would suggest comparing levels(as.factor(Data$Month)) with xnames; that should reveal the issue.

Example dataset that shows the same problem you have:

library(ggplot2)

yums <- c('soup', 'salad', 'bread')
nums <- c(10, 14, 5)
df1 <- data.frame(yums, nums)

yum.levels <- c('soup', 'salad', 'bread', 'pasta')
ggplot(df1, aes(x=factor(yums, yum.levels), y=nums)) + geom_col()

That gives you this:

[Image: bar chart of nums by yums]

...but if we misspell one of them (like capitalizing "Soup" in yums), you get this:

yums1 <- c('Soup', 'salad', 'bread')
nums <- c(10, 14, 5)
df2 <- data.frame(yums1, nums)

yum.levels <- c('soup', 'salad', 'bread', 'pasta')
ggplot(df2, aes(x=factor(yums1, yum.levels), y=nums)) + geom_col()

[Image: bar chart of nums by yums1, with "Soup" shown as NA]
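
The comparison suggested above can be done without plotting. A minimal sketch using the example vectors from this answer: `setdiff()` lists the values that have no matching level, and those are exactly the ones `factor()` turns into NA.

```r
yums1 <- c("Soup", "salad", "bread")
yum.levels <- c("soup", "salad", "bread", "pasta")

# values present in the data but missing from the levels
unmatched <- setdiff(unique(yums1), yum.levels)
unmatched

# those values become NA once the levels are applied
f <- factor(yums1, levels = yum.levels)
f
```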

R - table() returns repeated factors

The output of table here is confusing because your factor levels (1 to 5) look like row numbers, and there are some blank ("") responses to the Startrek variable which makes it appear like the data is only under the No and Yes columns.

So, the data here is a 5 by 3 table, with the rows representing the score from Anakin (1 to 5) and the columns representing 3 types of response to Startrek ("", No, Yes).

Note that rows where Anakin is NA are ignored in the table. To count these too, use addNA:

table(addNA(starwars$Anakin), starwars$Startrek)
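
Here is a small sketch of the effect on made-up data. The data frame `starwars_demo` and its values are hypothetical, since the original data is not shown.

```r
# Hypothetical stand-in for the survey data
starwars_demo <- data.frame(
  Anakin   = factor(c(1, 2, NA, 5, 1)),
  Startrek = c("Yes", "", "No", "Yes", "No")
)

# Default: the row with a missing Anakin score is dropped
tab_default <- table(starwars_demo$Anakin, starwars_demo$Startrek)
sum(tab_default)

# addNA() turns NA into a level of its own, so that row is counted
tab_with_na <- table(addNA(starwars_demo$Anakin), starwars_demo$Startrek)
sum(tab_with_na)
```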

Giving a conditional onto a function that performs for factors

One option is to split the code before the if: select the factor columns into a new object ('weight_sub'), then check its length. If it is greater than 0, run the rest of the model.matrix code and assign the result back to 'weight'.

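library(dplyr)
library(purrr)
library(stringr)

# keep only the factor columns
weight_sub <- weight %>%
  select(where(is.factor))

# expand them into dummy columns only if at least one was found
if(length(weight_sub) > 0) {
  weight_sub %>%
    map_dfc(~ model.matrix(~ .x - 1) %>%
              as_tibble) %>%
    rename_all(~ str_remove(., "\\.x")) %>%
    bind_cols(weight, .) -> weight
}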

-output

#   wgt hgt age    id    sex black brown white female male
#1   64  57   8 black female     1     0     0      1    0
#2   71  59  10 black female     1     0     0      1    0
#3   53  49   6 black female     1     0     0      1    0
#4   67  62  11 black female     1     0     0      1    0
#5   55  51   8 white female     0     0     1      1    0
#6   58  50   7 white female     0     0     1      1    0
#7   77  55  10 white   male     0     0     1      0    1
#8   57  48   9 white   male     0     0     1      0    1
#9   56  42  10 brown   male     0     1     0      0    1
#10  51  42   6 brown   male     0     1     0      0    1
#11  76  61  12 brown   male     0     1     0      0    1
#12  68  57   9 brown   male     0     1     0      0    1
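
To see what the model.matrix step contributes on its own, here is a base R sketch on a tiny made-up factor (the vector `sex` below is illustrative, not the OP's data). Dropping the intercept with `- 1` gives one 0/1 column per level, and each column name is the variable name followed by the level.

```r
sex <- factor(c("female", "male", "female"))

# one indicator column per level, no intercept column
mm <- model.matrix(~ sex - 1)
colnames(mm)
mm[, "sexfemale"]
```

That prefix is why the pipeline above strips `.x` from the names: inside the `map_dfc` lambda the variable is called `.x`.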

As a negative test, repeat this but select the character class columns instead

weight_sub <- weight %>% 
 select(where(is.character)) 
 
if(length(weight_sub) > 0) {
  weight_sub %>%
   map_dfc(~ model.matrix(~ .x -1) %>% 
             as_tibble) %>% 
     rename_all(~ str_remove(., "\\.x")) %>%
      bind_cols(weight, .) -> weight

  }

No output, as the if condition returns FALSE; the 'weight' dataset stays the same, with no new columns added.

In the update, if the OP also passes numeric columns into model.matrix, each one simply comes back as the same single column (we are looping over columns with map), named .x after the model.matrix formula. rename_all with str_remove then strips that .x, leaving a blank column name, which by default is filled with a column name assigned as 'col' from _dfc. To prevent that, we can use an if condition beforehand that appends the original column name as a suffix whenever the output is a single numeric column

weight %>%
    imap_dfc(~ {
        nm1 <- .y
        tmp <- model.matrix(~ .x - 1) %>%
           as_tibble
       if(ncol(tmp) == 1 && class(tmp[[1]]) == 'numeric') {
          names(tmp) <- paste0(names(tmp), nm1)
       }
      tmp
     }) %>% 
      rename_all(~ str_remove(., "\\.x"))

-output

# A tibble: 12 x 8
#     wgt   hgt   age black brown white female  male
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>
# 1    64    57     8     1     0     0      1     0
# 2    71    59    10     1     0     0      1     0
# 3    53    49     6     1     0     0      1     0
# 4    67    62    11     1     0     0      1     0
# 5    55    51     8     0     0     1      1     0
# 6    58    50     7     0     0     1      1     0
# 7    77    55    10     0     0     1      0     1
# 8    57    48     9     0     0     1      0     1
# 9    56    42    10     0     1     0      0     1
#10    51    42     6     0     1     0      0     1
#11    76    61    12     0     1     0      0     1
#12    68    57     9     0     1     0      0     1

Or we do this with Map in base R

out <- do.call(cbind, unname(Map(function(x, y) {
  tmp <- as.data.frame(model.matrix(~ x - 1))
  # a single numeric column came from a numeric input: append its original name
  if(ncol(tmp) == 1 && class(tmp[[1]]) == 'numeric') {
    names(tmp) <- paste0(names(tmp), y)
  }
  tmp
}, weight, names(weight))))
names(out) <- sub('^x', '', names(out))
out
#   wgt hgt age black brown white female male
#1   64  57   8     1     0     0      1    0
#2   71  59  10     1     0     0      1    0
#3   53  49   6     1     0     0      1    0
#4   67  62  11     1     0     0      1    0
#5   55  51   8     0     0     1      1    0
#6   58  50   7     0     0     1      1    0
#7   77  55  10     0     0     1      0    1
#8   57  48   9     0     0     1      0    1
#9   56  42  10     0     1     0      0    1
#10  51  42   6     0     1     0      0    1
#11  76  61  12     0     1     0      0    1
#12  68  57   9     0     1     0      0    1

return index of all factor variables that don't have a predefined name

## Desired result?
names(data)=="BINARY"
# [1] FALSE FALSE  TRUE FALSE

## Desired method
response.variable.name<-"RESPONSE"
sapply(data,function(x) nlevels(as.factor(x))==2) & !names(data) %in% response.variable.name
# RESPONSE   FACTOR   BINARY  NUMERIC 
#    FALSE    FALSE     TRUE    FALSE 
## same values, has names too (bonus!)
## wrap in `unname()` if you don't like names
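
Since the question asks for an index, the logical vector can be passed to `which()`. This sketch rebuilds a data frame with the same column names as above; the values themselves are made up for illustration.

```r
# Hypothetical data with the column names used in this answer
data <- data.frame(
  RESPONSE = c(0, 1, 0, 1),
  FACTOR   = factor(c("a", "b", "c", "a")),
  BINARY   = c("y", "n", "y", "n"),
  NUMERIC  = c(1.5, 2.5, 3.5, 4.5)
)
response.variable.name <- "RESPONSE"

# TRUE for two-level columns, except the response itself
is_binary <- sapply(data, function(x) nlevels(as.factor(x)) == 2) &
  !names(data) %in% response.variable.name

# positions of the qualifying columns (names are kept)
idx <- which(is_binary)
idx
```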

How to loop many factors into one function

Using a for loop:

IndGraph <- vector()
for(i in CovidFake$Industry){
  IndGraph <- rbind(IndGraph, MainFunction(CovidFake, i))}

Output:

> IndGraph
# A tibble: 1,600 x 5
   Date       DailyCases CumSum Industry         SevenDayAve
   <date>          <dbl>  <dbl> <chr>                  <dbl>
 1 2020-10-01          0      0 Minor or Student      NA    
 2 2020-10-02          0      0 Minor or Student      NA    
 3 2020-10-03          1      1 Minor or Student      NA    
 4 2020-10-04          0      1 Minor or Student      NA    
 5 2020-10-05          0      1 Minor or Student      NA    
 6 2020-10-06          0      1 Minor or Student      NA    
 7 2020-10-07          1      2 Minor or Student       0.286
 8 2020-10-08          1      3 Minor or Student       0.429
 9 2020-10-09          2      5 Minor or Student       0.714
10 2020-10-10          0      5 Minor or Student       0.571
# ... with 1,590 more rows
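
A possible variation, sketched under assumptions: if Industry repeats across rows, looping over `unique()` values calls the function once per industry, and `lapply` with `do.call(rbind, ...)` avoids growing the result inside the loop. The `CovidFake` data and the body of `MainFunction` below are hypothetical stand-ins, since the originals are not shown.

```r
# Hypothetical stand-ins for the OP's data and function
CovidFake <- data.frame(
  Industry   = c("A", "B", "A", "B"),
  DailyCases = c(1, 0, 2, 3)
)
MainFunction <- function(df, industry) {
  sub <- df[df$Industry == industry, , drop = FALSE]
  sub$CumSum <- cumsum(sub$DailyCases)   # running total within the industry
  sub
}

# One call per distinct industry, then stack the pieces
pieces <- lapply(unique(CovidFake$Industry),
                 function(i) MainFunction(CovidFake, i))
IndGraph <- do.call(rbind, pieces)
IndGraph
```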

Returning factors when applying a function to transform a dataframe of integers to decimal

Instead of apply, we can use lapply and assign it back to the original dataset

CI[] <- lapply(CI, format, nsmall = 2)

Or another option is dplyr

library(dplyr)
CI_NEW <- CI %>%
               mutate_all(format, nsmall = 2)

The issue is that apply returns character output, and as.data.frame by default uses stringsAsFactors = TRUE for non-numeric elements (this was the default in R versions before 4.0.0). It can be corrected by passing stringsAsFactors = FALSE in the original code

as.data.frame(apply(CI, 2, function(x) 
       format(x, nsmall=3)), stringsAsFactors = FALSE)
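
A quick way to check what the lapply approach returns, sketched on a made-up `CI` (the column names and values are assumptions): every column ends up as character, not factor, because `format()` produces strings and assigning into `CI[]` keeps the data frame structure.

```r
# Hypothetical stand-in for the OP's CI data frame
CI <- data.frame(lower = c(1, 2.5), upper = c(3, 4.25))

# format each column in place; CI[] keeps the data.frame shape
CI[] <- lapply(CI, format, nsmall = 2)

CI$lower
sapply(CI, class)
```

Note that the formatted values are text. Convert back with `as.numeric()` if further arithmetic is needed.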
