R Function for returning ALL factors
To follow up on my comment (thanks to @Ramnath for catching my typo), the brute-force method seems to work reasonably well here on my 64-bit machine with 8 GB of RAM:
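# Brute force: test every integer from 1 to |x| as a divisor
FUN <- function(x) {
  x <- as.integer(x)
  div <- seq_len(abs(x))               # candidate divisors 1..|x|
  factors <- div[x %% div == 0L]       # keep those that divide x exactly
  factors <- list(neg = -factors, pos = factors)
  return(factors)
}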
A few examples:
> FUN(100)
$neg
[1] -1 -2 -4 -5 -10 -20 -25 -50 -100
$pos
[1] 1 2 4 5 10 20 25 50 100
> FUN(-42)
$neg
[1] -1 -2 -3 -6 -7 -14 -21 -42
$pos
[1] 1 2 3 6 7 14 21 42
# and a big number
> system.time(FUN(1e8))
user system elapsed
1.95 0.18 2.14
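Since divisors come in pairs (d and |x| / d), it is only strictly necessary to test candidates up to sqrt(|x|). The sketch below is a swapped-in variant, not the brute-force method above; the name all_factors_sqrt is hypothetical. It returns the same list(neg, pos) structure as FUN():

```r
# Hypothetical variant of FUN(): divisors come in pairs (d, |x| / d),
# so only candidates up to sqrt(|x|) need to be tested.
all_factors_sqrt <- function(x) {
  x <- abs(as.integer(x))
  if (x == 0L) return(list(neg = integer(0), pos = integer(0)))
  cand <- seq_len(floor(sqrt(x)))              # candidates 1..floor(sqrt(|x|))
  small <- cand[x %% cand == 0L]               # divisors <= sqrt(|x|)
  pos <- sort(unique(c(small, x %/% small)))   # add each partner |x| / d
  list(neg = -pos, pos = pos)
}

all_factors_sqrt(100)
all_factors_sqrt(-42)
```

For 1e8 this tests 10,000 candidates instead of 100,000,000.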
Return factor level from a function, not an integer in R
You can use the levels of the factor classes together with the output of the ifelse statement as follows:
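data <- data.frame(a = 1:10)
find_class <- function(i) {
  classes <- factor(c('A', 'B', 'C'))
  # ifelse() drops the factor attributes, so idx holds the integer codes 1, 2, 3
  idx <- ifelse(i %in% c(1, 3, 5), classes[1],
                ifelse(i %in% c(2, 4, 9), classes[2], classes[3]))
  # turn the codes back into level labels, then rebuild the factor
  res <- levels(classes)[idx]
  factor(res, levels(classes))
}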
data$class <- find_class(data$a)
data$class
# [1] A B A B A C C C B C
# Levels: A B C
data
# a class
# 1 1 A
# 2 2 B
# 3 3 A
# 4 4 B
# 5 5 A
# 6 6 C
# 7 7 C
# 8 8 C
# 9 9 B
# 10 10 C
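The integer codes appear because ifelse() strips the factor attributes of classes[1] and friends. A sketch of an alternative (find_class2 is a hypothetical name) that builds plain character labels first and converts to a factor only once at the end:

```r
# ifelse() strips factor attributes and returns the underlying integer code
ifelse(TRUE, factor("A"), factor("B"))

# Hypothetical alternative: work with character labels, convert once at the end
find_class2 <- function(i) {
  res <- ifelse(i %in% c(1, 3, 5), "A",
                ifelse(i %in% c(2, 4, 9), "B", "C"))
  factor(res, levels = c("A", "B", "C"))
}

find_class2(1:10)
```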
Making list of factors in a function but return warning error
The solution is simple: you just need double brackets around "Factors". :)
In the function, that line should be:
result[["Factors"]] <- unique(dataset[[3]])
Double brackets return an element; single brackets return that selection as a list.
It sounds silly, but try this:
test <- list()
class(test["Factors"])
class(test[["Factors"]])
The first class will be 'list'. The second will be 'NULL'. This is because single brackets return a subset as a list, while double brackets return the element itself; which one you want depends on the scenario. The element in this case is NULL because nothing has been assigned to it.
The error "number of items to replace is not a multiple of replacement length" occurs because, with single brackets, you've asked R to put 3 values into a selection of length 1. With double brackets you assign the whole vector as a single element of the list, and a list element can hold a vector of any length, so it works!
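A small sketch reproducing both behaviours; vals is a made-up stand-in for unique(dataset[[3]]):

```r
vals <- c("a", "b", "c")   # hypothetical stand-in for unique(dataset[[3]])

# Single brackets: a length-1 selection receives 3 values, so R warns
r1 <- list()
msg <- NULL
withCallingHandlers(
  r1["Factors"] <- vals,
  warning = function(w) {
    msg <<- conditionMessage(w)   # record the warning text
    invokeRestart("muffleWarning")
  }
)
msg          # the "not a multiple of replacement length" warning
r1$Factors   # only the first value was kept

# Double brackets: the whole vector becomes one element of the list
r2 <- list()
r2[["Factors"]] <- vals
r2$Factors
```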
Hope that makes sense!
Factor function in R returning NAs
We can't see your data, but this behavior suggests that Data$Month contains a value that is not included in your levels argument, xnames. Is anything misspelled? I would suggest comparing levels(as.factor(Data$Month)) with xnames; that should show you the issue.
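A minimal sketch of that comparison, using made-up month and xnames vectors in place of the OP's Data$Month and xnames. setdiff() lists the values in the data that have no matching level, and exactly those become NA in factor():

```r
# Hypothetical stand-ins for Data$Month and xnames
month  <- c("Jan", "Feb", "mar", "Apr", "Feb")
xnames <- c("Jan", "Feb", "Mar", "Apr")

# Values present in the data but absent from the levels
setdiff(levels(as.factor(month)), xnames)

f <- factor(month, levels = xnames)
which(is.na(f))   # position of the unmatched "mar" entry
```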
Example dataset that shows the same problem you have:
library(ggplot2)
yums <- c('soup', 'salad', 'bread')
nums <- c(10, 14, 5)
df1 <- data.frame(yums, nums)
yum.levels <- c('soup', 'salad', 'bread', 'pasta')
ggplot(df1, aes(x = factor(yums, yum.levels), y = nums)) + geom_col()
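That gives you this (plot image not included):
...but if we misspell one of them (like capitalizing "Soup" in yums), that value no longer matches any level and becomes NA, and you get this (plot image not included):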
yums1 <- c('Soup', 'salad', 'bread')
nums <- c(10, 14, 5)
df2 <- data.frame(yums1, nums)
yum.levels <- c('soup', 'salad', 'bread', 'pasta')
ggplot(df2, aes(x=factor(yums1, yum.levels), y=nums)) + geom_col()
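The same mismatch can be spotted without plotting. If, as in this example, the only problem is capitalization, one possible fix is to normalize the case before calling factor(); whether tolower() is appropriate depends on your real level names:

```r
yums1 <- c('Soup', 'salad', 'bread')
yum.levels <- c('soup', 'salad', 'bread', 'pasta')

f_bad <- factor(yums1, yum.levels)
yums1[is.na(f_bad)]   # the value(s) with no matching level

# Possible fix when only the case differs: lowercase before matching
f_fixed <- factor(tolower(yums1), yum.levels)
anyNA(f_fixed)
```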
R - table() returns repeated factors
The output of table here is confusing because your factor levels (1 to 5) look like row numbers, and there are some blank ("") responses to the Startrek variable, which makes it look as though the data is only under the No and Yes columns.
So the data here is a 5 by 3 table, with the rows representing the score from Anakin (1 to 5) and the columns representing the 3 types of response to Startrek ("", No, Yes).
Note that rows where Anakin is NA are ignored in the table. To count them too, use addNA:
table(addNA(starwars$Anakin), starwars$Startrek)
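A small sketch of the difference, using a made-up data frame sw in place of the OP's starwars data (the values are hypothetical):

```r
# Hypothetical survey data standing in for the OP's starwars data frame
sw <- data.frame(
  Anakin   = c(1, 2, NA, 5, 2, NA),
  Startrek = c("Yes", "", "No", "Yes", "No", "")
)

tab_default <- table(sw$Anakin, sw$Startrek)         # NA scores are dropped
tab_na      <- table(addNA(sw$Anakin), sw$Startrek)  # adds an <NA> row

sum(tab_default)   # counts only the 4 non-NA scores
sum(tab_na)        # counts all 6 rows
```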
Giving a conditional onto a function that performs for factors
An option is to split the code before the if: select the columns that are factors into a new object ('weight_sub'), then check the length of 'weight_sub'; if it is greater than 0, do the rest of the model.matrix step and assign the result back to 'weight'.
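library(dplyr)
library(purrr)
library(stringr)

# keep only the factor columns
weight_sub <- weight %>%
  select(where(is.factor))
# only build dummy columns when there is at least one factor column
if(length(weight_sub) > 0) {
  weight_sub %>%
    map_dfc(~ model.matrix(~ .x -1) %>%
              as_tibble) %>%
    rename_all(~ str_remove(., "\\.x")) %>%
    bind_cols(weight, .) -> weight
}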
Output:
# wgt hgt age id sex black brown white female male
#1 64 57 8 black female 1 0 0 1 0
#2 71 59 10 black female 1 0 0 1 0
#3 53 49 6 black female 1 0 0 1 0
#4 67 62 11 black female 1 0 0 1 0
#5 55 51 8 white female 0 0 1 1 0
#6 58 50 7 white female 0 0 1 1 0
#7 77 55 10 white male 0 0 1 0 1
#8 57 48 9 white male 0 0 1 0 1
#9 56 42 10 brown male 0 1 0 0 1
#10 51 42 6 brown male 0 1 0 0 1
#11 76 61 12 brown male 0 1 0 0 1
#12 68 57 9 brown male 0 1 0 0 1
As a negative test, do the same thing but check for character columns instead:
weight_sub <- weight %>%
select(where(is.character))
if(length(weight_sub) > 0) {
weight_sub %>%
map_dfc(~ model.matrix(~ .x -1) %>%
as_tibble) %>%
rename_all(~ str_remove(., "\\.x")) %>%
bind_cols(weight, .) -> weight
}
There is no output, as the if condition returns FALSE; the 'weight' dataset therefore stays the same, with no new columns added.
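For comparison, here is a base R sketch of the same guard-then-encode pattern without dplyr, purrr or stringr. The helper add_dummies() and the three-row weight_demo data are hypothetical, made up for illustration; pred selects which columns to encode, so passing is.character reproduces the negative test:

```r
# Hypothetical helper: add 0/1 dummy columns for every column matching `pred`
add_dummies <- function(df, pred = is.factor) {
  sel <- vapply(df, pred, logical(1))
  if (!any(sel)) return(df)              # nothing to encode: return df unchanged
  dummies <- lapply(df[sel], function(col) {
    col <- as.factor(col)
    m <- model.matrix(~ col - 1)         # one indicator column per level
    colnames(m) <- levels(col)           # drop the "col" prefix
    as.data.frame(m)
  })
  cbind(df, do.call(cbind, unname(dummies)))
}

# Hypothetical three-row stand-in for the OP's 'weight' data
weight_demo <- data.frame(
  wgt = c(64, 71, 53),
  id  = factor(c("black", "white", "brown")),
  sex = factor(c("female", "female", "male"))
)

add_dummies(weight_demo)                 # adds black, brown, white, female, male
add_dummies(weight_demo, is.character)   # no character columns: unchanged
```

Note that model.matrix() drops rows containing NA, so this sketch assumes the encoded columns have no missing values.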
In the update, if the OP is also passing numeric columns into model.matrix, it just returns the same column, i.e. a single column (as we are looping over columns with map), named .x (from the model.matrix formula). That .x is then stripped from the name by str_remove inside rename_all, leaving a blank column name, which by default gets filled with a column name assigned by _dfc. To prevent that, we can add an if/else condition beforehand that appends the original column name as a suffix whenever the output is a single numeric column:
weight %>%
imap_dfc(~ {
nm1 <- .y
tmp <- model.matrix(~ .x - 1) %>%
as_tibble
if(ncol(tmp) == 1 && class(tmp[[1]]) == 'numeric') {
names(tmp) <- paste0(names(tmp), nm1)
}
tmp
}) %>%
rename_all(~ str_remove(., "\\.x"))
Output:
# A tibble: 12 x 8
# wgt hgt age black brown white female male
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 64 57 8 1 0 0 1 0
# 2 71 59 10 1 0 0 1 0
# 3 53 49 6 1 0 0 1 0
# 4 67 62 11 1 0 0 1 0
# 5 55 51 8 0 0 1 1 0
# 6 58 50 7 0 0 1 1 0
# 7 77 55 10 0 0 1 0 1
# 8 57 48 9 0 0 1 0 1
# 9 56 42 10 0 1 0 0 1
#10 51 42 6 0 1 0 0 1
#11 76 61 12 0 1 0 0 1
#12 68 57 9 0 1 0 0 1
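To see why numeric columns need the extra if, compare the column names model.matrix() produces for a numeric versus a factor input (x and f are made-up examples). Inside map the variable is called .x, so stripping that prefix leaves an empty name for numeric columns:

```r
x <- c(64, 71, 53)
colnames(model.matrix(~ x - 1))   # a numeric input passes through as one column named "x"

f <- factor(c("black", "white", "brown"))
colnames(model.matrix(~ f - 1))   # one column per level, each prefixed with "f"
```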
Or we can do this with Map in base R:
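out <- do.call(cbind, unname(Map(function(x, y) {
  tmp <- as.data.frame(model.matrix(~ x - 1))
  # a single numeric column comes back named "x"; append the original name
  if (ncol(tmp) == 1 && class(tmp[[1]]) == 'numeric') {
    names(tmp) <- paste0(names(tmp), y)
  }
  tmp
}, weight, names(weight))))
# strip the "x" prefix added by the model.matrix formula
names(out) <- sub('^x', '', names(out))
out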
# wgt hgt age black brown white female male
#1 64 57 8 1 0 0 1 0
#2 71 59 10 1 0 0 1 0
#3 53 49 6 1 0 0 1 0
#4 67 62 11 1 0 0 1 0
#5 55 51 8 0 0 1 1 0
#6 58 50 7 0 0 1 1 0
#7 77 55 10 0 0 1 0 1
#8 57 48 9 0 0 1 0 1
#9 56 42 10 0 1 0 0 1
#10 51 42 6 0 1 0 0 1
#11 76 61 12 0 1 0 0 1
#12 68 57 9 0 1 0 0 1
return index of all factor variables that don't have a predefined name
## Desired result?
names(data)=="BINARY"
# [1] FALSE FALSE TRUE FALSE
## Desired method
response.variable.name<-"RESPONSE"
sapply(data,function(x) nlevels(as.factor(x))==2) & !names(data) %in% response.variable.name
# RESPONSE FACTOR BINARY NUMERIC
# FALSE FALSE TRUE FALSE
## same values, has names too (bonus!)
## wrap in `unname()` if you don't like names
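Since the question asks for an index, which() turns the logical vector into column positions. A self-contained sketch with a made-up data frame using the same four column names:

```r
# Hypothetical data with the four column names used above
data <- data.frame(
  RESPONSE = c(0, 1, 0, 1),
  FACTOR   = c("a", "b", "c", "a"),
  BINARY   = c("x", "y", "x", "y"),
  NUMERIC  = c(1.5, 2.5, 3.5, 4.5)
)
response.variable.name <- "RESPONSE"

is_binary <- sapply(data, function(x) nlevels(as.factor(x)) == 2) &
  !names(data) %in% response.variable.name
which(is_binary)   # named integer index of the binary, non-response columns
```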
How to loop many factors into one function
Using a for loop:
IndGraph <- vector()
for(i in CovidFake$Industry){
IndGraph <- rbind(IndGraph, MainFunction(CovidFake, i))}
Output:
> IndGraph
# A tibble: 1,600 x 5
Date DailyCases CumSum Industry SevenDayAve
<date> <dbl> <dbl> <chr> <dbl>
1 2020-10-01 0 0 Minor or Student NA
2 2020-10-02 0 0 Minor or Student NA
3 2020-10-03 1 1 Minor or Student NA
4 2020-10-04 0 1 Minor or Student NA
5 2020-10-05 0 1 Minor or Student NA
6 2020-10-06 0 1 Minor or Student NA
7 2020-10-07 1 2 Minor or Student 0.286
8 2020-10-08 1 3 Minor or Student 0.429
9 2020-10-09 2 5 Minor or Student 0.714
10 2020-10-10 0 5 Minor or Student 0.571
# ... with 1,590 more rows
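Two things may be worth checking in the loop above: if Industry repeats across rows, looping over CovidFake$Industry directly calls MainFunction once per row rather than once per industry, so unique() is needed; and growing an object with rbind() inside a loop copies it on every iteration. A sketch that collects the pieces with lapply() and binds once; the toy CovidFake and MainFunction() below are hypothetical stand-ins for the OP's objects:

```r
# Hypothetical stand-ins for the OP's CovidFake data and MainFunction()
CovidFake <- data.frame(
  Industry = c("Retail", "Retail", "Health", "Health", "Health"),
  Cases    = c(1, 2, 3, 4, 5)
)
MainFunction <- function(df, industry) {
  sub <- df[df$Industry == industry, ]
  data.frame(Industry = industry, CumSum = cumsum(sub$Cases))
}

# One call per distinct industry, results collected in a list and bound once
IndGraph <- do.call(rbind, lapply(unique(CovidFake$Industry),
                                  function(i) MainFunction(CovidFake, i)))
IndGraph
```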
Returning factors when applying a function to transform a dataframe of integers to decimal
Instead of apply, we can use lapply and assign the result back to the original dataset:
CI[] <- lapply(CI, format, nsmall = 2)
Another option is dplyr:
library(dplyr)
CI_NEW <- CI %>%
mutate_all(format, nsmall = 2)
The issue is that apply returns a character matrix, and as.data.frame used stringsAsFactors = TRUE by default for non-numeric columns (this default changed to FALSE in R 4.0.0). In the original code, this can be corrected with stringsAsFactors = FALSE:
as.data.frame(apply(CI, 2, function(x)
format(x, nsmall=3)), stringsAsFactors = FALSE)
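A small sketch with a made-up two-column CI showing the difference: apply() returns a character matrix, while CI[] <- lapply(...) keeps CI a data frame whose columns are character vectors. Note that format() always returns character, so the values are no longer numeric in either case:

```r
# Hypothetical stand-in for the OP's CI data frame
CI <- data.frame(lower = c(1, 2.5), upper = c(3, 4.25))

m <- apply(CI, 2, format, nsmall = 2)
is.matrix(m)      # apply() gives back a matrix, not a data frame
is.character(m)

CI_fmt <- CI
CI_fmt[] <- lapply(CI_fmt, format, nsmall = 2)
CI_fmt$lower      # character values padded to 2 decimals
```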