modify glm function to adopt user-specified link function in R
I'm basically following the form of the example in ?family, which shows a user-specified link of the form qlogis(mu^(1/days)).
We want a link of the form eta = log(exp(y)-1) (so the inverse link is y = log(exp(eta)+1), and mu.eta = dy/d(eta) = 1/(1+exp(-eta))):
vlog <- function() {
    ## link
    linkfun <- function(y) log(exp(y)-1)
    ## inverse link
    linkinv <- function(eta) log(exp(eta)+1)
    ## derivative of invlink wrt eta
    mu.eta <- function(eta) { 1/(exp(-eta) + 1) }
    valideta <- function(eta) TRUE
    link <- "log(exp(y)-1)"
    structure(list(linkfun = linkfun, linkinv = linkinv,
                   mu.eta = mu.eta, valideta = valideta,
                   name = link),
              class = "link-glm")
}
Basic checks:
vv <- vlog()
vv$linkfun(vv$linkinv(27)) ## check invertibility
library("numDeriv")
all.equal(grad(vv$linkinv,2),vv$mu.eta(2)) ## check derivative
Example:
set.seed(101)
n <- 1000
x <- runif(n)
sh <- 2
y <- rgamma(n,scale=vv$linkinv(2+3*x)/sh,shape=sh)
glm(y~x,family=Gamma(link=vv))
##
## Call: glm(formula = y ~ x, family = Gamma(link = vv))
##
## Coefficients:
## (Intercept) x
## 1.956 3.083
##
## Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
## Null Deviance: 642.2
## Residual Deviance: 581.8 AIC: 4268
##
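One caveat with the link above: exp() overflows for large arguments (exp(710) is Inf in double precision), so log(exp(y)-1) and log(exp(eta)+1) can return Inf far from the origin. Below is a minimal sketch of a more numerically careful variant. It relies on the identities log(exp(y)-1) = y + log(1-exp(-y)) and log(1+exp(eta)) = max(eta,0) + log1p(exp(-|eta|)). The name vlog_stable is hypothetical and is not part of the original answer.

```r
## Hypothetical, numerically careful variant of vlog()
vlog_stable <- function() {
    ## link: log(exp(mu) - 1), rewritten to avoid overflow in exp(mu)
    linkfun <- function(mu) mu + log(-expm1(-mu))
    ## inverse link: log(1 + exp(eta)), rewritten to avoid overflow in exp(eta)
    linkinv <- function(eta) pmax(eta, 0) + log1p(exp(-abs(eta)))
    ## derivative of the inverse link: 1/(1 + exp(-eta))
    mu.eta <- function(eta) plogis(eta)
    valideta <- function(eta) TRUE
    structure(list(linkfun = linkfun, linkinv = linkinv,
                   mu.eta = mu.eta, valideta = valideta,
                   name = "log(exp(y)-1)"),
              class = "link-glm")
}

vs <- vlog_stable()
vs$linkfun(vs$linkinv(27))    ## round trip at a moderate value
vs$linkinv(1000)              ## stays finite, unlike log(exp(1000) + 1)
```

It can be passed to Gamma(link = vs) in the same way as vv.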
calling the glm() function within a user-defined function
You could use any of the following:
Using substitute:
to_analyze <- function(dep, indep, data){
glm(substitute(dep ~ factor(indep)), data=data)
}
to_analyze(dep=age, indep=sex, data=dsn)
Advantage: you can write the independent variable as a formula expression, e.g.:
to_analyze(Petal.Width, Sepal.Length + Sepal.Width, data = iris)
Using reformulate, as suggested by @NelsonGon:
to_analyze <- function(dep, indep, data){
glm(reformulate(sprintf("factor(%s)",indep), dep), data = data)
}
Note that to call this function, the variables ought to be passed as character strings:
to_analyze(dep= "age", indep="sex", data=dsn)
Recall that glm can also take a string that can be parsed into a formula:
to_analyze <- function(dep, indep, data){
glm(sprintf("%s~factor(%s)", dep, indep), data = data)
}
to_analyze("age", "sex", data=dsn)
or even:
to_analyze <- function(dep, indep, data){
glm(paste(dep,"~ factor(",indep,")"), data = data)
}
to_analyze("age", "sex", data=dsn)
Lastly, to combine both substitute and paste:
to_analyze <- function(dep, indep, data){
glm(paste(substitute(dep),"~ factor(",substitute(indep),")"), data = data)
}
This will work for both symbols and character strings, e.g.:
to_analyze(age, sex, data=dsn)
to_analyze("age", "sex", data=dsn)
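Since dsn is not defined in this answer, a small self-contained check of the combined version on the built-in mtcars data may help. The choice of mpg and cyl is purely illustrative, and the sketch assumes the default gaussian family is acceptable.

```r
## Combined substitute/paste version, as above
to_analyze <- function(dep, indep, data) {
    glm(paste(substitute(dep), "~ factor(", substitute(indep), ")"), data = data)
}

fit_sym <- to_analyze(mpg, cyl, data = mtcars)       ## bare symbols
fit_chr <- to_analyze("mpg", "cyl", data = mtcars)   ## character strings

## Both calls should fit the same model
all.equal(coef(fit_sym), coef(fit_chr))
names(coef(fit_sym))
```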
Shall I request author permission to modify an R function
As far as I know, if the package is on CRAN and the licence is GPL (>= 2), you are allowed to copy and modify the content as long as the modified content remains under the GPL and you state that you modified it. So you don't need to ask the package creator for permission.
A good practice would be to create your own package, calling it 'pkgextra' (where pkg is the name of the package) and stating in the DESCRIPTION that the package is built on top of another package, e.g. tidystringdist, which is built on top of stringdist, or ggExtra, which is built on top of ggplot2. Also, as R packages declare their dependencies in the DESCRIPTION file (the Depends and Imports fields), you're clearly stating there that your package depends on other packages.
To wrap up: no, you don't need permission from the package author, as long as you distribute the derived work under the same licence and state that you depend on this package.
R GLM function omitting data
You are correct in that na.omit will omit the missing values and run your model. In fact, you should see identical outputs when you run summary(model_1) and summary(model_2).
However, the nagelkerke function that you are using runs into issues when there are NA values in one variable from the original dataset. From their documentation:
The fitted model and the null model should be properly nested. That is, the terms of one need to be a subset of the the other, and they should have the same set of observations. One issue arises when there are NA values in one variable but not another, and observations with NA are removed in the model fitting. The result may be fitted and null models with different sets of observations. Setting restrictNobs to TRUE ensures that only observations in the fit model are used in the null model. This appears to work for lm and some glm models, but causes the function to fail for other model object types
If you set restrictNobs to TRUE, you should see the same output.
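The mismatch described in the documentation can be seen in base R alone. Here is a minimal sketch, assuming a logistic model on mtcars with a few wt values set to NA purely for illustration: the fitted model drops the incomplete rows, while a null model fitted to the same data frame keeps them.

```r
d <- mtcars
d$wt[c(1, 5, 9)] <- NA                        ## inject NAs into one predictor only

fit  <- glm(am ~ wt, data = d, family = binomial)
null <- glm(am ~ 1,  data = d, family = binomial)
nobs(fit)    ## rows with NA in wt were dropped
nobs(null)   ## all rows kept, so the two models use different observations

## Refit the null model on exactly the rows used by the fitted model
null_same <- glm(am ~ 1, data = model.frame(fit), family = binomial)
nobs(null_same) == nobs(fit)
```

Dropping incomplete rows up front with na.omit, or setting restrictNobs to TRUE, aims at the same alignment of observations.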
How to dynamically name variables in formula in lm() function?
The core issue to understand here is that lm() takes a formula object as its first argument, which specifies the regression.
You've created a vector of strings (characters), but R won't dynamically generate the formula for you in the function call. The ability to just type variable names as a formula is a convenience, but it is not practical when you are attempting to be dynamic.
To simplify your example, start with:
y1 <- (rnorm(n = 10, mean = 0, sd = 1))
x1 <- (rnorm(n = 10, mean = 0, sd = 1))
x2 <- (rnorm(n = 10, mean = 0, sd = 1))
x3 <- (rnorm(n = 10, mean = 0, sd = 1))
df <- as.data.frame(cbind(y1,x1,x2,x3))
predictors = c("x1", "x2", "x3")
Now you can dynamically create a formula as a concatenated string (paste0) and convert it to a formula. Then pass this formula to your lm() call:
form1 = as.formula(paste0("y1~", predictors[1]))
lm(form1, data = df)
As akrun pointed out, you can then start doing things like create loops to dynamically generate these.
You can also do things like:
my_formula = as.formula(paste0("y1~", paste0(predictors, collapse="+")))
## generates y1 ~ x1 + x2 + x3
lm(my_formula, data = df)
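The loop idea can be sketched as follows, fitting one single-predictor model per entry of predictors. The data are regenerated here so the block runs on its own; the seed and the list name fits are illustrative choices.

```r
set.seed(1)
df <- data.frame(y1 = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
predictors <- c("x1", "x2", "x3")

## One model per predictor, with the formula built from a string each time
fits <- lapply(predictors, function(p) {
    lm(as.formula(paste0("y1 ~ ", p)), data = df)
})
names(fits) <- predictors

## Slope estimate from each single-predictor model
sapply(predictors, function(p) coef(fits[[p]])[[p]])
```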
See also: Formula with dynamic number of variables
One of the answers on that page also mentions akrun's alternative way of doing this, using the function reformulate. From ?reformulate:
reformulate creates a formula from a character vector. If length(termlabels) > 1, its elements are concatenated with +. Non-syntactic names (e.g. containing spaces or special characters; see make.names) must be protected with backticks (see examples). A non-parseable response still works for now, back compatibly, with a deprecation warning.
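A short sketch of the reformulate route, using the same variable names as above. The simulated data and the column name "my var" are assumptions made only for this illustration.

```r
predictors <- c("x1", "x2", "x3")
f <- reformulate(predictors, response = "y1")
f    ## y1 ~ x1 + x2 + x3

set.seed(1)
df <- data.frame(y1 = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
lm(f, data = df)

## A non-syntactic name must be wrapped in backticks
f_bt <- reformulate("`my var`", response = "y1")
all.vars(f_bt)
```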
meaning of family quasi and the link function inverse in glm
The "inverse" link function is just f(x) = 1/x. If you create a family object with the command
fam <- quasi(link = "inverse")
the link function is set to the inverse function:
fam$linkfun
# function (mu)
# 1/mu
# <environment: namespace:stats>
By default, the link function for quasi is "identity", i.e., f(x) = x.
The details of quasi can be found in the function. Have a look at the structure with
str(quasi())
# List of 12
# $ family : chr "quasi"
# $ link : chr "identity"
# $ linkfun :function (mu)
# $ linkinv :function (eta)
# $ variance :function (mu)
# $ dev.resids:function (y, mu, wt)
# $ aic :function (y, n, mu, wt, dev)
# $ mu.eta :function (eta)
# $ initialize: expression({ n <- rep.int(1, nobs) mustart <- y })
# $ validmu :function (mu)
# $ valideta :function (eta)
# $ varfun : chr "constant"
# - attr(*, "class")= chr "family"
You can access the elements with $, for example
quasi()$variance
# function (mu)
# rep.int(1, length(mu))
# <bytecode: 0x100f30060>
# <environment: 0x101be4940>
to find details of quasi. By default, quasi assumes constant variance.
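To see how the link and variance pieces fit together, here is a small sketch. The particular combination of the "inverse" link with variance "mu" is only an illustrative choice. Note that linkinv is the inverse of the link function, which for the "inverse" link happens to be 1/eta again.

```r
fam <- quasi(link = "inverse", variance = "mu")
fam$linkfun(2)              ## link: 1/mu
fam$linkinv(0.5)            ## inverse of the link maps eta back to mu
fam$variance(c(1, 4))       ## variance function V(mu) = mu
quasi()$variance(c(1, 4))   ## default: constant variance
```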