Extract Survival Probabilities in Survfit by Groups

Extract survival probabilities in Survfit by groups

I can do it fairly easily with package 'rms' which has a survest function:

install.packages(rms, dependencies=TRUE);require(rms)
cfit <- cph(Surv(time, status) ~ x, data = aml, surv=TRUE)
survest(cfit, newdata=expand.grid(x=levels(aml$x)) ,
times=seq(0,50,by=10)
)$surv

0 10 20 30 40 50
1 1 0.8690765 0.7760368 0.6254876 0.4735880 0.21132505
2 1 0.7043047 0.5307801 0.3096943 0.1545671 0.02059005
Warning message:
In survest.cph(cfit, newdata = expand.grid(x = levels(aml$x)), times = seq(0, :
S.E. and confidence intervals are approximate except at predictor means.
Use cph(...,x=TRUE,y=TRUE) (and don't use linear.predictors=) for better estimates.

With package survival you can find a worked example on pages 264-265 of Therneau and Grambsch's book, but this would still require code to put out the values at particular times.

 fit <- coxph(Surv(time, status) ~ x, data = aml) 
temp=data.frame(x=levels(aml$x))
expfit <- survfit(fit, temp)
plot(expfit, xscale=365.25, ylab="Expected")

Sample Image

> expfit$surv
1 2
[1,] 0.9508567 0.88171694
[2,] 0.8975825 0.76343993
[3,] 0.8690765 0.70430463
[4,] 0.8405707 0.64800519
[5,] 0.8105393 0.59170883
[6,] 0.8105393 0.59170883
[7,] 0.7760369 0.53078004
[8,] 0.7057873 0.41876588
[9,] 0.6686309 0.36584513
[10,] 0.6686309 0.36584513
[11,] 0.6254878 0.30969426
[12,] 0.5773770 0.25357160
[13,] 0.5292685 0.20403922
[14,] 0.4735881 0.15456706
[15,] 0.4179153 0.11309373
[16,] 0.3484162 0.07179820
[17,] 0.2113251 0.02059003
[18,] 0.2113251 0.02059003

> expfit$time
[1] 5 8 9 12 13 16 18 23 27 28 30 31 33 34 43
[16] 45 48 161

As noted by @Skaul below: In survival package, the specific times can be inserted as: summary(expfit,time=c(0,10,20,30))$surv

Obtaining Survival Estimates in R

If all that you say is true, then this is a typical way of generating a list using indices as arguments:

 t4list <- lapply(1:11, function(x) summary(survplot[x], times=4)[1] )
t4list

If you really meant that you wanted a vector of survival estimates based at that time, then sapply would make an attempt to simply the result to an atomic form such as a numeric vector or a matrix in the case where the results were "multidimensional". I would have thought that you could have gotten a useful result with just:

summary(survplot, times=4)[1]

That should have succeeded in giving you a vector of predicted survival times (assuming such times exist.) If you get too greedy and push out the 'times' value past where there are estimates, then you will throw an error. Ironically that error will not be thrown if there is at least one time where all levels of the covariates have an estimate. Using the example in the help page as a starting point:

fit <- survfit(Surv(time, status) ~ x, data = aml) 
summary(fit, times=c(10, 20, 60) )[1]
#$surv
#[1] 0.9090909 0.7159091 0.1840909 0.6666667 0.5833333
# not very informative about which times and covariates were estimated
# and which are missing

# this is more informative
as.data.frame( summary(fit, times=c(10, 20, 60) )[c("surv", "time", "strata")])
surv time strata
1 0.9090909 10 x=Maintained
2 0.7159091 20 x=Maintained
3 0.1840909 60 x=Maintained
4 0.6666667 10 x=Nonmaintained
5 0.5833333 20 x=Nonmaintained

Whereas if you just use 60 you get an error message:

> summary(fit, times=c( 60) )[1]
Error in factor(rep(1:nstrat, scount), labels = names(fit$strata)) :
invalid labels; length 2 should be 1 or 1

Extract the survival times vector from survival object in R

To extract data from a survival object (Surv() function), you can use subsetting:

library(survival)

set.seed(123)
x <- Surv(rexp(100, rate = 0.001))

# Subsetting
x[,1]

R Extract Data From SurvFit

You can extract the data from the summary of the fitted object using lapply

sfit <- summary(mfit2)
str(sfit)

List of 24
$ n : int [1:2] 631 753
$ time : num [1:359] 1 2 3 4 5 6 7 8 9 10 ...
$ n.risk : int [1:359, 1:3] 631 610 599 595 588 587 581 580 573 569 ...
$ n.event : int [1:359, 1:3] 0 0 0 0 0 0 0 0 0 0 ...
$ n.censor : num [1:359] 1 0 0 0 0 0 0 0 0 1 ...
$ pstate : num [1:359, 1:3] 0.968 0.951 0.944 0.933 0.932 ...
$ p0 : num [1:2, 1:3] 1 1 0 0 0 0
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "sex=F" "sex=M"
.. ..$ : chr [1:3] "(s0)" "pcm" "death"
$ strata : Factor w/ 2 levels "sex=F","sex=M": 1 1 1 1 1 1 1 1 1 1 ...
...

I think the columns you need are the time, pstate and `strata. But some others, such as the numbers at risk may be useful.

cols <- lapply(c(2:6, 8, 16, 17), function(x) sfit[x])

Then combine these columns into a data frame with do.call

data <- do.call(data.frame, cols)
str(data)
'data.frame': 359 obs. of 21 variables:
$ time : num 1 2 3 4 5 6 7 8 9 10 ...
$ n.risk.1 : int 631 610 599 595 588 587 581 580 573 569 ...
$ n.risk.2 : int 0 0 0 0 0 0 0 0 0 0 ...
$ n.risk.3 : int 0 0 0 0 0 0 0 0 0 0 ...
$ n.event.1: int 0 0 0 0 0 0 0 0 0 0 ...
$ n.event.2: int 0 2 0 1 0 1 0 0 2 1 ...
$ n.event.3: int 20 9 4 6 1 5 1 7 2 2 ...
$ n.censor : num 1 0 0 0 0 0 0 0 0 1 ...
$ pstate.1 : num 0.968 0.951 0.944 0.933 0.932 ...
$ pstate.2 : num 0 0.00317 0.00317 0.00476 0.00476 ...
$ pstate.3 : num 0.0317 0.046 0.0523 0.0619 0.0634 ...
$ strata : Factor w/ 2 levels "sex=F","sex=M": 1 1 1 1 1 1 1 1 1 1 ...
$ lower.1 : num 0.955 0.934 0.927 0.914 0.912 ...
$ lower.2 : num NA 0.000796 0.000796 0.00154 0.00154 ...
$ lower.3 : num 0.0206 0.0322 0.0375 0.0456 0.047 ...
$ upper.1 : num 0.982 0.968 0.963 0.953 0.952 ...
$ upper.2 : num NA 0.0127 0.0127 0.0147 0.0147 ...
$ upper.3 : num 0.0488 0.0656 0.0729 0.0838 0.0856 ...

This data is in wide format, best to reshape to long for the graph.

mgus3 <- data %>%
pivot_longer(cols=-c(time, strata, n.censor),
names_to=c(".value","state"),
names_pattern="(.+).(.+)") %>%
filter(state!=1) %>% # Exclude the censored state
mutate(state=factor(state, labels=c("pcm","death")),
group=interaction(strata, state))

Then plot it.

library(ggplot)

mgus3 %>%
ggplot(aes(x=time, y=pstate, col=group)) +
geom_line(aes(linetype=group)) +
ylab("Probability in State") +
theme_bw()

Sample Image

You should be able to add confidence bands and make it more pretty.

R - Extract summary table from Survfit with Strata

It is basically the same as you have there, just an extra column

res <- summary( survfit( Surv(futime, fustat)~rx, data=ovarian))
cols <- lapply(c(2:6, 8:11) , function(x) res[x])
tbl <- do.call(data.frame, cols)
head(tbl)
# time n.risk n.event n.censor surv strata std.err upper lower
# 1 59 13 1 0 0.9230769 rx=1 0.0739053 1.0000000 0.7890186
# 2 115 12 1 0 0.8461538 rx=1 0.1000683 1.0000000 0.6710952
# 3 156 11 1 0 0.7692308 rx=1 0.1168545 1.0000000 0.5711496
# 4 268 10 1 0 0.6923077 rx=1 0.1280077 0.9946869 0.4818501
# 5 329 9 1 0 0.6153846 rx=1 0.1349320 0.9457687 0.4004132
# 6 431 8 1 0 0.5384615 rx=1 0.1382642 0.8906828 0.3255265

How to calculate confidence intervals for crude survival rates?

The print result of survfit gives confidnce intervals by group for median survivla time. I'm guessing the NA's for the estimates of median times is occurring because your groups are not having enough events to actually get to a median survival. You should show the output of plot(fit) to see whether my guess is correct.

You might try to plot the KM curves, noting that the plot.survfit function does have a confidence interval option constructed around proportions:

plot(fit, conf.int=0.95, col=1:2)

Please read ?summary.survfit. It is the class of generic summary functions which are typically used by package authors to deliver the parameter estimates and confidence intervals. There you will see that it is not "rates" which are summarized by summary.survfit, but rather estimates of survival proportion. These proportions can either be medians (in which case the estimate is on the time scale) or they can be estimates at particular times (and in that instance the estimates are of proportions.)

If you actually do want rates then use a functions designed for that sort of model, perhaps using ?survreg. Compare what you get from using survreg versus survfit on the supplied dataset ovarian:

> reg.fit <- survreg( Surv(futime, fustat)~rx, data=ovarian)
> summary(reg.fit)

Call:
survreg(formula = Surv(futime, fustat) ~ rx, data = ovarian)
Value Std. Error z p
(Intercept) 6.265 0.778 8.05 8.3e-16
rx 0.559 0.529 1.06 0.29
Log(scale) -0.121 0.251 -0.48 0.63

Scale= 0.886

Weibull distribution
Loglik(model)= -97.4 Loglik(intercept only)= -98
Chisq= 1.18 on 1 degrees of freedom, p= 0.28
Number of Newton-Raphson Iterations: 5
n= 26

#-------------

> fit <- survfit( Surv(futime, fustat)~rx, data=ovarian)
> summary(fit)
Call: survfit(formula = Surv(futime, fustat) ~ rx, data = ovarian)

rx=1
time n.risk n.event survival std.err lower 95% CI upper 95% CI
59 13 1 0.923 0.0739 0.789 1.000
115 12 1 0.846 0.1001 0.671 1.000
156 11 1 0.769 0.1169 0.571 1.000
268 10 1 0.692 0.1280 0.482 0.995
329 9 1 0.615 0.1349 0.400 0.946
431 8 1 0.538 0.1383 0.326 0.891
638 5 1 0.431 0.1467 0.221 0.840

rx=2
time n.risk n.event survival std.err lower 95% CI upper 95% CI
353 13 1 0.923 0.0739 0.789 1.000
365 12 1 0.846 0.1001 0.671 1.000
464 9 1 0.752 0.1256 0.542 1.000
475 8 1 0.658 0.1407 0.433 1.000
563 7 1 0.564 0.1488 0.336 0.946

Might have been easier if I had used "exponential" instead of "weibull" as the distribution type. Exponential fits have a single parameter that is estimated and are more easily back-transformed to give estimates of rates.

Note: I answered an earlier question about survfit, although the request was for survival times rather than for rates. Extract survival probabilities in Survfit by groups



Related Topics



Leave a reply



Submit