Frequency Tables with Weighted Data in R

You can use the svytable function from the survey package, or wtd.table from the rgrs package.

Edit: rgrs is now called questionr:

df <- data.frame(var = c("A", "A", "B", "B"), wt = c(30, 10, 20, 40))

library(questionr)
wtd.table(x = df$var, weights = df$wt)
#  A  B
# 40 60

That's also possible with dplyr:

library(dplyr)
count(x = df, var, wt = wt)
# # A tibble: 2 x 2
#      var     n
#   <fctr> <dbl>
# 1      A    40
# 2      B    60
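The svytable route mentioned above, but not shown, looks like this (a minimal sketch using the same toy data):

```r
# Weighted one-way table via the survey package
library(survey)

df <- data.frame(var = c("A", "A", "B", "B"), wt = c(30, 10, 20, 40))

# Declare a design with no clustering (ids = ~1) and the weight column
dsgn <- svydesign(ids = ~1, weights = ~wt, data = df)

svytable(~var, design = dsgn)
# var
#  A  B
# 40 60
```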

Weighted Frequency Table in R

In base R, we can make use of xtabs/prop.table. Based on the OP's code, the cumulative sum is calculated from the order of occurrence of unique values in 'INTERVIEW_DAY'. So, to avoid sorting by the integer value, convert to a factor with the levels specified, get the sum of 'WEIGHT' by 'INTERVIEW_DAY' with xtabs, use prop.table to return the proportions, and then apply cumsum to that output:

df$INTERVIEW_DAY <- factor(df$INTERVIEW_DAY, levels = unique(df$INTERVIEW_DAY))
tbl1 <- xtabs(WEIGHT ~ INTERVIEW_DAY, df)
Prop <- prop.table(tbl1)
Cum <- cumsum(100 * Prop / sum(Prop))
Cum
# 5 6 4 1 2 7 3
# 15.71029 39.30705 72.86967 76.02470 88.68935 89.66260 100.00000

out <- data.frame(INTERVIEW_DAY = names(tbl1), Freq = as.numeric(tbl1),
                  Prop = as.numeric(Prop), Cum = as.numeric(Cum))
row.names(out) <- NULL
out
#  INTERVIEW_DAY       Freq        Prop       Cum
#1             5  8155462.7 0.157102906  15.71029
#2             6 12249456.5 0.235967631  39.30705
#3             4 17422888.0 0.335626124  72.86967
#4             1  1637826.3 0.031550297  76.02470
#5             2  6574426.8 0.126646592  88.68935
#6             7   505227.2 0.009732453  89.66260
#7             3  5366309.3 0.103373998 100.00000

If we need a weighted frequency table, use count from dplyr:

library(dplyr)
df %>%
  mutate(INTERVIEW_DAY = factor(INTERVIEW_DAY, levels = unique(INTERVIEW_DAY))) %>%
  count(INTERVIEW_DAY, wt = WEIGHT, sort = FALSE) %>%
  mutate(Prop = n / sum(n),
         Cum = cumsum(100 * Prop / sum(Prop)))
# A tibble: 7 x 4
#   INTERVIEW_DAY         n    Prop   Cum
#   <fct>             <dbl>   <dbl> <dbl>
# 1 5              8155463. 0.157    15.7
# 2 6             12249456. 0.236    39.3
# 3 4             17422888  0.336    72.9
# 4 1              1637826. 0.0316   76.0
# 5 2              6574427. 0.127    88.7
# 6 7               505227. 0.00973  89.7
# 7 3              5366309. 0.103   100.

Or with data.table

library(data.table)
setDT(df)[, .(Freq = sum(WEIGHT)), by = INTERVIEW_DAY
          ][, Prop := Freq / sum(Freq)
          ][, Cum := cumsum(100 * Prop / sum(Prop))][]
#   INTERVIEW_DAY       Freq        Prop       Cum
#1:             5  8155462.7 0.157102906  15.71029
#2:             6 12249456.5 0.235967631  39.30705
#3:             4 17422888.0 0.335626124  72.86967
#4:             1  1637826.3 0.031550297  76.02470
#5:             2  6574426.8 0.126646592  88.68935
#6:             7   505227.2 0.009732453  89.66260
#7:             3  5366309.3 0.103373998 100.00000

data

df <- structure(list(TUCASEID = c(2.00301e+13, 2.00301e+13, 2.00301e+13, 
2.00301e+13, 2.00301e+13, 2.00301e+13, 2.00301e+13, 2.00301e+13,
2.00301e+13, 2.00301e+13, 2.00301e+13, 2.00301e+13, 2.00301e+13,
2.00301e+13, 2.00301e+13), INTERVIEW_DAY = c(5L, 6L, 6L, 4L,
4L, 4L, 1L, 2L, 6L, 4L, 6L, 7L, 6L, 3L, 6L), WEIGHT = c(8155462.7,
1735322.5, 3830527.5, 6622023, 3068387.3, 3455424.9, 1637826.3,
6574426.8, 1528296.3, 4277052.8, 1961482.3, 505227.2, 2135476.8,
5366309.3, 1058351.1)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15"))

Table in R to be weighted

Try this:

GDAtools::wtable(df$sex, df$age, w = df$wgt)

Output

        0-15 16-29 30-44 45+ NA tot
Female    56    73    60  76  0 265
Male      76    99   106  90  0 371
NA         0     0     0   0  0   0
tot      132   172   166 166  0 636

Update

In case you do not want to install the whole package, the two essential functions you need are wtable and dichotom. Source them and you should be able to use wtable without any problem.

How to do a three-way weighted table in R - similar to wtd.table

First, here are some sample data (try to include these in your questions, even if it requires creating a sample data set like this). Note that I am using the tidyverse packages here:

library(tidyverse)

test <-
  tibble(
    var1 = "A"
    , var2 = "b"
    , var3 = "alpha") %>%
  complete(
    var1 = c("A", "B")
    , var2 = c("a", "b")
    , var3 = c("alpha", "beta")) %>%
  mutate(wt = 1:n())

So, the data are:

# A tibble: 8 x 4
  var1  var2  var3     wt
  <chr> <chr> <chr> <int>
1 A     a     alpha     1
2 A     a     beta      2
3 A     b     alpha     3
4 A     b     beta      4
5 B     a     alpha     5
6 B     a     beta      6
7 B     b     alpha     7
8 B     b     beta      8

The function you are looking for then is xtabs:

xtabs(wt ~ var1 + var2 + var3
      , data = test)

gives:

, , var3 = alpha

    var2
var1 a b
   A 1 3
   B 5 7

, , var3 = beta

    var2
var1 a b
   A 2 4
   B 6 8

If you don't need the result to have the table class, you can also do this by just using count from dplyr (part of the tidyverse):

test %>%
  count(var1, var2, var3
        , wt = wt)

gives a tibble (a modified data.frame) with your results:

# A tibble: 8 x 4
  var1  var2  var3      n
  <chr> <chr> <chr> <int>
1 A     a     alpha     1
2 A     a     beta      2
3 A     b     alpha     3
4 A     b     beta      4
5 B     a     alpha     5
6 B     a     beta      6
7 B     b     alpha     7
8 B     b     beta      8

And you can then perform whatever calculations you want on it, e.g. the proportion within each var3:

test %>%
  count(var1, var2, var3
        , wt = wt) %>%
  group_by(var3) %>%
  mutate(prop_in_var3 = n / sum(n))

gives:

# A tibble: 8 x 5
# Groups:   var3 [2]
  var1  var2  var3      n prop_in_var3
  <chr> <chr> <chr> <int>        <dbl>
1 A     a     alpha     1       0.0625
2 A     a     beta      2       0.1
3 A     b     alpha     3       0.188
4 A     b     beta      4       0.2
5 B     a     alpha     5       0.312
6 B     a     beta      6       0.3
7 B     b     alpha     7       0.438
8 B     b     beta      8       0.4
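If you prefer to stay with the xtabs result, the same within-var3 proportions come from prop.table with margin = 3. This is a base-R sketch; the data construction below just mirrors the eight rows of the tibble shown earlier:

```r
# Rebuild the example data in base R (same eight rows as above)
test <- data.frame(var1 = rep(c("A", "B"), each = 4),
                   var2 = rep(rep(c("a", "b"), each = 2), 2),
                   var3 = rep(c("alpha", "beta"), 4),
                   wt   = 1:8)

xt <- xtabs(wt ~ var1 + var2 + var3, data = test)

# Proportions within each var3 slice (margin = 3 normalises over the 3rd dimension)
prop.table(xt, margin = 3)
# e.g. in the "alpha" slice (which sums to 1): A/a = 1/16 = 0.0625, B/b = 7/16 = 0.4375
```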

Raw counts and percentages weighted by survey weight in R table?

You can use a combination of the survey and gtsummary packages. There is an option in survey::svydesign to add weights. Then, the survey object is piped into tbl_svysummary. However, depending on your expected output, you might need to use a different statistic or adjust some of the other settings.

library(gtsummary)
library(dplyr)

results <-
  survey::svydesign(~ 1, data = dat, weights = ~ sv_weight) %>%
  tbl_svysummary(
    by = year,
    include = c(sex, race),
    statistic = list(all_categorical() ~ "{n_unweighted} ({p}%)")
  )

Output

(screenshot of the rendered gtsummary table)

How to show zeros in weighted table with r function wtable?

The blank values are actually NA values displayed differently; you can capture them with is.na. (Because the margins are computed from cells that include NA, some of the Sum entries below end up as 0 as well.)

tab <- GDAtools::wtable(mtcars$cyl,mtcars$gear, weights=mtcars$qsec, stat='freq')
tab[is.na(tab)] <- 0
tab

#        3     4    5   Sum
#4    20.0 156.9 33.6 210.5
#6    39.7  70.7 15.5 125.8
#8   205.7   0.0 29.1   0.0
#Sum 265.4   0.0 78.2   0.0
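A base-R alternative sidesteps the NA issue entirely: xtabs fills empty combinations with 0 from the start, and addmargins adds the row/column sums. A sketch on the same mtcars example:

```r
# Weighted cross-tab in base R; empty cells are 0, not NA
tab <- addmargins(xtabs(qsec ~ cyl + gear, data = mtcars))
tab
# No 8-cylinder car has 4 gears, so that cell is 0 and the margins stay correct
```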

How to get the right frequency table weighted and unweighted in complex survey?

You don't need the survey package at all for the unweighted sample table; you can just use the table or xtabs functions.

Or, if you only have the data conveniently available in the survey object

with(model.frame(dsub), table(edu,smok))

As a reproducible example

library(survey)
data(api)

with(apistrat, table(comp.imp, sch.wide))
#stratified sample
dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
svytable(~comp.imp+sch.wide, design=dstrat)
with(model.frame(dstrat), table(comp.imp, sch.wide))

Frequency table with second variable as analytic weight in R

I found an inelegant solution based on the Stata help files: I just added the line

timeuse_2003$N_WEIGHT <- timeuse_2003$WEIGHT * 20720 / sum(timeuse_2003$WEIGHT)

and kept the rest of the code:

Table_WEIGHT <- xtabs(N_WEIGHT ~ INTERVIEW_DAY, timeuse_2003)
Prop <- prop.table(Table_WEIGHT)
Cum <- cumsum(100 * Prop / sum(Prop))
Cum
Freq_Table <- data.frame(INTERVIEW_DAY = names(Table_WEIGHT), Freq = as.numeric(Table_WEIGHT),
                         Prop = as.numeric(Prop), Cum = as.numeric(Cum))
Freq_Table

The table was then correct such as:

> Freq_Table
  INTERVIEW_DAY      Freq       Prop        Cum
1             1 2974.1424 0.14353969  14.353969
2             2 3065.6819 0.14795762  29.149731
3             3 2919.3688 0.14089618  43.239349
4             4 2916.1739 0.14074198  57.313547
5             5 2941.0530 0.14194271  71.507819
6             6 2962.0832 0.14295769  85.803587
7             7 2941.4968 0.14196413 100.000000

If someone could clarify how to substitute the number of observations I entered manually for something automatic, that would be great (this code will be used on different datasets, so I can't update the number of observations every time). Something like data.table's .N would be ideal!

Thank you!
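To answer the question at the end: nrow() returns the row count automatically, so the hard-coded 20720 can be dropped (assuming each row is one observation; the timeuse_2003 below is a small stand-in data frame for illustration):

```r
# Stand-in data frame; in practice this is the real timeuse_2003
timeuse_2003 <- data.frame(WEIGHT = c(2, 3, 5))

# Normalise the weights so they sum to the number of observations
timeuse_2003$N_WEIGHT <- timeuse_2003$WEIGHT * nrow(timeuse_2003) / sum(timeuse_2003$WEIGHT)

sum(timeuse_2003$N_WEIGHT)  # equals nrow(timeuse_2003)
```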



Related Topics



Leave a reply



Submit