Frequency tables with weighted data in R
You can use the svytable function from the survey package, or wtd.table from rgrs.
EDIT: rgrs is now called questionr:
df <- data.frame(var = c("A", "A", "B", "B"), wt = c(30, 10, 20, 40))
library(questionr)
wtd.table(x = df$var, weights = df$wt)
# A B
# 40 60
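For completeness, the survey route mentioned above can look like this; a minimal sketch, assuming a design with no clustering (ids = ~1) and sampling weights taken from the wt column:

```r
library(survey)

df <- data.frame(var = c("A", "A", "B", "B"), wt = c(30, 10, 20, 40))

# minimal design object: no clusters, weights from the wt column
design <- svydesign(ids = ~1, weights = ~wt, data = df)
svytable(~var, design = design)
# weighted counts: A = 40, B = 60
```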
That's also possible with dplyr:
library(dplyr)
count(x = df, var, wt = wt)
# # A tibble: 2 x 2
# var n
# <fctr> <dbl>
# 1 A 40
# 2 B 60
Weighted Frequency Table in R
In base R, we can make use of xtabs/prop.table. Based on the OP's code, the cumsum is calculated from the order of occurrence of unique values in 'INTERVIEW_DAY'. So, to avoid the sorting based on the integer value, convert to factor with the levels specified, get the sum of 'WEIGHT' by 'INTERVIEW_DAY' with xtabs, use prop.table to return the proportion, and then apply cumsum on that output:
df$INTERVIEW_DAY <- factor(df$INTERVIEW_DAY, levels = unique(df$INTERVIEW_DAY))
tbl1 <- xtabs(WEIGHT ~ INTERVIEW_DAY, df)
Prop <- prop.table(tbl1)
Cum <- cumsum(100 * Prop / sum(Prop))
Cum
# 5 6 4 1 2 7 3
# 15.71029 39.30705 72.86967 76.02470 88.68935 89.66260 100.00000
out <- data.frame(INTERVIEW_DAY = names(tbl1), Freq = as.numeric(tbl1),
Prop = as.numeric(Prop), Cum = as.numeric(Cum))
row.names(out) <- NULL
out
# INTERVIEW_DAY Freq Prop Cum
#1 5 8155462.7 0.157102906 15.71029
#2 6 12249456.5 0.235967631 39.30705
#3 4 17422888.0 0.335626124 72.86967
#4 1 1637826.3 0.031550297 76.02470
#5 2 6574426.8 0.126646592 88.68935
#6 7 505227.2 0.009732453 89.66260
#7 3 5366309.3 0.103373998 100.00000
If we need a weighted frequency, use count:
library(dplyr)
df %>%
mutate(INTERVIEW_DAY = factor(INTERVIEW_DAY, levels = unique(INTERVIEW_DAY))) %>%
count(INTERVIEW_DAY, wt = WEIGHT, sort = FALSE) %>%
mutate(Prop = n / sum(n),
Cum = cumsum(100 * Prop/sum(Prop)))
# A tibble: 7 x 4
# INTERVIEW_DAY n Prop Cum
# <fct> <dbl> <dbl> <dbl>
#1 5 8155463. 0.157 15.7
#2 6 12249456. 0.236 39.3
#3 4 17422888 0.336 72.9
#4 1 1637826. 0.0316 76.0
#5 2 6574427. 0.127 88.7
#6 7 505227. 0.00973 89.7
#7 3 5366309. 0.103 100.
Or with data.table
library(data.table)
setDT(df)[, .(Freq = sum(WEIGHT)), by = INTERVIEW_DAY
][, Prop := Freq / sum(Freq)][, Cum := cumsum(100 * Prop / sum(Prop))][]
# INTERVIEW_DAY Freq Prop Cum
#1: 5 8155462.7 0.157102906 15.71029
#2: 6 12249456.5 0.235967631 39.30705
#3: 4 17422888.0 0.335626124 72.86967
#4: 1 1637826.3 0.031550297 76.02470
#5: 2 6574426.8 0.126646592 88.68935
#6: 7 505227.2 0.009732453 89.66260
#7: 3 5366309.3 0.103373998 100.00000
data
df <- structure(list(TUCASEID = c(2.00301e+13, 2.00301e+13, 2.00301e+13,
2.00301e+13, 2.00301e+13, 2.00301e+13, 2.00301e+13, 2.00301e+13,
2.00301e+13, 2.00301e+13, 2.00301e+13, 2.00301e+13, 2.00301e+13,
2.00301e+13, 2.00301e+13), INTERVIEW_DAY = c(5L, 6L, 6L, 4L,
4L, 4L, 1L, 2L, 6L, 4L, 6L, 7L, 6L, 3L, 6L), WEIGHT = c(8155462.7,
1735322.5, 3830527.5, 6622023, 3068387.3, 3455424.9, 1637826.3,
6574426.8, 1528296.3, 4277052.8, 1961482.3, 505227.2, 2135476.8,
5366309.3, 1058351.1)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15"))
Table in R to be weighted
Try this
GDAtools::wtable(df$sex, df$age, w = df$wgt)
Output
0-15 16-29 30-44 45+ NA tot
Female 56 73 60 76 0 265
Male 76 99 106 90 0 371
NA 0 0 0 0 0 0
tot 132 172 166 166 0 636
Update
In case you do not want to install the whole package, here are the two essential functions you need: wtable and dichotom. Source them and you should be able to use wtable without any problem.
How to do a three-way weighted table in R - similar to wtd.table
First, here are some sample data (try to include these in your questions, even if it requires creating a sample data set like this). Note that I am using the tidyverse packages here:
test <-
tibble(
var1 = "A"
, var2 = "b"
, var3 = "alpha") %>%
complete(
var1 = c("A", "B")
, var2 = c("a", "b")
, var3 = c("alpha", "beta")) %>%
mutate(wt = 1:n())
So, the data are:
# A tibble: 8 x 4
var1 var2 var3 wt
<chr> <chr> <chr> <int>
1 A a alpha 1
2 A a beta 2
3 A b alpha 3
4 A b beta 4
5 B a alpha 5
6 B a beta 6
7 B b alpha 7
8 B b beta 8
The function you are looking for then is xtabs:
xtabs(wt ~ var1 + var2 + var3
, data = test)
gives:
, , var3 = alpha
var2
var1 a b
A 1 3
B 5 7
, , var3 = beta
var2
var1 a b
A 2 4
B 6 8
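As an aside (not in the original answer), base R's ftable() flattens a multi-way table like this into a single two-dimensional printout, which can be easier to scan:

```r
# rebuild the sample data without tidyverse so the snippet stands alone
test <- data.frame(
  var1 = rep(c("A", "B"), each = 4),
  var2 = rep(rep(c("a", "b"), each = 2), 2),
  var3 = rep(c("alpha", "beta"), 4),
  wt   = 1:8
)

tab <- xtabs(wt ~ var1 + var2 + var3, data = test)
ftable(tab)  # var3 spreads across the columns, var1/var2 nest in the rows
```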
If you don't need the result to have the table class, you can also do this by just using count from dplyr (part of the tidyverse):
test %>%
count(var1, var2, var3
, wt = wt)
gives a tibble (a modified data.frame) with your results:
# A tibble: 8 x 4
var1 var2 var3 n
<chr> <chr> <chr> <int>
1 A a alpha 1
2 A a beta 2
3 A b alpha 3
4 A b beta 4
5 B a alpha 5
6 B a beta 6
7 B b alpha 7
8 B b beta 8
And you can then perform whatever calculations you want on it, e.g. the percent within each var3:
test %>%
count(var1, var2, var3
, wt = wt) %>%
group_by(var3) %>%
mutate(prop_in_var3 = n / sum(n))
gives:
# A tibble: 8 x 5
# Groups: var3 [2]
var1 var2 var3 n prop_in_var3
<chr> <chr> <chr> <int> <dbl>
1 A a alpha 1 0.0625
2 A a beta 2 0.1
3 A b alpha 3 0.188
4 A b beta 4 0.2
5 B a alpha 5 0.312
6 B a beta 6 0.3
7 B b alpha 7 0.438
8 B b beta 8 0.4
Raw counts and percentages weighted by survey weight in R table?
You can use a combination of the survey and gtsummary packages. There is an option in survey::svydesign to add weights. Then, the survey object is piped into tbl_svysummary. However, depending on your expected output, you might need to use a different statistic or adjust some of the other settings.
library(gtsummary)
library(dplyr)
results <-
survey::svydesign(~ 1, data = dat, weights = ~ sv_weight) %>%
tbl_svysummary(
by = year,
include = c(sex, race),
statistic = list(all_categorical() ~ "{n_unweighted} ({p}%)")
)
How to show zeros in weighted table with r function wtable?
The blank values are actually NA values displayed differently. You can capture them with is.na.
tab <- GDAtools::wtable(mtcars$cyl,mtcars$gear, weights=mtcars$qsec, stat='freq')
tab[is.na(tab)] <- 0
tab
# 3 4 5 Sum
#4 20.0 156.9 33.6 210.5
#6 39.7 70.7 15.5 125.8
#8 205.7 0.0 29.1 0.0
#Sum 265.4 0.0 78.2 0.0
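Note that the Sum cells that were themselves NA also turn into 0 above. If correct margins are wanted, one option (a sketch on a hypothetical matrix standing in for the inner table, since the exact structure of the wtable object may differ) is to zero the NAs first and then recompute the margins with base R's addmargins():

```r
# hypothetical inner table (without margins), mimicking the output above
m <- matrix(c( 20.0, 156.9, 33.6,
               39.7,  70.7, 15.5,
              205.7,    NA, 29.1),
            nrow = 3, byrow = TRUE,
            dimnames = list(cyl = c("4", "6", "8"), gear = c("3", "4", "5")))

m[is.na(m)] <- 0   # zero the missing cells first...
addmargins(m)      # ...then the Sum row/column are computed correctly
```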
How to get the right frequency table weighted and unweighted in complex survey?
You don't need the survey package at all for the unweighted sample table; you can just use the table or xtabs function. Or, if you only have the data conveniently available in the survey object:
with(model.frame(dsub), table(edu,smok))
As a reproducible example:
library(survey)
data(api)
with(apistrat, table(comp.imp, sch.wide))
#stratified sample
dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
svytable(~comp.imp+sch.wide, design=dstrat)
with(model.frame(dstrat), table(comp.imp, sch.wide))
Frequency table with second variable as analytic weight in R
I found a non-elegant solution based on the Stata help files. I just added the line
timeuse_2003$N_WEIGHT <- timeuse_2003$WEIGHT * 20720/ sum(timeuse_2003$WEIGHT)
and kept the rest of the code:
Table_WEIGHT <- xtabs(N_WEIGHT ~ INTERVIEW_DAY, timeuse_2003)
Prop <- prop.table(Table_WEIGHT)
Cum <- cumsum(100 * Prop / sum(Prop))
Cum
Freq_Table <- data.frame(INTERVIEW_DAY = names(Table_WEIGHT), Freq = as.numeric(Table_WEIGHT),
Prop = as.numeric(Prop), Cum = as.numeric(Cum))
Freq_Table
The table was then correct such as:
> Freq_Table
INTERVIEW_DAY Freq Prop Cum
1 1 2974.1424 0.14353969 14.353969
2 2 3065.6819 0.14795762 29.149731
3 3 2919.3688 0.14089618 43.239349
4 4 2916.1739 0.14074198 57.313547
5 5 2941.0530 0.14194271 71.507819
6 6 2962.0832 0.14295769 85.803587
7 7 2941.4968 0.14196413 100.000000
If someone could clarify how to substitute the manually entered number of observations for something automatic, that would be great: this code will be used on different datasets, so I can't update the number of observations in every single one. Something like ".N" would be very fine!
Thank you!
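On the closing question: the hard-coded 20720 appears to be the number of observations, and if each row of the data frame is one observation (an assumption), nrow() supplies that count automatically. A sketch with made-up data standing in for timeuse_2003:

```r
# made-up stand-in for timeuse_2003
timeuse_2003 <- data.frame(INTERVIEW_DAY = rep(1:7, each = 3),
                           WEIGHT = seq(5, 105, by = 5))

# nrow() replaces the hand-typed observation count, so the rescaling
# carries over to any dataset unchanged
timeuse_2003$N_WEIGHT <- timeuse_2003$WEIGHT * nrow(timeuse_2003) / sum(timeuse_2003$WEIGHT)
sum(timeuse_2003$N_WEIGHT)  # equals nrow(timeuse_2003), here 21
```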