Creating a table with individual trials from a frequency table in R (inverse of table function)
You may try this:
# create 'result' vector
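# repeat 1s and 0s the number of times given in the respective 'count' column;
# c(t(...)) reads the counts row by row, matching the alternating 1, 0 pattern
# (unlist() reads them column by column, which misaligns for most inputs)
result <- rep(rep(c(1, 0), nrow(df)), c(t(df[ , c("success.count", "fail.count")])))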
# repeat each row in df the number of times given by the sum of 'count' columns
data.frame(df[rep(1:nrow(df), rowSums(df[ , c("success.count", "fail.count")]) ), c("factor.A", "factor.B")], result)
#     factor.A factor.B result
# 1          0        1      0
# 1.1        0        1      0
# 2          1        1      1
# 2.1        1        1      1
# 2.2        1        1      0
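As a self-contained check of the same idea, here is a minimal sketch on a made-up frequency table (the `freq` data frame and its counts are assumptions for illustration, not the asker's data). It expands the counts into one row per trial and then sums the successes per factor combination, which should return the original success counts.

```r
# Hypothetical frequency table; the counts are invented for illustration.
freq <- data.frame(factor.A = c(0, 1),
                   factor.B = c(1, 1),
                   success.count = c(1, 2),
                   fail.count = c(3, 4))

counts <- as.matrix(freq[, c("success.count", "fail.count")])

# One output row per trial: repeat each input row by its total count.
row_idx <- rep(seq_len(nrow(freq)), rowSums(counts))

# c(t(counts)) walks the counts row by row (success, fail, success, fail, ...),
# which lines up with the alternating 1/0 values.
result <- rep(rep(c(1, 0), nrow(freq)), c(t(counts)))

trials <- data.frame(freq[row_idx, c("factor.A", "factor.B")], result)
rownames(trials) <- NULL

# Round trip: the number of successes per factor combination.
aggregate(result ~ factor.A + factor.B, data = trials, FUN = sum)
```

The row-by-row order matters here: with these counts, reading them column by column would give four successes instead of three.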
Weighted Frequency Table in R
In base R, we can make use of xtabs/prop.table. Based on the OP's code, the cumsum is calculated from the order of occurrence of unique values in 'INTERVIEW_DAY'. So, to avoid sorting by the integer value, convert the column to a factor with levels specified, get the sum of 'WEIGHT' by 'INTERVIEW_DAY' with xtabs, use prop.table to return the proportions, and then apply cumsum to that output:
df$INTERVIEW_DAY <- factor(df$INTERVIEW_DAY, levels = unique(df$INTERVIEW_DAY))
tbl1 <- xtabs(WEIGHT ~ INTERVIEW_DAY, df)
Prop <- prop.table(tbl1)
Cum <- cumsum(100 * Prop / sum(Prop))
Cum
#         5         6         4         1         2         7         3
#  15.71029  39.30705  72.86967  76.02470  88.68935  89.66260 100.00000
out <- data.frame(INTERVIEW_DAY = names(tbl1), Freq = as.numeric(tbl1),
Prop = as.numeric(Prop), Cum = as.numeric(Cum))
row.names(out) <- NULL
out
# INTERVIEW_DAY Freq Prop Cum
#1 5 8155462.7 0.157102906 15.71029
#2 6 12249456.5 0.235967631 39.30705
#3 4 17422888.0 0.335626124 72.86967
#4 1 1637826.3 0.031550297 76.02470
#5 2 6574426.8 0.126646592 88.68935
#6 7 505227.2 0.009732453 89.66260
#7 3 5366309.3 0.103373998 100.00000
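To see why the factor conversion matters, here is a small sketch on toy data (the `toy` data frame and its column names are hypothetical, not the OP's survey data). With a plain integer column, xtabs orders the groups by sorted value; with a factor whose levels follow first appearance, the order of occurrence is kept and the cumulative percentage follows it.

```r
# Hypothetical toy data, not the OP's survey data.
toy <- data.frame(day = c(5L, 6L, 6L, 4L), w = c(10, 20, 30, 40))

# Integer grouping variable: groups come out sorted by value (4, 5, 6).
sorted_names <- names(xtabs(w ~ day, toy))

# Factor with levels in order of first appearance: groups stay as 5, 6, 4.
toy$day_f <- factor(toy$day, levels = unique(toy$day))
tab <- xtabs(w ~ day_f, toy)

# Cumulative percentage in order of occurrence.
cum <- cumsum(100 * prop.table(tab))
cum
```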
With dplyr, count with the wt argument gives the weighted frequency directly:
library(dplyr)
df %>%
  mutate(INTERVIEW_DAY = factor(INTERVIEW_DAY, levels = unique(INTERVIEW_DAY))) %>%
  count(INTERVIEW_DAY, wt = WEIGHT, sort = FALSE) %>%
  mutate(Prop = n / sum(n),
         Cum = cumsum(100 * Prop / sum(Prop)))
# A tibble: 7 x 4
# INTERVIEW_DAY n Prop Cum
# <fct> <dbl> <dbl> <dbl>
#1 5 8155463. 0.157 15.7
#2 6 12249456. 0.236 39.3
#3 4 17422888 0.336 72.9
#4 1 1637826. 0.0316 76.0
#5 2 6574427. 0.127 88.7
#6 7 505227. 0.00973 89.7
#7 3 5366309. 0.103 100.
Or with data.table
library(data.table)
setDT(df)[, .(Freq = sum(WEIGHT)), by = INTERVIEW_DAY
][, Prop := Freq / sum(Freq)][, Cum := cumsum(100 * Prop / sum(Prop))][]
# INTERVIEW_DAY Freq Prop Cum
#1: 5 8155462.7 0.157102906 15.71029
#2: 6 12249456.5 0.235967631 39.30705
#3: 4 17422888.0 0.335626124 72.86967
#4: 1 1637826.3 0.031550297 76.02470
#5: 2 6574426.8 0.126646592 88.68935
#6: 7 505227.2 0.009732453 89.66260
#7: 3 5366309.3 0.103373998 100.00000
data
df <- structure(list(TUCASEID = c(2.00301e+13, 2.00301e+13, 2.00301e+13,
2.00301e+13, 2.00301e+13, 2.00301e+13, 2.00301e+13, 2.00301e+13,
2.00301e+13, 2.00301e+13, 2.00301e+13, 2.00301e+13, 2.00301e+13,
2.00301e+13, 2.00301e+13), INTERVIEW_DAY = c(5L, 6L, 6L, 4L,
4L, 4L, 1L, 2L, 6L, 4L, 6L, 7L, 6L, 3L, 6L), WEIGHT = c(8155462.7,
1735322.5, 3830527.5, 6622023, 3068387.3, 3455424.9, 1637826.3,
6574426.8, 1528296.3, 4277052.8, 1961482.3, 505227.2, 2135476.8,
5366309.3, 1058351.1)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15"))
R converting from short form to long form with counts in the short form
Update
You can try the following:
DT[, rep(names(.SD), .SD), by = ID]
# ID V1
# 1: 1 A
# 2: 1 A
# 3: 1 C
# 4: 2 B
# 5: 3 B
# 6: 3 C
# 7: 3 C
# 8: 4 A
Keeps the order you want too...
You can try the following. I've never used expandRows on what would become ~300 million rows, but it's basically rep, so it shouldn't be slow.
This uses melt + expandRows from my "splitstackshape" package. It works with data.frames or data.tables, so you might as well use data.table for the faster melting.
library(reshape2)
library(splitstackshape)
expandRows(melt(mydf, id.vars = "ID"), "value")
# The following rows have been dropped from the input:
#
# 2, 3, 5, 8, 10, 12
#
# ID variable
# 1 1 A
# 1.1 1 A
# 4 4 A
# 6 2 B
# 7 3 B
# 9 1 C
# 11 3 C
# 11.1 3 C
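Since expandRows is described as basically rep, the same expansion can be sketched in base R without any packages. The input below is an assumption: the definition of mydf is not shown, so it is reconstructed from the outputs above. The counts are read row by row so that the result is ordered by ID.

```r
# Assumed input, reconstructed from the printed output (the original
# definition of mydf is not shown).
mydf <- data.frame(ID = 1:4,
                   A = c(2, 0, 0, 1),
                   B = c(0, 1, 1, 0),
                   C = c(1, 0, 2, 0))

counts <- as.matrix(mydf[, c("A", "B", "C")])
times <- c(t(counts))  # counts read row by row: ID 1 (A, B, C), ID 2 (A, B, C), ...

# Repeat each (ID, column name) pair by its count; zero counts drop out.
long <- data.frame(
  ID       = rep(rep(mydf$ID, each = ncol(counts)), times),
  variable = rep(rep(colnames(counts), times = nrow(counts)), times)
)
long
```

This reproduces the ordering of the data.table result (by ID, then by column), not the column-first ordering that melt + expandRows gives.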
seq() function in R
I think you have some misconceptions going on here.
seq generates a sequence that follows a pattern known a priori. You mentioned one example, seq(from=1, to=10). Another would be to use multiples of two:
seq(from=2, to=10, by=2)
What you are doing is writing down your desired numbers hard-coded. For that, you can just put them into a vector using c (which is probably the most basic R function I know of...):
c(2,4,6,8,10,12,10,9,7,3,1)
For further details, see ?seq or ?c.
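The two can also be combined. As a small sketch: the rising part of the vector above does follow a rule, so seq can generate it, while the irregular tail still has to be typed out with c.

```r
# Regular sequences: seq knows the rule.
a <- seq(from = 1, to = 10)          # 1, 2, ..., 10
b <- seq(from = 2, to = 10, by = 2)  # 2, 4, 6, 8, 10

# Mixed case: generate the regular part, hard-code the rest.
x <- c(seq(from = 2, to = 12, by = 2), 10, 9, 7, 3, 1)
x
```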
Cut and Table function in R
For your question #2, it depends on what you mean by "better".
Here is one option:
library(TeachingDemos)
x[ 30 %<% x %<=% 40 ]
Or, instead of using cut, you can use findInterval:
y <- findInterval(x, c(30,40,50,60))
x[ y==1 ]
You could also look at the subset function.
If these don't match your definition of "better", then tell us more about what you want.
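Here is a rough side-by-side of these options on made-up values (the vector x and the breaks are assumptions for illustration). One thing to watch: by default cut closes intervals on the right, while findInterval closes them on the left, so the two disagree at the break points unless left.open = TRUE is set.

```r
x <- c(25, 31, 35, 40, 41, 55, 60)  # made-up values
breaks <- c(30, 40, 50, 60)

# cut(): right-closed intervals by default, (30,40], (40,50], (50,60].
tab <- table(cut(x, breaks))

# findInterval(): left-closed by default, so bin 1 is 30 <= x < 40.
y <- findInterval(x, breaks)
in_first <- x[y == 1]

# left.open = TRUE switches to 30 < x <= 40, matching cut()'s first bin.
in_first_right <- x[findInterval(x, breaks, left.open = TRUE) == 1]

# subset() with the condition written out.
in_first_subset <- subset(x, x > 30 & x <= 40)
```

Which of these counts as "better" depends mostly on whether you want the bin counts (cut + table) or the values themselves (findInterval or subset).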
Print variable names in table() with 2 binary variables in R
You can use the dnn option:
table(df$tr,df$fall_term) # impossible to tell the difference
    0  1
  0 18 33
  1 15 34
table(df$tr,df$fall_term,dnn=c('tr','fall_term')) # you have the names
   fall_term
tr   0  1
  0 18 33
  1 15 34
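Note that table(df$tr,df$fall_term,dnn=colnames(df)) is a shortcut only when df contains exactly these two columns, in this order. Passing the columns as a data frame, table(df[, c('tr','fall_term')]), picks up the names automatically.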
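A minimal sketch of three ways to get labelled dimensions, on made-up data (this df, including the extra id column, is an assumption for illustration). The extra column is there to show a case where dnn=colnames(df) would not fit, since it would supply three names for a two-way table.

```r
# Made-up data with an extra column besides the two of interest.
df <- data.frame(id = 1:6,
                 tr = c(0, 0, 1, 1, 0, 1),
                 fall_term = c(0, 1, 1, 0, 1, 1))

# Explicit names via dnn.
t1 <- table(df$tr, df$fall_term, dnn = c("tr", "fall_term"))

# Names taken from the data frame columns.
t2 <- table(df[, c("tr", "fall_term")])

# Names taken from the argument symbols.
t3 <- with(df, table(tr, fall_term))

names(dimnames(t1))
```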