Creating a Table with Individual Trials from a Frequency Table in R (Inverse of Table Function)


You may try this:

# create 'result' vector
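# repeat 1s and 0s the number of times given in the respective 'count' column;
# t() puts the counts in row-wise order (success, fail, success, fail, ...) to match the 1, 0 pattern
result <- rep(rep(c(1, 0), nrow(df)), c(t(df[ , c("success.count", "fail.count")])))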

# repeat each row in df the number of times given by the sum of 'count' columns
data.frame(df[rep(1:nrow(df), rowSums(df[ , c("success.count", "fail.count")]) ), c("factor.A", "factor.B")], result)

#     factor.A factor.B result
# 1          0        1      0
# 1.1        0        1      0
# 2          1        1      1
# 2.1        1        1      1
# 2.2        1        1      0
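For reference, here is a self-contained sketch of the same idea. The input data frame is hypothetical: its values are chosen to be consistent with the output shown above, not taken from the original question. The counts are transposed so that they are read row by row (success, fail, success, fail, ...), and the last step checks that aggregating the trials gives back the original counts.

```r
# Hypothetical frequency table (an assumption; values chosen to match the output above)
df <- data.frame(factor.A = c(0, 1),
                 factor.B = c(1, 1),
                 success.count = c(0, 2),
                 fail.count = c(2, 1))

counts <- as.matrix(df[, c("success.count", "fail.count")])

# one row per trial: repeat each row of df by its total number of trials
trials <- df[rep(seq_len(nrow(df)), rowSums(counts)), c("factor.A", "factor.B")]

# within each row: 1 for every success, then 0 for every failure;
# c(t(counts)) flattens the counts row by row
trials$result <- rep(rep(c(1, 0), nrow(df)), c(t(counts)))
rownames(trials) <- NULL

# round trip: summing per factor combination recovers the counts
recovered <- rowsum(cbind(success.count = trials$result,
                          fail.count = 1 - trials$result),
                    group = paste(trials$factor.A, trials$factor.B))
print(trials)
print(recovered)
```

The round trip is a cheap sanity check that no row picked up the counts of another row.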

Weighted Frequency Table in R

In base R, we can use xtabs and prop.table. In the OP's code, the cumulative sum follows the order in which the unique values of 'INTERVIEW_DAY' first occur. So, to avoid sorting by the integer value, convert the column to a factor with the levels specified in that order, get the sum of 'WEIGHT' by 'INTERVIEW_DAY' with xtabs, use prop.table to get the proportions, and then apply cumsum to that output:

df$INTERVIEW_DAY <- factor(df$INTERVIEW_DAY, levels = unique(df$INTERVIEW_DAY))
tbl1 <- xtabs(WEIGHT ~ INTERVIEW_DAY, df)
Prop <- prop.table(tbl1)
Cum <- cumsum(100 * Prop / sum(Prop))
Cum
#        5         6         4         1         2         7         3 
# 15.71029  39.30705  72.86967  76.02470  88.68935  89.66260 100.00000 

out <- data.frame(INTERVIEW_DAY = names(tbl1), Freq = as.numeric(tbl1),
            Prop = as.numeric(Prop), Cum = as.numeric(Cum))
row.names(out) <- NULL
out
#  INTERVIEW_DAY       Freq        Prop       Cum
#1             5  8155462.7 0.157102906  15.71029
#2             6 12249456.5 0.235967631  39.30705
#3             4 17422888.0 0.335626124  72.86967
#4             1  1637826.3 0.031550297  76.02470
#5             2  6574426.8 0.126646592  88.68935
#6             7   505227.2 0.009732453  89.66260
#7             3  5366309.3 0.103373998 100.00000
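To see why the factor conversion matters, here is a small sketch on hypothetical toy data (the names toy, day and w are made up for illustration). Without the factor, xtabs orders the groups by value; with explicit levels, it keeps the order of first appearance, which is what the cumulative sum then follows.

```r
# Hypothetical toy data: days first appear in the order 5, 6, 4
toy <- data.frame(day = c(5L, 6L, 6L, 4L), w = c(10, 20, 30, 40))

# without a factor, the groups are ordered by value: 4, 5, 6
by_value <- xtabs(w ~ day, toy)

# with explicit levels, the groups keep their order of first appearance: 5, 6, 4
toy$day <- factor(toy$day, levels = unique(toy$day))
by_appearance <- xtabs(w ~ day, toy)

# cumulative percentage in order of appearance
cum_pct <- cumsum(100 * prop.table(by_appearance))
print(by_value)
print(by_appearance)
print(cum_pct)
```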

With dplyr, count with the wt argument gives the weighted frequency:

library(dplyr)
df %>% 
  mutate(INTERVIEW_DAY = factor(INTERVIEW_DAY, levels = unique(INTERVIEW_DAY))) %>%
  count(INTERVIEW_DAY, wt = WEIGHT, sort = FALSE) %>% 
  mutate(Prop = n / sum(n),
         Cum = cumsum(100 * Prop/sum(Prop)))
# A tibble: 7 x 4
#  INTERVIEW_DAY         n    Prop   Cum
#  <fct>             <dbl>   <dbl> <dbl>
#1 5              8155463. 0.157    15.7
#2 6             12249456. 0.236    39.3
#3 4             17422888  0.336    72.9
#4 1              1637826. 0.0316   76.0
#5 2              6574427. 0.127    88.7
#6 7               505227. 0.00973  89.7
#7 3              5366309. 0.103   100.

Or with data.table

library(data.table)
setDT(df)[, .(Freq = sum(WEIGHT)), by = INTERVIEW_DAY
  ][, Prop := Freq / sum(Freq)][, Cum := cumsum(100 * Prop / sum(Prop))][]
#  INTERVIEW_DAY       Freq        Prop       Cum
#1:             5  8155462.7 0.157102906  15.71029
#2:             6 12249456.5 0.235967631  39.30705
#3:             4 17422888.0 0.335626124  72.86967
#4:             1  1637826.3 0.031550297  76.02470
#5:             2  6574426.8 0.126646592  88.68935
#6:             7   505227.2 0.009732453  89.66260
#7:             3  5366309.3 0.103373998 100.00000

data

df <- structure(list(TUCASEID = c(2.00301e+13, 2.00301e+13, 2.00301e+13, 
2.00301e+13, 2.00301e+13, 2.00301e+13, 2.00301e+13, 2.00301e+13, 
2.00301e+13, 2.00301e+13, 2.00301e+13, 2.00301e+13, 2.00301e+13, 
2.00301e+13, 2.00301e+13), INTERVIEW_DAY = c(5L, 6L, 6L, 4L, 
4L, 4L, 1L, 2L, 6L, 4L, 6L, 7L, 6L, 3L, 6L), WEIGHT = c(8155462.7, 
1735322.5, 3830527.5, 6622023, 3068387.3, 3455424.9, 1637826.3, 
6574426.8, 1528296.3, 4277052.8, 1961482.3, 505227.2, 2135476.8, 
5366309.3, 1058351.1)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15"))

R converting from short form to long form with counts in the short form

Update

You can try the following:

DT[, rep(names(.SD), .SD), by = ID]
#    ID V1
# 1:  1  A
# 2:  1  A
# 3:  1  C
# 4:  2  B
# 5:  3  B
# 6:  3  C
# 7:  3  C
# 8:  4  A

Keeps the order you want too...

You can try the following. I've never used expandRows on what would become ~ 300 million rows, but it's basically rep, so it shouldn't be slow.

This uses melt + expandRows from my "splitstackshape" package. It works with data.frames or data.tables, so you might as well use data.table for the faster melting....

library(reshape2)
library(splitstackshape)
expandRows(melt(mydf, id.vars = "ID"), "value")
# The following rows have been dropped from the input: 
# 
# 2, 3, 5, 8, 10, 12
# 
#      ID variable
# 1     1        A
# 1.1   1        A
# 4     4        A
# 6     2        B
# 7     3        B
# 9     1        C
# 11    3        C
# 11.1  3        C
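If you would rather avoid extra packages, the same expansion can be sketched in base R with rep. The input below is an assumption: mydf is reconstructed from the outputs shown above, since the original data is not included here. The counts are flattened row by row, so the result comes out ordered by ID.

```r
# Assumed input, reconstructed from the outputs above (not the OP's actual data)
mydf <- data.frame(ID = 1:4,
                   A = c(2, 0, 0, 1),
                   B = c(0, 1, 1, 0),
                   C = c(1, 0, 2, 0))

counts <- as.matrix(mydf[, -1])   # rows = IDs, columns = categories

long <- data.frame(
  # each ID repeated by its total count
  ID = rep(mydf$ID, rowSums(counts)),
  # category names repeated by their counts, read row by row
  variable = rep(rep(colnames(counts), nrow(counts)), c(t(counts)))
)
print(long)
```

Zero counts simply produce no rows, so nothing has to be filtered out beforehand.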

seq() function in R

I think you have some misconceptions going on here.

seq generates a sequence that follows a pattern known in advance. You mentioned one example with seq(from=1, to=10). Another version would be to use multiples of two, like

seq(from=2, to=10, by=2)

What you are doing is writing down your desired numbers hard-coded. Thus, you could just put them into a vector using c (which is probably the most basic R function I know of...)

c(2,4,6,8,10,12,10,9,7,3,1)

For further details, see ?seq or ?c.
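The two can also be combined. As a small sketch: the regular part of the vector above can come from seq, and the part without a fixed step is written out with c. The variable names are only for illustration.

```r
# regular part: even numbers from 2 to 12
evens <- seq(from = 2, to = 12, by = 2)

# irregular part: no fixed step, so it is written out by hand
irregular <- c(10, 9, 7, 3, 1)

# c() concatenates the two pieces into one vector
combined <- c(evens, irregular)
print(combined)
```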

Cut and Table function in R

For your question # 2 it depends on what you mean by "better".

Here is one option:

library(TeachingDemos)
x[ 30 %<% x %<=% 40 ]

Or, instead of using cut you can use findInterval:

y <- findInterval(x, c(30,40,50,60))
x[ y==1 ]

You could also look at the subset function.

If these don't match your definition of "better", then tell us more about what you want.
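For comparison, here is a sketch of the same selections without extra packages, on hypothetical values of x. One thing to watch: by default findInterval uses intervals closed on the left, [30, 40), while the condition 30 < x <= 40 (and cut with its default right = TRUE) is closed on the right, so the two pick different boundary values unless left.open = TRUE is set.

```r
x <- c(25, 30, 31, 35, 40, 41, 55)   # hypothetical values

# plain logical indexing: 30 < x <= 40
in_range <- x[x > 30 & x <= 40]

# findInterval default: [30, 40), so 30 is included and 40 is not
default_bins <- x[findInterval(x, c(30, 40, 50, 60)) == 1]

# left.open = TRUE gives (30, 40], the same boundaries as in_range
open_bins <- x[findInterval(x, c(30, 40, 50, 60), left.open = TRUE) == 1]

print(in_range)
print(default_bins)
print(open_bins)
```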

Print variable names in table() with 2 binary variables in R

You can use the dnn argument:

table(df$tr,df$fall_term) # impossible to tell the difference

     0  1
  0 18 33
  1 15 34

table(df$tr,df$fall_term,dnn=c('tr','fall_term')) # you have the names
   fall_term
tr   0  1
  0 18 33
  1 15 34

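Note that if df contains only these two columns, in this order, it's easier (and safer) to do table(df$tr,df$fall_term,dnn=colnames(df)); otherwise the names will not line up with the two variables.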
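There are also a few ways to get the names without typing dnn at all. The sketch below uses a hypothetical data frame with an extra column to show that these variants do not depend on what else is in df.

```r
# Hypothetical data with an unrelated third column
df <- data.frame(tr = c(0, 0, 1, 1, 1),
                 fall_term = c(0, 1, 1, 1, 0),
                 other = 1:5)

# passing a two-column data frame: names come from the columns
tab1 <- table(df[, c("tr", "fall_term")])

# bare symbols inside with(): table() picks up the symbol names
tab2 <- with(df, table(tr, fall_term))

# named arguments: the argument names become the dimension names
tab3 <- table(tr = df$tr, fall_term = df$fall_term)

print(tab1)
```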
