r - How to add row index to a data frame, based on combination of factors
This is probably going to look like cheating since I am passing a vector into a function which I then totally ignore except to get its length:
df$Index <- ave( 1:nrow(df), df$Dim1, factor( df$Dim2), FUN=function(x) 1:length(x) )
The ave function returns a vector of the same length as its first argument, but computed within categories defined by all of the factors between the first argument and the argument named FUN. (I often forget to put the "FUN=" in for my function and get a cryptic error message along the lines of "unique() applies only to vectors", since ave was trying to determine how many unique values an anonymous function possesses, and that fails.)
There's actually an even more compact way of expressing function(x) 1:length(x): the seq_along function, which is probably safer since it behaves properly when passed a vector of length zero, whereas the anonymous function form would fail improperly by returning 1:0 (that is, c(1, 0)) instead of integer(0):
ave( 1:nrow(df), df$Dim1, factor( df$Dim2), FUN=seq_along )
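As a minimal sketch of the approach above, the following runs the ave call on a small made-up data frame (the values are hypothetical; only the column names Dim1 and Dim2 come from the answer) and also shows the zero-length case that makes seq_along the safer idiom:

```r
# Hypothetical example data; only the column names follow the answer above.
df <- data.frame(
  Dim1 = c("A", "A", "A", "B", "B", "A"),
  Dim2 = c(100, 100, 200, 100, 100, 200)
)

# Number the rows within each Dim1 x Dim2 combination.
df$Index <- ave(seq_len(nrow(df)), df$Dim1, factor(df$Dim2), FUN = seq_along)
df$Index
# 1 2 1 1 2 2

# The zero-length case: 1:length(x) counts down from 1 to 0.
x <- integer(0)
bad  <- 1:length(x)   # c(1, 0)
good <- seq_along(x)  # integer(0)
```

Rows 3 and 6 share the combination A/200, so they are numbered 1 and 2 even though they are not adjacent.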
r - Adding a row index based on a combination of multiple columns in a large dataframe
Use a data.table:
library(data.table)
DT <- as.data.table(dat)
DT[, index := seq_len(.N), by = user_id]
    timestamp      user_id index
1: 2013-11-07 ff268cef0c29     1
2: 2013-11-02 12bb7af7a842     1
3: 2013-11-30 e45abb10ae0b     1
4: 2013-11-06 e45abb10ae0b     2
5: 2013-11-25 f266f8c9580e     1
R add index column to data frame based on row values
If you use data.table, there is a special symbol, .GRP, which records this information (a simple group counter):
library(data.table)
DT <- data.table(temp)
DT[, index := .GRP, by = list(Dim1, Dim2)]
DT
# Dim1 Dim2 Value index
# 1: A 100 10 1
# 2: A 100 2 1
# 3: A 100 9 1
# 4: A 100 4 1
# 5: A 200 6 2
# 6: A 200 1 2
# 7: B 100 8 3
# 8: B 200 7 4
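If data.table is not available, a similar group counter can be sketched in base R. This is an illustrative alternative, not part of the answer above: the data frame below is reconstructed from the printed output, and the "|" separator assumes that character does not occur in the key columns.

```r
# Data reconstructed from the printed output above (an assumption).
temp <- data.frame(
  Dim1  = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Dim2  = c(100, 100, 100, 100, 200, 200, 100, 200),
  Value = c(10, 2, 9, 4, 6, 1, 8, 7)
)

# Build one key per row, then number the keys in order of first appearance.
key <- paste(temp$Dim1, temp$Dim2, sep = "|")
temp$index <- match(key, unique(key))
temp$index
# 1 1 1 1 2 2 3 4
```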
R - Add row index to a data frame but handle ties with minimum rank
You would want to use the rank function with ties.method="min" within your ave call:
df$Index <- ave(-df$fant.pts.passing, df$season, df$week,
FUN=function(x) rank(x, ties.method="min"))
df
# season week player.name fant.pts.passing Index
# 3 2014 1 Cam Newton 29 1
# 1 2014 1 Matt Ryan 28 2
# 4 2014 1 Matthew Stafford 28 2
# 2 2014 1 Peyton Manning 19 4
# 7 2014 2 Aaron Rodgers 29 1
# 6 2014 2 Andrew Luck 22 2
# 8 2014 2 Chad Henne 22 2
# 5 2014 2 Carson Palmer 18 4
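To see what ties.method changes, here is a small sketch on a plain vector. The scores are taken from week 1 of the output above; negating them makes the highest score rank first.

```r
scores <- c(29, 28, 28, 19)

# "min": tied values share the lowest rank, and the next rank is skipped.
r_min <- rank(-scores, ties.method = "min")      # 1 2 2 4

# "average" is the default: tied values share the mean of their ranks.
r_avg <- rank(-scores, ties.method = "average")  # 1.0 2.5 2.5 4.0

# "first": ties are broken by order of appearance.
r_first <- rank(-scores, ties.method = "first")  # 1 2 3 4
```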
Create subsets from a dataframe by a combination of factors
combn accepts a function, so you can perform the t.test for every combination inside that function itself. With sapply you can do this for every column named in ls2.
sapply(ls2, function(y) combn(c("a", "b", "c"), 2, function(x) {
data.x <- subset(df, T %in% x)
t.test(reformulate('T', y), data = data.x, var.equal = TRUE)[["p.value"]]
}))
# G H I
#[1,] 0.0155 0.1599 0.0434
#[2,] 0.0086 0.0383 0.0282
#[3,] 0.6681 0.0804 0.5531
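A stripped-down sketch of the same combn pattern, with a labelling function in place of the t-test, shows the order in which the pairs are generated; the rows of a result matrix like the one above follow this order.

```r
groups <- c("a", "b", "c")

# Without FUN, combn returns a matrix with one column per pair.
pair_matrix <- combn(groups, 2)

# With FUN, the function is applied to each pair in turn.
pair_labels <- combn(groups, 2, FUN = function(x) paste(x, collapse = " vs "))
pair_labels
# "a vs b" "a vs c" "b vs c"
```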
Inserting data into a data frame based on the unique combination of two factors
Let's suppose you have the file names in a vector datafiles
such that files 1-4 are the data for all assays for samples 1-384, 5-8 for all assays for samples 385-768, and so on, and that you want to end up with a data frame that is 1536 rows by 162 columns.
library(reshape)
## read all files into a list of data frames:
alldata <- lapply(datafiles,read.table)
Split into four chunks:
splitdata <- split(alldata,rep(1:4,each=4))
A function to take a list of n data sets, each containing m assays from k individuals (i.e. each one is k*m rows by 4 columns: SampleID, Well, Assay, Value) and combine them into a single data set that is k rows by n*m+2 columns:
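mergefun <- function(X) {
  ## reshape each long data set to wide: one row per SampleID/Well, one column per Assay
  cdata <- lapply(X,
                  cast,
                  formula=SampleID+Well~Assay,
                  value="Value")
  ## produces data sets of the form
  ##   SampleID Well V3 V4
  ## 1     SID1  A01  0  0
  ## 2     SID2  A02  1  2
  ## ...
  ## Reduce() takes the function first, then the list
  Reduce(merge,cdata)
}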
Now apply this to each of the chunks:
merged_data <- lapply(splitdata,mergefun)
Now combine the chunks:
final <- do.call(rbind,merged_data)
I'm not sure this will work, but it might. You should take the pieces apart and examine what they do separately if it doesn't work on the first try -- I may have screwed up somewhere.
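The merging step can be checked in isolation. The sketch below uses three tiny hypothetical wide data sets (the names wide1 to wide3 and their values are made up) and folds merge over them with Reduce, which takes the function as its first argument and the list as its second:

```r
# Hypothetical wide data sets, one assay column each.
wide1 <- data.frame(SampleID = c("SID1", "SID2"), Well = c("A01", "A02"), V1 = c(0, 1))
wide2 <- data.frame(SampleID = c("SID1", "SID2"), Well = c("A01", "A02"), V2 = c(0, 2))
wide3 <- data.frame(SampleID = c("SID1", "SID2"), Well = c("A01", "A02"), V3 = c(5, 3))

# merge() joins on the shared columns (SampleID and Well) at every step.
merged <- Reduce(merge, list(wide1, wide2, wide3))
names(merged)
# "SampleID" "Well" "V1" "V2" "V3"
```

Each step joins on whatever column names the two inputs share, so the assay columns need distinct names across data sets.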
R - find row indices where each combination of factors occurs
We can try data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)) and, grouped by 'Dim1' and 'Dim2', get the row index (.I) in a list, which we can then extract.
library(data.table)
res <- setDT(df1)[, list(Rows = list(.I)), by = .(Dim1, Dim2)]
res
# Dim1 Dim2 Rows
#1: A 100 1, 3, 4
#2: A 200 2, 5
#3: B 200 6, 7
#4: B 100 8
res$Rows
#[[1]]
#[1] 1 3 4
#[[2]]
#[1] 2 5
#[[3]]
#[1] 6 7
#[[4]]
#[1] 8
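For comparison, a base R sketch of the same idea uses split on the row numbers. The data frame here is reconstructed from the printed result above (an assumption), and note that split orders and names its groups by factor level rather than by first appearance:

```r
# Data reconstructed from the printed row indices above (an assumption).
df1 <- data.frame(
  Dim1 = c("A", "A", "A", "A", "A", "B", "B", "B"),
  Dim2 = c(100, 200, 100, 100, 200, 200, 200, 100)
)

# Split the row numbers by the Dim1 x Dim2 combination; drop unused combinations.
rows <- split(seq_len(nrow(df1)), list(df1$Dim1, df1$Dim2), drop = TRUE)
rows[["A.100"]]
# 1 3 4
rows[["B.200"]]
# 6 7
```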
Create an Index of a combination of data.frame columns in R
The interaction function will work well:
foo = structure(list(avg = c(0.246985988921473, 0.481522354272779, 0.575400762275067, 0.14651009243539, 0.489308880181752, 0.523678968337178), i_ID = c("H", "H", "C", "C", "H", "S"), j_ID = c("P", "P", "P", "P", "P", "P")), .Names = c("avg", "i_ID", "j_ID"), row.names = 7:12, class = "data.frame")
foo$idx <- as.integer(interaction(foo$i_ID, foo$j_ID))
> foo
         avg i_ID j_ID idx
7  0.2469860    H    P   2
8  0.4815224    H    P   2
9  0.5754008    C    P   1
10 0.1465101    C    P   1
11 0.4893089    H    P   2
12 0.5236790    S    P   3
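One detail worth knowing about this approach: interaction creates a level for every possible combination, including ones that never occur, so the integer codes can have gaps. A small sketch with made-up vectors f1 and f2 shows the effect and how drop = TRUE removes it:

```r
# Hypothetical inputs: only the combinations a.x and b.y actually occur.
f1 <- c("a", "b", "a")
f2 <- c("x", "y", "x")

# All four combinations become levels: a.x, b.x, a.y, b.y.
idx_all <- as.integer(interaction(f1, f2))
idx_all
# 1 4 1

# drop = TRUE keeps only the combinations present in the data.
idx_used <- as.integer(interaction(f1, f2, drop = TRUE))
idx_used
# 1 2 1
```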
Ah, I didn't read carefully enough. There is probably a more elegant solution, but you can use the outer function and the upper and lower triangles:
# let's assign some test values
x <- c('a', 'b', 'c')
foo$idx <- c('a b', 'b a', 'b c', 'c b', 'a a', 'b a')
mat <- outer(x, x, FUN = 'paste') # gives all possible combinations
uppr_ok <- mat[upper.tri(mat, diag=TRUE)]
mat_ok <- mat
mat_ok[lower.tri(mat_ok)] <- t(mat)[lower.tri(mat)] # mirror the upper triangle; the transpose keeps pairs aligned for any size of x
Then you can match the indexes found in mat with those found in mat_ok:
foo$idx <- mat_ok[match(foo$idx, mat)]
Add variable to group data by unique combinations of variables
We can use .GRP from data.table after grouping by 'Date' and 'Location':
library(data.table)
setDT(df)[, Combo := .GRP, .(Date, Location)]
df
# Date Location Var1 Var2 Combo
#1: 2018 Ohio A 1 1
#2: 2018 Ohio B 2 1
#3: 2018 Arizona C 3 2
#4: 2018 Arizona D 4 2
#5: 2018 Nebraska E 5 3
#6: 2017 Nebraska F 6 4
#7: 2017 New Mexico G 7 5
#8: 2016 Idaho H 8 6
Or using rleid, which gives the same result here because the rows of each group are contiguous:
setDT(df)[, Combo := rleid(Date, Location)]
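The two approaches differ when a group's rows are not adjacent: a group counter reuses the id when a combination reappears, while a run-length id starts a new one. The base R sketch below imitates both on a made-up key vector; it is an illustration of the difference, not the data.table implementation.

```r
# Hypothetical keys: "x" reappears after "y".
key <- c("x", "x", "y", "x")

# Group counter (like .GRP): the same key always gets the same id.
group_id <- match(key, unique(key))
group_id
# 1 1 2 1

# Run-length id (like rleid): every new consecutive run gets a new id.
runs <- rle(key)
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id
# 1 1 2 3
```

If the data are not already ordered by the grouping columns, the group counter is usually the one that matches the intent of "unique combinations".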