Adding a Repeated Index for Factors in Data Frame

One way is:

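unlist(lapply(split(x, x), seq_along), use.names = FALSE)

where x is your factor as a vector. This assumes the rows are already sorted by x, because split() returns the groups in level order, not row order. For unsorted data, the ave() approach further down keeps the original row order.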

R> x <- factor(rep(letters[1:3], times = c(5,5,4))) ## your data
R> data.frame(factor = x, index = unlist(lapply(split(x, x), seq_along), 
+             use.names = FALSE))
   factor index
1       a     1
2       a     2
3       a     3
4       a     4
5       a     5
6       b     1
7       b     2
8       b     3
9       b     4
10      b     5
11      c     1
12      c     2
13      c     3
14      c     4
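For comparison, here is a minimal Python sketch of what unlist(lapply(split(x, x), seq_along)) computes. It assumes a plain list stands in for the factor and that sorted order stands in for R's level order (the default for a factor built from character values). The helper name split_seq_along is hypothetical, not part of any library.

```python
from collections import defaultdict


def split_seq_along(x):
    """Hypothetical helper mimicking unlist(lapply(split(x, x), seq_along))."""
    # split(x, x): collect the elements belonging to each level
    groups = defaultdict(list)
    for value in x:
        groups[value].append(value)
    # seq_along() on each group, concatenated in sorted level order
    index = []
    for level in sorted(groups):
        index.extend(range(1, len(groups[level]) + 1))
    return index


x = ["a"] * 5 + ["b"] * 5 + ["c"] * 4
print(split_seq_along(x))  # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4]
```

On unsorted input such as ["b", "a", "b"] it returns [1, 1, 2], with level "a" first, which would not line up with the rows.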

Another way, on a similar theme, is to use table() and seq_len():

unlist(sapply(table(x), seq_len), use.names = FALSE)
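The table() variant first counts each level and then expands every count n into 1..n. A rough Python analogue, assuming collections.Counter plays the role of table() and sorted keys stand in for level order (the helper name table_seq_len is hypothetical):

```python
from collections import Counter


def table_seq_len(x):
    """Hypothetical helper mimicking unlist(sapply(table(x), seq_len))."""
    counts = Counter(x)  # like table(x): one count per level
    # seq_len(n) for each level's count, in sorted level order
    return [i for level in sorted(counts) for i in range(1, counts[level] + 1)]


x = ["a"] * 5 + ["b"] * 5 + ["c"] * 4
print(sorted(Counter(x).items()))  # [('a', 5), ('b', 5), ('c', 4)]
print(table_seq_len(x))  # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4]
```

Like the split() version, this only matches the rows when they are sorted by level.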

And another way is to use the run-length encoding via rle():

R> rle(as.character(x))$lengths
[1] 5 5 4

which we can plug into the sapply() code instead of the table() call:

R> unlist(sapply(rle(as.character(x))$lengths, seq_len), use.names = FALSE)
 [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4
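Run-length encoding works on consecutive runs rather than on levels. In Python, itertools.groupby groups consecutive equal elements, so it gives the same lengths as rle()$lengths. A sketch under that assumption, with hypothetical helper names:

```python
from itertools import groupby


def rle_lengths(x):
    """Lengths of consecutive runs of equal values, like rle(x)$lengths."""
    return [len(list(run)) for _, run in groupby(x)]


def rle_index(x):
    """Hypothetical helper: seq_len() over each run length, concatenated."""
    return [i for n in rle_lengths(x) for i in range(1, n + 1)]


x = ["a"] * 5 + ["b"] * 5 + ["c"] * 4
print(rle_lengths(x))  # [5, 5, 4]
print(rle_index(x))    # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4]
```

Because runs follow row order, the result lines up with the rows whenever each level's rows are contiguous, even if the levels are not in sorted order. A level that reappears later starts a new run and restarts at 1: ["b", "b", "a", "b"] gives [1, 2, 1, 1].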

Indexing duplicated cases of a data frame in R

We can use ave to create a sequence column using 'id' and 'date' as grouping variables.

 df1$datnno <- with(df1, ave(seq_along(id), id, date, FUN=seq_along))
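What ave(..., FUN = seq_along) produces here is a running position within each (id, date) combination, kept in the original row order. A minimal Python sketch of that behaviour, using a dict keyed by the tuple of grouping values; the helper name and the example columns are made up for illustration:

```python
def ave_seq_along(*keys):
    """Hypothetical helper mimicking ave(seq_along(id), id, date, FUN = seq_along).

    Each row gets its running position within the group formed by all keys,
    and the result stays in the original row order.
    """
    seen = {}
    index = []
    for key in zip(*keys):
        seen[key] = seen.get(key, 0) + 1
        index.append(seen[key])
    return index


# made-up example columns
ids = [1, 1, 2, 1, 1]
dates = ["d1", "d1", "d1", "d2", "d1"]
print(ave_seq_along(ids, dates))  # [1, 2, 1, 1, 3]
```

Unlike the split() and table() approaches, the rows do not need to be sorted first.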

How to add row index to a data frame, based on combination of factors

This is probably going to look like cheating since I am passing a vector into a function which I then totally ignore except to get its length:

 df$Index <- ave( 1:nrow(df), df$Dim1, factor( df$Dim2), FUN=function(x) 1:length(x) )

The ave function returns a vector of the same length as its first argument, computed within the categories defined by all of the factors passed between the first argument and the argument named FUN. (I often forget to write "FUN=" for my function and get a cryptic error message along the lines of "unique() applies only to vectors": without the name, ave() treats the function as one more grouping factor, tries to determine how many unique values it has, and fails.)

There's an even more compact way of expressing function(x) 1:length(x): the seq_along function. It is probably also safer, since for a zero-length vector it returns integer(0), whereas the anonymous function form returns c(1, 0) (because 1:0 counts down) instead of an empty vector:

ave( 1:nrow(df), df$Dim1, factor( df$Dim2), FUN=seq_along )
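The zero-length pitfall comes from R's colon operator counting down when the end is smaller than the start. The two hypothetical Python helpers below mimic 1:length(x) and seq_along(x) so the difference is visible on an empty input; they are illustrations, not R code:

```python
def r_colon(start, end):
    """Mimic R's start:end, which counts down when end < start."""
    step = 1 if end >= start else -1
    return list(range(start, end + step, step))


def r_seq_along(v):
    """Mimic R's seq_along(v): 1..length(v), and empty for an empty vector."""
    return list(range(1, len(v) + 1))


empty = []
print(r_colon(1, len(empty)))  # [1, 0]  what 1:length(x) gives
print(r_seq_along(empty))      # []      what seq_along(x) gives
```

For non-empty input both return the same sequence, so the difference only shows up for zero-length groups.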

Adding counts of a factor to a dataframe

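Using jmsigner's data, you could do:

dt$count <- ave(seq_along(dt$school), dt$school, FUN = length)

Passing seq_along(dt$school) as the first argument, rather than dt$school itself, keeps the counts numeric: ave() writes its results back into a copy of its first argument, so a factor column would turn them into NA and a character column into strings.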
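Counting per group and writing the count back to every row of that group can be sketched in Python with collections.Counter. The helper name and the school values below are made up, since jmsigner's data is not shown here:

```python
from collections import Counter


def group_counts(keys):
    """Hypothetical helper mimicking ave(x, g, FUN = length)."""
    counts = Counter(keys)            # size of each group
    return [counts[k] for k in keys]  # broadcast each size back to its rows


school = ["A", "B", "A", "C", "A"]  # made-up data
print(group_counts(school))  # [3, 1, 3, 1, 3]
```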

Insert multiple rows into data frame based on index position

Here is one way: once you concat both dataframes, use pd.factorize on the first index level to get the order in which each of its values first appears, then sort on that.

import numpy as np
import pandas as pd

np.random.seed(1)
# df1 and df2: the two frames to combine (defined elsewhere)
df3 = pd.concat([df1, df2])

df3 = (
    df3.set_index(  # add two index levels for sorting
         [list(range(len(df3))),  # current row order
          pd.factorize(df3.index.get_level_values('first'))[0]],  # order of first appearance in 'first'
         append=True)  # keep the original index levels
       .sort_index(level=[-1, -2])  # sort by first appearance, then by row order
       .droplevel([-2, -1])  # drop the two helper levels
)
print(df3)
                     0
first second          
bar   one     1.624345
      two    -0.611756
      one     0.319039
      two    -0.249370
      three   1.462108
baz   one    -0.528172
      two    -1.072969
foo   one     0.865408
      two    -2.301539
qux   one     1.744812
      two    -0.761207
      one    -2.060141
      two    -0.322417
      three  -0.384054
      four    1.133769

Note that you could do the same by adding the two sort keys as columns instead and using sort_values.
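The ordering the pandas chain relies on can also be sketched without pandas. This plain-Python version assumes each row is a tuple whose first element plays the role of the 'first' index level; the helper names are hypothetical. It factorizes by first appearance and then does a sort on (code, original position), which is the same pair of keys the two helper index levels provide:

```python
def factorize(values):
    """Codes by order of first appearance, like pd.factorize(values)[0]."""
    seen = {}
    codes = []
    for v in values:
        if v not in seen:
            seen[v] = len(seen)
        codes.append(seen[v])
    return codes


def group_by_first_appearance(rows, key):
    """Reorder rows so rows sharing key(row) become contiguous,
    with groups ordered by where each key first appears."""
    codes = factorize([key(r) for r in rows])
    # sort by (group code, original position)
    order = sorted(range(len(rows)), key=lambda i: (codes[i], i))
    return [rows[i] for i in order]


# made-up (first, second) index pairs after concatenating two frames
rows = [("bar", "one"), ("baz", "one"), ("bar", "two"), ("qux", "one"), ("baz", "two")]
print(group_by_first_appearance(rows, key=lambda r: r[0]))
# [('bar', 'one'), ('bar', 'two'), ('baz', 'one'), ('baz', 'two'), ('qux', 'one')]
```

Including the original position in the sort key keeps the rows of each group in their concatenated order, which is why the extra range(len(df3)) level is needed in the pandas version.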

Add an index (numeric ID) column to large data frame

You can add a sequence of numbers very easily with

data$ID <- seq.int(nrow(data))

If you are already using library(tidyverse), you can use

data <- tibble::rowid_to_column(data, "ID")
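The two R options differ mainly in where the new column lands: data$ID <- ... appends it at the end, while rowid_to_column() puts it first. A small Python sketch over a list of dict records, assuming dicts stand in for rows (the helper add_row_id is hypothetical):

```python
def add_row_id(records, name="ID", first=True):
    """Hypothetical helper: number rows 1..n, like seq.int(nrow(data)).

    With first=True the new column goes in front (as rowid_to_column() does);
    with first=False it is appended at the end (as data$ID <- ... does).
    """
    numbered = []
    for i, rec in enumerate(records, start=1):
        numbered.append({name: i, **rec} if first else {**rec, name: i})
    return numbered


data = [{"x": "a"}, {"x": "b"}, {"x": "c"}]
print(add_row_id(data))
# [{'ID': 1, 'x': 'a'}, {'ID': 2, 'x': 'b'}, {'ID': 3, 'x': 'c'}]
```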

Repeat dataframe rows based on cumsum index

This is closer to what I was looking for:

library(tidyverse)  # dplyr, tidyr and stringr

df %>%
  mutate(str_split_content = str_split(content, " ")) %>%
  unnest(cols = str_split_content)

Someone posted this, then revised or removed it a while ago.

The original str_split call actually split on punctuation, so it was not purely splitting into words.
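The mutate() plus unnest() pipeline repeats each row once per piece of the split string. A rough Python sketch of that effect over a list of dict records, assuming a literal single-space separator (splitting on punctuation, as mentioned above, would need a regex such as re.split instead). The helper name and example rows are hypothetical:

```python
def split_unnest(rows, col, sep=" ", new_col="str_split_content"):
    """Hypothetical helper mimicking
    mutate(str_split_content = str_split(content, " ")) %>% unnest(...):
    each row is repeated once per piece, with the piece stored in new_col.
    """
    out = []
    for row in rows:
        for piece in row[col].split(sep):
            out.append({**row, new_col: piece})
    return out


df = [{"id": 1, "content": "hello world"}, {"id": 2, "content": "one"}]
for r in split_unnest(df, "content"):
    print(r)
# {'id': 1, 'content': 'hello world', 'str_split_content': 'hello'}
# {'id': 1, 'content': 'hello world', 'str_split_content': 'world'}
# {'id': 2, 'content': 'one', 'str_split_content': 'one'}
```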
