Creating a New Column Based on Unique Id With Values in R

Creating a new column based on unique ID with values in R

Here is a base R version:

df <- data.frame(ID = c(1124, 1123))
expand.grid(ID = df$ID, Age = 0:5)

##      ID Age
## 1  1124   0
## 2  1123   0
## 3  1124   1
## 4  1123   1
## 5  1124   2
## 6  1123   2
## 7  1124   3
## 8  1123   3
## 9  1124   4
## 10 1123   4
## 11 1124   5
## 12 1123   5

This is sorted differently from the tidyr::expand result, because expand.grid varies its first argument fastest.
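If all ages for one ID should appear together instead, the grid can be reordered afterwards. This is a minimal base R sketch, assuming the IDs should keep the order they have in df; the names grid and grid_by_id are only illustrative.

```r
df <- data.frame(ID = c(1124, 1123))
grid <- expand.grid(ID = df$ID, Age = 0:5)

# Order by the position of each ID in df first, then by Age within each ID
grid_by_id <- grid[order(match(grid$ID, df$ID), grid$Age), ]
rownames(grid_by_id) <- NULL  # reset row names after reordering

head(grid_by_id, 7)
```

The first six rows then hold ID 1124 with ages 0 to 5, followed by the rows for 1123.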

EDIT

As @thelatemail suggested, you can do the following to avoid naming the columns of df explicitly:

expand.grid(c(Age=list(0:5), df))

or

merge(df, list(Age=0:5))

EDIT 2

Here is a data.table example:

library(data.table)
setDT(df) # Convert df to a data.table.
df[, do.call(CJ, list(ID = ID, Age = 0:5))]

For large data sets, one might want to benchmark the various methods.

Group dataframe rows by creating a unique ID column based on the amount of time passed between entries and variable values

Here's a dplyr approach that calculates the gap and the running average gap within each Name/Item group, flags large gaps (more than 180 days, or more than three times the average gap), and starts a new group at each large gap or change in Name or Item.

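library(dplyr)

df1 %>%
  group_by(Name, Item) %>%
  mutate(purch_num = row_number(),
         time_since_first = Date - first(Date),
         # The first row of each group gets an infinite gap, so it always starts a new group.
         # An explicit origin keeps as.Date() working on R versions before 4.3.
         gap = Date - lag(Date, default = as.Date(-Inf, origin = "1970-01-01")),
         avg_gap = time_since_first / (purch_num - 1),
         new_grp_flag = gap > 180 | gap > 3 * avg_gap) %>%
  ungroup() %>%
  mutate(group = cumsum(new_grp_flag))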
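The same flag-and-cumsum logic can be sketched in base R, which makes each intermediate step easier to inspect. The purchase data below is made up for illustration (it is not from the original question), and the sketch assumes the rows are already sorted by Name, Item and Date.

```r
# Hypothetical purchase data, sorted by Name, Item and Date
df1 <- data.frame(
  Name = c("A", "A", "A", "A", "B", "B"),
  Item = c("x", "x", "x", "x", "y", "y"),
  Date = as.Date(c("2020-01-01", "2020-01-10", "2020-01-20",
                   "2021-01-01", "2020-03-01", "2020-03-15"))
)

key  <- paste(df1$Name, df1$Item, sep = "_")  # one key per Name/Item group
days <- as.numeric(df1$Date)                  # dates as day counts

# Gap to the previous purchase within each group; Inf for the first purchase
gap         <- ave(days, key, FUN = function(d) c(Inf, diff(d)))
purch_num   <- ave(days, key, FUN = seq_along)
since_first <- ave(days, key, FUN = function(d) d - d[1])
avg_gap     <- since_first / (purch_num - 1)  # NaN on the first row of a group

# TRUE | NA evaluates to TRUE, so the first row of each group is always flagged
new_grp_flag <- gap > 180 | gap > 3 * avg_gap
df1$group <- cumsum(new_grp_flag)

df1$group
# 1 1 1 2 3 3
```

In this made-up data the fourth purchase follows a gap of 347 days, so it opens group 2, and the switch to Name "B" opens group 3.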

Add unique ID column based on values in two other columns (lat, long)

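We could use match. Pasting with a separator keeps different coordinate pairs from collapsing into the same string:

transform(d, Cluster_ID = match(paste(LAT, LONG, sep = "_"), unique(paste(LAT, LONG, sep = "_"))))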
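One thing to watch when building a key from two numeric columns is whether a separator is used: without one, distinct coordinate pairs can paste to the same string. A minimal sketch with made-up coordinates chosen to trigger exactly that; the names key_plain, key_sep, id_plain and id_sep are only illustrative.

```r
# Made-up coordinates: (1.2, 34) and (1.23, 4) both paste to "1.234" without a separator
d <- data.frame(LAT = c(1.2, 1.23, 1.2), LONG = c(34, 4, 34))

key_plain <- paste0(d$LAT, d$LONG)            # no separator
key_sep   <- paste(d$LAT, d$LONG, sep = "_")  # separator keeps the pairs distinct

id_plain <- match(key_plain, unique(key_plain))
id_sep   <- match(key_sep, unique(key_sep))

id_plain
# 1 1 1
id_sep
# 1 2 1
```

With the separator, rows 1 and 3 share an ID and row 2 gets its own, which is the intended clustering.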

Or convert the 'LAT' and 'LONG' columns to sequence indices and then take their interaction:

transform(d, Cluster_ID = as.integer(interaction(match(LAT, unique(LAT)),
                                                 match(LONG, unique(LONG)),
                                                 drop = TRUE, lex.order = FALSE)))

Create a new column with unique identifier for each group

In pandas (Python), try groupby with ngroup and add 1 so that the identifiers start at 1. Use sort=False to ensure groups are enumerated in the order they appear in the DataFrame:

df['idx'] = df.groupby(['ID', 'phase'], sort=False).ngroup() + 1

df:

   ID phase side  values  idx
0  r1   ph1    l      12    1
1  r1   ph1    r      34    1
2  r1   ph2    l      93    2
3  s4   ph3    l      21    3
4  s3   ph2    l      88    4
5  s3   ph2    r      54    4
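Since most of this page deals with R, here is a rough base R counterpart of the same idea, using the ID and phase values from the table above. Matching each key against the unique keys numbers every combination in order of first appearance, which mirrors ngroup() with sort=False. The data frame name dat is only illustrative.

```r
dat <- data.frame(
  ID    = c("r1", "r1", "r1", "s4", "s3", "s3"),
  phase = c("ph1", "ph1", "ph2", "ph3", "ph2", "ph2")
)

# One key per ID/phase combination; unique() keeps first-appearance order
key <- paste(dat$ID, dat$phase, sep = "_")
dat$idx <- match(key, unique(key))

dat$idx
# 1 1 2 3 4 4
```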

Creating a new data frame in R based on unique values and time stamp

You can do:

df <- data.frame(ID = c(234, 546, 678, 546, 234),
                 PRIORITY = c("Reading", "Writing", "Communication", "Communication", "Writing"),
                 TIME = c("10/29", "10/30", "10/29", "11/1", "11/1"))

library(tidyverse)

df %>%
  group_by(ID) %>%
  mutate(ID_count = 1:n()) %>%
  ungroup() %>%
  pivot_wider(id_cols = ID,
              values_from = c(PRIORITY, TIME),
              names_from = ID_count)

which gives:

# A tibble: 3 x 5
     ID PRIORITY_1    PRIORITY_2    TIME_1 TIME_2
  <dbl> <chr>         <chr>         <chr>  <chr>
1   234 Reading       Writing       10/29  11/1
2   546 Writing       Communication 10/30  11/1
3   678 Communication <NA>          10/29  <NA>

How to create a new column based on flag on a different column R

We can use the dplyr package. Within each id, the result from the row flagged "Y" serves as the base value:

library(dplyr)

df |>
  group_by(id) |>
  mutate(base_value = result[which(flag == "Y")],
         percentage_change = (result - base_value) / base_value * 100) |>
  ungroup()

Output:

# A tibble: 8 × 5
     id result flag  base_value percentage_change
  <dbl>  <dbl> <chr>      <dbl>             <dbl>
1     1     12 ""            13             -7.69
2     1     33 ""            13            153.84
3     1     13 "Y"           13              0
4     1     44 ""            13            238.46
5     2     23 "Y"           23              0
6     2     44 ""            23             91.3
7     2     52 ""            23            126.08
8     2     11 ""            23            -52.17
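The lookup relies on every id having exactly one row flagged "Y". A base R sketch of the same calculation that checks this assumption first, using the id, result and flag values shown in the output above; the name flagged is only illustrative.

```r
df <- data.frame(
  id     = c(1, 1, 1, 1, 2, 2, 2, 2),
  result = c(12, 33, 13, 44, 23, 44, 52, 11),
  flag   = c("", "", "Y", "", "Y", "", "", "")
)

# Rows that carry the base value for their id
flagged <- df[df$flag == "Y", ]

# Guard: exactly one flagged row per id, and every id has one
stopifnot(!anyDuplicated(flagged$id), all(df$id %in% flagged$id))

# Look up each row's base value by id, then compute the percentage change
df$base_value <- flagged$result[match(df$id, flagged$id)]
df$percentage_change <- (df$result - df$base_value) / df$base_value * 100
```

If a group had no flagged row or more than one, the guard stops with an error instead of silently producing misaligned values.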

