Change rows into columns in R with values yes/no (1/0)
This solution works for your example...
library(tidyverse)
library(sjmisc)
d <- tibble(
IDBILL = c(111, 111, 112, 113, 113),
IDPRODUCT = c("ABC123", "ABC124", "BCH134", "ABC123", "GDF345"),
)
d %>%
tidyr::spread(key = IDPRODUCT, value = IDPRODUCT) %>%
sjmisc::rec_if(is.character, rec = "NA=0;else=1", append = F) %>%
tibble::add_column(IDBILL = unique(d$IDBILL), .before = 1)
#> # A tibble: 3 x 5
#> IDBILL ABC123_r ABC124_r BCH134_r GDF345_r
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 111 1 1 0 0
#> 2 112 0 0 1 0
#> 3 113 1 0 0 1
Step-by-step:
1) Spread the variable accross columns, but key
and value
are identical in this case, because of duplicated values: tidyr::spread(key = IDPRODUCT, value = IDPRODUCT)
would throw this error: Error: Duplicate identifiers for rows (1, 2), (4, 5)
d %>%
tidyr::spread(key = IDPRODUCT, value = IDPRODUCT)
#> # A tibble: 3 x 5
#> IDBILL ABC123 ABC124 BCH134 GDF345
#> <dbl> <chr> <chr> <chr> <chr>
#> 1 111 ABC123 ABC124 <NA> <NA>
#> 2 112 <NA> <NA> BCH134 <NA>
#> 3 113 ABC123 <NA> <NA> GDF345
2) Recode NA
into 0, and all other values (else
) into 1 - but only for character vectors (else, column IDBILL
would be recoded as well). Note that rec_if()
, just like e.g. select_if()
, drops all other columns where .predicate
does not apply.
d %>%
tidyr::spread(key = IDPRODUCT, value = IDPRODUCT) %>%
sjmisc::rec_if(is.character, rec = "NA=0;else=1", append = F)
#> # A tibble: 3 x 4
#> ABC123_r ABC124_r BCH134_r GDF345_r
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 0 0
#> 2 0 0 1 0
#> 3 1 0 0 1
3) ... so we need to add back the ID column:
d %>%
tidyr::spread(key = IDPRODUCT, value = IDPRODUCT) %>%
sjmisc::rec_if(is.character, rec = "NA=0;else=1", append = F) %>%
tibble::add_column(IDBILL = unique(d$IDBILL), .before = 1)
#> # A tibble: 3 x 5
#> IDBILL ABC123_r ABC124_r BCH134_r GDF345_r
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 111 1 1 0 0
#> 2 112 0 0 1 0
#> 3 113 1 0 0 1
Need to change data.table columns' value from Yes ,No to 1,0
You could do this with the set
functionality in data.table:
1: Create a vector of columnnames in which you want to change the Yes
to 1
and the No
to 0
(like @Frank said in the comments)
cols <- grep("^HasProduct", names(DT), value = TRUE)
2: Change the values with the following for(...) set(...)
implementation (as rightfully pointed out by @Arun in the comments, you can also use as.integer
instead of just +
):
for (col in cols) set(DT, j = col, value = +(DT[[col]] == "Yes"))
this results in:
> DT
x HasProduct1 HasProduct2 HasProduct3 HasProduct4 HasProduct5 HasProduct6 HasProduct7 HasProduct8 HasProduct9 HasProduct10
1: 23 0 1 0 1 0 0 1 0 0 0
2: 74 1 0 1 1 0 1 1 1 1 1
3: 35 1 1 0 0 0 1 1 1 0 1
4: 7 1 1 1 1 0 1 1 0 0 1
5: 92 0 1 1 1 1 1 0 1 1 0
---
9996: 56 0 0 1 0 1 0 0 0 1 0
9997: 59 1 0 1 1 0 1 1 1 1 0
9998: 85 0 1 0 1 1 1 1 1 1 1
9999: 93 1 0 0 0 0 0 0 0 1 1
10000: 29 0 1 1 0 0 1 0 1 1 1
Timings:
user system elapsed
0.007 0.000 0.007
Used data:
set.seed(654)
product <- c("HasProduct1","HasProduct2","HasProduct3","HasProduct4","HasProduct5","HasProduct6","HasProduct7","HasProduct8","HasProduct9","HasProduct10")
DT <- as.data.table(data.frame(x=sample(1:100),sapply(product,function(x){x <-sample(c("Yes","No"),10000,replace = T)})))
Convert row values to column values with 1 for presence and 0 for absence in R dataframe
You can use dcast
from data.table
here:
dcast(dt, ID ~ CODE, fun.aggregate = function(x) 1L, fill = 0L)
ID F17 T65 Z49
1: 1245 0 0 1
2: 5567 1 0 1
3: 6623 0 1 1
data
library(data.table)
dt <- fread("ID Date CODE
5567 2012-09-02 F17
5567 2012-10-15 F17
5567 2012-11-11 Z49
1245 2010-09-02 Z49
6623 2012-10-15 T65
6623 2012-11-11 Z49")
Converting rows into columns based on the values in the rows, in R
You may try
library(tidyverse)
df %>%
rowwise %>%
mutate(reservation_main = str_split(reservation,'_' ,simplify = T)[1],
reservation_no = paste0('_',str_split(reservation,'_' ,simplify = T)[2])) %>%
select(id, response_id, reservation_main, reservation_no) %>%
pivot_wider(names_from = reservation_no, values_from = response_id)
id reservation_main `_1` `_2` `_3`
<dbl> <chr> <dbl> <dbl> <dbl>
1 31100 A 1 1 0
2 31100 B 1 1 0
3 31100 C 1 0 0
How to translate values in a column to yes and no values for a multiple regression in R
I would create a new column - see two options below.
(NB in lm()
you don't have to specify SB_xlsx13$
each time you add a covariate if you list it as the data =
argument once! This will make your output easier to read.)
Tidyverse approach: mutate
and case_when
:
library(dplyr)
SB_xlsx13 <- SB_xlsx13 %>%
mutate(dnr_d3 = case_when(dnrday <= 3 ~ "yes",
dnrday > 3 ~ "no",
TRUE ~ NA_character_))
MLR_3 <- lm(hospdead ~ dzclass + age + sex + num.co + sps + dnr_d3,
data = SB_xlsx13)
Base R approach:
SB_xlsx13$dnr_d3[SB_xlsx13$dnrday <= 3] <- "yes"
SB_xlsx13$dnr_d3[SB_xlsx13$dnrday > 3] <- "no"
MLR_4 <- lm(hospdead ~ dzclass + age + sex + num.co + sps + dnr_d3,
data = SB_xlsx13)
Related Topics
Creating a New Column Based on Unique Id With Values in R
Concatenate String Columns and Order in Alphabetical Order
Conditional Replacement of a Comma With a Dot in a Numeric Column
Overlap Join With Start and End Positions
Difference Between Require() and Library()
Add Column Which Contains Binned Values of a Numeric Column
Cluster Analysis in R: Determine the Optimal Number of Clusters
Why Does Data.Table Update Names(Dt) by Reference, Even If I Assign to Another Variable
What Exactly Is Copy-On-Modify Semantics in R, and Where Is the Canonical Source
Lm' Summary Not Display All Factor Levels
Remove Last N Rows in Data Frame With the Arbitrary Number of Rows
How to Remove the Negative Values from a Data Frame in R
Linear Regression and Group by in R