How to Combine Multiple Variable Data to a Single Variable Data

How to combine multiple variable data to a single variable data?

names <- c("red","white","water")
df2 <- setNames(data.frame(matrix(ncol = length(names), nrow = nrow(df))),names)

for(col in names){
df2[,col] <- rowSums(df[,grep(col,tolower(names(df)))])
}

here

grep(col,tolower(names(df)))

looks for all the column names that contain the strings like "red" in the names of your vector. You then just sum them in a new data.frame df2 defined with the good lengths

Combining multiple columns/variables into a single column

That is what dplyr::coalesce was made for:

library(dplyr)
df$v4 <- coalesce(!!!df)

#Also works:
df %>%
mutate(v4 = coalesce(v1, v2, v3))

output

   v1 v2 v3 v4
1 1 NA NA 1
2 3 NA NA 3
3 6 NA NA 6
4 NA 5 NA 5
5 NA 1 NA 1
6 NA 3 NA 3
7 NA NA 4 4
8 NA NA 2 2
9 NA NA 1 1
10 NA NA NA NA

How to merge multiple variables and create a new data set?

You'll have to join those four tables, not combine using c.

And the join type is left_join so that all batsman are included in the output. Those who didn't face any balls or hit any boundaries will have NA, but you can easily replace these with 0.

I've ignored the by since dplyr will assume you want c("id", "inning", "batsman"), the only 3 common columns in all four data sets.

batsman_aggregate <- left_join(batsman_score_in_a_match, balls_faced) %>%
left_join(fours_hit) %>%
left_join(sixes_hit) %>%
select(id, inning, batsman, total_batsman_runs, deliveries_played, fours_hit, sixes_hit) %>%
replace(is.na(.), 0)

# A tibble: 11,335 x 7
id inning batsman total_batsman_runs deliveries_played fours_hit sixes_hit
<int> <int> <fct> <int> <dbl> <dbl> <dbl>
1 1 1 DA Warner 14 8 2 1
2 1 1 S Dhawan 40 31 5 0
3 1 1 MC Henriques 52 37 3 2
4 1 1 Yuvraj Singh 62 27 7 3
5 1 1 DJ Hooda 16 12 0 1
6 1 1 BCJ Cutting 16 6 0 2
7 1 2 CH Gayle 32 21 2 3
8 1 2 Mandeep Singh 24 16 5 0
9 1 2 TM Head 30 22 3 0
10 1 2 KM Jadhav 31 16 4 1
# ... with 11,325 more rows

There are also 2 batsmen who didn't face any delivery:

batsman_aggregate %>% filter(deliveries_played==0)
# A tibble: 2 x 7
id inning batsman total_batsman_runs deliveries_played fours_hit sixes_hit
<int> <int> <fct> <int> <dbl> <dbl> <dbl>
1 482 2 MK Pandey 0 0 0 0
2 7907 1 MJ McClenaghan 2 0 0 0

One of which apparently scored 2 runs! So I think the batsman_runs column has some errors. The game is here and clearly says that on the second last delivery of the first innings, 2 wides were scored, not runs to the batsman.

How can I combine several columns into one variable, tacking each onto the end of the other and grouping by values in an ID variable?

Try to set the inputs of the function pivot_longer()correctly as cols and values_to. cols=... defines the columns which you are taking the values from. values_to = ... defines the new name of the column where you are writing the values you took from 'cols'. Actually I think you were doing good, just pivot_longer returns always the names of the columns which values you are taking from, unless you try other trickier things.

library(tidyverse)

df = data.frame(
a = c("string1","string2"),
b= c("string11","string12"),
c = c("string21", "string22"),
ID = c("1111","2222")
)

df %>%
pivot_longer(cols = names(df)[1:3],
values_to = "newvar") %>%
select(newvar, ID)

Output:

# A tibble: 6 x 2
newvar ID
<chr> <chr>
1 string1 1111
2 string11 1111
3 string21 1111
4 string2 2222
5 string12 2222
6 string22 2222

How to combine info from multiple variables into one in SPSS?

The example is easily solved like this:

if Russian="Yes" Nationality=1.
if German="Yes" Nationality=2.
if Chinese="Yes" Nationality=3.
exe.

If the problem is more complex than that please edit your post and explain what other problems need to be solved, and I can add to the solution accordingly.

Combining Multiple Columns to Create a Single Variable

One way to do this is to pivot your column names as a column, group values by respondent, then drop the NA values. Then just choose the ethnicity value that remains for each group, switching to "multiple" when necessary. Here's a way to do that with tidyverse:

library(tidyverse)

df %>%
rownames_to_column("respondent") %>%
pivot_longer(-respondent) %>%
group_by(respondent) %>%
filter(!is.na(value)) %>%
summarise(eth = ifelse(n() == 1, name, "multiple"))

# A tibble: 4 x 2
respondent eth
<chr> <chr>
1 1 Black
2 2 White
3 3 Hispanic
4 4 multiple

You won't be able to store numbers, as numeric types, with a string like "variable" - so you have a choice. Either stick with the ethnicity labels (like the solution above), or convert labels to numbers and then numbers to the string representations of those numbers. That seems a little unwieldy, but if you want to do that, here's how:

df %>% 
rownames_to_column("respondent") %>%
pivot_longer(-respondent) %>%
mutate(eth_num = as.character(as.numeric(fct_inorder(name)))) %>%
group_by(respondent) %>%
filter(!is.na(value)) %>%
summarise(eth = ifelse(n() == 1, eth_num, "multiple"))

# A tibble: 4 x 2
respondent eth
<chr> <chr>
1 1 1
2 2 2
3 3 4
4 4 multiple


Related Topics



Leave a reply



Submit