Is There a More Efficient or Concise Way to Use tidyr::gather to Make My Data Look 'Tidy'?

gather() has been superseded by pivot_longer(), which makes this kind of transformation simpler.

tidyr::pivot_longer(d, cols = -day,
                    names_to = c('sym', '.value'), names_sep = '_')

# A tibble: 20 x 4
# day sym x y
#* <int> <chr> <dbl> <dbl>
#1 1 a -0.560 -1.07
#2 1 b 1.22 0.426
#3 2 a -0.230 -0.218
#4 2 b 0.360 -0.295
#...
#...
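For readers without the original data, here is a minimal reproducible sketch of the same call. The data frame below is a hypothetical stand-in for d (the real one is not shown); it only assumes that the columns are named <sym>_<measure>, as the output above suggests. The special name '.value' tells pivot_longer() that this part of the column name should become the output column names rather than a key column.

```r
library(tidyr)

# Hypothetical stand-in for `d`: one row per day, columns named <sym>_<measure>
d <- data.frame(
  day = 1:2,
  a_x = c(0.1, 0.2), a_y = c(1.1, 1.2),
  b_x = c(0.3, 0.4), b_y = c(1.3, 1.4)
)

# The part before "_" goes into `sym`; the part after "_" (x or y)
# becomes its own value column because of ".value"
long <- pivot_longer(d, cols = -day,
                     names_to = c("sym", ".value"),
                     names_sep = "_")
long
```

With this toy input, the result has one row per day and sym, with x and y as separate columns.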

Can this iteration be written in a tidy functional way

Update

Based on your updated question, here is an updated version of my answer.

This time I used your inputs as is and did not create a named function. Instead I put everything in one pipe. The column found indicates how many patterns were found in each description, so you should not need separate objects such as not_unique, matched_not_found and matches_found.

I picked up the idea from GenesRus (in the comments of your question) to create a list-column and unnest it, but I did not take the approach further using spread/pivot_wider and instead chose map2 to loop over the description and desc_map columns.

library(tidyverse)

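data %>%
  # attach the full pattern table to every row, then expand it
  mutate(pattern = list(data_map)) %>%
  unnest(pattern) %>%
  rename(row_id = "id", map_id = "id1") %>%
  # does the description contain the pattern?
  mutate(v = map2_lgl(description, desc_map,
                      ~ str_detect(.x, .y))) %>%
  group_by(row_id) %>%
  # found = number of patterns matched per description
  mutate(found = sum(v),
         desc_map = ifelse(found == 0, NA, desc_map),
         map_id = ifelse(found == 0, NA, map_id)) %>%
  # keep the matches, plus the rows of descriptions without any match
  filter(v | found == 0) %>%
  distinct() %>%
  select(-v)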
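To make the idea of the found column easier to check, here is a small self-contained sketch of the pairing step only. The two tibbles are hypothetical stand-ins for data and data_map (the real ones are not shown), and this is a variant rather than the pipe above: it builds all description/pattern pairs with dplyr::cross_join() (which needs dplyr 1.1.0 or later) instead of the list-column, and it calls str_detect() directly because that function is vectorised over both string and pattern.

```r
library(dplyr)
library(stringr)

# Hypothetical inputs: the real `data` and `data_map` are not shown
data <- tibble(id = 1:2,
               description = c("This is a text description",
                               "Some words that won't match"))
data_map <- tibble(id = 1:2,
                   desc_map = c("description", "text"))

# Every description paired with every pattern, then one match flag per pair
pairs <- cross_join(rename(data, row_id = id),
                    rename(data_map, map_id = id)) %>%
  mutate(v = str_detect(description, desc_map)) %>%
  group_by(row_id) %>%
  mutate(found = sum(v)) %>%   # number of patterns matched per description
  ungroup()

pairs
```

In this toy input the first description matches both patterns (found is 2) and the second matches none (found is 0).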

Old answer

Below is a more tidyverse-based approach that should yield the same result. 'Should', because I can only guess what your input data and expected result look like. A few notes:

(1) I chose plain character vectors as inputs. Row ids are generated on the fly.
(2) I put your approach into a function called match_tbl.
(3) I used tidyverse functions in combination with the pipe operator. This makes the whole approach easy to read and gives it a 'tidyverse-ish' look. However, if you look inside actual functions of tidyverse packages, you will see that the authors usually refrain from using the pipe operator inside functions, since it can make errors harder to track down. Step through a pipe operation with the RStudio debugger and you will see that it gets messy quickly. Therefore, if you want to turn this into a really stable function, drop the pipes and use intermediate variables instead.

Data and packages

library(tidyverse)

# some description data (not a dataframe but a normal char vector)
description <- c("This is a text description",
                 "Some words that won't match",
                 "Some random text goes here",
                 "and some more explanation here")

# patterns that we want to find (not a dataframe but a normal char vector)
pattern <- c("explanation","description", "text")

A function generating the desired output: a match table

# a function which replaces your nested for loop
match_tbl <- function(.string, .pattern) {

  # for each pattern, keep the row ids of the strings it matches
  res <- imap(.pattern,
              ~ stringr::str_detect(.string, .x) %>%
                tibble::enframe(name = "row_id") %>%
                dplyr::mutate(map_id = .y) %>%
                dplyr::filter(value == TRUE) %>%
                dplyr::select(-"value"))

  # one row per input string, so that unmatched strings are kept with NA
  string_tbl <- .string %>%
    tibble::enframe(name = "id") %>%
    dplyr::select("id")

  dplyr::bind_rows(res) %>%
    dplyr::right_join(string_tbl, by = c("row_id" = "id"))

}

Function call and output

match_tbl(description, pattern)
> row_id map_id
> <int> <int>
> 1 1 2
> 2 1 3
> 3 2 NA
> 4 3 3
> 5 4 1
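Note (3) above recommends dropping the pipes if this is meant to become a stable function. A minimal sketch of what that could look like follows. The name match_tbl_nopipe and the intermediate variable names are my own choices, not part of the original answer, and the final arrange() is added so that the row order does not depend on the join.

```r
library(purrr)
library(tibble)
library(dplyr)
library(stringr)

description <- c("This is a text description",
                 "Some words that won't match",
                 "Some random text goes here",
                 "and some more explanation here")
pattern <- c("explanation", "description", "text")

# Hypothetical pipe-free variant of match_tbl, using intermediate variables
match_tbl_nopipe <- function(.string, .pattern) {
  # one small (row_id, map_id) table per pattern
  per_pattern <- purrr::imap(.pattern, function(p, i) {
    hits <- which(stringr::str_detect(.string, p))
    tibble::tibble(row_id = hits, map_id = rep(i, length(hits)))
  })
  matches <- dplyr::bind_rows(per_pattern)

  # one row per input string, so unmatched strings get map_id = NA
  all_rows <- tibble::tibble(row_id = seq_along(.string))
  joined <- dplyr::left_join(all_rows, matches, by = "row_id")

  dplyr::arrange(joined, row_id, map_id)
}

res <- match_tbl_nopipe(description, pattern)
res
```

Each step now has a name, so a breakpoint or a print() can be placed on any intermediate result.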

Transform data frame using tidyr

Try this:

library(dplyr)   # for arrange() and select()
library(tidyr)   # for pivot_longer()

haves %>%
  pivot_longer(cols = -actuals) %>%
  arrange(value) %>%
  select(value, actuals)

Output:

  value actuals
1     1    99.1
2     2    99.2
3     3    99.1
4     4    99.2
5     5    99.1
6     6    99.2
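Since haves itself is not shown, here is a reproducible sketch with a guessed input. The column names a, b and c are assumptions; only actuals and the values in the output above come from the answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical input, guessed from the output above
haves <- data.frame(a = c(1, 2), b = c(3, 4), c = c(5, 6),
                    actuals = c(99.1, 99.2))

out <- haves %>%
  pivot_longer(cols = -actuals) %>%   # creates the default `name` and `value` columns
  arrange(value) %>%
  select(value, actuals)

out
```

pivot_longer() uses name and value as default column names when names_to and values_to are not given, which is why value can be used directly afterwards.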

Cleaning Data When Variables are Column Names

With dplyr and tidyr:

df %>%
  # 1. Pivot the table
  gather(g, m, -Timepoint) %>%
  # 2. Get the final Group ID in mGroup
  separate(g, c("Measure", "mGroup"), -2) %>%
  # 3. Spread the actual Error and Measure in two columns
  spread(Measure, m) %>%
  # 4. Assign the correct names to final columns
  select(Timepoint, Group = mGroup, Measure = Group, Error = Error_Group) %>%
  # 5. Sort as requested
  arrange(Group, Timepoint)
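The same reshaping can probably be done in one step with pivot_longer(). The sketch below is not the original answer's method, and the input is hypothetical: it assumes columns named Group1, Error_Group1, Group2, Error_Group2, which is what the steps above imply but is not shown in the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: measure and error columns carry the group id as a suffix
df <- data.frame(Timepoint    = c(0, 1),
                 Group1       = c(10, 11),
                 Error_Group1 = c(0.5, 0.6),
                 Group2       = c(20, 21),
                 Error_Group2 = c(0.7, 0.8))

tidy <- df %>%
  # everything before the final digit becomes a value column (".value"),
  # the final digit becomes the group id
  pivot_longer(-Timepoint,
               names_to = c(".value", "mGroup"),
               names_pattern = "^(.*)(\\d)$") %>%
  # rename in two steps so the old and new `Group` columns never clash
  rename(Measure = Group, Error = Error_Group) %>%
  rename(Group = mGroup) %>%
  arrange(Group, Timepoint)

tidy
```

Note that this pattern only handles single-digit group ids; with ten or more groups the regular expression would need adjusting.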

Sum subset of a variable for tidy data r

A factor can be recoded with forcats::fct_recode but this isn't necessarily shorter.

library(dplyr)
library(forcats)

df %>%
  mutate(food = fct_recode(food, fruit = 'apple', fruit = 'pear')) %>%
  group_by(food) %>%
  summarise(value = sum(value))
# A tibble: 3 x 2
# food value
# <fct> <dbl>
#1 fruit 7
#2 carbs 10
#3 protein 12

Edit.

I am posting the code from a comment here, since comments are deleted more often than answers. The result is the same as above.

df %>%
  group_by(food = fct_recode(food, fruit = 'apple', fruit = 'pear')) %>%
  summarise(value = sum(value))
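If several levels map to the same new level, forcats::fct_collapse() may read a little better, because the old levels are given as one vector. The sketch below uses a hypothetical df whose values were chosen only to be consistent with the totals shown above; the real data is not shown.

```r
library(dplyr)
library(forcats)

# Hypothetical data, chosen to be consistent with the totals shown above
df <- tibble(food  = factor(c("apple", "pear", "carbs", "protein")),
             value = c(3, 4, 10, 12))

out <- df %>%
  # collapse apple and pear into a single level called fruit
  group_by(food = fct_collapse(food, fruit = c("apple", "pear"))) %>%
  summarise(value = sum(value))

out
```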

Gather multiple sets of columns

This approach seems pretty natural to me:

df %>%
  gather(key, value, -id, -time) %>%
  extract(key, c("question", "loop_number"), "(Q.\\..)\\.(.)") %>%
  spread(question, value)

First gather all question columns, use extract() to separate into question and loop_number, then spread() question back into the columns.

#>   id       time loop_number         Q3.2        Q3.3
#> 1  1 2009-01-01           1  0.142259203 -0.35842736
#> 2  1 2009-01-01           2  0.061034802  0.79354061
#> 3  1 2009-01-01           3 -0.525686204 -0.67456611
#> 4  2 2009-01-02           1 -1.044461185 -1.19662936
#> 5  2 2009-01-02           2  0.393808163  0.42384717
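With current tidyr, the gather/extract/spread sequence can likely be collapsed into a single pivot_longer() call by reusing the same regular expression as names_pattern. This is a sketch, not the original answer's code, and the input below is hypothetical: it assumes columns named like Q3.2.1 and Q3.3.1 (question, then loop number).

```r
library(tidyr)

# Hypothetical input: two questions (Q3.2, Q3.3), each asked in two loops
df <- data.frame(id     = 1:2,
                 time   = as.Date(c("2009-01-01", "2009-01-02")),
                 Q3.2.1 = c(1, 2), Q3.3.1 = c(3, 4),
                 Q3.2.2 = c(5, 6), Q3.3.2 = c(7, 8))

long <- pivot_longer(df, cols = -c(id, time),
                     # first group: question (kept as column names via ".value"),
                     # second group: loop number
                     names_to = c(".value", "loop_number"),
                     names_pattern = "(Q.\\..)\\.(.)")
long
```

The result has one row per id and loop_number, with one column per question, which is the same shape as the output above.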

