Is there a more efficient or concise way to use tidyr::gather to make my data look 'tidy'?
gather() has been retired in favor of pivot_longer(), which makes this kind of transformation simpler.
tidyr::pivot_longer(d, cols = -day,
names_to = c('sym', '.value'), names_sep = '_')
# A tibble: 20 x 4
# day sym x y
#* <int> <chr> <dbl> <dbl>
#1 1 a -0.560 -1.07
#2 1 b 1.22 0.426
#3 2 a -0.230 -0.218
#4 2 b 0.360 -0.295
#...
#...
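For readers who do not have d at hand, here is a minimal sketch of how the '.value' sentinel behaves. The data frame below is made up for illustration; it only assumes that the column names follow a <sym>_<measure> pattern, as the call above implies.

```r
library(tidyr)

# Hypothetical input (not the asker's data): names look like <sym>_<measure>
d <- data.frame(
  day = 1:2,
  a_x = c(1, 2), a_y = c(3, 4),
  b_x = c(5, 6), b_y = c(7, 8)
)

# The part before "_" goes into the column `sym`; the part after "_"
# (".value") becomes the name of an output column holding the cell values
long <- pivot_longer(d, cols = -day,
                     names_to = c("sym", ".value"), names_sep = "_")

# resulting columns: day, sym, x, y (one row per day/sym pair)
long
```

Each (day, sym) pair ends up on one row, with x and y side by side, which is what previously needed gather, separate and spread in sequence.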
Can this iteration be written in a tidy functional way
Update
Based on your updated question here is an updated version of my answer.
This time I used your inputs as they are and did not create a named function; instead, I put everything in one pipe. The column found indicates how many times a pattern was found, so you should not need separate objects such as not_unique, matched_not_found and matches_found.
I picked up the idea from GenesRus (in the comments on your question) to create a list-column and unnest it, but I did not take the approach further with spread/pivot_wider and instead chose map2_lgl to loop over the description and desc_map columns.
library(tidyverse)
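data %>%
  # attach the whole mapping table to every row, then expand it
  mutate(pattern = list(data_map)) %>%
  unnest %>%
  # assumes unnest() named the two id columns "id" and "id1"
  rename(row_id = "id", map_id = "id1") %>%
  # TRUE where the row's description matches the row's pattern
  mutate(v = map2_lgl(description, desc_map,
                      ~ str_detect(.x, .y))) %>%
  group_by(row_id) %>%
  # found = number of matching patterns per description
  mutate(found = sum(v),
         desc_map = ifelse(found == 0, NA, desc_map),
         map_id = ifelse(found == 0, NA, map_id)) %>%
  # keep matches, plus rows whose description matched nothing
  filter(v | found == 0) %>%
  distinct %>%
  select(-v)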
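To make the map2_lgl step concrete, here is a small standalone sketch. The two vectors are made up and stand in for the description and desc_map columns after unnesting; they are not your data.

```r
library(purrr)
library(stringr)

# Hypothetical stand-ins for the paired columns, one pattern per description
description <- c("This is a text description", "Some words that won't match")
desc_map    <- c("text", "explanation")

# Pairwise check: is pattern i found in description i?
v <- map2_lgl(description, desc_map, ~ str_detect(.x, .y))

# str_detect() is itself vectorised over string and pattern,
# so the same pairwise check can be written without map2_lgl()
v2 <- str_detect(description, desc_map)
```

Both calls give one logical value per row, which is the v column that the pipe above sums up per row_id. Note that the patterns are interpreted as regular expressions.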
Old answer
Below is a more tidyverse-based approach which should yield the same result. 'Should' because I can only guess what your input data and expected result look like. A few notes: (1) I chose plain character vectors as inputs. Row ids are generated on the fly. (2) I put your approach into a function called match_tbl. (3) I used tidyverse functions in combination with the pipe operator. This makes the whole approach easy to read and gives it a 'tidyverse-ish' look. However, when you look into the actual functions of tidyverse packages, you will see that the authors usually refrain from using the pipe operator inside functions, since errors raised inside a pipe are harder to trace. Use the RStudio debugger on a pipe operation, try to dig into what is going on, and you will see it is pretty messy. Therefore, if you want to turn this into a robust function, drop the pipes and use intermediate variables instead.
Data and packages
library(tidyverse)
# some description data (not a dataframe but a normal char vector)
description <- c("This is a text description",
"Some words that won't match",
"Some random text goes here",
"and some more explanation here")
# patterns that we want to find (not a dataframe but a normal char vector)
pattern <- c("explanation","description", "text")
A function generating the desired output: a match table
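# a function which replaces your nested for loop
match_tbl <- function(.string, .pattern) {
  # for each pattern (.x) and its index (.y): the row ids of matching strings
  res <- purrr::imap(.pattern,
                     ~ stringr::str_detect(.string, .x) %>%
                       tibble::enframe(name = "row_id") %>%
                       dplyr::mutate(map_id = .y) %>%
                       dplyr::filter(value) %>%
                       dplyr::select(-"value"))
  # one row per input string, so strings without a match are kept
  string_tbl <- .string %>%
    tibble::enframe(name = "id") %>%
    dplyr::select("id")
  dplyr::bind_rows(res) %>%
    dplyr::right_join(string_tbl, by = c("row_id" = "id")) %>%
    # sort explicitly; the join's row order differs between dplyr versions
    dplyr::arrange(row_id, map_id)
}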
Function call and output
match_tbl(description, pattern)
> row_id map_id
> <int> <int>
> 1 1 2
> 2 1 3
> 3 2 NA
> 4 3 3
> 5 4 1
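As a rough illustration of note (3), the same idea can be written without pipes, using intermediate variables. This is only a sketch: match_tbl_steps is a name made up here, and it swaps the right join for a left join starting from the full list of row ids.

```r
library(stringr)
library(tibble)
library(dplyr)

# Pipe-free variant (hypothetical name), one intermediate variable per step
match_tbl_steps <- function(.string, .pattern) {
  # one small tibble of (row_id, map_id) pairs per pattern
  res <- lapply(seq_along(.pattern), function(i) {
    hits <- which(stringr::str_detect(.string, .pattern[[i]]))
    tibble::tibble(row_id = hits, map_id = rep(i, length(hits)))
  })
  matches  <- dplyr::bind_rows(res)
  # every input string gets a row, matched or not
  all_rows <- tibble::tibble(row_id = seq_along(.string))
  joined   <- dplyr::left_join(all_rows, matches, by = "row_id")
  dplyr::arrange(joined, row_id, map_id)
}

description <- c("This is a text description",
                 "Some words that won't match",
                 "Some random text goes here",
                 "and some more explanation here")
pattern <- c("explanation", "description", "text")

res_steps <- match_tbl_steps(description, pattern)
```

With intermediate variables, each step can be inspected on its own in the debugger, which is the main reason to prefer this form inside a function.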
transform data frame using tidyr
Try this:
library(dplyr)
library(tidyr)
haves %>%
  pivot_longer(cols = -actuals) %>%
  arrange(value) %>%
  select(value, actuals)
Output:
value actuals
1 1 99.1
2 2 99.2
3 3 99.1
4 4 99.2
5 5 99.1
6 6 99.2
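If you want to try this without the original data, here is a self-contained sketch. The haves data frame is a guess reconstructed from the output above, and the column names v1 to v3 are made up.

```r
library(dplyr)
library(tidyr)

# Hypothetical input, guessed from the printed result
haves <- data.frame(actuals = c(99.1, 99.2),
                    v1 = c(1, 2), v2 = c(3, 4), v3 = c(5, 6))

# pivot_longer() defaults to names_to = "name" and values_to = "value",
# which is why the later steps can refer to a column called `value`
wants <- haves %>%
  pivot_longer(cols = -actuals) %>%
  arrange(value) %>%
  select(value, actuals)
```

The intermediate name column (holding v1, v2, v3) is simply dropped by the final select.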
Cleaning Data When Variables are Column Names
With dplyr and tidyr:
df %>%
  # 1. Pivot the table
  gather(g, m, -Timepoint) %>%
  # 2. Get the final Group ID in mGroup
  #    (note: with tidyr >= 1.0, sep = -1 is needed to split off the last character)
  separate(g, c("Measure", "mGroup"), -2) %>%
  # 3. Spread the actual Error and Measure into two columns
  spread(Measure, m) %>%
  # 4. Assign the correct names to final columns
  select(Timepoint, Group = mGroup, Measure = Group, Error = Error_Group) %>%
  # 5. Sort as requested
  arrange(Group, Timepoint)
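With a current tidyr, steps 1 to 3 can probably be collapsed into a single pivot_longer call. The sketch below uses a made-up df whose column names (Group1, Error_Group1, ...) are inferred from the select step above, so treat it as an assumption about the input rather than the asker's actual data.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: the last character of each column name is the group id
df <- data.frame(Timepoint    = c(1, 2),
                 Group1       = c(10, 11),
                 Error_Group1 = c(0.1, 0.2),
                 Group2       = c(20, 21),
                 Error_Group2 = c(0.3, 0.4))

out <- df %>%
  # everything but the last character names the output column (".value"),
  # the last character becomes the group id
  pivot_longer(-Timepoint,
               names_to = c(".value", "mGroup"),
               names_pattern = "^(.*)(.)$") %>%
  select(Timepoint, Group = mGroup, Measure = Group, Error = Error_Group) %>%
  arrange(Group, Timepoint)
```

This avoids the positional sep argument of separate altogether, at the cost of a small regular expression.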
Sum subset of a variable for tidy data r
A factor can be recoded with forcats::fct_recode, but this isn't necessarily shorter.
library(dplyr)
library(forcats)
df %>%
mutate(food = fct_recode(food, fruit = 'apple', fruit = 'pear')) %>%
group_by(food) %>%
summarise(value = sum(value))
## A tibble: 3 x 2
# food value
# <fct> <dbl>
#1 fruit 7
#2 carbs 10
#3 protein 12
Edit.
I am posting the code from this comment here, since comments are deleted more often than answers. The result is the same as above.
df %>%
group_by(food = fct_recode(food, fruit = 'apple', fruit = 'pear')) %>%
summarise(value = sum(value))
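A closely related option, not part of the answer above, is forcats::fct_collapse, which takes a vector of old levels per new level. The df below is made up so that the sums match the output shown earlier.

```r
library(dplyr)
library(forcats)

# Hypothetical data, chosen to reproduce the sums shown above
df <- data.frame(food  = factor(c("apple", "pear", "carbs", "protein")),
                 value = c(3, 4, 10, 12))

# fct_collapse() merges several old levels into one new level
out <- df %>%
  group_by(food = fct_collapse(food, fruit = c("apple", "pear"))) %>%
  summarise(value = sum(value))
```

This saves repeating fruit = for every level being merged, which helps once more than two or three levels are involved.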
Gather multiple sets of columns
This approach seems pretty natural to me:
df %>%
gather(key, value, -id, -time) %>%
extract(key, c("question", "loop_number"), "(Q.\\..)\\.(.)") %>%
spread(question, value)
First gather() all question columns, use extract() to separate the key into question and loop_number, then spread() question back into the columns.
#> id time loop_number Q3.2 Q3.3
#> 1 1 2009-01-01 1 0.142259203 -0.35842736
#> 2 1 2009-01-01 2 0.061034802 0.79354061
#> 3 1 2009-01-01 3 -0.525686204 -0.67456611
#> 4 2 2009-01-02 1 -1.044461185 -1.19662936
#> 5 2 2009-01-02 2 0.393808163 0.42384717
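For comparison, the same reshaping can likely be done in one step with pivot_longer, reusing the regular expression from extract(). The data frame below is a small made-up stand-in with column names of the form Q3.2.1 (question, then loop number); the real column names may differ.

```r
library(tidyr)

# Hypothetical input: <question>.<loop_number> column names
df <- data.frame(id     = 1:2,
                 time   = as.Date(c("2009-01-01", "2009-01-02")),
                 Q3.2.1 = c(0.1, 0.2), Q3.2.2 = c(0.3, 0.4),
                 Q3.3.1 = c(0.5, 0.6), Q3.3.2 = c(0.7, 0.8))

# First capture group: question (becomes a column via ".value");
# second capture group: loop number
long <- pivot_longer(df, cols = c(-id, -time),
                     names_to = c(".value", "loop_number"),
                     names_pattern = "(Q.\\..)\\.(.)")
```

The result has the columns id, time, loop_number, Q3.2 and Q3.3, the same layout as the gather/extract/spread output above.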