Replace column values with column name using dplyr's transmute_all
If you want to stick with a dplyr solution, you almost had it already:
library(dplyr)
df <- data_frame(a = c(NA, 1, NA, 1, 1), b = c(1, NA, 1, 1, NA))
df %>%
  transmute_all(funs(ifelse(. == 1, deparse(substitute(.)), NA)))
#> # A tibble: 5 x 2
#> a b
#> <chr> <chr>
#> 1 <NA> b
#> 2 a <NA>
#> 3 <NA> b
#> 4 a b
#> 5 a <NA>
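Note that funs() and data_frame() are deprecated in current dplyr and tibble releases. A minimal sketch of the same idea with across() and cur_column() (both available from dplyr 1.0.0), assuming the same toy data as above:

```r
library(dplyr)

# Same toy data as above, built with tibble() instead of data_frame()
df <- tibble(a = c(NA, 1, NA, 1, 1), b = c(1, NA, 1, 1, NA))

# cur_column() returns the name of the column currently being processed
res <- df %>%
  mutate(across(everything(), ~ ifelse(.x == 1, cur_column(), NA_character_)))
```

Here mutate() is enough because every column is overwritten in place; transmute() would give the same columns.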
Replace value by column name for many columns using R and dplyr
One option is to reshape with tidyr::gather and then summarise with dplyr:
library(dplyr)
library(tidyr)
df %>%
  gather(feelings, value, -id) %>%   # Reshape to long format
  filter(value) %>%                  # Keep only rows where value is TRUE
  group_by(id) %>%
  summarise(feelings = paste0(feelings, collapse = ","))
# id feelings
# <chr> <chr>
# 1 a tired
# 2 b excited
# 3 c tired,lonely,excited
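tidyr now recommends pivot_longer() over gather() for new code. The sketch below shows the same pipeline with pivot_longer(); the input data frame is an assumption, reconstructed to be consistent with the output shown above, since the original data is not part of this answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: one logical column per feeling
df <- tibble(
  id      = c("a", "b", "c"),
  tired   = c(TRUE, FALSE, TRUE),
  lonely  = c(FALSE, FALSE, TRUE),
  excited = c(FALSE, TRUE, TRUE)
)

res <- df %>%
  pivot_longer(-id, names_to = "feelings", values_to = "value") %>%
  filter(value) %>%                      # keep only the TRUE cells
  group_by(id) %>%
  summarise(feelings = paste0(feelings, collapse = ","))
```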
Changing cell values in data table with column names (R)?
Here's a tidyverse/purrr option:
map2_df(DT, names(DT), ~ replace(.x, .x==1, .y) %>% replace(. == 0, NA))
# A tibble: 5 x 4
names a b c
<chr> <chr> <chr> <chr>
1 n1 NA b c
2 n2 NA NA NA
3 n3 a NA NA
4 n4 a b c
5 n5 NA NA c
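The map2_df() family has been superseded in recent purrr releases, so here is a hedged variation on the same idea using imap() and as_tibble(). The input is an assumption, reconstructed from the output above, and replace_with_name is a hypothetical helper name.

```r
library(purrr)
library(tibble)

# Hypothetical input, reconstructed from the output shown above
DT <- tibble(
  names = c("n1", "n2", "n3", "n4", "n5"),
  a = c(0, 0, 1, 1, 0),
  b = c(1, 0, 0, 1, 0),
  c = c(1, 0, 0, 1, 1)
)

# Replace 1 with the column name, then turn the remaining 0 values into NA
replace_with_name <- function(col, nm) {
  out <- replace(col, col == 1, nm)
  replace(out, out == 0, NA)
}

# imap() passes each column together with its name
res <- as_tibble(imap(DT, replace_with_name))
```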
dplyr mutate/transmute: drop only the columns used in the formula
If you're looking to combine the two operations, you can use NULL in mutate to specify which columns should be dropped:
df %>% mutate( X=D*E, D=NULL, E=NULL )
Unfortunately, you still have to mention each variable twice, so perhaps it's only marginally more concise.
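A small self-contained illustration of the NULL trick, using a hypothetical three-column tibble (the column G is only there to show that untouched columns survive):

```r
library(dplyr)

# Hypothetical data: D and E feed the computation, G is unrelated
df <- tibble(D = 1:3, E = 4:6, G = 7:9)

# X is computed first; D and E are then dropped by assigning NULL
res <- df %>% mutate(X = D * E, D = NULL, E = NULL)
names(res)
```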
UPDATE: So, I really like this question because it essentially requests a mutator that has some features of both mutate and transmute. Such a mutator will need to parse the provided expression(s) to identify which symbols are being used by the computation and then remove those symbols from the result.
To implement such a mutator, we will need some tools. First, let's define a function that retrieves an expression's abstract syntax tree (AST).
library( tidyverse )
## Recursively constructs the abstract syntax tree (AST) of the provided expression
getAST <- function( ee ) { as.list(ee) %>% map_if(is.call, getAST) }
Here's an example of getAST in action:
z <- quote( a*log10(x)+b ) ## Captures the expression a*log10(x)+b
getAST( z ) %>% str
# List of 3
# $ : symbol +
# $ :List of 3
# ..$ : symbol *
# ..$ : symbol a
# ..$ :List of 2
# .. ..$ : symbol log10
# .. ..$ : symbol x
# $ : symbol b
Retrieving the list of symbols used by an expression requires nothing more than flattening and deparsing this tree.
## Retrieves all symbols (as strings) used in a given expression
getSyms <- function( ee ) { getAST(ee) %>% unlist %>% map_chr(deparse) }
getSyms(z)
# [1] "+" "*" "a" "log10" "x" "b"
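As an aside, base R ships all.vars(), which returns only the variable names of an expression and leaves out function names such as + and log10. A minimal sketch on the same expression:

```r
# Same expression as above
z <- quote(a * log10(x) + b)

# all.vars() lists variables only, in order of first appearance
vars <- all.vars(z)
vars
# [1] "a" "x" "b"
```

Since getSyms is intersected with the column names later on, the extra function names it returns are harmless unless a column happens to share a name with a function used in the expression.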
We are now ready to implement our new mutator that computes new columns (similar to mutate) and removes variables used in the computation (similar to transmute):
## A new mutator that removes all variables used by the computations
transmutate <- function( .data, ... )
{
  ## Capture the provided expressions and retrieve their symbols
  ## (get_expr comes from rlang, which library(tidyverse) does not attach)
  vSyms <- enquos(...) %>% map( ~getSyms(rlang::get_expr(.x)) )

  ## Identify symbols that are in common with the provided dataset
  ## These columns are to be removed
  vToRemove <- intersect( colnames(.data), unlist(vSyms) )

  ## Pass on the expressions to mutate to do the work
  ## Remove the identified columns from the result
  mutate( .data, ... ) %>% select( -one_of(vToRemove) )
}
Let's take the new function out for a spin:
## Expected output should include new columns X, Y
## removed columns vs, drat, wt, mpg, and cyl
## and everything else the same
## (Note that in the classical tidyverse spirit, rownames are not preserved)
transmutate( mtcars, X = ifelse( vs, drat, wt ), Y = mpg*cyl )
# disp hp qsec am gear carb X Y
# 1 160.0 110 16.46 1 4 4 2.620 126.0
# 2 160.0 110 17.02 1 4 4 2.875 126.0
# 3 108.0 93 18.61 1 4 1 3.850 91.2
# 4 258.0 110 19.44 0 3 1 3.080 128.4
# ...
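For readers on current dplyr, the following is a more compact sketch of the same mutator. It is not the version above: transmutate2 is a hypothetical name, symbol extraction is delegated to base R's all.vars(), and any_of() stands in for the superseded one_of().

```r
library(dplyr)

# Hypothetical variant of transmutate(); a sketch, not a drop-in replacement
transmutate2 <- function(.data, ...) {
  # Variable names referenced by each captured expression
  used <- unlist(lapply(rlang::enquos(...), function(q) all.vars(rlang::get_expr(q))))

  # Only names that are actual columns of .data get removed
  to_remove <- intersect(colnames(.data), used)

  mutate(.data, ...) %>% select(-any_of(to_remove))
}

res <- transmutate2(mtcars, X = ifelse(vs, drat, wt), Y = mpg * cyl)
names(res)
```

Because all.vars() ignores function names, a column that happens to be called log10 or ifelse would not be dropped by accident.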
transmute over all columns : removing comma and every characters after comma
You can use mutate_all/transmute_all and remove everything after the comma using sub.
library(dplyr)
network %>% mutate_all(~sub(",.*", "", .))
# from to
#1 UK Benin
#2 Nantes Widha
#3 London France
#4 America America
#5 La Martinique London
Or in base R with lapply.
network[] <- lapply(network, function(x) sub(",.*", "", x))
Data
Read the data as character columns by using stringsAsFactors = FALSE.
network <- data.frame(from, to, stringsAsFactors = FALSE)
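Since the scoped verbs are superseded as of dplyr 1.0.0, the same cleanup can be sketched with across(). The two-row network data frame below is a hypothetical stand-in, because the original from and to vectors are not shown here.

```r
library(dplyr)

# Hypothetical stand-in for the original data
network <- data.frame(
  from = c("Nantes, France", "London"),
  to   = c("Widha, Benin", "France"),
  stringsAsFactors = FALSE
)

# Drop the first comma and everything after it, in every column
res <- network %>% mutate(across(everything(), ~ sub(",.*", "", .x)))
```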
Iterate over column names and separate fields recursively with dplyr
For the sake of completeness, here is also a solution which uses data.table.
There are some differences to the other answers posted so far:
- It is not required to identify the columns to be split beforehand. Instead, columns without "->" are dropped from the result on the fly.
- The regular expression used for splitting includes surrounding white space (if any): " *-> *". This avoids having to call trimws() on the resulting pieces afterwards or to remove white space beforehand.
library(data.table)
library(magrittr) # piping used to improve readability
setDT(df)
lapply(names(df), function(x) {
  mDT <- df[, tstrsplit(get(x), " *-> *")]
  if (ncol(mDT) == 2L) setnames(mDT, paste0(x, c("_Old", "_New")))
}) %>% as.data.table()
v1_Old v1_New v3_Old v3_New
1: Silva Mark James Jacy
2: Brandon Livo NA Jane
3: Mango Apple apple Orange
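To see what the splitting pattern does on its own, here is a minimal sketch with a hypothetical character vector in which the arrow appears both with and without surrounding spaces:

```r
library(data.table)

# Hypothetical input: spacing around "->" is inconsistent on purpose
x <- c("Silva -> Mark", "Mango->Apple")

# tstrsplit() splits each string and transposes the result,
# so parts[[1]] holds all left-hand pieces and parts[[2]] all right-hand pieces
parts <- tstrsplit(x, " *-> *")
```

Both pieces come back already trimmed, which is why no separate trimws() step is needed.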
Transmuting multiple columns based on columns outside .vars
There are a few issues you need to consider here:
- seq() is not vectorised over from and to, so it requires that StartDate and EndDate are length 1. You can achieve this by using rowwise().
- The replacement columns will have type date, so you will not be able to simply include zeros, as these will be coerced to 1970-01-01 (try lubridate::as_date(0)). The best alternative is probably to use NA here.
- transmute() keeps only the columns it creates, i.e. it will drop StartDate and EndDate. If you want to keep them you should use mutate() instead.
- The main issue in your example is that your .funs argument is not a function. From the documentation:
  A function fun, a quosure style lambda ~ fun(.) or a list of either form.
- Scoped verbs, i.e. functions ending with _at, _if etc., are superseded in favour of across() as of {dplyr} 1.0.0.
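The first two points can be checked in isolation. A short sketch, with hypothetical start and end vectors of length 2:

```r
library(lubridate)

# A numeric zero is read as zero days since the epoch
zero_date <- as_date(0)
zero_date
# [1] "1970-01-01"

# Hypothetical length-2 inputs: seq() refuses them, hence the need for rowwise()
start <- as.Date(c("2020-01-01", "2021-01-01"))
end   <- as.Date(c("2020-01-05", "2021-01-05"))
res <- tryCatch(seq(start, end, by = "day"), error = function(e) e)
inherits(res, "error")
```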
Here is an example that takes the above into account:
df %>%
  rowwise() %>%
  dplyr::mutate(
    across(1:5, ~lubridate::as_date(ifelse(
      . == 1,
      sample(seq(StartDate, EndDate, by = "day"), 1),
      NA
    )))
  ) %>%
  ungroup()
#> # A tibble: 5 x 7
#> A B C D E StartDate EndDate
#> <date> <date> <date> <date> <date> <date> <date>
#> 1 2019-08-31 NA NA 2019-09-30 2019-03-15 2018-03-21 2020-08-02
#> 2 2002-01-10 2001-11-25 NA NA 2003-07-17 1999-02-06 2004-09-15
#> 3 NA NA 2006-06-16 NA 2008-09-03 2004-01-19 2009-07-27
#> 4 NA NA 2015-05-21 NA NA 2000-03-18 2017-04-21
#> 5 1999-09-19 1999-08-30 NA 1999-11-04 NA 1998-05-20 2001-01-24
I give three arguments, the input df, the column I want to clean,the new column I want to be added with cleansed names. Where am I going wrong?
What you want to do is make sure that the output of each function you use is a vector (or a one-dimensional list), so that it can be added as a new column to the data frame. You can check the class of an object with the class() function from base R.
The mutate function by itself should do what you want; it returns the same data frame with the new column added:
library(dplyr)
clean_name <- function(df, col_name, new_col_name) {
  # first_cleaning_to_colname: placeholder for the first change you want to make
  # to the col_name column. This should be a vector.
  # second_cleaning_to_colname: placeholder for the change you make after the
  # first one. This should be a vector too.
  # "{col_name}" := ... names the column after the string stored in col_name;
  # a plain col_name = ... would create a column literally called "col_name".
  first_change <- mutate(df, "{col_name}" := first_cleaning_to_colname)
  second_change <- mutate(first_change, "{new_col_name}" := second_cleaning_to_colname)
  return(second_change)
}
You can make both changes at the same time, but I thought this way is easier to read.
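For a version that actually runs, here is a minimal sketch. The cleaning step (trimming white space and lower-casing) is purely an assumption for illustration, as is the example data; it relies on the glue-style "{name}" := syntax and the .data pronoun available in dplyr 1.0.0 and later.

```r
library(dplyr)

# Sketch: col_name and new_col_name are passed as strings
clean_name <- function(df, col_name, new_col_name) {
  df %>%
    # Hypothetical cleaning: trim white space, then lower-case
    mutate("{new_col_name}" := tolower(trimws(.data[[col_name]])))
}

# Hypothetical example data
people <- data.frame(name = c("  Alice ", "BOB"), stringsAsFactors = FALSE)
res <- clean_name(people, "name", "name_clean")
```

The original column is left untouched and the cleaned values land in the column named by new_col_name.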