Replace Column Values with Column Name Using Dplyr's Transmute_All

Replace column values with column name using dplyr's transmute_all

If you want to stick with a dplyr solution, you almost had it already:

library(dplyr)

df <- tibble(a = c(NA, 1, NA, 1, 1), b = c(1, NA, 1, 1, NA))

df %>%
  transmute_all(funs(ifelse(. == 1, deparse(substitute(.)), NA)))

#> # A tibble: 5 x 2
#> a b
#> <chr> <chr>
#> 1 <NA> b
#> 2 a <NA>
#> 3 <NA> b
#> 4 a b
#> 5 a <NA>
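
Because funs() and the scoped verbs are deprecated or superseded in recent dplyr releases, the same idea can also be sketched with across() and cur_column(), both available from dplyr 1.0.0 onwards. This is an alternative sketch rather than the original answer's code; it reuses the sample data from above and stores the result in a variable named res, which is only an illustrative name.

```r
library(dplyr)

# Same sample data as above
df <- tibble(a = c(NA, 1, NA, 1, 1), b = c(1, NA, 1, 1, NA))

# cur_column() returns the name of the column that across() is currently processing
res <- df %>%
  mutate(across(everything(), ~ ifelse(.x == 1, cur_column(), NA_character_)))

res
```

Values that are NA stay NA, because NA == 1 evaluates to NA and ifelse() passes that through.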

Replace value by column name for many columns using R and dplyr

One option is to use tidyr::gather() and then summarise with dplyr:

library(dplyr)
library(tidyr)
df %>%
  gather(feelings, value, -id) %>%   # change to long format
  filter(value) %>%                  # keep rows where value is TRUE
  group_by(id) %>%
  summarise(feelings = paste0(feelings, collapse = ","))

# id feelings
# <chr> <chr>
# 1 a tired
# 2 b excited
# 3 c tired,lonely,excited
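
Since gather() is superseded in current tidyr, here is a sketch of the same pipeline with pivot_longer(). The input data is not shown in this answer, so the df below is a hypothetical reconstruction (one logical column per feeling) chosen to match the output above.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: an id column plus one logical column per feeling
df <- tibble(
  id      = c("a", "b", "c"),
  tired   = c(TRUE, FALSE, TRUE),
  lonely  = c(FALSE, FALSE, TRUE),
  excited = c(FALSE, TRUE, TRUE)
)

res <- df %>%
  pivot_longer(-id, names_to = "feelings", values_to = "value") %>%  # long format
  filter(value) %>%                                                  # keep TRUE rows
  group_by(id) %>%
  summarise(feelings = paste0(feelings, collapse = ","))

res
```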

Changing cell values in data table with column names (R)?

Here's a tidyverse/purrr option:

library(tidyverse)

map2_df(DT, names(DT), ~ replace(.x, .x == 1, .y) %>% replace(. == 0, NA))

# A tibble: 5 x 4
#   names a     b     c
#   <chr> <chr> <chr> <chr>
# 1 n1    NA    b     c
# 2 n2    NA    NA    NA
# 3 n3    a     NA    NA
# 4 n4    a     b     c
# 5 n5    NA    NA    c
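
DT itself is not shown here, so the following self-contained sketch rebuilds a hypothetical input from the output above and performs the same replacement with purrr::imap(), which hands each column (.x) and its name (.y) to the function. It is one possible variant, not the answer's own code.

```r
library(purrr)
library(tibble)

# Hypothetical input reconstructed from the output above
DT <- tibble(
  names = paste0("n", 1:5),
  a = c(0, 0, 1, 1, 0),
  b = c(1, 0, 0, 1, 0),
  c = c(1, 0, 0, 1, 1)
)

# For each column: 1 becomes the column name, 0 becomes NA, anything else is kept
res <- as_tibble(imap(DT, function(col, nm) {
  out <- as.character(col)
  out[col == 1] <- nm
  out[col == 0] <- NA
  out
}))

res
```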

dplyr mutate/transmute: drop only the columns used in the formula

If you're looking to combine the two operations, you can use NULL in mutate to specify which columns should be dropped:

df %>% mutate( X=D*E, D=NULL, E=NULL )

Unfortunately, you still have to mention each variable twice, so perhaps it's only marginally more concise.

UPDATE: So, I really like this question because it essentially requests a mutator that has some features of both mutate and transmute. Such a mutator will need to parse the provided expression(s) to identify which symbols are being used by the computation and then remove those symbols from the result.

To implement such a mutator, we will need some tools. First, let's define a function that retrieves an expression's abstract syntax tree (AST).

library( tidyverse )

## Recursively constructs the abstract syntax tree (AST) of the provided expression
getAST <- function( ee ) { as.list(ee) %>% map_if(is.call, getAST) }

Here's an example of getAST in action:

z <- quote( a*log10(x)+b )   ## Captures the expression a*log10(x)+b
getAST( z ) %>% str
# List of 3
# $ : symbol +
# $ :List of 3
# ..$ : symbol *
# ..$ : symbol a
# ..$ :List of 2
# .. ..$ : symbol log10
# .. ..$ : symbol x
# $ : symbol b

Retrieving the list of symbols used by an expression requires nothing more than flattening and deparsing this tree.

## Retrieves all symbols (as strings) used in a given expression
getSyms <- function( ee ) { getAST(ee) %>% unlist %>% map_chr(deparse) }
getSyms(z)
# [1] "+" "*" "a" "log10" "x" "b"
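
As a side note, base R already ships helpers that walk an expression in a similar way: all.names() returns every symbol including function names, while all.vars() returns only the variables. The sketch below applies both to the same expression z; it is offered as a cross-check, not as part of the original answer.

```r
z <- quote(a * log10(x) + b)

# All symbols, including the functions +, * and log10
syms <- all.names(z)
syms
# [1] "+"     "*"     "a"     "log10" "x"     "b"

# Variables only
vars <- all.vars(z)
vars
# [1] "a" "x" "b"
```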

We are now ready to implement our new mutator that computes new columns (similar to mutate) and removes variables used in the computation (similar to transmute):

## A new mutator that removes all variables used by the computations
transmutate <- function( .data, ... )
{
  ## Capture the provided expressions and retrieve their symbols
  ## (get_expr() is an rlang function; the prefix avoids relying on rlang being attached)
  vSyms <- enquos(...) %>% map( ~getSyms(rlang::get_expr(.x)) )

  ## Identify symbols that are in common with the provided dataset
  ## These columns are to be removed
  vToRemove <- intersect( colnames(.data), unlist(vSyms) )

  ## Pass on the expressions to mutate to do the work
  ## Remove the identified columns from the result
  mutate( .data, ... ) %>% select( -one_of(vToRemove) )
}

Let's take the new function out for a spin:

## Expected output should include new columns X, Y
## removed columns vs, drat, wt, mpg, and cyl
## and everything else the same
## (Note that in the classical tidyverse spirit, rownames are not preserved)
transmutate( mtcars, X = ifelse( vs, drat, wt ), Y = mpg*cyl )
# disp hp qsec am gear carb X Y
# 1 160.0 110 16.46 1 4 4 2.620 126.0
# 2 160.0 110 17.02 1 4 4 2.875 126.0
# 3 108.0 93 18.61 1 4 1 3.850 91.2
# 4 258.0 110 19.44 0 3 1 3.080 128.4
# ...
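
For readers on dplyr 1.0.0 or later, mutate() has a .keep argument, and .keep = "unused" keeps only the columns that were not used to compute the new ones. The sketch below repeats the mtcars example with that argument; the variable name res is just for illustration.

```r
library(dplyr)

# Columns referenced in the expressions (vs, drat, wt, mpg, cyl) are dropped
res <- mutate(mtcars, X = ifelse(vs, drat, wt), Y = mpg * cyl, .keep = "unused")

head(res, 4)
```

This covers the same use case without parsing the expressions by hand.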

transmute over all columns: removing the comma and every character after it

You can use mutate_all/transmute_all and remove everything from the first comma onwards using sub.

library(dplyr)
network %>% mutate_all(~sub(",.*", "", .))

# from to
#1 UK Benin
#2 Nantes Widha
#3 London France
#4 America America
#5 La Martinique London

Or in base R with lapply.

network[] <- lapply(network, function(x) sub(",.*", "", x))

data

Reading data as characters by using stringsAsFactors = FALSE.

network <- data.frame(from, to, stringsAsFactors = FALSE)
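
The from and to vectors are not shown here, so the following sketch uses made-up values (purely hypothetical) to show the base R approach end to end.

```r
# Hypothetical data: some values carry a trailing ", <region>" part
network <- data.frame(
  from = c("UK, Europe", "Nantes, France", "London"),
  to   = c("Benin, Africa", "Widha", "France, Europe"),
  stringsAsFactors = FALSE
)

# sub() replaces the first match only: the first comma and everything after it
network[] <- lapply(network, function(x) sub(",.*", "", x))

network
```

Assigning into network[] keeps the object a data frame instead of turning it into a plain list.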

Iterate over column names and separate fields recursively with dplyr

For the sake of completeness, here is also a solution which uses data.table.

There are some differences to the other answers posted so far:

  • It is not required to identify the columns to be split beforehand. Instead, columns without "->" are dropped from the result on the fly.
  • The regular expression used for splitting, " *-> *", includes surrounding white space (if any). This avoids calling trimws() on the resulting pieces afterwards or removing white space beforehand.

library(data.table)
library(magrittr) # piping used to improve readability
setDT(df)
lapply(names(df), function(x) {
  mDT <- df[, tstrsplit(get(x), " *-> *")]
  if (ncol(mDT) == 2L) setnames(mDT, paste0(x, c("_Old", "_New")))
}) %>% as.data.table()
#     v1_Old v1_New v3_Old v3_New
# 1:   Silva   Mark  James   Jacy
# 2: Brandon   Livo     NA   Jane
# 3:   Mango  Apple  apple Orange

Transmuting multiple columns based on columns outside .vars

There are a few issues you need to consider here:

  • seq() is not vectorised over from and to, so it will require that StartDate and EndDate are length 1. You can achieve this by using rowwise()
  • The replacement columns will have type Date, so you cannot simply include zeros, as these will be coerced to 1970-01-01 (try lubridate::as_date(0)). The best alternative is probably to use NA here.
  • transmute() keeps only the columns it creates, so it will drop StartDate and EndDate. If you want to keep them, use mutate() instead.
  • The main issue in your example is that your .funs argument is not a function. From the documentation:

A function fun, a quosure style lambda ~ fun(.) or a list of either form.

  • Scoped verbs, i.e. functions ending with _at, _if etc are superseded in favour of across() as of {dplyr} 1.0.0.

Here is an example that takes the above into account:

df %>%
  rowwise() %>%
  dplyr::mutate(
    across(1:5, ~lubridate::as_date(ifelse(
      . == 1,
      sample(seq(StartDate, EndDate, by = "day"), 1),
      NA
    )))
  ) %>%
  ungroup()
#> # A tibble: 5 x 7
#> A B C D E StartDate EndDate
#> <date> <date> <date> <date> <date> <date> <date>
#> 1 2019-08-31 NA NA 2019-09-30 2019-03-15 2018-03-21 2020-08-02
#> 2 2002-01-10 2001-11-25 NA NA 2003-07-17 1999-02-06 2004-09-15
#> 3 NA NA 2006-06-16 NA 2008-09-03 2004-01-19 2009-07-27
#> 4 NA NA 2015-05-21 NA NA 2000-03-18 2017-04-21
#> 5 1999-09-19 1999-08-30 NA 1999-11-04 NA 1998-05-20 2001-01-24
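
The first two points in the list above can be checked in base R without any packages. This is a small sketch with made-up dates; the names epoch, starts and res are illustrative only.

```r
# A zero interpreted as a date is the Unix epoch, not "no date"
epoch <- as.Date(0, origin = "1970-01-01")
epoch
# [1] "1970-01-01"

# seq() on dates rejects a `from` of length > 1, hence the need for rowwise()
starts <- as.Date(c("2020-01-01", "2020-02-01"))
res <- try(seq(starts, as.Date("2020-03-01"), by = "day"), silent = TRUE)
inherits(res, "try-error")
# [1] TRUE
```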

I give three arguments: the input df, the column I want to clean, and the new column I want to be added with cleansed names. Where am I going wrong?

What you want to do is make sure that the output of the functions you're using is either a vector or a one-dimensional list, so that it can be added as a new column to the data frame. You can check the class of an object with the class() function from base R.

The mutate function by itself should do what you want: it returns the same data frame with the new column added.

library(dplyr)

clean_name <- function(df, col_name, new_col_name) {

  # first_cleaning_to_colname = the first change you want to make to the col_name column. This should be a vector.
  # second_cleaning_to_colname = the change you make to the col_name column after the first one. This should be a vector too.

  # col_name and new_col_name hold column names as strings, so `!!name :=` is
  # needed; a plain `col_name = ...` would create a column literally called "col_name"
  first_change <- mutate(df, !!col_name := first_cleaning_to_colname)

  second_change <- mutate(first_change, !!new_col_name := second_cleaning_to_colname)

  return(second_change)
}

You can make both of these changes at the same time, but I thought it was easier to read this way.
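
As a concrete, runnable sketch of the single-step version: the cleaning steps below (trimming white space, then lower-casing) are hypothetical stand-ins, since the actual cleaning logic is not given here, and the people data frame is made up for the example. Column names are passed as strings and resolved with the .data pronoun and `!!name :=`.

```r
library(dplyr)

# Hypothetical cleaning: trim white space, then convert to lower case
clean_name <- function(df, col_name, new_col_name) {
  df %>%
    mutate(!!new_col_name := tolower(trimws(.data[[col_name]])))
}

# Made-up example data
people <- tibble(name = c("  Alice ", "BOB"))
res <- clean_name(people, "name", "name_clean")

res
```

The original column is left untouched and the cleaned values land in the new column.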


