Replace Value with the Name of Its Respective Column

Replace value with the name of its respective column

The code below replaces every "true" value (stored as a character) with the name of its respective column.

# Replace every "true" value with its respective column name
w <- which(df == "true", arr.ind = TRUE)
df[w] <- names(df)[w[, "col"]]
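
For example, with a small hypothetical data frame of character values (the contents here are just an illustration):

df <- data.frame(x = c("true", "false"),
                 y = c("false", "true"),
                 stringsAsFactors = FALSE)  # keep strings as characters

w <- which(df == "true", arr.ind = TRUE)  # row/column positions of "true"
df[w] <- names(df)[w[, "col"]]            # overwrite with the matching column name
df
#       x     y
# 1     x false
# 2 false     y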

Replace column values with column name using dplyr's transmute_all

If you want to stick with a dplyr solution, you almost already had it:

library(dplyr)

df <- data_frame(a = c(NA, 1, NA, 1, 1), b = c(1, NA, 1, 1, NA))

df %>%
  transmute_all(funs(ifelse(. == 1, deparse(substitute(.)), NA)))

#> # A tibble: 5 x 2
#>   a     b
#>   <chr> <chr>
#> 1 <NA>  b
#> 2 a     <NA>
#> 3 <NA>  b
#> 4 a     b
#> 5 a     <NA>
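
Note that funs() is deprecated and the _all() verbs are superseded in current dplyr. Assuming dplyr >= 1.0, a roughly equivalent sketch uses across() together with cur_column():

df %>%
  mutate(across(everything(), ~ ifelse(.x == 1, cur_column(), NA)))

# cur_column() returns the name of the column currently being processed,
# so every 1 is replaced by its own column's name.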

Replace value by column name for many columns using R and dplyr

An option is to use tidyr::gather and then summarise using dplyr:

library(dplyr)
library(tidyr)
df %>%
  gather(feelings, value, -id) %>%   # change to long format
  filter(value) %>%                  # keep rows where value is TRUE
  group_by(id) %>%
  summarise(feelings = paste0(feelings, collapse = ","))

#   id    feelings
#   <chr> <chr>
# 1 a     tired
# 2 b     excited
# 3 c     tired,lonely,excited
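
For reference, the df assumed here is a hypothetical data frame with an id column plus logical feeling columns, along the lines of:

df <- data.frame(
  id      = c("a", "b", "c"),
  tired   = c(TRUE, FALSE, TRUE),
  lonely  = c(FALSE, FALSE, TRUE),
  excited = c(FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)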

Replace specific values in pandas dataframe with the corresponding column name, based on a condition

IIUC, try adding the parentheses and ' and ', then mask out the 'yes' values and radd the column names:

new_df = (' and (' + df + ')').mask(df.eq('yes'), '').radd(df.columns)

new_df:

                      column1  column2
0                     column1      NaN
1  column1 and (some_string)      NaN
2                         NaN  column2

Breakdown of steps:

new_df = ' and (' + df + ')'
              column1     column2
0           and (yes)         NaN
1   and (some_string)         NaN
2                 NaN   and (yes)

mask:

new_df = new_df.mask(df.eq('yes'), '')
              column1 column2
0                         NaN
1   and (some_string)     NaN
2                 NaN

radd:

new_df = new_df.radd(df.columns)
                      column1  column2
0                     column1      NaN
1  column1 and (some_string)      NaN
2                         NaN  column2

How to replace a value in a pandas dataframe with column name based on a condition?

One way could be to use replace and pass in a Series mapping column labels to values (those same labels in this case):

>>> dfz.loc[:, 'A':'D'].replace(1, pd.Series(dfz.columns, dfz.columns))
   A  B  C  D
0  A  B  C  D
1  0  0  0  0
2  0  0  0  0
3  A  B  C  D
4  0  0  3  0
5  0  B  C  0

To make the change permanent, you'd assign the returned DataFrame back to dfz.loc[:, 'A':'D'].

Solutions aside, it's useful to keep in mind that you may lose a lot of performance benefits when you mix numeric and string types in columns, as pandas is forced to use the generic 'object' dtype to hold the values.

Changing cell values in data table with column names (R)?

Here's a tidyverse/purrr option:

map2_df(DT, names(DT), ~  replace(.x, .x==1, .y) %>% replace(. == 0, NA))

# A tibble: 5 x 4
  names a     b     c
  <chr> <chr> <chr> <chr>
1 n1    NA    b     c
2 n2    NA    NA    NA
3 n3    a     NA    NA
4 n4    a     b     c
5 n5    NA    NA    c

Replacing column values based on a corresponding column r

Maybe put your data in long form:

library(data.table)
setDT(df.wide)

dt.long = melt(df.wide, meas=patterns(IM = "^IM", LV = "^LV"))
dt.long[, variable := c("A","B","C")[variable]]

    title variable  IM  LV
 1:     A        A 0.5 0.7
 2:     B        A 0.1 0.0
 3:     C        A 4.6 2.5
 4:     D        A 5.6 5.0
 5:     A        B 0.2 1.0
 6:     B        B 0.4 2.0
 7:     C        B 2.6 4.5
 8:     D        B 2.2 5.0
 9:     A        C 2.0 3.0
10:     B        C 1.0 2.0
11:     C        C 3.0 5.0
12:     D        C 4.0 1.0

From here, it is easy to make the edit:

dt.long[IM < 2.5, LV := 0]

If you want to use tidyr: as far as I know, gather does not support creating two value columns when converting to long form. The next generation of the function, pivot_longer, might.
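
A rough sketch with pivot_longer, assuming the wide column names follow the IM.A / LV.A pattern shown further down:

library(tidyr)

dt.long2 <- pivot_longer(df.wide, -title,
                         names_to  = c(".value", "variable"),
                         names_sep = "\\.")

# ".value" sends the IM/LV prefix back into separate IM and LV columns,
# while the A/B/C suffix lands in the 'variable' column.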


I would suggest continuing to work with the data in long format as long as possible to avoid further fiddling with variable names, but if you need to get back to wide format, there's...

res = dcast(dt.long, title ~ variable, value.var=c("IM", "LV"), sep=".")

   title IM.A IM.B IM.C LV.A LV.B LV.C
1:     A  0.5  0.2    2  0.0  0.0    0
2:     B  0.1  0.4    1  0.0  0.0    0
3:     C  4.6  2.6    3  2.5  4.5    5
4:     D  5.6  2.2    4  5.0  0.0    1

Further steps are needed if you want the same column order:

setcolorder(res, names(df.wide))

   title IM.A LV.A IM.B LV.B IM.C LV.C
1:     A  0.5  0.0  0.2  0.0    2    0
2:     B  0.1  0.0  0.4  0.0    1    0
3:     C  4.6  2.5  2.6  4.5    3    5
4:     D  5.6  5.0  2.2  0.0    4    1

Replace value with the average of its column with Pandas

The first thing to recognize is that the columns containing 'x' are not integer dtypes; they are object dtypes.

df = pd.read_csv('file.csv')

df

   Col1 Col2
0     1   22
1     2   44
2     3    x
3     4   88
4     5  110
5     6  132
6     7    x
7     8  176
8     9  198
9    10    x

df.dtypes

Col1     int64
Col2    object
dtype: object

In order to get the mean of Col2, it needs to be converted to a numeric value.

df['Col2'] = pd.to_numeric(df['Col2'], errors='coerce').astype('Int64')

df.dtypes
Col1    int64
Col2    Int64
dtype: object

The df now looks like this:

df 

   Col1  Col2
0     1    22
1     2    44
2     3  <NA>
3     4    88
4     5   110
5     6   132
6     7  <NA>
7     8   176
8     9   198
9    10  <NA>

Now we can use fillna() with df['Col2'].mean():

df['Col2'] = df['Col2'].fillna(df['Col2'].mean())

df
   Col1  Col2
0     1    22
1     2    44
2     3   110
3     4    88
4     5   110
5     6   132
6     7   110
7     8   176
8     9   198
9    10   110

Replace column values according to corresponding values of other column in Pandas

Use mask to replace all non-missing values, together with pop to extract the Data column:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [4, 5] + [np.nan] * 4,
    'B': [np.nan, np.nan, 9, 4, np.nan, np.nan],
    'C': [np.nan] * 4 + [7, 0],
    'Data': list('aaabbb')
})

print(df)
     A    B    C Data
0  4.0  NaN  NaN    a
1  5.0  NaN  NaN    a
2  NaN  9.0  NaN    a
3  NaN  4.0  NaN    b
4  NaN  NaN  7.0    b
5  NaN  NaN  0.0    b

df = df.mask(df.notnull(), df.pop('Data'), axis=0)
print(df)
     A    B    C
0    a  NaN  NaN
1    a  NaN  NaN
2  NaN    a  NaN
3  NaN    b  NaN
4  NaN  NaN    b
5  NaN  NaN    b

