Update Subset of Values in a Dataframe Column

Efficient way to update column value for subset of rows on Pandas DataFrame?

This may be what you require:

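# Multiply 'value' by 1000 in rows whose 'name' is exactly 4 characters long
df.loc[df.name.str.len() == 4, 'value'] *= 1000

# Or, alternatively, prefix those values with 'short_' (turning them into strings)
df.loc[df.name.str.len() == 4, 'value'] = 'short_' + df['value'].astype(str)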
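A minimal runnable sketch of both lines above, assuming a small hypothetical frame with name and value columns (the question's actual data is not shown). The string variant first casts value to object; that cast is an added assumption, because recent pandas versions warn when strings are written into an integer column.

```python
import pandas as pd

# Hypothetical sample data; the question's real frame is not shown
df = pd.DataFrame({'name': ['abcd', 'ab', 'wxyz', 'abcde'],
                   'value': [1, 2, 3, 4]})

mask = df['name'].str.len() == 4  # rows whose name has exactly 4 characters

# Variant 1: scale the matching values in place
scaled = df.copy()
scaled.loc[mask, 'value'] *= 1000

# Variant 2: prefix the matching values; cast to object first so the
# column can hold strings without a dtype-mismatch warning
labelled = df.copy()
labelled['value'] = labelled['value'].astype(object)
labelled.loc[mask, 'value'] = 'short_' + labelled.loc[mask, 'value'].astype(str)

print(scaled)
print(labelled)
```

Rows whose name is not 4 characters long keep their original value in both variants.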

pandas python Update subset of column A based on subset of one or more other columns

You can do this more easily with pandas .loc.

Initialize dataframe:

df = pd.DataFrame({'group':['e','e','e','h','h','h'],
'feature':['fail', 'exit', 'job', 'exit', 'fail', 'job'],
'cats':[1, 1, 1, 5, 2, 2],
'jobs':[1, 1, 1, 64, 64, 64],
'rank':[-1, -1, -1, -1, -1, -1],
'topvalue':[100, 0, 4, 37, 0, 3.9],
'freq':[1, 1, 1, 58, 63, 61]
})

We want to rank the job feature, so on the left-hand side of the assignment we use .loc to isolate the rank cells of those rows, and on the right-hand side we use .loc to select the jobs column for the same rows and call .rank() on it.

Rank the job feature by jobs value:

df.loc[df.feature == 'job', 'rank'] = df.loc[df.feature == 'job', 'jobs'].rank(ascending=False)

Rank failure feature by frequency where top value is not 0:

For this one you do rank the rows whose top value is 0, which seems to contradict what you said, so we'll do it two ways.

Here we filter out the rows with topvalue == 0 first and rank everything else, so those rows keep a rank of -1:

df.loc[(df.feature == 'fail') & (df.topvalue != 0), 'rank'] = (
df.loc[(df.feature == 'fail') & (df.topvalue != 0), 'freq']).rank(ascending=True)

Here we don't filter out the 0s, so every fail row is ranked:

df.loc[df.feature == 'fail', 'rank'] = (
df.loc[df.feature == 'fail', 'freq']).rank(ascending=True)
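Putting the pieces together, here is a sketch that runs both fail variants on copies of the same frame so they can be compared side by side. One assumption differs from the answer: rank starts as -1.0 (float) rather than -1, so that fractional ranks from ties could be stored without a dtype change.

```python
import pandas as pd

df = pd.DataFrame({'group': ['e', 'e', 'e', 'h', 'h', 'h'],
                   'feature': ['fail', 'exit', 'job', 'exit', 'fail', 'job'],
                   'cats': [1, 1, 1, 5, 2, 2],
                   'jobs': [1, 1, 1, 64, 64, 64],
                   'rank': [-1.0] * 6,  # float (assumption) so tie ranks like 1.5 fit
                   'topvalue': [100, 0, 4, 37, 0, 3.9],
                   'freq': [1, 1, 1, 58, 63, 61]})

# Rank the 'job' rows by jobs, largest first
job = df.feature == 'job'
df.loc[job, 'rank'] = df.loc[job, 'jobs'].rank(ascending=False)

# Variant A: rank 'fail' rows by freq, skipping topvalue == 0 (those keep -1)
fail_nonzero = (df.feature == 'fail') & (df.topvalue != 0)
ranked_a = df.copy()
ranked_a.loc[fail_nonzero, 'rank'] = ranked_a.loc[fail_nonzero, 'freq'].rank(ascending=True)

# Variant B: rank every 'fail' row by freq
fail = df.feature == 'fail'
ranked_b = df.copy()
ranked_b.loc[fail, 'rank'] = ranked_b.loc[fail, 'freq'].rank(ascending=True)

print(ranked_a[['feature', 'topvalue', 'freq', 'rank']])
print(ranked_b[['feature', 'topvalue', 'freq', 'rank']])
```

The only difference between the two results is the fail row with topvalue 0: it stays at -1 in variant A and is ranked in variant B.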

How to update a subset of Pandas DataFrame rows with new (different) values?

If I understand your problem correctly, you want to change the values in column C based on the values in column A, where the value assigned to C is looked up in a dictionary, while leaving untouched the rows whose value in A is not a key of that dictionary.

The dictionary m maps values from column A to the target values:

df = pandas.DataFrame({'A': [1,2,3,4,5,6,7,8,9], 'C': [0,0,0,0,0,0,0,0,0]})
m = {1:1,3:1,6:1,8:1}

Next, build a boolean mask select for the rows whose value in A is a key of the dictionary. Then map the selected values of column A through m and assign the result to the same rows of column C. All other values stay as they were.

select = df['A'].isin(m.keys())
df.loc[select, 'C'] = df.loc[select, 'A'].map(m)
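A self-contained sketch of the masked update above, plus a one-expression alternative using map and fillna (an alternative technique, not part of the original answer). The mask uses list(m) instead of m.keys(); both list the dictionary's keys.

```python
import pandas as pd

original = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9], 'C': [0] * 9})
m = {1: 1, 3: 1, 6: 1, 8: 1}

# Masked update, as in the answer: rows whose A is not a key of m are untouched
df = original.copy()
select = df['A'].isin(list(m))
df.loc[select, 'C'] = df.loc[select, 'A'].map(m)

# One-expression alternative: map() gives NaN for missing keys, so fall back
# to the existing C there and cast back to int
alt = original['A'].map(m).fillna(original['C']).astype(int)

print(df['C'].tolist())
```

The masked version never produces NaN, so the column keeps its integer dtype without a cast; the map/fillna version needs the final astype(int).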

Modifying a subset of rows in a pandas dataframe

Use .loc for label-based indexing:

df.loc[df.A==0, 'B'] = np.nan

The df.A==0 expression creates a boolean Series that selects the rows, and 'B' selects the column. You can also use this to transform a subset of a column, e.g.:

df.loc[df.A==0, 'B'] = df.loc[df.A==0, 'B'] / 2

I don't know enough about pandas internals to say exactly why this works, but the underlying issue is that indexing into a DataFrame sometimes returns a copy of the result and sometimes a view on the original object. According to the pandas documentation, which one you get depends on the underlying NumPy behavior. I've found that selecting the rows and the column in a single .loc call (rather than chained indexing such as df[one][two]) is the reliable way to set values: a chained assignment may modify a temporary copy and leave the original DataFrame unchanged.
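A small sketch of the single-call pattern, using a hypothetical A/B frame. The last lines show the copy side of the issue: boolean indexing without .loc builds a new DataFrame, so editing that result (here through an explicit .copy()) leaves df unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'A': [0, 1, 0, 2], 'B': [10.0, 20.0, 30.0, 40.0]})

# Row mask and column label in one .loc call, on both sides of the assignment
df.loc[df.A == 0, 'B'] = df.loc[df.A == 0, 'B'] / 2
df.loc[df.A == 2, 'B'] = np.nan

# df[df.A == 1] builds a new DataFrame; changing it does not touch df
subset = df[df.A == 1].copy()
subset['B'] = 0.0

print(df)
```

Calling .copy() explicitly makes the intent clear and avoids pandas' SettingWithCopyWarning when the subset is modified.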

Update subset of values in a dataframe column

You could use ifelse here. Assuming the data frame is named df1:

df1$x <- ifelse(df1$y %in% c("a", "b"), df1$x - 1, df1$x)
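For the pandas answers on this page, a rough equivalent of the R ifelse call is numpy.where; a .loc update on the matching rows gives the same result. The data below is hypothetical, with the x and y column names borrowed from the R snippet.

```python
import numpy as np
import pandas as pd

# Hypothetical data; x and y mirror the column names in the R snippet
df1 = pd.DataFrame({'x': [10, 20, 30, 40], 'y': ['a', 'b', 'c', 'a']})
in_ab = df1['y'].isin(['a', 'b'])  # counterpart of df1$y %in% c("a", "b")

# ifelse-style: pick one of two branches for every row
via_where = df1.copy()
via_where['x'] = np.where(in_ab, via_where['x'] - 1, via_where['x'])

# .loc-style: only the matching rows are written
via_loc = df1.copy()
via_loc.loc[in_ab, 'x'] -= 1

print(via_where['x'].tolist())
```

numpy.where rebuilds the whole column, like ifelse; the .loc version only touches the selected rows.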

How to update column value of a data frame from another data frame matching 2 columns?

Here's a way to do it:

df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'])
df1.loc[df1.Result.notna(), 'Score'] = df1.Result
df1 = df1.drop(columns='Result')

Explanation:

  • modify df2 so that Team ID and Group form its index and Result is its only column
  • use join to bring the new scores from df2 into a Result column in df1
  • use loc to update Score values for rows where Result is not null (i.e., rows for which an updated Score is available)
  • drop the Result column.

Full test code:

import pandas as pd
import numpy as np
df1 = pd.DataFrame({
'DEP ID':['001','001','002','002'],
'Team ID':['002','004','002','007'],
'Group':['A','A','A','A'],
'Score':[50,70,50,90]})
df2 = pd.DataFrame({
'DEP ID':['001','001','001'],
'Team ID':['002','003','004'],
'Group':['A','A','A'],
'Result':[80,60,70]})

print(df1)
print(df2)

df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'])
df1.loc[df1.Result.notna(), 'Score'] = df1.Result
df1 = df1.drop(columns='Result')
print(df1)

Output:

  DEP ID Team ID Group  Score
0    001     002     A     80
1    001     004     A     70
2    002     002     A     80
3    002     007     A     90

UPDATE:

If the Result column in df2 is instead named Score, as the OP asked in a comment, the code can be adjusted slightly:

df1 = df1.join(df2.drop(columns='DEP ID').set_index(['Team ID', 'Group']), on=['Team ID', 'Group'], rsuffix='_NEW')
df1.loc[df1.Score_NEW.notna(), 'Score'] = df1.Score_NEW
df1 = df1.drop(columns='Score_NEW')
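The adjusted version can be checked end to end with the same test data as the full test code above, with df2's Result column renamed to Score (that renaming is the only change).

```python
import pandas as pd

df1 = pd.DataFrame({'DEP ID': ['001', '001', '002', '002'],
                    'Team ID': ['002', '004', '002', '007'],
                    'Group': ['A', 'A', 'A', 'A'],
                    'Score': [50, 70, 50, 90]})
# Same df2 as before, but the new values live in a column called Score
df2 = pd.DataFrame({'DEP ID': ['001', '001', '001'],
                    'Team ID': ['002', '003', '004'],
                    'Group': ['A', 'A', 'A'],
                    'Score': [80, 60, 70]})

keys = ['Team ID', 'Group']
# Both frames now have Score, so rsuffix renames the incoming column to Score_NEW
df1 = df1.join(df2.drop(columns='DEP ID').set_index(keys), on=keys, rsuffix='_NEW')
# Overwrite Score only where a new value was found
df1.loc[df1.Score_NEW.notna(), 'Score'] = df1.Score_NEW
df1 = df1.drop(columns='Score_NEW')
print(df1)
```

The scores come out as 80, 70, 80 and 90, matching the output of the original version.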

Update a subset of values in data.table column with values from another data.table column

You can use get to grab the i.name variable programmatically in the update join, and stay within standard data.table join operations. Example data and code:

library(data.table)
data <- data.table(snp.gene.key=1:5, dval = letters[1:5])
all_tmp <- data.table(snp.gene.key=1:3, dval=letters[11:13])
setkey(data, snp.gene.key)
setkey(all_tmp, snp.gene.key)

data
#    snp.gene.key dval
# 1:            1    a
# 2:            2    b
# 3:            3    c
# 4:            4    d
# 5:            5    e

Then wrap name in parentheses on the LHS of the := assignment, so that it is evaluated as a variable rather than treated as a literal column name, and use get on the RHS to grab the i. column you want for the update join.

name <- "dval"
data[all_tmp, (name) := get(paste0("i.", name)) ]

data
#    snp.gene.key dval
# 1:            1    k
# 2:            2    l
# 3:            3    m
# 4:            4    d
# 5:            5    e
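For comparison, a pandas analogue of this update join, sketched under the assumption that the key column is unique in both tables: set the key as the index on both sides and let DataFrame.update overwrite the matching cells of the column whose name is held in a variable. This is not an update by reference like := in data.table, and update() skips NA values in the incoming frame.

```python
import pandas as pd

data = pd.DataFrame({'snp.gene.key': [1, 2, 3, 4, 5], 'dval': list('abcde')})
all_tmp = pd.DataFrame({'snp.gene.key': [1, 2, 3], 'dval': list('klm')})

name = 'dval'  # column picked at run time, like name <- "dval" in R

# Align both frames on the join key, then let update() overwrite matching cells
data = data.set_index('snp.gene.key')
data.update(all_tmp.set_index('snp.gene.key')[[name]])
data = data.reset_index()
print(data)
```

Rows 4 and 5 have no match in all_tmp, so they keep d and e, as in the data.table result.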

Updating filtered data frame in pandas

EDIT: If you only need to replace missing values with values from another DataFrame, use DataFrame.fillna or DataFrame.combine_first:

df = df_1.fillna(df_2)
#alternative
#df = df_1.combine_first(df_2)

print (df)
        Name          Surname
index
R222  Katrin           Johnes
R343    John              Doe
R377  Steven          Walkins
R914   Marie Sklodowska-Curie
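Because the question's df_1 and df_2 are not shown here, this is a sketch with small hypothetical frames that share the same index. Existing values in df_1 are kept, and only the gaps are filled from df_2.

```python
import pandas as pd

# Hypothetical stand-ins for the question's df_1 / df_2
idx = pd.Index(['R1', 'R2', 'R3'], name='index')
df_1 = pd.DataFrame({'Name': ['Ann', None, 'Cid'],
                     'Surname': ['Lee', None, None]}, index=idx)
df_2 = pd.DataFrame({'Name': ['Anna', 'Bob', 'Cyd'],
                     'Surname': ['Li', 'Ray', 'Roe']}, index=idx)

# Missing cells in df_1 are filled from the same position in df_2
filled = df_1.fillna(df_2)
# combine_first does the same here; it also takes the union of rows and columns
combined = df_1.combine_first(df_2)

print(filled)
```

The two calls differ only when the frames have different rows or columns: combine_first keeps everything from both, while fillna keeps the shape of df_1.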

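It does not work because update works in place on the filtered subset of the DataFrame, which is a new object, so df_1 itself is not changed. A possible (if ugly) solution is to update the filtered DataFrame df and then add back the original rows that did not match: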

m = (df_1["Name"].notna()) & (df_1["Surname"].notna())
df = df_1[m].copy()

df.update(df_2)

df = pd.concat([df, df_1[~m]]).sort_index()
print (df)
            Name Surname
index
R222       Pablo Picasso
R343      Jarque   Berry
R377  Christofer  Bishop
R914         NaN     NaN

A possible solution without update:

m = (df_1["Name"].notna()) & (df_1["Surname"].notna())

df_1[m] = df_2
print (df_1)
            Name Surname
index
R222       Pablo Picasso
R343      Jarque   Berry
R377  Christofer  Bishop
R914         NaN     NaN
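Both approaches can be compared on small hypothetical frames (the question's df_1 and df_2 are not shown). Only the row where both fields are already filled is overwritten; the other rows come back unchanged.

```python
import pandas as pd

# Hypothetical stand-ins for the question's df_1 / df_2
idx = pd.Index(['R1', 'R2', 'R3'], name='index')
df_1 = pd.DataFrame({'Name': ['Ann', None, 'Cid'],
                     'Surname': ['Lee', None, None]}, index=idx)
df_2 = pd.DataFrame({'Name': ['Anna', 'Bob', 'Cyd'],
                     'Surname': ['Li', 'Ray', 'Roe']}, index=idx)

# Rows where both fields are already filled
m = df_1['Name'].notna() & df_1['Surname'].notna()

# update() modifies the object it is called on, so call it on an explicit
# copy of the matching rows and then put the remaining rows back
df = df_1[m].copy()
df.update(df_2)
df = pd.concat([df, df_1[~m]]).sort_index()

# Without update: boolean-mask assignment aligns df_2 on the index
df_alt = df_1.copy()
df_alt[m] = df_2

print(df)
```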

updating column values in pandas based on condition

There is a logic problem:

reviews = pd.DataFrame({'Score':range(6)})
print (reviews)
Score
0 0
1 1
2 2
3 3
4 4
5 5

Setting all values greater than 3 to 1 works as needed:

reviews.loc[reviews['Score'] > 3, 'Score'] = 1
print (reviews)
Score
0 0
1 1
2 2
3 3
4 1
5 1

Then every value except 3 is set to 0, because the 1s written by the reviews['Score'] > 3 step are now also <= 2 and get replaced:

reviews.loc[reviews['Score'] <= 2, 'Score'] = 0
print (reviews)
Score
0 0
1 0
2 0
3 3
4 0
5 0

Finally, the rows with a score of 3 are removed, leaving only 0 values:

reviews.drop(reviews[reviews['Score'] == 3].index, inplace = True)
print (reviews)
Score
0 0
1 0
2 0
4 0
5 0

You can change the solution:

reviews = pd.DataFrame({'Score':range(6)})
print (reviews)
Score
0 0
1 1
2 2
3 3
4 4
5 5

First remove the 3s by keeping only the rows not equal to 3 with boolean indexing:

reviews = reviews[reviews['Score'] != 3].copy()

Then set the values to 0 and 1:

reviews['Score'] = (reviews['Score'] > 3).astype(int)
#alternative
reviews['Score'] = np.where(reviews['Score'] > 3, 1, 0)
print (reviews)
Score
0 0
1 0
2 0
4 1
5 1

EDIT1:

Your solution works if you swap the two lines: set 0 first and then 1, to avoid overwriting values:

reviews.loc[reviews['Score'] <= 2, 'Score'] = 0
reviews.loc[reviews['Score'] > 3, 'Score'] = 1

reviews.drop(reviews[reviews['Score'] == 3].index, inplace = True)
print (reviews)
Score
0 0
1 0
2 0
4 1
5 1
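Another way to avoid the ordering problem, offered as an alternative rather than the answer's own fix: compute every mask from the original scores before assigning anything, so later assignments cannot see the results of earlier ones and the order of the .loc lines no longer matters.

```python
import pandas as pd

reviews = pd.DataFrame({'Score': range(6)})

# Build every mask from the original scores before any assignment
score = reviews['Score']
low = score <= 2
high = score > 3
keep = score != 3

reviews.loc[low, 'Score'] = 0
reviews.loc[high, 'Score'] = 1
reviews = reviews[keep]
print(reviews)
```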

