Replace Column if equal to a specific value
You can use the following awk
:
awk -F, '{ $4 = ($4 == "N/A" ? -1 : $4) } 1' OFS=, test.csv
- We set the input and output field separators to
,
to preserve the delimiters in your csv file - We check the forth field if it is equal to "N/A" then we assign it the value
-1
if not we retain the value as is. 1
at the end prints your line with or without modified 4th column depending if our test was successful or not.($4=="N/A"?-1:$4)
is a ternary operator that checks if the condition$4=="N/A"
is true or not. If true?
then we assign-1
and if false:
we keep the field as is.
Test run on sample file:
$ cat file
a,b,c,d,e,f
1,2,3,4,5,6
44,2,1,N/A,4,5
24,sdf,sdf,4,2,254,5
a,f,f,N/A,f,4
$ awk -F, '{ $4 = ($4 == "N/A" ? -1 : $4) } 1' OFS=, file
a,b,c,d,e,f
1,2,3,4,5,6
44,2,1,-1,4,5
24,sdf,sdf,4,2,254,5
a,f,f,-1,f,4
Replace column value based on value in other column
You can use loc
to specify where you want to replace, and pass the replaced series to the assignment:
df.loc[df['Stage']=='X', 'Area'] = df['Area'].replace('Q','P')
Output:
ID Area Stage
0 1 P X
1 2 P X
2 3 P X
3 4 Q Y
Replace column values conditional on another column being equal to a vector
We can create the logical vector with %in%
, subset the 'value' and assign it to 0
df$value[df$digit %in% v] <- 0
df
# name value
#1 1 0
#2 2 1
#3 3 0
#4 1 0
Or another option is
df$value <- df$value *(!df$digit %in% v)
Or using dplyr
library(dplyr)
df %>%
mutate(value = replace(value, digit %in% v, 0))
Or with data.table
library(data.table)
setDT(df)[name %chin% v, value := 0]
conditional based find and replace values in fields of csv file
This awk
should do:
cat file
2159,23,45,45,13.512034,78.226233
awk -F, -v OFS="," '$1==2159 && $5==13.512034 {$5="13.49694"} $1==2159 && $6==78.226233 {$6="78.22772"} 1' file
2159,23,45,45,13.49694,78.22772
This $1 ~ /^2159/
does starts with, not equal to. $1=2159
or $1~/^2159$/
Replace value in column B using dictionary if column A is equal to specific value
You can first create replace values then use pandas.mask
and only set values for those rows that have pos==2 or pos==3
.
rep = df['result'].replace(replaceValues)
df['result'] = df['result'].mask(df['pos'].isin([2,3]), rep)
print(df)
pos result
0 1 AA
1 1 AB
2 1 BB
3 2 C
4 2 CA
5 2 AC
6 3 A
7 3 D
8 3 C
9 4 DD
10 4 AB
11 4 BA
Efficiently replace values from a column to another column Pandas DataFrame
Using np.where
is faster. Using a similar pattern as you used with replace
:
df['col1'] = np.where(df['col1'] == 0, df['col2'], df['col1'])
df['col1'] = np.where(df['col1'] == 0, df['col3'], df['col1'])
However, using a nested np.where
is slightly faster:
df['col1'] = np.where(df['col1'] == 0,
np.where(df['col2'] == 0, df['col3'], df['col2']),
df['col1'])
Timings
Using the following setup to produce a larger sample DataFrame and timing functions:
df = pd.concat([df]*10**4, ignore_index=True)
def root_nested(df):
df['col1'] = np.where(df['col1'] == 0, np.where(df['col2'] == 0, df['col3'], df['col2']), df['col1'])
return df
def root_split(df):
df['col1'] = np.where(df['col1'] == 0, df['col2'], df['col1'])
df['col1'] = np.where(df['col1'] == 0, df['col3'], df['col1'])
return df
def pir2(df):
df['col1'] = df.where(df.ne(0), np.nan).bfill(axis=1).col1.fillna(0)
return df
def pir2_2(df):
slc = (df.values != 0).argmax(axis=1)
return df.values[np.arange(slc.shape[0]), slc]
def andrew(df):
df.col1[df.col1 == 0] = df.col2
df.col1[df.col1 == 0] = df.col3
return df
def pablo(df):
df['col1'] = df['col1'].replace(0,df['col2'])
df['col1'] = df['col1'].replace(0,df['col3'])
return df
I get the following timings:
%timeit root_nested(df.copy())
100 loops, best of 3: 2.25 ms per loop
%timeit root_split(df.copy())
100 loops, best of 3: 2.62 ms per loop
%timeit pir2(df.copy())
100 loops, best of 3: 6.25 ms per loop
%timeit pir2_2(df.copy())
1 loop, best of 3: 2.4 ms per loop
%timeit andrew(df.copy())
100 loops, best of 3: 8.55 ms per loop
I tried timing your method, but it's been running for multiple minutes without completing. As a comparison, timing your method on just the 6 row example DataFrame (not the much larger one tested above) took 12.8 ms.
Change one value based on another value in pandas
One option is to use Python's slicing and indexing features to logically evaluate the places where your condition holds and overwrite the data there.
Assuming you can load your data directly into pandas
with pandas.read_csv
then the following code might be helpful for you.
import pandas
df = pandas.read_csv("test.csv")
df.loc[df.ID == 103, 'FirstName'] = "Matt"
df.loc[df.ID == 103, 'LastName'] = "Jones"
As mentioned in the comments, you can also do the assignment to both columns in one shot:
df.loc[df.ID == 103, ['FirstName', 'LastName']] = 'Matt', 'Jones'
Note that you'll need pandas
version 0.11 or newer to make use of loc
for overwrite assignment operations. Indeed, for older versions like 0.8 (despite what critics of chained assignment may say), chained assignment is the correct way to do it, hence why it's useful to know about even if it should be avoided in more modern versions of pandas.
Another way to do it is to use what is called chained assignment. The behavior of this is less stable and so it is not considered the best solution (it is explicitly discouraged in the docs), but it is useful to know about:
import pandas
df = pandas.read_csv("test.csv")
df['FirstName'][df.ID == 103] = "Matt"
df['LastName'][df.ID == 103] = "Jones"
EXCEL - Replace function | Replace value from one column to replace with other column value
EDIT : For English users replace ; by , in the formula
You can use SEARCH
to get the position of the $ and REPLACE
everything after it.
For example with schema name in column A and schema value in column B -> result in column C.
To find the $ you need to use SEARCH("$";A2)
it will give you the position.
Then you can count the number of characters to replace after the "$" by substracting the position to the length of the schema name with LEN()
. (+1 to get the last char)
Then you can combine everything :
=REPLACE(A2;SEARCH("$";A2);LEN(A2)-SEARCH("$";A2)+1;B2)
Result in my Excel :
Related Topics
Does There Exist Kernel Stack for Each Process
Powershell Connecting from a Linux Client to a Windows Remote
Get X/Y Position of Caret (Input Text Cursor) Under Xorg
Bash: Getting Percentage from a Frequency Table
How to Store Your Github Https Password on Linux in a Terminal Keychain
Windows Command Line Equivalent to "Time" in Linux
How to Run a Cron Job with Arguments and Pass Results to a Log
How to Delete All Files Starting with ._ from The Shell in Linux
Tomcat Server Creating Directories in Tmp
Difference Between Linux Variables $Bash_Subshell Vs $Shlvl
Selecting a Part of a File and Copying That into New File in Linux
Open File in Default Editor from Bash
Source Line Numbers in Perf Call Graph
Setting Environment Variable with Leading Digit in Bash