Replace Negative Values by Zero

Thanks for the reproducible example. This is pretty basic R stuff. You can assign to selected elements of a vector (note that an array has dimensions; what you've given is a vector, not an array):

> pred_precipitation[pred_precipitation<0] <- 0
> pred_precipitation
[1] 1.2091281 0.0000000 7.7665555 0.0000000 0.0000000 0.0000000 0.5151504 0.0000000 1.8281251
[10] 0.5098688 2.8370263 0.4895606 1.5152191 4.1740177 7.1527742 2.8992215 4.5322934 6.7180530
[19] 0.0000000 1.1914052 3.6152333 0.0000000 0.3778717 0.0000000 1.4940469

Benchmark wars!

@James has found an even faster method and left it in a comment. I upvoted him, if only because I know his victory will be short-lived.

First, I try compiling, but that doesn't seem to help anyone:

p <- rnorm(10000)
gsk3 <- function(x) { x[x<0] <- 0; x }
jmsigner <- function(x) ifelse(x<0, 0, x)
joshua <- function(x) pmax(x,0)
james <- function(x) (abs(x)+x)/2
library(compiler)
library(microbenchmark)
gsk3.c <- cmpfun(gsk3)
jmsigner.c <- cmpfun(jmsigner)
joshua.c <- cmpfun(joshua)
james.c <- cmpfun(james)

microbenchmark(joshua(p),joshua.c(p),gsk3(p),gsk3.c(p),jmsigner(p),james(p),jmsigner.c(p),james.c(p))
           expr       min         lq     median         uq       max
1     gsk3.c(p)   251.782   255.0515   266.8685   269.5205   457.998
2       gsk3(p)   256.262   261.6105   270.7340   281.3560  2940.486
3    james.c(p)    38.418    41.3770    43.3020    45.6160   132.342
4      james(p)    38.934    42.1965    43.5700    47.2085  4524.303
5 jmsigner.c(p)  2047.739  2145.9915  2198.6170  2291.8475  4879.418
6   jmsigner(p)  2047.502  2169.9555  2258.6225  2405.0730  5064.334
7   joshua.c(p)   237.008   244.3570   251.7375   265.2545   376.684
8     joshua(p)   237.545   244.8635   255.1690   271.9910   430.566

[Figure: boxplot comparing compiled vs. uncompiled functions]

But wait! Dirk wrote this Rcpp thing. Can a complete C++ incompetent read his JSS paper, adapt his example, and write the fastest function of them all? Stay tuned, dear listeners.

library(inline)
cpp_if_src <- '
Rcpp::NumericVector xa(a);
int n_xa = xa.size();
for(int i=0; i < n_xa; i++) {
  if (xa[i] < 0) xa[i] = 0;
}
return xa;
'
cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")
microbenchmark(joshua(p),joshua.c(p),gsk3(p),gsk3.c(p),jmsigner(p),james(p),jmsigner.c(p),james.c(p), cpp_if(p))
         expr       min         lq     median         uq        max
1   cpp_if(p)     8.233    10.4865    11.6000    12.4090     69.512
2     gsk3(p)   170.572   172.7975   175.0515   182.4035   2515.870
3    james(p)    37.074    39.6955    40.5720    42.1965   2396.758
4 jmsigner(p)  1110.313  1118.9445  1133.4725  1164.2305  65942.680
5   joshua(p)   237.135   240.1655   243.3990   250.3660   2597.429

[Figure: boxplot comparison including the Rcpp version]

That's affirmative, captain.

Note that cpp_if modifies the input p in place, even if you don't assign the result. If you want to avoid that behavior, you have to clone:

cpp_ifclone_src <- '
Rcpp::NumericVector xa(Rcpp::clone(a));
int n_xa = xa.size();
for(int i=0; i < n_xa; i++) {
  if (xa[i] < 0) xa[i] = 0;
}
return xa;
'
cpp_ifclone <- cxxfunction(signature(a="numeric"), cpp_ifclone_src, plugin="Rcpp")

Which unfortunately kills the speed advantage.

How to replace negative numbers in a pandas DataFrame by zero

If all your columns are numeric, you can use boolean indexing:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

In [3]: df
Out[3]:
   a  b
0  0 -3
1 -1  2
2  2  1

In [4]: df[df < 0] = 0

In [5]: df
Out[5]:
   a  b
0  0  0
1  0  2
2  2  1
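
The boolean assignment above edits df in place. If you prefer an expression that returns a new frame and leaves the original untouched, a minimal sketch using the public where method:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

# where() keeps values where the condition is True and substitutes
# the replacement (0) everywhere else, returning a new DataFrame.
clipped = df.where(df >= 0, 0)

print(clipped)
#    a  b
# 0  0  0
# 1  0  2
# 2  2  1
print(df['a'].tolist())  # original unchanged: [0, -1, 2]
```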

For the more general case of mixed dtypes, this answer points to the private method _get_numeric_data (being private, it may change without warning between pandas versions):

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
'c': ['foo', 'goo', 'bar']})

In [3]: df
Out[3]:
   a  b    c
0  0 -3  foo
1 -1  2  goo
2  2  1  bar

In [4]: num = df._get_numeric_data()

In [5]: num[num < 0] = 0

In [6]: df
Out[6]:
   a  b    c
0  0  0  foo
1  0  2  goo
2  2  1  bar
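
A sketch that avoids the private method entirely, using the public select_dtypes to find the numeric columns and clip them back into place (assuming a reasonably recent pandas):

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
                   'c': ['foo', 'goo', 'bar']})

# Select the numeric columns via the public API, clip them at 0,
# and assign the result back into the original frame.
num_cols = df.select_dtypes(include='number').columns
df[num_cols] = df[num_cols].clip(lower=0)

print(df)
#    a  b    c
# 0  0  0  foo
# 1  0  2  goo
# 2  2  1  bar
```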

With the timedelta dtype, boolean indexing seems to work on individual columns, but not on the whole DataFrame. So you can do:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
...: 'b': pd.to_timedelta([-3, 2, 1], 'd')})

In [3]: df
Out[3]:
        a       b
0  0 days -3 days
1 -1 days  2 days
2  2 days  1 days

In [4]: for k, v in df.items():
   ...:     v[v < 0] = 0
   ...:

In [5]: df
Out[5]:
        a       b
0  0 days  0 days
1  0 days  2 days
2  2 days  1 days

Update: comparison with a pd.Timedelta works on the whole DataFrame:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
...: 'b': pd.to_timedelta([-3, 2, 1], 'd')})

In [3]: df[df < pd.Timedelta(0)] = 0

In [4]: df
Out[4]:
        a       b
0  0 days  0 days
1  0 days  2 days
2  2 days  1 days
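
The assignment above edits df in place. A non-mutating variant (a sketch using DataFrame.mask, which substitutes a replacement wherever the condition holds and returns a new frame) works with the same whole-frame Timedelta comparison:

```python
import pandas as pd

df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
                   'b': pd.to_timedelta([-3, 2, 1], 'd')})

# mask() replaces values where the condition is True,
# leaving df itself untouched.
out = df.mask(df < pd.Timedelta(0), pd.Timedelta(0))

print(out)
```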

How to replace negative values with 0 in pyspark dataframe

You can replace null values with 0 (or any value of your choice) across all columns with the df.fillna(0) method. To replace negative values across columns, however, I don't think there is a direct approach, except using case-when logic on each column, as below. Note that withColumn returns a new DataFrame, so the result must be assigned:

from pyspark.sql import functions as F

df = df.withColumn(
    "col1",
    F.when(df["col1"] < 0, 0).when(F.col("col1").isNull(), 0).otherwise(F.col("col1")),
)

Power Query - Replace negative numbers with zero

Right-click the column and choose Transform → Absolute Value.

Then edit the end of the generated code from something that looks like this

= Table.TransformColumns(#"PriorStep",{{"Column1", Number.Abs, type number}})

to look like this

= Table.TransformColumns(#"PriorStep",{{"Column1", each if _ < 0 then 0 else _ , type number}})

What is the fastest way to replace negative values with 0 and values greater than 1 with 1 in an array using Python?

You want to use np.clip:

>>> import numpy as np
>>> list_values = [-0.01, 0, 0.5, 0.9, 1.0, 1.01]
>>> arr = np.array(list_values)
>>> np.clip(arr, 0.0, 1.0)
array([0. , 0. , 0.5, 0.9, 1. , 1. ])

This is likely the fastest approach, if you can ignore the cost of converting to an array, and it should scale much better for larger lists/arrays.

Involving pandas in this operation isn't the way to go unless you eventually want a pandas data structure.
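
If the data is already a NumPy array and you want to avoid allocating a second one, np.clip also accepts an out argument; here it writes the clipped values back into the input array itself:

```python
import numpy as np

arr = np.array([-0.01, 0, 0.5, 0.9, 1.0, 1.01])

# Clip in place: out=arr reuses the input buffer for the result,
# so no new array is allocated.
np.clip(arr, 0.0, 1.0, out=arr)

print(arr.tolist())  # [0.0, 0.0, 0.5, 0.9, 1.0, 1.0]
```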

Replacing all negative values in all columns by zero in python

Use pandas.DataFrame.clip:

df.iloc[:, 1:] = df.iloc[:, 1:].clip(0)
print(df)

Output:

             date  T1  T2  T3  T4
0  1-1-2010 00:10  20   0   4   3
1  1-1-2010 00:20  85   0  34  21
2  1-1-2010 00:30   0  22  31  75
3  1-1-2010 00:40   0   5   7   0

clip is faster than mask not only on your sample, but also on a larger dataset:

# Your sample -> 3x faster
%timeit df.iloc[:, 1:].clip(0)
# 1.74 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.iloc[:,1:].mask(df.iloc[:,1:] < 0, 0)
# 5.25 ms ± 573 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# Large Sample -> 1,000,000 elements --> about 30x
import numpy as np
large_df = pd.DataFrame(np.random.randint(-5, 5, (1000, 1000)))

%timeit large_df.clip(0)
# 17.2 ms ± 2.44 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit large_df.mask(large_df< 0, 0)
# 498 ms ± 47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

