Python RuntimeWarning: overflow encountered in long scalars
Here's an example which issues the same warning:
import numpy as np
np.seterr(all='warn')
A = np.array([10])
a = A[-1]
a**a
yields
RuntimeWarning: overflow encountered in long_scalars
In the example above it happens because a is of dtype int32, and the maximum value storable in an int32 is 2**31-1. Since 10**10 > 2**31-1, the exponentiation results in a number bigger than what can be stored in an int32.
Note that you cannot rely on np.seterr(all='warn') to catch all overflow errors in NumPy. For example, on 32-bit NumPy:
>>> np.multiply.reduce(np.arange(21)+1)
-1195114496
while on 64-bit NumPy:
>>> np.multiply.reduce(np.arange(21)+1)
-4249290049419214848
Both fail without any warning, even though the cause is again an integer overflow. The correct answer is that 21! equals
In [47]: import math
In [48]: math.factorial(21)
Out[48]: 51090942171709440000L
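If you want NumPy itself to produce the exact value, one workaround (a sketch, at the cost of speed) is an object-dtype array, whose elements are arbitrary-precision Python ints:

```python
import math
import numpy as np

# Elements of an object-dtype array are Python ints, which never overflow.
exact = np.multiply.reduce(np.arange(1, 22, dtype=object))
print(exact)                        # 51090942171709440000
print(exact == math.factorial(21))  # True
```

The tradeoff is that object arrays fall back to per-element Python calls, so they are far slower than native int64 arrays.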
According to NumPy developer Robert Kern,

Unlike true floating point errors (where the hardware FPU sets a flag whenever it does an atomic operation that overflows), we need to implement the integer overflow detection ourselves. We do it on the scalars, but not arrays because it would be too slow to implement for every atomic operation on arrays.
So the burden is on you to choose appropriate dtypes
so that no operation overflows.
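For instance, widening the dtype from the first example avoids that particular overflow, and converting to a plain Python int removes the limit entirely (a sketch; int64 still overflows past 2**63-1):

```python
import numpy as np

a = np.int64(10)
print(a ** a)        # 10000000000, fits comfortably in int64
print(int(a) ** 20)  # Python int: arbitrary precision, cannot overflow
```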
Why do I keep getting this error 'RuntimeWarning: overflow encountered in int_scalars'
Python ints have arbitrary precision, so they cannot overflow. But NumPy uses fixed-precision C integers under the hood, so the largest signed 64-bit integer is 2**63 - 1. Your number is far beyond this value: on average it is on the order of ((716-1)/2)**86507.
When you extract x[0] inside the for loop, the result is still a NumPy scalar. To use the full power of Python integers you need to convert it explicitly to a Python int, like this:
product_0 = 1
product_1 = 1
for x in arr:
    t = int(x[0])
    product_0 = product_0 * t
and it will not overflow.
Following your comment, which makes your question more specific: your original problem is to calculate the geometric mean of the array for each row/column. Here is the solution.
I first generate an array that has the same properties as your array:
arr = np.resize(np.random.randint(1,716,86507*2 ),(86507,2))
Then, calculate the geometric mean for each column/row:
from scipy import stats
gm_0 = stats.mstats.gmean(arr, axis=0)
gm_1 = stats.mstats.gmean(arr, axis=1)
gm_0 will be an array containing the geometric mean of the x and y coordinates (i.e. per column). gm_1 instead contains the geometric mean of each row. Hope this solves your problem!
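As an aside, if you prefer to avoid the scipy dependency, the geometric mean can be computed overflow-free with plain NumPy by averaging logarithms, since exp(mean(log(x))) never forms the huge intermediate product (a sketch, assuming all entries are positive):

```python
import numpy as np

arr = np.resize(np.random.randint(1, 716, 86507 * 2), (86507, 2))

# mean of logs instead of product of values: no overflow possible
gm_0 = np.exp(np.log(arr).mean(axis=0))  # per-column geometric mean
print(gm_0.shape)  # (2,)
```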
I get the error RuntimeWarning: overflow encountered in long_scalars when trying to calculate the last digit of a Fibonacci number using python
One of the properties of decimal addition is that the least-significant digit of the sum only depends on the least-significant digit of the two numbers being added. So you can do all the calculations with only one digit:
def calcFibonacciLastDigit(n):
    if n <= 1:
        return n
    a = 0
    b = 1
    for i in range(2, n + 1):
        c = a + b
        if c >= 10:  # max value for c is 18
            c -= 10  # so subtracting 10 will keep c as one digit
        a = b
        b = c
    return c

print(calcFibonacciLastDigit(331))  # prints 9
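The subtraction above is just a manual reduction modulo 10; the same idea can be written more compactly with the % operator (a sketch of an equivalent variant, with a hypothetical fib_last_digit name):

```python
def fib_last_digit(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, (a + b) % 10  # keep only the least-significant digit
    return a

print(fib_last_digit(331))  # 9
```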
overflow encountered in long scalars while stress testing for maximum pairwise product
The error you're getting has to do with your datatype, as is discussed in this similar question.
I think the solution is to specify the datatype as a 64-bit datatype. You can do this when creating your vector:
vector = list(np.float64(np.random.randint(0,1000000, n)))
"np.float64" makes the code work for me. Does it still do what you intend to do? Otherwise you could also look at other 64-bit datatypes, such as "int64" and "uint64".
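One caveat with float64: it only has 53 bits of mantissa, so products larger than 2**53 are silently rounded. If the pairwise product must be exact, converting to Python ints is the safer route (a sketch, with a hypothetical max_pairwise_product helper):

```python
import numpy as np

def max_pairwise_product(values):
    # Python ints multiply exactly: no overflow and no float rounding.
    ordered = sorted(int(v) for v in values)
    return ordered[-1] * ordered[-2]

vector = np.random.randint(0, 1000000, 10)
print(max_pairwise_product(vector))
```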
Overflow error encountered in double scalars
Edit: tl;dr Solution
Ok so here is the minimal reproducible example that I was talking about. I replace your X, Y with the following.
n = 10**2
X = np.linspace(0,10**6,n)
Y = 1.5*X+0.2*10**6*np.random.normal(size=n)
If I then run
b=0
m=0
numberOfIttertions = 1000
m,b = graident_decsent(X , Y ,m,b,numberOfIttertions , 0.001)
I get exactly the problem you describe. The only surprising thing is how simple the solution is: I just replace your alpha with 10**-14 and everything works fine.
Why and how to give a Minimal, Reproducible Example
Your example is not reproducible since we don't have train.csv. Generally, both for understanding your problem yourself and for getting concrete answers, it is very helpful to have a very small example that people can run and tinker with. E.g. maybe you can think of a much shorter input to your regression that also triggers this error.
The first RuntimeWarning
But now to your question. Your first RuntimeWarning, i.e.
linearRegression.py:22: RuntimeWarning: overflow encountered in double_scalars
m_graident += (-2/N) * x*(y-(m*x+b))
means that x, and hence m_graident, are of type numpy.double = numpy.float64. This datatype can store numbers in the range (-1.79769313486e+308, 1.79769313486e+308). If you go bigger or smaller, that's called an overflow. E.g. np.double(1.79769313486e+308) is still ok, but if you multiply it by, say, 1.1 you get your favorite runtime warning. Notice that this is 'just' a warning and the program still runs. But it can't give you a number back, since the result would be too big; instead it gives you inf.
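This is easy to reproduce at the prompt (a sketch; the exact warning text varies between NumPy versions):

```python
import numpy as np

big = np.double(1.79769313486e+308)
print(big)        # still representable
print(big * 1.1)  # inf, emitted with an overflow RuntimeWarning
```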
The other RuntimeWarnings
Ok but what does
linearRegression.py:21: RuntimeWarning: invalid value encountered in double_scalars
b_graident +=(-2/N) * (y-(m*x+b))
mean?
It comes from calculating with the infinity I just mentioned. Some calculations with infinity are valid:
np.inf-10**6 -> inf
np.inf+10**6 -> inf
np.inf/10**6 -> inf
np.inf*10**6 -> inf
np.inf*(-10**6) -> -inf
1/np.inf -> 0
np.inf *np.inf -> inf
but some are not and give nan, i.e. not a number:
np.inf/np.inf
np.inf-np.inf
These are called indeterminate forms in math, since the result depends on how you arrived at the infinity. E.g.
(np.double(1e+309)+np.double(1e+309))-np.double(1e+309)
np.double(1e+309)-(np.double(1e+309)+np.double(1e+309))
are both of the form inf - inf, but you would expect different results.
Getting a nan is unfortunate, since calculations with nan always yield nan. And you can't use your gradients anymore once a nan gets into them.
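A minimal sketch of why that matters for gradient descent: once a single nan appears, every subsequent update is nan as well.

```python
import numpy as np

grad = np.inf - np.inf     # an indeterminate form: nan
step = 0.001 * grad        # nan propagates through the update
print(np.isnan(grad), np.isnan(step))  # True True
```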
Other resources
Another option is to use an existing implementation of linear regression, e.g. from scikit-learn. See:
scikit-learn linear regression reference
scikit-learn user guide on linear models