Sklearn Error ValueError: Input Contains NaN, Infinity or a Value Too Large for dtype('float64')

sklearn error ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

This might happen inside scikit-learn, and whether it does depends on what you're doing. I recommend reading the documentation for the functions you're using: one of them may require, for example, that your matrix be positive definite, and your data may not meet that criterion.

EDIT: How could I miss that:

np.isnan(mat.any())    # returns False
np.isfinite(mat.all()) # returns True

is obviously wrong. The correct checks are:

np.any(np.isnan(mat))

and

np.all(np.isfinite(mat))

You want to check whether any of the elements is NaN, not whether the return value of the any function is a number.
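A small demonstration of the difference, using a made-up 2x2 array that holds one NaN and one inf. The wrong form reduces the array to a single boolean first, so the NaN/inf test is applied to True rather than to the elements:

```python
import numpy as np

# Made-up matrix with one NaN and one +inf, for illustration only
mat = np.array([[1.0, np.nan], [np.inf, 4.0]])

# Wrong: mat.any() / mat.all() collapse the array to one bool (NaN and inf are
# both non-zero, so both return True); isnan/isfinite then test that bool.
wrong_has_nan = np.isnan(mat.any())        # False
wrong_all_finite = np.isfinite(mat.all())  # True

# Right: apply the element-wise test first, then reduce over all elements.
has_nan = np.any(np.isnan(mat))            # True
all_finite = np.all(np.isfinite(mat))      # False
```

Note that np.isfinite catches NaN and both infinities in one test, which is why it is the check that matches the wording of the scikit-learn error.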

Input contains NaN, infinity or a value too large for dtype('float32') when I train a DecisionTreeClassifier

There are some infinite values in your mass_error_min column:

data_new_2.describe()

              mass  mass_error_min
count  1425.000000       1425.0000
mean      6.060956             inf
std      13.568726             NaN
min       0.000002          0.0000
25%       0.054750          0.0116
50%       0.725000          0.0700
75%       3.213000          0.5300
max     135.300000             inf

So you have to replace those inf values with some finite value, for example the column's 98th percentile:

import numpy as np

value = data_new_2['mass_error_min'].quantile(0.98)  # cap at the 98th percentile
data_new_2 = data_new_2.replace(np.inf, value)
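A self-contained sketch of the same idea on a small made-up frame (the values are illustrative, not the asker's data). It departs from the answer in two labeled ways: it computes the 98th percentile from the finite values only, because if enough entries are inf the percentile of the raw column stops being a finite number, and it replaces inf only in mass_error_min rather than across the whole frame. It also counts inf per column first, which is how the problem showed up in describe():

```python
import numpy as np
import pandas as pd

# Made-up stand-in for data_new_2 (illustrative values only)
data_new_2 = pd.DataFrame({
    "mass": [0.05, 0.72, 3.21, 135.3, 6.06],
    "mass_error_min": [0.0, 0.0116, 0.07, np.inf, 0.53],
})

# How many +/-inf values does each column hold?
inf_counts = np.isinf(data_new_2).sum()

# Compute the cap from the finite values only, so an inf cannot leak into it
col = data_new_2["mass_error_min"]
value = col[np.isfinite(col)].quantile(0.98)

# Replace +inf in that one column with the cap
data_new_2["mass_error_min"] = col.replace(np.inf, value)
```

If the column can also contain -inf, it would need its own replacement value (for example a low percentile); capping -inf at the 98th percentile would make no sense.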

Scikit-learn: Input contains NaN, infinity or a value too large for dtype('float64')

The problem with your regression is that NaNs have somehow sneaked into your data. This can easily be checked with the following code snippet:

import pandas as pd
import numpy as np
from sklearn import linear_model
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split lives in model_selection
from sklearn.model_selection import train_test_split

reader = pd.read_csv("./data/all-stocks-cleaned.csv")
stock = np.array(reader)

openingPrice = stock[:, 1]
closingPrice = stock[:, 5]

openingPriceTrain, openingPriceTest, closingPriceTrain, closingPriceTest = \
    train_test_split(openingPrice, closingPrice, test_size=0.25, random_state=42)

openingPriceTrain = openingPriceTrain.reshape(openingPriceTrain.size,1)
openingPriceTrain = openingPriceTrain.astype(np.float64, copy=False)

closingPriceTrain = closingPriceTrain.reshape(closingPriceTrain.size,1)
closingPriceTrain = closingPriceTrain.astype(np.float64, copy=False)

openingPriceTest = openingPriceTest.reshape(openingPriceTest.size,1)
openingPriceTest = openingPriceTest.astype(np.float64, copy=False)

np.isnan(openingPriceTrain).any(), np.isnan(closingPriceTrain).any(), np.isnan(openingPriceTest).any()

(True, True, True)
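If you want to know how many values are missing and in which rows, not just whether any are, the same np.isnan mask can be counted and located. A minimal sketch on a made-up column vector shaped like openingPriceTrain (the values are illustrative):

```python
import numpy as np

# Made-up (n, 1) column standing in for openingPriceTrain
opening = np.array([[10.0], [np.nan], [12.5], [np.nan]])

nan_mask = np.isnan(opening)
n_missing = int(nan_mask.sum())                  # how many values are NaN
bad_rows = np.flatnonzero(nan_mask.any(axis=1))  # indices of rows containing a NaN
```

Here n_missing is 2 and bad_rows is [1, 3]; inspecting those rows in the original CSV usually shows why the values are missing.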

If you impute the missing values, for example with the median of the non-missing entries:

openingPriceTrain[np.isnan(openingPriceTrain)] = np.median(openingPriceTrain[~np.isnan(openingPriceTrain)])
closingPriceTrain[np.isnan(closingPriceTrain)] = np.median(closingPriceTrain[~np.isnan(closingPriceTrain)])
openingPriceTest[np.isnan(openingPriceTest)] = np.median(openingPriceTest[~np.isnan(openingPriceTest)])

the regression runs without a problem:

regression = linear_model.LinearRegression()

regression.fit(openingPriceTrain, closingPriceTrain)

predicted = regression.predict(openingPriceTest)

predicted[:5]

array([[ 13598.74748173],
[ 53281.04442146],
[ 18305.4272186 ],
[ 50753.50958453],
[ 14937.65782778]])

In short: you have missing values in your data, as the error message said.
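An alternative to filling each array by hand, swapped in here rather than taken from the answer, is scikit-learn's SimpleImputer inside a pipeline: the median is learned from the training features and reused on the test features, instead of computing a separate median on the test set. This sketch assumes scikit-learn 0.20 or later and uses made-up prices. SimpleImputer only transforms X, so NaNs in the target (like those in closingPriceTrain) still have to be dropped or filled separately:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Made-up opening prices (with a gap) and closing prices, for illustration only
X_train = np.array([[10.0], [12.0], [np.nan], [15.0], [11.0]])
y_train = np.array([10.5, 12.2, 13.0, 15.1, 11.3])
X_test = np.array([[13.0], [np.nan]])

# Fitting directly on data containing NaN raises a ValueError like the one above
try:
    LinearRegression().fit(X_train, y_train)
    raised = False
except ValueError:
    raised = True

# The imputer learns the median of X_train during fit and reuses it on X_test
model = make_pipeline(SimpleImputer(strategy="median"), LinearRegression())
model.fit(X_train, y_train)
predicted = model.predict(X_test)
```

The exact wording of the ValueError differs between scikit-learn versions, which is why the sketch only checks the exception type.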

EDIT:

Perhaps an easier and more straightforward approach is to check for missing data right after you read the file with pandas:

data = pd.read_csv('./data/all-stocks-cleaned.csv')
data.isnull().any()
Date                  False
Open                   True
High                   True
Low                    True
Last                   True
Close                  True
Total Trade Quantity   True
Turnover (Lacs)        True

and then impute the data with either of the two lines below:

data = data.fillna(data.median(numeric_only=True))  # per-column medians; the text Date column is skipped

or

data = data.ffill()  # carry the previous row's value forward (fillna(method='ffill') is deprecated)
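A runnable sketch of both options on a tiny made-up frame that mimics the stock file (a text Date column plus numeric columns with gaps). The values are illustrative only:

```python
import numpy as np
import pandas as pd

# Made-up frame mimicking the stock file
data = pd.DataFrame({
    "Date": ["2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"],
    "Open": [100.0, np.nan, 102.0, 103.0],
    "Close": [101.0, 100.5, np.nan, 104.0],
})

# Per-column flags: True where the column has at least one NaN
missing = data.isnull().any()

# Option 1: fill each numeric column with its own median.
# numeric_only=True keeps the text Date column out of the median computation.
by_median = data.fillna(data.median(numeric_only=True))

# Option 2: carry the last valid observation forward (rows are in date order).
by_ffill = data.ffill()
```

Passing a Series of medians to fillna fills each column with the value under its own label. Forward fill leaves any NaN in the first row untouched, so it may need to be followed by another fill.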

ValueError: Input contains NaN, infinity or a value too large for dtype('float64'). sklearn

It looks like the column hours_worked_each_week contains nulls.

Do you still get the error if you drop that column?

X = df.drop(['infected', 'hours_worked_each_week'], axis=1).values

Alternatively, you can replace the nulls with 0:

df.fillna(0, inplace=True)
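A sketch of both suggestions on a hypothetical frame; only the column names hours_worked_each_week and infected come from the answer, while the age column and all values are made up. Counting nulls per column first confirms which column is the culprit:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; values and the "age" column are made up for illustration
df = pd.DataFrame({
    "age": [34, 45, 29, 51],
    "hours_worked_each_week": [40.0, np.nan, 38.0, np.nan],
    "infected": [0, 1, 0, 1],
})

# Count nulls per column to find the culprit
nan_counts = df.isnull().sum()

# Option 1: leave the column out of the feature matrix
X_dropped = df.drop(['infected', 'hours_worked_each_week'], axis=1).values

# Option 2: fill the nulls with 0 (returns a new frame instead of using inplace=True)
X_filled = df.fillna(0).drop('infected', axis=1).values
y = df['infected'].values
```

Filling with 0 is only sensible if 0 is a plausible value for the column; otherwise a median fill, as in the previous answer, is usually a safer default.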

