sklearn error ValueError: Input contains NaN, infinity or a value too large for dtype('float64')
This might happen inside scikit-learn, depending on what you're doing. I recommend reading the documentation for the functions you're using. You might be using one that requires, for example, a positive definite matrix, and your input may not meet that criterion.
EDIT: How could I miss that:
np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True
is obviously wrong. The correct checks are:
np.any(np.isnan(mat))
and
np.all(np.isfinite(mat))
You want to check whether any of the elements are NaN, not whether the return value of the any function is a number...
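To make the difference concrete, here is a small sketch on a made-up array (`mat` below is a hypothetical example, not the asker's data). `mat.any()` collapses the whole array to a single boolean first, so `np.isnan` only ever sees `True` or `False`:

```python
import numpy as np

# Hypothetical matrix containing one NaN and one infinity.
mat = np.array([[1.0, np.nan], [np.inf, 4.0]])

# Wrong: mat.any() reduces to a single bool, which is never NaN.
wrong_nan = np.isnan(mat.any())
# Wrong: mat.all() is also a single bool, which is always finite.
wrong_finite = np.isfinite(mat.all())

# Right: test every element first, then reduce.
has_nan = np.any(np.isnan(mat))
all_finite = np.all(np.isfinite(mat))

print(wrong_nan, wrong_finite)  # False True
print(has_nan, all_finite)      # True False
```

The wrong pair reports a clean matrix even though it holds both a NaN and an infinity, which is exactly the situation described in the question.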
Input contains NaN, infinity or a value too large for dtype('float32') when I train a DecisionTreeClassifier
There are some infinite values in your mass_error_min column:
data_new_2.describe()
              mass  mass_error_min
count  1425.000000       1425.0000
mean      6.060956             inf
std      13.568726             NaN
min       0.000002          0.0000
25%       0.054750          0.0116
50%       0.725000          0.0700
75%       3.213000          0.5300
max     135.300000             inf
So you have to replace those inf values with some finite value, for example the 98th percentile of the column:
value = data_new_2['mass_error_min'].quantile(0.98)
data_new_2 = data_new_2.replace(np.inf, value)
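A slightly more defensive variant, as a sketch only: count the non-finite entries first, then compute the replacement from the finite values so the infinities cannot influence the quantile. The frame `df` and its numbers are made up for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_new_2.
df = pd.DataFrame({
    "mass": [0.05, 0.7, 3.2, 135.3],
    "mass_error_min": [0.0116, 0.07, np.inf, 0.53],
})

# Count non-finite entries (NaN or +/-inf) in each column.
non_finite = (~np.isfinite(df)).sum()
print(non_finite)

# Take the 98th percentile of the finite values only, then replace +inf with it.
col = df["mass_error_min"]
cap = col[np.isfinite(col)].quantile(0.98)
df["mass_error_min"] = col.replace(np.inf, cap)
```

If the data could also contain `-np.inf` or NaN, those would need their own replacement rule; this sketch handles positive infinity only.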
Scikit-learn : Input contains NaN, infinity or a value too large for dtype ('float64')
The problem with your regression is that NaNs have somehow sneaked into your data. This can easily be checked with the following code snippet:
import pandas as pd
import numpy as np
from sklearn import linear_model
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
reader = pd.read_csv("./data/all-stocks-cleaned.csv")
stock = np.array(reader)
openingPrice = stock[:, 1]
closingPrice = stock[:, 5]
openingPriceTrain, openingPriceTest, closingPriceTrain, closingPriceTest = \
train_test_split(openingPrice, closingPrice, test_size=0.25, random_state=42)
openingPriceTrain = openingPriceTrain.reshape(openingPriceTrain.size,1)
openingPriceTrain = openingPriceTrain.astype(np.float64, copy=False)
closingPriceTrain = closingPriceTrain.reshape(closingPriceTrain.size,1)
closingPriceTrain = closingPriceTrain.astype(np.float64, copy=False)
openingPriceTest = openingPriceTest.reshape(openingPriceTest.size,1)
openingPriceTest = openingPriceTest.astype(np.float64, copy=False)
np.isnan(openingPriceTrain).any(), np.isnan(closingPriceTrain).any(), np.isnan(openingPriceTest).any()
(True, True, True)
If you impute the missing values with the median of each array, as below:
openingPriceTrain[np.isnan(openingPriceTrain)] = np.median(openingPriceTrain[~np.isnan(openingPriceTrain)])
closingPriceTrain[np.isnan(closingPriceTrain)] = np.median(closingPriceTrain[~np.isnan(closingPriceTrain)])
openingPriceTest[np.isnan(openingPriceTest)] = np.median(openingPriceTest[~np.isnan(openingPriceTest)])
your regression will run smoothly without a problem:
regression = linear_model.LinearRegression()
regression.fit(openingPriceTrain, closingPriceTrain)
predicted = regression.predict(openingPriceTest)
predicted[:5]
array([[ 13598.74748173],
[ 53281.04442146],
[ 18305.4272186 ],
[ 50753.50958453],
[ 14937.65782778]])
In short: you have missing values in your data, as the error message said.
EDIT:
Perhaps an easier and more straightforward approach is to check for missing data right after you read it with pandas:
data = pd.read_csv('./data/all-stocks-cleaned.csv')
data.isnull().any()
Date False
Open True
High True
Low True
Last True
Close True
Total Trade Quantity True
Turnover (Lacs) True
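and then impute the data with either of the two lines below:
# fillna does not accept a callable, so pass the per-column medians instead
data = data.fillna(data.median(numeric_only=True))
or
# forward fill; fillna(method='ffill') is deprecated in recent pandas
data = data.ffill()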
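A minimal sketch of both imputation strategies on a made-up frame, using `DataFrame.median` and `DataFrame.ffill`. The values are hypothetical; only the column names echo the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one gap in each column.
data = pd.DataFrame({
    "Open": [10.0, np.nan, 14.0],
    "Close": [11.0, 12.0, np.nan],
})

# Option 1: fill each column's gaps with that column's median.
by_median = data.fillna(data.median(numeric_only=True))

# Option 2: carry the last valid observation forward.
by_ffill = data.ffill()

print(by_median)
print(by_ffill)
```

Forward filling cannot fill a gap in the very first row, since there is no earlier value to carry forward, so it is worth re-running `isnull().any()` afterwards.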
ValueError: Input contains NaN, infinity or a value too large for dtype('float64'). sklearn
It looks like the column hours_worked_each_week contains nulls.
Do you get the same error if you drop that column?
X = df.drop(['infected', 'hours_worked_each_week'], axis=1).values
Alternatively, you can replace the nulls with 0:
df.fillna(0,inplace=True)
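Before choosing between the two options, it can help to confirm which columns actually hold nulls. The sketch below uses a made-up frame; only the names `infected` and `hours_worked_each_week` come from the question, and the `age` column is a hypothetical extra feature.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a single missing value.
df = pd.DataFrame({
    "age": [34, 51, 27],
    "hours_worked_each_week": [40.0, np.nan, 35.0],
    "infected": [0, 1, 0],
})

# Number of missing values per column, to confirm which column is the culprit.
null_counts = df.isnull().sum()
print(null_counts)

# Option 1: drop the target and the column that contains nulls.
X_dropped = df.drop(["infected", "hours_worked_each_week"], axis=1).values

# Option 2: keep the column but replace its nulls with 0.
X_filled = df.drop("infected", axis=1).fillna(0).values
```

Filling with 0 keeps every row and column but treats a missing value as "zero hours worked", which may or may not be a sensible reading for a given dataset.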