LogisticRegression: Unknown label type: 'continuous' using sklearn in python
You are passing floats to a classifier, which expects categorical values as the target vector. If you convert the target to int, it will be accepted as input (although it is questionable whether that is the right way to do it).
It would be better to convert your training scores using scikit-learn's LabelEncoder class.
The same is true for your DecisionTree and KNeighbors classifiers.
from sklearn import preprocessing
from sklearn import utils
lab_enc = preprocessing.LabelEncoder()
encoded = lab_enc.fit_transform(trainingScores)
# encoded: array([1, 3, 2, 0], dtype=int64)
print(utils.multiclass.type_of_target(trainingScores))
# prints: continuous
print(utils.multiclass.type_of_target(trainingScores.astype('int')))
# prints: multiclass
print(utils.multiclass.type_of_target(encoded))
# prints: multiclass
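For reference, here is a minimal self-contained sketch of the failure and the LabelEncoder workaround. The feature matrix X and the four float scores are made-up values chosen for illustration, not the data from the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.multiclass import type_of_target

# Hypothetical data: four samples with float "scores" as the target.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
training_scores = np.array([3.4, 7.2, 5.5, 1.1])

# Non-integer floats are reported as a continuous target.
target_kind = type_of_target(training_scores)

# Fitting a classifier on a continuous target raises ValueError.
try:
    LogisticRegression().fit(X, training_scores)
    raised = False
except ValueError:
    raised = True

# LabelEncoder maps each distinct float to its own integer class,
# numbered by sorted order of the original values.
encoded = LabelEncoder().fit_transform(training_scores)
encoded_kind = type_of_target(encoded)

# The encoded target is accepted by the classifier.
clf = LogisticRegression().fit(X, encoded)
```

Note that every distinct float becomes a separate class here, so this only makes sense when the scores really are a small set of categories.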
Logistic regression: ValueError: Unknown label type: 'continuous'
LogisticRegression from sklearn is a classifier, i.e. it expects the response variable to be categorical.
Your task is a regression task. Moreover, the plot does not seem to have the asymptotic behavior of a logit on the right. You may get better results using polynomial regression.
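A polynomial regression can be sketched with a PolynomialFeatures + LinearRegression pipeline. The data below is a synthetic noiseless quadratic and the degree of 2 is an assumption made for illustration; the right degree for the data in the question would have to be chosen by inspection or validation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: y = 0.5*x^2 - x + 2 sampled on [-3, 3].
x = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 - x.ravel() + 2.0

# Expand x into [1, x, x^2], then fit an ordinary linear model on top.
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(x, y)

r2 = poly_model.score(x, y)                   # R^2 on the training data
pred = poly_model.predict(np.array([[4.0]]))  # prediction at x = 4
```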
ValueError: Unknown label type: 'continuous' in DecisionTreeClassifier()
In ML, it's important as a first step to consider the nature of your problem. Is it a regression or a classification problem? Do you have target data (supervised learning), or is this a problem where you don't have a target and want to learn more about your data's inherent structure (unsupervised learning)? Then, consider what steps you need to take in your pipeline to prepare your data (preprocessing).
In this case, you are passing floats (floating point numbers) to a classifier (DecisionTreeClassifier). The problem with this is that a classifier separates distinct classes, so it expects a string or an integer type to distinguish the classes from each other (this is known as the "target"). You can read more about this in an introduction to classifiers.
The problem you seek to solve is to determine a continuous numerical output, Result. This is known as a regression problem, so you need to use a regression algorithm (such as the DecisionTreeRegressor). You can try other regression algorithms once you have this simple one working. It is a good place to start: it is fairly straightforward to understand, fairly transparent, fast, and easily implemented, so decision trees were a great choice of starting point!
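As a rough sketch of the swap to DecisionTreeRegressor: the DataFrame below is randomly generated stand-in data that borrows the column names from the question, and both the way Result is computed and the max_depth value are assumptions made only so the example runs.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in for the question's dataset.
rng = np.random.default_rng(0)
dataset = pd.DataFrame({
    "Open": rng.uniform(10, 20, 50),
    "High": rng.uniform(20, 30, 50),
    "Close": rng.uniform(10, 30, 50),
})
dataset["Result"] = dataset["Close"] - dataset["Open"]  # made-up continuous target

X = dataset[["Open", "High", "Close"]]
y = dataset["Result"]  # a single column of floats

# A regressor accepts a continuous target, so no label encoding is needed.
tree = DecisionTreeRegressor(max_depth=4, random_state=0)
tree.fit(X, y)
predictions = tree.predict(X.head(3))
```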
As a further note, it is important to consider preprocessing your data. You have done some of this simply by separating your target from your input data:
X = dataset.drop(columns=['Date','Result'])
y = dataset.drop(columns=['Date', 'Open', 'High', 'Close'])
However, you may wish to look into preprocessing further, particularly standardisation of your data. This is often a required step before an ML algorithm can make good use of your data. There's a saying that goes: "Garbage in, garbage out".
Part of preprocessing sometimes requires you to change the data type of a given column. The error posted in your question, at face value, suggests that the issue at hand is a need to change data types. But, as explained, that wouldn't help in your case, given that you seek to use regression to determine a continuous output.
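The standardisation mentioned above can be sketched with scikit-learn's StandardScaler. The small array here is invented to show two features on very different scales.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales.
X = np.array([[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]])

# Rescale each column to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)
```

In a real pipeline the scaler would normally be fitted on the training split only and then applied to the test split with scaler.transform.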
Unknown label type: continuous
You are looking for KNeighborsRegressor, not KNeighborsClassifier.
Change your code to:
X = df[['Avg. Session Length', 'Time on App','Time on Website', 'Length of Membership']]
y = df['Yearly Amount Spent']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
from sklearn.neighbors import KNeighborsRegressor
knn = KNeighborsRegressor(n_neighbors=1)
knn.fit(X_train,y_train)