LogisticRegression: Unknown Label Type: 'continuous' Using sklearn in Python

LogisticRegression: Unknown label type: 'continuous' using sklearn in python

You are passing floats to a classifier which expects categorical values as the target vector. If you convert it to int it will be accepted as input (although it will be questionable if that's the right way to do it).

It would be better to convert your training scores using scikit-learn's LabelEncoder class.

The same is true for your DecisionTree and KNeighbors classifiers.

from sklearn import preprocessing
from sklearn.utils.multiclass import type_of_target

# Map each distinct score to an integer class label.
lab_enc = preprocessing.LabelEncoder()
encoded = lab_enc.fit_transform(trainingScores)
# encoded is now: array([1, 3, 2, 0], dtype=int64)

print(type_of_target(trainingScores))
# continuous

print(type_of_target(trainingScores.astype('int')))
# multiclass

print(type_of_target(encoded))
# multiclass
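To see the failure and the fix end to end, here is a minimal sketch. The values in trainingScores and the feature matrix X are made up for illustration, since the original data is not shown; only the sklearn calls themselves are real.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.multiclass import type_of_target

# Hypothetical data: four samples, two features, float-valued scores.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
trainingScores = np.array([3.6, 66.0, 21.0, 2.4])

print(type_of_target(trainingScores))  # continuous

# Fitting a classifier on a continuous target raises the ValueError.
error_message = None
try:
    LogisticRegression().fit(X, trainingScores)
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Encoding turns each distinct score into an integer class label.
encoded = LabelEncoder().fit_transform(trainingScores)
print(encoded)                  # [1 3 2 0]
print(type_of_target(encoded))  # multiclass

clf = LogisticRegression().fit(X, encoded)  # now accepted as input
```

Keep in mind that this makes every distinct score its own class, which is rarely what you want when the target is genuinely continuous.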

Logistic regression: ValueError: Unknown label type: 'continuous'

LogisticRegression from sklearn is a classifier, i.e. it expects that the response variable is categorical.

Your task is a regression problem. Moreover, the plot does not seem to have the asymptotic behavior of a logit on the right. You may have better results using a polynomial regression.
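A polynomial regression can be sketched in sklearn by expanding the input into polynomial terms and fitting an ordinary linear regression on them. The data and the choice of degree=2 below are assumptions for illustration, not taken from the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noise-free data following y = 1 + 2x + 3x^2.
x = np.arange(0, 10, dtype=float).reshape(-1, 1)
y = 1 + 2 * x.ravel() + 3 * x.ravel() ** 2

# Expand x into [1, x, x^2], then fit a linear model on those columns.
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(x, y)

prediction = poly_model.predict(np.array([[12.0]]))[0]
print(prediction)  # roughly 457 for this noise-free data
```

With real data you would pick the degree by checking the error on held-out samples rather than fixing it up front.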

ValueError: Unknown label type: 'continuous' in DecisionTreeClassifier()

In ML, it's important as a first step to consider the nature of your problem. Is it a regression or classification problem? Do you have target data (supervised learning), or is this a problem where you don't have a target and want to learn more about your data's inherent structure (unsupervised learning)? Then, consider what steps you need to take in your pipeline to prepare your data (preprocessing).

In this case, you are passing floats (floating point numbers) to a Classifier (DecisionTreeClassifier). The problem with this is that a classifier generally separates distinct classes, and so this classifier expects a string or an integer type to distinguish different classes from each other (this is known as the "target"). You can read more about this in an introduction to classifiers.

The problem you seek to solve is to determine a continuous numerical output, Result. This is known as a regression problem, so you need to use a regression algorithm (such as the DecisionTreeRegressor). You can try other regression algorithms once you have this simple one working. It is a good place to start: it is fairly straightforward to understand, fairly transparent, fast, and easily implemented, so decision trees were a great choice of starting point!
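As a rough sketch of that swap, the snippet below fits both estimators on the same continuous target. The DataFrame is a made-up stand-in for the question's dataset (only the column names Open, High, Close and Result come from the question; the values and the way Result is computed are assumptions).

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical stand-in for the question's dataset (values are made up).
rng = np.random.default_rng(0)
dataset = pd.DataFrame({
    "Open": rng.uniform(10, 20, size=50),
    "High": rng.uniform(20, 30, size=50),
    "Close": rng.uniform(10, 30, size=50),
})
dataset["Result"] = dataset["Close"] - dataset["Open"]  # continuous target

X = dataset[["Open", "High", "Close"]]
y = dataset["Result"]

# The classifier rejects the continuous target with a ValueError...
try:
    DecisionTreeClassifier().fit(X, y)
    classifier_failed = False
except ValueError:
    classifier_failed = True

# ...while the regressor is designed for exactly this kind of target.
regressor = DecisionTreeRegressor(random_state=0).fit(X, y)
predictions = regressor.predict(X.head())
print(classifier_failed, predictions)
```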

As a further note, it is important to consider preprocessing your data. You have done some of this simply by separating your target from your input data:

X = dataset.drop(columns=['Date','Result'])
y = dataset.drop(columns=['Date', 'Open', 'High', 'Close'])

However, you may wish to look into preprocessing further, particularly standardisation of your data. Many ML algorithms, especially distance-based ones such as k-nearest neighbours, need it to perform well, although tree-based models are largely insensitive to feature scale. There's a saying that goes: "Garbage in, garbage out".
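Standardisation can be sketched with sklearn's StandardScaler, which rescales each column to zero mean and unit variance. The feature values below are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales (values are made up).
X = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # learn per-column mean/std, then rescale

print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_scaled.std(axis=0))   # approximately [1, 1]
```

In a real pipeline you would fit the scaler on the training split only and reuse it with scaler.transform on the test split.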

Part of preprocessing sometimes requires you to change the data type of a given column. The error posted in your question, at face value, leads one to think that the issue at hand is that you need to change data types. But, as explained, in the case of your problem it wouldn't help to do that, given that you seek to use regression to determine a continuous output.

Unknown label type: continuous

You are looking for KNeighborsRegressor, not KNeighborsClassifier.
Change your code to:

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X = df[['Avg. Session Length', 'Time on App', 'Time on Website', 'Length of Membership']]
y = df['Yearly Amount Spent']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

knn = KNeighborsRegressor(n_neighbors=1)
knn.fit(X_train, y_train)
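To check the fitted regressor, you can predict on the held-out split and measure the error. This is only a sketch: df here is a synthetic stand-in (the real customer data is not shown), the formula for 'Yearly Amount Spent' is invented, and mean absolute error is just one reasonable metric.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical stand-in for df; the values and the target formula are made up.
rng = np.random.default_rng(42)
feature_names = ['Avg. Session Length', 'Time on App',
                 'Time on Website', 'Length of Membership']
df = pd.DataFrame(rng.uniform(0, 40, size=(200, 4)), columns=feature_names)
df['Yearly Amount Spent'] = 10 * df['Time on App'] + 60 * df['Length of Membership']

X = df[feature_names]
y = df['Yearly Amount Spent']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

knn = KNeighborsRegressor(n_neighbors=1).fit(X_train, y_train)
y_pred = knn.predict(X_test)  # continuous predictions, one per test row
mae = mean_absolute_error(y_test, y_pred)
print(len(y_pred), mae)
```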

