ValueError: x and y must be the same size
Print the shape of X_train. What do you see? I'd bet X_train is 2D (a matrix with a single column), while y_train is 1D (a vector), which is why you get different sizes.
I think using X_train[:,0] for plotting (which is where the error originates) should solve the problem.
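A minimal sketch of that check, using a made-up NumPy array in place of the asker's data (the values below are illustrative assumptions, not the original dataset):

```python
import numpy as np

# Hypothetical stand-ins for the asker's data.
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # 2D: 5 rows, 1 column
y_train = np.array([2.1, 3.9, 6.2, 8.1, 9.8])            # 1D: 5 values

print(X_train.shape)
print(y_train.shape)

# Selecting the first column yields a 1D array shaped like y_train.
x_col = X_train[:, 0]
print(x_col.shape)
```

If X_train is a pandas DataFrame rather than a NumPy array, the equivalent selection would be X_train.iloc[:, 0].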
ValueError: x and y must be the same size when plotting predicted values
You are trying to generate a scatter plot of the DataFrame x (a 1942871 x N object) against the Series y. The code fails because x has more elements in total than y.
print('size of x = {0}'.format(x.size))
print('size of y = {0}'.format(y.size))
assert x.size == y.size
The sizes are not equal, hence the code fails.
If you must have a scatter plot of x against y, do so on a column-by-column basis:
for col in x.columns:
    plt.scatter(x[col], y, s=10)
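The size check and the column-by-column loop can be tried together on a toy example. This is only a sketch: the DataFrame and Series below are invented, and the `Agg` backend is chosen just so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: x has two columns, y has one value per row.
x = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})
y = pd.Series([5, 6, 7, 8])

# DataFrame.size counts every cell (rows * columns); Series.size counts rows.
print("size of x =", x.size)
print("size of y =", y.size)

fig, ax = plt.subplots()
for col in x.columns:
    # Each x[col] is a Series with as many elements as y, so the sizes match.
    ax.scatter(x[col], y, s=10, label=col)
ax.legend()
```

Labelling each column and adding a legend keeps the overlaid point clouds distinguishable.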
matplotlib error: x and y must be the same size
X_test.shape = [36648 rows x 2 columns]
Both data arguments in plt.scatter (here y_test and X_test) must be 1-dimensional arrays; from the docs:
x, y : array_like, shape (n, )
while here you attempt to pass a 2-dimensional matrix for X_test, hence the error about different sizes.
You cannot get a scatter plot of a matrix against an array/vector; what you could do is produce two separate scatter plots, one for each column in your X_test:
plt.figure(2)
plt.scatter(y_test, X_test.iloc[:,0].values)
plt.figure(3)
plt.scatter(y_test, X_test.iloc[:,1].values)
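To see the failure and the fix side by side, here is a small sketch with random NumPy data standing in for X_test and y_test (the shapes and values are assumptions, not the asker's data). It uses subplots instead of separate figures, which is a presentation choice rather than a requirement.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
X_test = rng.normal(size=(50, 2))  # hypothetical 2-column feature matrix
y_test = rng.normal(size=50)       # hypothetical 1D target

# Passing the whole matrix gives scatter 100 values to pair with 50.
error = None
try:
    plt.scatter(y_test, X_test)
except ValueError as exc:
    error = exc
print(type(error).__name__, error)

# One scatter plot per column: both arguments now hold 50 values.
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for i, ax in enumerate(axes):
    ax.scatter(y_test, X_test[:, i])
    ax.set_title(f"column {i}")
```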
Matplotlib Error: x and y must be the same size, scatter plot is off
The reason for your error is that you are using survived.Pclass instead of failed.Pclass.
Updated code
plt.figure(figsize=(10,6))
plt.scatter(survived.Fare, survived.Pclass, alpha=0.5, color='orange', label='Survived')
plt.scatter(failed.Fare, failed.Pclass, alpha=0.5, color='blue', label='Failed')
plt.title('Distribution of Pclass and Fare for Survived and Failed')
plt.xlabel('Fare')
plt.ylabel('Pclass')
plt.legend()
plt.savefig('Survived_and_not_survived.jpg')
ValueError: x and y must be the same size (Linear regression)
I reproduced your code using the housing competition data, just to have a working example. Here is my code (I commented out the lines of your code that did not fit my data):
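import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv('data/train.csv')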
#X = df[['date', 'area', 'code','houses_sold', 'no_of_crimes']]
#y = df['average_price']
X = df[['GarageType', 'Alley', 'LotShape']]
y = df['SalePrice']
X = pd.get_dummies(df[['GarageType', 'Alley', 'LotShape']])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print("Xtrain", X_train.shape, "y_train",
y_train.shape, "Xtest", X_test.shape, "y_test", y_test.shape)
#regr = linear_model.LinearRegression()
regr = LinearRegression()
lr = LinearRegression()
lr.fit(X_train,y_train)
print("Score on training set: {:.3f}".format(lr.score(X_train, y_train)))
print("Score on test set: {:.3f}".format(lr.score(X_test, y_test)))
regr.fit(X_train, y_train)
y_pred = regr.predict(X_test)
# This call raises the ValueError: X_test has several columns, y_test is 1D
plt.scatter(X_test, y_test, color="black")
plt.plot(X_test, y_pred, color="blue", linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()
If I check the shapes, I get:
In [6]: X_test.shape
Out[6]: (365, 12)
In [7]: y_test.shape
Out[7]: (365,)
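These shapes are clearly not the same: X_test has 12 columns, while y_test is one-dimensional. plt.scatter needs both arguments to contain the same number of values, so you have to pass a single column of X_test. I guess you want to choose one column, like this: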
plt.scatter(X_test[X_test.columns[0]], y_test, color="black")