ValueError: x and y must be the same size

ValueError: x and y must be the same size

Print X_train.shape. What do you see? I'd bet X_train is 2-D (a matrix with a single column), while y_train is 1-D (a vector). As a result, the sizes differ.

I think using X_train[:, 0] for plotting (which is where the error originates) should solve the problem.
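A minimal sketch of the diagnosis and the fix, assuming plain NumPy arrays. The random data and the three feature columns given to X_train are hypothetical stand-ins for the question's variables:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical training data: 50 samples, 3 features, one target per sample
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
y_train = rng.normal(size=50)

print(X_train.shape, y_train.shape)  # (50, 3) (50,)
print(X_train.size, y_train.size)    # 150 50

# plt.scatter(X_train, y_train) raises "x and y must be the same size" here.
# Slicing out one column gives a 1-D array with one value per sample:
x_col = X_train[:, 0]
points = plt.scatter(x_col, y_train)
```

Any other column index works the same way; the point is that both arguments end up 1-D with equal length.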

ValueError: x and y must be the same size when plotting predicted values

You are trying to generate a scatter plot of the DataFrame x (1942871 rows by N columns) against the Series y. The code fails because x has more elements in total than y. You can check this directly:

# .size is rows * columns for a DataFrame, but just the length for a Series
print('size of x = {0}'.format(x.size))
print('size of y = {0}'.format(y.size))
assert x.size == y.size

The sizes are not equal, so the assertion fails, and plt.scatter fails for the same reason.

If you must have a scatter plot of x against y, do it column by column:

import matplotlib.pyplot as plt
for col in x.columns:
    plt.scatter(x[col], y, s=10)  # one series per column, each against the same y
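The same check and the column-by-column fix can be sketched end to end with pandas. Here x and y are hypothetical stand-ins (a three-column DataFrame and a Series of equal length), and labelling each series with its column name is an addition so the overlaid points can be told apart:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running without a display
import matplotlib.pyplot as plt

# Hypothetical data: 100 rows, 3 feature columns, one target value per row
rng = np.random.default_rng(1)
x = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
y = pd.Series(rng.normal(size=100))

# DataFrame.size counts every cell (rows * columns), so it is 3 * y.size here
print('size of x = {0}'.format(x.size))  # size of x = 300
print('size of y = {0}'.format(y.size))  # size of y = 100

fig, ax = plt.subplots()
for col in x.columns:
    # each column is a 1-D Series with one value per row, matching y
    ax.scatter(x[col], y, s=10, label=col)
ax.legend()
```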

matplotlib error: x and y must be the same size

X_test.shape = (36648, 2), i.e. [36648 rows x 2 columns]

Both data arguments in plt.scatter (here y_test and X_test) must be 1-dimensional arrays; from the docs:

x, y : array_like, shape (n, )

whereas here you pass a 2-dimensional matrix as X_test, hence the size-mismatch error.

You cannot scatter-plot a matrix against a vector; what you could do instead is produce two separate scatter plots, one for each column of X_test:

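import matplotlib.pyplot as plt

plt.figure(2)  # first column of X_test against y_test
plt.scatter(y_test, X_test.iloc[:,0].values)

plt.figure(3)  # second column of X_test against y_test
plt.scatter(y_test, X_test.iloc[:,1].values)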
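If X_test has more columns, the same idea can be written as a loop. The sketch below swaps the separate plt.figure calls for one figure with a subplot per column (a layout choice, not the answer's original); X_test and y_test are hypothetical stand-ins with the question's two-column shape, scaled down:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running without a display
import matplotlib.pyplot as plt

# Hypothetical stand-ins: a 2-column feature frame and a 1-D target
rng = np.random.default_rng(2)
X_test = pd.DataFrame(rng.normal(size=(200, 2)), columns=["feat_0", "feat_1"])
y_test = pd.Series(rng.normal(size=200))

n_cols = X_test.shape[1]
# squeeze=False keeps axes 2-D (1 x n_cols) even when there is only one column
fig, axes = plt.subplots(1, n_cols, figsize=(5 * n_cols, 4), squeeze=False)
for ax, col in zip(axes[0], X_test.columns):
    # one 1-D column against the 1-D target: both hold len(y_test) values
    ax.scatter(y_test, X_test[col].values, s=5)
    ax.set_xlabel("y_test")
    ax.set_ylabel(col)
fig.tight_layout()
```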

Matplotlib Error: x and y must be the same size, scatter plot is off

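The reason for your error is that you are using survived.Pclass instead of failed.Pclass. Because survived and failed contain different numbers of rows, mixing a column from one with a column from the other gives x and y of different sizes.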

Updated code

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
plt.scatter(survived.Fare, survived.Pclass, alpha=0.5, color='orange', label='Survived')
plt.scatter(failed.Fare, failed.Pclass, alpha=0.5, color='blue', label='Failed')  # x and y both from failed
plt.title('Distribution of Pclass and Fare for Survived and Failed')
plt.xlabel('Fare')
plt.ylabel('Pclass')
plt.legend()
plt.savefig('Survived_and_not_survived.jpg')
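To see why mixing the two subsets fails, here is a sketch with a small hypothetical DataFrame shaped like the Titanic data (the column names Survived, Fare and Pclass are assumed from the question's code). Boolean masks split the rows into survived and failed, which generally have different lengths, so x and y must come from the same subset:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running without a display
import matplotlib.pyplot as plt

# Hypothetical mini dataset with the columns the question's code uses
df = pd.DataFrame({
    "Survived": [1, 0, 0, 1, 0],
    "Fare":     [71.3, 7.3, 8.1, 53.1, 8.0],
    "Pclass":   [1, 3, 3, 1, 3],
})
survived = df[df["Survived"] == 1]
failed = df[df["Survived"] == 0]

print(len(survived), len(failed))  # 2 3

# plt.scatter(failed.Fare, survived.Pclass) mixes a 3-value x with a 2-value y
# and raises "x and y must be the same size"; take both from the same subset:
fig, ax = plt.subplots()
ax.scatter(survived.Fare, survived.Pclass, alpha=0.5, color="orange", label="Survived")
ax.scatter(failed.Fare, failed.Pclass, alpha=0.5, color="blue", label="Failed")
ax.legend()
```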


ValueError: x and y must be the same size (Linear regression)

I reproduced your code using the housing competition data (just to have a working example). Here is my code (I commented out the lines of your code that did not fit my data):

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv('data/train.csv')

#X = df[['date', 'area', 'code','houses_sold', 'no_of_crimes']]
#y = df['average_price']
X = df[['GarageType', 'Alley', 'LotShape']]
y = df['SalePrice']

# one-hot encode the categorical columns (one dummy column per category)
X = pd.get_dummies(df[['GarageType', 'Alley', 'LotShape']])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

print("Xtrain", X_train.shape, "y_train",
      y_train.shape, "Xtest", X_test.shape, "y_test", y_test.shape)


#regr = linear_model.LinearRegression()
regr = LinearRegression()


lr = LinearRegression()
lr.fit(X_train, y_train)
print("Score on training set: {:.3f}".format(lr.score(X_train, y_train)))
print("Score on test set: {:.3f}".format(lr.score(X_test, y_test)))

regr.fit(X_train, y_train)

y_pred = regr.predict(X_test)

# raises ValueError: X_test is 2-D (many dummy columns), y_test is 1-D
plt.scatter(X_test, y_test, color="black")
plt.plot(X_test, y_pred, color="blue", linewidth=3)
plt.xticks(())
plt.yticks(())

plt.show()

If I check the shapes, I get:

In [6]: X_test.shape
Out[6]: (365, 12)

In [7]: y_test.shape
Out[7]: (365,)

which is clearly not the same: X_test is 2-D, while y_test is 1-D. Both arguments to plt.scatter need to be one-dimensional, so I guess you want to pick one column, like this:

plt.scatter(X_test[X_test.columns[0]], y_test, color="black")
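A self-contained sketch of the fixed plot, with synthetic data in place of the housing CSV (the feature names f0 to f2 and the coefficients are made up). It plots one feature column against the target as suggested above and, as an alternative not in the original answer, predicted against actual values, where both arguments are naturally 1-D:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running without a display
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data: 3 numeric features, 1 target
rng = np.random.default_rng(3)
X = pd.DataFrame(rng.normal(size=(400, 3)), columns=["f0", "f1", "f2"])
y = 3.0 * X["f0"] - 2.0 * X["f1"] + rng.normal(scale=0.5, size=400)

# default test_size is 0.25, so X_test gets 100 of the 400 rows
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
regr = LinearRegression().fit(X_train, y_train)
y_pred = regr.predict(X_test)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: a single feature column against the target (both 1-D)
ax1.scatter(X_test[X_test.columns[0]], y_test, color="black")
ax1.set_xlabel(X_test.columns[0])
ax1.set_ylabel("target")

# Option 2: predicted vs. actual values; y_pred and y_test have the same length
ax2.scatter(y_test, y_pred, color="blue", s=10)
ax2.set_xlabel("actual")
ax2.set_ylabel("predicted")
```

With one-hot encoded features like the housing example, each dummy column only takes the values 0 and 1, so the predicted-vs-actual view is often the more informative of the two.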


