How to Plot Predicted Values vs. the True Value

How to plot predicted values vs the true value?

The problem is that your values span a wide range, from about 0 to 60,000.
I would suggest two options:
Either convert both axes to a log scale:

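g = plt.scatter(y_test1, y_pred_test_Forestreg)
# put both axes on a log scale (zeros cannot be shown on a log axis)
g.axes.set_xscale('log')
g.axes.set_yscale('log')
g.axes.set_xlabel('True Values')
g.axes.set_ylabel('Predictions')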

Or, even better, plot the difference between the true and predicted values (i.e. the prediction errors):

g = plt.plot(y_test1 - y_pred_test_Forestreg, marker='o', linestyle='')
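
Both options can be tried side by side. The following is only a minimal sketch on synthetic data: plot_pred_vs_true is a hypothetical helper, and the random arrays merely stand in for y_test1 and y_pred_test_Forestreg. It draws the log-log scatter with a y = x reference line on the left and the prediction errors on the right.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_pred_vs_true(y_true, y_pred):
    """Draw a log-log scatter of predictions and a residual plot side by side.

    Hypothetical helper for illustration; not part of any library.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    fig, (ax_scatter, ax_resid) = plt.subplots(ncols=2, figsize=(10, 4))

    # Left: predictions vs. true values on log-log axes.
    ax_scatter.scatter(y_true, y_pred, s=15, alpha=0.6)
    lims = [y_true.min(), y_true.max()]
    ax_scatter.plot(lims, lims, color="black", linewidth=1)  # y = x reference
    ax_scatter.set_xscale("log")
    ax_scatter.set_yscale("log")
    ax_scatter.set(xlabel="True values", ylabel="Predictions")

    # Right: prediction errors; a good model scatters evenly around 0.
    ax_resid.scatter(y_true, residuals, s=15, alpha=0.6)
    ax_resid.axhline(0, color="black", linewidth=1)
    ax_resid.set(xlabel="True values", ylabel="True - predicted")

    return fig, residuals


# Assumption: synthetic, strictly positive data in place of the real test set.
rng = np.random.default_rng(0)
y_true = rng.uniform(1, 60_000, size=200)
y_pred = y_true * rng.normal(1.0, 0.1, size=200)
fig, residuals = plot_pred_vs_true(y_true, y_pred)
```

Call plt.show() afterwards to display the figure. In the left panel, points near the diagonal are good predictions; in the right panel, any trend away from the zero line suggests a systematic error.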

How to plot a graph of actual vs predicted values in Python?

The problem seems to be that you pass both y_test and y_pred to a single scatter() call.

With scatter() or plot() (which you also mixed up), the first parameter is the x-coordinates and the second parameter is the y-coordinates.

So 1.) you need one scatter() call with only y_test and another with only y_pred. To do this, 2.) you need either 2D data or, as seems to be the case here, indexes for the x-axis, which you can generate with range().

Here is some code with random data, that might get you started:

import matplotlib.pyplot as plt
import numpy as np

def plotGraph(y_test, y_pred, regressorName):
    if max(y_test) >= max(y_pred):
        my_range = int(max(y_test))
    else:
        my_range = int(max(y_pred))
    # plot each series against its sample index
    plt.scatter(range(len(y_test)), y_test, color='blue')
    plt.scatter(range(len(y_pred)), y_pred, color='red')
    plt.title(regressorName)
    plt.show()

y_test = range(10)
y_pred = np.random.randint(0, 10, 10)

plotGraph(y_test, y_pred, "test")

This will give you something like this:

Sample Image
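
With two colours and no legend, it can be hard to tell which point belongs to which series. As a possible variation (a sketch only, where plot_by_index is a hypothetical name and the data is random), the same index-based plot can be given a legend, axis labels, and thin vertical lines that join each actual value to its prediction:

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_by_index(y_test, y_pred, title=""):
    """Plot actual and predicted values against the sample index.

    Hypothetical helper for illustration; not part of any library.
    """
    y_test = np.asarray(y_test)
    y_pred = np.asarray(y_pred)
    idx = np.arange(len(y_test))

    fig, ax = plt.subplots()
    # Thin vertical lines join each actual value to its prediction,
    # so the length of a line is the absolute error for that sample.
    ax.vlines(idx, np.minimum(y_test, y_pred), np.maximum(y_test, y_pred),
              color="gray", linewidth=0.8)
    ax.scatter(idx, y_test, color="blue", label="actual")
    ax.scatter(idx, y_pred, color="red", label="predicted")
    ax.set(xlabel="Sample index", ylabel="Value", title=title)
    ax.legend()
    return fig, ax


# Assumption: random data, as in the example above.
rng = np.random.default_rng(1)
y_test = np.arange(10)
y_pred = rng.integers(0, 10, size=10)
fig, ax = plot_by_index(y_test, y_pred, title="test")
```

Call plt.show() afterwards to display the figure.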

How to plot the predicted value against all features of a dataset

  • Each feature must be plotted separately.
  • Remember that 'price' is the target (the dependent variable) and that lin_reg.predict(xtrain) is the predicted price from the training data.
# predicted price from xtrain
ypred_train = lin_reg.predict(xtrain)

# create the figure
fig, axes = plt.subplots(ncols=4, nrows=4, figsize=(20, 20))

# flatten the axes to make it easier to index
axes = axes.flatten()

# iterate through the column values, and use i to index the axes
for i, v in enumerate(xtrain.columns):

    # select the column to be plotted
    data = xtrain[v]

    # plot the actual price against the features
    axes[i].scatter(x=data, y=ytrain, s=35, ec='white', label='actual')

    # plot predicted prices against the features
    axes[i].scatter(x=data, y=ypred_train, c='pink', s=20, ec='white', alpha=0.5, label='predicted')

    # set the title and ylabel
    axes[i].set(title=f'Feature: {v}', ylabel='price')

# set a single legend
axes[12].legend(title='Price', bbox_to_anchor=(1.05, 1), loc='upper left')

# delete the last 3 unused axes
for v in range(13, 16):
    fig.delaxes(axes[v])

Python Sample Image 2

  • If you were to plot everything into a single plot, it would be overcrowded and useless

Sample Image

  • You can also plot all the data with seaborn.relplot by melting df1 from a wide to long format.
    • However, it's more difficult to add the predicted values on top of a figure-level plot.
import seaborn as sns

dfm = df1.melt(id_vars='price', value_vars=df1.columns[:-1], var_name='Feature')

p = sns.relplot(kind='scatter', data=dfm, x='value', y='price', height=3,
                col='Feature', col_wrap=4, facet_kws={'sharex': False})

Python Sample Image 3
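
One possible way around the difficulty with figure-level plots is to put the predictions into the long-format data as well and let relplot separate them by hue. This is only a sketch under stated assumptions: the DataFrame below is synthetic (the columns sqft and age are made up), and the 'predicted' column is a placeholder for the output of a fitted model such as lin_reg.predict.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Assumption: synthetic stand-in for df1 with two made-up features and a 'price' target.
rng = np.random.default_rng(0)
df = pd.DataFrame({"sqft": rng.uniform(500, 3000, 50),
                   "age": rng.uniform(0, 80, 50)})
df["price"] = 100 * df["sqft"] - 500 * df["age"] + rng.normal(0, 1e4, 50)

# Placeholder predictions; in practice these would come from the fitted model.
df["predicted"] = 100 * df["sqft"] - 500 * df["age"]

# First melt: one row per (sample, feature), keeping both price columns.
long = df.melt(id_vars=["price", "predicted"], var_name="Feature")

# Second melt: stack actual and predicted prices into one column, tagged by 'kind'.
long = long.rename(columns={"price": "actual"}).melt(
    id_vars=["Feature", "value"], value_vars=["actual", "predicted"],
    var_name="kind", value_name="price")

# One facet per feature; hue separates actual from predicted prices.
g = sns.relplot(kind="scatter", data=long, x="value", y="price", hue="kind",
                col="Feature", col_wrap=2, height=3, facet_kws={"sharex": False})
```

The trade-off is a table twice as long as the single-melt version, in exchange for not having to draw on each facet by hand.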
