
# How to Plot Predicted Values VS the True Value

## How to plot predicted values vs the true value?

The problem is that your values span a wide range, from about 0 to 60,000.
I would suggest two options.
Either convert both axes to a log scale:

```python
import matplotlib.pyplot as plt

g = plt.scatter(y_test1, y_pred_test_Forestreg)
g.axes.set_yscale('log')
g.axes.set_xscale('log')
g.axes.set_xlabel('True Values')
g.axes.set_ylabel('Predictions')
g.axes.axis('equal')
g.axes.axis('square')
```

Or, even better, plot the difference between the true and predicted values (i.e. the prediction errors):

```python
g = plt.plot(y_test1 - y_pred_test_Forestreg, marker='o', linestyle='')
```
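To try both options without the original data, here is a minimal sketch on synthetic values. The function name `plot_true_vs_pred`, the random data, and the `y = x` and zero reference lines are assumptions added for illustration; they are not part of the answer above. A log scale only works for strictly positive values, so the synthetic data starts at 1 rather than 0.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_true_vs_pred(y_true, y_pred):
    """Draw a log-log scatter and a prediction-error plot side by side."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    fig, (ax_scatter, ax_resid) = plt.subplots(ncols=2, figsize=(10, 4))

    # Option 1: predictions against true values on log-log axes.
    ax_scatter.scatter(y_true, y_pred, s=15)
    lims = [min(y_true.min(), y_pred.min()), max(y_true.max(), y_pred.max())]
    ax_scatter.plot(lims, lims, color='black', linewidth=1)  # y = x reference
    ax_scatter.set_xscale('log')
    ax_scatter.set_yscale('log')
    ax_scatter.set_xlabel('True Values')
    ax_scatter.set_ylabel('Predictions')

    # Option 2: prediction errors against the sample index.
    ax_resid.plot(residuals, marker='o', linestyle='')
    ax_resid.axhline(0, color='black', linewidth=1)  # zero-error reference
    ax_resid.set_xlabel('Sample index')
    ax_resid.set_ylabel('True - predicted')

    return fig, residuals


# Synthetic, strictly positive data (an assumption, not the asker's data).
rng = np.random.default_rng(0)
y_true = rng.uniform(1, 60_000, size=200)
y_pred = y_true * rng.lognormal(mean=0.0, sigma=0.1, size=200)

fig, residuals = plot_true_vs_pred(y_true, y_pred)
```

Call `plt.show()` afterwards to display the figure. Points close to the diagonal in the left panel, or close to zero in the right panel, are well predicted.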

## How to plot a graph of actual vs predict values in python?

The problem seems to be that you mix `y_test` and `y_pred` into one "plot" (meaning here a single call to the `scatter()` function).

With both `scatter()` and `plot()` (which you also mixed up), the first parameter holds the coordinates on the x-axis and the second parameter holds the coordinates on the y-axis.
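The argument order can be checked directly. The sketch below uses made-up values and reads the plotted coordinates back with `get_offsets()`, which returns one `(x, y)` pair per point.

```python
import matplotlib.pyplot as plt

# Made-up values, for illustration only.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.5, 6.0]

fig, ax = plt.subplots()

# First argument -> x-axis, second argument -> y-axis.
points = ax.scatter(y_true, y_pred)
ax.set_xlabel('y_true')
ax.set_ylabel('y_pred')

# Each row is one (x, y) pair exactly as it was drawn.
offsets = points.get_offsets()
```

Here every row of `offsets` pairs a true value (x) with its prediction (y), which is a different plot from the index-based one described next.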

So 1.) you need one `scatter()` call with only `y_test` and another with only `y_pred`. To do this, 2.) you need either 2D data or, as seems to be your case, indexes for the x-axis, which you can generate with `range()`.

Here is some code with random data that might get you started:

```python
import matplotlib.pyplot as plt
import numpy as np

def plotGraph(y_test, y_pred, regressorName):
    if max(y_test) >= max(y_pred):
        my_range = int(max(y_test))
    else:
        my_range = int(max(y_pred))
    plt.scatter(range(len(y_test)), y_test, color='blue')
    plt.scatter(range(len(y_pred)), y_pred, color='red')
    plt.title(regressorName)
    plt.show()
    return

y_test = range(10)
y_pred = np.random.randint(0, 10, 10)

plotGraph(y_test, y_pred, "test")
```

This produces a scatter plot with the true values in blue and the predictions in red, each plotted against its index.

## How to plot the predicted value against all features of a dataset

• Each feature must be plotted separately.
• Remember that `'price'` is the target, the dependent variable, and that `lin_reg.predict(xtrain)` is the predicted price from the training data.

```python
import matplotlib.pyplot as plt

# predicted price from xtrain
ypred_train = lin_reg.predict(xtrain)

# create the figure
fig, axes = plt.subplots(ncols=4, nrows=4, figsize=(20, 20))

# flatten the axes to make it easier to index
axes = axes.flatten()

# iterate through the column values, and use i to index the axes
for i, v in enumerate(xtrain.columns):

    # select the column to be plotted
    data = xtrain[v]

    # plot the actual price against the features
    axes[i].scatter(x=data, y=ytrain, s=35, ec='white', label='actual')

    # plot predicted prices against the features
    axes[i].scatter(x=data, y=ypred_train, c='pink', s=20, ec='white', alpha=0.5, label='predicted')

    # set the title and ylabel
    axes[i].set(title=f'Feature: {v}', ylabel='price')

# set a single legend
axes[12].legend(title='Price', bbox_to_anchor=(1.05, 1), loc='upper left')

# delete the last 3 unused axes
for v in range(13, 16):
    fig.delaxes(axes[v])
```

• If you were to plot everything in a single plot, it would be overcrowded and useless.
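The per-feature grid depends on `lin_reg`, `xtrain` and `ytrain` from the question, so here is a self-contained sketch of the same idea. Everything in it is an assumption made for illustration: the three synthetic features `f1` to `f3`, the synthetic target, and a NumPy least-squares fit standing in for `lin_reg.predict(xtrain)`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the training data: 3 features and a price-like target.
rng = np.random.default_rng(0)
xtrain = pd.DataFrame(rng.normal(size=(100, 3)), columns=['f1', 'f2', 'f3'])
ytrain = xtrain @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=100)

# Least-squares fit with an intercept, standing in for lin_reg.predict(xtrain).
design = np.column_stack([np.ones(len(xtrain)), xtrain.to_numpy()])
coef, *_ = np.linalg.lstsq(design, ytrain.to_numpy(), rcond=None)
ypred_train = design @ coef

# One subplot per feature; squeeze=False keeps axes 2D even with a single row.
fig, axes = plt.subplots(ncols=len(xtrain.columns), figsize=(12, 4), squeeze=False)
axes = axes.flatten()

for ax, col in zip(axes, xtrain.columns):
    # actual target against the feature
    ax.scatter(xtrain[col], ytrain, s=35, edgecolors='white', label='actual')
    # predicted target against the same feature
    ax.scatter(xtrain[col], ypred_train, c='pink', s=20, edgecolors='white',
               alpha=0.5, label='predicted')
    ax.set(title=f'Feature: {col}', xlabel=col, ylabel='price')

# a single legend on the last subplot
axes[-1].legend(title='Price')
```

Sizing the grid from `len(xtrain.columns)` avoids hard-coding the number of subplots and deleting unused axes afterwards.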

• You can also plot all the data with `seaborn.relplot` by melting `df1` from a wide to long format.
• However, it's more difficult to add the predicted values on top of a figure-level plot.

```python
import seaborn as sns

dfm = df1.melt(id_vars='price', value_vars=df1.columns[:-1], var_name='Feature')

p = sns.relplot(kind='scatter', data=dfm, x='value', y='price', height=3,
                col='Feature', col_wrap=4, facet_kws={'sharex': False})
```
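One possible workaround, sketched here under assumptions rather than taken from the answer, is to store the predictions as an extra column and melt twice, so that `hue` can separate actual from predicted values. The frame `df`, its columns `f1` and `f2`, and the `predicted` column (a placeholder for real model output) are all hypothetical.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical wide frame: two features, the target, and a 'predicted' column.
rng = np.random.default_rng(0)
df = pd.DataFrame({'f1': rng.normal(size=50), 'f2': rng.normal(size=50)})
df['price'] = 2 * df['f1'] - df['f2'] + rng.normal(scale=0.2, size=50)
df['predicted'] = 2 * df['f1'] - df['f2']  # placeholder for model output

# First melt: one row per (sample, feature).
dfm = df.melt(id_vars=['price', 'predicted'], var_name='Feature', value_name='value')

# Second melt: one row per (sample, feature, actual-or-predicted).
dfm = dfm.melt(id_vars=['Feature', 'value'], value_vars=['price', 'predicted'],
               var_name='kind', value_name='y')

# hue='kind' colors actual and predicted points differently in every facet.
p = sns.relplot(kind='scatter', data=dfm, x='value', y='y', hue='kind',
                col='Feature', col_wrap=2, height=3, facet_kws={'sharex': False})
```

The trade-off is a long frame with twice as many rows, and less control over per-series styling than with the explicit `matplotlib` loop.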