Save classifier to disk in scikit-learn
Classifiers are just objects that can be pickled and dumped like any other. To continue your example:
import pickle  # on Python 2 this was cPickle; Python 3 has only pickle

# save the classifier
with open('my_dumped_classifier.pkl', 'wb') as fid:
    pickle.dump(gnb, fid)

# load it again
with open('my_dumped_classifier.pkl', 'rb') as fid:
    gnb_loaded = pickle.load(fid)
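In the snippet above, `gnb` is assumed to be an already fitted classifier from the question. A minimal self-contained round trip looks like this; a plain stand-in class is used here instead of a fitted estimator so the sketch runs on its own, but the save/load calls are identical for any picklable object:

```python
import pickle
import os
import tempfile

# Stand-in for a fitted classifier; any picklable object round-trips the same way.
class StubClassifier:
    def __init__(self, classes):
        self.classes_ = classes

    def predict(self, X):
        # Trivial "model": always predict the first class.
        return [self.classes_[0] for _ in X]

clf = StubClassifier(classes=["spam", "ham"])
path = os.path.join(tempfile.mkdtemp(), "my_dumped_classifier.pkl")

# save the classifier
with open(path, "wb") as fid:
    pickle.dump(clf, fid)

# load it again and check it behaves identically
with open(path, "rb") as fid:
    clf_loaded = pickle.load(fid)

print(clf_loaded.classes_)             # ['spam', 'ham']
print(clf_loaded.predict([[0], [1]]))  # ['spam', 'spam']
```

The loaded object is a full copy of the original, so anything learned before dumping (here, `classes_`) survives the round trip.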
Edit: if you are using an sklearn Pipeline that contains custom transformers which cannot be serialized by pickle (nor by joblib), Neuraxle's custom ML pipeline saving is one solution: it lets you define your own step savers on a per-step basis. On save, each step's saver is called if one is defined; steps without a saver fall back to joblib by default.
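To see why plain pickle fails on such steps: pickle serializes functions by importable name, so a transformer that holds a lambda (or any locally defined function) cannot be dumped, while the same transformer holding a module-level function can. A minimal sketch with a hypothetical `FuncTransformer`:

```python
import pickle

# A transformer-like object that stores a callable. Pickle records functions
# by qualified name, and a lambda has no importable name, so dumping fails.
class FuncTransformer:
    def __init__(self, func):
        self.func = func

    def transform(self, X):
        return [self.func(x) for x in X]

step = FuncTransformer(lambda x: x * 2)

try:
    pickle.dumps(step)
    failed = False
except (pickle.PicklingError, AttributeError, TypeError):
    failed = True

print(failed)  # True: this step would need a custom saver (or a named function)

# The same transformer with a module-level (named) function pickles fine:
def double(x):
    return x * 2

restored = pickle.loads(pickle.dumps(FuncTransformer(double)))
print(restored.transform([1, 2]))  # [2, 4]
```

This is the situation per-step savers are designed for: the unpicklable step gets a custom saver, and every other step keeps the default serialization.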
How to save classifier in sklearn with Countvectorizer() and TfidfTransformer()
Following MaximeKan's suggestion, I found a way to save all three objects together.
Saving the model and the vectorizers:

import pickle

with open(filename, 'wb') as fout:
    pickle.dump((movieVzer, movieTfmer, clf), fout)
Loading the model and the vectorizers for use:

import pickle

with open('finalized_model.pkl', 'rb') as f:
    movieVzer, movieTfmer, clf = pickle.load(f)
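The trick above is that a single dump/load call can carry any picklable container. A self-contained sketch of the same pattern, with plain dicts standing in for the vectorizer, transformer, and classifier (the calls are identical for real fitted objects):

```python
import os
import pickle
import tempfile

# Stand-ins for CountVectorizer, TfidfTransformer, and the classifier.
movieVzer = {"vocab": ["good", "bad"]}
movieTfmer = {"idf": [1.0, 2.0]}
clf = {"coef": [0.5, -0.5]}

filename = os.path.join(tempfile.mkdtemp(), "finalized_model.pkl")

# One dump call stores all three objects as a single tuple...
with open(filename, "wb") as fout:
    pickle.dump((movieVzer, movieTfmer, clf), fout)

# ...and one load call unpacks them again, in the same order.
with open(filename, "rb") as f:
    vzer2, tfmer2, clf2 = pickle.load(f)

print(vzer2 == movieVzer, tfmer2 == movieTfmer, clf2 == clf)  # True True True
```

The only thing to keep consistent is the order of the tuple: you must unpack on load in the same order you packed on save.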
Save scikit-learn model without datasets
The persistent representation of Scikit-Learn estimators DOES NOT include any training data.
Speaking of decision trees and their ensembles (such as random forests), the size of the estimator object scales quadratically with the depth of the decision trees (i.e. the max_depth parameter). This is because each decision tree's configuration is represented using (max_depth, max_depth) matrices of float64 values.
You can make your random forest objects smaller by limiting the max_depth parameter. If you're worried about a potential loss of predictive performance, you can increase the number of child estimators to compensate.
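A quick way to compare candidate models' persisted sizes before committing one to disk is len(pickle.dumps(...)). A sketch below; since a fitted forest is not available here, nested lists stand in for the per-tree depth-by-depth matrices, but the measurement technique is the same for real estimators:

```python
import pickle

def pickled_size(obj):
    """Size in bytes of the object's pickle: a proxy for on-disk model size."""
    return len(pickle.dumps(obj))

# Stand-ins for per-tree (max_depth, max_depth) float matrices:
# a "forest" of 10 square matrices at two different depths.
def fake_forest(n_trees, depth):
    return [[[0.0] * depth for _ in range(depth)] for _ in range(n_trees)]

shallow = fake_forest(10, depth=8)
deep = fake_forest(10, depth=32)

# The deeper "trees" serialize to a much larger payload.
print(pickled_size(shallow) < pickled_size(deep))  # True
```

Running this comparison on two candidate forests (say, max_depth=8 vs. unlimited) makes the size/accuracy trade-off concrete before you pick a persistence strategy.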
Longer term, you may wish to explore alternative representations for Scikit-Learn models, for example converting them to the PMML data format using the SkLearn2PMML package.
Export sklearn classifier to reference it in other scripts
This code snippet will work for you:
import pickle
# save the model to disk
filename = 'finalized_model.sav'
pickle.dump(clf, open(filename, 'wb'))
# some time later...
# load the model from disk
loaded_model = pickle.load(open(filename, 'rb'))
result = loaded_model.score(X_test, Y_test)
print(result)
From this source. Your question is a duplicate.
How to use pickle to save sklearn model
Save:
import pickle
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
Load:
with open("model.pkl", "rb") as f:
    model = pickle.load(f)
In the specific case of scikit-learn, it may be better to use joblib’s
replacement of pickle (dump & load), which is more efficient on
objects that carry large numpy arrays internally as is often the case
for fitted scikit-learn estimators:
Save:
import joblib
joblib.dump(model, "model.joblib")
Load:
model = joblib.load("model.joblib")