How to convert a Scikit-learn dataset to a Pandas dataset
Manually, you can use the pd.DataFrame constructor, giving it a numpy array (data) and a list of the column names (columns).
To have everything in one DataFrame, you can concatenate the features and the target into one numpy array with np.c_[...] (note the square brackets, not parentheses):
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
# save load_iris() sklearn dataset to iris
# if you'd like to check dataset type use: type(load_iris())
# if you'd like to view list of attributes use: dir(load_iris())
iris = load_iris()
# np.c_ stacks arrays column-wise (along the second axis);
# here it joins the iris['data'] and iris['target'] arrays into one array
# for the pandas columns argument: append a one-element list to iris['feature_names'];
# you can name the new column anything you'd like
# (the original dataset would probably call it 'Species')
data1 = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
columns= iris['feature_names'] + ['target'])
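One side effect of np.c_ is that the integer target gets cast to float so that it fits in the same array as the features. As an alternative sketch (the names iris_df and species are illustrative, not from the original answer), you can build the DataFrame from the features alone and then attach the target as a separate labelled column:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Build the DataFrame from the feature matrix only, so the features stay float
iris_df = pd.DataFrame(iris["data"], columns=iris["feature_names"])

# Map the integer class codes (0, 1, 2) to their names from iris["target_names"]
iris_df["species"] = pd.Categorical.from_codes(iris["target"], iris["target_names"])

print(iris_df.head())
```

This keeps the class labels readable and avoids storing them as floats.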
Convert sklearn diabetes dataset into pandas DataFrame
According to the scikit-learn documentation, you can pass as_frame=True to have the loader return the data as pandas objects instead of numpy arrays.
https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html
from sklearn.datasets import load_diabetes
data = load_diabetes(as_frame=True)
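The returned object is still a Bunch, not a DataFrame itself; the pandas objects live in its attributes. A minimal sketch of how to get at them (the variable names are illustrative, and as_frame assumes a reasonably recent scikit-learn):

```python
from sklearn.datasets import load_diabetes

diabetes = load_diabetes(as_frame=True)

features = diabetes.data    # DataFrame with the feature columns
target = diabetes.target    # Series with the target values
full = diabetes.frame       # DataFrame with features and a 'target' column

print(full.shape)
print(full.columns.tolist())
```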
How to convert generated data into a pandas DataFrame
The first entry of the tuple contains the feature data and the second entry contains the class labels. So if you want to make a pd.DataFrame of the feature data, you should use pd.DataFrame(df[0], columns=["1","2","3","4","5","6","7","8","9"]).
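As a fuller sketch, assuming the data was generated with sklearn.datasets.make_classification (the answer above does not name the generator, so this is an assumption, as are the parameter values and the column name label), you can unpack the tuple and keep the labels alongside the features:

```python
import pandas as pd
from sklearn.datasets import make_classification

# Hypothetical generator call: 9 features to match the 9 column names above
X, y = make_classification(n_samples=100, n_features=9, random_state=0)

# X is the first entry of the tuple (features), y the second (class labels)
generated = pd.DataFrame(X, columns=[str(i) for i in range(1, 10)])
generated["label"] = y

print(generated.head())
```

Unpacking into X and y up front avoids indexing the tuple with [0] and [1] later on.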
How do I create test and train samples from one dataframe with pandas?
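I would just use numpy's rand to build a random boolean mask (randn is only used here to generate the sample data). Each row lands in the training set with probability 0.8, so the split is approximately, not exactly, 80/20: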
In [11]: df = pd.DataFrame(np.random.randn(100, 2))
In [12]: msk = np.random.rand(len(df)) < 0.8
In [13]: train = df[msk]
In [14]: test = df[~msk]
And just to see this has worked:
In [15]: len(test)
Out[15]: 21
In [16]: len(train)
Out[16]: 79
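If you need an exact 80/20 split rather than an approximate one, one option is to sample the training rows with DataFrame.sample and take the remaining rows as the test set. This is a sketch of an alternative, not the approach used above, and the seed values are arbitrary:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 2)))

# Draw exactly 80% of the rows at random for training
train = df.sample(frac=0.8, random_state=0)

# Everything not drawn for training becomes the test set
test = df.drop(train.index)

print(len(train), len(test))
```

Fixing random_state makes the split reproducible between runs.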