Programming Experiments: 2017

A Simple Template for Machine Learning in Python

A simple machine learning workflow in Python looks like this:

Load the dataset
Split the dataset into train and test subsets
Create a classifier for the classification task
Fit the classifier on the train subset
Predict the test labels from the test features
Measure the F1 score and the accuracy of the predictions

from sklearn import datasets
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train():
    # Load your data set, e.g. the sklearn digits dataset
    digits = datasets.load_digits()

    # Split the data set into random train and test subsets
    features_train, features_test, labels_train, labels_test = train_test_split(
        digits.data, digits.target, test_size=0.3, random_state=42)

    # Create a classifier, e.g. a DecisionTree classifier
    classifier = DecisionTreeClassifier(random_state=11)

    # Fit the classifier on the train subset
    classifier.fit(features_train, labels_train)

    # Use the trained model to make predictions against the test dataset
    predictions = classifier.predict(features_test)

    # Calculate the macro-averaged F1 score and the accuracy of the predictions
    f1_score = metrics.f1_score(labels_test, predictions, average="macro")
    accuracy = metrics.accuracy_score(labels_test, predictions)

    # print() is a function in Python 3
    print("F1 score =", f1_score)
    print("Accuracy =", accuracy)


if __name__ == "__main__":
    train()
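
To see what the two scores in the last step measure, here is a minimal sketch in plain Python. It is not scikit-learn's implementation: the helper names accuracy and macro_f1 and the toy labels are made up for illustration. It takes accuracy to be the fraction of predictions that match the true labels, and macro F1 to be the unweighted mean of the per-class F1 scores.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of the per-class F1 scores."""
    true_positives = Counter()
    false_positives = Counter()
    false_negatives = Counter()
    for t, p in zip(true_labels, predicted_labels):
        if t == p:
            true_positives[t] += 1
        else:
            # A wrong prediction is a false positive for the predicted class
            # and a false negative for the true class.
            false_positives[p] += 1
            false_negatives[t] += 1

    # Every class that appears in either list gets equal weight.
    classes = sorted(set(true_labels) | set(predicted_labels))
    scores = []
    for c in classes:
        # Per-class F1 = 2TP / (2TP + FP + FN)
        denominator = 2 * true_positives[c] + false_positives[c] + false_negatives[c]
        scores.append(2 * true_positives[c] / denominator)
    return sum(scores) / len(scores)


# Hypothetical toy labels, only for illustration
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
print("F1 score =", macro_f1(toy_true, toy_pred))
print("Accuracy =", accuracy(toy_true, toy_pred))
```

Because the macro average gives every class the same weight, it can differ noticeably from the accuracy when some classes are predicted much worse than others.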

Monday, April 17, 2017
