Monday, April 17, 2017

A Simple Template for Machine Learning in Python


A basic machine learning workflow in Python, shown here for a classification task with scikit-learn, follows six steps:

  1. Load the dataset
  2. Split the dataset into training and test subsets
  3. Create a classifier for the classification task
  4. Fit the classifier on the training subset
  5. Predict the test labels from the test features
  6. Measure the accuracy and F1 score of the predictions

from sklearn import datasets
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train():
    # Load your data set, e.g. the sklearn digits dataset
    digits = datasets.load_digits()

    # Split the data set into random train and test subsets
    features_train, features_test, labels_train, labels_test = train_test_split(
        digits.data, digits.target, test_size=0.3, random_state=42
    )

    # Create a classifier, e.g. a DecisionTree classifier
    classifier = DecisionTreeClassifier(random_state=11)

    # Fit the classifier on the training subset
    classifier.fit(features_train, labels_train)

    # Use the trained model to make predictions against the test dataset
    predictions = classifier.predict(features_test)

    # Evaluate the predictions: macro-averaged F1 score and accuracy
    f1_score = metrics.f1_score(labels_test, predictions, average="macro")
    accuracy = metrics.accuracy_score(labels_test, predictions)

    # print() is a function in Python 3
    print("F1 score =", f1_score)
    print("Accuracy =", accuracy)


if __name__ == "__main__":
    train()
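
Because the template above hard-codes its classifier, one possible variation is to pass the classifier in as an argument so the same six steps can be reused with other scikit-learn estimators. The sketch below is only an illustration under that assumption: the function name `evaluate`, its parameters, and the choice of `KNeighborsClassifier` as a second model are not part of the original template.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def evaluate(classifier, test_size=0.3, random_state=42):
    """Run the six-step flow for a given classifier (hypothetical helper).

    Returns a tuple (accuracy, macro-averaged F1 score).
    """
    # Steps 1 and 2: load the digits dataset and split it
    digits = datasets.load_digits()
    features_train, features_test, labels_train, labels_test = train_test_split(
        digits.data, digits.target, test_size=test_size, random_state=random_state
    )

    # Steps 3 and 4: the classifier is supplied by the caller; fit it
    classifier.fit(features_train, labels_train)

    # Step 5: predict labels for the held-out test features
    predictions = classifier.predict(features_test)

    # Step 6: score the predictions
    accuracy = metrics.accuracy_score(labels_test, predictions)
    macro_f1 = metrics.f1_score(labels_test, predictions, average="macro")
    return accuracy, macro_f1


if __name__ == "__main__":
    # Compare two classifiers on the same train/test split
    for clf in (DecisionTreeClassifier(random_state=11), KNeighborsClassifier()):
        accuracy, macro_f1 = evaluate(clf)
        print(type(clf).__name__, "accuracy =", accuracy, "F1 =", macro_f1)
```

Returning the scores instead of printing them inside the function makes it easier to compare models or to check the results in a test.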