Naive Bayes Classification (Binary) - Supervised Learning

DrRakha

This example is a good starting point for applying machine learning to a classification problem. In the code snippet below, we apply supervised learning with the naive Bayes classifier, which is built on Bayes' theorem and the basics of conditional probability. The dataset used in the example is the Breast Cancer dataset, which we load with the sklearn function load_breast_cancer(). It holds records for 569 patients and 30 features computed from images of a fine needle aspirate taken from the area of concern. Some of the features are radius, texture, perimeter, area, smoothness, and compactness. To keep this example simple, we pick only the first two features. The target is binary, with two classes (malignant, benign). The dataset is split 50/50 into training and testing sets so that the trained classifier can be validated on data it has not seen. Because 569 is odd, the training set holds 284 patients and the testing set 285. We measure the outcome of the validation using performance measures such as precision, recall, and f-measure.
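As a quick sanity check of the dataset figures quoted above, the following minimal sketch loads the data and prints its shape, class names, first two feature names, and the sizes of a 50/50 split. The variable names (n_samples, class_names, first_two) are only illustrative and are not part of the original demo.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y = data.data, data.target

n_samples, n_features = X.shape
# target_names is ordered by label: index 0 is label 0, index 1 is label 1
class_names = [str(name) for name in data.target_names]
# Names of the two columns that the example keeps
first_two = [str(name) for name in data.feature_names[:2]]

# Same split settings as the main example
X_train, X_test, y_train, y_test = train_test_split(
    X[:, 0:2], y, test_size=0.50, random_state=42
)

print("samples:", n_samples, "features:", n_features)
print("classes:", class_names)
print("first two features:", first_two)
print("train size:", len(X_train), "test size:", len(X_test))
```

Printing the split sizes is a cheap way to confirm which half receives the extra record when the number of samples is odd.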

Code:

# https://jupyter.org/try
# Demo2
# M. S. Rakha, Ph.D.
# Post-Doctoral - Queen's University
# Supervised Learning - Naive Bayes Classification
from sklearn import datasets
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load the Breast Cancer dataset bundled with scikit-learn
breastCancer = datasets.load_breast_cancer()

# Class names, ordered by label (index 0 is label 0, index 1 is label 1)
classNames = list(breastCancer.target_names)

# Names of all features; only the first two are used below
variableNames = breastCancer.feature_names

# Only two features
X = breastCancer.data[:, 0:2]
y = breastCancer.target

# 50/50 split; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.50, random_state=42)
n_train = X_train[:, 0].size  # number of training samples
n_test = X_test[:, 0].size    # number of testing samples

# Train a Gaussian naive Bayes classifier on the training set
nb = GaussianNB()
nb.fit(X_train, y_train)

# Predict the class of every sample in the held-out testing set
y_pred = nb.predict(X_test)

# Precision, recall, f1-score, and support for each class
print(classification_report(y_test, y_pred))

This code uses a machine learning technique called naive Bayes classification to analyze the breast cancer data that ships with the sklearn library. The data is split into training and testing sets, and a naive Bayes model is trained on the training data. The model is then used to make predictions on the test data, and its performance is evaluated with the classification_report function from sklearn, which prints precision, recall, f1-score, and support for each label. Keep in mind that only two of the 30 available features are used here to keep the example simple. This discards most of the information in the dataset, so the scores should not be read as the best that can be achieved on this task.

Naive Bayes can be a good choice for a binary classification task such as breast cancer diagnosis when the dataset is small and the relationship between the features and the target variable is simple. It is a probabilistic algorithm that assumes the features are independent of each other given the class, which makes it computationally efficient and easy to implement. It tends to struggle, however, when the features are highly correlated or when the relationship between the features and the target variable is complex, and on large datasets more flexible models can outperform it. In this example, a model limited to two features gives only a partial picture of the data and may be less accurate than one trained on more of them.
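One way to explore the points above is to train the same classifier on two features and on all 30, and to look at how strongly some features are correlated. The sketch below is only an illustration under the same split settings as the main example; the helper holdout_accuracy and the choice of "mean perimeter" as a comparison feature are our own assumptions, not part of the original demo.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
X_all, y = data.data, data.target


def holdout_accuracy(X, y):
    """Fit GaussianNB on a 50/50 split and return accuracy on the test half."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.50, random_state=42
    )
    model = GaussianNB().fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


acc_two = holdout_accuracy(X_all[:, 0:2], y)  # first two features only
acc_all = holdout_accuracy(X_all, y)          # all 30 features

# Pearson correlation between the two features used in the example
corr_two = np.corrcoef(X_all[:, 0], X_all[:, 1])[0, 1]

# Correlation between radius and perimeter, two geometrically related features
names = list(data.feature_names)
i_radius = names.index("mean radius")
i_perimeter = names.index("mean perimeter")
corr_geom = np.corrcoef(X_all[:, i_radius], X_all[:, i_perimeter])[0, 1]

print("accuracy, two features:", round(acc_two, 3))
print("accuracy, all features:", round(acc_all, 3))
print("correlation, first two features:", round(corr_two, 3))
print("correlation, radius vs perimeter:", round(corr_geom, 3))
```

A single train/test split is a rough comparison at best. Repeating it with several values of random_state, or using cross-validation, would give a steadier picture before drawing conclusions.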

Below are the results of running this Python code in a Jupyter notebook:

Output:

              precision    recall  f1-score   support

           0       0.93      0.76      0.83        98
           1       0.88      0.97      0.92       187

    accuracy                           0.89       285
   macro avg       0.90      0.86      0.88       285
weighted avg       0.90      0.89      0.89       285
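The per-class figures in a report like this one come from a confusion matrix. The sketch below rebuilds the same model and derives precision, recall, and f1-score for label 1 by hand. Treating label 1 as the "positive" class is our choice for the illustration, and the variable names are not from the original demo.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
X, y = data.data[:, 0:2], data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.50, random_state=42
)
y_pred = GaussianNB().fit(X_train, y_train).predict(X_test)

# Rows are true labels, columns are predicted labels, both in the order 0, 1
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])

# With label 1 as the positive class, the flattened matrix reads tn, fp, fn, tp
tn, fp, fn, tp = cm.ravel()

precision_1 = tp / (tp + fp)  # of the samples predicted as 1, how many are 1
recall_1 = tp / (tp + fn)     # of the samples that are 1, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(cm)
print("precision:", round(precision_1, 2))
print("recall:", round(recall_1, 2))
print("f1-score:", round(f1_1, 2))
```

The support column is simply the number of true samples of each class in the testing set, that is, the row sums of the confusion matrix.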

Naive Bayes is well suited to binary classification problems, a type of supervised learning that aims to predict one of two possible outcomes. The algorithm uses probability estimates to make predictions, and it is particularly useful when the dataset is small or when the number of features is large. Because it assumes that the features are independent of each other given the class, it can handle a large number of features more efficiently than many other algorithms. However, when the features are highly correlated, or the relationship between the features and the target variable is complex, naive Bayes may not perform as well, and other algorithms should be considered.
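To make the probability estimates and the independence assumption concrete, here is a minimal from-scratch sketch of Gaussian naive Bayes. It scores each class as log P(class) plus the sum over features of log P(feature | class), with each feature modelled as a normal distribution, and picks the class with the highest score. This is a simplified illustration, not sklearn's implementation: GaussianNB also adds a small smoothing term to the variances, so the two sets of predictions may differ slightly. The function name log_posterior_scores is our own.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
X, y = data.data[:, 0:2], data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.50, random_state=42
)

# Estimate a prior, and a per-feature mean and variance, for each class
classes = np.unique(y_train)
log_priors = np.array([np.log(np.mean(y_train == c)) for c in classes])
means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
variances = np.array([X_train[y_train == c].var(axis=0) for c in classes])


def log_posterior_scores(X):
    """Return unnormalised log posteriors, shape (n_samples, n_classes)."""
    # diff has shape (n_samples, n_classes, n_features)
    diff = X[:, None, :] - means[None, :, :]
    # Log of the normal density for every sample, class, and feature
    log_density = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                          + diff ** 2 / variances[None, :, :])
    # Independence assumption: per-feature log densities are simply summed
    return log_priors[None, :] + log_density.sum(axis=2)


manual_pred = classes[np.argmax(log_posterior_scores(X_test), axis=1)]
sk_pred = GaussianNB().fit(X_train, y_train).predict(X_test)

agreement = np.mean(manual_pred == sk_pred)
print("agreement with GaussianNB:", agreement)
```

Summing per-feature log densities is where the "naive" assumption enters. When two features carry nearly the same information, their evidence is counted twice, which is one reason correlated features can hurt this model.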
