Machine Learning Model for Brain Tumor Analysis

Integrating an A.I.-assisted tool into a provider’s workflow will reduce the analysis time and mitigate misdiagnoses.

Gaurav Nukala
Published in Product Coalition
9 min read · Mar 4, 2023

Summary

  • A large imbalance between the number of patients and the number of neurosurgeons places a heavy burden on providers.
  • Large diagnostic data volumes can hinder neurosurgeons in precisely identifying tumors and their segmentation, leading to unintended consequences if misdiagnosed.
  • The traditional method involves sending the tissue sample to a lab, freezing, staining, and microscopically examining it, taking 20–30+ minutes¹.
  • AI-assisted analysis of laser-generated brain tissue images can shorten the tissue analysis process to 2–3 minutes.
  • A trained ML model achieved an 84% recall rate for glioma, the most critical metric here.

Background

Brain tumor

A brain tumor is an abnormal growth or mass of tissue in the brain that can be either benign (noncancerous) or malignant (cancerous). Brain tumors can be primary, meaning they originate in the brain, or secondary, meaning they spread from other parts of the body.

In the image below, the dark ovals are tumor cells, among nerve fibers that appear as white streaks, indicating a malignant tumor called a diffuse glioma.

Glioma tumor cells among nerve fibers (image source: https://www.nytimes.com/2020/01/06/health/artificial-intelligence-brain-cancer.html)

Different primary brain tumor types get their name from the kind of cells involved. The main types of brain tumors are:

  • Gliomas
  • Meningiomas
  • Pituitary tumors
  • Neuromas

According to the American Association of Neurological Surgeons, tumors have different potencies, and not all are malignant. Among the types above, gliomas are malignant tumors, while meningiomas and pituitary tumors are typically benign.

Examples of a few tumors are shown below:

Glioma tumor
Meningiomas

Current challenges

Shortage of Neurosurgeons

Currently, there are approximately 3,689 neurosurgeons who are practicing and board-certified in over 5,700 hospitals in the United States. They are responsible for serving a population of more than 311 million people. However, as the population ages and more individuals encounter neurological issues such as stroke, degenerative spine disease, Parkinson’s disease, and other movement disorders, the existing gap between supply and demand for neurosurgical services will become even more pronounced.

Slow tissue analysis process

Tissue analysis is when a neurosurgeon examines a sample of tissue taken from the brain or nervous system during surgery. The tissue analysis helps diagnose the underlying condition or disease that may affect the patient’s brain or nervous system.

The conventional technique, which involves shipping the tissue to a laboratory, freezing and staining it, and subsequently examining it under a microscope, typically requires 20 to 30 minutes or more.

Confirmation bias and oversight

Modern diagnostic techniques generate large volumes of data, making it harder for a human to accurately diagnose the presence of a tumor and its segmentation (location and extent).

Neurosurgeons sometimes misdiagnose because of confirmation bias, which can lead to unintended consequences. Doctors using traditional techniques can also miss important details, such as the spread of a tumor along nerve fibers.

How can A.I. help with the current challenges?

A.I. can help with the current challenges in the following ways —

  • At a crunch time in the operating room, an A.I. engine can deliver a timely diagnosis, potentially saving lives.
  • A.I. can provide the objectivity that a doctor needs, i.e., it would identify misdiagnoses.
  • A.I. can help doctors prioritize cases.

However, A.I. is not a replacement for a doctor; it serves as an assistant. The cost of misclassifying an image, whether as a false positive or a false negative, is high.

  • What if a patient has a tumor, but the algorithm classifies it as “no tumor”? The cost is a missed early detection, which could be fatal.
  • What if a patient has no tumor, but the algorithm classifies it as a “tumor”? The cost is the emotional pain to the patient.

As noted above, A.I. is not a replacement for a neurosurgeon. Below is how I envision an A.I. tool fitting into a provider’s workflow.

A.I. fitting into a provider’s clinical workflow

Before I get into the ML model, here is a primer on understanding the performance of an ML model. If you are unfamiliar with the terms confusion matrix, precision, and recall, refer to the next section.

A primer on confusion matrix

What is a confusion matrix, and why is it important for understanding model performance?

A confusion matrix is an essential tool for evaluating the performance of a classification model because it provides a more detailed view of the model’s performance than simple accuracy measures. It summarizes the number of correct and incorrect predictions made by the model on a test data set.

The confusion matrix has four cells, each representing a possible outcome of a binary classification problem:

  • True Positive (TP): The model correctly predicted the positive class.
  • False Positive (FP): The model predicted the positive class, but it was negative.
  • False Negative (FN): The model predicted the negative class, but it was positive.
  • True Negative (TN): The model correctly predicted the negative class.

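The confusion matrix can be represented as follows:

                    Predicted positive     Predicted negative
  Actual positive   True Positive (TP)     False Negative (FN)
  Actual negative   False Positive (FP)    True Negative (TN)

Two ratios are especially important when evaluating the performance of a model: precision and recall.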

Precision answers the question — What proportion of positive identifications was actually correct?

Recall answers the question — What proportion of actual positives was identified correctly?
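
To make the two definitions concrete, here is a minimal sketch that computes both ratios from confusion-matrix counts. The function name `precision_recall` and the counts are hypothetical, chosen only to illustrate the arithmetic; they are not results from the model in this article.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts."""
    # precision: of everything predicted positive, how much was truly positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # recall: of everything truly positive, how much the model found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, for illustration only
precision, recall = precision_recall(tp=84, fp=21, fn=16)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Note that true negatives do not appear in either ratio, which is why both remain informative even when the negative class dominates.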

In the case of brain tumors, the two kinds of error carry very different costs:

  • False negative: the patient has a tumor, but the algorithm classifies the scan as “no tumor”. The cost is a missed early detection, which could be fatal.
  • False positive: the patient has no tumor, but the algorithm classifies the scan as a “tumor”. The cost is the emotional pain to the patient.

To fully evaluate the effectiveness of a model, you must examine both precision and recall. Unfortunately, precision and recall are often in tension. That is, improving precision typically reduces recall and vice versa. But, in the case of tumor detection, a model with a high recall rate for malignant tumors is more critical than one with high precision.
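
The tension between precision and recall can be sketched with a toy binary example. The scores and labels below are made up for illustration, and `precision_recall_at` is a hypothetical helper; the point is only that lowering the decision threshold catches more true tumors (higher recall) at the price of more false alarms (lower precision).

```python
import numpy as np

# Hypothetical model scores (probability of "tumor") and true labels (1 = tumor)
scores = np.array([0.95, 0.85, 0.70, 0.60, 0.45, 0.40, 0.30, 0.10])
labels = np.array([1, 1, 0, 1, 1, 0, 0, 0])


def precision_recall_at(threshold):
    """Precision and recall when every score >= threshold is called a tumor."""
    predicted = scores >= threshold
    tp = np.sum(predicted & (labels == 1))
    fp = np.sum(predicted & (labels == 0))
    fn = np.sum(~predicted & (labels == 1))
    return float(tp / (tp + fp)), float(tp / (tp + fn))


for threshold in (0.8, 0.5, 0.35):
    p, r = precision_recall_at(threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, moving the threshold from 0.8 down to 0.35 raises recall from 0.50 to 1.00 while precision drops from 1.00 to 0.67.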

Convolutional Neural Networks (CNN) model

Goal: Develop a scalable CNN model from the sample dataset to identify Glioma with a high recall rate. Gliomas make up about 74% of malignant tumors.

Dataset: I was fortunate to have a dataset from an MIT course I took. The dataset consists of 2881 training and 402 test grayscale images taken from MRI scans. The images fall into the following categories:

  • Glioma tumor — A tumor that occurs in the brain and spinal cord.
  • Meningioma tumor — A tumor that arises from the membranes surrounding the brain and spinal cord.
  • No tumor — There is no tumor in the brain.
  • Pituitary tumor — Tumor in the pituitary gland that doesn’t spread beyond the skull.

Metrics for evaluation:

  • Cross entropy loss and Accuracy
  • Recall rate for Glioma

Exploratory data analysis: Below are a few key findings from my data exploration —

  • Two datasets are available, training and test, each with four subsets, one per class: Glioma, Meningioma, Pituitary, No tumor.
  • The training set is imbalanced, i.e., the number of images in the no tumor category is about 48.6% lower than those in any of the other types.
  • The aspect ratio of the original images is 1 (512 x 512)
  • Each image has three channels, but upon visual inspection, a single grayscale channel would suffice.
  • A visual inspection of images reveals that scans are not uniform in terms of cross-section, likely because tumor occurs in different parts of the brain. In addition, images are not further labeled on the planes — Axial, Coronal, Sagittal.
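
A quick way to check for the class imbalance described above is to count labels per class. This is a rough sketch under assumptions: the label order in `CATEGORIES` is assumed, `class_distribution` is a hypothetical helper, and `y_demo` is synthetic data rather than the real dataset.

```python
import numpy as np

# Assumed label order; the real mapping depends on how the data was loaded
CATEGORIES = ["glioma", "meningioma", "no_tumor", "pituitary"]


def class_distribution(y, num_classes=4):
    """Return per-class counts and each class's share of the total."""
    counts = np.bincount(y, minlength=num_classes)
    return counts, counts / counts.sum()


# Synthetic integer labels, for illustration only
y_demo = np.array([0] * 8 + [1] * 8 + [2] * 4 + [3] * 8)
counts, shares = class_distribution(y_demo)

for name, count, share in zip(CATEGORIES, counts, shares):
    print(f"{name:<12} {count:>3}  ({share:.1%})")

# How far the smallest class falls below the largest one
shortfall = 1 - counts.min() / counts.max()
print(f"smallest class is {shortfall:.0%} below the largest")
```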

Pre-processing: Below are a few key steps in pre-processing the data:

  • Reading each image as a grayscale array using the cv2 package.
  • Downsizing each image from 512x512 to 150x150, which retains the aspect ratio.
  • Converting the target labels into 4 categories using one-hot encoding.
  • Normalizing the pixel values so that convergence is faster.
  • Splitting the training data early to set aside a validation set.
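
The normalization and validation-split steps can be sketched with NumPy alone. This is a minimal illustration, assuming 8-bit pixel values and a 20% validation fraction; `normalize_and_split`, the fraction, and the seed are my assumptions, and the images here are random stand-ins rather than real scans.

```python
import numpy as np


def normalize_and_split(images, labels, val_fraction=0.2, seed=42):
    """Scale 8-bit pixels to [0, 1] and hold out a random validation subset."""
    x = images.astype("float32") / 255.0
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))          # shuffle before splitting
    n_val = int(len(x) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return x[train_idx], labels[train_idx], x[val_idx], labels[val_idx]


# Random stand-in for ten 150x150 single-channel scans
images = np.random.default_rng(0).integers(
    0, 256, size=(10, 150, 150, 1), dtype=np.uint8
)
labels = np.arange(10) % 4
X_train, y_train, X_val, y_val = normalize_and_split(images, labels)
print(X_train.shape, X_val.shape)
```

Splitting before any tuning keeps the validation images out of every later modeling decision.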

Sample code for one-hot encoding is embedded below:

from tensorflow import keras

# creating one-hot encoded representation of target labels
# we can do this by using this utility function - https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical

y_train_encoded = keras.utils.to_categorical(y_train, 4)
y_val_encoded = keras.utils.to_categorical(y_val, 4)
y_test_encoded = keras.utils.to_categorical(y_test, 4)
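
To see what one-hot encoding produces, here is a small NumPy sketch that mirrors the idea for integer labels. The `one_hot` helper and `y_demo` are hypothetical and only illustrate the output shape; the article's pipeline uses the Keras utility above.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Map integer class labels to rows of an identity matrix."""
    return np.eye(num_classes, dtype="float32")[labels]


y_demo = np.array([0, 2, 3, 1])   # four samples, one per class
encoded = one_hot(y_demo, 4)
print(encoded)
```

Each row contains a single 1 in the column of its class, which is the target format that a 4-way softmax output with categorical cross-entropy expects.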

Modeling Approach: Below are a few key steps for creating the model:

  • Run a Convolutional Neural Network (CNN) on the training data and evaluate the model against the metrics.
  • Tune the model by varying the number of conv. layers, drop-out ratios, batch sizes, and epochs.
  • Explore Keras Tuner to find suitable model hyperparameters.
  • Combine the provided train and test data, re-split, and retrain the model.

Sample code for a baseline CNN model (first conv layer with 16 filters) with leaky ReLU as the activation function:

import itertools

import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
# LeakyReLU is imported from keras.layers; the older keras.layers.advanced_activations path is not available in recent Keras releases
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Activation, Dropout, SpatialDropout2D, LeakyReLU
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

#### creating a baseline model without any regularization
# initialized a sequential model
model_1 = Sequential()

# adding first conv layer with 16 filters and kernel size 3; padding "same" keeps the output size equal to the input size,
# and input_shape is the dimension of the grayscale MRI images
model_1.add(Conv2D(filters=16, kernel_size=3, padding="same", input_shape=(IMG_SIZE, IMG_SIZE, 1)))

# adding leaky relu activation function with negative slope of 0.1
model_1.add(LeakyReLU(0.1))

# adding second conv layer with 32 filters and with kernel size 3
model_1.add(Conv2D(filters=32, kernel_size=3, padding='same'))

# adding leaky relu activation function with negative slope of 0.1
model_1.add(LeakyReLU(0.1))

# optional third conv layer with 64 filters and kernel size 3 (disabled in this baseline)
#model_1.add(Conv2D(filters=64, kernel_size=3, padding='same'))

# adding max pooling to reduce the size of output of second conv layer
model_1.add(MaxPooling2D(pool_size=2))

# flattening the 3-d output of the second conv layer after max pooling to make it ready for creating dense connections with the output layer for predictions
model_1.add(Flatten())

# adding a fully connected dense layer with 256 neurons
model_1.add(Dense(256))

# adding leaky relu activation function with negative slope of 0.1
model_1.add(LeakyReLU(0.1))

# adding the output layer with 4 neurons (one per class) and softmax activation since this is a multi-class classification problem
model_1.add(Dense(4, activation='softmax'))
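
As a sanity check on the architecture above, the number of trainable parameters can be worked out by hand. This sketch assumes `IMG_SIZE = 150` (taken from the pre-processing step) and uses two hypothetical helpers, `conv_params` and `dense_params`; the LeakyReLU, pooling, and flatten layers contribute no parameters.

```python
IMG_SIZE = 150  # assumed from the 150x150 pre-processing step


def conv_params(kernel, in_channels, filters):
    """Weights (kernel * kernel * in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


conv1 = conv_params(3, 1, 16)     # grayscale input, 16 filters
conv2 = conv_params(3, 16, 32)    # 16 input channels, 32 filters
pooled = IMG_SIZE // 2            # "same" padding keeps 150x150; 2x2 pooling halves it
flat = pooled * pooled * 32       # flattened feature vector length
dense1 = dense_params(flat, 256)
output = dense_params(256, 4)

total = conv1 + conv2 + dense1 + output
print(f"conv1={conv1:,} conv2={conv2:,} dense1={dense1:,} output={output:,} total={total:,}")
```

Under these assumptions nearly all of the parameters sit in the first dense layer, because the flattened feature map is large. That is one reason extra pooling or fewer dense units are common things to try when tuning a model like this.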

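Sample code to compile the model and fit it on the training set, with the validation set used for monitoring:

# compile before fitting: categorical cross-entropy matches the one-hot labels;
# the Adam optimizer is an assumed choice
model_1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

history_1 = model_1.fit(
    Xcom_train, ycom_train_encoded,
    batch_size=32,
    epochs=20,
    validation_data=(Xcom_val, ycom_val_encoded),
    shuffle=True,
    verbose=2
)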

Sample code to calculate the accuracy of the model —

ycom_pred_test = model_1.predict(Xcom_test)
ycom_pred_test_classes = np.argmax(ycom_pred_test, axis=1)
ycom_pred_test_max_probas = np.max(ycom_pred_test, axis=1)
accuracy_score(ycom_test, ycom_pred_test_classes)

Sample code to print and plot the confusion matrix:

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

Sample code to calculate the per-class precision and recall rates:

cnf_matrix = confusion_matrix(ycom_test,ycom_pred_test_classes)
np.set_printoptions(precision=2)

# Plot non-normalized confusion matrix
#plt.figure(figsize=(16, 8))
#plot_confusion_matrix(cnf_matrix, classes=CATEGORIES,title='Confusion matrix')
recall = np.diag(cnf_matrix) / np.sum(cnf_matrix, axis = 1)
precision = np.diag(cnf_matrix) / np.sum(cnf_matrix, axis = 0)
print(recall)
print(precision)
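
The row and column sums in the snippet above are easy to mix up, so here is a toy example of the same calculation. The 4x4 matrix is invented for illustration and is not the model's actual confusion matrix; it follows the scikit-learn convention where rows are true labels and columns are predicted labels.

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class
toy_cm = np.array([
    [8, 1, 0, 1],
    [2, 6, 1, 1],
    [0, 1, 9, 0],
    [0, 0, 0, 10],
])

# Recall divides the diagonal by row sums (all samples that truly belong to the class)
toy_recall = np.diag(toy_cm) / toy_cm.sum(axis=1)
# Precision divides the diagonal by column sums (all samples predicted as the class)
toy_precision = np.diag(toy_cm) / toy_cm.sum(axis=0)

print("recall   :", np.round(toy_recall, 2))
print("precision:", np.round(toy_precision, 2))
```

Reading across the first row, 8 of the 10 samples in class 0 were found, so its recall is 0.8. Reading down the first column, 8 of the 10 predictions for class 0 were correct, so its precision is also 0.8.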

Performance of the model:

After several attempts to tune the model toward the goals I set, I achieved an accuracy of 82% and a recall rate of 84% for glioma (the most malignant tumor type in the dataset).

Model performance

Conclusion

  • As the original goal states, A.I. is not meant to replace a trained provider. The recommendation is to embed the A.I. into a neurosurgeon’s workflow to improve tissue analysis turnaround time and mitigate misdiagnoses.
  • The model’s accuracy and recall rates should improve as providers give feedback to the A.I. engine, improving the quality and quantity of the data it learns from.
  • The dataset could have been better. A visual inspection of images reveals that scans are not uniform regarding cross-section, likely because tumor occurs in different parts of the brain. Images are not further labeled on the planes — Axial, Coronal, Sagittal.

I write about product management, healthcare, decision-making, investing, and startups. Please follow me on Medium, LinkedIn, or Twitter.

Product executive; Built products at Apple and 3 unicorns; Follow me to hear my thoughts on product, healthcare, AI/ML, startups