NIH X-Ray Convolutional Neural Network Classification - Week Two Project Code

Course: DS 4002 - Data Science Project Course

Authors: Navya Annapareddy, Tony Albini, Kunaal Sarnaik, & Jaya Vellayan

Professor: Brian Wright

Date: January 15th, 2021

TABLE OF CONTENTS

I. Introduction

The following section serves as a general introduction to the topic discussed throughout this research analysis — Chest X-Ray Diagnostic Image Classification. See the sub-sections embedded below for further information: Motivation, Background, General Research Question, & Relevant Research.

a) Motivation

According to the Mayo Clinic, Chest X-Rays are one of the most common diagnostic tools utilized in the clinical setting; their usage rates fall just behind those of the electrocardiogram and bedside ultrasound.$^{1}$ Because a Chest X-Ray can reveal so much about the inside of a patient's body, it is often among the first procedures a physician orders when heart or lung problems are suspected. Furthermore, due to the numerous, severe respiratory and cardiovascular complications of COVID-19, the use of Chest X-Rays within public and private hospitals is expected to increase greatly in the coming years.$^{2}$ However, an increase in the number of such procedures disproportionately affects physicians specializing in an already arduous medical field — radiology.

According to a study conducted by radiologists Bruls & Kwee at the Zuyderland Medical Center in the Netherlands, the workload for radiologists during on-call hours has increased dramatically in the past 15 years on a global scale.$^{3}$ In their research manuscript, Bruls & Kwee call for the radiologist and technician workforce to be increased so that potential burn-out can be avoided and the quality and safety of radiological care maintained. In the United States alone, the workload of radiologists has increased over the past two decades at an alarming rate.$^{4}$ Furthermore, previous studies have shown that the number of errors attributed to this heavy workload has similarly increased.$^{5-6}$ For patients, these trends are extremely concerning for two primary reasons: 1) a potential health risk going unnoticed, and 2) repeated exposure to ionizing radiation.

According to the Radiological Society of North America, most missed radiologic diagnoses are attributable to image interpretation errors by radiologists.$^{7}$ In a study conducted by RSNA researchers Bruno et al., the rate of missed, incorrect, or delayed diagnoses in the United States was estimated to be as high as 10-15%. An especially frightening case these researchers found was that of a 4-year-old boy who had swallowed a coin, which had lodged in his esophagus. According to Bruno et al., a skilled pediatric radiologist missed this clear diagnostic indicator twice, leaving any mention of a coin out of their clinical image interpretation. Similar errors across the country and the world likely result in many diseases and medical problems going undiagnosed, making them all the harder to treat as they progress to their later stages.

However, the complications resulting from radiologic image interpretation errors do not stop there; in addition to increasing the severity of preexisting medical conditions through missed diagnoses, a radiologic error often leads to more imaging down the road for a patient.$^{5}$ As such, one instance of radiologic exposure can translate into multiple. Although the radiation exposure from a single Chest X-Ray is relatively small, it can still be a real concern for a patient who needs many such scans due to a chronic disease such as Chronic Obstructive Pulmonary Disease (COPD).$^{8}$

Given both the causes and complications of radiologist image interpretation errors, what if there were a way to better screen medical images such as those resulting from a Chest X-Ray? More specifically, can deep learning and convolutional neural networks (CNNs) help with such a specific application? The following research project will leverage machine learning techniques and state-of-the-art CNN architectures to analyze this proposition. A dataset of several thousand Chest X-Ray images will be utilized, exploratory data analysis and feature engineering will be conducted, and a finalized CNN model will be constructed to classify diagnostic Chest X-Ray images for disease. In the end, the model could potentially serve as a preliminary screening tool in the clinical setting to aid radiologists in their image interpretation. As such, the overarching goal of this project is to contribute to the innovative research concentrated in medical image classification in order to increase clinical workflow efficiency, reduce both physician workload and error alike, and, most importantly, improve patient outcomes.

b) Background

i) Chest X-Rays

Chest X-Rays utilize very small doses of ionizing radiation to produce pictures of the thoracic cavity. Commonly utilized to evaluate the lungs, heart, and chest wall, chest X-rays serve as important tools for investigating symptoms such as shortness of breath, persistent cough, fever, chest pain, or traumatic injury.$^{8}$ Furthermore, they can also be used to monitor chronic diseases and disorders such as pneumonia, emphysema, and cancer. Listed below is a comprehensive overview of what a Chest X-Ray can reveal about any given patient's body$^{1}$:

Lung Condition - Detecting cancer, infection, chronic conditions, complications, or air collecting around a lung potentially causing collapse.

Heart-Related Lung Problems - Detecting changes or problems in the lungs that stem from preexisting heart problems.

Heart Size and Outline - Detecting changes in size and shape of the heart, which may additionally be a sign of heart failure, excess fluid, or heart valve abnormalities.

Blood Vessels - Detecting aneurysms, congenital heart disease, or other problems with the aorta, pulmonary artery, coronary artery, and vena cava.

Calcium Deposits - Detecting the presence of calcium, which may correspond to excess fat in blood vessels, damage to heart valves, coronary artery abnormalities, or abnormalities of the heart and its protective sac.

Fractures - Detecting rib or spine fractures, as well as other problems with bones (cancer, osteomyelitis, etc.).

Postoperative Changes - Detecting any problems that emerge from a surgery, the most common being intubation complications involving the esophagus.

Pacemakers, Defibrillators, & Catheters - Detecting any problems with the placement of these devices.

ii) Convolutional Neural Networks (CNNs)

Convolutional Neural Networks (CNNs) are a class of artificial neural networks that have become especially popular in computer vision tasks.$^{9}$ The architecture of a CNN, consisting of various layers, is designed to take advantage of the two-dimensional structure of an input image.$^{10}$ This is achieved with local connections and tied weights, followed by some form of pooling, which results in translation-invariant features between layers. An additional benefit is that CNNs are easier to train than fully connected networks, since they require far fewer parameters.

The architecture of a CNN, as mentioned previously, consists of one or more layers. In general, three main types of layers are utilized in building any given CNN:

1. Convolutional Layers

The convolutional layer is always the first layer of a CNN. The input to this layer is a tensor holding the raw pixel values of the image to be classified.$^{11}$ The shape of this input tensor is as follows: (# of images, image height, image width, input channels). The number of input channels is generally 3, corresponding to the RGB values of any given pixel. The convolutional layer takes this input tensor and uses filters (also called neurons or kernels) to convolve over the image: at each position, the filter's weights are multiplied element-wise with the pixel values in its receptive field and summed, leaving a single value. Sliding a filter across every location of the input produces a feature map, and using additional filters produces additional feature maps. In the end, the original image tensor can be represented with far fewer parameters, which is the hallmark function of a CNN.
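To make the multiply-and-sum operation concrete, here is a minimal NumPy sketch (illustrative only, not the project's code; the 4×4 image and 3×3 vertical-edge filter are made-up examples):

```python
import numpy as np

def convolve2d(image, kernel):
    """'Valid' 2-D convolution (technically cross-correlation, as in most
    deep learning libraries): slide the kernel over the image, multiply
    element-wise with each receptive field, and sum to a single value."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            receptive_field = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(receptive_field * kernel)
    return out

image = np.array([[1., 2., 0., 1.],
                  [0., 1., 1., 0.],
                  [1., 0., 1., 2.],
                  [2., 1., 0., 1.]])
edge_filter = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])  # a simple vertical-edge detector
feature_map = convolve2d(image, edge_filter)
print(feature_map.shape)  # a 4x4 input and a 3x3 filter yield a 2x2 feature map
```

Each output value summarizes one receptive field; stacking the maps produced by many such filters builds the feature-map volume passed to later layers.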

2. Fully Connected (Dense) Layers

The next main type of layer in a CNN is the fully connected layer, sometimes referred to as the dense layer. This layer essentially takes an input volume and outputs an N-dimensional vector, where N is the number of classes the program has to choose from. Each number in this N-dimensional vector represents the probability of a certain class. For example, if one wanted to classify images as red, blue, or green, the output of a dense layer might be something along the lines of [0.2 0.5 0.3], indicating that the model finds a 20% chance of the image being red, a 50% chance of it being blue, and a 30% chance of it being green. The fully connected layer does this by looking at the output of the previous layer, which represents activation maps of high-level features in the image, and determining which features most strongly correlate with a particular class.
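A minimal NumPy sketch of what such a layer computes (the input activations and weights here are random placeholders, not a trained model's values): a matrix multiply followed by a softmax turns the previous layer's activations into class probabilities.

```python
import numpy as np

def dense_softmax(x, W, b):
    """Fully connected layer followed by softmax: logits = xW + b,
    then exponentiate and normalize so the outputs sum to 1."""
    logits = x @ W + b
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return exp / exp.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # activations from the previous layer
W = rng.normal(size=(4, 3))   # 3 output classes, e.g. red / blue / green
b = np.zeros(3)
probs = dense_softmax(x, W, b)
print(probs)                  # an N=3 probability vector summing to 1
```

The softmax normalization is what turns the raw scores (logits) into a probability vector like the hypothetical [0.2 0.5 0.3] above.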

3. Pooling Layers

The third and final main type of CNN layer is the pooling layer. Pooling layers operate on each feature map from their input to create a new set of the same number of pooled feature maps.$^{12}$ Essentially, this downsamples the feature maps, and it is performed with two common functions: 1) Average Pooling, and 2) Maximum Pooling. Average Pooling calculates the average value of each patch on the feature map, while Max Pooling calculates the maximum value of each patch. The end result of a pooling layer is a downsampled, summarized version of the features detected in the input image. This is what gives the model its invariance to local translation.
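The two pooling functions can be sketched in a few lines of NumPy (illustrative only; 2×2 non-overlapping patches are assumed, which is the most common configuration):

```python
import numpy as np

def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping pooling: split the feature map into size x size
    patches and keep either the max or the average of each patch."""
    h, w = fmap.shape
    patches = fmap[:h - h % size, :w - w % size]
    patches = patches.reshape(h // size, size, w // size, size)
    if mode == "max":
        return patches.max(axis=(1, 3))
    return patches.mean(axis=(1, 3))

fmap = np.array([[1., 3., 2., 0.],
                 [4., 2., 1., 1.],
                 [0., 1., 5., 2.],
                 [2., 2., 3., 4.]])
print(pool2d(fmap, mode="max"))  # [[4. 2.] [2. 5.]]
print(pool2d(fmap, mode="avg"))  # [[2.5  1.  ] [1.25 3.5 ]]
```

Either way, the 4×4 map is summarized down to 2×2, which is what makes the downstream features less sensitive to small shifts of the input.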

The above layers outline the general components comprising the architecture of any given CNN, yet several variants of these, as well as additional layer types, also exist and are widely utilized for image classification. However, after the CNN architecture is laid out with respect to the number and ordering of layers, the model still must be trained; CNNs achieve this training through backpropagation.$^{11}$ This training process is composed of four distinct phases:

1. The Forward Pass

This is the training phase in which an image is taken and passed through the CNN's entire layer hierarchy. Because the weights are initially random, the output for the first image gives roughly equal probability to every class, with no preference for any one of them. Once the CNN compares its output against the label the training image is supposed to be classified as, it backpropagates the resulting error through use of the loss function.

2. The Loss Function

For the CNN's predicted label to match the intended training label for any given image, the loss for that image, as calculated by the loss function, must be driven toward zero. This is essentially a calculus optimization problem, in which the loss function is used to determine which inputs (the weights of the network) contributed most directly to the loss (error) of the network. Common loss functions include MSE (Mean Squared Error), Hinge Loss, and Cross-Entropy, the last of which will be utilized later in this notebook.
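As a concrete sketch of the cross-entropy loss (reusing the hypothetical [0.2 0.5 0.3] probability vector from the dense-layer example above):

```python
import numpy as np

def cross_entropy(predicted_probs, true_class):
    """Cross-entropy loss for a single example: the negative log of the
    probability the model assigned to the correct class."""
    return -np.log(predicted_probs[true_class])

probs = np.array([0.2, 0.5, 0.3])          # hypothetical softmax output
print(cross_entropy(probs, true_class=1))  # correct class got 0.5: -log(0.5) ≈ 0.69
print(cross_entropy(probs, true_class=0))  # correct class got only 0.2: ≈ 1.61
```

The loss shrinks toward zero as the model puts more probability on the correct class, which is exactly the quantity the backward pass then differentiates.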

3. The Backward Pass

With the loss computed by one of the loss functions listed above, a backward pass is performed through the CNN to determine which weights contributed most to the loss. Using these contributions, the CNN adjusts its weights so that the loss decreases. This is done by calculating the derivative of the loss with respect to each weight.

4. Weight Update

Now that each weight's contribution to the loss is known across the CNN's layers, the filter weights are updated in the direction that decreases the loss. The learning rate is crucial to this phase, as it determines the size of the step taken in each weight update. A learning rate that is too low may take a long time to converge to the minimum loss, while one that is too high may take steps that are too large, overshooting the optimal point.
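The update rule and the role of the learning rate can be illustrated on a one-parameter toy problem (a sketch, not the notebook's actual training loop):

```python
def sgd_step(w, grad, learning_rate):
    """One weight update: step against the gradient, scaled by the
    learning rate (the size of the step)."""
    return w - learning_rate * grad

# Toy problem: minimize L(w) = (w - 3)^2, whose gradient is 2(w - 3).
w = 0.0
for _ in range(50):
    w = sgd_step(w, 2 * (w - 3), learning_rate=0.1)
print(round(w, 3))  # a modest learning rate converges toward the minimum at w = 3

# With too large a learning rate, each update overshoots and the error grows:
w_bad = 0.0
for _ in range(10):
    w_bad = sgd_step(w_bad, 2 * (w_bad - 3), learning_rate=1.5)
print(abs(w_bad - 3) > 100)  # True: the iterates diverge instead of converging
```

The same trade-off governs the learning-rate tuning performed in the Feature Engineering section of this notebook.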

c) General Research Question

An in-depth analysis utilizing data science and machine learning principles in the context of medical Chest X-Ray Image Classification will be conducted in an attempt to address the following research question:

How do class imbalances affect the performance of a state-of-the-art Convolutional Neural Network model in classifying Chest X-Ray Images?

Using this question as a guide, hypotheses will be developed and a comprehensive analysis that leverages state-of-the-art CNN principles will be conducted. A dataset containing several thousand Chest X-Ray images with diagnostic indications of various cardiac and pulmonary diseases will be analyzed (see the Dataset Overview & Cleaning section for more information). The dataset will be cleaned, pre-processed, and explored, after which iterative model runs and feature engineering steps will be performed to test how state-of-the-art CNN architectures perform under class imbalances.

d) Relevant Research

The utilization of Convolutional Neural Networks within the medical space, specifically for image classification, is both an established and an emerging area of innovative research. The distinct advantage of utilizing CNNs for such an application is that the algorithms can be generalized to solve many different kinds of medical image classification tasks through a process called transfer learning.$^{13}$ Given the large rise in medical data and image releases, many researchers have applied deep learning networks to successfully classify images such as CT scans, MRIs, and ultrasound results as pathophysiological or physiological. Given the focus of this project, however, the following are some relevant studies pertaining to classifying pulmonary and cardiac disorders by training CNNs on Chest X-Ray images.

In late 2019, a group of researchers, Jain et al., on a joint project from Bharati Vidyapeeth's College of Engineering and the Karunya Institute of Technology and Sciences, utilized several state-of-the-art CNN architectures to classify Chest X-Rays as positive or negative for pneumonia.$^{14}$ Using the VGG16, VGG19, ResNet50, and Inception-v3 architectures, Jain et al. found that these cutting-edge, pre-trained algorithms were able to achieve testing accuracies of 87.28%, 88.46%, 77.56%, and 70.99%, respectively. Interestingly, however, two custom models, consisting of two and three convolutional layers, achieved testing accuracies of 85.26% and 92.31%, respectively. These findings suggest that CNNs might achieve better performance on medical image classification when the models are trained specifically on the dataset being classified, as pre-trained models are trained on everyday objects such as cats and dogs. As a result, the pre-trained models, although state-of-the-art, may not detect the complex intricacies within medical images needed for sufficient classification.

Furthermore, with the rise of the COVID-19 pandemic in the past two years, substantial research has gone into Chest X-Ray image classification as a method of detecting the virus early and often. In November 2020, researchers at the University of Waterloo (Canada), Wang et al., developed a deep CNN design tailored specifically for detection of COVID-19 cases from Chest X-Ray images.$^{15}$ After consolidating 13,975 Chest X-Ray images across 13,870 patient cases, their COVID-Net model achieved a respectable accuracy of 93.3%. Furthermore, they reported that this custom-tailored model achieved better accuracies than both the pretrained VGG and ResNet CNN architectures. Similarly, a joint team of researchers from the University of Azad Jammu & Kashmir in Pakistan and Stony Brook University in New York utilized CNNs to classify Chest X-Ray images as either positive for COVID-19, positive for bacterial pneumonia, negative for COVID-19 but positive for viral pneumonia, or negative for any pathology.$^{16}$ Using a deep convolutional network based on the state-of-the-art U-Net CNN design, the researchers, Hussain et al., achieved an admirable accuracy of 79.52% on this complex multi-class classification.

Overall, the use of deep CNNs within the medical image classification space is still emerging. However, many of these prior analyses focus on a single disease and specialize their networks accordingly. The goal of such projects is to simulate radiologist-level classification, which ultimately may not be feasible, as it would suggest replacing radiologists with artificial neural networks. Our project will instead focus on serving as an initial screening tool for Chest X-Rays, classifying them by finding or severity, such that radiologists can refocus their efforts rather than being replaced entirely.

II. Dataset Overview & Cleaning

The following section describes the dataset being used to address the general research question listed above, as well as details how the dataset was pre-processed for analysis.

a) Dataset Overview

The dataset being utilized in this project is a sub-sample of a dataset provided by the NIH containing 112,120 Chest X-Ray images obtained from 30,805 unique patients. The labels in the original dataset were generated through the use of Natural Language Processing techniques to text-mine disease classifications from the associated radiological reports. The labels are expected to be approximately 90% accurate and suitable for weakly-supervised learning. The images were already resized to 1024×1024 pixels, and the sub-sample that will specifically be utilized in this notebook, due to computational limitations, contains 5,606 images. The images are classified using 15 different labels (14 disease labels and one "No Findings" label), and some images are classified with more than one disease label (see below). Finally, the dataset also includes a comma-separated file of patient demographics and the bounding boxes of each image.

Link to dataset$^{17}$: https://www.kaggle.com/nih-chest-xrays/data?select=Data_Entry_2017.csv

The 15 different classifications in the dataset are as follows:

  1. Atelectasis

  2. Consolidation

  3. Infiltration

  4. Pneumothorax

  5. Edema

  6. Emphysema

  7. Fibrosis

  8. Effusion

  9. Pneumonia

  10. Pleural Thickening

  11. Cardiomegaly

  12. Nodule Mass

  13. Hernia

  14. No Findings

  15. One or more disease classifications

Together, this makes for a total of 140 combinations that can exist for any given image's classification in just our 5,606-image sub-sample, as discussed later.

b) Dataset Cleaning & Pre-Processing

The following sub-section includes a step-by-step overview of how the dataset was loaded into the Google Colab environment, cleaned, and prepared for exploratory data analysis.

i) Importing Python Libraries

The following code chunk outlines the numerous packages, libraries, and modules that were imported to the notebook for the analysis to be conducted in Python. The tools are listed below:
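The original code chunk is not reproduced in this export; as a hedged sketch, a typical import list for such an analysis might look like the following (the exact set in the notebook may differ, and the Keras lines are shown commented out so this sketch carries no TensorFlow dependency):

```python
# Core data-handling libraries used throughout the notebook.
import os
import pickle
import numpy as np
import pandas as pd

# The model-building portions of the notebook additionally rely on the
# Keras API; representative (assumed) imports would be:
# import matplotlib.pyplot as plt
# from tensorflow.keras.models import Sequential
# from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
# from tensorflow.keras.applications import VGG16, ResNet50, DenseNet121
```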

ii) Mounting Drive & Importing Dataset

The following section mounts the drive to the correct directory and imports the images from the dataset of interest into the Colab environment.

The following code chunk contains JavaScript code necessary to prevent Colab's runtime from disconnecting after the session is left idle. Due to the length of our analysis, this script must be run in the browser's console so that the CNN models can be trained on the images without interruption.

The following code chunk mounts the Notebook's drive to the correct directory necessary in order to import the images being utilized to train the CNN models in subsequent analysis.

The following code chunk standardizes the path of the folder, images, and pickled pathways of the imaging dataset. It also includes a 'weight_path' argument, which will be utilized when fitting the pretrained models in the Keras API (VGG, ResNet, & DenseNet) to the imaging data.
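A hedged sketch of what such path standardization might look like (the directory and file names here are assumptions for illustration, not the notebook's actual paths):

```python
import os

# Assumed Google Drive layout -- the real notebook's folder names may differ.
base_path = os.path.join("/content", "drive", "MyDrive", "nih_xray_sample")
image_path = os.path.join(base_path, "images")            # the 5,606 X-Ray images
csv_path = os.path.join(base_path, "sample_labels.csv")   # per-image label metadata
pickle_path = os.path.join(base_path, "pickled_arrays")   # cached image arrays

# 'weight_path' would point at downloaded weights for the pretrained Keras
# architectures (VGG, ResNet, DenseNet) fitted later in the notebook.
weight_path = os.path.join(base_path, "pretrained_weights")
print(csv_path)
```

Centralizing the paths this way means later cells can swap drive locations by editing a single variable.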

iii) Looking at the Raw Dataset

Now that the relevant Python tools are imported and we are in the directory that our images are stored in, let us have a look at the dataset.

The following code chunk reads the csv file associated with the 5,606-image sub-sample that we will be utilizing to train the CNNs into the Google Colab environment.
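A sketch of the read step (the rows below are fabricated stand-ins, and the column names only approximate the NIH metadata schema; the real notebook reads the actual csv from the mounted drive):

```python
import io
import pandas as pd

# In-memory stand-in for the sample csv -- the real call would be
# pd.read_csv(<path to the csv in the mounted drive>).
raw = io.StringIO(
    "Image Index,Finding Labels,Patient Age,Patient Gender\n"
    "img_001.png,Emphysema|Infiltration,60,M\n"       # fabricated example rows
    "img_002.png,Cardiomegaly,57,F\n"
    "img_003.png,No Finding,77,M\n"
)
xray_df = pd.read_csv(raw)
print(xray_df.head())  # image index, finding labels, then demographic columns
```

Note the '|'-delimited "Finding Labels" column: a single image can carry several disease labels at once, which drives the multi-label complexity discussed next.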

As shown in the output above, the csv contains the image index, the finding labels assigned to that image, and then demographic and bounding box information about the patients and images, respectively. We can see right away that the images were resized down to 1024×1024 pixels from larger originals, and that the ultimate classifications can contain multiple findings and be quite complex.

iv) Preprocessing

The following section pre-processes the csv file in order to prepare for our exploratory data analysis.

The following code chunk performs several pre-processing functions on the dataset. First, the dataframe is categorized using the following map: No Findings --> 0, One Finding --> 1, and Multiple Findings --> 2. This is necessary for the subsequent analysis of the three classes and for training the CNN. Another dataframe is also created reflecting only the 7 most frequent classifications found in the csv file. These alterations prepare the dataset for exploratory data analysis so that well-founded hypotheses pertaining to the dataset and general research question can be constructed.
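A hedged sketch of these two pre-processing steps on a fabricated miniature of the dataframe (the rows are made-up examples; the mapping and label-counting logic is the part being illustrated):

```python
import pandas as pd

# Fabricated stand-in rows; the real dataframe comes from the sample csv.
xray_df = pd.DataFrame({
    "Image Index": ["a.png", "b.png", "c.png", "d.png"],
    "Finding Labels": ["No Finding", "Cardiomegaly",
                       "Emphysema|Infiltration", "Effusion"],
})

def categorize(labels):
    """No Finding -> 0, one finding -> 1, multiple findings -> 2."""
    if labels == "No Finding":
        return 0
    return 2 if "|" in labels else 1

xray_df["finding_category"] = xray_df["Finding Labels"].map(categorize)

# One indicator column per label, from which the most frequent
# classifications can be selected for the second dataframe.
all_labels = xray_df["Finding Labels"].str.get_dummies(sep="|")
top_labels = all_labels.sum().nlargest(7).index
target_data = all_labels[top_labels]
print(xray_df["finding_category"].tolist())  # [0, 1, 2, 1]
```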

The following code chunk drops all columns of the dataframe pertaining to the 7 most frequent labels ('target_data'), such that subsequent visualizations can be constructed.
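A sketch of the drop step on a fabricated two-label example (the real dataframe has one indicator column per label for the 7 most frequent classifications):

```python
import pandas as pd

# Fabricated dataframe with per-label indicator columns appended.
df = pd.DataFrame({
    "Image Index": ["a.png", "b.png"],
    "Cardiomegaly": [0, 1],
    "Effusion": [1, 0],
})
label_cols = ["Cardiomegaly", "Effusion"]  # stands in for the 7 label columns
visual_df = df.drop(columns=label_cols)
print(visual_df.columns.tolist())  # ['Image Index']
```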

III. Exploratory Data Analysis

The following sub-section performs exploratory data analysis on both the csv file associated with the images and the images themselves. This is conducted in order to finalize the hypotheses of interest when answering the general research question. At the end of the sub-section, the hypotheses of our investigation are listed.

a) Key Questions

In order to approach our general research question and construct hypotheses, we will construct key questions to guide our exploratory data analysis of the imaging dataset we will be utilizing. They are as follows:

i) How balanced is the original distribution of our response variable (Chest X-Ray Image Finding)?

ii) How balanced are various altered distributions of our response variable?

iii) How representative is our dataset with respect to patient demographics?

iv) Generally, what do the Chest X-Ray Images in our dataset look like?

v) How do X-Ray Images of different classes differ by pixel intensity (brightness)?

b) Finalized Hypotheses

IV. Initial Model & Feature Engineering

a) More Pre-Processing and Function Preparation

Functions

Save image arrays as DataFrame

Save image arrays as Lists

Loading pickled data frame image arrays (not lists)

b) Feature Engineering Step 0: Initial 3-Class Classification

Imbalanced 3-Class Model

Balanced 3-Class Model

c) Feature Engineering Step 1: Binary Classification

Imbalanced Binary Classification

Binary Balanced Classification

d) Feature Engineering Step 2: Balanced Image Augmentation

Balanced Multiclass Image Augmentation

Balanced Binary Image Augmentation (Binary accuracy)

Balanced Multiclass Image Augmentation

Balanced Binary Image Augmentation (Binary accuracy)

e) Feature Engineering Step 3: Attempting Other Pretrained Architectures & Hypertuning

Balanced Multiclass Image Augmentation with ResNet-50

Balanced Binary Image Augmentation with ResNet-50

Balanced Multiclass Learning Rate Tuning (Low Better)

Balanced Binary Learning Rate Tuning (Higher Better)

Initial Balanced Multiclass Custom Model

Tuning Multiclass DenseNet Model With Custom Layers

Tuning Binary DenseNet Model With Custom Layers

V. Results of Finalized Model(s)

a) Multiclass Classification (None vs. One vs. Multiple)

i) Hypothesis One

- Finalized Balanced Multiclass Model
- Finalized Imbalanced Multiclass Model

b) Binary Classification (No Finding vs. Finding)

i) Hypothesis One

- Finalized Balanced Binary Model
- Finalized Imbalanced Binary Model

VI. Discussion & Conclusion

VII. References

  1. Tsakok MT, Gleeson FV. The chest radiograph in heart disease. Medicine. 2018;46(8):453-457. doi:10.1016/j.mpmed.2018.05.007
  2. Rousan LA, Elobeid E, Karrar M, Khader Y. Chest x-ray findings and temporal lung changes in patients with COVID-19 pneumonia. BMC Pulmonary Medicine. 2020;20(1):245. doi:10.1186/s12890-020-01286-5
  3. Bruls RJM, Kwee RM. Workload for radiologists during on-call hours: dramatic increase in the past 15 years. Insights into Imaging. 2020;11(1):121. doi:10.1186/s13244-020-00925-z
  4. Forsberg D, Rosipko B, Sunshine JL. Radiologists’ Variation of Time to Read Across Different Procedure Types. J Digit Imaging. 2017;30(1):86-94. doi:10.1007/s10278-016-9911-z
  5. Brady AP. Error and discrepancy in radiology: inevitable or avoidable? Insights Imaging. 2016;8(1):171-182. doi:10.1007/s13244-016-0534-1
  6. How to Manage Mistakes in Radiology | Diagnostic Imaging. Accessed January 13, 2021. https://www.diagnosticimaging.com/view/how-manage-mistakes-radiology
  7. Bruno MA, Walker EA, Abujudeh HH. Understanding and Confronting Our Mistakes: The Epidemiology of Error in Radiology and Strategies for Error Reduction. RadioGraphics. 2015;35(6):1668-1676. doi:10.1148/rg.2015150023
  8. Radiological Society of North America (RSNA) and American College of Radiology (ACR). Chest X-ray (Radiograph). Accessed January 13, 2021. https://www.radiologyinfo.org/en/info.cfm?pg=chestrad
  9. Yamashita R, Nishio M, Do RKG, Togashi K. Convolutional neural networks: an overview and application in radiology. Insights Imaging. 2018;9(4):611-629. doi:10.1007/s13244-018-0639-9
  10. Unsupervised Feature Learning and Deep Learning Tutorial. Accessed January 14, 2021. http://deeplearning.stanford.edu/tutorial/supervised/ConvolutionalNeuralNetwork/
  11. Deshpande A. A Beginner’s Guide To Understanding Convolutional Neural Networks. Accessed January 14, 2021. https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/
  12. Brownlee J. A Gentle Introduction to Pooling Layers for Convolutional Neural Networks. Machine Learning Mastery. Published April 21, 2019. Accessed January 14, 2021. https://machinelearningmastery.com/pooling-layers-for-convolutional-neural-networks/
  13. Li Q, Cai W, Wang X, Zhou Y, Feng DD, Chen M. Medical image classification with convolutional neural network. In: 2014 13th International Conference on Control Automation Robotics Vision (ICARCV). ; 2014:844-848. doi:10.1109/ICARCV.2014.7064414
  14. Jain R, Nagrath P, Kataria G, Sirish Kaushik V, Jude Hemanth D. Pneumonia detection in chest X-ray images using convolutional neural networks and transfer learning. Measurement. 2020;165:108046. doi:10.1016/j.measurement.2020.108046
  15. Wang L, Lin ZQ, Wong A. COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Scientific Reports. 2020;10(1):19549. doi:10.1038/s41598-020-76550-z
  16. Hussain L, Nguyen T, Li H, et al. Machine-learning classification of texture features of portable chest X-ray accurately classifies COVID-19 lung infection. BioMedical Engineering OnLine. 2020;19(1):88. doi:10.1186/s12938-020-00831-x
  17. NIH Chest X-rays. Accessed January 15, 2021. https://kaggle.com/nih-chest-xrays/data