Machine Learning in Medical Diagnostics
written by Alexandru Parvu, Machine Learning Engineer, and Adrian Militaru, Data Engineer, in the June issue of Today Software Magazine.
Medicine, perhaps one of the most important fields of study, is well known for being one of the most conservative in adopting new technologies, and for good reason. It deals with human life, and a small mistake can have devastating, long-lasting effects.
That is why medicine has been hesitant to adopt machine learning in its practice, even though such applications have been available for decades.
Making Sense of the Complex
The first and perhaps the simplest use of machine learning in medicine is making sense of very complicated data. Many serious conditions are diagnosed by weighing many different factors, and the diagnosis process is often not straightforward. In fact, for one of the leading causes of mortality, heart disease, prediction and prevention are very often difficult or even impossible. This is exacerbated by the fact that many factors are considered in prevention, yet their complexity often makes it hard to determine which of them has more influence.
For this first demonstration we used the sklearn library in Python combined with the Cardiovascular dataset from Kaggle. Below we can see a small sample of the dataset:
The original purpose of this dataset was to track patient information and see which medical measurements can point to a cardiac-related disease. It is immediately apparent that a wide variety of factors are considered when diagnosing heart disease, whose presence is denoted by a binary value in the dataset’s cardio column. Unfortunately, there does not appear to be a clear distinction between the people with heart disease and those without. Taking the 11 independent variables in the table and plotting each one against every other, we get a total of 121 (11 × 11) plots that could help us understand which factors might lead to heart disease.
This, of course, does not make things easy: going through 121 different plots would be time-consuming, and at the end there might still be no clear distinction between those with heart disease and those without.
This is in fact the case with this data. Despite having 121 plots to choose from, with a large variety in their appearance, it is not at all clear how to distinguish patients with heart disease from those without. Please see the plots below:
The reason behind this difficulty is that medical conditions are often not the result of a single factor but rather of the interaction of several factors. Thus, such simple plots can never tell the whole complicated story. Thankfully, there are ways of cutting through this noise with some of the most widely used machine learning models, namely Random Forest or a related tree-ensemble method named Gradient Boosting Machines.
Despite the simplicity of the model and its implementation, it offers us an explanation of the factors of influence that would otherwise elude us.
The code does not seem like much, but it seems a lot more important when we look at the explanation below:
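import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Hyperparameter grid to search over.
param_grid = {
    # 'log_loss' is the current name of the former 'deviance' option
    # (renamed in scikit-learn 1.1).
    'loss': ['log_loss', 'exponential'],
    'learning_rate': np.arange(0.01, 0.2, step=0.04),
    'n_estimators': np.arange(50, 200, step=40),
    # 'auto' was dropped in recent scikit-learn releases; for this
    # classifier it behaved like 'sqrt'.
    'max_features': ['sqrt', 'log2'],
}
estimator = GradientBoostingClassifier()
# 4-fold cross-validated grid search, scored on accuracy.
grid = GridSearchCV(estimator=estimator,
                    param_grid=param_grid,
                    scoring='accuracy',
                    cv=4,
                    verbose=2)
# X_train and y_train are assumed to have been prepared beforehand.
grid.fit(X_train, y_train)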
The above plot shows, in decreasing order of importance, the determining factors that lead to heart disease. The “rand_var” is a random variable used as a baseline: a factor that ranks no higher than this random noise cannot be considered a determinant. Thus, we can notice that, surprisingly, activity level, smoking habit, alcohol consumption, and gender are not determinants of heart disease. What are determinants, in decreasing order, are blood pressure, age, cholesterol level, weight, glucose level, and, surprisingly, height. Yes, according to this model, taller people are more susceptible to heart disease.
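The random-variable baseline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses synthetic data from `make_classification` instead of the Kaggle table, and the feature names and model settings are hypothetical. Only the idea of appending a `rand_var` column and comparing importances comes from the article.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the patient table (NOT the Kaggle data).
# With shuffle=False the first 3 columns carry signal, the last 2 are noise.
X, y = make_classification(
    n_samples=1000, n_features=5, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Append a purely random column to serve as the reference feature.
rng = np.random.default_rng(0)
X = np.column_stack([X, rng.normal(size=X.shape[0])])
feature_names.append("rand_var")

model = GradientBoostingClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Rank features by importance; anything at or below rand_var is suspect.
importances = dict(zip(feature_names, model.feature_importances_))
ranking = sorted(importances, key=importances.get, reverse=True)
for name in ranking:
    print(f"{name:10s} {importances[name]:.3f}")
```

Features that land below `rand_var` in the printed ranking contribute no more than noise to this particular model, which is the reading applied to the plot above.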
Time Saver
This is another place where machine learning can help. The time-consuming process of examining X-rays for pneumonia could be shortened considerably with just a few lines of code. But before that can be done, we need to establish which metric we are interested in. Is it accuracy? Intuitively you would say yes, until you consider that most of the people who come into a doctor’s office do not have pneumonia. By simply stating that nobody has pneumonia you would get an accuracy of over 99%, which, whilst impressive, is not very helpful from a medical point of view.
It is perhaps easier to start from the goal. A doctor’s objective is to detect as many of the people with pneumonia as possible, so we want a high rate of detection among those who actually have the disease. For this type of detection, where we are interested in finding the true positive patients, there is a metric called “recall”: the fraction of actual positive cases that the model correctly identifies.
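The gap between accuracy and recall can be shown with a small worked example. The sketch below assumes a hypothetical clinic where 10 of 1,000 patients have pneumonia (a 1% prevalence chosen purely for illustration) and a "model" that declares everyone healthy.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical clinic: 1,000 patients, 10 of whom have pneumonia (label 1).
y_true = [1] * 10 + [0] * 990

# A "model" that declares every patient healthy.
y_pred = [0] * 1000

accuracy = accuracy_score(y_true, y_pred)  # share of all predictions that are correct
recall = recall_score(y_true, y_pred)      # TP / (TP + FN) for the positive class

print(f"accuracy = {accuracy:.2f}, recall = {recall:.2f}")
# prints: accuracy = 0.99, recall = 0.00
```

The accuracy looks excellent, yet not a single sick patient is found, which is why recall is the more useful target here.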
The code below represents all that is needed for creating a very good model:
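# Imports needed by this snippet (TensorFlow / Keras).
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.optimizers import Adam
# create_model, resnet, aug_train_data and val_data are assumed to be
# defined earlier (see the code repository).
model_4 = create_model(model_url=resnet,
                       num_classes=2)
model_4.compile(loss=BinaryCrossentropy(),
                optimizer=Adam(),
                metrics=['accuracy'])
# Note: workers / use_multiprocessing are accepted by fit() in older
# TensorFlow releases; Keras 3 no longer takes them.
history = model_4.fit(aug_train_data,
                      validation_data=val_data,
                      epochs=10,
                      workers=16,
                      use_multiprocessing=True,
                      steps_per_epoch=len(aug_train_data),
                      validation_steps=len(val_data))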
With a training period of under 10 minutes we obtain the results below, which, as mentioned before, appear to achieve our stated goal of a model with a high recall for pneumonia.
The recall value for class 1 (Pneumonia) is 0.99 (99%), meaning that about 99% of the X-rays that presented with pneumonia were correctly detected as such. Note that the overall accuracy (89%) is much lower than the recall for pneumonia. To better understand this, we can analyze the results in the form of a confusion matrix, which shows how the classifier’s predictions break down into true positives, false positives, true negatives, and false negatives.
The consequences of having a high recall and slightly lower accuracy can be seen below:
We can observe that while most of the people with pneumonia were correctly identified, about 10% of the patients who did not have pneumonia were also classified as having it.
This leads to the final point that should be made when discussing the application of machine learning models in the medical field. These tools cannot replace doctors, because creating them requires the doctors’ medical expertise.
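To make the trade-off between recall and accuracy concrete, here is a minimal sketch of reading both metrics off a confusion matrix. The label counts are hypothetical and chosen for easy arithmetic; they are not the results of the model discussed in this article.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 10 patients with pneumonia (1), 10 without (0).
y_true = [1] * 10 + [0] * 10
# All 10 sick patients are caught, but 2 healthy ones are flagged as well.
y_pred = [1] * 10 + [1] * 2 + [0] * 8

# With labels=[0, 1] the matrix is [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

recall = tp / (tp + fn)                     # sick patients correctly detected
accuracy = (tp + tn) / (tp + tn + fp + fn)  # all correct predictions
false_alarm_rate = fp / (fp + tn)           # healthy patients wrongly flagged

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"recall={recall:.2f} accuracy={accuracy:.2f} false alarms={false_alarm_rate:.2f}")
```

In this toy case every sick patient is found (recall of 1.0), while the false alarms among healthy patients pull the overall accuracy down to 0.9, the same pattern as described above.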
Conclusion: No Replacement Just Enhancement
We are often tempted to think that machine learning will make certain professions obsolete. But often that is simply not the case, and it certainly is not the case in medicine. The reason is that these models can only be created from the historical diagnoses made by doctors. The better the doctor, the more accurate the diagnoses, and the better the resulting model.
Thus, the purpose of these models is not to replace doctors but to enhance their abilities, by increasing the accuracy with which they diagnose and treat diseases.
Machine learning can also increase a doctor’s speed of diagnosis and give them the chance to treat as many patients as possible. Imagine the possibilities if every doctor could leverage the knowledge of every field expert in medicine at the simple press of a button. That is in fact the potential of machine learning in the medical field.
* Please check Code Repo here.