Machine Learning
The authors of this review examine efforts to use machine learning (ML) algorithms during the ongoing pandemic, focusing on two main applications: diagnosis of COVID-19 and prediction of mortality risk and severity. Because asymptomatic carriers are difficult to identify, ML models are trained to classify a patient as COVID-19 positive or negative. The articles included in the review used ML-based approaches to predict COVID-19 diagnosis and the prognosis of mortality and severity from simple clinical and laboratory data available from a public health agency. After inclusion/exclusion criteria were applied, 52 articles were included. The review found that an XGBoost diagnostic model achieved an area under the ROC curve (AUC) of 97%, a sensitivity of 81.9%, and a specificity of 97.9%. The features the model ranked highest for predicting a COVID-19 diagnosis were mean corpuscular hemoglobin concentration (MCHC), eosinophil count, albumin, international normalized ratio (INR), and prothrombin activity percentage. The features that differentiated COVID-19 from other viral infections were MCHC, eosinophil ratio, prothrombin, INR, prothrombin activity percentage, and creatinine. The review identified another study that used data from the U.S. Department of Veterans Affairs. That study also used XGBoost and achieved a specificity of 86.8%, a sensitivity of 82.4%, and an overall accuracy of 86.4%. Its five most important features were serum ferritin, white blood cell count, eosinophil count, patient temperature, and C-reactive protein.
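To make the reported metrics concrete, the following minimal Python sketch shows how sensitivity, specificity, accuracy, and AUC are typically computed for a binary COVID-19 classifier. It is an illustration only and not the code used in any of the reviewed studies. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example.

```python
# Illustrative sketch only: standard definitions of the metrics quoted above.
# Labels: 1 = COVID-19 positive, 0 = negative. All data below are made up.

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def sensitivity(tp, fn):
    """Fraction of true positives that the model detects."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of true negatives that the model correctly rules out."""
    return tn / (tn + fp)

def accuracy(tp, tn, fp, fn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def auc(y_true, scores):
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical ground truth and model scores for ten patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("accuracy:", accuracy(tp, tn, fp, fn))
print("AUC:", auc(y_true, scores))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason a model can report a high AUC alongside a more modest sensitivity.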
A Random Forest (RF) model identified features that had not previously received much attention in diagnosis, such as total protein, calcium, magnesium, and basophils. The authors indicate that these features played an irreplaceable role in the RF algorithm, which suggests they have considerable potential as diagnostic markers for COVID-19 in clinical practice. The method's performance on an independent test set was consistent with its performance on the training set, with an AUC of 99.26%, a sensitivity of 100%, and a specificity of 94.44%. Other studies have corroborated the potential of the RF model for decision-making, planning, and response. The article also mentions a number of other models that show potential for accurate prediction and could serve as effective diagnostic tools.
A related study deployed machine learning approaches to predict the spread of COVID-19. Data were obtained from Johns Hopkins University, the World Health Organization, and Worldometer. The approach was built on decision tree algorithms applied to global real-time data, and linear regression was also used. Experimental results showed that the algorithm was able to forecast the likely number of confirmed cases for the U.S. on a 7-day basis, and that confirmed cases were increasing exponentially from a few hundred thousand to nearly two and a half million. Compared with other pandemics, the rate of spread of COVID-19 is still lower than that of pandemics such as the Spanish Flu, although the virus has infected more people than SARS or Ebola. The authors also state that COVID-19 is more deadly than the flu, but that its mortality rate of 6.87% is lower than those of outbreaks such as MERS and Ebola, which recorded mortality rates of 34.40% and 39.53%, respectively. According to the prediction models, COVID-19 infections globally will decline during the first week of September 2021 and will continue to decline thereafter.
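As a rough illustration of the regression-based 7-day forecasting described above, the sketch below fits an ordinary least-squares line to a short series of daily cumulative case counts and extrapolates it seven days ahead. It is not the study's implementation: the function names `fit_line` and `forecast_cases` and the case counts are hypothetical, and a straight-line fit is a deliberate simplification (for exponentially growing counts, the same fit could instead be applied to the logarithm of the counts).

```python
# Illustrative sketch only: least-squares linear trend and 7-day extrapolation.
# The case counts below are hypothetical, not data from the study.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def forecast_cases(daily_counts, horizon=7):
    """Fit a line to the observed days (indexed 0, 1, 2, ...) and return
    predicted counts for the next `horizon` days."""
    days = list(range(len(daily_counts)))
    slope, intercept = fit_line(days, daily_counts)
    first_future_day = len(daily_counts)
    return [intercept + slope * (first_future_day + h) for h in range(horizon)]

# Hypothetical cumulative confirmed cases over four consecutive days.
observed = [100, 110, 120, 130]
print(forecast_cases(observed, horizon=7))
```

A linear trend is only reasonable over short horizons, which is consistent with forecasting one week at a time and refitting as new data arrive.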
ML is a promising technology that various healthcare providers have adopted because it may offer better scalability, faster processing, and reliability. It may also help improve treatment options, and studies have demonstrated its application in screening and prediction, forecasting, and contact tracing.