Unique ID:

Issue Date: April 13th 2020
Last modified: April 22nd 2020

AI to identify COVID-19 manifestations in patient chest radiology

Build an AI-powered tool that gives front-line healthcare practitioners decision support in assessing cases of COVID-19 using chest radiography: “Feed the algorithm a valid chest X-ray, get back an indication of whether it is believed to show COVID-19.”

Tags: covid-19

What do we want to do?

  • Build an AI-powered tool that gives front-line healthcare practitioners decision support in assessing cases of COVID-19 using chest radiography
    • “Feed the algorithm a valid chest X-ray, get back an indication of whether it is believed to show COVID-19”
    • Develop the algorithm collaboratively through broad partnerships, potentially with tech companies, hospital research arms, and national institutes of health and disease
  • If a chest X-ray (CXR) from a COVID-19 patient is visibly distinct from those of patients with other pathologies, then deep learning algorithms could potentially learn the difference[1]

How is it done today and what are the limits?

  • Several experimental algorithms exist that can distinguish COVID-19 from non-COVID-19 CXRs to some degree; research in closely related fields has already been done, and progress in this field is likely ongoing
  • Researchers at the University of Waterloo, Canada, and DarwinAI Corp. collaborated on a study to improve the effectiveness of screening patients for COVID-19. The study introduces an AI system, COVID-Net, a neural network designed to detect COVID-19 cases from chest radiography images. The system is open source and available to the general public, in the hope that broad access will allow other researchers and data scientists to build on and improve it, accelerating treatment for those who need it.
  • The Stanford Machine Learning Group uses machine learning to diagnose pneumonia at an accuracy exceeding that of practicing radiologists[2], using a publicly available CXR dataset[3].
    • This does not differentiate between COVID-19 and non-COVID-19 patients as there is no public dataset containing COVID-19 imagery
  • Kaggle competition[4] (still ongoing) sponsored by the Radiological Society of North America focuses on differentiating pneumonia from non-pneumonia
    • Does not differentiate COVID-19 imagery from non-COVID-19 imagery
  • Providence Health Care has uploaded an algorithm to the UN Global Platform’s method service that attempts to distinguish COVID-19 pneumonia from other, non-COVID-19 classes. The model, available in beta, leverages 1024 engineered features and is fine-tuned on 196 augmented images from COVID-19 cases, 1029 viral pneumonia, 1564 bacterial pneumonia, and 2231 healthy. This is joint work between Providence Health Care and multidisciplinary teams at Simon Fraser University’s School of Computing Science and Department of Mathematics, specifically the MAGPIE Group, Big Data Hub, Omics Data Science Group, Computational Genomics: MAGPIE, and GrUVi Lab. The model is currently in the validation phase at St. Paul’s Hospital in Vancouver, Canada, subject to further evaluation and training.
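The class counts reported for the beta model are heavily imbalanced, which is one concrete form of the data-scarcity limit. As a minimal sketch, assuming a standard inverse-frequency ("balanced") weighting scheme, the snippet below shows how small the COVID-19 share is and what per-class loss weights such a scheme would give. The source does not say whether the Providence model uses class weighting; the function and key names here are hypothetical.

```python
# Class counts as reported for the beta model's fine-tuning set.
# The dictionary keys are illustrative labels, not names used by the model.
CLASS_COUNTS = {
    "covid19": 196,
    "viral_pneumonia": 1029,
    "bacterial_pneumonia": 1564,
    "healthy": 2231,
}


def balanced_class_weights(counts):
    """Return weight = total / (n_classes * count) for each class.

    Rare classes get weights above 1 and common classes below 1, so each
    class contributes equally to a weighted loss in aggregate.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


weights = balanced_class_weights(CLASS_COUNTS)
total_images = sum(CLASS_COUNTS.values())
for name, weight in weights.items():
    share = CLASS_COUNTS[name] / total_images
    print(f"{name:20s} share={share:.3f} weight={weight:.2f}")
```

Under these assumptions the COVID-19 class makes up roughly 4% of the images, so an unweighted classifier could score well overall while rarely predicting it.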

What is new in this approach and why will it be successful?

  • Will develop a model training pipeline that supports both training on private datasets using differential privacy techniques and training on data released through “data philanthropy”.
  • In addition, engineered features and model weights can be shared easily among members without any privacy exposure, allowing rapid model improvement by reducing the big-data burden and thereby lowering the barrier to entry to a machine learning approach
    • Algorithmia has offered to provide in-kind expertise on data science, deep learning, and data engineering, as well as assistance in optimizing the model training pipelines
      • Expertise in machine learning architectures in a data scarce environment
      • Expertise in making models production ready
    • The UN Task Team on Privacy Preserving Techniques includes experts from the field who can collaborate to ensure that holders of CXRs who are bound by strict confidentiality agreements can also participate in model training using differential privacy
      • Differential privacy techniques have been used by companies to ensure training and prediction happen where the data lies, and that models are “aggregated” back at a central location, thus ensuring that data do not move
    • The UN should call for data philanthropy to release anonymized CXR images that contain known COVID-19 positive cases, as there is precedent for this using non-COVID-19 CXRs (by the NIH as well as Guangzhou Women and Children’s Medical Center)
      • Allows research and development to be done against a public dataset
      • Can involve data competitions from partners like Kaggle and Driven Data
    • Given the global push to improve the tools available to front-line clinicians, there is already broad interest from a range of partners in providing in-kind support for such an initiative, which can be used as the basis for approaching others:
      • The UN’s WHO can provide a quality check of any algorithmic method and provide expertise and authority
      • Those with access to data sources, including hospitals and research arms of hospitals in cities with known COVID-19 patients
        • US: Kaiser Permanente, NYU Langone, Columbia Medical Center
        • Other countries
      • Government agencies in countries with COVID-19 outbreaks with influence over data access, as well as agencies with a history of supporting data philanthropy in this field (US NIH, others).
        • NIH, NBS China, others
      • Members of our UN Task Teams (Andrew Trask of the Privacy Preserving TT, Kerrie Mengersen of the EO Task Team)
    • Will work with partners to ensure that the trained model is served so that anyone can use it securely, in accordance with strict data privacy laws
      • Will make use of Algorithmia’s API
      • Source code for the underlying model will be open, as will model weights
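The idea that training happens where the data lies and only models are “aggregated” centrally can be sketched as follows. This is a minimal illustration of what is usually called federated averaging, with update clipping and Gaussian noise as commonly used in differentially private training; it is not the pipeline proposed here. All names, the three example sites, and the values of `max_norm` and `noise_std` are assumptions, and the noise level is not calibrated to any formal privacy budget.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale an update vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [x * scale for x in update]


def aggregate(updates, max_norm, noise_std, rng):
    """Average clipped site updates and add Gaussian noise to the mean.

    Only these update vectors reach the central server; the images used
    to compute them stay at each site.
    """
    clipped = [clip_update(u, max_norm) for u in updates]
    n_sites = len(clipped)
    dim = len(clipped[0])
    mean = [sum(u[i] for u in clipped) / n_sites for i in range(dim)]
    return [m + rng.gauss(0.0, noise_std) for m in mean]


# Hypothetical weight updates computed locally at three hospitals.
site_updates = [
    [0.10, -0.20, 0.05],
    [0.30, 0.10, -0.10],
    [3.00, 4.00, 0.00],  # L2 norm 5.0, so it is clipped down to max_norm
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
global_weights = [0.0, 0.0, 0.0]
noisy_mean = aggregate(site_updates, max_norm=1.0, noise_std=0.01, rng=rng)
global_weights = [w + d for w, d in zip(global_weights, noisy_mean)]
print(global_weights)
```

Clipping bounds how much any single site can shift the shared model, and the added noise masks individual contributions. A real deployment would also need privacy accounting and secure transport, which this sketch leaves out.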

What difference will this make if we succeed?

  • Healthcare practitioners will have access to a supplementary tool for assessing the presence or absence of COVID-19 that in principle embodies expert knowledge yet requires none to use

What are the known risks and challenges, and how will they be addressed?

  • Access to public data sources
  • Access to data engineering and machine learning expertise
  • Ability to train model on scarce data
  • Ability to train model on data that cannot be made public


[1] An implication of the Universal Approximation Theorem.

[2] See https://stanfordmlgroup.github.io/projects/chexnet/

[3] See the ChestX-ray14 dataset of over 100,000 images: https://nihcc.app.box.com/v/ChestXray-NIHCC

[4] See the competition at https://www.kaggle.com/c/rsna-pneumonia-detection-challenge and the supplemental dataset at https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia

Project Objective:

Exploration, Scientific / research

Project Sources
Big Data Source: Medical X-ray imagery
Region: Global
Id Country Regional: global
Data Coverage
Cost Implication: Free
Data Quality
Quality Aspects Evaluated: Privacy and Security, Accuracy, including selectivity
Methods Used: Machine learning (Random forest, etc.)