Evaluation of the clinical performance of an AI-based application for the automated analysis of chest X-rays (2024)

Introduction

In recent years, the COVID-19 pandemic has once again shown that medical staff are exposed to an extremely high level of stress in their clinical routine [1,2]. The use of artificial intelligence (AI) in medical care has been discussed extensively for several years as a way to support medical staff with the increasing workload in their daily routine, especially in highly technical fields such as radiology, which deals with image-based tasks [3,4].

Although various AI-based applications are conceivable in medicine in principle, the evaluation of chest radiographs appears to be a good opportunity to establish an AI-based algorithm in clinical routine [5]. Eltorai et al. conducted an online survey in which they asked both radiologists and computer science experts about their expectations regarding the future impact of AI applications on the field of radiology. As part of this survey, they also asked radiologists about their desire for specific AI applications. About 30% of the radiologists declared an interest in AI applications that detect atelectasis (29.5%), pleural effusions (30.5%) and consolidations (31.6%). Even more radiologists expressed interest in AI applications that indicate pneumothoraces (56.8%) and pulmonary nodules (88.4%) [6].

The majority of studies evaluating the performance of AI-based algorithms for the interpretation of chest radiographs focus on one particular finding, e.g. signs of COVID-19 infection or tuberculosis [7,8,9,10,11,12,13,14]. The AI-based detection of lung nodules has also been the aim of various studies [15,16,17].

Siemens Healthineers (Erlangen, Germany) offers an AI-based application for the automated analysis of radiographs of the chest, which continuously aims to develop a holistic approach to patient care. Currently, the AI Rad Companion Chest X-ray (AI-Rad) is designed to detect five specific radiographic findings: pulmonary lesions, consolidation, atelectasis, pneumothorax and pleural effusion. The AI-Rad is considered a diagnostic aid to support radiologists in their clinical routine.

Homayounieh et al. tested the AI-Rad algorithm with regard to the detection of lung nodules [15]. Their study included 100 p.a. chest radiographs that were evaluated by nine radiologists with different levels of experience. Each radiologist reviewed all images in two sessions: once in an unaided mode and once in an AI-aided mode. In the AI-aided session, the mean sensitivity, specificity and detection accuracy for lung nodules among all radiologists improved by 10.4%, 2.4% and 6.4%, respectively, compared to the unaided session. Junior radiologists experienced greater improvements in sensitivity than senior radiologists, whereas all radiologists experienced similar improvements in specificity [15].

The purpose of the present study is to evaluate the performance of the AI-Rad. We compared the performance metrics of the AI-Rad with those of clinical radiologists by analyzing the findings described in the written reports and the findings detected by the AI algorithm.

Methods

Patient population

All radiographs were performed for diagnostic reasons. In total, 499 consecutive patients, who were examined between August and September 2021, were retrospectively enrolled in this study. Patients were not preselected regarding any personal characteristics (e.g. weight, age, gender) or certain pathologies. The radiographs were acquired with seven different X-ray devices that are located in four different hospitals. All hospitals are part of our radiological department.

AI Rad Companion Chest X-ray

The AI-Rad solely analyzes the posterior-anterior (p.a.) view of chest X-ray images and creates secondary capture DICOM objects reporting on the results of the analysis. Each finding is marked on a copy of the analyzed X-ray image and listed in a table. Additionally, the AI-Rad provides a “confidence score” (CS) on a scale of 1 (low) to 10 (high) for each finding, which expresses the algorithm's certainty for the presence of that particular finding. The manufacturer has preset the AI-Rad to report only findings with a CS ≥ 6, whilst findings with a CS ≤ 5 are not displayed.

The AI-Rad (version VA23A) is designed to detect five specific radiographic findings: pulmonary lesions, consolidation, atelectasis, pneumothorax and pleural effusion. Pulmonary lesions, as defined by the AI-Rad, include lung nodules (rounded or oval opacities < 3 cm in diameter) and lung masses (pulmonary, pleural or mediastinal lesions > 3 cm in diameter). To detect pneumothoraces, the AI-Rad screens for radiographic signs suggestive of air in the pleural space. Likewise, the AI-Rad screens for radiographic signs suggestive of fluid in the pleural space for the detection of pleural effusions. Atelectases are defined as increased opacities accompanied by volume loss, which in turn can manifest as an abnormal displacement of fissures, bronchi, vessels, the diaphragm, or the mediastinum. The AI-Rad defines consolidations as increased parenchymal attenuation. This definition includes homogeneous increases of parenchymal attenuation (consolidation) that obscure pulmonary vessels and bronchi as well as hazy increases of parenchymal attenuation (ground glass opacity) that do not obscure pulmonary vessels and bronchi.

Reporting procedures and data collection

The report for each radiograph was written immediately after the examination. In most cases, the radiographs were evaluated in a consensus decision between a junior radiologist and a senior radiologist (> 20 years of experience). The radiologists were not aware of this study. Therefore, the written reports reflect the radiological routine without any external influencing factors. The evaluation of the radiographs by the AI-Rad was performed retrospectively.

The written reports were screened for any mention of the pre-defined radiographic findings (pulmonary lesions, consolidation, atelectasis, pneumothorax and pleural effusion). If a pre-defined finding was not mentioned in the written report, it was considered “not detected by the radiologist”. The findings detected by the AI-Rad were listed together with their CS.

Ground truth

The ground truth for the data set was defined in a consensus decision by two radiologists (4 and 6 years of experience). In order to do so, further images (e.g. additional radiographs in lateral view, previous and/or follow-up X-ray examinations as well as CT scans) were taken into account.

While determining the ground truth, the overall image quality of the radiographs was rated on a 5-point Likert scale (1 = very poor image quality, 5 = excellent image quality). In addition, the reason for a potentially suboptimal image quality was determined.

Statistical analysis

Data processing and descriptive statistical analyses as well as graphical illustration were performed using the statistical software R and RStudio (R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. RStudio Version 1.4.1106). For the written report as well as the AI-Rad analysis, the sensitivity, specificity, positive (PPV) and negative predictive value (NPV) as well as the false discovery rate (FDR) and the false omission rate (FOR) were calculated for the detection of each pre-defined finding. Furthermore, receiver operating characteristic (ROC) curves were created and the area under the curve (AUC) was calculated to illustrate the performances.
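The metrics named above all derive from the four cells of a 2×2 table that compares a reader (written report or AI-Rad) with the ground truth. The following Python sketch illustrates the definitions; the study itself used R, and the class and function names here are illustrative assumptions, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 table for one finding (reader vs. ground truth)."""
    tp: int  # finding present and reported
    fp: int  # finding absent but reported
    fn: int  # finding present but not reported
    tn: int  # finding absent and not reported


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a margin of the table is empty.
    return num / den if den else float("nan")


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the six metrics listed in the statistical analysis."""
    ppv = _ratio(c.tp, c.tp + c.fp)
    npv = _ratio(c.tn, c.tn + c.fn)
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": ppv,
        "npv": npv,
        "fdr": 1.0 - ppv,  # FDR = FP / (TP + FP)
        "for": 1.0 - npv,  # FOR = FN / (TN + FN)
    }
```

Note that FDR and FOR are the complements of PPV and NPV, so a high FDR and a high NPV can coexist for the same finding.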

Ethical approval

Institutional Review Board Statement: The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Ethics Committee of the Faculty of Medicine of the Ruhr-University Bochum.

Informed consent

Patient consent was waived by the Ethics Committee of the Faculty of Medicine of the Ruhr-University Bochum due to the retrospective study design.

Results

Chest radiographs of 499 patients were analyzed in the present study. The mean age was 65.4 ± 17.0 (median 67.6, range 22–97) years.

Overall, the image quality of the great majority of radiographs was “good” or “excellent”. Only 1.2% of the radiographs were rated “appropriate”. The most frequently cited reason for suboptimal image quality was “overlapping soft tissue”. Details on the image quality are summarized in Table 1.

Ground truth

Overall, 499 X-ray examinations were included in the present study. Of these, 386 examinations included radiographs in p.a. and lateral view, and 113 examinations consisted of a radiograph solely in p.a. view.

To determine the ground truth, not only the X-ray images included in this study were evaluated; additional examinations were also considered. In 375 of the 499 included cases, additional X-ray examinations and/or CT scans were available at the time the ground truth was defined.

In terms of additional radiographs, 332 patients had at least one additional X-ray examination of the chest. In 299 cases, previously acquired X-ray images were available. In 136 cases, follow-up X-ray images were available. In 103 cases, both previous as well as follow-up X-ray images were available.

Likewise, 237 patients had at least one CT examination that included the chest. In 186 cases, a CT scan acquired before the date of the study radiograph was available. In 121 cases, a CT scan acquired after the date of the study radiograph was available. In 70 cases, CT scans acquired both before and after the date of the study radiograph were available.

On 312 of the 499 analyzed radiographs (62.5%), none of the pre-defined findings was detected. Accordingly, on 187 radiographs (37.5%) at least one of the pre-defined findings was detected; of these radiographs, the majority had one (n = 99) or two (n = 62) pre-defined findings. Table 2 shows the distribution of the pre-defined findings.

The written report and the AI-Rad analysis came to the same result in 251 cases (50.3%) and disagreed in 248 cases (49.7%). In 366 cases (73.3%), the written report agreed with the ground truth and in 133 cases (26.7%), the written report disagreed with the ground truth. Likewise, in 276 cases (55.3%), the AI-Rad agreed with the ground truth and in 223 cases (44.7%), the AI-Rad disagreed with the ground truth.

Lung lesions

The results regarding the detection of lung lesions are shown in Fig. 1 and Table 3. An example of the detection of a lung lesion is shown in Fig. 2. Considering all CS (CS ≥ 6), the AI-Rad offered high sensitivity (0.83) and specificity (0.83) for the detection of lung lesions with an excellent NPV (0.97), but a high FDR (0.62). With increasing CS, the FDR decreased markedly (0.20 at CS = 10). At the same time, however, the sensitivity also decreased markedly (0.28 at CS = 10). The NPV remained high (0.91 at CS = 10).

Figure 1. Performance metrics for the detection of lung lesions. (A) Receiver operating characteristic (ROC) curves displaying the performance of the AI Rad Companion Chest X-ray (AI) and the written report (WR) for the detection of lung lesions (area under the curve (AUC) AI = 0.867/WR = 0.750). (B) Sensitivity (Sens.), negative predictive value (NPV) and false discovery rate (FDR) of the AI Rad Companion Chest X-ray (AI) and the radiologists (WR = written report) for the detection of lung lesions. The sensitivity of the AI is considerably higher at a confidence score (CS) ≥ 6; at the same time, the FDR is considerably higher. At CS = 10, the FDR is comparable to the WR, but the sensitivity is considerably lower. The NPV of the AI is high at all CS.

Figure 2. Example of a lung lesion detected by the AI Rad Companion Chest X-ray (A, B) that was confirmed by a CT scan of the chest (C): 74-year-old male patient diagnosed with a carcinoma of the tongue 2 years ago.

The sensitivity of the written report for the detection of lung lesions was comparatively low (0.52). The specificity (0.98) as well as the NPV (0.94) were excellent. At the same time, the FDR was comparatively low (0.21).
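A written report yields a single yes/no decision per finding, so its ROC curve has only one operating point. Assuming the curve is drawn as straight lines from (0, 0) through that point to (1, 1), which is a common convention but is not stated in the text, the AUC reduces to the mean of sensitivity and specificity. The sketch below (function name is illustrative) applies this to the written-report values for lung lesions; the result matches the AUC of 0.750 given for the written report in the figure caption.

```python
def auc_single_point(sensitivity: float, specificity: float) -> float:
    """Trapezoidal AUC of the ROC polyline (0,0) -> (FPR, TPR) -> (1,1)."""
    fpr = 1.0 - specificity
    tpr = sensitivity
    left = 0.5 * fpr * tpr                   # triangle under the first segment
    right = 0.5 * (1.0 - fpr) * (tpr + 1.0)  # trapezoid under the second segment
    return left + right


# Written-report sensitivity and specificity for lung lesions, as stated above.
print(round(auc_single_point(0.52, 0.98), 3))  # 0.75
```

The AI-Rad, in contrast, provides a graded confidence score, so its ROC curve can have several operating points and the two AUC values are not computed from the same kind of curve.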

Consolidation

The results regarding the detection of consolidations are shown in Fig. 3 and Table 4. An example of the detection of consolidations is shown in Fig. 4. The AI-Rad offers good sensitivity (0.88) and specificity (0.77) for the detection of consolidations when considering all CS (CS ≥ 6). With increasing CS, the sensitivity decreases markedly (0.14 at CS = 10), whereas the specificity increases (0.99 at CS = 10). The NPV is excellent at all CS (0.98 at CS ≥ 6; 0.91 at CS = 10). The FDR is relatively high when considering all CS (0.70 at CS ≥ 6), but decreases noticeably with increasing CS (0.36 at CS = 10).

Figure 3. Performance metrics for the detection of consolidations. (A) Receiver operating characteristic (ROC) curves displaying the performance of the AI Rad Companion Chest X-ray (AI) and the radiologists (WR = written report) for the detection of consolidations (area under the curve (AUC) AI = 0.873/WR = 0.868). (B) Sensitivity (Sens.), negative predictive value (NPV) and false discovery rate (FDR) of the AI Rad Companion Chest X-ray (AI) and the radiologists (WR = written report) for the detection of consolidations. The sensitivity of the AI is slightly higher at a confidence score (CS) ≥ 6; at the same time, the FDR is markedly higher. At CS = 10, the FDR is comparable to the WR, but the sensitivity is much lower. The NPV of the AI is high at all CS.

Figure 4. Example of a consolidation detected by the AI Rad Companion Chest X-ray (A, B) that was confirmed by a CT scan (C): 72-year-old male patient admitted to the hospital with fever, dyspnea and severe cough.

The sensitivity of the written report for the detection of consolidations was good (0.78). The specificity (0.98) as well as the NPV (0.94) were excellent. In addition, the FDR was comparatively low (0.35).

Atelectasis

The results regarding the detection of atelectasis are shown in Fig. 5 and Table 5. The AI-Rad offers moderate sensitivity (0.54 at CS ≥ 6) for the detection of atelectasis that decreases markedly with increasing CS (0.04 at CS = 10). The specificity (0.92 at CS ≥ 6) as well as the NPV (0.90 at CS ≥ 6) remain very high at all CS. The FDR is highest when considering all CS (0.40 at CS ≥ 6) and decreases markedly with increasing CS.

Figure 5. Performance metrics for the detection of atelectasis. (A) Receiver operating characteristic (ROC) curves displaying the performance of the AI Rad Companion Chest X-ray (AI) and the radiologists (WR = written report) for the detection of atelectasis (area under the curve (AUC) AI = 0.743/WR = 0.702). (B) Sensitivity (Sens.), negative predictive value (NPV) and false discovery rate (FDR) of the AI Rad Companion Chest X-ray (AI) and the radiologists (WR = written report) for the detection of atelectasis. The sensitivity of the AI is slightly higher at a confidence score (CS) ≥ 6; at the same time, the FDR is slightly higher. At CS ≥ 7, sensitivity and FDR of the AI and the WR are on a similar level. At CS ≥ 8, the sensitivity as well as the FDR decrease markedly. The NPV of the AI is high at all CS.

Likewise, the written report offers moderate sensitivity (0.43) for the detection of atelectasis. The specificity (0.97) as well as the NPV (0.89) are excellent. The FDR is low (0.24).

Pneumothorax

The results regarding the detection of pneumothoraces are shown in Fig. 6 and Table 6. When analyzing the performance metrics for the detection of pneumothoraces, it must be noted that the prevalence of pneumothoraces in the cohort was very low (2.0%), which influences the calculation of the performance metrics.

Figure 6. Performance metrics for the detection of pneumothoraces. (A) Receiver operating characteristic (ROC) curves displaying the performance of the AI Rad Companion Chest X-ray (AI) and the radiologists (WR = written report) for the detection of pneumothoraces (area under the curve (AUC) AI = 0.830/WR = 0.848). (B) Sensitivity (Sens.), negative predictive value (NPV) and false discovery rate (FDR) of the AI Rad Companion Chest X-ray (AI) and the radiologists (WR = written report) for the detection of pneumothoraces. CS = confidence score.

The AI-Rad offers good sensitivity for the detection of pneumothoraces when considering all levels of CS (0.70 at CS ≥ 6). However, the sensitivity decreases markedly with increasing CS (0.30 at CS = 10). Both specificity and NPV are excellent at all CS. The FDR is comparatively high at all levels of CS (0.70 at CS = 10). In absolute numbers, the AI-Rad detected 7 out of 10 pneumothoraces correctly. At the same time, the AI-Rad produced 23 false-positive pneumothorax findings (see also Fig. 7).
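The absolute numbers above can be turned back into rates. The sketch below assumes that the 10 pneumothoraces correspond to the stated 2.0% prevalence among the 499 radiographs and that the 23 false positives refer to the default threshold (CS ≥ 6); both are readings of the text, not values taken from Table 6.

```python
total = 499      # radiographs in the cohort
positives = 10   # pneumothoraces in the ground truth (about 2.0% of 499)
tp = 7           # pneumothoraces correctly detected by the AI-Rad
fp = 23          # false-positive pneumothorax findings (assumed at CS >= 6)

fn = positives - tp          # 3 missed pneumothoraces
tn = total - positives - fp  # 466 correctly negative radiographs

sensitivity = tp / (tp + fn)  # 7 / 10
specificity = tn / (tn + fp)  # 466 / 489
npv = tn / (tn + fn)          # 466 / 469
fdr = fp / (tp + fp)          # 23 / 30

print(f"sens={sensitivity:.2f} spec={specificity:.2f} npv={npv:.2f} fdr={fdr:.2f}")
# sens=0.70 spec=0.95 npv=0.99 fdr=0.77
```

Under these assumptions, roughly three of four pneumothorax alerts at the default threshold would be false positives even though the specificity is about 0.95, which illustrates how strongly a low prevalence inflates the FDR.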

Figure 7. Three examples of false-positive pneumothorax findings (A-C) indicated by the AI Rad Companion Chest X-ray.

The written report offers good sensitivity for the detection of pneumothoraces (0.70). The specificity (1.0) and the NPV (0.99) are excellent, and the FDR is low (0.22).

Pleural effusion

The results regarding the detection of pleural effusions are shown in Fig. 8 and Table 7. The AI-Rad offers good sensitivity for detecting pleural effusions when considering all levels of CS (0.74 at CS ≥ 6). However, the sensitivity decreases dramatically with increasing CS (0.02 at CS = 10). The specificity was excellent at all levels of CS. The NPV decreased slightly with increasing CS, but remained at a very good level (e.g. 0.81 at CS = 10). The FDR was very low at all levels of CS (e.g. 0.13 at CS ≥ 6).

Figure 8. Performance metrics for the detection of pleural effusions. (A) Receiver operating characteristic (ROC) curves displaying the performance of the AI Rad Companion Chest X-ray (AI) and the radiologists (WR = written report) for the detection of pleural effusions (area under the curve (AUC) AI = 0.861/WR = 0.910). (B) Sensitivity (Sens.), negative predictive value (NPV) and false discovery rate (FDR) of the AI Rad Companion Chest X-ray (AI) and the radiologists (WR = written report) for the detection of pleural effusions. The sensitivity of the AI is inferior to the sensitivity of the WR at all confidence scores (CS).

The written report offered very good sensitivity (0.88) and excellent specificity (0.94) as well as NPV (0.97) for the detection of pleural effusions. At the same time, the FDR was low (0.21).

Discussion

The purpose of the present study was to evaluate the performance of the AI-Rad (version VA23A) by analyzing the performance metrics of the AI-Rad and clinically working radiologists. The findings described in the written reports and the findings detected by the AI-Rad were compared to the findings of a ground truth reading, which was accomplished by a consensus agreement of two radiologists after evaluating additional radiographs (e.g. lateral view) and CT examinations (if available).

For the interpretation of the performance metrics of the AI-Rad, it is important to consider the different CS that are provided for each detected finding. The CS expresses the algorithm's certainty for the presence of that particular finding. The AI-Rad might offer higher sensitivity for certain findings compared to the written report when considering the lowest CS (≥ 6). However, at the same time, the FDR of the AI-Rad at this CS might also be considerably higher. Likewise, at a higher CS, the AI-Rad might offer a similar FDR compared to the written report, but with a considerably lower sensitivity. Therefore, the different CS are important when evaluating the reported findings of the AI-Rad.
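The trade-off described here can be illustrated by recomputing sensitivity and FDR while raising the CS threshold. The sketch below uses a small set of made-up cases (hypothetical toy data, not study data); the function name and the encoding of "no finding reported" as `None` are assumptions for illustration.

```python
from typing import Optional, Sequence


def metrics_at_threshold(
    scores: Sequence[Optional[int]],  # CS per case; None = nothing reported
    truth: Sequence[bool],            # ground truth per case
    threshold: int,
) -> tuple[float, float]:
    """Return (sensitivity, FDR) when only findings with CS >= threshold count."""
    called = [s is not None and s >= threshold for s in scores]
    tp = sum(c and t for c, t in zip(called, truth))
    fp = sum(c and not t for c, t in zip(called, truth))
    fn = sum((not c) and t for c, t in zip(called, truth))
    sens = tp / (tp + fn) if tp + fn else float("nan")
    fdr = fp / (tp + fp) if tp + fp else float("nan")
    return sens, fdr


# Hypothetical toy data, NOT study data: 5 cases with the finding, 7 without.
scores = [10, 9, 8, 6, None, 7, 6, 6, 8, None, None, None]
truth = [True] * 5 + [False] * 7

for cs in range(6, 11):
    sens, fdr = metrics_at_threshold(scores, truth, cs)
    print(f"CS >= {cs}: sensitivity={sens:.2f}, FDR={fdr:.2f}")
```

In this toy example, raising the threshold from 6 to 10 lowers the FDR from 0.50 to 0.00 but also lowers the sensitivity from 0.80 to 0.20, the same qualitative pattern reported for the AI-Rad.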

The sensitivity of the AI-Rad for the detection of lung lesions was superior to the sensitivity of the written report (0.83 (AI-Rad at CS ≥ 6) versus 0.52 (WR)). However, it has to be noted that, unlike the AI-Rad, radiologists immediately evaluate the findings they detect and decide whether they are worth mentioning in the written report. It is conceivable that a small, calcified granuloma, for example, that has been present for a long time may not be mentioned in the written report, but is indicated by the AI-Rad.

Furthermore, the sensitivity of the written report for the detection of lung lesions in the present study is comparable to previously published data. Homayounieh et al., for example, report a mean sensitivity of 45% among nine radiologists with different levels of experience for the detection of pulmonary nodules [15]. The sensitivity of the AI-Rad in the present study is also comparable to previously published data. Yoo et al., for example, report on an artificial intelligence algorithm for lung nodule detection and describe a sensitivity of 86% [18].

The superior sensitivity of the AI-Rad for the detection of lung lesions (at CS ≥ 6), however, is accompanied by a markedly higher FDR compared to the written report (0.62 (AI-Rad at CS ≥ 6) versus 0.21 (WR)). Indeed, the AI-Rad wrongly indicated ECG electrodes or the nipple as lung lesions in several cases. Calcifications of the costal cartilage are also often misinterpreted by the AI-Rad. Consequently, radiologists need to check each indicated finding with a CS ≥ 6, as the number of false-positive findings is considerable. When the AI-Rad reports a lung lesion with CS = 10, it is more likely to be a true-positive finding, as the FDR is markedly lower (0.20 compared to 0.62 at CS ≥ 6).

In terms of detecting lung lesions, a benefit of the AI-Rad for clinical radiologists may be the high NPV (0.91–0.97, depending on the CS), which is comparable to the NPV of the written report (0.94). When taking the evaluation of the AI-Rad into account, radiologists may confirm their own negative search for lung lesions, which may increase their confidence in their report.

In terms of detecting consolidations, the AI-Rad offers slightly higher sensitivity (0.88 (AI-Rad at CS ≥ 6) versus 0.78 (WR)) compared to the written report. However, the higher sensitivity is accompanied by a higher FDR (0.70 (AI-Rad at CS ≥ 6) versus 0.35 (WR)). Therefore, radiologists might benefit from the higher sensitivity, but need to re-evaluate the indicated findings of the AI-Rad carefully. At CS = 10, the FDR of the AI-Rad is comparable to the WR (0.36 (AI-Rad at CS = 10) versus 0.35 (WR)), but the sensitivity of the AI-Rad decreased markedly (0.14 (AI-Rad at CS = 10) versus 0.78 (WR)).

These performance metrics of the AI-Rad regarding the detection of consolidations are in line with previously published data. Rueckel et al., for example, report minor differences in the performance of an AI algorithm and board-certified radiologists for the detection of pneumonia on chest radiographs [19]. In addition, Yee et al. report a comparable sensitivity (84.1%) of their neural network for the detection of pneumonia on chest radiographs [20].

In terms of detecting consolidations, the high NPV (0.91–0.98, depending on the CS) of the AI-Rad may be a benefit for radiologists in clinical practice, as they can reliably confirm their own negative search for consolidations.

Similar to the detection of consolidations, the AI-Rad can provide slightly higher sensitivity for the detection of atelectasis compared to the written report (0.54 (AI-Rad at CS ≥ 6) versus 0.43 (WR)). However, as argued above for lung lesions, it remains unclear whether small atelectases were detected by the radiologists but not considered worth mentioning in the written report. At CS ≥ 6, the FDR of the AI-Rad is higher compared to the written report (0.40 (AI-Rad at CS ≥ 6) versus 0.24 (WR)). Sensitivity as well as FDR decrease markedly with increasing CS. The NPVs of the AI-Rad and the written report for the detection of atelectasis are both high (0.83–0.90 (AI-Rad, depending on the CS) versus 0.89 (WR)).

Compared to the AI-Rad, the written report achieved higher sensitivity for the detection of pleural effusions (0.74 (AI-Rad at CS ≥ 6) versus 0.88 (WR)). This might be attributable to the additional lateral view radiographs, which are not taken into account by the AI-Rad but are helpful in detecting smaller pleural effusions. The NPVs of the AI-Rad and the written report for the detection of pleural effusions are comparable (0.94 (AI-Rad at CS ≥ 6) versus 0.97 (WR)). This is in line with an earlier study conducted by Rueckel et al., who found only minor differences in the performance of an AI algorithm and board-certified radiologists for the detection of pleural effusions on chest radiographs [19].

The performance metrics regarding the detection of pneumothoraces calculated in the present study are most likely not representative due to the low prevalence of pneumothoraces in our cohort (2.0%). However, during the systematic analysis of the radiographs for establishing the ground truth, we noticed that the AI-Rad indicates a considerable number of false-positive pneumothoraces. Therefore, according to our experience, it is conceivable that the FDR would be comparatively high even with a higher prevalence in the cohort. Nevertheless, future studies with a higher prevalence are needed to reliably evaluate the performance of the AI-Rad for the detection of pneumothoraces.
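How strongly the FDR depends on prevalence can be sketched with Bayes' rule, under the simplifying assumption that sensitivity and specificity stay fixed across cohorts (which may not hold in practice). The sensitivity of 0.70 is taken from the text; the specificity of 0.95 is an assumed value, roughly 466/489 if the 23 false positives occurred at CS ≥ 6. The function name is illustrative.

```python
def fdr_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """False discovery rate implied by Bayes' rule for a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return false_pos / (true_pos + false_pos)


# Sensitivity 0.70 is from the text; specificity 0.95 is an ASSUMED value.
for prevalence in (0.02, 0.05, 0.10, 0.20):
    fdr = fdr_from_rates(0.70, 0.95, prevalence)
    print(f"prevalence={prevalence:.2f}: FDR={fdr:.2f}")
```

Under these assumed rates, the FDR falls as prevalence rises but is still about 0.39 at a prevalence of 10%, so a sizeable share of alerts would remain false positives in a cohort with five times as many pneumothoraces.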

The present study has certain limitations: (1) The AI-Rad is intended to be a supporting tool whose output is considered by radiologists before making their final decision while writing reports. However, the present study evaluated the performance of the AI-Rad alone and compared it to the performance of radiologists in the clinical routine without the assistance of an AI application. (2) As previously published studies show, less experienced radiologists are more likely to benefit from the support of an AI application [15]. However, the present study aimed to compare the overall performance of radiologists in the clinical routine and therefore did not differentiate between the individual experience of each radiologist. (3) The analysis in the present study focused on the list of findings provided by the AI-Rad, rather than the location of a finding indicated by the AI-Rad. Therefore, it is possible that the AI-Rad correctly listed a finding on the report sheet, but indicated it in the wrong location. (4) Unlike the AI-Rad, radiologists are able to consider lateral view radiographs and previously conducted radiographs for comparison. (5) As explained above, the conclusions regarding the performance of the AI-Rad for the detection of pneumothoraces are limited because of the low prevalence of pneumothoraces in this cohort. (6) The overall image quality of the chest radiographs was very good. The performance of the AI-Rad regarding chest radiographs with poor image quality was not evaluated in the present study.

Conclusions

The results of the present study indicate that the AI-Rad can offer a slightly higher sensitivity for the detection of certain findings (lung lesions, consolidations and atelectasis) compared to the written report. However, this advantage is partially offset by the higher FDR of the AI-Rad. Consequently, radiologists need to carefully re-evaluate and verify each finding indicated by the AI-Rad.

At the current stage of development, it is conceivable that the high NPVs for the detection of the pre-defined findings are the greatest benefit of the AI-Rad. Radiologists who confirm their own negative search for pathologies in a chest radiograph by considering the evaluation of the AI-Rad may have higher diagnostic confidence in their reports, which could lead to faster reporting.

Data availability

The data are available from the corresponding author on reasonable request.

References

  1. Krammer, S., Augstburger, R., Haeck, M. & Maercker, A. Adjustment disorder, depression, stress symptoms, corona related anxieties and coping strategies during the corona pandemic (COVID-19) in Swiss Medical Staff. Psychother. Psychosom. Med. Psychol. 70, 272–282 (2020).

  2. Spoorthy, M. S., Pratapa, S. K. & Mahant, S. Mental health problems faced by healthcare workers due to the COVID-19 pandemic—A review. Asian J. Psychiatr. 51, 102119 (2020).

  3. Hosny, A., Parmar, C., Quackenbush, J., Schwartz, L. H. & Aerts, H. J. W. L. Artificial intelligence in radiology. Nat. Rev. Cancer 18, 500–510 (2018).

  4. Syed, A. B. & Zoga, A. C. Artificial intelligence in radiology: Current technology and future directions. Semin. Musculoskelet. Radiol. 22, 540–545 (2018).

  5. Kallianos, K. et al. How far have we come? Artificial intelligence for chest radiograph interpretation. Clin. Radiol. 74, 338–345 (2019).

  6. Eltorai, A. E. M., Bratt, A. K. & Guo, H. H. Thoracic radiologists’ versus computer scientists’ perspectives on the future of artificial intelligence in radiology. J. Thorac. Imaging 35, 255–259 (2020).

  7. Murphy, K. et al. COVID-19 on chest radiographs: A multireader evaluation of an artificial intelligence system. Radiology 296, E166–E172 (2020).

  8. Zhang, R. et al. Diagnosis of coronavirus disease 2019 pneumonia by using chest radiography: Value of artificial intelligence. Radiology 298, E88–E97 (2020).

  9. Wehbe, R. M. et al. DeepCOVID-XR: An artificial intelligence algorithm to detect COVID-19 on chest radiographs trained and tested on a large US Clinical Data Set. Radiology 299, E167–E176 (2020).

  10. Mushtaq, J. et al. Initial chest radiographs and artificial intelligence (AI) predict clinical outcomes in COVID-19 patients: Analysis of 697 Italian patients. Eur. Radiol. 31, 1770–1779 (2021).

  11. van Ginneken, B. The potential of artificial intelligence to analyze chest radiographs for signs of COVID-19 pneumonia. Radiology 299, E214–E215 (2020).

  12. Dorr, F. et al. COVID-19 pneumonia accurately detected on chest radiographs with artificial intelligence. Intell. Med. 3–4, 100014 (2020).

  13. Qin, Z. Z. et al. Using artificial intelligence to read chest radiographs for tuberculosis detection: A multi-site evaluation of the diagnostic accuracy of three deep learning systems. Sci. Rep. 9, 15000 (2019).

  14. Kulkarni, S. & Jha, S. Artificial intelligence, radiology, and tuberculosis: A review. Acad. Radiol. 27, 71–75 (2020).

  15. Homayounieh, F. et al. An artificial intelligence-based chest X-ray model on human nodule detection accuracy from a multicenter study. JAMA Netw. Open 4, e2141096 (2021).

  16. Li, X. et al. Multi-resolution convolutional networks for chest X-ray radiograph based lung nodule detection. Artif. Intell. Med. 103, 101744 (2020).

  17. Chamberlin, J. et al. Automated detection of lung nodules and coronary artery calcium using artificial intelligence on low-dose CT scans for lung cancer screening: Accuracy and prognostic value. BMC Med. 19, 55 (2021).

  18. Yoo, H., Kim, K. H., Singh, R., Digumarthy, S. R. & Kalra, M. K. Validation of a deep learning algorithm for the detection of malignant pulmonary nodules in chest radiographs. JAMA Netw. Open 3, e2017135 (2020).

  19. Rueckel, J. et al. Artificial intelligence algorithm detecting lung infection in supine chest radiographs of critically ill patients with a diagnostic accuracy similar to board-certified radiologists. Crit. Care Med. 48, e574–e583 (2020). https://doi.org/10.1097/CCM.0000000000004397.

  20. Yee, S. L. K. & Raymond, W. J. K. Pneumonia diagnosis using chest X-ray images and machine learning. in Proceedings of the 2020 10th International Conference on Biomedical Engineering and Technology 101–105 (Association for Computing Machinery, 2020). https://doi.org/10.1145/3397391.3397412.

Funding

We acknowledge support by the Open Access Publication Funds of the Ruhr-Universität Bochum. Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

  1. Department of Radiology, Neuroradiology and Nuclear Medicine, Johannes Wesling University Hospital, Ruhr University Bochum, Bochum, Germany

    Julius Henning Niehoff, Jana Kalaitzidis, Jan Robert Kroeger, Denise Schoenbeck, Jan Borggrefe & Arwed Elias Michael

Authors

  1. Julius Henning Niehoff

  2. Jana Kalaitzidis

  3. Jan Robert Kroeger

  4. Denise Schoenbeck

  5. Jan Borggrefe

  6. Arwed Elias Michael

Contributions

Conceptualization: J.N. and J.B. Writing—original draft preparation: J.N., J.K. and A.M. Writing—review and editing: J.B. and J.R.K. Investigation: J.N., A.M., D.S. and J.K. Formal analysis: A.M. Data curation: J.N., A.M., D.S. and J.K. Supervision: J.B. and J.R.K.

Corresponding author

Correspondence to Julius Henning Niehoff.

Ethics declarations

Competing interests

J.R. Kroeger received research support from Philips Healthcare, support for attending meetings and/or travel from Veryan, honoraria for scientific lectures from GE Healthcare and honoraria for clinical advisory board membership from Siemens Healthineers. J. Borggrefe received honoraria for scientific lectures from Philips Healthcare and Siemens Healthineers. The remaining authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Niehoff, J.H., Kalaitzidis, J., Kroeger, J.R. et al. Evaluation of the clinical performance of an AI-based application for the automated analysis of chest X-rays. Sci Rep 13, 3680 (2023). https://doi.org/10.1038/s41598-023-30521-2

  • DOI: https://doi.org/10.1038/s41598-023-30521-2

FAQs

How does AI help with X-rays? ›

Artificial Intelligence (AI) can analyse X-rays and diagnose medical issues just as well as doctors, a study has claimed. Software was trained using chest X-rays from more than 1.5m patients, and scanned for 37 possible conditions.

How well does a chest radiography AI algorithm detect missed or mislabeled findings in a multicenter study? ›

Results: The AI had high sensitivity (96%), specificity (100%), and accuracy (96%) for detecting all missed and mislabeled CXR findings.
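For readers unfamiliar with these measures, the following is a minimal sketch of how sensitivity, specificity and accuracy are derived from a confusion matrix. The function name `diagnostic_metrics` and the counts are hypothetical, chosen only for illustration; they are not data from the cited study.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: findings that are truly present and were detected / missed.
    tn/fp: findings that are truly absent and were correctly / wrongly flagged.
    """
    sensitivity = tp / (tp + fn)                 # share of true findings detected
    specificity = tn / (tn + fp)                 # share of normal cases left unflagged
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of all cases classified correctly
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only (not taken from any study).
sens, spec, acc = diagnostic_metrics(tp=48, fp=0, tn=10, fn=2)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}, accuracy={acc:.2%}")
```

The sketch assumes every denominator is non-zero; a data set with no positive or no negative cases would need separate handling.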

How well does an artificial intelligence model detect pneumothorax and tension pneumothorax on a chest radiograph? ›

Conclusions and Relevance

These findings suggest that the assessed AI model accurately detected pneumothorax and tension pneumothorax in this chest radiograph data set. The model's use in the clinical workflow could lead to earlier identification and improved care for patients with pneumothorax.

How to evaluate the technical quality of a chest radiograph? ›

Some of the more common criteria to assess radiographs, specifically chest radiographs, are penetration, inspiration, rotation, magnification, and angulation. "PIRMA" (penetration, inspiration, rotation, magnification, angulation) is a helpful mnemonic that can help one remember these 5 criteria.

Is AI as good as doctors at checking X-rays? ›

Initial results from a Swedish study of 80,000 women showed a single radiologist working with AI detected 20% more cancers than two radiologists working without the technology. In Europe, mammograms are reviewed by two radiologists to improve accuracy.

How good is AI in radiology? ›

Study shows AI improves performance for some radiologists but worsens it for others. Understanding who might benefit from AI and who would not is critical for designing tools that boost human performance. The findings underscore the importance of tailored AI-clinician integration over a one-size-fits-all approach.

What are the potential benefits of AI applications in diagnostic imaging? ›

One of the most notable benefits of AI in this field is its ability to accelerate the analysis of medical images. Traditional methods of image interpretation can be time-consuming and subject to human error.

What is the accuracy of AI detection? ›

That said, AI detectors can't guarantee anywhere close to 100% accuracy because they are based in large part on probabilities. In addition, each detector is trained on a different dataset, so they can often produce different results from one another.

How does artificial intelligence in radiology improve efficiency and health outcomes? ›

The significance of AI integration in medical imaging

With strategic integration at key points, AI systems can drive quality and cost-effectiveness by mitigating the potential for human error, accelerating processes for faster results and potentially increasing the imaging biomarkers available to assess response.

What is the role of AI in diagnosing lung diseases? ›

AI is an emerging field that is revolutionizing how clinical and imaging data can be used to accurately diagnose and classify respiratory diseases such as COPD, asthma, fibrosis and pneumonia.

What would be your choice of imaging to evaluate for a subtle pneumothorax? ›

Chest radiography is the first investigation performed to assess pneumothorax, because it is simple, inexpensive, rapid, and noninvasive; however, it is much less sensitive than chest computed tomography (CT) scanning in detecting blebs or bullae or a small pneumothorax.

What is the best way to confirm diagnosis of a pneumothorax? ›

To get a definite diagnosis, your doctor will most likely need to order an imaging test such as a chest X-ray, an ultrasound or CT scan.

What are the 3 criteria for diagnostic quality radiographs? ›

There are 3 main determinants of radiographic quality: receptor exposure, spatial resolution, and distortion. Many factors can affect these elements of quality which can ultimately impact the diagnostic quality of the image.

How do you assess the quality of a radiograph? ›

Radiographic quality is assessed using the four criteria shown below:
  1. Density.
  2. Contrast.
  3. Definition.
  4. Sensitivity.

What is chest sonography most commonly used to evaluate? ›

Non-traumatic uses for thoracic ultrasound include evaluation for pleural effusions, infections such as pneumonia or empyema, pulmonary edema, chronic obstructive pulmonary disease, pulmonary embolism, and acute respiratory distress syndrome.

How can AI help radiation therapy? ›

The improved spatial positioning accuracy of images and the capability to capture tumor motion during treatment have made it possible to further lower the dose to normal organs while administering a very high dose to the tumor, thus enabling stereotactic radiotherapy, even at metastatic sites if feasible.

What is the role of artificial intelligence in the future of radiology? ›

AI is expected to impact image-based clinical tasks, including the detection of abnormalities; the characterization of objects in images using segmentation, diagnosis and staging; and the monitoring of objects for diagnosis and assessment of treatment response.

Will AI take over radiography? ›

In conclusion, AI is not going to replace radiologists entirely. However, it will change the way they work. Radiologists will need to adapt to these changes and learn how to work alongside AI.

How has technology improved X-rays? ›

Digital radiography systems provide better image quality, facilitate the storage and transmission of images (PACS – Picture Archiving and Communication Systems), and integrate well with other digital healthcare systems.

Article information

Author: Barbera Armstrong
