Baseline accuracy

The following are the accuracy results from our paper. 


Train-test splits

Slides from the following institutes were held out when training the final segmentation model on the core set, and served as an unseen testing set for reporting accuracy: OL, LL, E2, EW, GM, and S3. Please use this testing set to reproduce the accuracy results from the table above.

Clarification note: For the concordance comparison with pathologists, we used a separate model to accommodate imbalance in the multi-rater data (evaluation set). The testing set for that other model was: OL, LL, C8, BH, AR, A7, and A1.
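
As an illustration, the institute-based split described above could be reproduced along these lines. This is a minimal sketch, not the code used in the paper. It assumes that slide names follow the TCGA barcode layout (for example "TCGA-OL-...") with the institute code as the second hyphen-separated field; the function names and the example slide names are hypothetical.

```python
# Held-out institute codes, copied from the text above.
CORE_TEST_INSTITUTES = {"OL", "LL", "E2", "EW", "GM", "S3"}
CONCORDANCE_TEST_INSTITUTES = {"OL", "LL", "C8", "BH", "AR", "A7", "A1"}


def institute_code(slide_name):
    """Return the institute code of a slide.

    Assumption: names look like "TCGA-OL-XXXX-...", so the code is the
    second hyphen-separated field.
    """
    fields = slide_name.split("-")
    if len(fields) < 2:
        raise ValueError(f"Unexpected slide name: {slide_name!r}")
    return fields[1]


def split_slides(slide_names, test_institutes=CORE_TEST_INSTITUTES):
    """Split slide names into (train, test) lists by institute code."""
    train, test = [], []
    for name in slide_names:
        if institute_code(name) in test_institutes:
            test.append(name)
        else:
            train.append(name)
    return train, test


# Hypothetical slide names, for illustration only.
slides = ["TCGA-OL-XXXX", "TCGA-A1-YYYY", "TCGA-S3-ZZZZ"]
train, test = split_slides(slides)
```

Passing test_institutes=CONCORDANCE_TEST_INSTITUTES gives the split used for the separate concordance model instead.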

Class grouping

The network was trained to map pixels into five region classes: tumor, stroma, inflammatory infiltration, necrosis and other. Regions that belong to rare classes were grouped with predominant classes where appropriate, as follows:

  • Grouped with “tumor”: angioinvasion, DCIS.
  • Grouped with “inflammatory infiltration”: lymphocytes, plasma cells, other immune infiltrates.
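
The grouping rules listed above can be written as a simple lookup table. The following is a sketch under stated assumptions, not the paper's implementation: it works on class names as text strings (the label names and any integer codes used in the actual mask files are not given here), it uses the five class names from the list of region classes, and the names GROUPING, MAIN_CLASSES and group_class are hypothetical. Labels without a listed rule return None rather than being assigned a guessed class.

```python
# The five region classes the network was trained on.
MAIN_CLASSES = ("tumor", "stroma", "inflammatory infiltration", "necrosis", "other")

# Rare classes mapped to their predominant class, as listed above.
GROUPING = {
    "angioinvasion": "tumor",
    "dcis": "tumor",
    "lymphocytes": "inflammatory infiltration",
    "plasma cells": "inflammatory infiltration",
    "other immune infiltrates": "inflammatory infiltration",
}


def group_class(label):
    """Map a region label to one of the five main classes.

    Returns None when no grouping rule is listed for the label, so that
    unhandled labels are visible instead of silently reassigned.
    """
    key = label.strip().lower()
    if key in MAIN_CLASSES:
        return key
    return GROUPING.get(key)


grouped = [group_class(name) for name in ["DCIS", "plasma cells", "stroma"]]
```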