Chronic myeloid leukemia (CML) is a type of leukemia that originates in the myeloid cells of the bone marrow. The bone marrow produces white blood cells, which normally help the body fight off illness. In almost all cases of CML, the patient’s cells carry an altered chromosome known as the Philadelphia chromosome. This alteration produces the abnormal gene BCR-ABL, and in the presence of this gene leukemia cells grow out of control. As the leukemia cells multiply, they can crowd out the healthy bone marrow cells, so the blood may lack sufficient numbers of the different blood cell types. The blood of CML patients may also contain an increased number of abnormal white blood cells.

Deep learning algorithms are being used in the study of micro-anatomy data to speed up analysis, improve objectivity, and reduce mistakes. Furthermore, explainable AI (xAI) is used to explain the deep learning computations in simple terms and to help us understand medical disorders better.

The Philadelphia chromosome

Deep Learning

Deep learning is a subset of machine learning, which is itself a subset of AI. AI can be as simple as a programmed rule that tells the machine how to behave in certain situations; in other words, artificial intelligence can be nothing more than a series of if-else statements. Machine learning, by contrast, reduces the error between the model's predictions and the facts, but it requires significant user input to do so. Deep learning is an extremely powerful framework in the modern era: a DL network can express functions of increasing complexity by adding layers and by adding units within each layer.

Given a sufficiently large model and a sufficiently large dataset of labeled training samples, deep learning can typically complete tasks that involve mapping an input sequence to an output sequence quickly and easily. To operate successfully, deep learning requires more powerful computer hardware.

Areas of artificial intelligence

Cost and Loss Function

The aim of DL model development is to reduce the error between predictions and true values. This is accomplished using a loss function, which is evaluated for each training example. The cost function is the average of the loss function values over all data samples. The cost function is minimized to lower the DL error, and the best results in DL are obtained by minimizing it.
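The relationship between loss and cost can be sketched in a few lines of Python. This is a minimal illustration, not the training code used in this work: the squared-error loss and the example values are assumptions chosen only to show that the cost is the mean of the per-example losses and that better predictions give a lower cost.

```python
def squared_error(prediction, target):
    """Loss for ONE training example (squared error, assumed for illustration)."""
    return (prediction - target) ** 2


def cost(predictions, targets, loss=squared_error):
    """Cost = average of the per-example losses over all data samples."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    losses = [loss(p, t) for p, t in zip(predictions, targets)]
    return sum(losses) / len(losses)


# Hypothetical values: three true labels and two sets of predictions.
targets = [1.0, 0.0, 1.0]
before = cost([0.5, 0.5, 0.5], targets)  # uninformed guess
after = cost([0.9, 0.1, 0.8], targets)   # predictions closer to the targets

print(f"cost before: {before:.3f}, cost after: {after:.3f}")
```

Training a DL model amounts to adjusting the model parameters so that this cost value decreases.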


The effectiveness of algorithms for interpreting medical images is evaluated using a variety of measures. The confusion matrix is the table used to visualize algorithm performance and to derive multiple assessment metrics. Confusion matrices are used to evaluate deep learning models and give a more realistic view of their performance. The output can have two or more classes. For two classes, the table contains the four possible combinations of predicted and actual values: true positive, false positive, false negative, and true negative.

Confusion Matrix
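As a minimal sketch of how the four combinations are counted, the following Python code builds the confusion-matrix counts for one class treated as positive and derives precision and recall from them. The function names and the example labels are hypothetical and are not taken from HSA KIT.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive):
    """Count TP, FP, FN and TN for one class treated as 'positive'."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["TP"] += 1  # predicted positive, actually positive
        elif p == positive:
            counts["FP"] += 1  # predicted positive, actually negative
        elif a == positive:
            counts["FN"] += 1  # predicted negative, actually positive
        else:
            counts["TN"] += 1  # predicted negative, actually negative
    return counts


def precision_recall(counts):
    """Derive precision and recall from the confusion-matrix counts."""
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for six cells.
actual = ["NET", "NET", "other", "other", "NET", "other"]
predicted = ["NET", "other", "other", "NET", "NET", "other"]

counts = confusion_counts(actual, predicted, positive="NET")
precision, recall = precision_recall(counts)
print(dict(counts), precision, recall)
```

For more than two classes, the same counting can be repeated with each class in turn treated as the positive one.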

Mean Average Precision

Mean Average Precision (mAP) is a metric used to evaluate object detection models. The confusion matrix, Intersection over Union (IoU), recall, and precision are the sub-metrics that form the backbone of the mAP formula. The mAP is calculated by finding the Average Precision (AP) for each class and then averaging over the number of classes.
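The calculation can be sketched as follows. This is a simplified illustration under stated assumptions, not the evaluation code used for HyperCMLNet: boxes are given as (x1, y1, x2, y2), each detection has already been marked as a true or false positive by comparing its IoU with a threshold, and AP is taken as the area under the interpolated precision-recall curve. Benchmark implementations differ in details such as the IoU thresholds and the interpolation scheme.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(is_true_positive, num_ground_truth):
    """AP for one class.

    is_true_positive: flags for the detections sorted by descending
    confidence; True where the detection matched a ground-truth object.
    """
    if num_ground_truth == 0:
        return 0.0
    tp = 0
    precisions, recalls = [], []
    for rank, hit in enumerate(is_true_positive, start=1):
        tp += int(hit)
        precisions.append(tp / rank)
        recalls.append(tp / num_ground_truth)
    # Interpolation: make precision non-increasing from left to right.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP = mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical example with two classes.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
ap_values = {
    "class_a": average_precision([True, False, True], num_ground_truth=2),
    "class_b": average_precision([True, True], num_ground_truth=2),
}
map_value = mean_average_precision(ap_values)
print(overlap, ap_values, map_value)
```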

Creation of Ground Truth Data

To train a deep learning model, ground truth data (GTD) is needed. It is created by defining base ROIs and annotating the existing cells within them. The GTD available in this work (see Tab.1) comprised 10 files with more than 800,000 GTD annotations, created with the proprietary HSA KIT software. Creating the GTD involves several phases. First, a WSI file in Carl Zeiss Image Data File (CZI) format is loaded into the HSA KIT. The annotations are then set within the ROI using a function of the HSA KIT. Next, the blood cell structure is annotated. This structure has 2 sub-structures, erythrocyte and leukocytes, and the leukocytes are further divided into 45 different classes, including NET, Pseudo-Gaucher-Zelle and Megakaryozyt, with 5399, 97 and 149 annotations respectively. The quantity and quality of these annotations depend on the location, clarity and size of the base ROI. Below is a breakdown of the class distribution along with the corresponding percentages.

All data | 274,553 | 617,709 | 149 | 5,399 | 97
Used data | 48,887 | 88,947 | 59 | 4,587 | 93
The available and used amount of data
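To illustrate how uneven the class distribution is, the following short Python sketch computes the share of each of the three leukocyte sub-classes from the annotation counts quoted in the text (5399, 97 and 149). Taking the sum of these three classes as the denominator is an assumption made here for illustration; the percentages in the original breakdown may have been computed against a different total.

```python
# Annotation counts for the three sub-classes, as quoted in the text.
counts = {"NET": 5399, "Pseudo-Gaucher-Zelle": 97, "Megakaryozyt": 149}

# Assumption: percentages are relative to the sum of these three classes only.
total = sum(counts.values())
shares = {name: 100.0 * n / total for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {counts[name]} annotations ({pct:.2f} %)")
```

Under this assumption, NET accounts for the large majority of the three sub-classes, while Pseudo-Gaucher-Zelle and Megakaryozyt together make up only a few percent.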


PyTorch has become a very popular and effective framework for creating deep learning (DL) models. This Torch-based open-source machine learning library was created to increase the speed and flexibility of deep neural network implementation. PyTorch is a deep learning tensor library based on Torch and Python that is used mostly in CPU and GPU applications. Torch is an open-source ML library for creating deep neural networks and is written in the Lua scripting language.

Selection of the data set

After creation of the GTD, the settings in the following tables were used for training the 3-class AI model and the erythrocyte and leukocytes AI model:

Model Type | Epochs | Learning Rate | Batch Size | Tile Size
Instance Segmentation | 100 | 0.0001 | 2 | 512
The settings used for the 3 class AI training

Model Type | Epochs | Learning Rate | Batch Size | Tile Size
Instance Segmentation | 50 | 0.0001 | 1 | 256
The settings used for the erythrocyte and leukocytes AI training


In this work, HSA KIT from HS Analysis GmbH was used to find CML cells and their structures. HyperCMLNet, a proprietary deep learning AI model focused on the instance segmentation and classification of CML cells, was constructed at HS Analysis GmbH. Two architectures were used in this work: Mask R-CNN, which was the foundation for HyperCMLNet (type 1), and Vision Transformer, which was the foundation for HyperCMLNet (type 2).

Interpretation and validation of result

The following data compare the loss and mAP evaluations of the instance segmentation results produced by both trained HyperCMLNet architectures (type 1 & 2), for the erythrocyte and leukocytes model and for the 3-class model. The following table shows the results actually obtained from the model training.

 (A) Loss evaluation comparison, (B) mAP evaluation comparison

The comparison between the two architectures shows that HyperCMLNet (type 1) with the 3 classes has the lowest loss of all models but also the lowest mAP. The figures also show that HyperCMLNet (type 2) has the highest loss but also the highest mAP of all models.

Visual interpretation of the results

After AI training is finished, any model yields numerical results. Although these are important, the quality of an AI model is also judged by checking its detections manually, which is the purpose of this section. It illustrates the instance segmentation of the CML images by both AI models (type 1 & 2). The next figure displays multiple images showing the original alongside the predictions determined by the networks, for the detection of (A) erythrocyte and leukocytes, (B) NET, (C) Megakaryozyt and (D) Pseudo-Gaucher-Zelle. The images taken from the model trainings show that both types of HyperCMLNet have very similar detection.

For erythrocyte and leukocytes, both models (type 1 & 2) show great detection, with a high level of detail and good detection boundaries. Their quality is high for the most part; however, a few details are difficult to distinguish, and on closer inspection both models made errors when separating erythrocyte cells that were mashed together. The same applies to leukocytes, although with fewer errors, because leukocytes generally cover a larger area than erythrocyte cells. Visually, erythrocyte separation with (type 1) seems slightly better, but this says little, since both models do a bad job of separating extremely mashed-up erythrocyte cells. Each architecture has its own way of performing separation, especially for clusters of mashed-up erythrocyte cells.

For NET detection, the quality of the two models (type 1 & 2) is again very similar. Most of it is average to good, because both models detect some non-NET cells and both miss some NET cells; however, (type 2) seems to have slightly better NET detection in general than (type 1).

For megakaryozyt and pseudo-gaucher, both models (type 1 & 2) have bad detection quality, which is due to the small amount of GTD used for training. However, (type 2) seems to detect slightly better than (type 1), because the (type 1) model fails to detect some megakaryozyt and pseudo-gaucher cells and is in general less detailed around the detection boundaries than (type 2).

Visualization of the training results of both types of HyperCMLNet

Interpretation of xAI

In this section, the results for the classes of CML are compared with each other using the heatmap tool in HSA KIT. The heatmap tool uses color to show where the classes of the AI models are most likely to be located: the warmer (more red) the color, the higher the probability of detection, and conversely the colder (more blue) it is, the less likely a detection is to occur. In short, the heatmap tool is used to visualize the hotspots of AI-generated annotations. It is intended for visualizing AI projects whose results contain large quantities of GTD on one slide, such as the project used in this work.
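The idea of a hotspot heatmap can be sketched generically in Python. The code below is not the HSA KIT implementation, whose internals are not described here; it assumes that the heatmap is derived from the density of detection centroids on a regular grid and that a count is mapped linearly from blue (low) to red (high). All names and coordinates are hypothetical.

```python
def density_grid(centroids, width, height, cell):
    """Count detection centroids per square grid cell of `cell` pixels."""
    cols = (width + cell - 1) // cell
    rows = (height + cell - 1) // cell
    grid = [[0] * cols for _ in range(rows)]
    for x, y in centroids:
        if 0 <= x < width and 0 <= y < height:
            grid[int(y) // cell][int(x) // cell] += 1
    return grid


def to_color(value, max_value):
    """Map a count to an (R, G, B) color: blue for low, red for high."""
    t = value / max_value if max_value else 0.0
    return (round(255 * t), 0, round(255 * (1 - t)))


# Hypothetical detections: three cells close together and one isolated cell.
centroids = [(10, 12), (14, 18), (18, 11), (90, 95)]

grid = density_grid(centroids, width=100, height=100, cell=50)
peak = max(max(row) for row in grid)
colors = [[to_color(v, peak) for v in row] for row in grid]

for row in colors:
    print(row)
```

In this sketch, densely packed detections produce one warm region, while isolated detections produce small, separate and cooler spots.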

The following figure shows the comparison for (A) erythrocyte and leukocytes, (B) NET, (C) Pseudo-Gaucher-Zelle and (D) Megakaryozyt, with the first column showing (type 2) and the second showing (type 1). In the case of (A), there is very little difference between the two architectures in detecting the erythrocyte and leukocytes classes, so it is difficult to determine which one is better; however, both architectures show excellent hotspot predictions. This is due to the large number of detections made by each model.

In the case of (B), the detection quality of both architectures showed promising results, and the heatmap detection of the two architectures was again very similar. Because the NET cells themselves are not in close proximity, the heatmap shows individual hotspots rather than very large ones.

In the case of (C) and (D), the heatmap detection is again very similar; just as in the case of (B), the actual cell detections were few and far between, which caused the heatmap hotspots to be isolated.

Comparison of the HSA KIT’s heatmap tool for the (type 1 & 2) models on (A) erythrocyte and leukocytes, (B) NET, (C) Pseudo-Gaucher-Zelle and (D) Megakaryozyt

Summary and outlook

In this work, the mother structures (erythrocyte, leukocytes) and the 3 sub-classes of the leukocytes (NET, Pseudo-Gaucher-Zelle and Megakaryozyt) of chronic myeloid leukemia were detected using deep learning models, and the results were then calculated and validated. In addition, HSA KIT’s heatmap tool was used to visualize the output of these models and to determine how the same cell classes were recognized by the different architectures. HyperCMLNet (type 1), which is based on the Mask R-CNN architecture, and HyperCMLNet (type 2), which is based on the Vision Transformer architecture, were both applied to GTD of CML. The aim of this work was to compare the detection of multiple classes of CML by both architectures using instance segmentation in deep learning.