Deep learning has shown remarkable potential in medical diagnostics, particularly in the early detection of lung cancer through medical image analysis. However, the lack of transparency and interpretability of deep learning models has raised concerns about their reliability in critical medical scenarios. To address this challenge, Explainable Deep Learning (xDL) techniques are being developed to provide clear and understandable explanations for AI predictions. xDL aims to bridge the gap between the capabilities of complex neural networks and the understanding of human experts. By offering transparent insights into how the AI arrives at its diagnostic decisions, xDL promotes a more collaborative human-machine interaction. In the context of lung cancer diagnosis, xDL allows medical practitioners to understand the rationale behind the AI’s conclusions on the analyzed imaging data. Integrating xDL into the lung cancer diagnosis process could therefore substantially change how medical professionals and AI systems work together. Instead of treating deep learning models as opaque black boxes, xDL enables clinicians to validate the AI’s decision-making process, which builds trust in its diagnostic capabilities and provides insight into the patterns and features relevant to lung cancer detection.

Lung cancer

Lung cancer is a dangerous form of cancer that originates in the lung cells and is responsible for a significant number of global cancer-related deaths. This condition arises when normal lung cells experience genetic mutations, leading to uncontrolled growth and the formation of tumors. Consequently, lung function is affected, causing symptoms like persistent cough, chest pain, shortness of breath, and coughing up blood. There are two main types of lung cancer: non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC), with NSCLC being more prevalent. Smoking is the primary risk factor for lung cancer, though it can also occur in non-smokers due to various environmental exposures. Early detection is critical for improving outcomes, and diagnosis involves imaging tests and biopsies. Treatment options include surgery, radiation therapy, chemotherapy, targeted therapy, and immunotherapy, tailored to individual factors such as cancer type and stage, overall health, and patient preferences.


HSA kit is deep learning software that analyzes high-resolution images of tissue to detect lung cancer.

HSA kit detects lung cancer based on the two most important tissue structures, each annotated in its own color:

  1. Stroma (red)
  2. Tumor (green)

Whole slide images (WSI)

Whole Slide Images (WSI) are complete digital scans of tissue slides used in medical imaging and pathology. These images provide highly detailed representations of tissue samples at different magnifications, resembling traditional microscopy. WSI files can be quite large, but they offer the advantage of remote access, enabling easy collaboration and analysis. WSI finds applications in pathology diagnosis, research, education, and quality assurance. Despite some challenges like file management and data security, the use of WSI in healthcare is on the rise due to its potential benefits.

Digital representation of images

Digital representation of images involves converting visual information, including colors and intensity, into a numerical format that can be processed and interpreted by computers and electronic devices. In this process, images are broken down into small square units called pixels, with each pixel assigned specific color and brightness values. These values are typically represented using the RGB color model, where combinations of red, green, and blue intensities create a wide range of colors. The resulting digital image is essentially a two-dimensional grid of pixels, with higher resolution images containing more pixels and finer detail. Various image file formats are used to store digital images, employing compression techniques to reduce file size while maintaining image quality. Once in digital form, images can be easily edited and manipulated using software, allowing for operations like resizing, color adjustments, and filters to alter the image’s appearance.

We have three types of slides:

  1. HE slides
  2. KI67 slides
  3. CK slides

The slides differ in their staining, and each type is scanned with a different exposure time and a different lens.

HE slides

HE slides in biology pertain to tissue samples that have been treated with Hematoxylin and Eosin staining techniques. This widely used method in histology and pathology involves coloring tissue components differently to facilitate microscopic examination. Through this technique, nuclei are stained in shades of blue-purple using hematoxylin, while eosin imparts various shades of pink to cytoplasm and extracellular structures. This staining process is crucial for identifying cellular structures and abnormalities within tissues, aiding in the diagnosis of medical conditions.

KI67 slides

KI67 slides in biology refer to tissue samples that have been prepared and stained for the Ki-67 protein. This protein is a marker of cell proliferation, and its presence indicates actively dividing cells within the tissue. Staining for Ki-67 involves using specific antibodies that bind to the protein, allowing researchers to visualize and quantify the number of proliferating cells under a microscope. This technique is commonly used in research and medical settings to assess the growth rate and activity of cells in various tissues, helping to understand processes like tissue regeneration, cancer growth, and other cellular dynamics.
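The quantification mentioned above is commonly reported as a Ki-67 labeling index: the percentage of counted (tumor) nuclei that stain positive. The following minimal Python sketch shows only that arithmetic; the function name and the counts are hypothetical, and in practice the counts would come from a pathologist or a cell-detection model.

```python
def ki67_index(positive_nuclei: int, total_nuclei: int) -> float:
    """Ki-67 labeling index: percentage of counted nuclei that stain positive.

    Hypothetical helper for illustration; counts are supplied by the caller.
    """
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    if not 0 <= positive_nuclei <= total_nuclei:
        raise ValueError("positive_nuclei must be between 0 and total_nuclei")
    return 100.0 * positive_nuclei / total_nuclei


# Hypothetical counts from one field of view: 150 positive out of 1000 nuclei.
print(ki67_index(positive_nuclei=150, total_nuclei=1000))  # 15.0
```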

CK slides

CK slides in biology involve tissue samples that have been subjected to staining techniques targeting Cytokeratins (CKs), a group of proteins found in cells, especially epithelial cells. These proteins play a key role in maintaining the structural integrity of cells and are particularly abundant in tissues like skin, glands, and lining of organs. Staining for CKs employs specific antibodies that bind to these proteins, allowing researchers to identify and visualize epithelial cells and their patterns under a microscope. This staining method is widely used in histology and pathology to classify and study various types of tumors, as different CKs can indicate the origin and characteristics of cancerous cells.

Digital representation of images involves converting a 2D image into a grid of pixels, forming a matrix with each pixel represented by numerical values. These values encode the color and intensity of the corresponding part of the image. In the RGB color model, each pixel is described by three numerical values for red, green, and blue intensities. These values range from 0 to 255, allowing a broad spectrum of colors to be displayed. The resolution of the image depends on the number of pixels, with higher resolution images containing more pixels and finer details, but also requiring more memory.
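To make the pixel-matrix idea concrete, the following Python/NumPy sketch builds a tiny RGB image as an array of shape (height, width, 3) holding 8-bit values from 0 to 255, and computes the uncompressed memory needed for an image of a given size. The helper name `uncompressed_size_bytes` and the 100,000 x 100,000 pixel slide size are illustrative assumptions, not values from this project.

```python
import numpy as np

# A tiny 2x3 RGB image: shape (height, width, 3), one uint8 value (0-255) per channel.
image = np.array(
    [
        [[255, 0, 0], [0, 255, 0], [0, 0, 255]],        # red, green, blue
        [[255, 255, 255], [0, 0, 0], [128, 128, 128]],  # white, black, grey
    ],
    dtype=np.uint8,
)

height, width, channels = image.shape
print(height, width, channels)  # 2 3 3
print(image[0, 1])              # the green pixel: R=0, G=255, B=0
print(image.nbytes)             # 18 bytes = 2 * 3 * 3


def uncompressed_size_bytes(height: int, width: int, channels: int = 3) -> int:
    """Bytes needed for an 8-bit image of this size before any compression."""
    return height * width * channels


# A hypothetical 100,000 x 100,000 pixel slide needs 30,000,000,000 bytes (about 30 GB).
print(uncompressed_size_bytes(100_000, 100_000))
```

This is why whole slide images are stored in compressed, tiled formats and processed in smaller patches.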

This table summarizes the data set: 9 slides of lung cancer tissue, all of which have been scanned, together with the number of annotations for each slide; the annotation of every slide is complete.

Deep Learning (DL)

Deep Learning (DL) is a specialized field within machine learning that centers on artificial neural networks with multiple layers. These networks are designed to emulate the human brain’s structure and functioning, enabling them to comprehend intricate patterns and connections in data. DL has become increasingly popular due to its capacity to manage vast amounts of data and autonomously extract crucial features for decision-making. A crucial element of deep learning models is the artificial neuron, which processes input, applies mathematical transformations, and produces an output. Neurons are organized into layers, including input, hidden, and output layers. The deep neural networks consist of multiple hidden layers, allowing them to learn complex patterns and hierarchies. During training, the model adjusts the numerical parameters of neurons to minimize the difference between predicted and actual outputs. This training process involves forward propagation, where input data is passed through the network to calculate neuron activations, and backpropagation, which adjusts the neuron parameters based on error propagation. Deep learning has achieved significant success in image and speech recognition, natural language processing, recommendation systems, autonomous vehicles, and healthcare diagnostics. Its ability to learn complex representations from raw data, handle vast datasets, and leverage hardware advancements has made it a revolutionary technology in various domains.
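The forward-propagation and backpropagation steps described above can be sketched in plain NumPy. This is a minimal illustration, not the network used in this work: a hypothetical network with one hidden layer of 8 tanh neurons is trained by gradient descent on the XOR problem, a pattern that a network without hidden layers cannot represent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR, which no network without a hidden layer can fit.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Hypothetical architecture: 2 inputs -> 8 tanh hidden neurons -> 1 sigmoid output.
W1 = rng.normal(0.0, 1.0, size=(2, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(0.0, 1.0, size=(8, 1))
b2 = np.zeros((1, 1))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


learning_rate = 0.5
losses = []
for step in range(10000):
    # Forward propagation: compute the activations layer by layer.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    pc = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    losses.append(float(-np.mean(y * np.log(pc) + (1 - y) * np.log(1 - pc))))

    # Backpropagation: for sigmoid + binary cross-entropy, dLoss/dz2 = (p - y) / N.
    dz2 = (p - y) / len(X)
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    # Propagate the error back through the hidden layer (tanh' = 1 - tanh^2).
    dz1 = (dz2 @ W2.T) * (1.0 - h ** 2)
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # Gradient-descent step on every weight and bias.
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

predictions = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)
print(np.round(predictions.ravel(), 2))  # should be close to 0, 1, 1, 0
```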

Feature Extraction

Feature extraction is an essential process used in machine learning and computer vision to transform raw data into a more concise and meaningful representation. This step involves selecting or creating relevant features that capture important characteristics of the data, making it easier for machine learning algorithms to process and make accurate predictions. Feature extraction is performed to simplify the data while retaining its essential information, which can lead to more efficient processing and better generalization of machine learning models. Two types of feature extraction methods are handcrafted features, where experts manually design features based on domain knowledge, and learned features, which are automatically derived from the data using unsupervised or supervised learning techniques. Handcrafted features are designed to capture specific patterns relevant to the task, while learned features adapt to complex data distributions and often outperform handcrafted features in performance. Feature extraction plays a critical role in preparing data for various applications, including image recognition, speech processing, and natural language understanding.
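As an illustration of handcrafted features, the sketch below summarizes an RGB tile with a fixed-length vector of color statistics: per-channel mean and standard deviation plus a normalized grey-level histogram. The feature choice, the function name, and the random stand-in tile are assumptions for illustration; a CNN would instead learn its features from the data.

```python
import numpy as np


def handcrafted_color_features(tile: np.ndarray, bins: int = 8) -> np.ndarray:
    """Turn an RGB tile (H, W, 3, uint8) into a short, fixed-length feature vector.

    Features: per-channel mean and standard deviation (6 values), followed by a
    normalized histogram of grey-level intensity (`bins` values).
    """
    pixels = tile.reshape(-1, 3).astype(float)
    means = pixels.mean(axis=0)
    stds = pixels.std(axis=0)
    grey = pixels.mean(axis=1)  # simple average of R, G and B
    hist, _ = np.histogram(grey, bins=bins, range=(0, 256))
    hist = hist / hist.sum()    # fractions of pixels per intensity bin
    return np.concatenate([means, stds, hist])


rng = np.random.default_rng(0)
# Random stand-in for a real slide tile.
tile = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
features = handcrafted_color_features(tile)
print(features.shape)  # (14,): 3 means + 3 standard deviations + 8 histogram bins
```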

Ground truth data

Ground truth data is verified and labeled data used as a reliable reference to evaluate the performance of machine learning algorithms and models. It represents the accurate outcomes or attributes associated with the input data. In supervised learning, the algorithm learns from the input data and their corresponding ground truth labels to make predictions or classifications. By comparing its predictions to the correct labels, the algorithm refines its parameters to improve accuracy. Ground truth data is typically obtained through human annotation or expert judgment, which makes it reliable. It is crucial for training machine learning models to make accurate predictions on real-world data. However, obtaining high-quality ground truth data can be time-consuming and costly. In cases where it is challenging to acquire ground truth data, researchers may use methods like crowdsourcing or active learning. Evaluating the model on separate data with known ground truth labels allows researchers to measure its performance metrics. Ground truth data plays a fundamental role in various tasks, such as image recognition, natural language processing, and data analysis, ensuring the effectiveness and accuracy of machine learning models in real-world applications. The figure shows the number of files.
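When the ground truth is a segmentation mask, as with the tumor and stroma annotations in this work, a prediction can be compared with it pixel by pixel. The following sketch, assuming boolean NumPy masks, computes the Dice coefficient and pixel accuracy; the function name and the toy masks are hypothetical.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice = 2 * |P & T| / (|P| + |T|) for boolean masks; 1.0 means perfect overlap."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True  # 4 annotated tumor pixels
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True   # 6 predicted pixels, 4 of them correct

print(dice_coefficient(pred, truth))  # 2 * 4 / (6 + 4) = 0.8
print((pred == truth).mean())         # pixel accuracy: 14 of 16 pixels agree = 0.875
```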

Neural Networks

Neural networks are artificial intelligence models inspired by the human brain’s structure and functioning. They are used to solve complex problems by learning patterns and relationships in data. Composed of interconnected artificial neurons, neural networks process input data through layers, applying mathematical operations to produce outputs. During training, neural networks adjust their connections (weights) using labeled data to minimize prediction errors. Backpropagation computes how much each weight contributed to the error (the gradient), and an optimization algorithm such as gradient descent uses these gradients to update the weights. Neural networks find applications in various domains, such as image and speech recognition, natural language processing, and decision-making. Their ability to handle complex patterns in data has led to their widespread adoption and success. Different types of neural networks, including feedforward networks, convolutional neural networks (CNNs), and recurrent neural networks (RNNs), cater to specific tasks like image processing, sequential data analysis, and language modeling. Despite their effectiveness, neural networks face challenges, such as overfitting and the need for ample labeled data. Ongoing research seeks to improve neural network performance and address these issues through advanced architectures and optimization techniques.

Convolutional Neural Network (CNN)

A Convolutional Neural Network (CNN) is a specialized type of artificial neural network that excels in analyzing visual data like images and videos. It mimics the human brain’s visual processing capabilities, automatically learning and extracting meaningful features from the input data. The CNN’s core component is the convolutional layer, which applies filters to detect patterns in the input data. These filters are adjusted during training to capture various visual features. Activation functions and pooling layers further enhance the network’s ability to recognize patterns efficiently. CNNs use fully connected layers to map the extracted features to the final output classes. They excel in learning hierarchical representations, enabling them to handle complex visual tasks. CNNs are trained through supervised learning, adjusting their parameters to minimize prediction errors using labeled training data. They are widely used in image recognition, object detection, and other visual recognition tasks.
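The convolution, activation, and pooling steps can be shown on a tiny example. This NumPy sketch assumes a single-channel image and a hand-made vertical-edge filter; in a real CNN the filter weights are learned during training, and library implementations are far faster than these explicit loops.

```python
import numpy as np


def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid 2D cross-correlation, which deep-learning libraries call convolution."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Slide the filter over the image and take a weighted sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def max_pool2d(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))


# A 6x6 grey image with a vertical edge: dark on the left, bright on the right.
image = np.zeros((6, 6))
image[:, 3:] = 1.0
# Hand-made vertical-edge filter (an assumption; a CNN would learn its own).
kernel = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)

feature_map = relu(conv2d(image, kernel))  # shape (4, 4); every row is 0, 3, 3, 0
pooled = max_pool2d(feature_map)           # shape (2, 2); every value is 3
print(feature_map)
print(pooled)
```

The feature map responds only where the edge is, and pooling keeps that response while halving the resolution.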

    Mask Region-Based Convolutional Neural Network (Mask R-CNN)xxxx.

      Vision Transformer (ViT)xxx.

Explainable DL (xDL) and Human-Machine Interaction (HMI)

Explainable Deep Learning (xDL) is a specialized branch of deep learning that focuses on creating models and methods that can offer clear and understandable explanations for their predictions and decisions. Unlike traditional deep learning models that are often seen as inscrutable black boxes, xDL strives to produce interpretable outcomes, making it easier for humans to comprehend and trust the model’s decisions. xDL employs various techniques to visualize and explain the learned features, highlight the significant input factors, and provide insights into the model’s decision-making process. This transparency is especially valuable in critical applications like healthcare and finance, as it helps build user confidence and ensures that the model’s predictions are reliable and easily understandable.

Human-Machine Interaction (HMI) involves studying and designing interfaces that enable efficient communication and collaboration between humans and machines. It encompasses the ways humans interact with machines, including computers and robots, to accomplish tasks, access information, and control functions. HMI focuses on creating user interfaces that are easy to use, intuitive, and responsive to human needs. The goal is to establish a smooth and natural interaction between humans and machines, minimizing the learning curve and enhancing the overall user experience. HMI considers various interaction modes, such as visual interfaces, voice recognition, touch screens, and gestures. HMI is critical in domains like consumer electronics, automotive systems, virtual reality, and healthcare. The effectiveness of HMI significantly impacts user productivity, safety, and satisfaction, making it an essential area of research and development in human-computer interaction.
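One simple, model-agnostic way to highlight the input regions that matter for a prediction is occlusion sensitivity: hide one patch of the image at a time and record how much the model’s score drops. The sketch below is a generic illustration of that technique, not necessarily the xAI method used in this work; `model` is assumed to be any callable returning a scalar score, and `toy_model` is a hypothetical stand-in.

```python
import numpy as np


def occlusion_sensitivity(model, image: np.ndarray, patch: int = 8, fill: float = 0.0) -> np.ndarray:
    """Heat map of how much the model's score drops when each patch is hidden.

    `model` is any callable mapping an image to a scalar score (for example a
    tumor probability). Larger values mean the hidden region mattered more.
    """
    baseline = model(image)
    h, w = image.shape[:2]
    heat = np.zeros((h, w))
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill  # hide one patch
            heat[top:top + patch, left:left + patch] = baseline - model(occluded)
    return heat


def toy_model(img: np.ndarray) -> float:
    """Hypothetical model whose score depends only on the top-left quadrant."""
    return float(img[:16, :16].mean())


image = np.ones((32, 32))
heat = occlusion_sensitivity(toy_model, image, patch=8)
print(heat[:16, :16].max(), heat[16:, 16:].max())  # only the top-left region matters
```

The resulting array can then be rendered as a heat map over the original image.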

Heat Map

A heat map is a graphical representation that uses colors to visualize data values within a dataset. It helps display large amounts of data in an easily interpretable way. In a heat map, the intensity of colors corresponds to the magnitude of the data values, with warmer colors representing higher values and cooler colors representing lower values. Heat maps are valuable for identifying patterns and trends in data, especially in large datasets or when dealing with tables or grids. They find applications in data analysis, finance, weather forecasting, biology, and sports analytics, among other fields.
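The color mapping of a heat map can be sketched in a few lines: normalize the values to the range 0 to 1, then blend from blue (low, cool) to red (high, warm). This simple linear ramp is an assumption for illustration; plotting libraries such as Matplotlib offer ready-made colormaps that are usually preferable.

```python
import numpy as np


def to_heat_colors(values: np.ndarray) -> np.ndarray:
    """Map a 2D array of values to 8-bit RGB colors: low -> blue, high -> red.

    Values are min-max normalized first; a constant array maps to all blue.
    """
    v = values.astype(float)
    span = v.max() - v.min()
    t = (v - v.min()) / span if span > 0 else np.zeros_like(v)
    rgb = np.zeros(v.shape + (3,))
    rgb[..., 0] = t        # red grows with the value
    rgb[..., 2] = 1.0 - t  # blue fades as the value grows
    return (rgb * 255).round().astype(np.uint8)


values = np.array([[0.0, 5.0], [10.0, 2.5]])
colors = to_heat_colors(values)
print(colors[1, 0])  # highest value (10.0) -> pure red
print(colors[0, 0])  # lowest value (0.0) -> pure blue
```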

Cost and Loss Functions

Cost and loss functions are crucial components in machine learning, particularly in supervised learning tasks. They measure the model’s performance and guide the adjustment of its parameters during training. The loss function quantifies the discrepancy between the model’s prediction and the actual label for a training example; the objective is to minimize it, since a lower value indicates more accurate predictions. The choice of loss function depends on the task and data type, for example mean squared error for regression and cross-entropy for classification. The two terms are often used interchangeably, but strictly speaking the cost function (also called the objective function) aggregates the loss values across the entire training dataset, typically as their average. Minimizing the cost function yields model parameters that fit the training data accurately and, ideally, generalize to new data.
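The distinction between the per-example loss and the dataset-level cost is easiest to see in code. The sketch below, with hypothetical function names, defines squared error and binary cross-entropy as losses and the cost as their average over the dataset.

```python
import numpy as np


def squared_error_loss(y_true: float, y_pred: float) -> float:
    """Loss for a single regression example."""
    return (y_true - y_pred) ** 2


def binary_cross_entropy_loss(y_true: int, p_pred: float, eps: float = 1e-12) -> float:
    """Loss for a single binary-classification example (p_pred = predicted P(y = 1))."""
    p = min(max(p_pred, eps), 1 - eps)  # clip to avoid log(0)
    return float(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


def cost(loss_fn, y_true, y_pred) -> float:
    """Cost: the average of the per-example losses over the whole dataset."""
    return float(np.mean([loss_fn(t, p) for t, p in zip(y_true, y_pred)]))


print(cost(squared_error_loss, [1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # (0 + 0 + 4) / 3
print(cost(binary_cross_entropy_loss, [1, 0], [0.9, 0.1]))         # equals -log(0.9)
```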

Stochastic Gradient Descent (SGD)

Stochastic Gradient Descent (SGD) is an optimization algorithm commonly used to train machine learning models, particularly in deep learning. It is a variant of gradient descent used to minimize the loss function during training. Instead of computing the gradient over the entire training dataset, SGD estimates it from a small, randomly selected subset of the data (a mini-batch). Because each update needs only one mini-batch, SGD updates the parameters far more often per pass over the data, which makes it computationally efficient for large datasets. The randomness of the mini-batches can also help the optimizer escape shallow local minima and saddle points, which may improve generalization to new data. However, the same randomness adds noise to the optimization, so the loss fluctuates during training. To address this, learning rate schedules are often combined with SGD, and adaptive variants such as AdaGrad, RMSprop, and Adam adjust the learning rate during training to improve convergence.
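A minimal sketch of mini-batch SGD, assuming a one-variable linear model y = w * x + b fitted to synthetic data with mean squared error. The data and hyperparameter values are illustrative, not the settings used for the models in this work.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic regression data: y = 3x + 2 plus a little noise.
X = rng.uniform(-1, 1, size=1000)
y = 3.0 * X + 2.0 + rng.normal(0, 0.1, size=1000)

w, b = 0.0, 0.0
learning_rate, batch_size, epochs = 0.1, 32, 50

for epoch in range(epochs):
    order = rng.permutation(len(X))  # reshuffle the data every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]  # one random mini-batch
        xb, yb = X[idx], y[idx]
        error = (w * xb + b) - yb
        # Gradient of the mean squared error on this mini-batch only.
        grad_w = 2.0 * np.mean(error * xb)
        grad_b = 2.0 * np.mean(error)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(round(float(w), 2), round(float(b), 2))  # close to 3.0 and 2.0
```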

Regularization

Regularization is a technique used in machine learning to prevent overfitting and enhance the model’s ability to generalize to new, unseen data. It does this by introducing a penalty term to the model’s objective function, discouraging overly complex patterns and reducing the impact of irrelevant features. Different types of regularization techniques, such as L1 and L2 regularization, add penalty terms based on the model’s weights, promoting sparsity and even weight distribution. Dropout is another regularization method that randomly deactivates neurons during training to improve the model’s robustness. Regularization is especially valuable for complex models with many parameters, as it helps strike a balance between model complexity and fit to the training data, leading to improved generalization and better performance on new data.
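The sketch below shows the two regularization ideas from the paragraph: an L2 (or L1) penalty added to the data loss, and inverted dropout, which rescales the surviving activations so that nothing has to change at inference time. The function names, the penalty strength `lam`, and the loss value are assumptions for illustration.

```python
import numpy as np


def l2_penalty(weights, lam: float) -> float:
    """L2 term lam * sum(w^2): pushes all weights toward small values."""
    return lam * sum(float(np.sum(w ** 2)) for w in weights)


def l1_penalty(weights, lam: float) -> float:
    """L1 term lam * sum(|w|): tends to drive some weights exactly to zero."""
    return lam * sum(float(np.sum(np.abs(w))) for w in weights)


def dropout(activations: np.ndarray, rate: float, rng, training: bool = True) -> np.ndarray:
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest.

    Scaling by 1 / (1 - rate) keeps the expected activation unchanged, so no
    rescaling is needed at inference time, when dropout is switched off.
    """
    if not training or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)


weights = [np.array([[1.0, -2.0], [0.5, 0.0]])]
data_loss = 0.40  # hypothetical loss value on one batch
total = data_loss + l2_penalty(weights, lam=0.01)
print(total)  # 0.40 + 0.01 * (1 + 4 + 0.25 + 0) = 0.4525
```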

Encoder-Decoder Structure

The encoder-decoder structure is a framework used in machine learning and neural networks for tasks like sequence-to-sequence mapping and language translation. In this arrangement, the encoder processes input data, converting it into a condensed representation, often referred to as a context or thought vector. This representation captures the essential information from the input. The decoder then takes this context vector and generates the desired output sequence, such as translated text or another sequence of data. This structure is widely used in applications like machine translation, text generation, and speech recognition, enabling the model to understand and generate complex sequences by breaking down the process into encoding and decoding stages.

Hyperparameters

Hyperparameters are the settings of a machine learning model that are defined in advance, such as the number of epochs, the batch size, and the learning rate. They are chosen before training starts and remain unchanged during training. Just as a chef decides on the oven temperature before baking, choosing the right values for these settings can determine whether training succeeds.

Learning Methods

Deep learning, which is a part of machine learning, uses multi-layered neural networks to learn complex patterns in data. The main methods for training these networks are supervised, unsupervised, and reinforcement learning. Supervised learning relies on data with predefined labels, unsupervised learning works with unlabeled data, and reinforcement learning learns to make decisions from rewards or penalties.


Machine Learning Metrics: In machine learning, metrics measure how well a model performs, and they guide the fine-tuning and improvement of the model. Key metrics include accuracy, precision, recall, and F1-score for classification tasks, and Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) for regression tasks.

Metric Status: Evaluating a metric involves checking its validity, reliability, applicability to the task, interpretability, and sensitivity. In other words, the metric must accurately represent what it is meant to measure, give stable results across repeated measurements, fit the needs of the task, be easy to understand, and detect changes in the results.

Validity: Ensuring the metric accurately reflects its intended purpose.

Reliability: The metric’s stability across multiple instances of measurement.

Applicability: The appropriateness of a metric for a specific machine learning task or problem; for example, accuracy can be misleading on datasets with imbalanced classes.

Interpretability: The ease with which the metric’s results can be comprehended and used to derive meaningful conclusions.

Sensitivity: The metric’s ability to detect and respond to variations in how the model performs.

  • True Positive (TP): Instances where the model’s positive prediction matches the actual positive outcome.
  • False Negative (FN): Situations where the model misses a positive outcome, predicting it as negative.
  • False Positive (FP): Cases where the model incorrectly flags a negative outcome as positive.
  • True Negative (TN): Occasions where both the model’s prediction and the actual outcome are negative.
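Given the four counts defined above, the usual classification metrics follow directly, and MAE and RMSE are the corresponding regression measures. The counts in this sketch are hypothetical; note how a high accuracy can coexist with noticeably lower precision when negatives dominate, which is the class-imbalance issue mentioned under Applicability.

```python
import math


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision, "recall": recall, "f1": f1}


def mae(y_true, y_pred) -> float:
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred) -> float:
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical counts for tumor vs. non-tumor tiles: accuracy 0.97, precision 0.80,
# recall about 0.89, F1 about 0.84.
print(classification_metrics(tp=80, fp=20, fn=10, tn=890))
print(mae([1, 2, 3], [1, 3, 5]), rmse([1, 2, 3], [1, 3, 5]))
```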

Mean Average Precision (mAP)

In tasks like object recognition and image segmentation within deep learning, the Mean Average Precision (mAP) stands out as a crucial performance metric. It’s not just about spotting an object in a picture; it’s also about pinpointing its exact position using a bounding rectangle. This dual responsibility necessitates a holistic metric capturing both classification and localization accuracy.

mAP offers this holistic insight by integrating precision and recall at diverse decision boundaries. To break it down:

  1. Precision measures how many of the objects the model detected are correct: TP / (TP + FP).
  2. Recall measures how many of the actual objects in the image the model detected: TP / (TP + FN).

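For each object category, a precision-recall curve is plotted by varying the model’s confidence threshold (its decision boundary). A detection counts as a true positive only if it overlaps a ground-truth object sufficiently, typically an intersection over union (IoU) of at least 0.5; this is how localization accuracy enters the metric. The area under this curve gives the average precision (AP) for that category. Averaging the APs over all categories gives the mAP, a single summary of the model’s performance across object types.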
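The procedure above can be sketched for axis-aligned bounding boxes on a single image. This is one common variant (greedy matching at IoU of at least 0.5 and all-point interpolation of the precision-recall curve); benchmarks such as COCO add further refinements, and for segmentation masks the IoU would be computed on pixels instead of boxes. All names and boxes are hypothetical.

```python
def iou(box_a, box_b) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truths, iou_threshold=0.5) -> float:
    """AP for one class; detections: list of (score, box), ground_truths: list of boxes."""
    if not ground_truths:
        return 0.0
    matched = [False] * len(ground_truths)
    tps = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Greedily match the detection to the best still-unmatched ground-truth box.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truths):
            overlap = iou(box, gt)
            if not matched[j] and overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_iou >= iou_threshold:
            matched[best_j] = True
            tps.append(1)
        else:
            tps.append(0)  # no sufficient overlap: false positive
    precisions, recalls, tp_cum = [], [], 0
    for k, t in enumerate(tps, start=1):
        tp_cum += t
        precisions.append(tp_cum / k)
        recalls.append(tp_cum / len(ground_truths))
    # Make precision non-increasing, then sum the area under the step curve.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class) -> float:
    """per_class maps a class name to (detections, ground_truths)."""
    aps = [average_precision(d, g) for d, g in per_class.values()]
    return sum(aps) / len(aps)


per_class = {
    "tumor": (
        [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (21, 21, 30, 30))],
        [(0, 0, 10, 10), (20, 20, 30, 30)],
    ),
    "stroma": ([(0.6, (0, 0, 10, 10))], [(0, 0, 10, 10)]),
}
print(average_precision(*per_class["tumor"]))  # 5/6, about 0.833
print(mean_average_precision(per_class))       # (5/6 + 1) / 2, about 0.917
```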

Implementation and Results

Creation of ground-truth data

Here’s a step-by-step guide on how to create ground-truth data:

Define the object

First, we have to know what our target is and for which purpose we will collect data; in this case, the purpose was lung cancer detection.

Data Collection

  • Pinpoint where you’ll obtain the primary data. It might come from sensors, photographs, sound clips, written files, and more.
  • Ensure that the collected data genuinely mirrors the real-world scenarios or situations for which you’re designing the models.

In this case, data collection started with collecting slides of lung tissue and scanning them. Scanning took 3 hours per slide, and the process was as follows:

  1. Prepare the slide.
  2. Place it under the microscope (the microscope is connected to the computer).
  3. Open HSA kit on that computer.
  4. Set up HSA kit for scanning; each slide type required a different exposure time:
    • HE slides were scanned with an exposure time of 250.
    • KI67 and CK slides were scanned with an exposure time of 600.
  5. Prepare the vignette filter and display it.
  6. Scan the slide at very high quality, pixel by pixel; for blurry regions, use the sharpness knob on the right of the microscope.
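The document does not describe what HSA kit’s vignette filter does internally. A common way to remove vignetting (darker image corners) in microscopy is flat-field correction: divide each captured tile by an image of a blank field and rescale. The sketch below assumes that approach; the function name and the simulated images are hypothetical.

```python
import numpy as np


def flat_field_correct(raw: np.ndarray, flat: np.ndarray, dark=None) -> np.ndarray:
    """Remove vignetting by dividing by a flat-field (blank-slide) image.

    raw, flat: arrays of the same shape; dark: optional dark-frame image
    (camera offset) subtracted from both before the division.
    """
    raw = raw.astype(float)
    flat = flat.astype(float)
    if dark is not None:
        raw = raw - dark
        flat = flat - dark
    gain = flat.mean() / np.clip(flat, 1e-6, None)  # brighten the darker corners
    return raw * gain


# Simulated vignetting: bright centre, corners at half brightness.
yy, xx = np.mgrid[0:64, 0:64]
r2 = ((yy - 31.5) ** 2 + (xx - 31.5) ** 2) / (2 * 31.5 ** 2)
falloff = 1.0 - 0.5 * r2
flat = 200.0 * falloff  # image of a blank field
raw = 120.0 * falloff   # uniform tissue seen through the same optics

corrected = flat_field_correct(raw, flat)
print(raw.std(), corrected.std())  # the corrected image is flat (std close to 0)
```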

Annotation and Labeling

  • Choose an appropriate tool or system to mark your specific kind of data.
  • Provide annotators with detailed and unambiguous directions. This step is crucial for maintaining uniformity in the data annotations.

First, create your own base region of interest (ROI) for annotation.

After creating the base ROI, annotate the tumor regions inside it so that the model learns which parts are tumor.

Finally, annotate the stroma regions as well.

Quality Assurance

  • Frequently review the labeled data to confirm it adheres to set quality standards.
  • Think about having more than one person annotate the same data and employ methods like majority consensus to determine the final label.
  • Use a portion of the labeled data to train a model and evaluate its efficacy on a separate portion. If results are subpar, consider revising the ground-truth dataset.
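The consensus and validation steps above can be sketched in Python. The majority vote returns no label on a tie so that the region can be sent back for review, and the split keeps whole slides together so that tiles from one slide never appear in both the training and the validation set. Region IDs, labels, slide IDs, and the 20% validation fraction are assumptions for illustration.

```python
import random
from collections import Counter


def majority_label(labels):
    """Most common label among annotators, or None on a tie or with no votes."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: send back for expert review
    return counts[0][0]


def consensus(annotations):
    """annotations maps a region ID to the labels given by different annotators."""
    return {region: majority_label(labels) for region, labels in annotations.items()}


def split_slides(slide_ids, validation_fraction=0.2, seed=0):
    """Split by slide (not by tile) into training and validation sets."""
    ids = sorted(slide_ids)
    random.Random(seed).shuffle(ids)
    n_val = max(1, round(len(ids) * validation_fraction))
    return ids[n_val:], ids[:n_val]


votes = {
    "region_1": ["tumor", "tumor", "stroma"],
    "region_2": ["stroma", "stroma", "stroma"],
    "region_3": ["tumor", "stroma"],
}
print(consensus(votes))
# {'region_1': 'tumor', 'region_2': 'stroma', 'region_3': None}

train_ids, val_ids = split_slides([f"slide_{i}" for i in range(1, 10)])
print(len(train_ids), len(val_ids))  # 7 2
```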

Human-Machine Interaction (HMI)

Human-machine interaction (HMI) describes the relationship between humans and machines. To classify tissue structures and create the ground-truth data (GTD), an annotation tool is used, producing a large set of annotations ranging from hundreds to thousands. Accuracy is the key requirement in this work, and an accurate model needs a large database: good results depend on the amount of data provided, and more data generally leads to better results.

The size of the dataset is important, but the quality of the generated samples matters just as much.

Model name | Model type | Structures | Epochs | Batch size | Learning rate
HyperlungNet V1 | Instance segmentation | Stroma, tumor | 200 | 2 | 0.0001
HyperlungNet V2 | Instance segmentation | Stroma, tumor | 200 | 2 | 0.0001
HyperlungNet V3 | Segmentation | Stroma | 200 | 2 | 0.0001
HyperlungNet V4 | Segmentation | Tumor | 200 | 2 | 0.0001
HyperlungNet V5 | Segmentation | Tumor | 200 | 2 | 0.0001
HyperlungNet V6 | Segmentation | Stroma, tumor | 200 | 2 | 0.0001
HyperlungNet V7 | Segmentation | Stroma, tumor | 200 | 2 | 0.0001
HyperlungNet V8 | Segmentation | Stroma, tumor | 200 | 2 | 0.0001
HyperlungNet V9 | Instance segmentation | Tumor | 200 | 2 | 0.0001
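The hyperparameters in the HyperlungNet table are fixed before training, which can be recorded in a small immutable configuration object. The class name, its fields, and the tile count in the example are hypothetical; only the epochs, batch size, and learning rate come from the table.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters fixed before training (frozen: they cannot change later)."""
    model_name: str
    model_type: str              # "instance segmentation" or "segmentation"
    structures: Tuple[str, ...]  # e.g. ("stroma", "tumor")
    epochs: int = 200            # value from the table
    batch_size: int = 2          # value from the table
    learning_rate: float = 1e-4  # value from the table

    def steps_per_epoch(self, num_training_tiles: int) -> int:
        """Mini-batches per epoch, counting a final smaller batch (ceiling division)."""
        return -(-num_training_tiles // self.batch_size)


configs = [
    TrainingConfig("HyperlungNet V1", "instance segmentation", ("stroma", "tumor")),
    TrainingConfig("HyperlungNet V3", "segmentation", ("stroma",)),
    TrainingConfig("HyperlungNet V9", "instance segmentation", ("tumor",)),
]
# With a hypothetical 1,001 training tiles: 501 steps per epoch,
# i.e. 501 * 200 = 100,200 parameter updates over the full run.
print(configs[0].steps_per_epoch(1001) * configs[0].epochs)
```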

Store and Backup

Architecture of a network


Selection of the data set

xAI technique and results

Environment for Computing

Environment for testing