Introduction

Intracellular and extracellular influences, such as ionizing radiation during radioimmunotherapy, can cause mutations or DNA double-strand breaks (DSBs). Within a few minutes of DSB induction, the histone H2A variant H2AX is phosphorylated (gH2AX). DSBs can endanger the cell and thus the entire organism. Depending on the extent of the damage, programmed cell death (apoptosis) can be triggered or the cell can undergo necrosis. For this reason, laboratories are searching for a substance that protects non-target organs during radioimmunotherapy. To verify the effect of such substances, the „gH2AX Analyzer“ in the HSA KIT can be used to quantify the gH2AX signals, which correlate with the DSBs.

Advantages of the HSA KIT

  1. Evidence-based development: The HSA KIT is based on evidence from experiments. Projects are archived, so data remain traceable even after 2 years and the settings and parameters used can be retraced after a long period of time.
  2. Easy modification: Modifying experiments is made simple with the Copy-Tool feature of the HSA KIT. It enables the quick copying of projects, preserving previous data. A new project with a new IP is created, ensuring that no previous data is lost. Subsequently, new processing and parameter adjustments can be made, facilitating easy comparison of data and potential evaluation.

These features of the HSA KIT enhance data traceability, enable efficient modifications, and facilitate effective evaluation of experimental results.

Implementation

To optimize the detection of gH2AX signals, two deep learning models were trained. The first model, called „HyperNonVesselNet,“ is designed to detect erythrocytes and blood vessels. It divides the entire kidney into non-vessel and vessel areas. This model was trained using segmentation, which is the process of dividing an image into different regions or objects. The goal is to identify the relevant areas or objects in an image and separate them from the background and/or other objects. The training is based on semantic segmentation and was implemented using U-Net.

The second model, „HypergH2AXKidneyNet“, is applied as a substructure within the non-vessel area. It was trained using instance segmentation and implemented with a Vision Transformer.
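The two-stage arrangement described above can be sketched as follows. This is a minimal illustration and not the HSA KIT implementation: the mask representation (nested lists of booleans), the function names, and the toy values are all assumptions made for the example.

```python
from typing import List, Tuple

Mask = List[List[bool]]


def non_vessel_mask(base_roi: Mask, vessel: Mask) -> Mask:
    """Pixels inside the base ROI that were not classified as vessel or erythrocyte."""
    return [
        [roi_px and not vessel_px for roi_px, vessel_px in zip(roi_row, vessel_row)]
        for roi_row, vessel_row in zip(base_roi, vessel)
    ]


def keep_signals_in_mask(centroids: List[Tuple[int, int]], mask: Mask) -> List[Tuple[int, int]]:
    """Keep only detections whose (row, col) centroid lies inside the mask."""
    return [(r, c) for r, c in centroids if mask[r][c]]


# Toy 3x4 image (hypothetical values): the last column lies outside the base ROI.
base_roi = [
    [True, True, True, False],
    [True, True, True, False],
    [True, True, True, False],
]
# Hypothetical output of the first (vessel) model.
vessel = [
    [False, True, False, False],
    [False, True, False, False],
    [False, False, False, False],
]

mask = non_vessel_mask(base_roi, vessel)

# Hypothetical candidate detections of the second (signal) model.
candidates = [(0, 0), (1, 1), (2, 2), (0, 3)]
signals = keep_signals_in_mask(candidates, mask)
print(signals)
```

In this toy case, the candidate at (1, 1) falls on a vessel pixel and the candidate at (0, 3) lies outside the base ROI, so only the other two are kept.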

Training

The ground truth data (GTD) used to train the deep learning model is created using the HSA KIT software. The sections are loaded into a new project, and a base ROI (Region of Interest) is initially drawn to define the area where the training or analysis takes place. To train the model with diversity, different areas are covered using the rectangle annotation tool. Within the base ROI, manual annotations can be set. Several semi-automatic tools based on classical image processing algorithms are available to speed up the annotation process. In this case, however, the brush tool proved to be the most suitable. Therefore, all annotations were created manually by drawing along the contours of the objects. Example annotations are presented below:

The „HypergH2AXKidneyNet“ model was trained for the quantification of gH2AX signals. The training process was time-consuming: several optimization approaches were considered and executed, and their results were verified.

The following figure shows a kidney quantified with the HSA KIT. The yellow structure represents the non-vessel area, the red structures the erythrocytes and blood vessels, and the black structures the gH2AX signals.

Signal detection with the HSA KIT

In order to verify the plausibility of the software results, the sections were individually examined. All structures automatically identified by the HSA KIT were considered, including the non-vessel area, erythrocytes or vessels, and gH2AX signals. Initially, the entire kidney was examined to rule out major errors. Subsequently, using the 11th time point (48 hours, Group 1, Animal 11) as an example, the displayed and hidden structures were compared (Fig. X).

The base ROI (Region of Interest), mentioned earlier, is represented by the black outline of the kidney in Figure Xa. This defines the area to be quantified, and all detected and analyzed signals are located within this marking. The area outside this marking is filled with black diagonal lines, indicating that this region is excluded from quantification. The red-marked structures represent erythrocytes or vessels. Due to the automatic detection of these structures, the exclusion of blood vessels and red blood cells does not need to be performed manually. This allows for the definition of the non-vessel area, depicted in yellow (Figure Xb). Within this area, the gH2AX signals are quantified, represented in black (Figure Xc). For result calculation, the non-vessel area serves as the basis for the entire kidney, as it contains only cells where potential DNA damage can occur.

Additionally, to assess the quality of the results, the staining detected as signals was examined with the detection overlay both displayed and hidden. Hiding the overlay made it easier to judge whether these locations actually represent signals. Image sections were captured at 40x magnification in the HSA KIT and compared later in the study (Figure X).

The structures detected as signals are highlighted in green in Figure Xa. The red-marked area was identified by the software as an erythrocyte. The non-vessel area, which would typically appear in yellow, has been hidden in both image sections for a better overview. It can be observed that no signals are detected within the erythrocytes. As an example, red blood cells within the image are circled in purple. These structures are characterized by a hollow space filled with yellowish, irregular structures. Both the hollow spaces and the structures within them can vary in size and shape. The model was trained to only detect larger blood vessels and blood cells (>100 μm), as smaller ones are often present between cells (purple) and do not have a negative impact on the quality of the results, as the model does not falsely detect them as positives. Further details on this and the training process were explained in section „Training“. As can be seen in the image section, these structures were not detected as signals. Moreover, no structures in this image section were falsely identified as gH2AX signals, indicating that no false-positive results were generated at this location. Additionally, the image section shows that no false-negative signal was detected. Every distinct signal, regardless of its size, present in Figure X was correctly detected by the software. Therefore, the error rate at this point is 0% for both false positives and false negatives.

Automated area calculation of the gH2AX signals with the HSA KIT

The scanned slides were analyzed using the HSA KIT. The parameters were optimized and adjusted according to the staining intensity by applying and verifying various settings, which will be explained in more detail later. The goal was to capture the entire area of each distinct signal and to minimize the detection of unspecific staining and red blood cells. By comparing the results of different settings, a confidence value of 0.70 was found to be optimal. This parameter indicates the software’s level of confidence in the signals it detects, ranging from 0 to 1. Additionally, a range of 0 to 5,000,000 μm² was set for the size range in which signals should be detected. Since the areas of DNA double-strand breaks can vary greatly, this option was not considered further.
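The effect of the two parameters can be illustrated with a small filter. This is a sketch under stated assumptions: the `Detection` class, the function name, and the inclusive comparisons are hypothetical and do not reflect how the HSA KIT applies its settings internally. Only the values 0.70 and 0 to 5,000,000 μm² are taken from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    confidence: float  # model confidence in [0, 1]
    area_um2: float    # object area in square micrometres


def filter_detections(
    detections: List[Detection],
    min_confidence: float = 0.70,
    min_area_um2: float = 0.0,
    max_area_um2: float = 5_000_000.0,
) -> List[Detection]:
    """Keep detections that meet the confidence threshold and fall within the size range."""
    return [
        d for d in detections
        if d.confidence >= min_confidence
        and min_area_um2 <= d.area_um2 <= max_area_um2
    ]


# Hypothetical detections, for illustration only.
example = [
    Detection(confidence=0.95, area_um2=12.0),         # kept
    Detection(confidence=0.65, area_um2=20.0),         # dropped: confidence too low
    Detection(confidence=0.80, area_um2=6_000_000.0),  # dropped: outside size range
]
kept = filter_detections(example)
print(len(kept))
```

A narrower size range could be expressed by passing a different `max_area_um2`.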

Once the parameters were optimized, the model was applied to the samples, and the area of the gH2AX signals was automatically calculated by the software. The total areas of the kidneys were also calculated, allowing for a comparison between the total damage areas and the kidney areas. In addition, the percentage of damage within the triple determinations was averaged, and the standard deviation was calculated (Table 1).
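The calculation described above can be sketched in a few lines. The function names and the input values are hypothetical placeholders, not measured data, and the use of the sample standard deviation (n - 1) is an assumption, since the text does not state which variant was used.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def damage_percent(signal_area_um2: float, reference_area_um2: float) -> float:
    """Percentage of the reference area (e.g. kidney area) covered by gH2AX signals."""
    if reference_area_um2 <= 0:
        raise ValueError("reference area must be positive")
    return 100.0 * signal_area_um2 / reference_area_um2


def summarize_triplicate(percentages: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (n - 1) of the replicate percentages."""
    return mean(percentages), stdev(percentages)


# Placeholder (signal area, reference area) pairs in square micrometres for three animals.
triplicate_areas = [(50.0, 10_000.0), (100.0, 10_000.0), (150.0, 10_000.0)]
percentages = [damage_percent(s, k) for s, k in triplicate_areas]
avg, sd = summarize_triplicate(percentages)
print(avg, sd)
```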

The first part of Table 1a, which pertains to the first group, indicates that the mean percentages of the ratios between gH2AX total area and the total area of the kidneys continuously increase, doubling from the 1-hour value to the 48-hour value. The measurement results at 4 hours and 24 hours differ from each other by approximately X%, while the range between these values and the first and last values is approximately X%. The standard deviations, except for the 24-hour value (SD24h = X%), are below X%. The values for Group 2 were also summarized in tabular form (Table 1b).

Similarly, these values also show a continuous increase in the mean values. Analogous to Group 1, the 4-hour and 24-hour values do not differ significantly from each other. The value for the two-day incubation is the highest at approximately X%, while the value for the one-hour incubation is the lowest at X%. Again, the percentage standard deviations, except for the 4-hour value (SD4h = 0.109%), are below X%. The calculated and measured results of Group 3 were also compiled and presented in tabular form (Tab. 1c).

In this group as well, a continuous increase in values can be observed, which, however, plateaus between the 4-hour and 24-hour values. The 48-hour value reaches 15 times the value of the initial time point. Both the 4-hour and 48-hour values show a standard deviation higher than 0.1% (SD4h = X%; SD48h = 0.101%).

To visualize these values, a bar chart was created representing the percentage of DNA damage relative to the total area of the kidney. The averaged values of damages from the 3 treated animals per group and time point were used. Due to a high deviation, the 11th animal from Group 3 was treated as an outlier and was not considered for calculating the mean and standard deviation. The resulting chart displays the values for the four time points after the addition of the tested substances with their respective concentrations. The standard deviations were also included (Fig. X).

It can be observed that the animals incubated with the substance for only one hour (light blue bars) have incurred the least amount of damage. The bars representing the 4-hour values (dark blue) show the second lowest level of damage in groups 1 and 2, while in group 3, they exhibit more damage compared to the one-hour and 24-hour values. The green bar representing the 24-hour incubation decreases with decreasing concentrations of the protective substance. When one-tenth or one-hundredth of the concentration is used, the values decrease to half or one-fourth, respectively. The longest incubation duration shows the highest proportion of damage in all three groups, reaching its maximum with the intermediate dose. Through statistical analysis (ANOVA), it was determined that there are no significant differences between group 2 and group 3 for the 4-hour values, while group 1 shows significant differences compared to both. There are no significant differences between the three groups for the 1-hour values or between the 4-hour and 24-hour values within a group. Significant differences are present among the remaining values.
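The text does not specify the ANOVA design that was used. Assuming a one-way ANOVA across groups, the F-statistic can be computed as in the following sketch. The function name is hypothetical and the input numbers are placeholders chosen for easy arithmetic, not values from this study.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Placeholder data: three groups with three replicates each.
placeholder_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value, df_b, df_w = one_way_anova_f(placeholder_groups)
print(f_value, df_b, df_w)
```

Converting the F-statistic into a p-value requires the F-distribution, which a statistics library can supply (for example `scipy.stats.f_oneway`, if SciPy is available).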

Group 1 (cProtective Substance = 1 mg/kg) shows a continuous increase in damage with increasing incubation duration, with a fivefold increase in damage observed between the 1-hour and 4-hour values. After an additional 20 hours, only a non-significant increase is observed. Doubling the duration to 48 hours results in roughly a doubling of the percentage of damage.

Group 2, which was treated with a 10-fold lower dosage than group 1 (cProtective Substance = 0.1 mg/kg), also exhibits a steady increase. The value after one hour of incubation is around X%, which triples after an additional three hours. Similar to group 1, a non-significant increase can be observed between the 4-hour and 24-hour values. The value after 48 hours of exposure reaches approximately X%, representing a fivefold increase compared to the 24-hour value and a 19-fold increase compared to the 1-hour value.

The value after one hour of incubation in group 3 (cProtective Substance = 0.01 mg/kg) is the lowest compared to the other two dosages, but statistically not significantly lower. The next value, after 4 hours, is three times as high. The value after an additional 20 hours shows a slight, non-significant increase of about X%. A continuous increase in the percentage of gH2AX signals can thus also be observed within this group. At the last time point (48 hours), the damage doubles compared to the 24-hour value.

Upon closer examination, it is noticeable that, in general, the values of group 2 and group 3 fall within a similar range, while the values of the first group, which has the highest dosage, are higher. With the exception of the 48-hour value, a doubling of the percentage of damage can be observed in the first group. The latter is highest for the intermediate dosage (group 2) at approximately X%.

The inserted standard deviations do not show a clear pattern of increase or decrease.

Discussion

Before conducting the practical part of this study, it was expected that the investigated protective substance would have a protective effect. It was assumed that after one hour of incubation and using the highest dosage (Group 1 cProtective Substance = 1 mg/kg), the least damage would occur. Furthermore, it was expected that the percentage of damage would slightly increase after 4 hours and 24 hours and reach its peak at 48 hours of incubation. The expectation was that the highest dosage would have fewer damages at every time point compared to the other two dosages.

The first assumption was partially confirmed, as the damages were minimal after one hour of incubation, regardless of the substance concentration, contrary to expectations. The average percentage values for this incubation duration showed a variance of X% within the three groups. However, according to the conducted statistical analysis, they were not significant and could be explained by the natural differences in the kidneys. Great care was taken to mainly cut and stain the middle region of the kidney since it is known to have the highest number of DNA double-strand breaks. This ensured that approximately the same layer was processed for all samples to ensure result comparability. Thus, efforts were made to mitigate the natural differences in the kidney as much as possible.

The bar graph in Figure X indicated that the values increased as expected with longer incubation. However, the highest dosage (cProtective Substance = 1 mg/kg) showed the highest levels of damage, except for the 48-hour value. After 4 hours and 24 hours, approximately twice as much damage occurred compared to the lower dosages (Group 2: cProtective Substance = 0.1 mg/kg and Group 3: cProtective Substance = 0.01 mg/kg). Therefore, it can be concluded that the ten- and hundred-fold lower doses have a higher protective factor than the highest dosage. However, the 48-hour value of Group 2 was approximately X% higher than that of Group 1, which contradicts the assumption that the lower dosage provides stronger protection. On the other hand, the value after two days of incubation in Group 3, at approximately X% less than Group 1, was the lowest. This would suggest that the substance with the lowest dosage exhibits a longer-lasting protective effect, since the values after 1 hour, 4 hours, and 24 hours did not significantly differ between Group 2 and Group 3, but the 48-hour value was slightly less than double.

The difference between the 4-hour and 24-hour values within the groups was not significant according to the statistical ANOVA analysis. However, after an additional 24 hours, the percentages of damages increased significantly, with a doubling in the 1st and 3rd groups and a quadrupling in the 2nd group. This could be related to a decrease in the protective effect of the substance over time, which would require re-dosing. The biological half-life of the protective substance is less than 1 hour. Additionally, factors such as the biological half-life of the therapeutic substance in the blood and the time it takes for this substance to accumulate in and subsequently leave the kidney or tumor play a role. This half-life is approximately 40 hours. The concentration in the kidney is maximal after 1 to 2 hours and then decreases. These values cannot be supported by literature references due to patent protection reasons. However, it takes some time for these DNA damages to form γH2AX signals. Furthermore, Actinium-225 was used, which has a half-life of approximately 10 days (periodic table). Due to pharmacodynamic interactions, it is not possible to make precise statements about which substances induce which events. Therefore, this must be determined empirically. The stagnation of values between the time points of 4 hours and 24 hours indicates that the substance had the highest protective effect at these time points.

The relative standard deviations showed no higher value than X% between the triplicate determinations, except for the 48-hour value of group 3. This allowed for the use of each value in creating the bar graph. The 48-hour value of the last group had a deviation of X%, which is why it was not further considered in the results. The associated file, which represented the 11th animal of this group, was manually examined to understand the error. It was found that there was an error during staining. The automatically scanned section had a water haze, which made the brown staining of the cell nuclei, representing the γH2AX signals, not intense enough to be correctly detected by the software. This could be due to the presence of water under the section during the transfer of the kidney section to the mounting medium. This resulted in many false-negative results from the HSA KIT, which in turn showed a relatively low damage of X% compared to the other two values. Regardless, each section was manually examined and showed clear and intense signals.

The standard deviations showed no correlation with the number of damages. However, in a previous study (practical report), such a correlation was observed. As the number of damages increased, the relative standard deviation also increased, which was attributed to a higher false-negative error rate of the software, as the generated signals were mostly smaller and could not be detected. With the new deep learning models of the HSA KIT, trained as part of this study, even the smaller signals were detected, so the error rate had no influence on the standard deviations.

With the HSA KIT, it is possible to adjust certain parameters according to the sections. Care was taken to choose these parameters in a way that allowed for the automatic detection of as many manually identified γH2AX signals as possible. Additionally, the entire area of these signals was considered, as later in the study, the calculation of the signal area was intended. A confidence level of 0.70 was found to be optimal. Both small and large γH2AX signals were recognized as such, and their entire areas were detected. During the manual examination of individual sections, it was observed that some tissue sections exhibited particularly intense brown background staining. This may have occurred because endogenous peroxidase activity was not completely blocked, leading to the co-staining of white and red blood cells. Therefore, these sections were specifically checked for false-positive results since the signals to be detected were also stained brown. It was found that, except for a few signals, no significantly high number of false detections was obtained. The error rate was less than 5% for both false-positive and false-negative detections. Additionally, nonspecific staining may have originated from incomplete removal of paraffin, as it can mask the specific staining. Furthermore, inadequate rinsing of the tissue sections or overdevelopment of the substrate reaction due to an excessive amount of chromogen in the solution could lead to nonspecific staining. Since this was a rapid staining method, highly concentrated reagents and relatively short incubation times were used. As a result, even every additional second during incubation caused significant changes, such as drying out of the sections, which mainly occurred at the edges of the kidneys. These nonspecific stainings were also not detected as false positives.

During the manual analysis of the sections, some spots were noticed that exhibited intense brown to black staining. Some of these spots were recognized as signals by the HSA KIT because they resembled γH2AX signals in their structure and color. However, these were residues that detached from the labeling field of the tissue cassette and adhered to the section. This most likely occurred during the numerous treatments with xylene, as the tissue cassette holder was inserted too far into the tanks, and the fat-dissolving substance soaked the labeling. These spots were also observed on the isotype controls, supporting the aspect that they cannot be signals. Furthermore, these spots exhibited a highly irregular morphology, as γH2AX signals have a round to oval shape, while these spots had a full, patchy structure.

However, the crucial point indicating that these spots were not signals is that they were not localized within cells but randomly distributed in the tissue. To avoid these false-positive results, a maximum size of 900 μm² was set for signal detection in the HSA KIT. This filtered out all signals larger than 900 μm². This value was empirically determined within the scope of this study and represented the threshold between the largest γH2AX signal and the smallest nonspecific spot.

The recall value was 0.995, and the precision value was 0.966, indicating that there were more false-positive results (i.e., a signal being detected when it wasn’t present) than false-negative detections. This could be observed from the number of false negatives (FN) and false positives (FP). However, none of the calculated values deviated significantly from 1, suggesting that the model was suitable for the accurate quantification of γH2AX signals. The harmonic mean, F1-score, was 0.980, further confirming that this was an appropriate deep learning model.

To make accurate statements regarding the extent and magnitude of the substance’s effects, further characterization of the radioimmunotherapy targeting prostate cancer and the kidney protection substance is required. These experiments could include additional time points and animals to statistically differentiate smaller differences. Furthermore, staining could be performed more frequently and compared to exclude errors from the software and staining procedure. Additionally, a study could be conducted in which mice receiving the substance under the same treatment conditions are compared to mice not receiving it. This could not be carried out in this study due to the unavailability of the respective mice.

Due to the ongoing research and naming restrictions on the substance, there was limited literature available for comparison in this section.

Metrics

To further investigate the quality of the model, metrics were calculated. For this purpose, a region of a kidney was selected, and gH2AX signals were manually detected.

A total of 434 structures were annotated. Subsequently, the HypergH2AXKidneyNet model under investigation was applied to the same region of the section (Fig. XX).

With the DL model, a total of 447 objects were detected as signals. In the next step, a comparison was made with the HSA KIT to determine which manually annotated signals were automatically detected by the model and which ones were not. Based on this, the numbers of FN (false negatives), TN (true negatives), FP (false positives), and TP (true positives) were counted. The following values were obtained:

FN: 2
TN: indeterminable
FP: 15
TP: 432

Using these values and equations (1.1) to (1.4), the metrics accuracy, precision, recall, and F1-score were calculated. Since the TN values are indeterminable, they were not further considered. The following values were determined:

Accuracy = 0.962; Precision = 0.966; Recall = 0.995; F1-score = 0.980

The recall value is the highest among all the values, followed by the F1-score. The precision value is slightly higher than the accuracy.
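Equations (1.1) to (1.4) are not reproduced in this section. The following sketch uses the standard definitions of precision, recall, and F1-score. For accuracy, it assumes that TN is simply left out of the formula, which is an assumption that reproduces the reported value. The function name is hypothetical.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Detection metrics computed from counts, with TN omitted (indeterminable)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Assumption: accuracy without the TN term, i.e. TP / (TP + FP + FN).
    accuracy = tp / (tp + fp + fn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts taken from the text above.
metrics = detection_metrics(tp=432, fp=15, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The counts are consistent with the text: TP + FN = 434 annotated structures and TP + FP = 447 detected objects. Computed from the raw counts, the F1-score is 864/881, or about 0.9807. The reported 0.980 is obtained when the rounded precision (0.966) and recall (0.995) are inserted into the F1 formula.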

HSA KIT vs. ImageJ vs. QuPath

[Figure panels: GTD, HypergH2AXKidneyNet, ImageJ, QuPath]

To allow a better comparison of the results, a zoom level of 80 was used.

[Figure panels: GTD (HSA KIT), HypergH2AXKidneyNet, ImageJ, QuPath]

The following confusion matrix is a template and was used to create one for each of the software solutions.

The results were then summarized in a table.

With these data, a column chart was created in which the blue column represents the HSA KIT model HypergH2AXKidneyNet, the black column Ki-67, the red column ImageJ, and the yellow column QuPath. The metrics precision, recall, and F1-score were calculated for each of them to allow a better comparison of the methods.

Comparing all columns shows that the „HypergH2AXKidneyNet“ model has the highest average score of all the models. The recall value of Ki-67 is slightly higher, but this is because this model can only detect the largest signals: its detections are correct but too few. ImageJ and QuPath are approximately on the same level.

Manual plausibility


A manual plausibility analysis was conducted to assess whether the numerical values obtained using the automated quantification software (HSA KIT) appeared plausible compared to the sections. Each section was manually examined in the HSA KIT, and comparisons were made both within the respective groups and across different groups. Since the damages had comparable sizes, the number of damages could be equated with size, allowing the plausibility analysis of size values based on the number of damages to be conducted. To make qualitative statements regarding the extent of damages, sections from the same region of the kidney were used for comparison. The gH2AX signals were counted within an area of 200 μm x 200 μm at 20x magnification, and the number of signals in the sections within the groups was compared. In order to maintain the scope of this study, only one animal or area of the kidney was represented and compared for each group and time point. The entire kidney section is depicted below them, and the chosen area is indicated by the blue cross.
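Counting signals inside a fixed square window, as described above, can be sketched as follows. The coordinate convention (signal centroids in micrometres), the function name, the inclusive border, and the example points are assumptions made for illustration.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def count_in_window(centroids_um: Iterable[Point], center_um: Point, side_um: float = 200.0) -> int:
    """Count centroids inside a square window of the given side length (border included)."""
    half = side_um / 2
    cx, cy = center_um
    return sum(
        1 for x, y in centroids_um
        if abs(x - cx) <= half and abs(y - cy) <= half
    )


# Hypothetical centroid positions in micrometres.
points = [(950.0, 1050.0), (1100.0, 1000.0), (1101.0, 1000.0), (1000.0, 890.0)]
n_signals = count_in_window(points, center_um=(1000.0, 1000.0))
print(n_signals)
```

Here the window spans 900 to 1100 μm in both directions, so the first two points are counted and the last two fall outside.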

The kidney sections at the 1-hour and 48-hour time points do not show any nonspecific staining at first glance, while this is the case for the other two time points. A brownish, hazy staining extends over the majority of the kidney. Looking at the kidney from Group 1, which was incubated for one hour (Fig. Xa), it is evident that this sample has the fewest damages. Only one signal with a diameter of approximately 3 μm was observed. The 6th time point, which occurred after 4 hours of incubation, exhibits around 25 distinct signals. In terms of size, these signals range between 3 to 5 μm. After 24 hours, the section shows 39 damages. Here, occasional signal complexes can also be found, resulting in a size spectrum ranging from 3 to 10 μm. In the last section, 54 signals were counted. Larger complexes can also be found here, ranging in size around 10 μm. The same approach was applied to Group 2.

Except for the kidney after 24 hours of exposure, no nonspecific staining is noticeable. The 14th time point, chosen as a representative for the one-hour incubation of Group 2 (cProtective substance = 0.1 mg/kg), shows a relatively small DNA damage of approximately 2 μm. The section from the 4-hour incubation also exhibits only one signal, but this signal consists of multiple nuclear damages, resulting in roughly five times the total size, namely 9.5 μm. The next time point displays 8 damages, among which there is a signal comparable in size to the previous time point. The remaining signals have a size of approximately 5 μm. After 2 days of exposure, significantly more signals are present, forming larger complexes with a diameter of about 5 to 10 μm. A total of 74 signals were counted. Furthermore, the sections of Group 3 (cProtective substance = 0.01 mg/kg) were compared (Fig. X).

Nonspecific brownish staining is also only visible in the section that was obtained after 24 hours of incubation. The section from the one-hour treatment shows 4 distinct signals with a size of approximately 3 μm. The next section exhibits 7 signals in the same size range as the signals from the one-hour incubation. The 24-hour section shows 11 gH2AX signals, including several larger signals measuring about 5 to 6 μm. After two days of exposure, 34 signals are observed, some of which have a diameter twice as large, namely 8 to 10 μm, compared to those generated after one or four hours.

Abstract

An immunohistochemical staining of gH2AX signals was performed, with a positive control and a negative control for each section supporting the success of this staining. Subsequently, quantification was carried out using the HS Analysis study management software (HSA KIT), and the results passed a plausibility test. The two models used in this study, „HyperNonVesselNet“ and „HypergH2AXKidneyNet“, were trained specifically for this purpose. The first model was designed for automated detection of vessels or erythrocytes, excluding them from the total kidney area to determine the non-vessel area. The second model was trained to detect gH2AX signals, with its application limited to the non-vessel area as a substructure. This approach allowed achieving an error rate of less than 5% and enabled the quantification of both small, weak signals and large, intense signals.

After quantification, it was found that there were no significant differences in the values after one hour of incubation. Similarly, within the three groups, no statistically significant differences were observed between the 4-hour and 24-hour values. This suggests that the protective substance had its highest effectiveness at these time points. Additionally, it was observed that at all dosage levels, the value significantly increased after 48 hours of incubation, indicating a decline in the substance’s efficacy by that time. Therefore, evaluating multiple administrations of the substance could be a promising approach. The lowest dosage, namely cprotective substance = 0.01 mg/kg, exhibited the highest protective effect, as the values at 1, 4, and 24 hours were of a similar magnitude to those of Group 2 (medium dosage, cprotective substance = 0.1 mg/kg), but the final value was significantly lower. The highest dosage (cprotective substance = 1 mg/kg) showed the highest number of damages at all time points except 48 hours, where the medium dosage showed the highest value.