Morphological heterogeneity evaluation in model cells
To establish an effective and label-free phenotypic method for screening candidate drugs for SBMA, morphological heterogeneity in the SBMA model cells (healthy AR-24Q and diseased AR-97Q cells that harbor different poly-glutamine repeats in ARs) was quantified.
The results of the bulk assays (mitochondrial activity assay and metabolism measurement) on 20,000–30,000 cells revealed significant differences between the model cell types even without AR stimulation (Fig. S1). However, these differences were undetectable in microscopy images containing < 100 cells per image (Fig. 2a). When 300 cells collected from replicate wells were assessed, diverse morphologies were observed (Fig. 2b). Morphological analysis revealed that distinguishing AR-24Q and AR-97Q cells was difficult because the population was heterogeneous and exhibited several overlapping morphologies. Examination of single-cell iDs using principal component analysis (PCA) revealed that cells of each model type exhibit complex morphological heterogeneity in the multi-dimensional data space described using 14 morphological parameters. Thus, it was difficult to describe cell type characteristics (Fig. 2c).
Next, morphological analysis was performed using the population profile. The iDs were summarized as “population data (pD)” to represent each cell type (Fig. 2d–e). Consequently, the morphological characteristics of different cell types were distinctively identified as evidenced by distantly clustered plots and heatmaps. However, contamination of different cell types in the clusters with a heterogeneous heatmap pattern indicated that the summarized pDs were heterogeneous and consequently interfered with stable clustering.
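The summarization of iDs into a pD can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that a pD stores the mean and sample SD of each morphological parameter (consistent with the "mean of" and "SD of" parameters reported in the heatmaps), and the function name `summarize_population` and the three example cells are hypothetical.

```python
from statistics import mean, stdev


def summarize_population(ids, params):
    """Collapse single-cell records (iDs) into one population record (pD).

    Assumption: a pD holds the mean and sample SD of each parameter.
    """
    pd_row = {}
    for p in params:
        values = [cell[p] for cell in ids]
        pd_row[f"mean_{p}"] = mean(values)
        pd_row[f"sd_{p}"] = stdev(values)
    return pd_row


# Hypothetical iDs: three cells, two of the 14 morphological parameters.
cells = [
    {"area": 100.0, "length_width_ratio": 1.0},
    {"area": 120.0, "length_width_ratio": 2.0},
    {"area": 140.0, "length_width_ratio": 3.0},
]
pd_example = summarize_population(cells, ["area", "length_width_ratio"])
```

Each pD produced this way is one row of the kind of table that would feed PCA or hierarchical clustering.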
In silico FOCUS establishment for characterizing cells with high heterogeneity
To improve the stability of analysis, this study aimed to minimize the experimental bias that affects morphological information. Bias from “different fields-of-view” or “different wells” (Fig. S2) is prominently observed in the case of morphologically heterogeneous cells. To mitigate this effect, the iDs were pooled in silico (Fig. S3). Although data pooling effectively minimized this bias, it reduced the data size that reflects morphological variety. Therefore, bootstrapping was introduced to generate “pDs with bootstrap (b-pDs)” for statistical variation, which further improved the ability to distinguish the morphological differences between cell types (Fig. 2f–g). However, the heatmap pattern within the same cell type cluster indicated heterogeneous b-pDs. This suggested a high degree of heterogeneity in the cell population, which interfered with the generation of stable population data and clustering.
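The pooling and bootstrap steps described above can be sketched as follows. This is an illustrative sketch under stated assumptions: a single morphological parameter per cell, 100 iDs per draw, and 10 bootstrap replicates (the replicate count is an assumption). The helper names `pool_ids` and `bootstrap_pds` and the well values are hypothetical.

```python
import random
from statistics import mean, stdev


def pool_ids(wells):
    """Merge iDs from all wells / fields of view into one pool."""
    return [cell for well in wells for cell in well]


def bootstrap_pds(pool, n_pds, n_cells, seed=0):
    """Draw n_cells iDs with replacement n_pds times; summarize each draw."""
    rng = random.Random(seed)
    pds = []
    for _ in range(n_pds):
        sample = rng.choices(pool, k=n_cells)  # sampling with replacement
        pds.append({"mean": mean(sample), "sd": stdev(sample)})
    return pds


# Hypothetical single-parameter iDs from three wells with well-specific offsets.
wells = [
    [1.0 + 0.1 * i for i in range(100)],
    [1.5 + 0.1 * i for i in range(100)],
    [2.0 + 0.1 * i for i in range(100)],
]
pool = pool_ids(wells)
b_pds = bootstrap_pds(pool, n_pds=10, n_cells=100)
```

Because every draw comes from the pooled iDs, no single b-pD inherits the offset of one well, while the repeated resampling restores the replicate-level variation that pooling alone removes.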
Next, this study aimed to enhance the morphological characteristics of each cell type and remove “morphologically overlapping cells” from the pDs. To this end, the anomaly discrimination concept of the Mahalanobis-Taguchi (MT) method was introduced (Figs. 1b and S3)30,31. Drug-responsive cells with unique morphological characteristics (compared to the control cells) were defined as “anomaly iDs” based on the Mahalanobis distance. To enhance the effect of anomaly iDs in the pDs, the top 50% of iDs ranked in the order of Mahalanobis distance were acquired. With this approach, no threshold needs to be set to define anomaly iDs. Thus, an increase in morphological changes due to drug response leads to an increase in the pDs from in silico FOCUS (= focused pDs) that comprise anomaly iDs.
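The ranking step can be sketched as follows. This is a minimal sketch, assuming that the reference mean and covariance are estimated from the control population and that the Mahalanobis distance of each target iD is computed against that reference; the function names and the two-parameter example data are hypothetical and do not reproduce the authors' code.

```python
from statistics import mean


def covariance(rows, mu):
    """Sample covariance matrix of rows (list of equal-length lists)."""
    n, d = len(rows), len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis(x, mu, cov):
    """Distance of x from the reference distribution (mu, cov)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return sum(d * y for d, y in zip(diff, solve(cov, diff))) ** 0.5


def focus(control, target, keep=0.5):
    """Return the top `keep` fraction of target iDs ranked by distance from control."""
    mu = [mean(col) for col in zip(*control)]
    cov = covariance(control, mu)
    ranked = sorted(target, key=lambda x: mahalanobis(x, mu, cov), reverse=True)
    return ranked[: int(len(ranked) * keep)]


# Hypothetical iDs described by two morphological parameters.
control = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
target = [[0.1, 0.0], [3.0, 3.0], [0.0, 0.2], [2.0, -2.0]]
focused = focus(control, target)
```

Keeping a fixed fraction instead of applying a distance cut-off is what makes the selection threshold-free. The sketch assumes a non-singular control covariance matrix; strongly correlated parameters would need regularization or parameter selection first.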
To demonstrate the effectiveness of in silico FOCUS, cell type discrimination was evaluated using focused pDs. Conceptually, in silico FOCUS concentrates cell-specific anomaly iDs and minimizes the number of cells with major overlapping morphological features (Fig. 2h). The differences in morphological characteristics between the cell types were more distinct when analyzed based on focused pDs (Fig. 2i–j) than when analyzed based on raw pDs (Fig. 2d–e) or b-pDs (Fig. 2f–g). The pattern of focused pDs in the heatmap was homogeneous, indicating that in silico processing of FOCUS data effectively enhanced the stability of phenotypic evaluation. Comparative analysis of the coefficients of variation (CVs) among pDs in all morphological parameters revealed that the large variance reflecting the experimental bias in raw pDs decreased by more than fivefold in focused pDs, which were significantly more stable than b-pDs (Fig. 2k). In focused pDs, the number of significant morphological parameters (AR-24Q vs. AR-97Q cells) increased (Fig. 2l). Hence, in silico processing of FOCUS data effectively generated stable pDs and clusters.
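The CV comparison among replicate pDs can be sketched as follows. This is an illustration only: the CV is taken as SD divided by the absolute mean of a parameter across replicate pDs, and the numbers are made up to show the calculation, not measured values from this study.

```python
from statistics import mean, stdev


def cv_per_parameter(pds):
    """Coefficient of variation (SD / |mean|) of each parameter across replicate pDs."""
    result = {}
    for p in pds[0]:
        values = [row[p] for row in pds]
        result[p] = stdev(values) / abs(mean(values))
    return result


# Hypothetical replicate pDs for a single parameter (illustrative numbers only).
raw_pds = [{"area": 80.0}, {"area": 100.0}, {"area": 120.0}]
focused_pds = [{"area": 98.0}, {"area": 100.0}, {"area": 102.0}]

cv_raw = cv_per_parameter(raw_pds)
cv_focused = cv_per_parameter(focused_pds)
fold_reduction = cv_raw["area"] / cv_focused["area"]
```

A fold reduction computed per parameter in this way is the kind of quantity summarized across all morphological parameters in the CV comparison.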
Dihydrotestosterone (DHT)-responding phenotype analysis in model cells
Next, the ability of in silico FOCUS to identify drug-responding phenotypes in model cells was examined. DHT, which was selected to induce characteristic phenotypic changes observed in SBMA21,22,26, is an AR ligand that induces disease phenotype in AR-97Q cells by promoting pathogenic nuclear AR accumulation and cell death through transcriptional dysregulation. The physiological AR cascade activates AR-24Q cells without any detrimental effects.
Visual analysis of DHT stimulation response revealed that healthy cells elongated to a greater degree than diseased cells (Fig. 3a). However, distinguishing between control and DHT-responding conditions was found to be difficult upon comparing the length_width_ratio distribution of 300 cells from each condition (Fig. S4a).
To emphasize the morphological characteristics of the DHT-responding cell sub-population, in silico FOCUS was used to concentrate anomaly iDs in the focused pDs from each model cell type (Fig. 3b). An increased number of anomaly iDs was expected to increase the extent to which the focused pDs would reflect the drug response, even from a small sub-population. Consistently, the distribution of anomaly iDs exhibited a higher degree of deviation than the control population distribution and a lower degree of deviation than that of iDs without in silico FOCUS (Fig. 3c,d).
Comparative analysis of the morphological characteristics using PCA revealed that the morphological response to DHT in focused pDs was more distinct than that in b-pDs in both model cell types (Fig. 3e). The b-pDs clustered based on the cell type, whereas the focused pDs indicated a phenotypic transition of clusters starting from the control phenotype to the DHT-responding phenotype. The positions of these drug-responding clusters in PCA indicated that the AR-24Q and AR-97Q cells exhibited characteristic morphologies of drug-responding cells.
The effect of in silico FOCUS was visualized using clustering (Fig. 3f). Heterogeneous heatmap patterns within clusters related to different conditions indicated that the b-pDs were not stable enough to characterize the DHT responses, as differences based on cell type were too prominent to allow this characterization. In contrast, the focused pDs yielded homogeneous clusters. The clustering results also indicated characteristic transitions in the morphologies of responder cells based on DHT concentrations in both AR-24Q and AR-97Q cells. The distinctly clustered AR-24Q data perfectly reflected the DHT dose–effect and indicated that the DHT responses of healthy cells could be extracted using in silico FOCUS analysis. The control status of the non-responders also indicated that the focused pDs were sensitive to morphological differences even at a DHT concentration of 1 nM.
Statistical analysis of heterogeneity in morphological parameters revealed that the variance of morphological parameters (= CV from morphological parameters) in focused pDs was significantly lower than that in b-pDs with DHT responses (Fig. S4b). Moreover, the average proportion of morphological parameters that differed significantly between control and target groups increased from 46% (b-pDs among both cell types) to 76% (focused pDs among both cell types) (Fig. S4c).
Furthermore, the phenotypic characteristics of cells could be interpreted from the clustering of the focused pDs. The “mean of length-related parameters” (such as length_width_ratio, perimeter, and compactness) increased after DHT administration (up to a concentration of 10 nM), while the “mean of inner_radius” remained low in AR-24Q cells (Fig. 3f). This response indicates that, compared with control cells, DHT-stimulated healthy cells were more elongated and thinner. However, the increase in “standard deviation (SD) of size-related parameters” (such as area, perimeter, and inner radius) suggested that this elongation was highly heterogeneous. Although the morphological characteristics of AR-97Q cells were not as distinct as those of AR-24Q cells, similar increases in “SD of size-related parameters” were detected. This indicated increased morphological heterogeneity in both cell types. The increased “mean of inner_radius and intensity_deviation” in the AR-97Q cells indicated that DHT promoted heterogeneous cell morphology with decreased elongation. These findings suggest that the neural model cells were morphologically heterogeneous, a phenomenon that increased with drug response.
As in silico FOCUS collects the top “anomaly iDs” from the target cell population relative to the control cell population, such anomaly iDs could be generated merely by chance. To verify that the anomaly detection effect of in silico FOCUS is not affected by false-positive anomalies, the “control cell population” was used as both the control and target groups. Conceptually, two extracted populations A and B are never completely identical even if they are obtained from the same cell population. Therefore, anomaly iDs can be forcibly generated. However, the efficacy of in silico FOCUS in detecting drug responses is not compromised if such pseudo-anomalies cannot form meaningful focused pDs. The results indicated that pseudo-focused pDs can be forcibly generated but do not disturb the cluster of real focused pDs reflecting the DHT effect (Fig. S4d). Additionally, validating in silico FOCUS data using an established positive control is important.
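The control-versus-control check can be illustrated with a toy one-parameter sketch. In one dimension the Mahalanobis distance reduces to |x − mean| / SD. The data, the `focus_1d` helper, and the size of the responder sub-population are hypothetical and chosen only to show the logic.

```python
from statistics import mean, stdev


def focus_1d(control, target, keep=0.5):
    """Keep the top `keep` fraction of target values by distance from control.

    In one dimension the Mahalanobis distance is |x - mean| / SD.
    """
    mu, sd = mean(control), stdev(control)
    ranked = sorted(target, key=lambda x: abs(x - mu) / sd, reverse=True)
    return ranked[: int(len(ranked) * keep)]


# Hypothetical single-parameter iDs.
control_a = [i / 10 for i in range(-10, 11)]         # control draw A
control_b = [i / 10 + 0.05 for i in range(-10, 11)]  # control draw B, same population
treated = control_a[:15] + [3.0] * 6                 # six responder cells shifted to 3.0

pseudo_focused = focus_1d(control_a, control_b)  # control vs. control
real_focused = focus_1d(control_a, treated)      # control vs. treated

pseudo_mean = mean(pseudo_focused)
real_mean = mean(real_focused)
```

In this toy case the pseudo-anomalies come from both tails of the control distribution and largely cancel, so their summary stays near the control value, whereas the real anomalies are dominated by the shifted responder cells.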
In silico FOCUS is a type of image cytometry analysis that generates focused pD by summarizing iDs. Hence, the sample size of iD collection can markedly affect the robustness of analysis. The effect of iD collection size (Table S1) starting from the default collection number (100 iDs) was investigated. The efficacy of in silico FOCUS with respect to evaluating the drug effect markedly decreased when iD collection was reduced. Hence, more than 100 iD collections are essential.
In summary, in silico FOCUS analysis was found to exhibit robust performance with respect to the evaluation and interpretation of heterogeneous cellular responses upon using the control drug.
Pioglitazone (PG)-responding phenotype evaluation in model cells
Next, the ability of in silico FOCUS analysis to effectively evaluate rescue responses in model cells was examined. PG exerts therapeutic effects on SBMA model cells (AR-97Q stimulated with 10 nM DHT) and murine models by activating the expression of PPARγ21. Based on the results obtained in previous studies, cells treated with PG at concentrations > 0.1 µM were used as positive controls and were defined as exhibiting a “rescued phenotype.”
The PG rescue effect was reflected in AR-97Q cell morphology (Fig. 4a). AR-97Q cells, which exhibit a shrunken morphology, appeared elongated after PG treatment (1 µM). However, significant differences between control and PG-treated cells were difficult to detect when 300 cells were analyzed (Fig. S5a). Thus, morphological heterogeneity in the model cells made it difficult to detect the rescue effect using a simple method. Therefore, in silico FOCUS was expected to classify the rescued phenotype in AR-97Q cells based on their similarity to healthy AR-24Q cells (Fig. 4b).
Next, in silico FOCUS analysis was used to determine its ability to detect the subtle drug responses of a small cell population. The anomaly iD distribution indicated that in silico FOCUS could differentiate these cell sub-populations from control cells (Fig. S5b). From the representative morphologies of iDs, their elongated morphology can be confirmed, especially in anomaly iDs with large Mahalanobis distances (Fig. S6).
PCA visualization with both b-pDs and focused pDs indicated that PG treatment mitigated the pathological morphological changes in AR-97Q cells (Fig. 4c). However, only analysis of focused pDs enabled the detection of a dose-dependent relationship for this morphological transition. Specifically, focused pDs clustered close to the healthy phenotype at PG concentrations > 0.1 µM. The significant decrease in CV from morphological parameters indicated that focused pDs exhibited significantly higher stability than b-pDs (Fig. S5c).
The clustering results also confirmed the effectiveness of the in silico FOCUS method (Fig. 4d). The homogeneity of heatmap patterns under the same condition cluster in focused pDs was markedly higher than that in b-pDs, indicating the stable clustering performance of pDs (Fig. 4d). Although the main cluster branches that distinguish “disease phenotype” or “healthy phenotype” were similar, only focused pDs formed both PG-responding morphology clusters (with both 0.1 and 1 µM) close to the healthy phenotype cluster. The heatmap pattern in the cluster indicated gradual morphological recovery, reflecting PG dose–response. Clustering stability was also evidenced by the increase in morphological parameters in the focused pDs relative to b-pDs, showing significant control vs. target differences (Fig. S5d). Based on the interpretation of parameters (Fig. 4d), the increase in “mean of length-related parameters” supported the hypothesis that PG treatment recovers the thin and elongated morphology in AR-97Q cells.
To confirm that the efficacy of in silico FOCUS for detecting the PG rescue effect is not affected by false-positive anomaly iDs generated only by chance, the focused pDs from PG-responding conditions were compared with the pseudo-focused pDs forcibly generated using control status for “target” (Fig. S5e). Comparative analysis revealed that pseudo-focused pDs do not disturb the profile of real focused pDs upon PG treatment.
These findings indicate that the in silico FOCUS method can effectively detect drug rescue in morphologically heterogeneous SBMA cells. Additionally, this method may help determine the effective drug concentration based on morphology data alone.
Phenotype classification model development for drug screening
Unsupervised analysis indicated that the focused pDs from in silico FOCUS were effective descriptors for identifying the characteristic phenotypes in the model cells. This prompted us to further develop supervised machine learning models for automatically classifying the phenotypes to enable high-throughput image-based drug screening. This involves screening drug candidates using AR-97Q cells and evaluating their drug-responsive morphology using the phenotype classification models.
Model cells exhibited the following two phenotypes that could be trained as “hit” in the phenotype classification: the “rescued phenotype of AR-97Q” (indicating the PG effect) and the “healthy phenotype of AR-24Q” (serving as the control). The following two classification models were constructed: model A, trained with “disease” and “rescued”; model B, trained with “disease” and “healthy” (Fig. 5). Both models exhibited high classification accuracy in cross-validation. However, model A could be over-fit to recognize only the PG effect in AR-97Q cells, while model B could be over-fit to recognize only cell type differences. Therefore, model A was tested with “healthy” and model B with “rescued” as blind test data to confirm the generality of their performance. Both models perfectly predicted the blind test data as “hit,” indicating that these are not over-fit narrow models for phenotype classification. Thus, the morphological profile (focused pD) extracted using in silico FOCUS was effective for constructing high-accuracy and robust image-based classification models.
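The blind-test design can be sketched as follows. The text does not state which learning algorithm was used, so a simple nearest-centroid classifier stands in here purely as an assumption. The two features and all values are hypothetical focused pDs invented for illustration.

```python
from statistics import mean


def centroid(rows):
    """Per-feature mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(disease_pds, hit_pds):
    """Fit a two-class nearest-centroid model: 'disease' vs. 'hit'."""
    return {"disease": centroid(disease_pds), "hit": centroid(hit_pds)}


def predict(model, pd_row):
    """Assign the label whose centroid is closest to the pD."""
    return min(model, key=lambda label: sq_dist(model[label], pd_row))


# Hypothetical focused pDs with two features
# (e.g. mean length_width_ratio, mean inner_radius).
disease = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1]]
rescued = [[2.8, 3.1], [3.0, 3.0], [3.1, 2.9]]
healthy = [[3.3, 2.6], [3.5, 2.5], [3.4, 2.4]]

model_a = train(disease, rescued)  # model A: disease vs. rescued
model_b = train(disease, healthy)  # model B: disease vs. healthy

blind_a = [predict(model_a, row) for row in healthy]  # "healthy" unseen by model A
blind_b = [predict(model_b, row) for row in rescued]  # "rescued" unseen by model B
```

Each model is evaluated on the "hit" phenotype it never saw during training, so a correct "hit" call cannot come from memorizing that phenotype. Only the evaluation design carries over to whatever classifier is actually used.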