ChemicalQDevice Explainability: Tensor Network, Neural Network
Explainability Studies: Tensor Network, Neural Network, Hybrid Combinations (04-10-24)

Dataset 1: Standard tensor network substitutions performed better than non-linear fully connected layer substitutions, except when the tensor networks exceeded 4 layers. '3TN' was comparable to '6FC' in saliency metrics, but had 323x fewer parameters, took up 343x less storage space, and ran 4.2x faster (Slides 02-06). Normalized saliency mean, median, and standard deviation decreased as the number of tensor network layers increased. The linear tensor network model is more explainable because fewer features were important to its predictions. In other words, the model concentrated on a smaller set of data points, the ones that mattered most (typically near decision boundaries), when optimizing its solution.
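The slides do not state the layer shapes or bond dimensions behind the 323x parameter reduction. As a rough illustration of where savings of that order can come from, the sketch below counts parameters for a dense layer and for a tensor-train (matrix product state style) factorization of the same weight matrix. The 256x256 shape, the mode split (4, 4, 4, 4), and the rank of 2 are hypothetical choices, not the configuration used in the study.

```python
from math import prod


def dense_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Parameters in a fully connected layer mapping n_in -> n_out."""
    return n_in * n_out + (n_out if bias else 0)


def tt_params(in_modes: list[int], out_modes: list[int], rank: int, bias: bool = True) -> int:
    """Parameters in a tensor-train factorization of the same layer.

    The weight matrix of shape (prod(in_modes), prod(out_modes)) is replaced by
    one 4-index core per mode pair, with shape (r_{k-1}, m_k, n_k, r_k).
    Boundary ranks r_0 and r_d are fixed to 1; all interior ranks equal `rank`.
    """
    if len(in_modes) != len(out_modes):
        raise ValueError("in_modes and out_modes must have the same length")
    d = len(in_modes)
    ranks = [1] + [rank] * (d - 1) + [1]
    cores = sum(ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1] for k in range(d))
    return cores + (prod(out_modes) if bias else 0)


if __name__ == "__main__":
    modes = [4, 4, 4, 4]  # hypothetical split: 256 = 4**4 on both sides
    dense = dense_params(256, 256)
    tt = tt_params(modes, modes, rank=2)
    print(f"dense: {dense}, tensor train: {tt}, ratio: {dense / tt:.1f}x")
```

Without biases, the dense weight matrix here has 65,536 entries while the four cores hold 192. The achievable ratio depends heavily on the chosen ranks, so this shows only the mechanism, not the study's numbers.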

Larger saliency skewness values indicate that the tail of the distribution extends to the right, with most of the saliency values concentrated at the low end on the left. Greater positive saliency kurtosis values indicate a sharper, more desirable peak. Put differently, the normalized saliency values for the tensor networks were smaller and formed a taller peak, contributing to more interpretable classification predictions.
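The five saliency metrics discussed here (mean, median, standard deviation, skewness, kurtosis) can be computed from a flat list of per-feature saliency values. The sketch below assumes saliency magnitudes are normalized by dividing by their maximum and that kurtosis is reported as excess kurtosis (normal distribution = 0); the slides do not specify either choice, so both are assumptions.

```python
import statistics


def normalize(saliency: list[float]) -> list[float]:
    """Scale absolute saliency values into [0, 1] by dividing by the maximum.

    Assumption: max-scaling is used here; the study's normalization is not stated.
    """
    mags = [abs(s) for s in saliency]
    peak = max(mags)
    if peak == 0:
        return [0.0 for _ in mags]
    return [m / peak for m in mags]


def saliency_metrics(saliency: list[float]) -> dict[str, float]:
    """Return mean, median, standard deviation, skewness and excess kurtosis."""
    x = normalize(saliency)
    n = len(x)
    mean = statistics.fmean(x)
    # Central moments in population form (dividing by n).
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "mean": mean,
        "median": statistics.median(x),
        "std": statistics.pstdev(x),
        # Fisher-Pearson skewness: positive means a long right tail.
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        # Excess kurtosis: positive means a sharper peak than a normal distribution.
        "kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0,
    }
```

For example, `saliency_metrics([0.1, 0.1, 0.1, 2.0])` (many small values and one large one) yields a positive skewness, matching the right-tailed shape described above, while a symmetric input such as `[1, 2, 3, 4, 5]` yields a skewness of zero.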

Hybrid Studies: In general, hybrid models did not show considerable improvements in the normalized saliency mean, median, and standard deviation shown in Table 2. However, hybrid '2TN 3FC' had a 25% increase in saliency skewness and a 56% increase in kurtosis over '6FC', corresponding to increased interpretability. The lowest loss for Dataset 1 was from '2FC 1TN', at 6.00E-08. Models with tensor network layers placed before fully connected layers were also faster than models with fully connected layers placed before tensor network layers.
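The hybrid labels appear to encode layer counts and order, with '6FC' being the 6-layer fully connected model and '2TN 3FC' placing tensor network layers first. The sketch below expands such labels into an ordered layer list and computes a relative percentage change against a baseline such as '6FC'. Both the label grammar and the assumption that the reported percentages are relative (not absolute) changes are inferences, not definitions from the slides.

```python
import re

# Hypothetical reading of the model labels: "<count><type>" tokens in layer order,
# where TN = tensor network layer and FC = fully connected layer.
_TOKEN = re.compile(r"(\d+)(TN|FC)")


def parse_spec(spec: str) -> list[str]:
    """Expand a label such as '2TN 3FC' into an ordered list of layer types."""
    layers: list[str] = []
    for token in spec.split():
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized token: {token!r}")
        layers.extend([match.group(2)] * int(match.group(1)))
    return layers


def relative_increase(baseline: float, value: float) -> float:
    """Percent change of `value` relative to a nonzero `baseline` (e.g. a hybrid vs '6FC')."""
    return (value - baseline) / abs(baseline) * 100.0
```

Under this reading, `parse_spec("2TN 3FC")` gives `['TN', 'TN', 'FC', 'FC', 'FC']` (tensor networks first), `parse_spec("2FC 1TN")` gives `['FC', 'FC', 'TN']`, and `relative_increase(1.0, 1.25)` gives `25.0`.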

Dataset 2: The more challenging concentric circles dataset produced a wide range of saliency metrics and performance. '2TN 3FC' stood out, with the lowest saliency mean, median, and standard deviation along with high skewness and kurtosis values. '6FC' had the best loss, at 5.14E-08.

Dataset 3: The spirals dataset was also led by '6FC' in saliency metrics and performance (Figure 9). Notably, '3TN' had the best saliency mean and median, while '4TN' had a model loss of 1.

Dataset 4: The intersecting ellipses dataset, perhaps the most challenging, was led by the tensor network model '3TN' in normalized saliency median and in model loss, at 3.81E-02. '6FC' was close behind in loss and led in skewness and kurtosis.
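The slides name the three harder datasets (concentric circles, spirals, intersecting ellipses) but do not describe how they were generated. The hypothetical generators below produce two-class 2D data with the same shapes, for anyone who wants to reproduce a comparable setup. The radii, number of spiral turns, ellipse axis lengths, and noise levels are all assumed values.

```python
import math
import random


def concentric_circles(n_per_class: int, radii=(1.0, 2.0), noise=0.1, seed=0):
    """Two noisy rings; the label is the index of the ring a point was drawn from."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, r in enumerate(radii):
        for _ in range(n_per_class):
            theta = rng.uniform(0.0, 2.0 * math.pi)
            rr = r + rng.gauss(0.0, noise)
            points.append((rr * math.cos(theta), rr * math.sin(theta)))
            labels.append(label)
    return points, labels


def two_spirals(n_per_class: int, turns=2.0, noise=0.05, seed=0):
    """Two interleaved Archimedean spiral arms, the second rotated by pi."""
    rng = random.Random(seed)
    points, labels = [], []
    for label in (0, 1):
        offset = label * math.pi
        for i in range(n_per_class):
            t = (i / max(n_per_class - 1, 1)) * turns * 2.0 * math.pi
            r = t / (turns * 2.0 * math.pi)  # radius grows linearly from 0 to 1
            x = r * math.cos(t + offset) + rng.gauss(0.0, noise)
            y = r * math.sin(t + offset) + rng.gauss(0.0, noise)
            points.append((x, y))
            labels.append(label)
    return points, labels


def intersecting_ellipses(n_per_class: int, axes=(2.0, 0.5), noise=0.05, seed=0):
    """Points on two ellipses that cross each other: one wide, one tall."""
    rng = random.Random(seed)
    a, b = axes
    points, labels = [], []
    for label, (sx, sy) in enumerate(((a, b), (b, a))):
        for _ in range(n_per_class):
            theta = rng.uniform(0.0, 2.0 * math.pi)
            points.append((sx * math.cos(theta) + rng.gauss(0.0, noise),
                           sy * math.sin(theta) + rng.gauss(0.0, noise)))
            labels.append(label)
    return points, labels
```

Each function returns a list of (x, y) points and a parallel list of 0/1 labels. Fixing `seed` keeps runs reproducible when comparing layer configurations on identical data.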

The most significant finding of the study was that, for Dataset 1, tensor network substitutions consistently outperformed fully connected layers across all 5 saliency metrics, indicating a more explainable model. Hybrid studies showed that the two layer types are compatible, with some additional improvements in skewness and kurtosis. The 6-layer fully connected model had the best overall performance on the 3 more challenging datasets. Models such as '3TN', which is over 300x smaller and 4.2x faster, are poised to further the recent trend of substituting lightweight tensor networks into much larger LLMs for generative AI tasks, as published by at least two groups of researchers using unsupervised learning methods.

Created by Kevin Kawchak, Founder and CEO, ChemicalQDevice. 2024, San Diego, California. Healthcare Innovation.