import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
from mpl_toolkits.mplot3d import Axes3D
from itertools import starmap
import math
import warnings
warnings.filterwarnings('ignore')
Tensor Decomposition: PARAFAC and Tucker
Traditional data analysis methods often rely on matrices, but when applied to complex, multimodal datasets they can struggle to uncover meaningful latent structures. High-order (multiway) data analysis extends beyond the two-dimensional matrix framework, providing more powerful tools for identifying patterns and relationships in multi-dimensional data. In this post, I review two cornerstone techniques (PARAFAC and Tucker) which generalize familiar methods like singular value decomposition (SVD) and bilinear principal component analysis (PCA) to higher-order tensors.
Background
In many areas of science, like spectroscopy, data are usually stored as vectors or matrices. But in real life, data often have more than two dimensions: for example, measurements that vary across time, space, and experimental conditions all at once. To work with these kinds of multi-dimensional datasets, we use multilinear algebra, the higher-dimensional extension of linear algebra. Just as a matrix can be broken down with tools like Singular Value Decomposition (SVD) or Eigenvalue Decomposition (EVD), these larger data structures can also be decomposed with dedicated methods. These decompositions help us find patterns that would remain hidden if we only looked at two-dimensional views of the data.
For example, Excitation-Emission Matrix (EEM) fluorescence spectroscopy and hyperspectral imaging naturally produce three-dimensional data.
In situations where the collected data are not intrinsically three-way but rather tidy (\(i\) observations × \(j\) variables), we can add, for instance, “time” as a third mode, giving three entries (\(i\) observations × \(j\) variables × \(k\) time). In other words, by adding an additional modality, we obtain a collection of matrices structured in a table with three entries, called a third-order tensor. Likewise, if we add “days” as a fourth mode (\(i\) observations × \(j\) variables × \(k\) time × \(l\) days), we obtain a fourth-order tensor. If “longitude” is added (\(i\) observations × \(j\) variables × \(k\) time × \(l\) days × \(m\) longitude), we obtain a fifth-order tensor, and so forth. Intuitively, the order of a tensor is its number of modes (or ways): an order-\(N\) tensor is an \(N\)-dimensional array. In the following we will only consider data arranged as a third-order tensor, \(i.e.\), 3-way data.
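To make this concrete, here is a minimal NumPy sketch (with arbitrary, made-up sizes) that stacks a collection of observations × variables matrices, one per time point, into a third-order tensor:

i, j, k = 6, 3, 4                                   # observations, variables, time points (arbitrary)
slices = [np.random.rand(i, j) for _ in range(k)]   # one i x j matrix per time point
tensor = np.stack(slices, axis=-1)                  # third-order tensor of shape (i, j, k)
print(tensor.shape)                                 # (6, 3, 4)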
Third-Order Tensor
Let \(T\) be an array with three entries. Three types of matrix representation can be associated with this array:
\(T\) can be seen as 3 different collections of matrices. These collections are called horizontal, vertical and frontal slices. The matrices of each collection have the same dimensions.
\(T\) can be considered successively as three matrices, called “mode-\(n\) representations”. The process of reorganizing \(T\) into these three forms is called matricization (or mode-\(n\) flattening).
\(T\) can be seen as three collections of vectors. Each collection is a table with 2 entries, whose elements are vectors of the same dimension, also called “fibers”.
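A small NumPy sketch (arbitrary sizes, for illustration only) makes these three views concrete, showing a slice, a mode-\(n\) unfolding, and a fiber:

T = np.arange(24).reshape(2, 3, 4)               # a small 2 x 3 x 4 tensor

frontal_slice = T[:, :, 0]                       # one of the 4 frontal slices, shape (2, 3)
horizontal_slice = T[0, :, :]                    # one of the 2 horizontal slices, shape (3, 4)

# Mode-n matricization: move mode n to the front, then flatten the remaining modes
T_mode0 = np.moveaxis(T, 0, 0).reshape(2, -1)    # mode-0 unfolding, shape (2, 12)
T_mode1 = np.moveaxis(T, 1, 0).reshape(3, -1)    # mode-1 unfolding, shape (3, 8)

mode2_fiber = T[0, 0, :]                         # a mode-2 fiber ("tube"), length 4
print(T_mode0.shape, T_mode1.shape, mode2_fiber.shape)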
Tensor Decomposition
PARAFAC
PARAFAC (Parallel Factor Analysis), also called Canonical Polyadic (CP) decomposition, is the simplest and most widely used three-way decomposition. Introduced independently by Harshman and by Carroll and Chang (who called it Canonical Decomposition, or CANDECOMP), PARAFAC represents the most intuitive extension of familiar techniques like Principal Component Analysis (PCA) into the three-dimensional world. PARAFAC is expressed by:
\[ \mathbf{X} \approx \sum_{r=1}^{R} \lambda_r \cdot (\mathbf{a}_r \otimes \mathbf{b}_r \otimes \mathbf{c}_r) \]
This equation tells us that a 3D tensor \(\textbf{X}\) can be approximated as a sum of \(R\) simple components, where each component is the outer product \((\otimes)\) of three vectors: \(\textbf{a}_r\) from mode 1, \(\textbf{b}_r\) from mode 2, and \(\textbf{c}_r\) from mode 3, scaled by weight \(\lambda_r\).
One of the defining strengths of PARAFAC is that it applies the same number of components (R) across all modes and, under mild conditions, guarantees a unique decomposition. This property of uniqueness is extremely valuable: unlike many other factorization methods, PARAFAC can recover the “true” underlying factors without ambiguity. As a result, it provides not just a mathematical approximation but an interpretable solution that reflects real-world structure. This makes PARAFAC a powerful tool for scientific discovery, with broad applications in spectroscopy, neuroscience, signal processing, and beyond.
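As a quick illustration of the formula (random factors, arbitrary dimensions, rank 2, not tied to any particular dataset), a rank-\(R\) tensor can be assembled explicitly as a sum of scaled outer products:

R = 2                                    # number of components
I, J, K = 4, 5, 6                        # mode dimensions (arbitrary)
A = np.random.rand(I, R)                 # columns are the a_r vectors
B = np.random.rand(J, R)                 # columns are the b_r vectors
C = np.random.rand(K, R)                 # columns are the c_r vectors
lam = np.random.rand(R)                  # weights lambda_r

# Sum of R rank-one terms: lambda_r * (a_r outer b_r outer c_r)
X_cp = sum(lam[r] * np.einsum('i,j,k->ijk', A[:, r], B[:, r], C[:, r])
           for r in range(R))
print(X_cp.shape)                        # (4, 5, 6)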
Tucker3
Tucker3 takes a more sophisticated approach, like having an adjustable toolbox instead of a master key. While PARAFAC forces all modes to have the same number of components, Tucker3 allows each mode to have its own optimal number of factors. The Tucker3 decomposition follows this structure:
\[ \mathbf{X} \approx \mathbf{G} \times_1 \mathbf{A} \times_2 \mathbf{B} \times_3 \mathbf{C} \]
Here, \(\textbf{G}\) is the core tensor that acts as a “weighting grid,” determining how strongly each combination of factors from different modes interacts. The \(\times_1\), \(\times_2\), \(\times_3\) symbols represent mode-\(n\) products (tensor–matrix multiplications along each mode). Think of them as sophisticated ways of combining the core tensor with the factor matrices \(\textbf{A}\), \(\textbf{B}\), and \(\textbf{C}\).
Unlike PARAFAC’s simple sum, Tucker3 creates a weighted combination where the core tensor \(\textbf{G}\) contains elements \(g_{i,j,k}\) that specify how factors from mode 1 (position \(i\)), mode 2 (position \(j\)), and mode 3 (position \(k\)) combine together. This flexibility allows Tucker3 to capture more complex interaction patterns that PARAFAC might miss.
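The same kind of sketch (random core and factor matrices, arbitrary sizes) shows how the core tensor weights every combination of factors; the three mode products collapse into a single einsum contraction:

P, Q, S = 2, 3, 2                        # components per mode (they can differ)
I, J, K = 4, 5, 6                        # tensor dimensions (arbitrary)
G = np.random.rand(P, Q, S)              # core tensor of interaction weights
A = np.random.rand(I, P)                 # mode-1 factor matrix
B = np.random.rand(J, Q)                 # mode-2 factor matrix
C = np.random.rand(K, S)                 # mode-3 factor matrix

# G x_1 A x_2 B x_3 C: each core entry weights one combination of factor columns
X_tucker = np.einsum('pqs,ip,jq,ks->ijk', G, A, B, C)
print(X_tucker.shape)                    # (4, 5, 6)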
Implementation
In the examples below, we demonstrate a practical implementation of tensor decomposition methods using the Python library TensorLy. In the first example, we build a synthetic dataset composed of three Gaussian distribution patterns, illustrating how PARAFAC decomposition can effectively capture and uncover underlying structures in noisy multidimensional data. In the second example, we use a published dataset that captures how our tongues move when we speak. Using this dataset, we compare PARAFAC and Tucker decomposition, highlighting the differences in how each method captures the underlying structure of complex multidimensional data.
Python libraries
For our tensor decomposition analysis, we rely on TensorLy, a powerful Python library specifically designed for multilinear algebra and tensor operations. TensorLy provides a unified, user-friendly interface for tensor computations while maintaining computational efficiency through optimized backends.
import tensorly as tl
from tensorly.decomposition import parafac, tucker
from tensorly.cp_tensor import cp_to_tensor
from tensorly.tucker_tensor import tucker_to_tensor
We also specify the computational backend with tl.set_backend('numpy'), which determines how TensorLy performs its underlying mathematical operations. TensorLy supports multiple backends, including NumPy, PyTorch, TensorFlow, and JAX, each offering different advantages depending on the application. For more details see here.
np.random.seed(123)
tl.set_backend('numpy')
Example 1
The bivariate Gaussian distribution with means \(\mu_x,\mu_y\), standard deviations \(\sigma_x, \sigma_y\), and correlation coefficient \(\rho\), is given by:
\[ f(x, y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}} \exp\left( -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{x - \mu_x}{\sigma_x} \right)^2 - 2\rho \left( \frac{x - \mu_x}{\sigma_x} \right) \left( \frac{y - \mu_y}{\sigma_y} \right) + \left( \frac{y - \mu_y}{\sigma_y} \right)^2 \right] \right) \]
We consider the simplified 2D Gaussian distribution when \(\rho = 0\) (i.e., no correlation between \(x\) and \(y\)):
\[ f(x, y) = \frac{1}{2\pi \sigma_x \sigma_y} \exp\left( - \frac{1}{2} \left[ \left( \frac{x - \mu_x}{\sigma_x} \right)^2 + \left( \frac{y - \mu_y}{\sigma_y} \right)^2 \right] \right) \]
def gaussian_distribution(x, y, mu=(0,0), sigma=(1,1), amplitude=1):
    # Normalization constant 1 / (2 * pi * sigma_x * sigma_y)
    alpha = 1 / (2 * math.pi * sigma[0] * sigma[1])
    beta_x = ((x - mu[0]) / sigma[0])**2
    beta_y = ((y - mu[1]) / sigma[1])**2
    beta = beta_x + beta_y
    return amplitude * alpha * np.exp(-(1 / 2) * beta)
Now, let’s build three different Gaussian distributions with specific characteristics.
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)

# Broad distribution
broad = gaussian_distribution(X, Y, (-2, -2), (2, 2), 1)

# Two peaks
peak1 = gaussian_distribution(X, -Y, (1.5, 1.5), (0.8, 0.8), 1)
peak2 = gaussian_distribution(-X, Y, (2, 2), (0.8, 0.8), 2)
twopeaks = peak1 + peak2

# Sharp distribution
sharp = gaussian_distribution(X, Y, (1, 1), (0.2, 0.2), 1.2)
Next, we stack the three distributions into a tensor of dimension \(100 \times 100 \times 3\).
T = np.stack([broad, twopeaks, sharp], axis=-1)
We can now visualize the three distributions to verify their shapes and characteristics.
fig = plt.figure(figsize=(15, 5))

for i in range(3):
    ax = fig.add_subplot(1, 3, i+1, projection='3d')
    surf = ax.plot_surface(X, Y, T[:, :, i], cmap='viridis', linewidth=0, antialiased=True)
    ax.set_xlabel('X-Position')
    ax.set_ylabel('Y-Position')

plt.tight_layout()
plt.show()
plt.close()
For the purpose of this post, we add Gaussian noise to the data at three distinct levels: 0.001, 0.01, and 0.05.
noisy1_T = T + 0.001 * np.random.randn(*T.shape)
noisy2_T = T + 0.01 * np.random.randn(*T.shape)
noisy3_T = T + 0.05 * np.random.randn(*T.shape)
Then, we create a list of tensors to decompose, e.g., [noisy1_T, noisy2_T, noisy3_T].
datasets = [noisy1_T, noisy2_T, noisy3_T]
noise_levels = [0.001, 0.01, 0.05]
titles = ['Broad', 'Two-peak', 'Sharp']
def plot_tensor(data, noise):
    fig = plt.figure(figsize=(15, 5))
    fig.suptitle(f"Noise level = {noise}", fontsize=16)

    for i in range(3):
        ax = fig.add_subplot(1, 3, i+1, projection='3d')
        surf = ax.plot_surface(X, Y, data[:, :, i], cmap='viridis', linewidth=0, antialiased=True)
        ax.set_xlabel('X-Position')
        ax.set_ylabel('Y-Position')

    plt.tight_layout()
    plt.show()
    plt.close()

for _ in map(plot_tensor, datasets, noise_levels):
    pass
PARAFAC decomposition is applied iteratively to the list of tensors. For each tensor in datasets, the method extracts three components, producing a set of weights and factor matrices that capture the underlying patterns along each mode:
- Mode 0 factors: describe the pattern along the rows (X-axis) of the tensor.
- Mode 1 factors: describe the pattern along the columns (Y-axis) of the tensor.
- Mode 2 factors: correspond to the different Gaussian distributions.
All of these results are collected and stored in a list called results, for further analysis and visualization.
results = list(map(lambda data: parafac(data, 3, init='random', random_state=42), datasets))
We then reconstruct the tensor from the factorization results and evaluate the quality of the decomposition by calculating the relative reconstruction error. The reconstructed tensor is obtained using the factor matrices and weights returned by the PARAFAC model. The error is computed as the ratio between the Frobenius norm, \(\lVert{\cdot}\rVert_F\), of the difference (original tensor minus reconstructed tensor) and the Frobenius norm of the original tensor. This provides a normalized measure of how well the decomposition approximates the data, with lower values indicating a more accurate reconstruction.
\[ \text{Relative Reconstruction Error} = \frac{\lVert\mathcal{T} - \hat{\mathcal{T}}\rVert_F}{\lVert \mathcal{T}\rVert_F} \]
Where \(\mathcal{T}\) is the original tensor and \(\hat{\mathcal{T}}\) is the reconstructed tensor.
def compute_error(result, data):
    weights, factors = result
    reconstructed_T = cp_to_tensor((weights, factors))
    return np.linalg.norm(data - reconstructed_T) / np.linalg.norm(data)
errors = list(map(compute_error, results, datasets))

for noise, err in zip(noise_levels, errors):
    print(f"Noise level = {noise}, Relative reconstruction error = {err:.4f}")
Noise level = 0.001, Relative reconstruction error = 0.3160
Noise level = 0.01, Relative reconstruction error = 0.6662
Noise level = 0.05, Relative reconstruction error = 0.9623
Next, we visualize the decomposed modes obtained from the tensor factorization. Each of the three components represents a distinct pattern captured along a specific mode of the data, and plotting them allows us to interpret the underlying structures identified by the decomposition.
titles = ['Component 1', 'Component 2', 'Component 3']
def plot_components(result, noise):
    weights, factors = result
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    fig.suptitle(f"Noise level = {noise}", fontsize=16)

    for i, ax in enumerate(axes):
        component = np.outer(factors[0][:, i], factors[1][:, i])
        component /= np.max(np.abs(component))
        line_y = component.sum(axis=0)
        line_x = component.sum(axis=1)
        ax.plot(line_y, label='Mode 1 (Y-position)')
        ax.plot(line_x, label='Mode 0 (X-position)')
        ax.set_title(titles[i])
        ax.set_xlabel('Index')
        ax.set_ylabel('Amplitude (normalized)')
        ax.legend()

    plt.tight_layout()
    plt.show()
    plt.close()

for _ in map(plot_components, results, noise_levels):
    pass
titles = ['Broad', 'Two-peak', 'Sharp']
def plot_reconstruction_error(noisy, reconstructed, noise_level):
    error_tensor = noisy - reconstructed
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    fig.suptitle(f'Reconstruction Error - Noise level = {noise_level}', fontsize=16)
    for i, ax in enumerate(axes):
        im = ax.imshow(error_tensor[:, :, i], cmap='RdBu', aspect='auto')
        ax.set_title(titles[i])
        fig.colorbar(im, ax=ax, label='Error')

    plt.tight_layout()
    plt.show()
    plt.close()

reconstructed_tensors = [cp_to_tensor(res) for res in results]
for _ in map(plot_reconstruction_error, datasets, reconstructed_tensors, noise_levels):
    pass
Example 2
In this example, we use a fascinating dataset from pioneering linguistic research by Ladefoged et al. (1971) that captures how our tongues move when we speak. The study used X-ray imaging to record tongue shapes as five different speakers pronounced various English vowels. By mapping tongue positions onto a defined grid, they created a unique 3D dataset that reveals the biomechanics of human speech.
The tensor has dimensions \(5 \times 10 \times 13\), representing:
- 5 speakers (different individuals)
- 10 vowels (various English vowel sounds)
- 13 grid positions (spatial measurements in centimeters)
This creates a rich, multi-dimensional dataset perfect for demonstrating tensor decomposition techniques. The original study aimed to identify the fundamental patterns underlying tongue movement during speech. The data was preprocessed by centering across vowels, and the researchers found evidence for two definitive components in the speech patterns, with a possible third component that couldn’t be reliably established. This real-world complexity makes it an excellent example for comparing different tensor decomposition methods like PARAFAC and Tucker, as it contains the kind of structural challenges often found in genuine scientific data. For complete experimental details, see the original research paper.
X = np.array([
[2.45, 2.40, 2.40, 2.50, 2.45, 2.05, 1.65, 1.00, 0.45, 0.40, 0.55, 0.55, 0.95],
[2.55, 2.05, 1.95, 1.90, 1.80, 1.60, 1.30, 0.95, 0.55, 0.65, 0.90, 0.90, 1.05],
[2.35, 1.90, 1.80, 1.80, 1.80, 1.55, 1.30, 0.95, 0.65, 0.70, 0.90, 0.85, 1.05],
[2.50, 2.05, 1.65, 1.65, 1.55, 1.45, 1.25, 1.05, 0.70, 1.05, 1.20, 1.15, 1.10],
[2.05, 1.50, 1.35, 1.50, 1.55, 1.45, 1.30, 1.15, 0.95, 1.40, 1.55, 1.40, 1.25],
[1.55, 1.00, 0.85, 1.00, 1.25, 1.35, 1.50, 2.00, 2.10, 2.55, 2.65, 2.35, 2.00],
[1.65, 0.90, 0.65, 0.65, 0.75, 0.95, 1.40, 1.90, 2.15, 2.60, 2.70, 2.60, 2.40],
[1.90, 1.30, 0.95, 0.85, 0.75, 0.75, 0.95, 1.30, 1.65, 2.15, 2.30, 2.25, 2.30],
[2.40, 1.60, 1.45, 1.25, 1.00, 0.95, 0.80, 0.85, 1.10, 1.50, 2.10, 2.00, 1.65],
[2.70, 1.95, 1.50, 1.30, 0.90, 0.70, 0.55, 0.55, 0.95, 1.45, 1.80, 1.90, 2.00],
[2.95, 2.70, 2.75, 2.75, 2.70, 2.60, 2.25, 1.00, 0.35, 0.15, 0.30, 0.60, 1.15],
[2.40, 2.20, 2.25, 2.20, 2.25, 2.15, 1.85, 1.25, 0.75, 0.75, 0.90, 1.05, 1.10],
[2.25, 2.45, 2.65, 2.65, 2.40, 2.20, 2.05, 1.55, 0.95, 0.85, 1.10, 1.40, 1.65],
[2.00, 1.75, 1.90, 2.30, 2.40, 2.20, 2.00, 1.45, 1.00, 1.05, 1.40, 1.75, 1.80],
[1.25, 1.15, 1.30, 1.65, 1.95, 1.90, 1.80, 1.65, 1.40, 1.70, 2.15, 2.45, 2.60],
[0.45, 0.25, 0.30, 0.40, 1.15, 1.70, 1.95, 2.30, 2.60, 2.95, 3.30, 3.15, 2.60],
[0.40, 0.20, 0.20, 0.30, 0.60, 1.05, 1.35, 1.65, 2.60, 3.05, 3.45, 3.60, 3.40],
[1.00, 0.55, 0.55, 0.45, 0.65, 0.80, 1.15, 1.55, 2.25, 2.75, 3.20, 3.35, 3.25],
[1.30, 0.70, 0.65, 0.45, 0.65, 0.90, 1.20, 1.45, 1.90, 2.40, 2.85, 2.80, 2.45],
[2.15, 1.80, 1.50, 1.05, 0.65, 0.55, 0.65, 0.80, 0.95, 1.55, 2.10, 2.35, 2.60],
[2.10, 2.00, 2.15, 2.05, 1.95, 1.80, 1.45, 1.10, 0.75, 0.65, 0.75, 0.80, 0.90],
[2.00, 1.70, 1.90, 1.95, 1.90, 1.75, 1.35, 1.15, 0.95, 1.00, 1.10, 0.90, 0.65],
[1.95, 1.80, 1.80, 1.95, 1.95, 1.95, 1.65, 1.25, 0.90, 0.85, 1.05, 0.95, 0.90],
[1.55, 1.40, 1.50, 1.70, 1.85, 1.80, 1.90, 1.80, 1.75, 1.70, 1.70, 1.40, 1.10],
[1.65, 1.25, 1.40, 1.70, 1.90, 1.95, 2.05, 2.10, 1.95, 1.95, 2.15, 2.10, 1.70],
[0.95, 0.55, 0.70, 1.15, 1.65, 2.20, 2.65, 2.95, 3.05, 3.20, 3.35, 2.95, 1.90],
[1.20, 0.65, 0.45, 0.65, 0.75, 1.00, 1.45, 2.10, 2.40, 2.65, 2.80, 2.55, 1.95],
[1.55, 1.45, 1.05, 1.15, 1.05, 1.00, 1.15, 1.45, 1.90, 2.40, 2.70, 2.65, 1.85],
[1.80, 1.05, 1.05, 1.05, 1.00, 1.00, 1.15, 1.40, 1.65, 1.95, 2.15, 1.85, 1.50],
[2.00, 1.70, 1.40, 1.20, 1.00, 0.85, 0.95, 1.00, 1.10, 1.55, 1.80, 1.70, 1.25],
[2.70, 2.60, 2.55, 2.50, 2.45, 2.40, 1.80, 1.35, 0.70, 0.55, 0.75, 0.85, 1.85],
[2.25, 1.90, 1.85, 1.90, 2.15, 2.05, 1.85, 1.65, 1.35, 1.40, 1.50, 1.90, 1.80],
[2.25, 2.20, 2.30, 2.25, 2.30, 2.20, 1.70, 1.45, 0.90, 0.90, 1.10, 1.25, 1.85],
[1.90, 1.50, 1.40, 1.40, 1.65, 1.75, 1.75, 1.85, 1.60, 1.80, 1.90, 1.65, 1.50],
[1.70, 1.20, 1.05, 1.05, 1.55, 1.70, 1.80, 1.90, 1.85, 2.10, 2.35, 2.40, 2.25],
[1.05, 0.90, 0.45, 0.60, 1.45, 2.05, 2.90, 2.90, 3.00, 3.20, 3.35, 2.95, 2.15],
[0.90, 0.40, 0.45, 0.55, 1.30, 1.80, 2.30, 2.80, 3.10, 3.40, 3.45, 3.00, 2.40],
[2.00, 1.30, 1.05, 0.90, 0.95, 0.90, 1.25, 1.65, 1.80, 2.30, 2.60, 2.60, 1.90],
[2.15, 1.70, 1.45, 1.30, 1.30, 1.25, 1.20, 1.35, 1.45, 1.95, 2.20, 2.25, 1.95],
[2.95, 2.30, 2.05, 1.80, 1.70, 1.45, 1.00, 0.80, 0.80, 1.15, 1.55, 1.90, 1.40],
[3.00, 2.45, 2.30, 2.20, 2.10, 1.45, 1.15, 0.80, 0.40, 0.60, 0.45, 0.40, 0.85],
[2.40, 2.10, 1.95, 1.90, 1.80, 1.45, 1.10, 0.90, 0.70, 0.95, 0.95, 0.75, 1.10],
[2.50, 2.40, 2.20, 2.05, 2.05, 1.70, 1.30, 0.95, 0.65, 0.95, 1.00, 0.85, 1.20],
[2.25, 2.10, 1.95, 1.90, 1.90, 1.55, 1.15, 1.00, 0.90, 1.10, 1.05, 0.90, 1.25],
[1.70, 1.95, 2.05, 2.10, 1.95, 1.50, 1.15, 1.15, 1.10, 1.30, 1.30, 1.20, 1.45],
[1.40, 0.85, 1.05, 1.30, 1.55, 1.55, 1.65, 2.00, 2.40, 2.75, 2.80, 2.60, 2.35],
[1.10, 0.70, 0.70, 0.90, 1.15, 1.00, 1.20, 1.80, 2.40, 2.75, 2.80, 2.35, 2.05],
[1.80, 1.05, 0.75, 0.70, 0.70, 0.55, 0.60, 1.20, 1.85, 2.40, 2.45, 2.25, 2.40],
[1.90, 1.25, 1.05, 0.90, 0.95, 0.65, 0.65, 1.25, 1.85, 2.35, 2.35, 2.05, 2.30],
[2.70, 2.05, 1.65, 1.40, 1.15, 0.60, 0.40, 0.50, 0.60, 1.15, 1.40, 1.60, 1.65]
])
Let’s walk through the essential steps to convert our raw data into a proper 3D tensor for analysis. First, we transpose our original data matrix. This rearranges our data so that the spatial measurements (13 grid positions) become the first dimension, which will be important for our tensor structure.
X = X.T
Next, we specify how to organize our data into a 3D structure.
- 13 = Grid positions (spatial measurements along the tongue)
- 10 = Vowel sounds (different English vowels)
- 5 = Speakers (different individuals)
Finally, the .reshape() function reorganizes the 650 total elements \((13 \times 50)\) into our desired 3D structure while preserving the original data values. We now have a clean 3D tensor where X_tensor[i, j, k] represents the tongue measurement at grid position i, for vowel j, spoken by speaker k.
tensor_dims = [13, 10, 5]
X_tensor = X.reshape(tensor_dims)
PARAFAC decomposition
In the following, we first take care of one important step: choosing the number of components. We systematically test ranks (complexity levels) from 2 to 50 to find the best PARAFAC decomposition for our tongue shape data, asking: “How many components do we need to best represent our data?”
In doing so, we follow a systematic decomposition loop for each rank. We decompose the tensor into factor matrices using PARAFAC, then reconstruct the original tensor from these factors. We measure the error between the original and reconstructed data, and store all results for comparison. PARAFAC breaks down our 3D tensor into three factor matrices: a grid positions factor (13 × rank), a vowels factor (10 × rank), and a speakers factor (5 × rank). Think of this as finding the fundamental “building blocks” that, when combined, recreate the original tongue movement patterns.
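Before running the full sweep, here is a minimal sketch (rank 3 chosen arbitrarily, purely for illustration) showing the factor matrix shapes PARAFAC returns for our \(13 \times 10 \times 5\) tensor:

weights, factors = parafac(X_tensor, rank=3, init='random', random_state=42)
for mode, factor in enumerate(factors):
    print(f"Mode {mode} factor shape: {factor.shape}")
# Expected shapes: (13, 3), (10, 3), (5, 3)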
def parafac_modeling(X_tensor, max_rank=50):
    """
    PARAFAC modeling

    Parameters:
    -----------
    X_tensor : numpy.ndarray
        Input tensor to decompose
    max_rank : integer, default=50
        Maximum rank

    Returns:
    --------
    dict : Results dictionary with PARAFAC decomposition results
    """
    ranks_to_test = list(range(2, max_rank + 1))
    results = {}

    for rank in ranks_to_test:
        try:
            factors = parafac(
                X_tensor,
                rank=rank,
                init='random',
                n_iter_max=200,
                random_state=42
            )

            reconstructed = tl.cp_to_tensor(factors)
            error = tl.norm(X_tensor - reconstructed)
            rel_error = error / tl.norm(X_tensor)

            results[rank] = {
                'factors': factors,
                'reconstructed': reconstructed,
                'error': error,
                'rel_error': rel_error
            }
        except Exception as e:
            print(f" Model failed for rank {rank}: {e}")
            results[rank] = None

    return results
def optimal_rank(model, criteria='elbow'):
    """
    Analyze and suggest optimal rank based on different criteria
    """
    ranks = []
    rel_errors = []

    for rank, result in model.items():
        if result is not None:
            ranks.append(rank)
            rel_errors.append(result['rel_error'])

    ranks = np.array(ranks)
    rel_errors = np.array(rel_errors)

    best_rank = ranks[np.argmin(rel_errors)]
    best_error = np.min(rel_errors)
    print(f" Best overall rank: {best_rank} (error: {best_error*100:.3f}%)")

    if len(ranks) > 3:
        first_diff = np.diff(rel_errors)
        second_diff = np.diff(first_diff)
        if len(second_diff) > 0:
            second_diff_ranks = ranks[2:len(second_diff)+2]
            small_second_diff_mask = np.abs(second_diff) < 0.001
            if np.any(small_second_diff_mask):
                elbow_rank = second_diff_ranks[small_second_diff_mask][0]
                elbow_error = rel_errors[ranks == elbow_rank][0]
                print(f" Elbow point: Rank {elbow_rank} (error: {elbow_error*100:.3f}%)")
            else:
                max_reduction_idx = np.argmax(np.abs(first_diff))
                if max_reduction_idx < len(ranks) - 1:
                    elbow_rank = ranks[max_reduction_idx + 1]
                    elbow_error = rel_errors[max_reduction_idx + 1]
                    print(f" Largest improvement at: Rank {elbow_rank} (error: {elbow_error*100:.3f}%)")

    print("\n Recommendations:")
    good_ranks = ranks[rel_errors < 0.05]
    if len(good_ranks) > 0:
        practical_rank = good_ranks[0]  # First rank with < 5% error
        practical_error = rel_errors[ranks == practical_rank][0]
        print(f" • For < 5% error: Rank {practical_rank} (error: {practical_error*100:.3f}%)")

    excellent_ranks = ranks[rel_errors < 0.01]
    if len(excellent_ranks) > 0:
        excellent_rank = excellent_ranks[0]
        excellent_error = rel_errors[ranks == excellent_rank][0]
        print(f" • For < 1% error: Rank {excellent_rank} (error: {excellent_error*100:.3f}%)")

    tensor_size = np.prod([13, 10, 5])
    for rank in [5, 10, 15, 20]:
        if rank in ranks:
            params = rank * (13 + 10 + 5)
            compression_ratio = tensor_size / params
            error = rel_errors[ranks == rank][0]
            print(f" • Rank {rank}: {params} params, {compression_ratio:.1f}x compression, {error*100:.2f}% error")

    return ranks, rel_errors
def rank_analysis(model, save_fig=False):
    """
    Rank analysis showing reconstruction error vs rank
    """
    ranks = []
    rel_errors = []
    abs_errors = []

    for rank, result in model.items():
        if result is not None:
            ranks.append(rank)
            rel_errors.append(result['rel_error'])
            abs_errors.append(result['error'])

    ranks = np.array(ranks)
    rel_errors = np.array(rel_errors)
    abs_errors = np.array(abs_errors)

    fig, axes = plt.subplots(2, 2, figsize=(15, 10))

    axes[0, 0].plot(ranks, rel_errors * 100, 'bo-', linewidth=2, markersize=4)
    axes[0, 0].set_xlabel('Rank')
    axes[0, 0].set_ylabel('Relative Error (%)')
    axes[0, 0].set_title('Reconstruction Error vs Rank')
    axes[0, 0].grid(True, alpha=0.3)
    axes[0, 0].set_ylim(bottom=0)

    min_error_idx = np.argmin(rel_errors)
    axes[0, 0].annotate(
        f'Min: Rank {ranks[min_error_idx]}\nError: {rel_errors[min_error_idx]*100:.2f}%',
        xy=(ranks[min_error_idx], rel_errors[min_error_idx]*100),
        xytext=(ranks[min_error_idx]+5, rel_errors[min_error_idx]*100+2),
        arrowprops=dict(arrowstyle='->', color='red'),
        fontsize=9, color='red'
    )

    axes[0, 1].semilogy(ranks, rel_errors * 100, 'ro-', linewidth=2, markersize=4)
    axes[0, 1].set_xlabel('Rank')
    axes[0, 1].set_ylabel('Relative Error (%) [Log Scale]')
    axes[0, 1].set_title('Error vs Rank (Log Scale)')
    axes[0, 1].grid(True, alpha=0.3)

    error_reduction = np.diff(rel_errors) / rel_errors[:-1] * -100
    axes[1, 0].bar(ranks[1:], error_reduction, alpha=0.7, color='green')
    axes[1, 0].set_xlabel('Rank')
    axes[1, 0].set_ylabel('Error Reduction Rate (%)')
    axes[1, 0].set_title('Error Improvement Rate (Rank i vs Rank i-1)')
    axes[1, 0].grid(True, alpha=0.3)
    axes[1, 0].axhline(y=0, color='black', linestyle='-', alpha=0.5)

    explained_variance = (1 - rel_errors**2) * 100
    axes[1, 1].plot(ranks, explained_variance, 'mo-', linewidth=2, markersize=4)
    axes[1, 1].set_xlabel('Rank')
    axes[1, 1].set_ylabel('Explained Variance (%)')
    axes[1, 1].set_title('Explained Variance vs Rank')
    axes[1, 1].grid(True, alpha=0.3)
    axes[1, 1].set_ylim([0, 100])

    for threshold in [90, 95, 99]:
        axes[1, 1].axhline(y=threshold, color='gray', linestyle='--', alpha=0.5)
        axes[1, 1].text(ranks[-10], threshold+1, f'{threshold}%', fontsize=8, alpha=0.7)

    plt.tight_layout()
    if save_fig:
        plt.savefig('rank_analysis.png', dpi=300, bbox_inches='tight')
    plt.show()

    return fig, axes
We obtain a four-panel visualization that gives a clear picture of how well different ranks capture tongue shape patterns. The blue curve shows the classic elbow effect: reconstruction error drops quickly between ranks 2–10, then levels off, and finally approaches zero around ranks 40–45. In other words, using more than ~40 ranks lets us reconstruct the data almost perfectly. The log-scale view zooms in on the details, showing that most of the improvement happens early (ranks 2–15), with only small, gradual gains after that. The green bars highlight the “value added” by each rank. Big improvements show up at low ranks (2–10), with a few extra bumps around ranks 20–30. Occasionally, the bars dip below zero, which just means the algorithm stalled briefly due to some numerical hiccups.
From a practical standpoint, choosing a rank between 7–20 is a sweet spot. It captures the meaningful tongue movement patterns without overfitting. Going beyond 40 may look like perfect reconstruction, but in reality, it’s more likely fitting noise than real speech dynamics.
parafac_mod = parafac_modeling(X_tensor, max_rank=50)
fig, axes = rank_analysis(model=parafac_mod)

ranks, rel_errors = optimal_rank(model=parafac_mod)
Best overall rank: 50 (error: 0.000%)
Elbow point: Rank 7 (error: 7.084%)
Recommendations:
• For < 5% error: Rank 11 (error: 4.580%)
• For < 1% error: Rank 24 (error: 0.973%)
• Rank 5: 140 params, 4.6x compression, 8.91% error
• Rank 10: 280 params, 2.3x compression, 5.27% error
• Rank 15: 420 params, 1.5x compression, 2.91% error
• Rank 20: 560 params, 1.2x compression, 1.90% error
Tucker decomposition
For comparison, we also applied Tucker3 decomposition to the tongue shape data. Unlike PARAFAC, where a single rank is chosen for all modes, Tucker3 allows the number of components in each mode to be varied independently. We therefore tested different combinations of model sizes, systematically varying the dimensions of the grid positions, vowels, and speakers modes (e.g., [5, 8, 4] and other configurations).
As in the PARAFAC modeling, each Tucker3 model was fit by decomposing the tensor into three factor matrices and a core tensor. The factor matrices represent the main axes of variation within each mode, while the core tensor captures how these axes interact across grid positions, vowels, and speakers. After reconstruction, the error was computed and stored for each configuration, enabling a direct comparison to PARAFAC.
def tucker_modeling(X_tensor, max_rank=[10, 10, 10]):
    """
    Tucker3 modeling

    Parameters:
    -----------
    X_tensor : numpy.ndarray
        Input tensor to decompose
    max_rank : list, default=[10, 10, 10]
        Maximum rank for each mode [mode1, mode2, mode3]

    Returns:
    --------
    dict : Results dictionary with Tucker decomposition results
    """
    ranks_to_test = []
    for r1 in range(2, max_rank[0] + 1):
        for r2 in range(2, max_rank[1] + 1):
            for r3 in range(2, max_rank[2] + 1):
                ranks_to_test.append([r1, r2, r3])

    results = {}
    successful_count = 0

    for rank_combo in ranks_to_test:
        try:
            factors = tucker(
                X_tensor,
                rank=rank_combo,
                init='random',
                n_iter_max=200,
                random_state=42
            )

            reconstructed = tl.tucker_to_tensor(factors)
            error = tl.norm(X_tensor - reconstructed)
            rel_error = error / tl.norm(X_tensor)

            core_params = np.prod(rank_combo)
            factor_params = sum(X_tensor.shape[i] * rank_combo[i] for i in range(3))
            total_params = core_params + factor_params

            key = str(rank_combo)
            results[key] = {
                'ranks': rank_combo,
                'factors': factors,
                'reconstructed': reconstructed,
                'error': error,
                'rel_error': rel_error,
                'core_params': core_params,
                'factor_params': factor_params,
                'total_params': total_params
            }
            successful_count += 1

        except Exception as e:
            print(f" Model failed - {str(e)[:30]}...")
            results[str(rank_combo)] = None

    return results
def tucker_results(model):
    """Analyze Tucker decomposition results"""
    successful_results = {k: v for k, v in model.items() if v is not None}

    if not successful_results:
        print(" No successful Tucker decompositions to analyze")
        return

    best_key = min(successful_results.keys(),
                   key=lambda k: successful_results[k]['rel_error'])
    best_result = successful_results[best_key]

    print(f" Best configuration: {best_result['ranks']}")
    print(f" - Relative error: {best_result['rel_error']*100:.3f}%")
    print(f" - Total parameters: {best_result['total_params']}")
    print(f" - Core parameters: {best_result['core_params']}")
    print(f" - Factor parameters: {best_result['factor_params']}")

    print(f"\n Top 5 configurations:")
    sorted_results = sorted(successful_results.items(), key=lambda x: x[1]['rel_error'])

    for i, (key, result) in enumerate(sorted_results[:5]):
        compression_ratio = np.prod(tensor_dims) / result['total_params']
        print(
            f" {i+1}. {result['ranks']}: {result['rel_error']*100:.3f}% error, "
            f"{result['total_params']} params, {compression_ratio:.1f}x compression"
        )

    print(f"\n Recommendations:")
    for threshold in [0.05, 0.01, 0.005]:
        good_configs = [(k, v) for k, v in successful_results.items()
                        if v['rel_error'] < threshold]
        if good_configs:
            best_config = min(good_configs, key=lambda x: x[1]['total_params'])
            result = best_config[1]
            print(
                f" • For <{threshold*100:.1f}% error: {result['ranks']} "
                f"({result['rel_error']*100:.3f}% error, {result['total_params']} params)"
            )
def plot_tucker_results(model):
    """Create comprehensive Tucker visualization"""
    successful_results = {k: v for k, v in model.items() if v is not None}

    if not successful_results:
        print(" No successful decompositions to plot")
        return

    configs = []
    rel_errors = []
    total_params = []
    core_params = []
    factor_params = []
    compression_ratios = []

    tensor_size = tensor_dims[0] * tensor_dims[1] * tensor_dims[2]

    for key, result in successful_results.items():
        configs.append(key)
        rel_errors.append(result['rel_error'] * 100)
        total_params.append(result['total_params'])
        core_params.append(result['core_params'])
        factor_params.append(result['factor_params'])
        compression_ratios.append(tensor_size / result['total_params'])

    rel_errors = np.array(rel_errors)
    total_params = np.array(total_params)
    compression_ratios = np.array(compression_ratios)

    fig = plt.figure(figsize=(18, 12))

    ax1 = plt.subplot(2, 3, 1)
    sorted_indices = np.argsort(rel_errors)
    top_20 = sorted_indices[:min(20, len(sorted_indices))]

    bars = plt.bar(range(len(top_20)), rel_errors[top_20], alpha=0.7, color='skyblue')
    plt.xlabel('Configuration Rank')
    plt.ylabel('Relative Error (%)')
    plt.title('Tucker: Top 20 Configurations by Error')
    plt.xticks(range(len(top_20)), [configs[i] for i in top_20], rotation=45, ha='right')
    plt.grid(True, alpha=0.3)

    if len(top_20) > 0:
        bars[0].set_color('gold')
        plt.text(0, rel_errors[top_20[0]] + 0.1, 'Best', ha='center', fontweight='bold')

    ax2 = plt.subplot(2, 3, 2)
    scatter = plt.scatter(total_params, rel_errors, c=compression_ratios,
                          cmap='viridis', alpha=0.7, s=60)
    plt.xlabel('Total Parameters')
    plt.ylabel('Relative Error (%)')
    plt.title('Tucker: Error vs Model Complexity')
    plt.grid(True, alpha=0.3)

    cbar = plt.colorbar(scatter)
    cbar.set_label('Compression Ratio')

    pareto_indices = _find_pareto_front(total_params, rel_errors)
    plt.plot(
        total_params[pareto_indices],
        rel_errors[pareto_indices], 'r-',
        linewidth=2,
        alpha=0.8,
        label='Pareto Front'
    )
    plt.legend()

    ax3 = plt.subplot(2, 3, 3)
    top_10_by_error = sorted_indices[:10]

    core_params_top = [core_params[i] for i in top_10_by_error]
    factor_params_top = [factor_params[i] for i in top_10_by_error]
    configs_top = [configs[i] for i in top_10_by_error]

    x_pos = range(len(top_10_by_error))
    p1 = plt.bar(
        x_pos,
        core_params_top,
        alpha=0.8,
        color='coral',
        label='Core Parameters'
    )
    p2 = plt.bar(
        x_pos,
        factor_params_top,
        bottom=core_params_top,
        alpha=0.8,
        color='lightblue',
        label='Factor Parameters'
    )

    plt.xlabel('Configuration')
    plt.ylabel('Number of Parameters')
    plt.title('Tucker: Parameter Breakdown (Top 10)')
    plt.xticks(x_pos, configs_top, rotation=45, ha='right')
    plt.legend()
    plt.grid(True, alpha=0.3)

    ax4 = plt.subplot(2, 3, 4)
    plt.scatter(compression_ratios, rel_errors, alpha=0.7, color='green', s=60)
    plt.xlabel('Compression Ratio')
    plt.ylabel('Relative Error (%)')
    plt.title('Tucker: Compression vs Accuracy Trade-off')
    plt.grid(True, alpha=0.3)

    high_compression = compression_ratios > 1.0
    if np.any(high_compression):
        best_compression_idx = np.argmax(compression_ratios[high_compression])
        best_idx = np.where(high_compression)[0][best_compression_idx]

        plt.annotate(f'{configs[best_idx]}\n{compression_ratios[best_idx]:.1f}x',
                     xy=(compression_ratios[best_idx], rel_errors[best_idx]),
                     xytext=(compression_ratios[best_idx]+0.1, rel_errors[best_idx]+1),
                     arrowprops=dict(arrowstyle='->', color='red'),
                     fontsize=9,
                     color='red'
                     )

    ax5 = plt.subplot(2, 3, 5)
    plt.hist(rel_errors, bins=20, alpha=0.7, color='purple', edgecolor='black')
    plt.xlabel('Relative Error (%)')
    plt.ylabel('Frequency')
    plt.title('Tucker: Error Distribution')

    plt.axvline(
        np.mean(rel_errors),
        color='red',
        linestyle='--',
        label=f'Mean: {np.mean(rel_errors):.2f}%'
    )
    plt.axvline(
        np.median(rel_errors),
        color='orange',
        linestyle='--',
        label=f'Median: {np.median(rel_errors):.2f}%'
    )
    plt.legend()
    plt.grid(True, alpha=0.3)

    ax6 = plt.subplot(2, 3, 6)
    if len(successful_results) > 10:
        rank_data = []
        for key, result in successful_results.items():
            r1, r2, r3 = result['ranks']
            rank_data.append([r1, r2, r3, result['rel_error']])

        rank_data = np.array(rank_data)
        unique_r1 = sorted(set(rank_data[:, 0]))
        unique_r2 = sorted(set(rank_data[:, 1]))

        if len(unique_r1) > 1 and len(unique_r2) > 1:
            heatmap_data = np.full((len(unique_r2), len(unique_r1)), np.nan)

            for i, r2 in enumerate(unique_r2):
                for j, r1 in enumerate(unique_r1):
                    mask = (rank_data[:, 0] == r1) & (rank_data[:, 1] == r2)
                    if np.any(mask):
                        heatmap_data[i, j] = np.min(rank_data[mask, 3]) * 100

            im = plt.imshow(heatmap_data, cmap='RdYlBu_r', aspect='auto')
            plt.colorbar(im, label='Relative Error (%)')
            plt.xlabel('Mode 1 Rank')
            plt.ylabel('Mode 2 Rank')
            plt.title('Tucker: Error Heatmap (Mode 1 vs Mode 2)')
            plt.xticks(range(len(unique_r1)), unique_r1)
            plt.yticks(range(len(unique_r2)), unique_r2)
        else:
            plt.text(0.5,
                     0.5,
                     'Insufficient data\nfor heatmap',
                     ha='center',
                     va='center',
                     transform=ax6.transAxes, fontsize=12
                     )
    else:
        plt.text(0.5,
                 0.5,
                 'Too few configurations\nfor heatmap',
                 ha='center',
                 va='center',
                 transform=ax6.transAxes,
                 fontsize=12
                 )

    plt.tight_layout()
    plt.suptitle('Tucker Decomposition Analysis', fontsize=16, y=0.98)
    plt.show()

def _find_pareto_front(params, errors):
    """Find Pareto front for parameter-error trade-off"""
    sorted_indices = np.argsort(params)
    pareto_indices = []
    min_error_so_far = float('inf')
    for idx in sorted_indices:
        if errors[idx] < min_error_so_far:
            pareto_indices.append(idx)
            min_error_so_far = errors[idx]

    return np.array(pareto_indices)
"""Analyze Tucker decomposition results"""
= {k: v for k, v in results.items() if v is not None}
successful_results
if not successful_results:
print(" No successful Tucker decompositions to analyze")
return
= min(successful_results.keys(), key=lambda k:
best_key 'rel_error'])
successful_results[k][= successful_results[best_key]
best_result
print(f" Best configuration: {best_result['ranks']}")
print(f" - Relative error: {best_result['rel_error']*100:.3f}%")
print(f" - Total parameters: {best_result['total_params']}")
print(f" - Core parameters: {best_result['core_params']}")
print(f" - Factor parameters: {best_result['factor_params']}")
print(f"\n Top 5 configurations:")
= sorted(successful_results.items(), key=lambda x:
sorted_results 1]['rel_error'])
x[
for i, (key, result) in enumerate(sorted_results[:5]):
= np.prod(tensor_dims) / result['total_params']
compression_ratio print(
f" {i+1}. {result['ranks']}: {result['rel_error']*100:.3f}% error, "
f"{result['total_params']} params, {compression_ratio:.1f}x compression"
)
print(f"\n Recommendations:")
for threshold in [0.05, 0.01, 0.005]:
= [(k, v) for k, v in successful_results.items()
good_configs if v['rel_error'] < threshold]
if good_configs:
= min(good_configs, key=lambda x: x[1]['total_params'])
best_config = best_config[1]
result print(
f" • For <{threshold*100:.1f}% error: {result['ranks']} "
f"({result['rel_error']*100:.3f}% error, {result['total_params']} params)"
)
Likewise, the six-panel visualization below provides insights into Tucker decomposition performance across different core tensor configurations.
The top-left panel showcases the twenty best-performing configurations, with reconstruction errors clustered impressively between 0.5-2.0%. This immediately tells us that Tucker decomposition can achieve excellent accuracy for our tongue shape data when properly configured. The scatter plot of error versus model complexity illustrates the trade-off between accuracy and efficiency. The red Pareto Front marks the “optimal” configurations, where we cannot improve both accuracy and efficiency at the same time. In other words, if we want a lower error, we must accept higher complexity, and vice versa.
The stacked bars break down model complexity into two parts: parameters from the core tensor (orange) and those from the factor matrices (blue). This shows where the computational cost comes from in each model. The green dots compare compression ratio against reconstruction error. Configurations above the diagonal line achieve good compression while still keeping the error low, which makes them especially useful in practice.
Finally, the histogram shows the overall distribution of errors. Most Tucker configurations fall between 8–10% error, with a median of 8.7% and a mean of 9.2%. A few standout models perform much better, achieving far lower error rates. The color-coded grid provides another view, showing how reconstruction error changes across rank combinations for the first two modes. This makes it easier to detect “sweet spots” where the model balances accuracy and efficiency most effectively.
tucker_mod = tucker_modeling(X_tensor)
plot_tucker_results(model=tucker_mod)

tucker_results(model=tucker_mod)
Best configuration: [10, 10, 10]
- Relative error: 1.348%
- Total parameters: 1280
- Core parameters: 1000
- Factor parameters: 280
Top 5 configurations:
1. [10, 10, 10]: 1.348% error, 1280 params, 0.5x compression
2. [10, 10, 9]: 1.348% error, 1175 params, 0.6x compression
3. [10, 10, 8]: 1.348% error, 1070 params, 0.6x compression
4. [10, 10, 5]: 1.348% error, 755 params, 0.9x compression
5. [10, 10, 6]: 1.348% error, 860 params, 0.8x compression
Recommendations:
• For <5.0% error: [5, 9, 5] (4.703% error, 405 params)
Model Comparison
After optimizing both decomposition methods, we can now directly compare their performance on our tongue shape data. The comparison reveals distinct characteristics and trade-offs between the two approaches.
The original tensor slices in the top row provide our baseline reference, showing the complex spatial patterns of tongue positions across different speakers and vowels. Both PARAFAC and Tucker successfully capture the general structure of these patterns, but with notable differences in their approach and results.
PARAFAC, configured with rank 7, demonstrates its signature strength in providing interpretable factor loadings across all three modes. The factor plots show clear, distinct patterns for each component, with the grid position factors revealing systematic spatial relationships and the vowel factors capturing acoustic-articulatory connections. The reconstruction achieves a 7.08% relative error with just 196 parameters, making it remarkably parameter-efficient. The error map shows relatively uniform reconstruction quality across the tensor space. Meanwhile, Tucker decomposition, using core dimensions [5, 9, 5], takes a fundamentally different approach with its more flexible structure. The factor matrices show more complex patterns, reflecting Tucker’s ability to capture asymmetric relationships between modes. With 405 parameters, Tucker achieves a superior 4.70% relative error, demonstrating the power of its additional flexibility. The core tensor visualization shows the internal structure that Tucker uses to combine these factors, something absent in PARAFAC’s simpler multiplicative model.
The direct comparison panels at the bottom quantify these trade-offs clearly. PARAFAC wins on parameter efficiency with 196 versus 405 parameters, translating to better compression ratios. However, Tucker delivers superior reconstruction accuracy with nearly 30% lower relative error. The reconstruction difference heatmap highlights where these methods disagree most strongly, typically in regions with complex multimodal interactions.
def compare_optimized_models(X_tensor, parafac_rank, tucker_ranks):
    """
    Compare optimized PARAFAC and Tucker models with comprehensive analysis

    Parameters:
    -----------
    X_tensor : numpy.ndarray
        Input tensor to decompose
    parafac_rank : int
        Optimal rank for PARAFAC
    tucker_ranks : list
        Optimal ranks for Tucker [mode1, mode2, mode3]
    """
    results = {}

    # PARAFAC Decomposition
    try:
        parafac_factors = parafac(
            X_tensor,
            rank=parafac_rank,
            init='random',
            n_iter_max=200,
            random_state=42
        )

        X_parafac_reconstructed = tl.cp_to_tensor(parafac_factors)
        parafac_error = tl.norm(X_tensor - X_parafac_reconstructed)
        parafac_rel_error = parafac_error / tl.norm(X_tensor)
        parafac_params = parafac_rank * sum(X_tensor.shape)

        results['PARAFAC'] = {
            'success': True,
            'factors': parafac_factors,
            'reconstructed': X_parafac_reconstructed,
            'error': parafac_error,
            'rel_error': parafac_rel_error,
            'rank': parafac_rank,
            'params': parafac_params,
            'method': 'PARAFAC'
        }
    except Exception as e:
        results['PARAFAC'] = {'success': False, 'error': str(e)}

    # Tucker Decomposition
    try:
        tucker_factors = tucker(
            X_tensor,
            rank=tucker_ranks,
            init='random',
            n_iter_max=200,
            random_state=42
        )

        X_tucker_reconstructed = tl.tucker_to_tensor(tucker_factors)
        tucker_error = tl.norm(X_tensor - X_tucker_reconstructed)
        tucker_rel_error = tucker_error / tl.norm(X_tensor)

        core_params = np.prod(tucker_ranks)
        factor_params = sum(X_tensor.shape[i] * tucker_ranks[i] for i in range(3))
        tucker_params = core_params + factor_params

        results['Tucker'] = {
            'success': True,
            'factors': tucker_factors,
            'reconstructed': X_tucker_reconstructed,
            'error': tucker_error,
            'rel_error': tucker_rel_error,
            'ranks': tucker_ranks,
            'params': tucker_params,
            'core_params': core_params,
            'factor_params': factor_params,
            'method': 'Tucker'
        }
    except Exception as e:
        results['Tucker'] = {'success': False, 'error': str(e)}

    _visualization(results, X_tensor)
    _summary(results)

    return results
def _visualization(results, X_tensor):
    """Create comprehensive comparison visualization"""
    successful_methods = [method for method, result in results.items() if result['success']]

    if not successful_methods:
        return

    fig = plt.figure(figsize=(20, 14))

    # Original tensor slices
    for i in range(min(5, X_tensor.shape[2])):
        ax = plt.subplot(4, 6, i+1)
        im = plt.imshow(X_tensor[:, :, i], cmap='viridis', aspect='auto')
        plt.title(f'Original Tensor\nSlice {i+1}', fontsize=10)
        plt.colorbar(im, shrink=0.6)
        if i == 0:
            plt.ylabel('Mode 1', fontsize=9)
            plt.xlabel('Mode 2', fontsize=9)

    # Tensor statistics
    ax = plt.subplot(4, 6, 6)
    ax.text(0.1, 0.8, f'Shape: {X_tensor.shape}', fontsize=12, transform=ax.transAxes)
    ax.text(0.1, 0.6, f'Elements: {np.prod(X_tensor.shape)}', fontsize=12, transform=ax.transAxes)
    ax.text(0.1, 0.4, f'Norm: {tl.norm(X_tensor):.3f}', fontsize=12, transform=ax.transAxes)
    ax.text(0.1, 0.2, f'Range: [{X_tensor.min():.3f}, {X_tensor.max():.3f}]', fontsize=12, transform=ax.transAxes)
    ax.set_title('Tensor Stats', fontsize=10)
    ax.axis('off')

    # PARAFAC Analysis
    if 'PARAFAC' in successful_methods:
        result = results['PARAFAC']
        factors = result['factors'][1]
        colors = ['blue', 'red', 'green', 'orange']

        # PARAFAC factor matrices
        for mode in range(3):
            ax = plt.subplot(4, 6, 7 + mode)
            n_comps_show = min(4, factors[mode].shape[1])
            for comp in range(n_comps_show):
                plt.plot(factors[mode][:, comp], color=colors[comp],
                         label=f'Comp {comp+1}', marker='o', markersize=3, linewidth=1.5)
            plt.title(f'PARAFAC Mode {mode+1}', fontsize=10)
            plt.xlabel('Index', fontsize=9)
            plt.ylabel('Factor Value', fontsize=9)
            if mode == 0:
                plt.legend(fontsize=7)
            plt.grid(True, alpha=0.3)

        # PARAFAC reconstruction
        ax = plt.subplot(4, 6, 10)
        recon_slice = result['reconstructed'][:, :, 0]
        im = plt.imshow(recon_slice, cmap='viridis', aspect='auto')
        plt.title(f'PARAFAC Recon', fontsize=10)
        plt.colorbar(im, shrink=0.6)

        # PARAFAC error
        ax = plt.subplot(4, 6, 11)
        error_slice = np.abs(X_tensor[:, :, 0] - recon_slice)
        im = plt.imshow(error_slice, cmap='Reds', aspect='auto')
        plt.title(f'PARAFAC Error', fontsize=10)
        plt.colorbar(im, shrink=0.6)

        # PARAFAC stats
        ax = plt.subplot(4, 6, 12)
        ax.text(0.1, 0.8, f"Rank: {result['rank']}", fontsize=12, transform=ax.transAxes)
        ax.text(0.1, 0.6, f"Error: {result['rel_error']*100:.3f}%", fontsize=12, transform=ax.transAxes)
        ax.text(0.1, 0.4, f"Params: {result['params']}", fontsize=12, transform=ax.transAxes)
        compression = np.prod(X_tensor.shape) / result['params']
        ax.text(0.1, 0.2, f"Compression: {compression:.2f}x", fontsize=12, transform=ax.transAxes)
        ax.set_title('PARAFAC Stats', fontsize=10)
        ax.axis('off')

    # Tucker Analysis
    if 'Tucker' in successful_methods:
        result = results['Tucker']
        factors = result['factors'][1]

        # Tucker factor matrices
        for mode in range(3):
            ax = plt.subplot(4, 6, 13 + mode)
            n_comps_show = min(4, factors[mode].shape[1])
            for comp in range(n_comps_show):
                plt.plot(factors[mode][:, comp], color=colors[comp],
                         label=f'Comp {comp+1}', marker='s', markersize=3, linewidth=1.5)
            plt.title(f'Tucker Mode {mode+1}', fontsize=10)
            plt.xlabel('Index', fontsize=9)
            plt.ylabel('Factor Value', fontsize=9)
            if mode == 0:
                plt.legend(fontsize=7)
            plt.grid(True, alpha=0.3)

        # Tucker core tensor
        ax = plt.subplot(4, 6, 16)
        core = result['factors'][0]
        core_slice = core[:, :, 0] if core.ndim == 3 else core
        im = plt.imshow(core_slice, cmap='RdBu_r', aspect='auto')
        plt.title('Tucker Core', fontsize=10)
        plt.colorbar(im, shrink=0.6)

        # Tucker reconstruction
        ax = plt.subplot(4, 6, 17)
        recon_slice = result['reconstructed'][:, :, 0]
        im = plt.imshow(recon_slice, cmap='viridis', aspect='auto')
        plt.title(f'Tucker Recon', fontsize=10)
        plt.colorbar(im, shrink=0.6)

        # Tucker stats
        ax = plt.subplot(4, 6, 18)
        ax.text(0.1, 0.8, f"Ranks: {result['ranks']}", fontsize=12, transform=ax.transAxes)
        ax.text(0.1, 0.6, f"Error: {result['rel_error']*100:.3f}%", fontsize=12, transform=ax.transAxes)
        ax.text(0.1, 0.4, f"Params: {result['params']}", fontsize=12, transform=ax.transAxes)
        compression = np.prod(X_tensor.shape) / result['params']
        ax.text(0.1, 0.2, f"Compression: {compression:.2f}x", fontsize=12, transform=ax.transAxes)
        ax.set_title('Tucker Stats', fontsize=10)
        ax.axis('off')

    # Direct Comparison
    if len(successful_methods) >= 2:
        # Error comparison
        ax = plt.subplot(4, 6, 19)
        methods = [result['method'] for result in results.values() if result['success']]
        errors = [result['rel_error'] for result in results.values() if result['success']]
        colors_comp = ['skyblue', 'lightcoral'][:len(methods)]

        bars = plt.bar(methods, errors, color=colors_comp, alpha=0.7)
        plt.title('Error Comparison', fontsize=10)
        plt.ylabel('Relative Error', fontsize=9)
        plt.xticks(rotation=45)

        for bar, error in zip(bars, errors):
            plt.text(
                bar.get_x() + bar.get_width()/2,
                bar.get_height() + 0.002,
                f'{error:.4f}',
                ha='center',
                va='bottom',
                fontsize=8
            )

        # Parameter comparison
        ax = plt.subplot(4, 6, 20)
        params = [result['params'] for result in results.values() if result['success']]
        bars = plt.bar(methods, params, color=colors_comp, alpha=0.7)
        plt.title('Parameters', fontsize=10)
        plt.ylabel('# Parameters', fontsize=9)
        plt.xticks(rotation=45)

        for bar, param in zip(bars, params):
            plt.text(
                bar.get_x() + bar.get_width()/2,
                bar.get_height() + max(params)*0.02,
                f'{param}',
                ha='center',
                va='bottom',
                fontsize=8
            )

        # Reconstruction difference
        ax = plt.subplot(4, 6, 21)
        if len(successful_methods) == 2:
            parafac_recon = results['PARAFAC']['reconstructed'][:, :, 0] if 'PARAFAC' in results and results['PARAFAC']['success'] else None
            tucker_recon = results['Tucker']['reconstructed'][:, :, 0] if 'Tucker' in results and results['Tucker']['success'] else None

            if parafac_recon is not None and tucker_recon is not None:
                diff = np.abs(parafac_recon - tucker_recon)
                im = plt.imshow(diff, cmap='plasma', aspect='auto')
                plt.title('Reconstruction\nDifference', fontsize=10)
                plt.colorbar(im, shrink=0.6)
            else:
                plt.text(0.5, 0.5, 'Cannot compute\ndifference', ha='center', va='center',
                         transform=ax.transAxes, fontsize=10)
        else:
            plt.text(0.5, 0.5, 'Need both methods\nfor comparison', ha='center', va='center',
                     transform=ax.transAxes, fontsize=10)

        # Compression comparison
        ax = plt.subplot(4, 6, 22)
        compressions = [np.prod(X_tensor.shape) / result['params'] for result in results.values() if result['success']]
        bars = plt.bar(methods, compressions, color=colors_comp, alpha=0.7)
        plt.title('Compression Ratio', fontsize=10)
        plt.ylabel('Compression', fontsize=9)
        plt.xticks(rotation=45)

        for bar, comp in zip(bars, compressions):
            plt.text(
                bar.get_x() + bar.get_width()/2,
                bar.get_height() + max(compressions)*0.02,
                f'{comp:.2f}x',
                ha='center',
                va='bottom',
                fontsize=8
            )

        # Explained variance comparison
        ax = plt.subplot(4, 6, 23)
        explained_vars = [(1 - result['rel_error']**2) * 100 for result in results.values() if result['success']]
        bars = plt.bar(methods, explained_vars, color=colors_comp, alpha=0.7)
        plt.title('Explained Variance', fontsize=10)
        plt.ylabel('Variance (%)', fontsize=9)
        plt.xticks(rotation=45)
        plt.ylim([0, 100])

        for bar, var in zip(bars, explained_vars):
            plt.text(
                bar.get_x() + bar.get_width()/2,
                bar.get_height() + 2,
                f'{var:.1f}%',
                ha='center',
                va='bottom',
                fontsize=8
            )

    plt.tight_layout()
    plt.suptitle('PARAFAC vs Tucker: Comparison Results', fontsize=16, y=0.98)
    plt.show()
    plt.close()
def _summary(results):
    """Print minimal final summary"""
    successful_methods = [method for method, result in results.items() if result['success']]

    if successful_methods:
        best_method = min(successful_methods, key=lambda m: results[m]['rel_error'])
        best_result = results[best_method]

        print(f"Best performing method: {best_method}")
        print(f" • Relative error: {best_result['rel_error']*100:.3f}%")
        print(f" • Model parameters: {best_result['params']}")
results = compare_optimized_models(
    X_tensor,
    parafac_rank=7,
    tucker_ranks=[5, 9, 5]
)
Best performing method: Tucker
• Relative error: 4.703%
• Model parameters: 405
Conclusion
Although both methods achieve high explained variance (over 95%), Tucker’s additional parameters allow it to capture subtle patterns that PARAFAC’s constrained structure cannot represent. In other words, PARAFAC offers cleaner, more interpretable results, while Tucker provides superior reconstruction quality for applications where accuracy trumps interpretability.
The comparison suggests that for our tongue shape data, both methods successfully identify meaningful patterns, but serve different analytical purposes depending on whether the priority is understanding the underlying structure or achieving the most accurate data representation.