
LieBN on SPD Manifolds: The Additional Details and Experiments That You Don't Want to Miss

by Batching, February 28th, 2025

Too Long; Didn't Read

We use the official code of SPDNetBN and TSMNet to implement our experiments on the SPDNet and TSMNet backbones.


Abstract and 1 Introduction

2 Preliminaries

3. Revisiting Normalization

3.1 Revisiting Euclidean Normalization

3.2 Revisiting Existing RBN

4 Riemannian Normalization on Lie Groups

5 LieBN on the Lie Groups of SPD Manifolds and 5.1 Deformed Lie Groups of SPD Manifolds

5.2 LieBN on SPD Manifolds

6 Experiments

6.1 Experimental Results

7 Conclusions, Acknowledgments, and References


APPENDIX CONTENTS

A Notations

B Basic Layers in SPDNet and TSMNet

C Statistical Results of Scaling in the LieBN

D LieBN as a Natural Generalization of Euclidean BN

E Domain-specific Momentum LieBN for EEG Classification

F Backpropagation of Matrix Functions

G Additional Details and Experiments of LieBN on SPD Manifolds

H Preliminary Experiments on Rotation Matrices

I Proofs of the Lemmas and Theorems in the Main Paper

G ADDITIONAL DETAILS AND EXPERIMENTS OF LIEBN ON SPD MANIFOLDS

We use the official code of SPDNetBN[6] (Brooks et al., 2019b) and TSMNet[7] (Kobler et al., 2022a) to implement our experiments on the SPDNet and TSMNet backbones.

G.1 DATASETS AND PREPROCESSING

Radar dataset (Brooks et al., 2019b) contains 3,000 synthetic radar signals. Following the protocol of Brooks et al. (2019b), each signal is split into windows of length 20, yielding 3,000 covariance matrices of size 20 × 20, equally distributed across 3 classes.

HDM05 dataset (Müller et al., 2007) consists of 2,273 skeleton-based motion-capture sequences performed by different actors. Each frame provides the 3D coordinates of 31 joints, allowing each sequence to be represented as a 93 × 93 covariance matrix. In line with Brooks et al. (2019b), we trim the dataset to 2,086 instances across 117 classes by removing under-represented clips.

FPHA dataset (Garcia-Hernando et al., 2018) includes 1,175 skeleton-based first-person hand-gesture videos of 45 different categories, with 600 clips for training and 575 for testing. Following Wang et al. (2021), we represent each sequence as a 63 × 63 covariance matrix.

Hinss2021 dataset (Hinss et al., 2021) is a recently released competition dataset containing EEG signals for mental-workload estimation. The dataset is employed for two tasks, inter-session and inter-subject classification, which are treated as domain-adaptation problems. Geometry-aware methods (Yair et al., 2019; Kobler et al., 2022a) have demonstrated promising performance in EEG classification. We follow Kobler et al. (2022a) for data preprocessing: the Python packages MOABB (Jayaram & Barachant, 2018) and MNE (Gramfort, 2013) are used to resample the EEG signals to 250/256 Hz, apply temporal filters to extract oscillatory EEG activity in the 4–36 Hz range, extract short segments (≤ 3 s) associated with a class label, and finally obtain 40 × 40 SPD covariance matrices.
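To illustrate the final preprocessing step, a minimal NumPy sketch of turning a filtered multichannel EEG recording into per-segment SPD covariance matrices might look as follows. The helper name and the shrinkage regularizer are our own illustrative choices; the paper's actual pipeline relies on MOABB and MNE:

```python
import numpy as np

def segment_covariances(eeg, fs=250, seg_len_s=3.0, shrinkage=1e-5):
    """Split a (channels, samples) recording into non-overlapping segments
    and return one SPD covariance matrix per segment (hypothetical helper)."""
    n_ch, n_samp = eeg.shape
    win = int(seg_len_s * fs)
    covs = []
    for start in range(0, n_samp - win + 1, win):
        seg = eeg[:, start:start + win]
        seg = seg - seg.mean(axis=1, keepdims=True)         # center channels
        c = seg @ seg.T / (win - 1)                         # sample covariance
        c += shrinkage * np.trace(c) / n_ch * np.eye(n_ch)  # keep strictly SPD
        covs.append(c)
    return np.stack(covs)

# 40 channels, 30 s at 250 Hz -> ten 3 s segments, each a 40 x 40 SPD matrix
rng = np.random.default_rng(0)
covs = segment_covariances(rng.standard_normal((40, 250 * 30)))
print(covs.shape)  # (10, 40, 40)
```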

G.2 HYPER-PARAMETERS

We implement the SPD LieBN and DSMLieBN induced by three standard left-invariant metrics, namely AIM, LEM, and LCM, along with their parameterized variants. Our method therefore has at most three hyper-parameters, i.e., (θ, α, β), where θ controls the deformation. In our LieBN, (α, β) only affect the variance calculation, so we set (α, β) = (1, 0) and tune only the deformation factor θ over the candidate values ±0.5, ±1, and ±1.5. We denote by [Baseline]+[BN Type]-[Metric]-(θ) the baseline endowed with a specific LieBN, e.g., SPDNet+LieBN-AIM-(1) and TSMNet+DSMLieBN-LCM-(1).

G.3 EVALUATION METHODS

In line with previous work (Brooks et al., 2019b; Kobler et al., 2022a), we use accuracy as the scoring metric for the Radar, HDM05, and FPHA datasets, and balanced accuracy (i.e., the average recall across classes) for the Hinss2021 dataset. Ten-fold experiments on the Radar, HDM05, and FPHA datasets are carried out with randomized initialization and splits (the split is officially fixed for the FPHA dataset), while on the Hinss2021 dataset, models are fitted and evaluated with a randomized cross-validation scheme that leaves out 5% of the sessions (inter-session) or subjects (inter-subject).
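Balanced accuracy is simply the mean per-class recall, which makes it robust to the class imbalance in Hinss2021. A minimal sketch (equivalent to scikit-learn's `balanced_accuracy_score`):

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Average recall across classes: each class contributes equally,
    regardless of how many samples it has."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))

# Three samples of class 0 (two correct) and one of class 1 (correct):
# recall(0) = 2/3, recall(1) = 1, balanced accuracy = 5/6
print(balanced_accuracy([0, 0, 0, 1], [0, 0, 1, 1]))
```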

G.4 EMPIRICAL INSIGHTS ON THE HYPER-PARAMETERS IN LIEBN ON SPD MANIFOLDS

Our SPD LieBN has at most three types of hyper-parameters: Riemannian metric, deformation factor θ, and O(n)-invariance parameters (α, β). The general order of importance should be Riemannian metric > θ > (α, β).


The most significant hyper-parameter is the choice of Riemannian metric, as all geometric properties are derived from the metric. A safe strategy is to start with AIM and then decide whether to explore other metrics: the affine invariance of AIM is a natural property for covariance matrices, and in our experiments LieBN-AIM generally achieves the best performance. However, AIM is not always the best metric. As shown in Tab. 4b, the best result on the HDM05 dataset is achieved by the LCM-based LieBN, which improves the vanilla SPDNet by 11.71%. Besides, if efficiency is an important factor, one should first consider LCM, as it is the most computationally efficient of the three.
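For intuition on the trade-off, the geodesic distances induced by the three metrics can be sketched in NumPy using their standard formulas (AIM: ||logm(P^{-1/2} Q P^{-1/2})||_F; LEM: ||logm(P) − logm(Q)||_F; LCM via Cholesky factors as in Lin's log-Cholesky construction). This is an illustrative sketch, not the paper's implementation; note that LCM needs only a Cholesky factorization, which is why it is the cheapest:

```python
import numpy as np

def _eig_fun(S, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, U = np.linalg.eigh(S)
    return U @ np.diag(fun(w)) @ U.T

def dist_aim(P, Q):
    """Affine-invariant distance: ||logm(P^{-1/2} Q P^{-1/2})||_F."""
    P_is = _eig_fun(P, lambda w: w ** -0.5)
    return np.linalg.norm(_eig_fun(P_is @ Q @ P_is, np.log), 'fro')

def dist_lem(P, Q):
    """Log-Euclidean distance: ||logm(P) - logm(Q)||_F."""
    return np.linalg.norm(_eig_fun(P, np.log) - _eig_fun(Q, np.log), 'fro')

def dist_lcm(P, Q):
    """Log-Cholesky distance: compare Cholesky factors, with the log
    taken on their diagonals. Needs only one Cholesky per matrix."""
    L, K = np.linalg.cholesky(P), np.linalg.cholesky(Q)
    off = np.tril(L, -1) - np.tril(K, -1)
    dia = np.log(np.diag(L)) - np.log(np.diag(K))
    return float(np.sqrt(np.linalg.norm(off, 'fro') ** 2 + dia @ dia))

def rand_spd(n, rng):
    X = rng.standard_normal((n, n))
    return X @ X.T + n * np.eye(n)

rng = np.random.default_rng(0)
P, Q = rand_spd(4, rng), rand_spd(4, rng)
A = rng.standard_normal((4, 4))  # invertible a.s.; congruence preserves SPD
# Affine invariance of AIM: d(A P A^T, A Q A^T) = d(P, Q)
print(abs(dist_aim(A @ P @ A.T, A @ Q @ A.T) - dist_aim(P, Q)))
```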


The second most important hyper-parameter is the deformation factor θ. As discussed in Sec. 5.1, θ interpolates between different types of metrics (θ = 1 and θ → 0). Inspired by this, we select θ around its deformation boundaries (1 and 0); in this paper, we roughly select θ from {±0.5, ±1, ±1.5}.
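The interpolation can be seen already at the level of the matrix power map: (P^θ − I)/θ converges to logm(P) as θ → 0, which is the eigenvalue-wise reason the deformed metrics approach a Log-Euclidean-type geometry in the limit. A NumPy sketch of this limit (illustrative only, not the paper's exact pullback map):

```python
import numpy as np

def _eig_fun(P, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, U = np.linalg.eigh(P)
    return U @ np.diag(fun(w)) @ U.T

def power_deform(P, theta):
    """(P^theta - I)/theta; tends to logm(P) as theta -> 0."""
    return (_eig_fun(P, lambda w: w ** theta) - np.eye(P.shape[0])) / theta

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
P = X @ X.T + 5 * np.eye(5)          # a well-conditioned SPD matrix
logP = _eig_fun(P, np.log)
for theta in (1.0, 0.5, 0.1, 0.01):  # deviation shrinks as theta -> 0
    err = np.linalg.norm(power_deform(P, theta) - logP)
    print(f"theta={theta:4.2f}  deviation from logm(P): {err:.4f}")
```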


The least important parameters are (α, β). Recalling Alg. 1 and Tab. 1, (α, β) only affect the calculation of the variance and should therefore have less effect than the two parameters above. We thus simply set (α, β) = (1, 0) in all experiments.


G.4.1 THE EFFECT OF β IN SPD LIEBN


We focus on the AIM-based LieBN on the HDM05 dataset and set θ = 1.5, the best deformation factor in this scenario. All other network settings remain the same as in the main paper. The 10-fold average results are presented in Tab. 7. Note that in this setting, n = 30. As expected, β has only a minor effect on our LieBN.


Table 7: The effect of different β for AIM-based LieBN on the HDM05 dataset.

H PRELIMINARY EXPERIMENTS ON ROTATION MATRICES

This section implements our LieBN (Alg. 1) on the special orthogonal groups, i.e., SO(n), also known as the rotation matrices. We apply our LieBN to the classic LieNet (Huang & Van Gool, 2017), where the latent space is the special orthogonal group.

H.1 GEOMETRY ON ROTATION MATRICES


Table 8: The associated Riemannian operators on rotation matrices.



We denote by R, S ∈ SO(n) two rotation matrices, and by γ(R,S)(t) the geodesic connecting R and S. The neutral element of SO(n) is the identity matrix. Tab. 8 summarizes all the necessary Riemannian ingredients of the invariant metric on SO(n).
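Under the invariant metric, the operators in Tab. 8 reduce to group translations of the matrix logarithm and exponential. A hedged NumPy/SciPy sketch of these standard formulas (assuming rotation angles below π, so the principal logarithm is well defined):

```python
import numpy as np
from scipy.linalg import expm, logm

def so_log(R, S):
    """Riemannian logarithm at R: Log_R(S) = R logm(R^T S)."""
    return R @ np.real(logm(R.T @ S))

def so_exp(R, V):
    """Riemannian exponential at R: Exp_R(V) = R expm(R^T V)."""
    return R @ expm(R.T @ V)

def so_geodesic(R, S, t):
    """Geodesic from R (t=0) to S (t=1): R expm(t logm(R^T S))."""
    return R @ expm(t * np.real(logm(R.T @ S)))

def rand_rot(n, rng, scale=0.2):
    """expm of a skew-symmetric matrix always lies in SO(n)."""
    A = scale * rng.standard_normal((n, n))
    return expm(A - A.T)

rng = np.random.default_rng(0)
R, S = rand_rot(4, rng), rand_rot(4, rng)
# The geodesic reaches S exactly at t = 1
print(np.allclose(so_geodesic(R, S, 1.0), S, atol=1e-8))
```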


For the specific case of SO(3), the matrix logarithm and exponential can be calculated in closed form, without eigendecomposition (Murray et al., 2017, Exs. A.11 and A.14).
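For reference, these closed forms are the classic Rodrigues formulas; a sketch valid for rotation angles in [0, π):

```python
import numpy as np

def skew(w):
    """3-vector -> skew-symmetric matrix; exp(skew(w)) rotates by ||w||."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def so3_exp(W):
    """Rodrigues' formula for the exponential of a skew-symmetric W."""
    theta = np.linalg.norm([W[2, 1], W[0, 2], W[1, 0]])
    if np.isclose(theta, 0.0):
        return np.eye(3)
    K = W / theta
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def so3_log(R):
    """Closed-form principal logarithm on SO(3), angles in [0, pi)."""
    cos_t = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_t)
    if np.isclose(theta, 0.0):
        return np.zeros((3, 3))
    return theta / (2.0 * np.sin(theta)) * (R - R.T)

W = skew(np.array([0.3, -0.2, 0.5]))
print(np.allclose(so3_log(so3_exp(W)), W))  # True
```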

H.2 DATASETS AND PREPROCESSING

Following LieNet, we validate our LieBN on the G3D dataset (Bloom et al., 2012), which consists of 663 sequences of 20 different gaming actions. Each sequence is recorded as the 3D locations of 20 joints (i.e., 19 bones). Following Huang & Van Gool (2017), we use the code of Vemulapalli et al. (2014) to represent each skeleton sequence as a point on the product Lie group SO(3)^{N×T}, where N and T denote the spatial and temporal dimensions. As preprocessed in Huang & Van Gool (2017), we set T = 100 for each sequence in G3D.

H.3 IMPLEMENTATION DETAILS

LieNet: LieNet consists of three basic layers: RotMap, RotPooling, and LogMap. The RotMap layer mimics the convolutional layer, while RotPooling extends pooling to rotation matrices. The LogMap layer maps each rotation matrix into the tangent space at the identity for classification. Note that the official code of LieNet[8] is written in Matlab; we follow the open-sourced PyTorch code[9] in our experiments. To reproduce LieNet more faithfully, we made the following modifications to this PyTorch code: we re-coded the LogMap and RotPooling layers to make them consistent with the official Matlab implementation, and we extended the Riemannian optimization package geoopt (Bécigneul & Ganea, 2018) to SO(3), enabling Riemannian versions of SGD, ADAM, and AMSGrad on SO(3), which are missing from the current package. We found SGD to be the best optimizer for LieNet and therefore adopt it in the experiments. We apply our LieBN before the LogMap layer and refer to this network as LieNetLieBN. Since the feature tensor in LieNet has dimensions B × N × T × 3 × 3, we calculate the Lie group statistics along the batch and temporal dimensions (B × T), resulting in an N × 3 × 3 running mean.
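To make the last point concrete, here is a minimal sketch of pooling Lie-group statistics over the batch and temporal axes of a (B, N, T, 3, 3) feature tensor, using a standard Karcher-mean fixed-point iteration. The helpers are our illustrative stand-ins, not the paper's implementation:

```python
import numpy as np
from scipy.linalg import expm, logm

def karcher_mean_so(Rs, iters=30, tol=1e-10):
    """Fréchet (Karcher) mean of rotations by fixed-point iteration:
    M <- M expm(mean_i logm(M^T R_i))."""
    M = Rs[0].copy()
    for _ in range(iters):
        V = np.mean([np.real(logm(M.T @ R)) for R in Rs], axis=0)
        M = M @ expm(V)
        if np.linalg.norm(V) < tol:
            break
    return M

def batch_means(feats):
    """Pool statistics of a (B, N, T, 3, 3) tensor over B and T,
    returning one 3 x 3 running mean per spatial index: shape (N, 3, 3)."""
    B, N, T = feats.shape[:3]
    return np.stack([karcher_mean_so(feats[:, n].reshape(B * T, 3, 3))
                     for n in range(N)])

rng = np.random.default_rng(1)
def rand_rot():
    a = 0.2 * rng.standard_normal(3)
    W = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0.0]])
    return expm(W)  # expm of a skew matrix lies in SO(3)

B, N, T = 4, 2, 3
feats = np.array([[[rand_rot() for _ in range(T)]
                   for _ in range(N)] for _ in range(B)])
means = batch_means(feats)
print(means.shape)  # (2, 3, 3)
```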




H.4 RESULTS

The 10-fold results are shown in Tab. 9. Owing to the different software stacks, our reimplemented LieNet performs slightly worse than reported in Huang & Van Gool (2017). Nevertheless, we can still observe a clear improvement of LieNetLieBN over LieNet.



Table 9: Results of LieNet with or without LieBN on the G3D dataset.


I PROOFS OF THE LEMMAS AND THEOREMS IN THE MAIN PAPER

Proof of Prop. 4.1. Property 1:


The MLE of M is





Property 2:





The first equation is obtained by (Pennec, 2004, Thm. 7), while the last equation is obtained by the isometry of the left translation.





By Eq. (49), we can readily obtain the results.





Then we have





We can further simplify the above equation as





This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.


[6] https://proceedings.neurips.cc/paper_files/paper/2019/file/6e69ebbfad976d4637bb4b39de261bf7-Supplemental.zip


[7] https://github.com/rkobler/TSMNet


[8] https://github.com/zhiwu-huang/LieNet


[9] https://github.com/hjf1997/LieNet

Authors:

(1) Ziheng Chen, University of Trento;

(2) Yue Song, University of Trento (corresponding author);

(3) Yunmei Liu, University of Louisville;

(4) Nicu Sebe, University of Trento.