Authors:
(1) Sanchit Sinha, University of Virginia ([email protected]);
(2) Guangzhi Xiong, University of Virginia ([email protected]);
(3) Aidong Zhang, University of Virginia ([email protected]).
Table of Links
3 Methodology and 3.1 Representative Concept Extraction
3.2 Self-supervised Contrastive Concept Learning
3.3 Prototype-based Concept Grounding
3.4 End-to-end Composite Training
4 Experiments and 4.1 Datasets and Networks
4.3 Evaluation Metrics and 4.4 Generalization Results
4.5 Concept Fidelity and 4.6 Qualitative Visualization
4 Experiments
4.1 Datasets and Networks
We consider four task settings widely used for domain adaptation. The task in each setting is classification.
• Digits: This setting combines MNIST and USPS [LeCun et al., 1998; Hull, 1994], which contain hand-written digit images, with the Street View House Numbers dataset (SVHN) [Netzer et al., 2011], which contains cropped house-number photos (see the loading sketch following this list).
• VisDA-2017 [Peng et al., 2017]: contains 12 classes of vehicles sampled from Real (R) and 3D domains.
• DomainNet [Peng et al., 2019]: contains 126 classes of objects (clocks, bags, etc.) sampled from 4 domains: Real (R), Clipart (C), Painting (P), and Sketch (S).
• Office-Home [Venkateswara et al., 2017]: contains 65 classes of office objects such as calculators and staplers, sampled from 4 domains: Art (A), Clipart (C), Product (P), and Real (R).
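As a concrete illustration of the Digits setting, the sketch below loads the three domains with torchvision; the common 32x32 grayscale preprocessing is an assumption for illustration, not necessarily the exact pipeline used in our experiments.

```python
# Illustrative loading of the Digits domains with torchvision; the shared
# 32x32 grayscale preprocessing below is an assumption, not our exact pipeline.
import torchvision.transforms as T
from torchvision.datasets import MNIST, USPS, SVHN

digit_transform = T.Compose([
    T.Grayscale(num_output_channels=1),  # SVHN is RGB; MNIST and USPS are already grayscale
    T.Resize((32, 32)),                  # bring all three domains to a common resolution
    T.ToTensor(),
])

mnist = MNIST(root="data", train=True, download=True, transform=digit_transform)
usps = USPS(root="data", train=True, download=True, transform=digit_transform)
svhn = SVHN(root="data", split="train", download=True, transform=digit_transform)
```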
Network Choice: For Digits, we utilize a modified version of LeNet [LeCun et al., 1998] consisting of 3 convolutional layers with ReLU activations and a dropout probability of 0.1 during training. For all other datasets, we utilize a ResNet34 architecture similar to [Yu and Lin, 2023], initialized with weights pre-trained on ImageNet-1k. For details, refer to the Appendix.
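A minimal sketch of such a 3-convolutional-layer LeNet-style network is shown below; the ReLU activations and dropout probability of 0.1 follow the description above, while the channel widths and pooled feature size are illustrative assumptions.

```python
import torch.nn as nn

class ModifiedLeNet(nn.Module):
    """LeNet-style digit classifier: 3 conv layers, ReLU activations, dropout p=0.1.
    Channel widths and the pooled feature size are assumptions for illustration."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.dropout = nn.Dropout(p=0.1)
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        z = self.features(x).flatten(1)          # (batch, 128) pooled features
        return self.classifier(self.dropout(z))
```

For the object datasets, the ImageNet-1k pre-trained ResNet34 backbone can be instantiated in recent torchvision versions via torchvision.models.resnet34(weights="IMAGENET1K_V1").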
Baselines. We start by comparing against standard non-explainable NN architectures under the S+T setting described in [Yu and Lin, 2023]. Next, we compare our proposed method against 5 different self-explaining approaches. As none of these approaches specifically evaluate concept generalization in the form of domain adaptation, we replicate all of them. SENN and DiSENN utilize a robustness loss computed on the Jacobians of the relevance networks, with DiSENN using a VAE as the concept extractor. BotCL [Wang, 2023] also employs a contrastive loss, but uses it for position grounding. Similar to BotCL, Ante-hoc concept learning [Sarkar et al., 2022] applies a contrastive loss on datasets with known concepts, hence we do not explicitly compare against it. Lastly, UnsupervisedCBM [Sawada, 2022b] uses a mixture of known and unknown concepts and requires a small set of known concepts; for our purpose, we provide the one-hot class labels as known concepts in addition to the unknown ones. A visual summary of the salient features of each baseline is given in Table 1.
4.2 Hyperparameter Settings
RCE Framework: We use the mean squared error as the reconstruction loss and set the sparsity regularizer λ to 1e-5 for all datasets. The weights are set to ω1 = ω2 = 0.5 for the digit tasks, and to ω1 = 0.8 and ω2 = 0.2 for the object tasks.
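The sketch below illustrates one plausible way these hyperparameters combine into a composite RCE objective; the pairing of ω1 with the reconstruction term and ω2 with the task term, and the L1 form of the sparsity penalty, are assumptions rather than the exact formulation from Section 3.

```python
import torch.nn.functional as F

def rce_loss(x, x_recon, logits, labels, concept_scores,
             omega1=0.5, omega2=0.5, lam=1e-5):
    """Illustrative composite RCE objective (not the exact formulation):
    omega1 weights the MSE reconstruction term, omega2 the classification term,
    and lam scales an assumed L1 sparsity penalty on the concept scores."""
    recon = F.mse_loss(x_recon, x)                 # MSE reconstruction loss
    task = F.cross_entropy(logits, labels)         # downstream classification loss
    sparsity = lam * concept_scores.abs().mean()   # sparsity regularizer, lambda = 1e-5
    return omega1 * recon + omega2 * task + sparsity
```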
Contrastive Learning: We utilize the lightly[1] library for implementing the SimCLR transformations [Chen, 2020]. We set the temperature parameter (τ) to 0.5 by default [Xu et al., 2019] for all datasets, and the hyperparameters of each transformation are the SimCLR defaults. The training objective is the normalized temperature-scaled cross entropy (NT-Xent) loss [Chen, 2020]. Figure 4 depicts an example of the various transformations along with the adjudged positive and negative transformations. For training, we use the SGD optimizer with momentum 0.9 and a cosine decay scheduler with an initial learning rate of 0.01, training on each dataset for 10,000 iterations with early stopping. The regularization parameters λ1 and λ2 are both set to 0.1. For Digits, β is set to 1, while it is set to 0.5 for the object tasks. For further details, refer to the Appendix.
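For concreteness, the snippet below sketches this contrastive setup with the lightly library (recent versions expose SimCLRTransform and NTXentLoss); the placeholder encoder, projection sizes, and random batch are assumptions standing in for the actual concept extractor and data loader.

```python
import torch
import torch.nn as nn
from lightly.loss import NTXentLoss              # NT-Xent contrastive objective
from lightly.transforms import SimCLRTransform   # default SimCLR augmentations

# Placeholder encoder + projection head; in practice this is the concept extractor.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU(), nn.Linear(128, 64))

# The transform is passed to the training dataset to produce two augmented views per image.
transform = SimCLRTransform(input_size=32)       # SimCLR default transform parameters (input size assumed)
criterion = NTXentLoss(temperature=0.5)          # tau = 0.5
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10_000)  # cosine decay over 10k iterations

# One illustrative update on a random pair of views (real training iterates a data loader).
x0, x1 = torch.randn(8, 3 * 32 * 32), torch.randn(8, 3 * 32 * 32)
loss = criterion(model(x0), model(x1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step()
```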
This paper is available on arxiv under CC BY 4.0 DEED license.
[1] https://github.com/lightly-ai/lightly