
Ducho: A Unified Framework for Multimodal Feature Extraction in AI-Powered Recommendations

by YAML, February 16th, 2025

Too Long; Didn't Read

Ducho simplifies multimodal feature extraction for recommender systems with a modular architecture. Future plans include broader backend support, a unified extraction interface, and low-level feature extraction.

Authors:

(1) Daniele Malitesta, Politecnico di Bari, Italy ([email protected]);

(2) Giuseppe Gassi, Politecnico di Bari, Italy ([email protected]);

(3) Claudio Pomo, Politecnico di Bari, Italy ([email protected]);

(4) Tommaso Di Noia, Politecnico di Bari, Italy ([email protected]).

Corresponding authors: Daniele Malitesta ([email protected]) and Giuseppe Gassi ([email protected]).

Abstract and 1 Introduction and Motivation

2 Architecture and 2.1 Dataset

2.2 Extractor

2.3 Runner

3 Extraction Pipeline

4 Ducho as Docker Application

5 Demonstrations and 5.1 Demo 1: visual + textual items features

5.2 Demo 2: audio + textual items features

5.3 Demo 3: textual items/interactions features

6 Conclusion and Future Work, Acknowledgments and References

6 CONCLUSION AND FUTURE WORK

In this paper, we propose Ducho, a framework for extracting high-level features for multimodal-aware recommendation. Our main purpose is to provide a unified, shared tool that supports practitioners and researchers in processing and extracting multimodal features used as side information in recommender systems. Concretely, Ducho comprises three main modules: Dataset, Extractor, and Runner. The multimodal extraction pipeline can be highly customized through a Configuration component that lets users set up the modalities involved (i.e., audio, visual, textual), the sources of multimodal information (i.e., items and/or user-item interactions), and the pre-trained models along with their main extraction parameters. To show how Ducho works in different scenarios and settings, we propose three demos covering the extraction of (i) visual/textual items features, (ii) audio/textual items features, and (iii) textual items/interactions features. They can be run locally, in Docker (as we also dockerize Ducho), and on Google Colab. As future directions, we plan to: (i) support all available backends (i.e., TensorFlow, PyTorch, and Transformers) for feature extraction across all modalities; (ii) implement a general extraction model interface that lets users follow the same naming/indexing scheme for all pre-trained models and their extraction layers; (iii) integrate the extraction of low-level multimodal features.
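As a concrete illustration of the per-modality code that such a configuration-driven pipeline is meant to unify, the sketch below extracts high-level visual and textual item features with two of the backends mentioned above, PyTorch/torchvision and Transformers. This is not Ducho's actual API: the chosen pre-trained models (ResNet50 and all-MiniLM-L6-v2), the extraction layers, and the helper functions are assumptions made purely for the example.

```python
# Illustrative sketch only (not Ducho's API): hand-written visual and textual
# item feature extraction of the kind that Ducho's Dataset/Extractor/Runner
# modules and Configuration component are designed to replace.
import torch
from PIL import Image
from torchvision import models, transforms
from transformers import AutoModel, AutoTokenizer

# Visual backend: a pre-trained ResNet50; the pooled 2048-d layer is the feature.
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
resnet.fc = torch.nn.Identity()  # drop the classifier head, keep pooled features
resnet.eval()
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Textual backend: a pre-trained sentence encoder from the Transformers library.
encoder_name = "sentence-transformers/all-MiniLM-L6-v2"  # assumed model choice
tokenizer = AutoTokenizer.from_pretrained(encoder_name)
text_model = AutoModel.from_pretrained(encoder_name)
text_model.eval()

@torch.no_grad()
def visual_embedding(image_path: str) -> torch.Tensor:
    """Return a 2048-d embedding for one item image."""
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    return resnet(image).squeeze(0)

@torch.no_grad()
def textual_embedding(description: str) -> torch.Tensor:
    """Return a mean-pooled sentence embedding for one item description."""
    batch = tokenizer(description, return_tensors="pt", truncation=True)
    hidden = text_model(**batch).last_hidden_state  # (1, seq_len, hidden_dim)
    return hidden.mean(dim=1).squeeze(0)  # no padding for a single sentence
```

In Ducho, the equivalent choices (modality, source, backend, pre-trained model, and extraction layer) are declared once in the Configuration component rather than hard-coded in per-modality scripts like this one.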

ACKNOWLEDGMENTS

This work was partially supported by the following projects: Secure Safe Apulia, MISE CUP: I14E20000020001, CTEMT - Casa delle Tecnologie Emergenti Comune di Matera, CT_FINCONS_III, OVS Fashion Retail Reloaded, LUTECH DIGITALE 4.0, KOINÈ.


This paper is available on arXiv under the CC BY 4.0 DEED license.