Authors:
(1) Xueying Mao, School of Computer Science, Fudan University, China (xymao22@[email protected]);
(2) Xiaoxiao Hu, School of Computer Science, Fudan University, China ([email protected]);
(3) Wanli Peng, School of Computer Science, Fudan University, China ([email protected]);
(4) Zhenliang Gan, School of Computer Science, Fudan University, China (zlgan23@[email protected]);
(5) Qichao Ying, School of Computer Science, Fudan University, China ([email protected]);
(6) Zhenxing Qian, School of Computer Science, Fudan University, China, corresponding author ([email protected]);
(7) Sheng Li, School of Computer Science, Fudan University, China ([email protected]);
(8) Xinpeng Zhang, School of Computer Science, Fudan University, China ([email protected]).
Editor's note: This is Part 4 of 7 of a study describing the development of a new method to hide secret messages in semantic features of videos, making it more secure and resistant to distortion during online sharing. Read the rest below.
Datasets. We use VGGFace2 [61] for training and FFHQ [15] for validation. We crop the facial areas and resize them to a fixed 224 × 224 resolution for use as input images. For evaluation, we randomly select 100 videos from DeepFake MNIST+ [65].
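The data preparation described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `select_eval_videos` and `square_crop_box`, the fixed seed, and the choice of a square crop before resizing are all assumptions made for illustration. Only the sample size of 100 and the 224 × 224 target resolution come from the text.

```python
import random

TARGET_SIZE = (224, 224)  # fixed input resolution stated in the text


def select_eval_videos(video_ids, k=100, seed=0):
    """Pick k distinct videos uniformly at random, reproducibly.

    The seed is an assumption; the paper does not report one.
    """
    if len(video_ids) < k:
        raise ValueError("not enough videos to sample from")
    rng = random.Random(seed)
    # Sorting first makes the result independent of the input order.
    return rng.sample(sorted(video_ids), k)


def square_crop_box(face_box, img_w, img_h):
    """Expand a face box (left, top, right, bottom) to a square inside the image.

    Hypothetical helper: a square crop avoids distorting the face when the
    region is later resized to TARGET_SIZE.
    """
    left, top, right, bottom = face_box
    side = min(max(right - left, bottom - top), img_w, img_h)
    cx, cy = (left + right) // 2, (top + bottom) // 2
    # Clamp the square so it stays fully inside the image bounds.
    new_left = min(max(cx - side // 2, 0), img_w - side)
    new_top = min(max(cy - side // 2, 0), img_h - side)
    return (new_left, new_top, new_left + side, new_top + side)
```

The returned box could then be passed to any image library's crop and resize routines; the face detector that produces `face_box` is outside the scope of this sketch.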
Evaluation Metrics. We measure capacity in Bits Per Frame (BPF), the number of secret-message bits carried by each frame of the stego video. To assess robustness, we evaluate the accuracy of secret message extraction under various scenarios. For security assessment, we use three steganalysis methods [62, 63, 64] to demonstrate our method’s anti-detection capability.
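As a rough illustration of the two quantitative metrics, the sketch below computes BPF and a bit-level extraction accuracy. It assumes that accuracy means the fraction of extracted bits that match the embedded bits, which the text does not define explicitly. The function names are hypothetical.

```python
def bits_per_frame(total_message_bits, num_frames):
    """Average number of secret-message bits carried by each stego frame."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return total_message_bits / num_frames


def extraction_accuracy(embedded_bits, extracted_bits):
    """Fraction of positions where the extracted bit equals the embedded bit.

    Assumed definition: plain bit-level accuracy over one message.
    """
    if len(embedded_bits) != len(extracted_bits):
        raise ValueError("bit sequences must have the same length")
    if not embedded_bits:
        raise ValueError("bit sequences must be non-empty")
    matches = sum(a == b for a, b in zip(embedded_bits, extracted_bits))
    return matches / len(embedded_bits)
```

For example, a message of 1500 bits spread over 100 frames gives `bits_per_frame(1500, 100) == 15.0`, and one flipped bit out of four gives an accuracy of 0.75.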
Baselines. To ensure a fair comparison, we align the capacity of HiDDeN and LSB with that of our method. Details of the HiDDeN and LSB configurations are available in the supplementary materials. PWRN, due to its PU-based design, has a limited capacity of 15 BPF when input images are resized to 224 × 224.
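To make the idea of a capacity-limited LSB baseline concrete, here is a textbook least-significant-bit sketch that writes a fixed number of bits into one frame. It is a generic illustration, not the baseline configuration from the supplementary materials: representing the frame as a flat list of 8-bit pixel values and embedding into the first pixels are both assumptions.

```python
def lsb_embed(pixels, bits):
    """Write each bit into the least significant bit of the leading pixels.

    pixels: flat list of 8-bit values for one frame (assumed layout).
    bits:   list of 0/1 values; its length is the per-frame capacity.
    """
    if len(bits) > len(pixels):
        raise ValueError("message is longer than the frame can hold")
    stego = list(pixels)  # copy so the cover frame is left untouched
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it
    return stego


def lsb_extract(pixels, num_bits):
    """Read back the least significant bit of the first num_bits pixels."""
    return [p & 1 for p in pixels[:num_bits]]
```

Capping `len(bits)` at the same per-frame budget for every method is one simple way to compare methods at equal BPF. Each modified pixel changes by at most 1, which is why plain LSB is hard to see but fragile under recompression.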
This paper is available on arXiv under a CC 4.0 license.