Privacy-Preserving Synthetic Data for ML: The Role of Masked Language Models

Written by languagemodels | Published 2025/04/08
Tech Story Tags: masked-language-modeling-(mlm) | synthetic-data-generation | conditional-density-estimation | tabular-data | machine-learning-utility-(mlu) | non-parametric-estimation | histogram-based-methods | data-imputation

TL;DR: This paper introduces MaCoDE, a method that reframes masked language modeling as conditional density estimation for generating synthetic tabular data. It achieves high machine learning utility, handles missing data, allows privacy control, and outperforms state-of-the-art methods on multiple real-world datasets.
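As a rough illustration of the core idea (not the paper's implementation), masked conditional density estimation over histogram bins can be sketched as: discretize each column into quantile bins, treat a masked cell's bin as a multi-class target conditioned on the observed cells, then sample a bin and a value inside it. The toy data, bin count, and the simple count-based conditional estimator below are all assumptions for illustration; the paper trains a neural model instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tabular data: two correlated continuous columns (illustrative only).
n = 2000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
X = np.column_stack([x1, x2])

# Step 1: discretize each column into quantile bins (histogram-based targets).
n_bins = 10
edges = [np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)) for j in range(2)]
B = np.column_stack([
    np.clip(np.digitize(X[:, j], edges[j][1:-1]), 0, n_bins - 1)
    for j in range(2)
])

# Step 2: "mask" column 1 and estimate its conditional bin distribution given
# the observed bin of column 0 -- a multi-class classification target. Here a
# simple count table stands in for the learned model.
cond = np.zeros((n_bins, n_bins))
for b0, b1 in B:
    cond[b0, b1] += 1
cond /= cond.sum(axis=1, keepdims=True)

# Step 3: generate a synthetic value for the masked cell by sampling a bin,
# then sampling uniformly within that bin's edges.
b0 = B[0, 0]
b1 = rng.choice(n_bins, p=cond[b0])
lo, hi = edges[1][b1], edges[1][b1 + 1]
synthetic_x2 = rng.uniform(lo, hi)
```

Because the targets are bins rather than raw values, the masked-prediction task becomes ordinary classification, which is what lets an MLM-style objective act as a non-parametric conditional density estimator.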

Table of Links

  1. Abstract & Introduction

  2. Proposal

    1. Classification Target
    2. Masked Conditional Density Estimation (MaCoDE)
  3. Theoretical Results

    1. With Missing Data
  4. Experiments

  5. Results

    1. Related Works
    2. Conclusions and Limitations
    3. References
  6. A1 Proof of Theorem 1

    1. A2 Proof of Proposition 1
    2. A3 Dataset Descriptions
  7. A4 Missing Mechanism

    1. A5 Experimental Settings for Reproduction
  8. A6 Additional Experiments

  9. A7 Detailed Experimental Results

A.1 Proof of Theorem 1

Proof. This proof is based on Theorem 6.11 of [50] and Theorem 1 of [29].

Thus, for every ϵ > 0,

(B) Furthermore, by the continuous mapping theorem and the algebra of the convergence in probability, for every ϵ > 0,

A.2 Proof of Proposition 1

A.3 Dataset Descriptions

Download links.

• abalone: https://archive.ics.uci.edu/dataset/1/abalone

• banknote: https://archive.ics.uci.edu/dataset/267/banknote+authentication

• breast: https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic

• concrete: https://archive.ics.uci.edu/dataset/165/concrete+compressive+strength

• covertype: https://www.kaggle.com/datasets/uciml/forest-cover-type-dataset

• kings: https://www.kaggle.com/datasets/harlfoxem/housesalesprediction

• letter: https://archive.ics.uci.edu/dataset/59/letter+recognition

• loan: https://www.kaggle.com/datasets/teertha/personal-loan-modeling

• redwine: https://archive.ics.uci.edu/dataset/186/wine+quality

• whitewine: https://archive.ics.uci.edu/dataset/186/wine+quality

Authors:

(1) Seunghwan An, Department of Statistical Data Science, University of Seoul, S. Korea ([email protected]);

(2) Gyeongdong Woo, Department of Statistical Data Science, University of Seoul, S. Korea ([email protected]);

(3) Jaesung Lim, Department of Statistical Data Science, University of Seoul, S. Korea ([email protected]);

(4) ChangHyun Kim, Department of Statistical Data Science, University of Seoul, S. Korea ([email protected]);

(5) Sungchul Hong, Department of Statistics, University of Seoul, S. Korea ([email protected]);

(6) Jong-June Jeon (corresponding author), Department of Statistics, University of Seoul, S. Korea ([email protected]).


This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.


Published by HackerNoon on 2025/04/08