This is the English version of this blog post. The translation may contain errors; if you find any mistakes, please contact me with suggestions.
# Preliminaries, Definition and Classification
# Preliminaries: Machine Learning
Machine learning (ML) is the study of computer algorithms that improve automatically through experience and by learning from data. Generally, it can be divided into two categories:
Supervised Learning: Uses labeled data, where input-output pairs are known, to train models. It aims to predict or classify new data based on learned patterns, minimizing the error between predicted and actual outcomes.
Unsupervised Learning: Works with unlabeled data to identify hidden patterns or structures. It clusters or finds relationships within the data without predefined labels.
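As a tiny illustration of the two categories (not from the original post), the following scikit-learn sketch uses the Iris dataset: a supervised classifier learns from the given labels, while an unsupervised clustering algorithm groups the same records without ever seeing them.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Supervised: the labels y are given, and the model learns to predict them.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("supervised accuracy on the training data:", clf.score(X, y))

# Unsupervised: only X is used; the algorithm groups similar records by itself.
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X)
print("cluster assignments of the first five records:", clusters[:5])
```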
# Membership Inference Attack: Definition
A membership inference attack (MIA) is a method of determining whether a given data record was used to train a targeted ML model. The usual approach is to train similar models, called shadow models, and then train a classifier that checks whether a record was part of the target model's training set.
# Shadow models
The main idea of MIA is to build shadow models that reproduce the membership behavior of the target model.
# Basic Taxonomy of MIA
# Based on Available Information
Based on the information about the target model that attackers can obtain, MIAs can be divided into two categories:
White-box attack: Attackers can obtain all the information about the ML model, including the data distribution, training method and relevant parameters.
Black-box attack: Attackers can only obtain restricted information, such as limited knowledge of the data distribution, training method and parameters.
Compared with white-box attacks, black-box attacks have access to less information, which makes them harder to carry out. However, the impact of a successful black-box attack can be larger. Today, mainstream research also focuses on black-box attacks.
# Based on Prediction Vectors
The prediction vector is the output used to judge whether a record belongs to the training set. Current MIA research can also be categorized according to the prediction vectors, as shown in the following figure:
# Based on the Judgment Method
# Binary Classifier Based MIA
Data records can be classified as members or non-members by training a binary classifier on the model's outputs. The main procedure is as follows:
- Training the shadow models: Attackers train multiple shadow models on datasets drawn from the same or a similar distribution as the target model's training set.
- Collecting prediction vectors: Attackers query each shadow model and collect the prediction vector of every data record. Records from a shadow training set are labeled "Member" and records from the corresponding test set are labeled "Non-member".
- Training the attack model: Using the labeled "member" and "non-member" records, attackers train a binary classifier as the attack model.
In this way, the complex problem of recognizing members and non-members of the target model is converted into a binary classification problem.
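To make the pipeline concrete, below is a minimal Python sketch of this attack under simplifying assumptions: the shadow models are already-trained scikit-learn classifiers (`shadow_models`), each paired with its own training and held-out set, and a random forest serves as the attack model. All names and the choice of classifiers are illustrative, not taken from any specific paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def build_attack_dataset(shadow_models, shadow_train_sets, shadow_test_sets):
    """Query each shadow model on its own train/test records and label the
    resulting prediction vectors as member (1) or non-member (0)."""
    vectors, labels = [], []
    for model, (x_tr, _), (x_te, _) in zip(shadow_models, shadow_train_sets, shadow_test_sets):
        vectors.append(model.predict_proba(x_tr)); labels.append(np.ones(len(x_tr)))
        vectors.append(model.predict_proba(x_te)); labels.append(np.zeros(len(x_te)))
    return np.vstack(vectors), np.concatenate(labels)

def train_attack_model(shadow_models, shadow_train_sets, shadow_test_sets):
    """Train the binary member/non-member classifier on the shadow data."""
    X_attack, y_attack = build_attack_dataset(shadow_models, shadow_train_sets, shadow_test_sets)
    attack_model = RandomForestClassifier(n_estimators=100)
    attack_model.fit(X_attack, y_attack)
    return attack_model

# At attack time, a record is declared a member if the attack model says so
# for the target model's prediction vector:
# is_member = attack_model.predict(target_model.predict_proba(x_query))
```

In practice, attacks in the literature often train one attack model per output class, but the core member/non-member classification step is the same as sketched here.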
# Metric Based Membership Inference Attacks
Metric-based MIA computes relevant metrics from the collected prediction vectors and infers membership by comparing these metrics against threshold values.
Compared with training a binary classifier, this approach is simpler and consumes fewer computational resources. The recent research directions on the metrics and settings used are summarized below.
Currently, research on metric-based MIA falls into the following categories:
Prediction Correctness Based MIA: If the target model predicts the input record x correctly, the attacker recognizes it as a member. The intuition is that the target model is more likely to predict correctly on records it was actually trained on.
Prediction Loss Based MIA: If the loss of the target model on a record is below a threshold, the attacker recognizes it as a member. The intuition is that for records in the training data, the per-record loss is close to the model's (small) overall training loss.
Prediction Confidence Based MIA: If the prediction confidence for a record is larger than a threshold, it is recognized as a member. The intuition is that training minimizes the model's error on its training records, pushing their prediction confidence close to 1.
Prediction Entropy Based MIA: If the prediction entropy for an input record is lower than a threshold, it is recognized as a member. The intuition is that the target model's prediction entropy on training data is lower than on test data.
Modified Prediction Entropy Based MIA: Some work argues that plain prediction entropy does not take the ground-truth label into account and may therefore misjudge some records, so several papers modify how the prediction entropy is computed.
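The decision rules above are simple enough to sketch directly. The following Python functions are illustrative implementations, where `probs` is the target model's prediction vector for one record, `y` is its ground-truth label, and the thresholds are hypothetical values an attacker would calibrate, for example on shadow models.

```python
import numpy as np

def correctness_attack(probs, y):
    return int(np.argmax(probs) == y)                 # member if predicted correctly

def loss_attack(probs, y, tau=0.1):
    return int(-np.log(probs[y] + 1e-12) <= tau)      # member if the cross-entropy loss is small

def confidence_attack(probs, y, tau=0.9):
    return int(probs[y] >= tau)                       # member if the confidence is high

def entropy_attack(probs, tau=0.5):
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    return int(entropy <= tau)                        # member if the prediction entropy is low

def modified_entropy_attack(probs, y, tau=0.5):
    # The modified entropy also uses the true label y, so that confident but
    # wrong predictions are no longer mistaken for members.
    p_y = probs[y]
    others = np.delete(probs, y)
    m_entropy = -(1 - p_y) * np.log(p_y + 1e-12) - np.sum(others * np.log(1 - others + 1e-12))
    return int(m_entropy <= tau)                      # member if the modified entropy is low
```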
# Relevant Research
# Research on Classification Models
Since Shokri et al. introduced this attack method, there has been a growing body of research focused on this direction. Salem et al. discussed the assumptions of Membership Inference Attacks (MIA) and attempted to relax the implementation conditions, demonstrating that two of the shadow model assumptions are not necessary and proposing an indicator-based MIA approach. Yeom et al. also proposed two indicator-based MIA methods; Long et al. achieved MIA attacks on certain data by focusing on data with unique effects on the target model, enabling accurate inference in generalized models with similar training and testing accuracy.
Additionally, existing research has also targeted more restricted MIAs. Li and Zhang proposed transfer-based and perturbation-based MIAs. Transfer-based MIAs construct shadow models to simulate the target model, using the shadow model’s confidence to determine membership; perturbation-based MIAs introduce noise to create adversarial examples and distinguish members based on the severity of the perturbation. Choquette et al. introduced data augmentation-based MIAs and decision boundary distance-based MIAs. Data augmentation attacks target common data augmentation phenomena in machine learning systems, creating additional records through different augmentation strategies to query the target model for predictions. Decision boundary attacks estimate the distance of records to the model boundary, similar to Li and Zhang’s attacks. Successful MIA cases suggest that machine learning models may be more vulnerable to MIA than previously anticipated.
Aside from black-box MIA attacks, Nasr et al. introduced white-box MIA, which can be seen as an extension of black-box MIA, enhancing attack efficiency by leveraging additional information. They use the gradient of the target model's prediction loss for inference and train a model to distinguish between members and non-members using the SGD algorithm. However, Leino and Fredrikson pointed out that the assumptions of this method are too stringent, requiring attackers to know the approximate distribution of the target dataset. They proposed a Bayes-optimal attack-based MIA method, enabling MIA without background knowledge of the target model.
# Research on Generative Models
Current research on generative models focuses mainly on generative adversarial networks (GANs), whose structure can be illustrated as follows:
Hayes et al. were the first to propose Membership Inference Attacks (MIA) against generative models. For white-box attacks, attackers collect all records and compute confidence scores to make inferences; for black-box attacks, attackers collect records from the generator to train a local GAN that mimics the target GAN, and then use the local GAN discriminator for inference. Hilprecht et al. introduced two additional attacks: a Monte Carlo-based black-box attack and a VAE-based white-box attack. Hilprecht et al. proposed a set-based attack to determine whether a data point belongs to a set, while Liu et al. introduced a similar co-membership inference attack, determining dataset membership by analyzing the distance of a data point to the target data. Chen et al. proposed a general method where attackers continuously reconstruct the attack model through optimization, calculate the distance between the generated results of the attack model and the target model, and estimate the probability of data membership based on this distance.
# Research on Embedding Models:
Current research primarily focuses on text and image embedding models. For text embedding models, attacks aim to infer membership of words or sentence pairs within a sliding window, using similarity scores to determine if they belong to a predefined set. For graph embedding models, attack methods involve using shadow models and confidence scores to infer whether nodes in the graph belong to specific categories, addressing node classification issues.
# Research on Regression Models:
Gupta et al. were the first to conduct MIA (Membership Inference Attack) research on regression models for age prediction, achieving attacks through the construction of a white-box binary classification model.
# Research in Federated Learning:
In federated learning, attackers can be either the central server or some of the participating clients. They can implement MIA by determining whether certain data was used in training the global model. Melis et al. were the first to propose a gradient-based MIA by analyzing the update mechanism of the RNN training embeddings. Truex et al. considered heterogeneous FL (federated learning), analyzing differences in the parameters aggregated from different clients. Nasr et al. discussed how gradient ascent attacks can actively interfere with FL training. Hu et al. proposed source inference attacks aimed at determining which participants hold training records in FL. They argue that existing MIA attacks in FL overlook the source information of training members, which could lead to further privacy issues.
# Factors for a Successful MIA
# Overfitting of Target Models
Many studies have pointed out that overfitting of target ML models is a significant factor in the leakage of original datasets. Specifically:
- Models like DNNs are highly parameterized, which enhances their ability to handle large datasets but also makes them record a lot of irrelevant information.
- Training machine learning models often requires many epochs, making them more prone to memorizing the content of the dataset.
- Machine learning datasets cannot fully represent real-world data.
Existing work indicates that for a classification system that overfits its training data, attackers can achieve an attack success probability higher than the 50% baseline of random guessing.
# Features of the Model Itself:
When the decision boundary of the target model is not sensitive to the training data used, the effectiveness of MIA attacks is low. Current research shows that among DNN models, logistic regression models, Naive Bayes models, k-nearest neighbor models, and decision tree models, decision tree models have the highest attack accuracy, while the simple Naive Bayes algorithm has the lowest.
# Diversity of the Training Dataset:
When the training dataset used by the target model is highly diverse, it helps the model generalize better to test data. Consequently, the impact of MIA on the model will be smaller.
# Attacker’s Knowledge of the Target Model:
Existing research on MIA generally makes certain assumptions about the attacker: the attacker knows the relevant distribution of the training data and can construct a suitable shadow dataset based on this distribution. High-accuracy shadow models constructed under this assumption are needed for effective attacks.
# Research on Defense against MIA
# Confidence Score Masking
This method is mainly used to defend against black-box attacks by obfuscating the true confidence scores returned by the target classifier. It includes the following three approaches:
- The target classifier does not provide the full prediction vector but only the top few confidence scores.
- The target classifier only provides predicted labels when the attacker provides data input.
- Noise is added to the returned vector.
These three methods affect the prediction vector but do not result in a loss of prediction accuracy.
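As an illustration, here is a minimal Python sketch of the three masking strategies applied to a prediction vector `probs` before it is returned to the querying party; the parameters (k, noise scale) are hypothetical and would need tuning so that prediction accuracy is preserved.

```python
import numpy as np

def top_k_only(probs, k=3):
    """Return only the k largest confidence scores (the rest are zeroed, then renormalized)."""
    masked = np.zeros_like(probs)
    top = np.argsort(probs)[-k:]
    masked[top] = probs[top]
    return masked / masked.sum()

def label_only(probs):
    """Return just the predicted label instead of the full prediction vector."""
    return int(np.argmax(probs))

def noisy_vector(probs, scale=0.05, rng=np.random.default_rng(0)):
    """Add small random noise to the vector while keeping it a valid distribution."""
    noisy = np.clip(probs + rng.normal(0.0, scale, size=probs.shape), 1e-6, None)
    return noisy / noisy.sum()
```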
# Regularization
Regularization mitigates MIA attack strength by reducing model overfitting.
Existing regularization methods include traditional techniques such as L2-norm regularization, dropout, data augmentation, model stacking, early stopping, and label smoothing. These methods reduce overfitting by narrowing the gap between the model's behavior on training samples and on unseen data, thereby also reducing the strength of MIA attacks.
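A brief sketch of how several of these traditional regularizers can be combined in practice is shown below, using TensorFlow/Keras; the architecture and hyperparameters are purely illustrative assumptions, not a recipe from the cited work.

```python
import tensorflow as tf

def build_regularized_classifier(num_features, num_classes):
    """A small classifier combining L2 weight decay, dropout and label smoothing."""
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(num_features,),
                              kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
        tf.keras.layers.Dropout(0.5),   # dropout discourages memorizing individual records
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1),
        metrics=["accuracy"],
    )
    return model

# Early stopping halts training before the model starts to overfit the training set:
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                              restore_best_weights=True)
# model.fit(x_train, y_train_onehot, validation_split=0.1,
#           epochs=50, callbacks=[early_stop])
```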
Additionally, specially designed regularization schemes such as adversarial regularization and Mixup + MMD (Maximum Mean Discrepancy) can also defend against MIA by introducing new regularization mechanisms that decrease the differences between members and non-members. Compared to masking techniques, regularization can resist both black-box and white-box attacks, since it modifies the model itself rather than only post-processing the returned outputs.
# Knowledge Distillation
Knowledge distillation refers to training a smaller student model using a larger teacher model, transferring knowledge from the large model to the small one so that the smaller model reaches a similar level of performance. Based on this, existing research has introduced methods such as DMP (Distillation for Membership Privacy), CKD, and PCKD. The steps of DMP are as follows:
- Train an unprotected teacher model and use it to label the records of an unlabeled reference dataset.
- Select the labeled records with low prediction entropy as the transfer set.
- Train the protected classifier on this transfer set.
Additionally, some research proposes Complementary Knowledge Distillation (CKD) and Pseudo Complementary Knowledge Distillation (PCKD). In these methods, the transfer data for knowledge distillation comes from the private training set itself. CKD and PCKD eliminate the need for public data, which may be difficult to obtain in some applications, making knowledge distillation a more practical defense for mitigating MIA attacks on machine learning models.
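Below is a minimal sketch of a DMP-style distillation defense, assuming `teacher` and `student` are already-built tf.keras classifiers and `x_ref` is an unlabeled public reference set stored as a NumPy array; the names and the entropy threshold are illustrative assumptions, not taken from the original papers.

```python
import numpy as np
import tensorflow as tf

def select_transfer_set(teacher, x_ref, entropy_threshold=0.5):
    """Label the reference data with the unprotected teacher and keep only the
    records on which the teacher's prediction entropy is low (high confidence)."""
    probs = teacher.predict(x_ref, verbose=0)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    keep = entropy < entropy_threshold
    return x_ref[keep], probs[keep]            # soft labels come from the teacher

def distill(student, x_transfer, soft_labels, epochs=20):
    """Train the protected student only on the teacher's soft predictions,
    so it never sees the private training records directly."""
    student.compile(optimizer="adam", loss=tf.keras.losses.KLDivergence())
    student.fit(x_transfer, soft_labels, epochs=epochs, verbose=0)
    return student
```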
# Differential Privacy
Differential privacy protects the original data by adding carefully calibrated noise during data analysis or model training. When a deep learning model is trained with a differentially private mechanism and the privacy budget is small enough, the trained model will not retain information specific to individual users. Therefore, differentially private models can limit the success rate of MIA attacks based on the model alone. The current research directions are outlined below:
Differential privacy provides theoretical protection for member privacy in training records and can mitigate MIAs in classification and generative models, regardless of whether the attacker is in a black-box or white-box setting. Despite its widespread and effective application, a drawback is its difficulty in providing an acceptable privacy-utility tradeoff in complex learning tasks. Additionally, differential privacy can also mitigate other forms of privacy attacks, such as attribute inference attacks and feature inference attacks, and is related to the robustness of models against adversarial examples.
Currently, the main research directions concerning differential privacy and MIA are:
The relationship between differential privacy and MIA: There are theoretical results and proofs on this, but practical evaluations have not achieved good utility.
Privacy-Utility Tradeoff: Existing research shows that current differential privacy performance in this regard is insufficient. Studies indicate that minority groups are more affected by MIAs, and differential privacy reduces model utility for these groups.
Training Methods: Current methods primarily include DP-SGD, with new methods like DP-Logits also being proposed (a minimal DP-SGD sketch follows this list).
Applications in Generative Models: Research shows that differential privacy can also defend against MIAs in generative models, with defense effectiveness related to generative quality and privacy budget 𝜖. Studies indicate that differential privacy limits overfitting and mitigates MIA.
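As an illustration of the DP-SGD idea mentioned above, the sketch below implements a single noisy gradient step for a linear softmax classifier in plain NumPy: each example's gradient is clipped to a fixed norm and Gaussian noise calibrated to that norm is added before the update. The model and hyperparameters are assumptions made for the example, not a production implementation.

```python
import numpy as np

def dp_sgd_step(W, x_batch, y_batch, lr=0.1, clip_norm=1.0, noise_multiplier=1.0,
                rng=np.random.default_rng(0)):
    """One DP-SGD update for a linear softmax classifier W of shape (classes, features)."""
    grads = []
    for x, y in zip(x_batch, y_batch):
        logits = W @ x
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        probs[y] -= 1.0                      # gradient of cross-entropy w.r.t. the logits
        g = np.outer(probs, x)               # per-example gradient of W
        g *= min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))   # clip each gradient to norm C
        grads.append(g)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=W.shape)
    noisy_mean = (np.sum(grads, axis=0) + noise) / len(x_batch)
    return W - lr * noisy_mean
```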
# Possible future directions
# In Membership Inference Attacks
Attacks on Regularized Models: MIA systems often rely on the overfitting of machine learning systems, but this assumption is challenged by advancements in regularization techniques; attacks on well-regularized, non-overfitted models are still largely unexplored.
Attacks on Self-Supervised Models: Self-supervised models are becoming widespread in NLP and computer vision, and attacks on these models are still largely unknown.
Attacks on Adversarial Machine Learning: Adversarial machine learning shares some similarities and differences with membership inference attacks; combining these approaches could be a potential research direction.
Attacks on New Machine Learning Models (e.g., Contrastive Learning and Meta-Learning): These models differ significantly from traditional ones, and many areas for research remain in attacking them.
Attacks on Federated Learning: Existing MIAs are mainly applicable to homogeneous federated learning, with limited research on heterogeneous federated learning.
Applications Related to MIA: Includes source inference attacks in federated learning and deeper privacy protection studies through MIA audits of data contributions to ML models.
# In the Defense Against Membership Inference Attacks
Defense Against Unsupervised Learning Models: Unsupervised learning models struggle with overfitting due to a lack of data labels, and research in this area is limited.
Defense Against Generative Models: Possible defenses include methods such as knowledge distillation and reinforcement learning to avoid leakage of raw data through the outputs of generative models.
Balancing Privacy and Utility: Existing differential privacy protections often add significant noise to classifier gradients, reducing prediction accuracy. Balancing privacy and utility remains an area for research.
Privacy Defenses in Federated Learning: With increasing privacy attacks in federated learning, developing defensive technologies is crucial, with differential privacy being a potential future direction.