# Introduction and Background

# The Privacy Problem in Machine Learning

Machine learning is the foundation of popular Internet services such as image and speech recognition, natural language translation, and recommendation systems, with many companies offering machine learning services via APIs (e.g., Google and Amazon).

However, these models are often trained on sensitive user data and can inadvertently leak information about that data, posing a threat to user privacy. This paper focuses on “membership inference attacks,” where an attacker infers from the model’s outputs whether a specific data point was part of the model’s training set.

# Machine Learning Background

Machine learning is broadly categorized into supervised and unsupervised learning. Supervised learning, the setting considered in this paper, uses labeled data to train models to predict outputs from inputs.

A common issue in machine learning is overfitting, where the model performs well on training data but poorly on unseen data. Overfitted models are more likely to memorize training data, leading to privacy leaks, as they retain too much information about the data they were trained on.

Well-regularized models should avoid overfitting, generalizing well to new data without revealing sensitive information from the training data.
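
To make the overfitting discussion concrete, here is a minimal sketch (not from the paper) that measures the overfitting gap as the difference between training and test accuracy, and shows how an L2 penalty (scikit-learn's `alpha` parameter) is one way to shrink that gap. The dataset and model choices are purely illustrative.

```python
# Illustrative only: measure the overfitting gap (train acc - test acc)
# for a weakly vs. strongly regularized neural network.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_classes=4,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for alpha in (0.0001, 1.0):                      # weak vs. strong L2 penalty
    model = MLPClassifier(alpha=alpha, max_iter=500,
                          random_state=0).fit(X_tr, y_tr)
    gap = model.score(X_tr, y_tr) - model.score(X_te, y_te)
    print(f"alpha={alpha}: overfitting gap = {gap:.3f}")
```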

# Privacy in Machine Learning

Machine learning models can unintentionally leak sensitive information about the data they were trained on.

Two primary types of privacy risks are:

Population-level inference: Inferring general patterns about the population used to train the model, which can reveal sensitive characteristics.

Membership inference: Determining if a specific individual’s data was included in the training set.

This paper focuses on membership inference attacks, as protecting the privacy of training set members is both practical and critical for users.

The risk is higher for models trained on private or sensitive data, such as healthcare records.

# Problem Statements

The core problem studied in this paper is membership inference attacks.
In a black-box setting, the attacker can query the model to get outputs but does not have access to the model’s structure or parameters.
The attacker’s goal is to determine if a specific data point was part of the model’s training set based on the model’s output.
The paper assumes that the attacker might have some background knowledge, such as understanding the input format or statistical distribution of the dataset.
The attack relies on detecting subtle differences in how the model behaves with data it has seen before (training data) versus new data (non-training data).

# Methods

# Main Steps

The attacker queries the target model with a data record and obtains the model’s prediction on that record. The prediction is a vector of probabilities, one per class, indicating how likely the record is to belong to each class. This prediction vector, along with the label of the target record, is passed to the attack model, which infers whether the record was in or out of the target model’s training dataset.

*Figure: Main steps of the membership inference attack (MIA).*
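
A minimal sketch of this query-and-infer step, assuming scikit-learn-style classifiers: `target_model` exposes `predict_proba`, and `attack_models` is a dictionary with one attack classifier per class label (the paper trains one attack model per class). The names are illustrative, not the paper's code.

```python
import numpy as np

def infer_membership(target_model, attack_models, record, true_label):
    """Return True if the attack predicts `record` was a training member."""
    # 1. Query the (black-box) target model and obtain the prediction
    #    vector: one probability per class.
    prediction_vector = target_model.predict_proba(record.reshape(1, -1))[0]

    # 2. Feed the prediction vector, routed by the record's label, to the
    #    attack model trained for that label.
    attack_input = prediction_vector.reshape(1, -1)
    return bool(attack_models[true_label].predict(attack_input)[0])
```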

# Shadow Models

The attacker uses the input and output data from shadow models to train the attack model. Specifically, the prediction results from the shadow models (confidence vectors or other outputs) are used as training data to train a binary classifier that can predict whether a particular data point was part of the shadow model’s training set.

Since the shadow models behave similarly to the target model, the attack model trained on their outputs can also infer which data points were used to train the target model.

*Figure: Shadow models.*
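
A minimal sketch of how shadow models generate labelled training data for the attack model, assuming scikit-learn `MLPClassifier` shadow models: records a shadow model was trained on are labelled "in" (1) and held-out records "out" (0). The dataset format and model choice are assumptions for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def build_attack_training_data(shadow_datasets):
    """shadow_datasets: list of (X_train, y_train, X_out, y_out) tuples,
    one per shadow model, drawn from data similar to the target's data.
    (Assumes every shadow training set contains all classes.)"""
    pred_vectors, class_labels, membership = [], [], []
    for X_train, y_train, X_out, y_out in shadow_datasets:
        # Train one shadow model on data whose membership we fully control.
        shadow = MLPClassifier(max_iter=500).fit(X_train, y_train)
        # Prediction vectors on the shadow's own training records -> "in" (1).
        pred_vectors.append(shadow.predict_proba(X_train))
        class_labels.append(y_train)
        membership.append(np.ones(len(X_train), dtype=int))
        # Prediction vectors on records the shadow never saw -> "out" (0).
        pred_vectors.append(shadow.predict_proba(X_out))
        class_labels.append(y_out)
        membership.append(np.zeros(len(X_out), dtype=int))
    return (np.vstack(pred_vectors),
            np.concatenate(class_labels),
            np.concatenate(membership))
```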

# Training the Attack Model

The inputs and outputs of the shadow models are used to train the attack model:

The attack model is a binary classifier that learns to distinguish between “training data” (members) and “non-training data” (non-members).

It uses the prediction vectors from the shadow models to learn how to classify data points as members or non-members, as sketched in the example below.

*Figure: Main steps of training the attack model.*
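
A minimal sketch of this step, consuming the shadow-model outputs produced above: one binary member/non-member classifier is trained per class label, as in the paper. The choice of `RandomForestClassifier` is an illustrative assumption, not the paper's exact attack model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_attack_models(pred_vectors, class_labels, in_out_labels):
    """Return one member/non-member classifier per class label."""
    attack_models = {}
    for label in np.unique(class_labels):
        idx = class_labels == label                 # records of this class only
        clf = RandomForestClassifier(n_estimators=100)
        clf.fit(pred_vectors[idx], in_out_labels[idx])
        attack_models[label] = clf
    return attack_models
```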

# Experimental Evaluation

# Datasets and Target Models

The experiments use several public datasets, including image datasets (e.g., CIFAR-10), a location dataset, and sensitive datasets such as hospital-stay records; their type, size, and nature are described in the paper.

The authors evaluate their inference attacks on three types of target models: two constructed by cloud-based “machine learning as a service” platforms and one implemented locally. All attacks treat the models as black boxes: the attacker does not know the model’s type or structure, nor the values of the hyper-parameters used during training.

The training set and the test set of each target and shadow model are randomly selected from the respective datasets, have the same size, and are disjoint. There is no overlap between the datasets of the target model and those of the shadow models, but the datasets used for different shadow models can overlap with each other.
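
A minimal sketch of this splitting scheme under the stated constraints: the target's train and test sets are equal-sized and disjoint, shadow models never touch the target's data, but different shadow models may share records with each other. All names and sizes are illustrative.

```python
import numpy as np

def split_target_and_shadows(n_records, set_size, n_shadows, seed=0):
    """Partition record indices 0..n_records-1 into target and shadow splits
    (the leftover shadow pool must hold at least 2 * set_size records)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_records)
    # Target model: disjoint train and test sets of equal size.
    target_train = order[:set_size]
    target_test = order[set_size:2 * set_size]
    # Shadow models draw only from the remaining records, so they never
    # overlap with the target's data; different shadows may overlap.
    shadow_pool = order[2 * set_size:]
    shadows = []
    for _ in range(n_shadows):
        picks = rng.choice(shadow_pool, 2 * set_size, replace=False)
        shadows.append((picks[:set_size], picks[set_size:]))  # disjoint train/test
    return target_train, target_test, shadows
```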

# Accuracy of the attack

The paper evaluates the attack by executing it on randomly reshuffled records from the target’s training and test datasets, and uses the standard precision and recall metrics to measure its accuracy.
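
A minimal sketch of this evaluation: the target's training records (members) and test records (non-members) are mixed, reshuffled, queried through the target model, and the attack's "member" predictions are scored with scikit-learn's precision and recall. Names are illustrative and reuse the per-class `attack_models` from the earlier sketches.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

def evaluate_attack(target_model, attack_models, members, non_members):
    """members / non_members: (X, y) tuples of the target's training and
    test records, respectively."""
    X = np.vstack([members[0], non_members[0]])
    y = np.concatenate([members[1], non_members[1]])
    truth = np.concatenate([np.ones(len(members[1]), dtype=int),
                            np.zeros(len(non_members[1]), dtype=int)])
    # Randomly reshuffle the mixed records before running the attack.
    order = np.random.permutation(len(truth))
    X, y, truth = X[order], y[order], truth[order]
    # Query the target model, then ask the per-class attack model.
    prediction_vectors = target_model.predict_proba(X)
    preds = np.array([int(attack_models[label].predict(vec.reshape(1, -1))[0])
                      for vec, label in zip(prediction_vectors, y)])
    # Precision: fraction of predicted members that truly are members.
    # Recall: fraction of true members that the attack identifies.
    return precision_score(truth, preds), recall_score(truth, preds)
```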

The test accuracy of the target neural-network models with the largest training datasets is low relative to their training accuracy, which means the models are heavily overfitted to their training sets.

For the machine learning APIs from different companies, the paper trained models on the same data and evaluated the attack against each; the results show that models trained using the Google Prediction API exhibit the largest leakage.

For the Texas hospital-stay dataset and the location dataset, the paper evaluated the attack against Google-trained models.

The training accuracy of the Texas target model is 0.66 and its test accuracy is 0.51. Attack precision is mostly above 0.6; for half of the classes it is above 0.7, and it is above 0.85 for more than 20 classes.

The training accuracy of the location target model is 1 and its test accuracy is 0.66. Attack precision is between 0.6 and 0.8, with an almost constant recall of 1.
The attacks against the Google-trained location model show that the paper’s attacks are robust even if the attacker’s assumptions about the distribution of the target model’s training data are not very accurate.

For the majority of the target model’s classes, the paper’s attack achieves high precision. This demonstrates that a membership inference attack can be trained with only black-box access to the target model, and without any prior knowledge about the distribution of the target model’s training data, provided the attacker can efficiently generate inputs that the target model classifies with high confidence.

# Factors and Defenses

# Factors for Success of membership inference

Based on the evaluation, the paper identifies two factors that determine the success of a membership inference attack:

  1. the generalizability of the target model, and
  2. the diversity of its training data.

If the model overfits and does not generalize well to inputs beyond its training data, or if the training data is not representative, the model leaks information about its training inputs.

The paper also points out that overfitting is not the only reason why the inference attacks work. Different machine learning models, due to their different structures, “remember” different amounts of information about their training datasets, which leads to different amounts of information leakage even if the models are overfitted to the same degree.

# Mitigation strategies

The paper discusses several strategies for mitigating membership inference attacks, sketched in the example after the list:

  1. Restrict the prediction vector to top k classes.
  2. Coarsen the precision of the prediction vector.
  3. Increase entropy of the prediction vector.
  4. Use regularization.
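
A minimal sketch of the first three (output-side) mitigations applied to a prediction vector before it is returned to the client; regularization (item 4) acts at training time instead. Parameter names and default values are illustrative assumptions.

```python
import numpy as np

def mitigate(prediction_vector, top_k=3, decimals=2, temperature=5.0):
    p = np.asarray(prediction_vector, dtype=float)
    # 1. Restrict the vector to the top-k classes (others zeroed out).
    kept = np.argsort(p)[-top_k:]
    p = np.where(np.isin(np.arange(len(p)), kept), p, 0.0)
    # 2. Coarsen precision by rounding each probability.
    p = np.round(p, decimals)
    # 3. Increase entropy by re-normalizing with a softmax temperature > 1.
    logits = np.log(p + 1e-12) / temperature
    p = np.exp(logits) / np.exp(logits).sum()
    return p
```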

However, the paper evaluates these mitigation strategies and finds that the attack remains robust against them.

# Conclusion

The paper designs, implements, and evaluates the first membership inference attack against machine learning models, notably black-box models trained via commercial machine learning APIs.

The paper’s key technical innovation is the shadow training technique, which trains an attack model to distinguish the target model’s outputs on members versus non-members of its training dataset.

Membership in hospital-stay and other health-care datasets is sensitive from the privacy perspective. Therefore, this method may have substantial practical privacy implications.
