Here’s how machine learning can violate your privacy

Machine learning has pushed the boundaries in several fields, including personalized medicine, self-driving cars and customized advertisements. Research has shown, however, that these systems memorize aspects of the data they were trained with in order to learn patterns, which raises concerns for privacy.

In statistics and machine learning, the goal is to learn from past data to make new predictions or inferences about future data. In order to achieve this goal, the statistician or machine learning expert selects a model to capture the suspected patterns in the data. A model applies a simplifying structure to the data, which makes it possible to learn patterns and make predictions.

Complex machine learning models have some inherent pros and cons. On the positive side, they can learn much more complex patterns and work with richer datasets for tasks such as image recognition and predicting how a specific person will respond to a treatment.

However, they also risk overfitting the data. This means that they make accurate predictions about the data they were trained on but also start to learn aspects of the data that are not directly related to the task at hand. This leads to models that do not generalize, meaning they perform poorly on new data that is of the same type but not exactly the same as the training data.
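As a rough illustration of overfitting, the sketch below compares two toy models on made-up numbers. The data, the function names and the "memorizer" model are all assumptions chosen for this example and are not taken from the research described here. The memorizer stores every training record and repeats the nearest one, while the simple model fits a straight line.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_line_model(xs, ys):
    """Simple model: two parameters summarize the whole training set."""
    slope, intercept = fit_line(xs, ys)
    return lambda x: slope * x + intercept


def make_memorizer(xs, ys):
    """Overfit model: keeps every training record and returns the
    stored answer of the closest one."""
    records = list(zip(xs, ys))
    return lambda x: min(records, key=lambda rec: abs(rec[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared prediction error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training data: the trend y = 2x + 1 plus alternating noise.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [2.5, 1.5, 6.5, 5.5, 10.5, 9.5]

# Hypothetical new data: unseen points that follow the underlying trend.
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [2 * x + 1 for x in test_x]

line = make_line_model(train_x, train_y)
memorizer = make_memorizer(train_x, train_y)

for name, model in [("line", line), ("memorizer", memorizer)]:
    print(name,
          "training error:", round(mse(model, train_x, train_y), 3),
          "new-data error:", round(mse(model, test_x, test_y), 3))
```

On these numbers the memorizer is perfect on its training data but does worse than the line on the new points. It also holds a verbatim copy of every training record, which is a simplified picture of why memorization matters for privacy.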

While there are techniques to address the predictive error associated with overfitting, a model’s ability to learn so much from the data also raises privacy concerns.

How machine learning algorithms make inferences

Each model has a certain number of parameters. A parameter is an element of a model that can be changed. Each parameter has a value, or setting, that the model derives from the training data. Parameters can be thought of as the different knobs that can be turned to affect the performance of the algorithm. While a straight-line pattern has only two knobs, the slope and intercept, machine learning models have a great many parameters. For example, the language model GPT-3 has 175 billion.
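To make the idea of knobs concrete, here is a minimal sketch that derives the two parameters of a straight line from data and then counts the parameters of a small fully connected neural network. The function names and the layer sizes are hypothetical, and the network is far simpler than a model such as GPT-3.

```python
def fit_line(xs, ys):
    """Derive the two knobs of a straight line from training data
    using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def count_dense_parameters(layer_sizes):
    """Count weights and biases in a fully connected network whose
    layers have the given numbers of units."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Points that lie exactly on y = 2x + 1, so the fit recovers both knobs.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0

# A hypothetical network: 10 inputs, 32 hidden units, 1 output.
print(count_dense_parameters([10, 32, 1]))  # 385
```

Even this tiny network has 385 knobs compared with the line's two, and the count grows quickly as layers get wider and deeper.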

In order to choose the parameters, machine learning methods use training data with the goal of minimizing the predictive error on the training data. For example, if the goal is to predict whether a person would respond well to a certain medical treatment based on their medical history, the machine learning model would make predictions about the data where the model’s developers know whether someone responded well or poorly. The model is rewarded for predictions that are correct and penalized for incorrect predictions, which leads the algorithm to adjust its parameters – that is, turn some of the “knobs” – and try again.
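The loop of predicting, scoring and adjusting can be sketched with a very small model. The example below assumes a logistic regression trained by gradient descent, which is one common approach and not necessarily the one used in any particular medical system. The single "biomarker" feature, the patient outcomes and all names are invented for illustration.

```python
import math


def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, b, xs, ys):
    """Average penalty: small for confident correct predictions,
    large for confident wrong ones."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def train(xs, ys, learning_rate=0.5, steps=2000):
    """Repeatedly nudge the two knobs (w, b) to reduce the penalty."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # prediction minus known outcome
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (responds well) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Hypothetical training data: one standardized measurement per patient,
# and whether that patient responded well (1) or poorly (0).
biomarker = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
responded = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train(biomarker, responded)
print("penalty before training:", log_loss(0.0, 0.0, biomarker, responded))
print("penalty after training: ", log_loss(w, b, biomarker, responded))
print("predictions:", [predict(w, b, x) for x in biomarker])
```

The toy data is deliberately easy to separate, so the penalty keeps falling as training continues. Real patient data is noisier, which is where overfitting becomes a risk.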

To avoid overfitting the training data, machine learning models are checked against a validation dataset as well. The validation dataset is a separate dataset that…
