Machine Unlearning: Can It Really Forget?

From theoretical concepts to practical applications, the journey of AI/ML has been nothing short of extraordinary. One of the first industrial robots went to work in the 1960s, lifting hot pieces of metal. In 1952, one of the earliest machine learning programs was written to play checkers, and it improved its game by learning from its own experience!

AI and ML have boomed since then and have quickly gained traction. But every technology comes at a price. And if not handled cautiously, it can cause more harm than good.

Take Amazon’s facial recognition tool as an example. In 2018, the American Civil Liberties Union tested Amazon’s AI-based Rekognition system and found that it incorrectly matched 28 members of the US Congress with mugshot photos. Nearly 40% of Rekognition’s false matches were of people of color.

That’s exactly why machine unlearning plays an important role. It “forgets” biased information in your model, so that no one is discriminated against. In today’s article, we will cover this technology in more detail and show how you can use it to your benefit.

What is Machine Unlearning?

As per Stanford, machine unlearning is the process of removing the influence of specific training data from a trained model without having to retrain the entire model from scratch. The goal is to make an “unlearned model” that behaves similarly to a model retrained without that data.

There are 2 main reasons to do this –

i) Access Revocation (Privacy and Copyright): Data is treated as “borrowed”, so confidential or copyrighted data that was used without permission can be “returned” on request. Returning the data through access revocation can be difficult, because once a model has been trained on it, its influence is spread across the model’s parameters.

ii) Model Correction: We fix the model through the concept of “corrective machine unlearning”, where biased or harmful content is removed or edited. This perspective of unlearning can also be viewed as a post-training risk mitigation mechanism for AI safety concerns.

However, it’s important to be mindful of the information you wish to erase. Removing data can disrupt your machine learning algorithms and reduce your model’s accuracy. We will discuss these challenges later in this article. If you want, you can jump ahead to the “Challenges of Machine Unlearning” section. For now, let’s understand the basics.

Importance of Machine Unlearning

From data privacy to regulatory compliance, machine unlearning is being welcomed by businesses across the globe. Here are a few ways this can help your business –

1) Data Breach Mitigation

Many banks, like Global Finance Corporation, use machine learning algorithms to predict credit card risk and detect fraudulent activity. To limit the damage a data breach can cause, they can use machine unlearning to remove sensitive customer data from their models without retraining them from scratch.

Additionally, it helps them follow data privacy laws like the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), which allow customers to have their personal information erased from a company’s systems.

As a result, the risk of regulatory penalties is reduced and customer trust improves.

2) Removing Bias and Harmful Information

A social media platform called ConnectSphere observed that posts from some of its users were being wrongly flagged as harmful content because its moderation model was biased against them.

To reduce discrimination against certain demographic groups, machine unlearning can remove the influence of skewed data and fine-tune the model without having to retrain it from scratch.

Improved fairness benefits all users and helps the platform create an inclusive, welcoming space online.

3) Protection Against Cyberattacks

ShopEasy is an online retailer that was targeted by cyberattacks. Hackers injected harmful data into its platform, which caused its model to misclassify fraudulent transactions as legitimate.

The company used machine unlearning techniques to identify the suspicious data and remove its influence from the model. Selectively removing this data, combined with advanced security techniques such as Shared Adversarial Unlearning, can help you defend yourself against such attacks.

The benefits include – Reduced financial losses, enhanced protection against fraud, and improved customer trust.

4) Cost Efficiency

Streamverse is a popular video streaming service that uses machine learning algorithms to provide personalized recommendations to its customers.

When a customer requests that their viewing history be deleted, Streamverse can use machine unlearning to “forget” that history. This saves the company time and resources, as it doesn’t have to retrain its model from scratch.

Benefits include – Faster response times to data deletion requests and reduced computational costs.
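To make the idea of a cheap deletion request concrete, here is a minimal sketch. It assumes a deliberately simple, hypothetical recommender (the `CoWatchModel` class and its methods are illustrative names, not Streamverse’s actual system) that only stores counts of how many users watched each pair of titles. Because such a model is just a sum of per-user contributions, a user can be forgotten by subtracting their contribution, and the result is identical to rebuilding from scratch without them. Models such as neural networks do not decompose this neatly, which is why unlearning is harder for them.

```python
from collections import Counter
from itertools import combinations


class CoWatchModel:
    """Hypothetical recommender that only stores co-watch counts."""

    def __init__(self):
        self.histories = {}           # user_id -> set of watched titles
        self.pair_counts = Counter()  # (title_a, title_b) -> users who watched both

    @staticmethod
    def _pairs(titles):
        # Sort so that each unordered pair of titles has one canonical key.
        return list(combinations(sorted(titles), 2))

    def add_user(self, user_id, titles):
        if user_id in self.histories:
            self.forget_user(user_id)  # avoid double counting a returning user
        watched = set(titles)
        self.histories[user_id] = watched
        self.pair_counts.update(self._pairs(watched))

    def forget_user(self, user_id):
        # Subtract exactly what this user contributed, then drop empty pairs.
        watched = self.histories.pop(user_id)
        self.pair_counts.subtract(self._pairs(watched))
        self.pair_counts = +self.pair_counts

    def recommend(self, title):
        # Rank other titles by how many users watched them alongside `title`.
        scores = Counter()
        for (a, b), n in self.pair_counts.items():
            if a == title:
                scores[b] += n
            elif b == title:
                scores[a] += n
        return [t for t, _ in scores.most_common()]
```

The cost of `forget_user` depends only on the size of one user’s history, not on the size of the whole dataset, which is where the savings described above would come from in this simplified setting.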

3 Forms of Machine Unlearning

According to Stanford, there are 3 forms of machine unlearning, some of which are preferred over others because fully retraining a model can be expensive and time-consuming.

i) Exact Unlearning

This approach aims to completely remove the data and make your model behave as if this information never existed. Techniques such as the SISA (Sharded, Isolated, Sliced, and Aggregated) framework can support exact unlearning: the training data is divided into disjoint subsets (shards), a separate model is trained on each one, and only the model whose shard contained the deleted data needs to be retrained.
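The sharding idea behind SISA can be sketched in a few lines. This is a simplified illustration, not the framework’s reference implementation: it assumes a tiny nearest-centroid classifier as the per-shard model, assigns records to shards by `record_id % num_shards`, combines predictions by majority vote, and leaves out the slicing (checkpointing within a shard) step. All class and method names are hypothetical.

```python
from collections import defaultdict


class CentroidModel:
    """Tiny stand-in model: predicts the class with the nearest mean vector."""

    def fit(self, rows):
        sums, counts = {}, defaultdict(int)
        for x, y in rows:
            if y not in sums:
                sums[y] = [0.0] * len(x)
            sums[y] = [s + v for s, v in zip(sums[y], x)]
            counts[y] += 1
        self.centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}
        return self

    def predict(self, x):
        def dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=dist)


class ShardedEnsemble:
    """One isolated model per disjoint shard; predictions are aggregated by vote."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        self.shards = [dict() for _ in range(num_shards)]  # record_id -> (x, y)
        self.models = [None] * num_shards
        self.retrain_count = 0  # how many shard trainings have happened so far

    def _train_shard(self, i):
        rows = list(self.shards[i].values())
        self.models[i] = CentroidModel().fit(rows) if rows else None
        self.retrain_count += 1

    def fit(self, records):
        for record_id, x, y in records:
            self.shards[record_id % self.num_shards][record_id] = (x, y)
        for i in range(self.num_shards):
            self._train_shard(i)
        return self

    def unlearn(self, record_id):
        # Delete the record, then retrain only the shard that contained it.
        i = record_id % self.num_shards
        del self.shards[i][record_id]
        self._train_shard(i)

    def predict(self, x):
        votes = defaultdict(int)
        for model in self.models:
            if model is not None:
                votes[model.predict(x)] += 1
        return max(votes, key=votes.get)
```

Because each shard’s model never sees the other shards’ data, retraining one shard from its remaining records gives the same model you would get if the deleted record had never been in that shard. The trade-off is that each model sees less data, which can reduce accuracy compared with one model trained on everything.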

ii) Differential Privacy Based Unlearning

This method uses differential privacy techniques to limit how much a model depends on any specific data point, rather than fully deleting it, while keeping the model working well. It is usually cheaper than exact unlearning and helps protect your customers’ privacy, but the guarantee is statistical rather than absolute, and it can cost some accuracy.
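One common way the research literature formalizes this kind of guarantee is shown below. The notation is introduced here for illustration and does not come from the article: A is the training algorithm, U is the unlearning procedure, D is the full training set, D_f is the data to forget, and S is any set of possible models.

```latex
% (\varepsilon, \delta)-approximate unlearning: the unlearned model must be
% statistically close to a model retrained without the forgotten data D_f.
\[
\begin{aligned}
\Pr\big[\,U(A(D), D, D_f) \in S\,\big]
  &\le e^{\varepsilon}\,\Pr\big[\,A(D \setminus D_f) \in S\,\big] + \delta, \\
\Pr\big[\,A(D \setminus D_f) \in S\,\big]
  &\le e^{\varepsilon}\,\Pr\big[\,U(A(D), D, D_f) \in S\,\big] + \delta .
\end{aligned}
\]
```

Smaller values of ε and δ mean the unlearned model is harder to tell apart from a truly retrained one. Setting both to zero recovers exact unlearning.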

iii) Empirical Unlearning with Unknown Example Space

This machine unlearning approach focuses on scenarios where the exact data to be removed is unknown. The model learns what to remove from examples of its own behavior, rather than from a preset list of data. It’s flexible and can handle new information continuously, but when it forgets old examples to make room for new ones, its accuracy and performance may decrease.

Challenges of Machine Unlearning

Although this seems to be a promising approach to protecting sensitive data and your brand’s image, it has a few limitations of its own. Here are the top 3 challenges you should be aware of –

i) Limited Understanding

There’s no one-size-fits-all solution, and each form requires a different approach. Because this is a new technology, there is limited understanding of the best practices machine unlearning should follow.

For example – Exact unlearning acts as if this data never existed and can work well, but it can be quite expensive and impractical for businesses in the long run. Also, differential privacy-based unlearning might negatively impact the model’s accuracy.

ii) Computational Costs

Fully retraining a model to remove specific data can be expensive and time-consuming, and current methods that avoid full retraining can harm model performance. For large models used in business applications, it’s simply not practical to do a full retraining every time someone wants the model to forget something.

iii) Measuring the Effectiveness

Even if we remove certain information, how do we know that the removal was successful? Unfortunately, the field lacks clear evaluation metrics, and some questions remain unanswered. For example, how much of the effect of the removed information remains in the model?

Using machine unlearning in real commercial settings can be difficult because of these uncertainties.
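Since the stated goal is an unlearned model that behaves like a retrained one, one simple (and admittedly incomplete) check is to compare the two directly. The sketch below assumes you can afford to retrain a reference model once for evaluation, and that each model is a plain Python callable mapping an input to a label. The function names and the choice of metrics are illustrative assumptions, not a standard benchmark.

```python
def accuracy(model, examples):
    """Fraction of (input, label) pairs the model gets right."""
    correct = sum(1 for x, y in examples if model(x) == y)
    return correct / len(examples)


def agreement_rate(model_a, model_b, inputs):
    """Fraction of inputs on which the two models give the same prediction."""
    same = sum(1 for x in inputs if model_a(x) == model_b(x))
    return same / len(inputs)


def unlearning_report(unlearned, retrained, forget_set, test_set):
    """Compare an unlearned model against a model retrained without the forget set."""
    return {
        # Near zero: the unlearned model treats forgotten data like the retrained one.
        "forget_accuracy_gap": accuracy(unlearned, forget_set)
        - accuracy(retrained, forget_set),
        # Near zero: unlearning did not cost (or gain) accuracy on held-out data.
        "test_accuracy_gap": accuracy(unlearned, test_set)
        - accuracy(retrained, test_set),
        # Near one: both models behave the same on held-out inputs.
        "agreement_on_test": agreement_rate(
            unlearned, retrained, [x for x, _ in test_set]
        ),
    }
```

Small gaps and high agreement are consistent with successful forgetting, but they are not proof: two models can agree on these sets while the unlearned one still retains traces of the removed data.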

Wrapping Up

Although still in its early stages, machine unlearning has emerged as an essential tool for businesses that want to build privacy-first and bias-free models. Despite the challenges, many businesses are relying on machine unlearning to avoid legal risks and create a fairer and safer platform for their customers.

If you’re still unsure how to get started, get in touch with us. Our machine learning experts can help you take the first steps and stay ahead in the machine learning space.

Frequently Asked Questions (FAQ)

Q1. What’s machine unlearning?

Ans 1 – It’s the process of erasing unwanted information that can cause bias and privacy concerns, without having to retrain your machine learning model from scratch.

Q2. What are the different approaches of machine unlearning?

Ans 2 – There are 3 main approaches or forms of machine unlearning: exact unlearning, differential privacy based unlearning, and empirical unlearning with unknown example space.

Q3. Why is evaluating machine unlearning difficult?

Ans 3 – It’s difficult due to the lack of clear “unlearned” behavior. Measuring “forgetting quality” poses difficulties since we must assess efficiency, influence on model utility, and how well the data was really forgotten.
