Federated Machine Learning: Revolutionizing Data Privacy

Federated machine learning is a new approach to machine learning that has gained a lot of attention in recent years. It has significant implications for data privacy and security, as it allows sensitive data to be kept on local devices rather than being shared with third-party servers.

In this article, we will discuss the pros and cons of federated learning and how it can be used by various industries.

What is federated machine learning?


Federated machine learning, or collaborative ML, is a decentralized approach to machine learning in which multiple devices collaborate to train a model without sharing their data with each other or with a central server.

Training in ML

This differs from traditional machine learning, where all data is centralized and processed on a single server. In federated learning, the raw data never leaves the device: a base model distributed by the company is trained locally, and only the updated model parameters are sent back to the central server.

*“Federated learning is a way to train AI models without anyone seeing or touching your data, offering a way to unlock information to feed new AI applications.” ― IBM*

Training in Federated ML

The central node of the system receives updated parameters from many devices, each of which processes its data locally, and combines them into a consensus model. This updated model is then sent back to the devices, and the cycle repeats for as long as training continues.
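The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product implements it: it assumes a toy linear model y = w * x + b, plain gradient descent on each client, and a size-weighted average on the server (the scheme usually called federated averaging). The names local_train, aggregate, run_rounds, and make_client, as well as the synthetic data, are hypothetical.

```python
import random

def local_train(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent on one client's private data.

    The model is y = w * x + b; `weights` is the pair (w, b).
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def aggregate(client_weights, client_sizes):
    """Server step: average client parameters, weighted by local dataset size."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def run_rounds(global_weights, clients, rounds=50):
    """Repeat: send the global model out, train locally, average the results."""
    for _ in range(rounds):
        # Only parameters leave the clients; each `data` list stays local.
        updates = [local_train(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = aggregate(updates, sizes)
    return global_weights

def make_client(rng, n):
    """Hypothetical private dataset drawn from y = 3x + 1 plus small noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.01)) for x in xs]

if __name__ == "__main__":
    rng = random.Random(0)
    clients = [make_client(rng, n) for n in (20, 50, 30)]
    w, b = run_rounds((0.0, 0.0), clients)
    print(f"w={w:.2f}, b={b:.2f}")
```

Note that the server only ever calls aggregate on parameter pairs; in this sketch it has no access to any client's (x, y) examples.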

Federated learning happens in steps:

FL training steps

First, the server starts a client-server exchange known as a federated learning round by transmitting the current global model state to the participating nodes.

Then, local models are trained on these nodes, and the resulting model updates are aggregated into a single global update that is applied to the global model. The parameters are sent to the server in encrypted form.
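The article does not say which scheme protects the parameters in transit. One technique often used for this purpose is secure aggregation with pairwise masks: every pair of clients agrees on a random vector that one adds and the other subtracts, so each individual upload looks random while the masks cancel in the sum. The sketch below is a simplified illustration under that assumption, not real cryptography: the shared random generator stands in for pairwise key agreement, updates are small integers, and the names pairwise_masks, mask_update, and server_sum are hypothetical.

```python
import random

MODULUS = 2**32  # assumed fixed-point arithmetic domain for the updates

def pairwise_masks(num_clients, vector_len, seed=0):
    """Return one mask vector per client; the masks sum to zero mod MODULUS.

    For every pair (i, j) with i < j, a shared random vector is added to
    client i's mask and subtracted from client j's mask.
    """
    rng = random.Random(seed)  # stand-in for pairwise key agreement
    masks = [[0] * vector_len for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            shared = [rng.randrange(MODULUS) for _ in range(vector_len)]
            for k in range(vector_len):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks

def mask_update(update, mask):
    """Client step: hide the real update behind the client's mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]

def server_sum(masked_updates):
    """Server step: add up the masked uploads; the masks cancel in the total."""
    total = [0] * len(masked_updates[0])
    for masked in masked_updates:
        total = [(t + v) % MODULUS for t, v in zip(total, masked)]
    return total

if __name__ == "__main__":
    updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    masks = pairwise_masks(len(updates), 3)
    masked = [mask_update(u, m) for u, m in zip(updates, masks)]
    print(server_sum(masked))  # equals the element-wise sum of `updates`
```

The design choice here is that the server learns only the aggregate, which is all it needs to update the global model. Production protocols add key exchange and handling for clients that drop out mid-round, both of which this sketch omits.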

FL steps

This methodology can be implemented with a central server for aggregation or in a peer-to-peer approach using gossip or consensus methodologies.
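To make the peer-to-peer variant concrete, here is a minimal gossip-averaging sketch. It assumes each peer holds a single scalar parameter (a real model would hold a vector) and that any two peers can talk to each other; the function name gossip_average is hypothetical. At each step two random peers exchange values and both keep the mean, so the values drift toward the global average without any central server.

```python
import random

def gossip_average(values, steps=200, seed=0):
    """Repeatedly pick two random peers and replace both values with their mean."""
    rng = random.Random(seed)
    values = list(values)  # work on a copy so the caller's list is untouched
    for _ in range(steps):
        i, j = rng.sample(range(len(values)), 2)
        mean = (values[i] + values[j]) / 2
        values[i] = values[j] = mean
    return values

if __name__ == "__main__":
    peers = [1.0, 2.0, 3.0, 6.0]  # hypothetical local parameters, mean is 3.0
    print(gossip_average(peers))
```

Each pairwise exchange preserves the sum of all values, which is why the peers converge to the same average a central server would have computed.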

Recent developments have introduced techniques to tackle asynchronicity during the training process or training with dynamically varying models, such as split learning, which can be applied in both centralized and decentralized federated learning settings.

Pros and cons of federated learning

Federated learning improves data privacy because data remains on the device or within the institution and isn’t shared with anybody, which increases user trust and makes it easier to comply with data privacy regulations. It can also make model training more efficient: devices train models locally and send only updates to the central server, which can reduce the amount of data that needs to be transmitted compared with uploading the raw data.

As for the disadvantages, federated learning still requires model updates to be exchanged in every training round, which adds up to a significant amount of traffic and can be challenging if network bandwidth is limited. Moreover, the data collected by different clients may be heterogeneous, making it difficult to train a model that performs well across all clients. Federated learning can also make it harder to interpret the final model’s behavior, since the model is trained on many clients’ data that nobody can inspect directly.
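Whether federated training transmits more or less than uploading raw data depends on the model size, the number of rounds, and the size of the local dataset. The back-of-the-envelope calculation below illustrates the trade-off. All figures are hypothetical, the function names are made up for this example, and the estimate assumes every client downloads and uploads the full set of parameters as 4-byte floats in every round, with no compression.

```python
def fl_traffic_bytes(num_params, rounds, bytes_per_param=4):
    """Per-client traffic: download the model and upload an update each round."""
    return 2 * num_params * bytes_per_param * rounds

def centralized_traffic_bytes(num_examples, bytes_per_example):
    """Per-client traffic when the raw data is uploaded once to a server."""
    return num_examples * bytes_per_example

if __name__ == "__main__":
    # Hypothetical scenario: 1M-parameter model, 100 rounds,
    # versus 10,000 local examples of 2,000 bytes each.
    federated = fl_traffic_bytes(num_params=1_000_000, rounds=100)
    centralized = centralized_traffic_bytes(10_000, 2_000)
    print(f"federated:   {federated / 1e6:.0f} MB")
    print(f"centralized: {centralized / 1e6:.0f} MB")
```

With these made-up numbers the federated setup moves far more bytes than a one-time raw upload. A smaller model, fewer rounds, or much larger local data (such as photos or audio) would tip the balance the other way.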

If you’re interested, you can read more about federated learning on Google Research, where scientists from Google share how they overcome the challenges of federated learning.

What happens when companies don’t use federated learning?

Processing sensitive user data on third-party servers can cause a lot of problems and always raises concerns about how that data is protected. Breaches, as well as deliberate sharing of data with unauthorized third parties (for example, for marketing purposes), are unfortunately common, and both can result in the loss or theft of sensitive information.

Users have little to no control over what data is being shared and with whom and are usually not aware of what exactly is happening with their information. This may result in lawsuits and other legal problems caused by the lack of compliance with privacy laws.

The accuracy of models or algorithms developed using user data processed on servers or by third parties may be impacted by errors or inaccuracies in the data.

That is why federated learning is an increasingly popular approach, especially among product companies.

Federated machine learning in practice

Here is how different companies use federated learning in their products.

Google

Google uses federated learning to improve the prediction accuracy of its keyboard app, Gboard. The app learns from the typing habits of users without sending their personal data to a central server.

Another advantage of federated learning in this case is that the model can learn from the data of a particular user and therefore adapt to their needs. Different people use different words frequently, so training the model on a particular person’s data makes it possible to tailor predictions to that person.

Federated ML at Google

Source: Google Research

Apple

Apple uses federated learning to improve the accuracy of its Siri voice assistant. The company trains Siri on users’ devices without collecting their personal data.

Again, different users use Siri in different ways, so there is no single standard behavior to adapt the model to. Moreover, Siri has to learn from user data to get better at understanding the particular voice and accent of each user. At the same time, much of the information that Siri receives is private, and not sharing it with third parties can prevent malicious use and win public trust.

You can read more about how Apple uses federated learning on AnalyticsIndiaMag.

Healthcare

Federated machine learning is being used in healthcare to help with medical diagnosis and drug discovery. Companies applying federated learning to drug discovery include Novartis and Merck.

NVIDIA offers a federated machine learning solution for healthcare institutions. Instead of handing over patient data to improve the results of medical diagnosis, institutions share only the parameters of their models. The data stays within the institution.

What will be the future of federated machine learning?

Federated ML can have a huge impact on data privacy and security. Today, when many big companies have lost the trust of their customers because of a negligent approach to data sharing, federated machine learning can help rebuild that trust.

By allowing models to be trained on decentralized data, federated machine learning can help protect sensitive information from being exposed to potential breaches or unauthorized access.

Conclusion

Federated machine learning doesn’t require participants to share their sensitive data with parties they don’t trust. Users and institutions remain the sole owners of their data while still getting the benefits that machine learning solutions can provide. That is why smartphone manufacturers, multimedia companies, fintech companies, and healthcare service providers are moving from traditional ML to federated machine learning.
