Big Data Management and Analytics

Can Privacy-Preserving Machine Learning Overcome Data-Sharing Worries?
by Brian Buntz, 08/07/2020

Data volumes are exploding faster than our ability to interpret or secure them. Can techniques such as privacy-preserving machine learning address those challenges?

Privacy-preserving AI techniques could allow researchers to extract insights from sensitive data if cost and complexity barriers can be overcome. But as the concept of privacy-preserving artificial intelligence matures, so do data volumes and complexity. This year, the size of the digital universe could hit 44 zettabytes, according to the World Economic Forum. That sum is 40 times more bytes than the number of stars in the observable universe. And by 2025, IDC projects that number could nearly double.

More Data, More Privacy Problems

While the explosion in data volume, together with declining computation costs, has driven interest in artificial intelligence, a significant portion of data poses potential privacy and cybersecurity questions. Regulatory and cybersecurity issues concerning data abound. AI researchers are constrained by data quality and availability. Databases that would enable them, for instance, to shed light on common diseases or stamp out financial fraud — an estimated $5 trillion global problem — are difficult to obtain. Conversely, innocuous datasets like ImageNet have driven machine learning advances because they are freely available.

A traditional strategy to protect sensitive data is to anonymize it, stripping out confidential information. “Most of the privacy regulations have a clause that permits sufficiently anonymizing it instead of deleting data at request,” said Lisa Donchak, associate partner at McKinsey.

But the catch is, the explosion of data makes the task of re-identifying individuals in masked datasets progressively easier. The goal of protecting privacy is getting “harder and harder to solve because there are so many data snippets available,” said Zulfikar Ramzan, chief technology officer at RSA.

The Internet of Things (IoT) complicates the picture. Connected sensors, found in everything from surveillance cameras to industrial plants to fitness trackers, collect troves of sensitive data. With the appropriate privacy protections in place, such data could be a gold mine for AI research. But security and privacy concerns stand in the way.

Addressing such hurdles requires two things. First, a framework providing user controls and rights on the front-end protects data coming into a database. “That includes specifying who has access to my data and for what purpose,” said Casimir Wierzynski, senior director of AI products at Intel. Second, it requires sufficient data protection, including encrypting data while it is at rest or in transit. The latter is arguably a thornier challenge.

Insights Only for Those Who Need Them

Traditionally, machine learning works on unencrypted data in a collaborative process. “In almost all cases with machine learning, you have multiple stakeholders working together,” Wierzynski said. One stakeholder could own a training data set while another could own the machine learning model, and yet another provides an underlying machine learning service. Third-party domain experts could be tapped to help tune a machine learning model. In other scenarios, multiple parties’ datasets could be combined. “The more data you have, the more powerful model you can build,” Wierzynski said. But as the number of parties and datasets grows, so do the security risks of conventional machine learning techniques.

Over the years, security professionals have sought to reduce the liabilities of unsecured data by deploying cryptography, biometrics and multifactor authentication. Interest in such techniques has paved the way for privacy-preserving machine learning techniques, according to Rajesh Iyengar, founder and CEO of Lincode Labs. Such techniques, ranging from multiparty computation to homomorphic encryption, can enable “independent data owners [to] collaboratively train the models on datasets without compromising the integrity and privacy of data,” Iyengar said.

Multiparty computation. For decades, researchers have explored the concept of answering questions on behalf of a third party using data they can’t see. One example is a technique known as secure multiparty computation. “Let’s say, you and I have some data, and we want to somehow do some analysis on our joint data set without each of us sharing our individual data,” Ramzan said. Multiparty computation makes that feat possible. Adoption of the technique is still early, but interest in it is growing.
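One concrete building block for secure multiparty computation is additive secret sharing. The sketch below (plain Python; the `share` and `joint_sum` helpers are illustrative names, not from any library) shows how parties can learn the sum of their private inputs without any party revealing its own value:

```python
import random

PRIME = 2**61 - 1  # field modulus; all arithmetic is done mod this prime

def share(value, n_parties):
    """Split a value into n additive shares that sum to it mod PRIME.

    Any subset of fewer than n shares is uniformly random and
    reveals nothing about the original value.
    """
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def joint_sum(all_shares):
    """Each computing party sums the shares it holds (one column);
    combining those partial sums reveals only the total, never any
    individual input."""
    partials = [sum(col) % PRIME for col in zip(*all_shares)]
    return sum(partials) % PRIME

# Two data owners with private inputs 42 and 58
shares_a = share(42, 2)
shares_b = share(58, 2)
print(joint_sum([shares_a, shares_b]))  # → 100
```

In a real deployment each share would travel to a different party over a secure channel; the point here is only that no single share leaks anything about 42 or 58.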

Federated learning. A related concept is federated learning where multiple entities begin with an initial version of a machine learning model. “They use only their local data to make improvements to those models, and then they share all of those improvements with a central entity,” Wierzynski said.

The technique has also gained traction. The University of Pennsylvania and Intel, for instance, are working with 29 international healthcare organizations, using federated learning to detect brain tumors. Google has also explored use of the method.
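Federated averaging, the basic algorithm behind the scheme Wierzynski describes, can be sketched in a few lines of plain Python for a toy linear model. The `local_update` and `federated_average` names are assumptions for illustration, not any particular framework's API; the key property is that only model weights, never raw data, leave a client:

```python
def local_update(weights, data, lr=0.1):
    """One gradient-descent step on a client's private data
    for a toy linear model y ≈ w0 + w1*x."""
    w0, w1 = weights
    g0 = g1 = 0.0
    for x, y in data:
        err = (w0 + w1 * x) - y
        g0 += err
        g1 += err * x
    n = len(data)
    return (w0 - lr * g0 / n, w1 - lr * g1 / n)

def federated_average(client_weights):
    """Central server averages the clients' updated weights;
    the clients' datasets are never transmitted."""
    n = len(client_weights)
    return tuple(sum(ws) / n for ws in zip(*client_weights))

# Two clients, each holding a private slice of data from y = 2x
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
global_model = (0.0, 0.0)
for _ in range(300):
    updates = [local_update(global_model, data) for data in clients]
    global_model = federated_average(updates)
# global_model approaches (0, 2), fitting the pooled data
# without either client ever seeing the other's points.
```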

Differential privacy. Differential privacy can provide a defined privacy level in a given analytics operation by adding noise to data, making re-identification of individuals more difficult. The technique works best with large data sets. “Let’s say you had data on a million patients. Data that averages results from those patients isn’t going to reveal much about anyone in that group,” Wierzynski said. Given a large enough data set, researchers can calculate the probability that an attacker could expose information about an individual, then add just enough noise to obscure that individual while preserving the accuracy of the data at large. “It’s much more powerful than just deleting their names,” Wierzynski added. Differential privacy’s protection diminishes, however, under attacks that combine the results of multiple queries against the same data.
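The standard way to realize this idea is the Laplace mechanism: add noise drawn from a Laplace distribution whose scale is the query's sensitivity (1 for a count) divided by the privacy budget ε. A minimal sketch, where the `dp_count` helper and the patient records are hypothetical:

```python
import math
import random

def dp_count(records, predicate, epsilon):
    """Count matching records, then add Laplace(0, 1/epsilon) noise.

    A count query has sensitivity 1 (adding or removing one person
    changes it by at most 1), so scale = 1/epsilon suffices. The noise
    is sampled via the Laplace inverse CDF.
    """
    true_count = sum(1 for r in records if predicate(r))
    u = random.random() - 0.5
    noise = -(1 / epsilon) * math.copysign(1, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

# A million hypothetical patient records; whether any single patient
# appears in the data barely shifts the noisy count.
patients = [{"has_condition": random.random() < 0.1} for _ in range(1_000_000)]
print(dp_count(patients, lambda p: p["has_condition"], epsilon=0.5))
```

With ε = 0.5 the noise is typically only a few counts, so a statistic over a million records stays accurate while any individual's presence is masked.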

Homomorphic encryption. Another related technique is homomorphic encryption, a computation technique operating on encrypted data. The data owner using the technique can decrypt the result it generates. Interest in the technique is building, including for election security.

In radiology, for instance, the technique could protect privacy while using AI-based analysis. A hospital could send an x-ray to a cloud-based service to get the AI equivalent of a second opinion on a diagnosis. It could do that by encrypting the image and sending it to a machine learning service that operates on the data without decrypting it. When the encrypted result returns, the recipient holding the secret key can view the diagnosis. “It’s a compelling way of resolving this tension between privacy and the power of AI,” Wierzynski said.

The downside is the slow speed of the technique, though it is improving. In the past, the method could be millions of times slower than unencrypted computation. “Now, it’s probably closer to like a factor of 10 to 100 [times slower] than regular computation,” Wierzynski said. In some cases, the difference doesn’t matter: if it takes 50 milliseconds to run AI inference on an unencrypted image, waiting 5 seconds for the encrypted equivalent would be acceptable.
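To make “computing on ciphertexts” concrete, the toy sketch below uses textbook RSA, which happens to be multiplicatively homomorphic: multiplying two ciphertexts yields an encryption of the product of the plaintexts. Production privacy-preserving ML uses lattice-based schemes such as CKKS or BFV, and textbook RSA with tiny primes is not semantically secure, so this is purely an illustration of the homomorphic property:

```python
# Textbook RSA: E(a) * E(b) mod n = E(a * b), so a server can
# multiply values it cannot read. Toy parameters for demonstration only.
p, q = 61, 53                  # tiny primes (never do this in practice)
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (Python 3.8+ modular inverse)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

ca, cb = encrypt(7), encrypt(6)
product_cipher = (ca * cb) % n  # server multiplies ciphertexts only
print(decrypt(product_cipher))  # key holder recovers 7 * 6 = 42
```

Only the holder of the private key `d` ever sees the result; the party doing the multiplication works entirely on encrypted values, which is the core idea behind the radiology scenario above.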

While the concept of fully homomorphic encryption is a hot research topic, the technique remains immature, Ramzan said. “When you’re focusing on a specific domain, you might get things to be efficient enough to be useful in practice,” Ramzan said.

The idea of broadly deploying homomorphic encryption in AI is “maybe the equivalent of getting to Mars,” Ramzan said. While it is possible in the relatively near term, it could easily take several years for it to be feasible.

But while privacy-preserving machine learning techniques will become more practical in the long-term, context will likely dictate when they are useful. “Whether [such techniques] will be practical enough that people are willing to pay the cost penalty, that is a bit of an open question in my mind,” Ramzan said.

The “CHARIOT IoT Search Index” aims to provide a web location where publications, articles, and relevant documents are hosted centrally in a well-structured and easily accessible way.

