Privacy-Preserving Transformers: SwiftKey's Differential Privacy Implementation
Abdelrahman Abouelenin, Mohamed Abdelrehim, Raffy Fahim, Amr Hendy, Mohamed Afify
Abstract
In this paper we train a transformer with differential privacy (DP) for language modeling in SwiftKey. We run multiple experiments to balance the trade-offs among model size, run-time speed, and accuracy. We show small and consistent gains in next-word-prediction accuracy, with a graceful increase in memory and latency compared to the production GRU. This is achieved by scaling down a GPT-2 architecture to the required size and by a two-stage training process that builds a seed model on general data and then DP-finetunes it on typing data. The transformer is integrated using ONNX, offering both flexibility and efficiency.
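The DP finetuning stage described above is typically realized with DP-SGD: each example's gradient is clipped to a fixed norm, the clipped gradients are averaged, and calibrated Gaussian noise is added before the update. The paper does not give its exact mechanism, so the following is a minimal NumPy sketch of one such DP-SGD step; the function name `dp_sgd_step` and all hyperparameter values are illustrative assumptions, not the authors' code.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1,
                lr=0.01, rng=None):
    """One illustrative DP-SGD update (assumed mechanism, not the paper's code).

    per_example_grads: list of flat gradient arrays, one per example.
    Each gradient is clipped to L2 norm <= clip_norm, the clipped
    gradients are averaged, and Gaussian noise scaled by
    noise_multiplier * clip_norm / batch_size is added.
    Returns the (negative) parameter update.
    """
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds the clipping bound.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    avg = np.mean(clipped, axis=0)
    # Noise standard deviation follows the usual DP-SGD calibration.
    sigma = noise_multiplier * clip_norm / len(per_example_grads)
    noise = rng.normal(0.0, sigma, size=avg.shape)
    return -lr * (avg + noise)
```

With `noise_multiplier=0` the step reduces to plain clipped-gradient SGD, which makes the clipping behavior easy to verify in isolation.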