Applications and Systems

An edge-computing infrastructure diagram shows distributed edge devices, gateways, and a central cloud, illustrating the deployment topology federated learning targets.

Figure: Federated learning naturally fits edge-computing deployments where sensors, gateways, and a coordinating cloud share an optimization workload but not raw data. Image: Wikimedia Commons, Magnaweb, CC BY-SA 4.0.

Federated learning becomes concrete when the data boundary matters. The original motivating examples were mobile keyboards and on-device language models, where typed text is abundant, sensitive, and naturally local [1], [4]. The same pattern appears in hospitals that cannot pool patient records, banks that cannot freely exchange transaction histories, vehicles that observe different roads, labs that hold private genomic data, and modern LLM deployments where full-model fine-tuning is too large to ship every round.

Systems work decides whether those ideas survive contact with production. An FL deployment needs client orchestration, eligibility rules, secure aggregation, privacy accounting, update compression, model validation, rollback, monitoring, and governance. The LLM era adds model-size pressure: Yao et al. survey federated LLM fine-tuning and prompt learning under heterogeneity, privacy, and communication constraints [2], while Yang et al. survey FedLoRA, where clients transmit low-rank adapter parameters instead of full foundation-model updates [3].
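To make one of these components concrete, the following is a toy sketch of the pairwise-masking idea behind secure aggregation [7]. It assumes every client completes the round and that each pair of clients already shares a seed. Production protocols add key agreement, secret sharing for dropout recovery, and carefully chosen finite-field arithmetic. The function names, seeds, and modulus are illustrative and not taken from any library.

```python
import random

def pairwise_mask(seed, dim, modulus):
    """Pseudo-random mask derived from a seed that two clients are assumed to share."""
    rng = random.Random(seed)
    return [rng.randrange(modulus) for _ in range(dim)]

def mask_update(client_id, update, all_ids, seeds, modulus):
    """Add the mask for every higher-id peer and subtract it for every lower-id peer."""
    masked = list(update)
    for peer in all_ids:
        if peer == client_id:
            continue
        key = (min(client_id, peer), max(client_id, peer))
        mask = pairwise_mask(seeds[key], len(update), modulus)
        sign = 1 if client_id < peer else -1
        masked = [(v + sign * m) % modulus for v, m in zip(masked, mask)]
    return masked

MODULUS = 2 ** 16
# Integer-quantized toy updates from three clients (hypothetical values).
updates = {0: [3, 1, 4], 1: [1, 5, 9], 2: [2, 6, 5]}
ids = sorted(updates)
# Hypothetical pre-shared seeds, one per unordered client pair.
seeds = {(i, j): 1000 * i + j for i in ids for j in ids if i < j}

masked = {cid: mask_update(cid, upd, ids, seeds, MODULUS) for cid, upd in updates.items()}

# The server only sees masked vectors. Each pairwise mask enters the sum once
# with a plus sign and once with a minus sign, so the masks cancel.
aggregate = [sum(col) % MODULUS for col in zip(*masked.values())]
print(aggregate)  # [6, 12, 18]
```

The server recovers the sum without seeing any individual update. The sketch says nothing about what the sum itself leaks, which is why deployments often pair secure aggregation with differential privacy.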

Definitions

Mobile keyboard FL trains next-word prediction, emoji prediction, ranking, or personalization models on-device. The data are sensitive and self-labeled by user interaction. Clients are numerous, unreliable, and only briefly eligible [4].
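A minimal sketch of what "briefly eligible" can mean in code, assuming the commonly described cross-device criteria of an idle, charging device on an unmetered network [6]. The `DeviceState` fields, `is_eligible`, and `select_round` are hypothetical names for illustration, not an actual production API.

```python
import random
from dataclasses import dataclass

@dataclass
class DeviceState:
    device_id: str
    idle: bool
    charging: bool
    unmetered_network: bool

def is_eligible(device):
    """A device may train only when all three conditions hold (assumed policy)."""
    return device.idle and device.charging and device.unmetered_network

def select_round(devices, target, seed=0):
    """Sample up to `target` eligible devices for one round."""
    eligible = [d for d in devices if is_eligible(d)]
    rng = random.Random(seed)
    return rng.sample(eligible, min(target, len(eligible)))

devices = [
    DeviceState("a", idle=True, charging=True, unmetered_network=True),
    DeviceState("b", idle=False, charging=True, unmetered_network=True),
    DeviceState("c", idle=True, charging=True, unmetered_network=False),
    DeviceState("d", idle=True, charging=True, unmetered_network=True),
]
selected = select_round(devices, target=3)
print(sorted(d.device_id for d in selected))  # ['a', 'd']
```

Only two of the four devices qualify, so the round proceeds with fewer clients than requested. A real orchestrator would also handle devices that become ineligible mid-round.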

Healthcare FL trains across hospitals, imaging centers, labs, or clinical networks. The data may be protected by regulation, consent, institutional review, and data-use agreements. Cross-silo FL is common because each hospital is a known party with governance obligations.

Finance FL includes cross-bank fraud detection, anti-money-laundering models, credit-risk features, and privacy-preserving analytics. It is often cross-silo and audit-heavy.

Autonomous-driving FL uses fleets or organizations that collect perception, mapping, driving-behavior, and rare-event data. Communication cost, safety validation, simulation, and data imbalance dominate.

IoT and TinyML FL trains or adapts models on sensors, microcontrollers, gateways, and edge devices. Constraints include memory, energy, intermittent networking, and nonstationary local environments.

Federated recommendation trains user or item representations without centralizing user histories. It may combine local personalization, negative sampling, secure aggregation, and differential privacy.

Federated LLMs apply FL to large language models through full fine-tuning, parameter-efficient tuning, prompt tuning, adapters, LoRA, black-box prompt optimization, or distillation [2]. The main challenges are model size, heterogeneous compute, heterogeneous tasks, privacy leakage from text, and evaluation.

LoRA represents a weight update for a matrix $W\in\mathbb{R}^{m\times n}$ as

$$\Delta W=BA,\qquad B\in\mathbb{R}^{m\times r},\quad A\in\mathbb{R}^{r\times n},\quad r\ll \min(m,n).$$

FedLoRA combines LoRA with FL: each client trains low-rank adapters $(A_k,B_k)$ and communicates adapter parameters or reconstructed low-rank updates. Yang et al. organize FedLoRA challenges into distributed learning, heterogeneity, and efficiency, including aggregation discordance, heterogeneous ranks, personalization, clustering, split learning, and compression [3].
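The factorization can be checked on a tiny example. The sketch below uses plain Python lists with made-up values for $B$ and $A$ (here $m=4$, $n=3$, $r=1$) to show that the product has the shape of $W$ while only $r(m+n)$ numbers need to be stored or sent. The helper `matmul` is written for illustration only.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

m, n, r = 4, 3, 1
B = [[1.0], [2.0], [0.0], [-1.0]]   # shape m x r (illustrative values)
A = [[0.5, 0.0, 2.0]]               # shape r x n (illustrative values)

delta_W = matmul(B, A)              # shape m x n, rank at most r

full_count = m * n                  # numbers in a dense update
adapter_count = r * (m + n)         # numbers in the two factors

print(len(delta_W), len(delta_W[0]))  # 4 3
print(full_count, adapter_count)      # 12 7
```

At this toy size the saving is small. It grows with $m$ and $n$ because the adapter count scales with $m+n$ rather than $mn$.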

Key results

Mobile keyboards are the canonical cross-device success story. Hard et al. describe federated learning for mobile keyboard prediction, and McMahan et al. study recurrent language models with differential privacy [4], [5]. The setting aligns well with FL: labels arise from typed text, raw records are sensitive, clients are plentiful, and the product can benefit from population-level improvement plus local personalization.
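The differential-privacy side of this setting is often described as "clip each client update, sum, add Gaussian noise, then average". The sketch below illustrates that recipe in plain Python under simplifying assumptions: a fixed clipping norm, a fixed number of participants, and no privacy accounting. The parameter names (`clip_norm`, `noise_multiplier`) are illustrative, and the code does not compute or guarantee any particular privacy budget.

```python
import math
import random

def clip_update(update, clip_norm):
    """Scale the update down so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]

def dp_average(updates, clip_norm, noise_multiplier, seed=0):
    """Average clipped updates after adding Gaussian noise to their sum."""
    rng = random.Random(seed)
    clipped = [clip_update(u, clip_norm) for u in updates]
    summed = [sum(col) for col in zip(*clipped)]
    # Noise scale is proportional to the clipping norm, which bounds
    # how much any single client can change the sum.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(updates) for v in noisy]

client_updates = [[3.0, 4.0], [0.0, 0.5]]   # hypothetical client deltas
clipped_first = clip_update(client_updates[0], clip_norm=1.0)
no_noise = dp_average(client_updates, clip_norm=1.0, noise_multiplier=0.0)
with_noise = dp_average(client_updates, clip_norm=1.0, noise_multiplier=1.0)
print(clipped_first, no_noise, with_noise)
```

Setting `noise_multiplier=0.0` recovers the plain clipped average, which is a convenient way to test the clipping logic separately from the noise.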

Healthcare is a natural cross-silo application because institutional data cannot always move. FL has been used or proposed for radiology, pathology, ICU prediction, clinical NLP, and multi-site studies [8], [9]. NVIDIA FLARE [11], OpenFL [12], and other frameworks focus heavily on this domain because cross-silo deployments need secure provisioning, job orchestration, auditability, and researcher-facing workflows. Drug-discovery consortia such as MELLODDY showed how multiple pharmaceutical partners can collaborate on predictive modeling while protecting proprietary compound data [10].

Finance has similar incentives but different governance. Fraud and AML models benefit from patterns across institutions, yet raw transaction records and customer data are highly restricted. Cross-bank FL must handle regulatory audit trails, model-risk management, privacy controls, and adversarial behavior. A design that is acceptable for research may be unacceptable if a regulator cannot reconstruct who trained what, when, and under which policy.
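One way to picture "who trained what, when, and under which policy" is an append-only log in which each record commits to the previous one. The sketch below is a hypothetical illustration using only the Python standard library. The field names, the hash-chain design, and the sample values are assumptions, not a regulatory requirement or the format of any FL framework.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace

@dataclass(frozen=True)
class AuditRecord:
    round_id: int
    participant: str
    model_hash: str
    policy_version: str
    timestamp_utc: str
    prev_hash: str

GENESIS = "0" * 64

def record_hash(record):
    """SHA-256 over a canonical JSON encoding of the record."""
    payload = json.dumps(asdict(record), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_record(log, **fields):
    """Append a record that commits to the hash of the previous record."""
    prev = record_hash(log[-1]) if log else GENESIS
    record = AuditRecord(prev_hash=prev, **fields)
    log.append(record)
    return record

def verify_chain(log):
    """Check that every record points at the hash of its predecessor."""
    prev = GENESIS
    for record in log:
        if record.prev_hash != prev:
            return False
        prev = record_hash(record)
    return True

log = []
append_record(log, round_id=1, participant="bank-a", model_hash="abc123",
              policy_version="policy-v1", timestamp_utc="2025-01-01T00:00:00Z")
append_record(log, round_id=2, participant="bank-b", model_hash="def456",
              policy_version="policy-v1", timestamp_utc="2025-01-02T00:00:00Z")

chain_ok = verify_chain(log)
# Rewriting an earlier record breaks the link stored in the next record.
tampered = [replace(log[0], policy_version="policy-v2"), log[1]]
tamper_detected = not verify_chain(tampered)
print(chain_ok, tamper_detected)  # True True
```

A chain like this only detects edits to records that have a successor. The hash of the newest record still needs to be anchored somewhere outside the log.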

Autonomous driving and robotics emphasize distribution shift. Vehicles see different weather, roads, signs, driver behaviors, and rare events. FL can reduce raw-data upload, but safety-critical models need validation gates before deployment. In many cases, the better production pattern is not fully autonomous on-vehicle training, but federated or distributed collection of updates, simulation, offline validation, and staged rollout.
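A validation gate followed by a staged rollout can be sketched in a few lines. The example assumes higher-is-better metrics reported per evaluation slice, a maximum tolerated regression, and a fixed list of rollout stages. The slice names, thresholds, and stage fractions are invented for illustration.

```python
def passes_gate(candidate, baseline, max_regression):
    """Accept only if no baseline slice regresses by more than max_regression.

    A slice missing from the candidate metrics counts as a failure.
    """
    return all(
        candidate.get(name, float("-inf")) >= value - max_regression
        for name, value in baseline.items()
    )

def next_rollout_fraction(current, stages=(0.01, 0.1, 0.5, 1.0)):
    """Return the next larger stage, or the current fraction if already at the top."""
    for stage in stages:
        if stage > current:
            return stage
    return current

baseline = {"overall": 0.90, "night": 0.80, "rain": 0.75}
candidate_ok = {"overall": 0.91, "night": 0.795, "rain": 0.76}
candidate_bad = {"overall": 0.93, "night": 0.70, "rain": 0.76}

print(passes_gate(candidate_ok, baseline, max_regression=0.01))   # True
print(passes_gate(candidate_bad, baseline, max_regression=0.01))  # False
print(next_rollout_fraction(0.0), next_rollout_fraction(0.1))     # 0.01 0.5
```

The second candidate improves the overall metric yet fails the gate because the night slice regresses. Per-slice checks exist to catch rare-condition failures of this kind.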

IoT and TinyML push resource constraints further. Sensors may have kilobytes to megabytes of memory, small batteries, and unreliable links. Training may occur on gateways rather than microcontrollers. Compression, event-triggered communication, and small personalized heads are often more realistic than full-model training.
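Two of these ideas, event-triggered communication and compression, combine naturally: a device stays silent when its update is small and otherwise sends only the largest entries. The sketch below shows one simple variant (a norm threshold plus top-k sparsification). The threshold, the value of `k`, and the function names are assumptions chosen for illustration.

```python
import math

def l2_norm(vector):
    return math.sqrt(sum(v * v for v in vector))

def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries as sorted (index, value) pairs."""
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    return sorted((i, update[i]) for i in order[:k])

def maybe_send(update, k, threshold):
    """Return a sparse payload, or None when the update is too small to be worth sending."""
    if l2_norm(update) < threshold:
        return None
    return top_k_sparsify(update, k)

update = [0.1, -2.0, 0.05, 1.5]   # hypothetical local model delta
print(maybe_send(update, k=2, threshold=1.0))    # [(1, -2.0), (3, 1.5)]
print(maybe_send(update, k=2, threshold=10.0))   # None
```

Practical schemes usually keep the dropped entries in a local residual and add them to the next update. That step is omitted here for brevity.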

Recommendation systems are a bridge between personalization and privacy. A global item model may be shared, while user embeddings, histories, or ranking adapters remain local. Federated recommendation often needs negative sampling, privacy-preserving aggregation, and careful evaluation because user distributions are extremely skewed.

The LLM era changes scale. Full-model FL for a 7B-parameter model is usually infeasible for cross-device clients: even float16 model deltas are about 14 GB per client per round ($7\times10^{9}$ parameters at 2 bytes each) before optimizer state or protocol overhead. Federated LLM work therefore emphasizes prompt tuning, parameter-efficient fine-tuning, zeroth-order or forward-only methods, distillation, and heterogeneous device support [2]. The FedLLM survey highlights fine-tuning and prompt learning as the central categories, with open directions in pre-training, federated agents, and LLMs assisting FL workflows [2].
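The size estimate is easy to reproduce. The sketch below computes the float16 delta size for a 7-billion-parameter model and, as an added assumption, the time to upload it over a hypothetical 10 Mbps uplink. The uplink figure is illustrative and does not come from the cited surveys.

```python
def delta_size_gb(num_params, bytes_per_param):
    """Payload size in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9

def upload_hours(size_gb, uplink_mbps):
    """Upload time in hours for a payload in GB over a link in megabits per second."""
    megabits = size_gb * 1e9 * 8 / 1e6
    return megabits / uplink_mbps / 3600

full_gb = delta_size_gb(7e9, 2)                  # float16 uses 2 bytes per parameter
hours = upload_hours(full_gb, uplink_mbps=10)    # hypothetical uplink speed

print(f"{full_gb:.1f} GB per client per round")
print(f"{hours:.1f} hours at 10 Mbps")
```

At roughly three hours per round on such a link, a phone would not finish an upload within a typical eligibility window. This is the practical argument for adapters and prompts.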

A FedLoRA diagram contrasts full-parameter federated fine-tuning with low-rank adapter exchange across clients and a server-side aggregator.

Figure: FedLoRA exchanges only the low-rank adapter parameters per client, slashing per-round communication relative to full-model federated fine-tuning. From Yang et al., 2025 — embedded under educational fair use with attribution.

FedLoRA is the most important adapter-based pattern. Instead of transmitting every parameter, clients send low-rank matrices. If $W$ is $4096\times4096$ and rank $r=8$, a full matrix update has $16{,}777{,}216$ parameters, while LoRA has $8(4096+4096)=65{,}536$ parameters, a $256$ times reduction before metadata. The challenge is aggregation: averaging $A$ and $B$ separately is not generally equivalent to averaging $BA$. Yang et al. describe approaches such as full-rank reconstruction then decomposition, stacking, selective aggregation, heterogeneous ranks, and personalized adapters [3].
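The aggregation mismatch shows up even with two clients and rank-1 adapters. The sketch below uses hand-picked $2\times2$ examples (the values are invented for illustration) to compare the average of the products $B_kA_k$ with the product of the averaged factors.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def mean_matrices(mats):
    """Element-wise mean of equally shaped matrices."""
    count = len(mats)
    return [[sum(vals) / count for vals in zip(*rows)] for rows in zip(*mats)]

# Client 1 and client 2 hold rank-1 adapters pointing in different directions.
B1, A1 = [[1.0], [0.0]], [[1.0, 0.0]]
B2, A2 = [[0.0], [1.0]], [[0.0, 1.0]]

# What the server would ideally apply: the mean of the full updates B_k A_k.
mean_of_products = mean_matrices([matmul(B1, A1), matmul(B2, A2)])

# What naive factor-wise averaging produces: mean(B) times mean(A).
product_of_means = matmul(mean_matrices([B1, B2]), mean_matrices([A1, A2]))

print(mean_of_products)   # [[0.5, 0.0], [0.0, 0.5]]
print(product_of_means)   # [[0.25, 0.25], [0.25, 0.25]]
```

The two results differ. The mean of the products has rank 2, so in this example no single rank-1 pair can represent it exactly, which is one reason stacking or reconstruct-then-decompose schemes are considered.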

| Deployment | Typical FL type | Main blocker | Common system choices |
| --- | --- | --- | --- |
| Mobile keyboard | Cross-device | Availability, privacy, uplink | Client sampling, secure aggregation, DP |
| Healthcare imaging | Cross-silo | Governance, feature shift | FLARE/OpenFL, FedBN, audit logs |
| Finance fraud/AML | Cross-silo | Regulation, adversaries | MPC/secure aggregation, strong audit |
| Autonomous driving | Fleet/cross-silo | Safety validation, rare events | Offline validation, staged rollout |
| IoT/TinyML | Edge/cross-device | Memory and energy | Compression, small heads, gateways |
| Recommendation | Cross-device/hybrid | Personalization and skew | Local embeddings, DP, secure aggregation |
| Federated LLMs | Cross-silo or edge | Model size | Prompt tuning, adapters, LoRA, distillation |

Worked example 1: Full-model FL versus LoRA transmission

Problem. A transformer layer has a dense projection matrix $W\in\mathbb{R}^{4096\times4096}$. Compare float16 communication for a full update $\Delta W$ with LoRA rank $r=8$, where each transmitted parameter uses $2$ bytes.

Step 1: full update parameter count.

$$4096\times4096=16{,}777{,}216.$$

Step 2: full update bytes.

$$16{,}777{,}216\times2=33{,}554{,}432\text{ bytes}\approx 32.0\text{ MiB}.$$

Step 3: LoRA parameter count.

LoRA sends $A\in\mathbb{R}^{8\times4096}$ and $B\in\mathbb{R}^{4096\times8}$:

$$8(4096)+4096(8)=65{,}536.$$

Step 4: LoRA bytes.

$$65{,}536\times2=131{,}072\text{ bytes}=128\text{ KiB}.$$

Step 5: ratio.

$$\frac{33{,}554{,}432}{131{,}072}=256.$$

Checked answer. For this matrix, LoRA rank $8$ reduces transmitted parameters by a factor of $256$. Across many layers the exact ratio depends on which matrices receive adapters, but the principle explains why FedLoRA is central for foundation models.
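To see how the ratio shifts with adapter placement, the sketch below extends the single-matrix calculation to a hypothetical stack of 32 layers, each with four $4096\times4096$ attention projections, where only two projections per layer receive rank-8 adapters. The layer count and placement are assumptions for illustration and do not describe any specific model.

```python
def lora_params(m, n, r):
    """Parameters in one rank-r adapter pair for an m x n matrix."""
    return r * (m + n)

# Hypothetical configuration (not a specific model).
num_layers = 32
projections_per_layer = 4      # e.g. query, key, value, output
adapted_per_layer = 2          # adapters on two of the four projections
m = n = 4096
r = 8

full_total = num_layers * projections_per_layer * m * n
adapter_total = num_layers * adapted_per_layer * lora_params(m, n, r)

print("Full attention params:", full_total)        # 2147483648
print("Adapter params:", adapter_total)            # 4194304
print("Ratio:", full_total // adapter_total)       # 512
print("Adapter MiB at float16:", adapter_total * 2 / 1024 ** 2)  # 8.0
```

Adapting only half of the projections doubles the per-matrix ratio from 256 to 512 relative to sending all four dense updates. Including feed-forward and embedding weights in the full-model count would raise the ratio further.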

Worked example 2: Cross-silo round time for full update versus adapter

Problem. A hospital consortium has $12$ hospitals. Each hospital uploads over a sustained $50$ Mbps link. A full model update is $200$ MB. A secure aggregation protocol adds $50\%$ communication overhead. An adapter update is $8$ MB with the same overhead. Estimate the upload time per hospital and the synchronous round upload bottleneck if all hospitals have the same bandwidth.

Step 1: full update with overhead.

$$200\text{ MB}\times1.5=300\text{ MB}.$$

Convert to megabits:

$$300\text{ MB}\times8=2400\text{ Mb}.$$

Step 2: full update upload time.

$$\frac{2400\text{ Mb}}{50\text{ Mbps}}=48\text{ seconds}.$$

Step 3: adapter update with overhead.

$$8\text{ MB}\times1.5=12\text{ MB}.$$

$$12\text{ MB}\times8=96\text{ Mb}.$$

Step 4: adapter upload time.

$$\frac{96\text{ Mb}}{50\text{ Mbps}}=1.92\text{ seconds}.$$

Step 5: synchronous round bottleneck.

If all hospitals have the same bandwidth and upload in parallel, the upload bottleneck is the per-hospital time: $48$ seconds for full updates and $1.92$ seconds for adapters. The number of hospitals affects server ingress capacity and aggregation work, but not this equal-link bottleneck assumption.

Checked answer. Adapter communication cuts this upload phase by $48/1.92=25$ times in the toy consortium.
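The remark about server ingress can be made quantitative with one extra assumption. The sketch below supposes the aggregation server accepts at most 200 Mbps in total, a hypothetical figure, and that the hospitals share that capacity evenly. The round upload time is then the larger of the per-hospital time and the time needed to push all payloads through the server link.

```python
def round_upload_seconds(num_clients, payload_mb, overhead_factor,
                         client_mbps, server_ingress_mbps):
    """Upload phase duration when clients send in parallel into a capped server link."""
    megabits_each = payload_mb * overhead_factor * 8.0
    per_client = megabits_each / client_mbps
    through_server = num_clients * megabits_each / server_ingress_mbps
    return max(per_client, through_server)

# Server ingress of 200 Mbps is an assumption, below the 12 x 50 = 600 Mbps offered load.
full_capped = round_upload_seconds(12, 200, 1.5, 50, 200)
adapter_capped = round_upload_seconds(12, 8, 1.5, 50, 200)

# With ample ingress the result falls back to the per-hospital time.
full_uncapped = round_upload_seconds(12, 200, 1.5, 50, 1000)

print(full_capped, adapter_capped, full_uncapped)
```

Under this cap the full-update phase stretches from 48 to 144 seconds and the adapter phase from 1.92 to 5.76 seconds. The 25 times ratio between them is unchanged because both payloads hit the same bottleneck.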

Code

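def lora_params(m, n, r):
    """Parameters in a rank-r LoRA pair: B is m x r and A is r x n."""
    return r * (m + n)

def bytes_to_mib(num_bytes):
    """Convert bytes to mebibytes (1 MiB = 1024**2 bytes)."""
    return num_bytes / (1024 ** 2)

def upload_seconds(payload_mb, overhead_factor, bandwidth_mbps):
    """Upload time for a payload in decimal MB over a link rated in megabits per second."""
    megabits = payload_mb * overhead_factor * 8.0
    return megabits / bandwidth_mbps

# Worked example 1: one 4096 x 4096 projection with LoRA rank 8, float16 (2 bytes).
m = n = 4096
r = 8
full_params = m * n
adapter_params = lora_params(m, n, r)

print("Full params:", full_params)
print("LoRA params:", adapter_params)
print("Ratio:", full_params / adapter_params)
print("Full MiB:", bytes_to_mib(full_params * 2))
print("LoRA KiB:", adapter_params * 2 / 1024)

# Worked example 2: 200 MB full update versus 8 MB adapter, 50% overhead, 50 Mbps link.
print("Full upload seconds:", upload_seconds(200, 1.5, 50))
print("Adapter upload seconds:", upload_seconds(8, 1.5, 50))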

Common pitfalls

  • Treating a research FL algorithm as a deployment architecture.
  • Ignoring client eligibility, retries, and failed rounds in cross-device systems.
  • Assuming healthcare FL removes the need for governance, consent, or audit.
  • Using a public validation set that does not represent every silo or client population.
  • Equating secure aggregation with regulatory compliance.
  • Forgetting model rollback and staged rollout for safety-critical applications.
  • Sending full foundation-model updates when adapters or prompts would match the system budget.
  • Averaging LoRA factors $A$ and $B$ separately without checking aggregation discordance.
  • Ignoring heterogeneous LoRA ranks, memory limits, and client-specific adapter choices.
  • Evaluating federated LLMs only on global benchmarks and not on client-local tasks.
  • Assuming prompt tuning is always cheaper; prompt length, rounds, and optimizer choice still matter.
  • Choosing an FL framework before specifying cross-device versus cross-silo requirements.
  • Underestimating logging, identity, certificate, and policy management in cross-silo deployments.
  • Treating recommendation personalization as optional when user distributions are highly skewed.

References

[1] H. B. McMahan et al., "Communication-Efficient Learning of Deep Networks from Decentralized Data," AISTATS, 2017. https://arxiv.org/abs/1602.05629

[2] Y. Yao et al., "Federated Large Language Models: Current Progress and Future Directions," 2024. https://arxiv.org/abs/2409.15723

[3] Y. Yang et al., "Federated Low-Rank Adaptation for Foundation Models: A Survey," 2025. https://arxiv.org/abs/2505.13502

[4] A. Hard et al., "Federated Learning for Mobile Keyboard Prediction," 2018. https://arxiv.org/abs/1811.03604

[5] H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang, "Learning Differentially Private Recurrent Language Models," ICLR, 2018. https://arxiv.org/abs/1710.06963

[6] K. Bonawitz et al., "Towards Federated Learning at Scale: System Design," MLSys, 2019. https://arxiv.org/abs/1902.01046

[7] K. Bonawitz et al., "Practical Secure Aggregation for Privacy-Preserving Machine Learning," CCS, 2017. https://dl.acm.org/doi/10.1145/3133956.3133982

[8] N. Rieke et al., "The Future of Digital Health with Federated Learning," NPJ Digital Medicine, 2020. https://www.nature.com/articles/s41746-020-00323-1

[9] M. J. Sheller et al., "Federated Learning in Medicine: Facilitating Multi-Institutional Collaborations Without Sharing Patient Data," Scientific Reports, 2020. https://www.nature.com/articles/s41598-020-69250-1

[10] W. Heyndrickx et al., "MELLODDY: Cross-Pharma Federated Learning at Unprecedented Scale Unlocks Benefits in QSAR Without Compromising Proprietary Information," Journal of Chemical Information and Modeling, 2024. https://doi.org/10.1021/acs.jcim.3c00799

[11] H. R. Roth et al., "NVIDIA FLARE: Federated Learning from Simulation to Real-World," 2022. https://arxiv.org/abs/2210.13291

[12] P. Foley et al., "OpenFL: The Open Federated Learning Library," Physics in Medicine and Biology, 2022. https://doi.org/10.1088/1361-6560/ac97d9

[13] D. J. Beutel et al., "Flower: A Friendly Federated Learning Research Framework," 2020. https://arxiv.org/abs/2007.14390

[14] T. Ryffel et al., "A Generic Framework for Privacy Preserving Deep Learning," 2018. https://arxiv.org/abs/1811.04017

[15] Y. Liu et al., "FATE: An Industrial Grade Platform for Collaborative Learning With Data Protection," Journal of Machine Learning Research, 2021. https://www.jmlr.org/papers/v22/20-815.html

[16] E. J. Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models," ICLR, 2022. https://arxiv.org/abs/2106.09685

[17] X. Li et al., "FedBN: Federated Learning on Non-IID Features via Local Batch Normalization," ICLR, 2021. https://openreview.net/forum?id=6YEQUn0QICG

[18] Q. Li, B. He, and D. Song, "Model-Contrastive Federated Learning," CVPR, 2021. https://arxiv.org/abs/2103.16257

[19] P. Kairouz et al., "Advances and Open Problems in Federated Learning," Foundations and Trends in Machine Learning, 2021. https://arxiv.org/abs/1912.04977

[20] L. Collins, H. Hassani, A. Mokhtari, and S. Shakkottai, "Exploiting Shared Representations for Personalized Federated Learning," ICML, 2021. https://arxiv.org/abs/2102.07078