Incidence of the Sample Size Distribution on One-Shot Federated Learning
Marie Garin, Gonzalo Iñaki Quintana
Marie Garin, and Gonzalo Iñaki Quintana, Incidence of the Sample Size Distribution on One-Shot Federated Learning, Image Processing On Line, 13 (2023), pp. 57–64.

Communicated by Jean-Michel Morel
Demo edited by Marie Garin and Gonzalo Quintana


Federated Learning (FL) is a learning paradigm in which multiple nodes collaboratively train a model by exchanging only updates or parameters. This keeps data local, which can enhance privacy (a claim that requires nuance, e.g., language models can memorize training data). Depending on the application, the number of samples held by each node can vary widely, which can affect both training and final performance. This work studies the impact of the per-node sample size distribution on the mean squared error (MSE) of the one-shot federated estimator. We focus on one-shot aggregation of statistical estimates computed across disjoint, independent and identically distributed (i.i.d.) data sources, in the context of empirical risk minimization. In distributed learning, it is well known that for a total of m nodes, each node should contain at least m samples to match the performance of centralized training. In the federated scenario, this result still holds, but now applies to the mean of the per-node sample size distribution. The demo visualizes this effect and compares the behavior of the FESC (Federated Estimation with Statistical Correction) algorithm, a weighting scheme that depends on the local sample size, with the classical federated estimator and the centralized one, for a large collection of distributions, numbers of nodes, and feature space dimensions.
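To illustrate one-shot aggregation, the following sketch fits a local least-squares estimate on each node and averages the estimates with weights proportional to the local sample sizes (the classical federated estimator; this is not the authors' FESC correction, whose weights are not given in this abstract). All parameter values and the sample size distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5                          # feature space dimension (illustrative)
theta = rng.normal(size=d)     # true parameter of the linear model
m = 20                         # number of nodes (illustrative)
# heterogeneous per-node sample sizes, drawn from an illustrative distribution
sizes = rng.integers(5, 100, size=m)

local_estimates = []
X_all, y_all = [], []
for n in sizes:
    # each node holds i.i.d. data from the same linear-Gaussian model
    X = rng.normal(size=(n, d))
    y = X @ theta + rng.normal(size=n)
    # local empirical risk minimizer: ordinary least squares
    local_estimates.append(np.linalg.lstsq(X, y, rcond=None)[0])
    X_all.append(X)
    y_all.append(y)

# one-shot federated estimator: sample-size-weighted average of local estimates
w = sizes / sizes.sum()
theta_fed = sum(wi * ti for wi, ti in zip(w, local_estimates))

# centralized estimator on the pooled data, for comparison
theta_cen = np.linalg.lstsq(np.vstack(X_all), np.concatenate(y_all),
                            rcond=None)[0]

mse = lambda t: float(np.mean((t - theta) ** 2))
print("federated MSE:", mse(theta_fed))
print("centralized MSE:", mse(theta_cen))
```

Re-running the experiment while shrinking the mean of `sizes` below `m` should make the gap between the federated and centralized MSE visible, which is the effect the demo explores.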

This is an MLBriefs article; the source code has not been reviewed!
The original source code is available here (last checked 2023/02/12).