Bounding User Contributions for User-Level Differentially Private Mean Estimation

Authors: V. Arvind Rameshwar (IIT Madras) and Anshoo Tandon

We revisit the problem of privately releasing the sample mean of bounded samples in a dataset, under user-level ε-differential privacy (DP). Our goal is to identify, within a canonical class of preprocessing strategies, the strategy that minimizes the estimation error. Typical error analyses of such bounding (or clipping) strategies in the literature assume that the data samples are independent and identically distributed (i.i.d.), and sometimes also that all users contribute the same number of samples (data homogeneity); these assumptions do not accurately model real-world data distributions. Our main result is a precise characterization of the preprocessing strategy that achieves the smallest worst-case error over all datasets (a distribution-independent error metric), while allowing for data heterogeneity. We also show, via experimental studies, that even for i.i.d. real-valued samples, our clipping strategy achieves much smaller average-case error than the widely used bounding strategy of Amin et al. (2019).
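For concreteness, the sketch below illustrates the standard contribution-bounding baseline in the spirit of Amin et al. (2019), not the optimal strategy characterized in this paper: each user is truncated to at most m samples, values are clipped to [0, B], and Laplace noise calibrated to the user-level sensitivity of the clipped sum is added. The function name, the parameters m, B, and ε, and the assumption that per-user sample counts (and hence the clipped total count) are public are illustrative choices made here, not details taken from the paper.

```python
import numpy as np

def bounded_user_mean(user_samples, m, B, epsilon, rng=None):
    """Illustrative user-level eps-DP mean with contribution bounding.

    Baseline sketch (not the paper's optimal preprocessing strategy):
      1. keep at most m samples per user (truncation),
      2. clip each retained sample to [0, B],
      3. add Laplace noise calibrated to the user-level sensitivity
         of the clipped sum, then normalize by the clipped count,
         which is assumed to be public here.
    """
    rng = np.random.default_rng() if rng is None else rng

    clipped_sum = 0.0
    kept_count = 0
    for samples in user_samples:
        kept = np.clip(np.asarray(samples, dtype=float)[:m], 0.0, B)
        clipped_sum += kept.sum()
        kept_count += kept.size

    # Changing one user's values moves the clipped sum by at most m * B,
    # so Laplace noise of scale m * B / epsilon suffices for eps-DP of the sum.
    noisy_sum = clipped_sum + rng.laplace(scale=m * B / epsilon)

    # kept_count depends only on the (assumed public) per-user counts and m.
    return noisy_sum / kept_count

# Example with heterogeneous users contributing different numbers of samples in [0, 1].
users = [[0.2, 0.9, 0.4], [0.7], [0.1, 0.5, 0.6, 0.8, 0.3]]
print(bounded_user_mean(users, m=2, B=1.0, epsilon=1.0))
```

The contribution bound m trades off bias (dropping samples from heavy contributors) against noise (larger m inflates the user-level sensitivity m·B); choosing this trade-off optimally under data heterogeneity is precisely the question the paper addresses.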

Journal/Conference: IEEE International Symposium on Information Theory (ISIT) 2026