New Publication: Inference for big data assisted by small area methods: an application on sustainable development goals sensitivity of enterprises in Italy

Francesco Schirripa Spagnolo, Gaia Bertarelli, Donato Summa, Monica Scannapieco, Monica Pratesi, Stefano Marchetti and Nicola Salvati have published their article "Inference for big data assisted by small area methods: an application on sustainable development goals sensitivity of enterprises in Italy" in the Journal of the Royal Statistical Society.

Abstract

In this study, we proposed a new method for estimating the sensitivity of enterprises in Italy to the United Nations’ sustainable development goals at the provincial level using web-scraped data (a nonprobability sample), because this quantity is not surveyed by the Italian National Institute of Statistics. The proposed method used a probability sample to reduce the selection bias of estimates obtained from the nonprobability sample in the context of small area estimation. It integrated the nonprobability and probability samples using a doubly robust estimator that combined (i) propensity weighting, to improve the representativeness of the nonprobability sample, and (ii) a statistical model, to predict the values of units that were not in the nonprobability sample. A bootstrap procedure for variance estimation was also proposed. The method was validated through a Monte Carlo simulation, whose results showed that it corrected the bias of the nonprobability sample while maintaining a good level of estimate reliability.
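The abstract describes a doubly robust estimator that combines a propensity model for the nonprobability sample with an outcome model. The sketch below is a minimal, generic illustration of that idea for a single domain. It is not the authors' small area estimator, and it assumes several things: a pseudo-likelihood logistic propensity model in the style of Chen, Li and Wu (2020), a linear outcome model, a simple random probability sample, and a naive bootstrap that resamples both samples independently. All function names, the simulated population and the parameter values are hypothetical. The target variable `y` is observed only in the nonprobability sample A, while the probability sample B supplies the covariate `x` and design weights `d_b`.

```python
import numpy as np


def pseudo_loglik(theta, xa, xb, d_b):
    # log-odds summed over A, minus weighted log(1 + exp(eta)) over B
    return (xa @ theta).sum() - d_b @ np.logaddexp(0.0, xb @ theta)


def fit_propensity(x_a, x_b, d_b, n_iter=100, tol=1e-10):
    """Hypothetical helper: logistic model for P(unit is in A | x), fitted by
    Newton-Raphson with step halving on the pseudo-log-likelihood."""
    xa = np.column_stack([np.ones(len(x_a)), x_a])
    xb = np.column_stack([np.ones(len(x_b)), x_b])
    theta = np.zeros(xa.shape[1])
    # Start the intercept at log(n_A / (N_hat - n_A)) for a stable first step.
    theta[0] = np.log(len(x_a) / (d_b.sum() - len(x_a)))
    for _ in range(n_iter):
        p_b = 1.0 / (1.0 + np.exp(-xb @ theta))
        score = xa.sum(axis=0) - (d_b * p_b) @ xb
        hess = xb.T @ (xb * (d_b * p_b * (1.0 - p_b))[:, None])
        step = np.linalg.solve(hess, score)
        ll, t = pseudo_loglik(theta, xa, xb, d_b), 1.0
        while pseudo_loglik(theta + t * step, xa, xb, d_b) < ll and t > 1e-8:
            t *= 0.5
        theta = theta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return theta


def dr_mean(x_a, y_a, x_b, d_b):
    """Hypothetical helper: doubly robust estimate of the population mean of y."""
    theta = fit_propensity(x_a, x_b, d_b)
    xa = np.column_stack([np.ones(len(x_a)), x_a])
    xb = np.column_stack([np.ones(len(x_b)), x_b])
    # (i) propensity weights for A: 1 / pi_hat = 1 + exp(-eta)
    w_a = 1.0 + np.exp(-xa @ theta)
    # (ii) outcome model m(x) = x'beta, fitted on A where y is observed
    beta, *_ = np.linalg.lstsq(xa, y_a, rcond=None)
    m_a, m_b = xa @ beta, xb @ beta
    # weighted residual correction from A + model predictions weighted by B
    return (w_a @ (y_a - m_a)) / w_a.sum() + (d_b @ m_b) / d_b.sum()


def bootstrap_se(x_a, y_a, x_b, d_b, n_boot=200, seed=0):
    """Naive bootstrap: resample A and B independently, with replacement."""
    rng = np.random.default_rng(seed)
    est = []
    for _ in range(n_boot):
        ia = rng.integers(0, len(x_a), len(x_a))
        ib = rng.integers(0, len(x_b), len(x_b))
        est.append(dr_mean(x_a[ia], y_a[ia], x_b[ib], d_b[ib]))
    return np.std(est, ddof=1)


# Simulated population (hypothetical): y depends linearly on x.
rng = np.random.default_rng(42)
N = 20_000
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(size=N)
# Nonprobability sample A: self-selection depends on x, so its raw mean is biased.
in_a = rng.random(N) < 1.0 / (1.0 + np.exp(-(-4.0 + 1.5 * x)))
# Probability sample B: simple random sample without replacement, weight N / n_B.
idx_b = rng.choice(N, size=500, replace=False)
d_b = np.full(500, N / 500)

naive = y[in_a].mean()
dr = dr_mean(x[in_a], y[in_a], x[idx_b], d_b)
se = bootstrap_se(x[in_a], y[in_a], x[idx_b], d_b)
print(f"true {y.mean():.3f}  naive {naive:.3f}  DR {dr:.3f}  (bootstrap SE {se:.3f})")
```

The raw mean of A inherits the self-selection bias, while the doubly robust estimate should land close to the true population mean. The estimator remains consistent if either the propensity model or the outcome model is correctly specified, which is what "doubly robust" refers to. Producing provincial-level estimates, as in the article, would additionally require the small area modelling that this sketch omits.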