Scholarly research

My research interests lie in developing Bayesian statistical methodology for applied problems that intersect with the social sciences. In particular, I focus on Bayesian methods that help statistical agencies release the microdata they collect from surveys to the public in a way that is both useful and safe. Under the synthetic data approach, data disseminators estimate statistical models on the confidential data, generate simulated data from those models, and release the synthetic data to the public.

 

Publications/Manuscripts:

  1. Hu, J., Savitsky, T. D. and Williams, M. R., Private tabular survey data products through synthetic microdata generation, submitted.
  2. Hornby, R.* and Hu, J., Identification risk evaluation of continuous synthesized variables, submitted. arXiv link
  3. Hu, J., Savitsky, T. D. and Williams, M. R., Re-weighting of vector-weighted mechanisms for utility maximization under differential privacy, submitted. arXiv link
  4. Savitsky, T. D., Williams, M. R. and Hu, J., Bayesian pseudo posterior mechanism under differential privacy, submitted. arXiv link
  5. Hu, J., Savitsky, T. D. and Williams, M. R., Risk-efficient Bayesian pseudo posterior data synthesis for privacy protection, submitted. arXiv link
  6. Hu, J., Akande, O., and Wang, Q., Multiple imputation and synthetic data with the R package NPBayesImputeCat, submitted. arXiv link
  7. Hu, J. and Savitsky, T. D., Bayesian data synthesis and disclosure risk quantification: an application to the Consumer Expenditure Surveys, submitted. arXiv link
  8. Drechsler, J. and Hu, J. (forthcoming), Synthesizing geocodes to facilitate access to detailed geographical information in large scale administrative data, Journal of Survey Statistics and Methodology. Open Access
  9. Hu, J., Savitsky, T. D. and Williams, M. R. (2020), Risk-weighted data synthesizers for microdata dissemination, Special Issue: A New Generation of Statisticians Tackles Data Privacy, CHANCE, 33(4), 29-36.
  10. Ros, K.*, Olsson, H.* and Hu, J. (2020), Two-phase data synthesis for income: an application to the NHIS, Privacy in Statistical Databases e-proceedings.
  11. Hu, J. (2019), Bayesian estimation of attribute and identification disclosure risks in synthetic data, Transactions on Data Privacy, 12:1, 61-89.
  12. Hu, J. and Hoshino, N. (2018), The Quasi-Multinomial synthesizer for categorical data, Privacy in Statistical Databases, Lecture Notes in Computer Science 11126, ed. J. Domingo-Ferrer and F. Montes, Springer, 75-91.
  13. Manrique-Vallier, D. and Hu, J. (2018), Bayesian non-parametric generation of synthetic multivariate categorical data in the presence of structural zeros, Journal of the Royal Statistical Society, Series A (Statistics in Society), 181(3), 635-647.
  14. Hu, J., Reiter, J.P. and Wang, Q. (2018), Dirichlet Process mixture models for modeling and generating synthetic versions of nested categorical data, Bayesian Analysis, 13(1), 183-200. See the software page for NestedCategBayesImpute for the method implementation.
  15. Hu, J. and Drechsler, J. (2015), Generating synthetic geocoding information for public release, In: S. A. Europäische Kommission (ed.), NTTS – Conferences on New Techniques and Technologies for Statistics, 56-59.
  16. Hu, J., Reiter, J.P. and Wang, Q. (2014), Disclosure risk evaluation for fully synthetic categorical data, Privacy in Statistical Databases, Lecture Notes in Computer Science 8744, ed. J. Domingo-Ferrer, Springer, 185-199. See the software page for NPBayesImputeCat for the method implementation.
  17. Hu, J., Mitra, R. and Reiter, J.P. (2013), Are independent parameter draws necessary for multiple imputation?, The American Statistician, 67(3), 143-149.
  18. Hu, J. and Reiter, J.P. (2013), Non-parametric Bayesian model for generating synthetic household data, Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality 2013.

* indicates an undergraduate student co-author