My research interests lie in developing Bayesian statistical methodology for applied problems that intersect with the social sciences. In particular, I focus on Bayesian methods that help statistical agencies release the microdata they collect from surveys to the public in a useful and safe way. Under the synthetic data approach, data disseminators estimate statistical models on the confidential data, generate simulated data from those models, and release the synthetic data to the public.
- Hu, J., Savitsky, T. D. and Williams, M. R., Private tabular survey data products through synthetic microdata generation, submitted.
- Hornby, R.* and Hu, J., Identification risk evaluation of continuous synthesized variables, submitted. arXiv link
- Hu, J., Savitsky, T. D. and Williams, M. R., Re-weighting of vector-weighted mechanisms for utility maximization under differential privacy, submitted. arXiv link
- Savitsky, T. D., Williams, M. R. and Hu, J., Bayesian pseudo posterior mechanism under differential privacy, submitted. arXiv link
- Hu, J., Savitsky, T. D. and Williams, M. R., Risk-efficient Bayesian pseudo posterior data synthesis for privacy protection, submitted. arXiv link
- Hu, J., Akande, O. and Wang, Q., Multiple imputation and synthetic data with the R package NPBayesImputeCat, submitted. arXiv link
- Hu, J. and Savitsky, T. D., Bayesian data synthesis and disclosure risk quantification: an application to the Consumer Expenditure Surveys, submitted. arXiv link
- Drechsler, J. and Hu, J. (forthcoming), Synthesizing geocodes to facilitate access to detailed geographical information in large scale administrative data, Journal of Survey Statistics and Methodology. Open Access
- Hu, J., Savitsky, T. D. and Williams, M. R. (2020), Risk-weighted data synthesizers for microdata dissemination, Special Issue: A New Generation of Statisticians Tackles Data Privacy, CHANCE, 33(4), 29-36.
- Ros, K.*, Olsson, H.* and Hu, J. (2020), Two-phase data synthesis for income: an application to the NHIS, Privacy in Statistical Databases e-proceedings. link
- Hu, J. (2019), Bayesian estimation of attribute and identification disclosure risks in synthetic data, Transactions on Data Privacy, 12:1, 61-89.
- Hu, J. and Hoshino, N. (2018), The Quasi-Multinomial synthesizer for categorical data, Privacy in Statistical Databases, Lecture Notes in Computer Science 11126 ed. J. Domingo-Ferrer and F. Montes, Springer, 75-91.
- Manrique-Vallier, D. and Hu, J. (2018), Bayesian non-parametric generation of synthetic multivariate categorical data in the presence of structural zeros, Journal of the Royal Statistical Society, Series A (Statistics in Society), 181(3), 635-647.
- Hu, J., Reiter, J.P. and Wang, Q. (2018), Dirichlet Process mixture models for modeling and generating synthetic versions of nested categorical data, Bayesian Analysis, 13(1), 183-200. link; see software page for NestedCategBayesImpute for method implementation.
- Hu, J. and Drechsler, J. (2015), Generating synthetic geocoding information for public release, In: S. A. Europäische Kommission (Hrsg.), NTTS – Conferences on New Techniques and Technologies for Statistics, 56-59.
- Hu, J., Reiter, J.P. and Wang, Q. (2014), Disclosure risk evaluation for fully synthetic categorical data, Privacy in Statistical Databases, Lecture Notes in Computer Science 8744 ed. J. Domingo-Ferrer, Springer, 185-199; see software page for NPBayesImputeCat for method implementation.
- Hu, J., Mitra, R. and Reiter, J.P. (2013), Are independent parameter draws necessary for multiple imputation? The American Statistician, 67(3), 143-149.
- Hu, J. and Reiter, J.P. (2013), Non-parametric Bayesian model for generating synthetic household data, Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality 2013.
* indicates an undergraduate student co-author