{"id":303,"date":"2018-06-02T22:34:01","date_gmt":"2018-06-03T02:34:01","guid":{"rendered":"http:\/\/pages.vassar.edu\/jihu\/?page_id=303"},"modified":"2025-03-21T11:51:16","modified_gmt":"2025-03-21T15:51:16","slug":"scholarly-research","status":"publish","type":"page","link":"https:\/\/pages.vassar.edu\/jihu\/research\/scholarly-research\/","title":{"rendered":"Scholarly research"},"content":{"rendered":"<p>My research interests lie in developing Bayesian statistical methodology for applied problems that intersect with the social sciences.\u00a0In particular, I focus on Bayesian methods that help statistical agencies release the microdata they collect from surveys to the public in a useful and safe way. Through the synthetic data approach, data disseminators can generate simulated data from statistical models estimated on the confidential data and release the synthetic data to the public. I am also working on differential privacy approaches.<\/p>\n<p><b>Peer-reviewed publications:<\/b><\/p>\n<p>* indicates an undergraduate student co-author<\/p>\n<ol>\n<li>\u00a0<strong>Hu, J.<\/strong>, Williams, M. R. and Savitsky, T. D. (2025), Mechanisms for global differential privacy under Bayesian data synthesis, to appear in <em>Data Privacy special issue<\/em> at <em>Statistica Sinica<\/em>, 35, 563-584.<\/li>\n<li><strong>Hu, J. <\/strong>and Bowen, C. M. (2024), Advancing microdata privacy protection: a review of synthetic data methods, <em>WIREs Computational Statistics<\/em>, e1636. doi:10.1002\/wics.1636.<\/li>\n<li><strong>Hu, J.<\/strong> and Savitsky, T. D. 
(2023), Bayesian data synthesis and disclosure risk quantification: an application to the Consumer Expenditure Surveys, <i>Transactions on Data Privacy<\/i>, 16:2, 83-121.<\/li>\n<li>Guo, S.* and\u00a0<strong>Hu, J.\u00a0<\/strong>(2023), Data privacy protection and utility preservation through Bayesian data synthesis: a case study on Airbnb listings, <em>The American Statistician<\/em>, 77(2), 192-200.\u00a0<a href=\"https:\/\/www.tandfonline.com\/eprint\/FVHHWUWVEGGMZ6GGNDRD\/full?target=10.1080\/00031305.2022.2077440\" target=\"_blank\" rel=\"noopener\">link to the published paper<\/a><\/li>\n<li>Schneider, M. J.,\u00a0<strong>Hu, J.<\/strong>, Mankad, S. and Bale, C. D. (2023), Protecting the anonymity of online users through Bayesian data synthesis, <em>Expert Systems With Applications<\/em>, 216, 119409.\u00a0<a href=\"https:\/\/www.researchgate.net\/publication\/350286375_Protecting_the_Anonymity_of_Online_Users_through_Bayesian_Data_Synthesis\" target=\"_blank\" rel=\"noopener\">ResearchGate link<\/a><\/li>\n<li><strong>Hu, J.<\/strong>, Savitsky, T. D. and Williams, M. R. (2022), Risk-efficient Bayesian pseudo posterior data synthesis for privacy protection,\u00a0<i>Journal of Survey Statistics and Methodology, <\/i>10(5), 1370-1399.\u00a0<a href=\"https:\/\/academic.oup.com\/jssam\/advance-article\/doi\/10.1093\/jssam\/smab013\/6252407?guestAccessKey=b1580a7c-9893-4b9e-a772-9b06101d401a\" target=\"_blank\" rel=\"noopener\">link to the published paper<\/a><\/li>\n<li>Cao, Y.* and\u00a0<strong>Hu, J.\u00a0<\/strong>(2022), <a href=\"http:\/\/pages.vassar.edu\/jihu\/files\/2022\/09\/PSD2022_CaoHu_website.pdf\">Privacy protection for youth risk behavior using Bayesian data synthesis: a case study to the YRBS<\/a>, <em>Privacy in Statistical Databases e-proceedings.<\/em><\/li>\n<li><strong>Hu, J.<\/strong>, Drechsler, J. and Kim, H. J. 
(2022), Accuracy gains from privacy amplification through sampling for differential privacy,\u00a0<em>Journal of Survey Statistics and Methodology, Special Issue: Privacy, Confidentiality, and Disclosure Protection<\/em>, 10(3), 688-719. <a href=\"https:\/\/academic.oup.com\/jssam\/advance-article\/doi\/10.1093\/jssam\/smac012\/6594323?guestAccessKey=cdf6bf7c-d841-4c0e-840f-8dfbcb3bffc6\" target=\"_blank\" rel=\"noopener\">link to the published paper<\/a><\/li>\n<li><strong>Hu, J.<\/strong>, Savitsky, T. D. and Williams, M. R. (2022), Private tabular survey data products through synthetic microdata generation,\u00a0<em>Journal of Survey Statistics and Methodology, Special Issue: Privacy, Confidentiality, and Disclosure Protection<\/em>, 10(3), 720-752.\u00a0<a href=\"https:\/\/academic.oup.com\/jssam\/advance-article\/doi\/10.1093\/jssam\/smac001\/6541788?guestAccessKey=56e59c6e-3ce5-48d7-bbe4-69594e519d17\" target=\"_blank\" rel=\"noopener\">link to the published paper<\/a><\/li>\n<li>Savitsky, T. D., Williams, M. R. and\u00a0<strong>Hu, J. <\/strong>(2022), Bayesian pseudo posterior mechanism under asymptotic differential privacy, <em>Journal of Machine Learning Research,\u00a0<\/em>23(55), 1\u221237.\u00a0<a href=\"https:\/\/jmlr.org\/papers\/v23\/21-0936.html\" target=\"_blank\" rel=\"noopener\">Open Access<\/a><\/li>\n<li><strong>Hu, J.<\/strong>,\u00a0Akande, O. and Wang, Q. (2021), Multiple imputation and synthetic data generation with the R package NPBayesImputeCat, <em>The R Journal<\/em>,\u00a013:2, 90-110. <a href=\"https:\/\/journal.r-project.org\/archive\/2021\/RJ-2021-080\/index.html\" target=\"_blank\" rel=\"noopener\">Open Access<\/a><\/li>\n<li>Drechsler, J. and\u00a0<strong>Hu, J. 
<\/strong>(2021), Synthesizing geocodes to facilitate access to detailed geographical information in large scale administrative data,<i>\u00a0Journal of Survey Statistics and Methodology,<\/i>\u00a09(3), 523-548.\u00a0<a href=\"https:\/\/academic.oup.com\/jssam\/advance-article\/doi\/10.1093\/jssam\/smaa035\/6049067?login=true\" target=\"_blank\" rel=\"noopener\">Open Access<\/a><\/li>\n<li>Hornby, R.* and\u00a0<strong>Hu, J.\u00a0<\/strong>(2021),\u00a0Identification risks evaluation of partially synthetic data with the IdentificationRiskCalculation R package,\u00a0<em>Transactions on Data Privacy,\u00a0<\/em>14:1, 37-52.\u00a0<a href=\"http:\/\/www.tdp.cat\/issues21\/abs.a425a21.php\" target=\"_blank\" rel=\"noopener\">Open Access<\/a><\/li>\n<li><strong>Hu, J.<\/strong>, Savitsky, T. D. and Williams, M. R. (2020), Risk-weighted data synthesizers for microdata dissemination,\u00a0<em>Special Issue: A New Generation of Statisticians Tackles Data Privacy, CHANCE, <\/em>33(4), 29-36. <a href=\"https:\/\/www.tandfonline.com\/doi\/full\/10.1080\/09332480.2020.1847957\" target=\"_blank\" rel=\"noopener\">Open Access<\/a><\/li>\n<li>Ros, K.*, Olsson, H.* and <strong>Hu, J.<\/strong> (2020),\u00a0Two-phase data synthesis for income: an application to the NHIS, <em>Privacy in Statistical Databases e-proceedings<\/em>. <a href=\"https:\/\/arxiv.org\/abs\/2006.01686\" target=\"_blank\" rel=\"noopener\">link<\/a><\/li>\n<li><strong>Hu, J. <\/strong>(2019), Bayesian estimation of attribute and identification disclosure risks in synthetic data,\u00a0<em>Transactions on Data Privacy,\u00a0<\/em>12:1, 61-89. <a href=\"http:\/\/www.tdp.cat\/issues16\/tdp.a313a18.pdf\" target=\"_blank\" rel=\"noopener\">Open Access<\/a><\/li>\n<li><strong>Hu, J.\u00a0<\/strong>and\u00a0Hoshino, N. 
(2018), <a href=\"http:\/\/pages.vassar.edu\/jihu\/files\/2018\/09\/The-Quasi-Multinomial-Synthesizer-for-Categorical-Data_HuHoshino.pdf\" target=\"_blank\" rel=\"noopener\">The Quasi-Multinomial synthesizer for categorical data<\/a>,\u00a0\u00a0<i>Privacy in Statistical Databases, Lecture Notes in Computer Science 11126<\/i>\u00a0ed. J. Domingo-Ferrer and F. Montes, Springer, 75-91.<\/li>\n<li>Manrique-Vallier, D. and\u00a0<strong>Hu, J<\/strong>. (2018),\u00a0Bayesian non-parametric generation of synthetic multivariate categorical data in the presence of structural zeros,\u00a0<i>Journal of the Royal Statistical Society, Series A (Statistics in Society), 181(3), 635-647<\/i><em>.<\/em><\/li>\n<li><strong>Hu, J.<\/strong>, Reiter, J. P. and Wang, Q. (2018), Dirichlet Process mixture models for modeling and generating synthetic versions of nested categorical data,\u00a0<em>Bayesian Analysis<\/em>, 13(1), 183-200.\u00a0<a href=\"https:\/\/projecteuclid.org\/euclid.ba\/1485227030\" target=\"_blank\" rel=\"noopener\">Open Access<\/a>; see software page for\u00a0<a href=\"https:\/\/cran.r-project.org\/web\/packages\/NestedCategBayesImpute\/index.html\" target=\"_blank\" rel=\"noopener\">NestedCategBayesImpute<\/a>\u00a0for method implementation.<\/li>\n<li><strong>Hu, J.\u00a0<\/strong>and Drechsler, J. (2015),\u00a0Generating synthetic geocoding information for public release,\u00a0<em>In: S. A. Europ\u00e4ische Kommission (Hrsg.), NTTS &#8211; Conferences on New Techniques and Technologies for Statistics,\u00a0<\/em>56-59.<\/li>\n<li><strong>Hu, J.<\/strong>, Reiter, J. P. and Wang, Q. (2014), <a href=\"http:\/\/pages.vassar.edu\/jihu\/files\/2018\/09\/Disclosure-Risk-Evaluation-for-Fully-Synthetic-Categorical-Data.pdf\" target=\"_blank\" rel=\"noopener\">Disclosure risk evaluation for fully synthetic categorical data<\/a>,\u00a0<i>Privacy in Statistical Databases, Lecture Notes in Computer Science 8744<\/i>\u00a0ed. J. 
Domingo-Ferrer, Springer, 185-199; see software page for\u00a0<a href=\"https:\/\/cran.r-project.org\/web\/packages\/NPBayesImputeCat\/index.html\" target=\"_blank\" rel=\"noopener\">NPBayesImputeCat<\/a>\u00a0for method implementation.<\/li>\n<li><strong>Hu, J.<\/strong>, Mitra, R. and Reiter, J. P. (2013), Are independent parameter draws necessary for multiple imputation?\u00a0<i>The American Statistician<\/i>, 67(3), 143-149.<\/li>\n<li><strong>Hu, J.\u00a0<\/strong>and Reiter, J. P. (2013),\u00a0<a href=\"http:\/\/www.unece.org\/fileadmin\/DAM\/stats\/documents\/ece\/ces\/ge.46\/2013\/Topic_2_Duke_University.pdf\" target=\"_blank\" rel=\"noopener\">Non-parametric Bayesian model for generating synthetic household data<\/a>,\u00a0<em>Joint UNECE\/Eurostat Work Session on Statistical Data Confidentiality 2013.<\/em><\/li>\n<\/ol>\n<hr \/>\n<p><b>Technical reports:<\/b><\/p>\n<p>* indicates an undergraduate student co-author<\/p>\n<ol>\n<li>Hornby, R.* and\u00a0<strong>Hu, J.<\/strong>, Bayesian estimation of attribute disclosure risks in synthetic data with the AttributeRiskCalculation R package.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2103.09805\" target=\"_blank\" rel=\"noopener\">arXiv link<\/a><\/li>\n<\/ol>\n<hr \/>\n<p><b>Work in progress:<\/b><\/p>\n<p>* indicates an undergraduate student co-author<\/p>\n<ol>\n<li>Savitsky, T. D.,\u00a0<strong>Hu, J.<\/strong> and Williams, M. R., Maximizing utility for vector-weighted pseudo posterior mechanisms under differential privacy, submitted. <a href=\"https:\/\/arxiv.org\/abs\/2006.01230\" target=\"_blank\" rel=\"noopener\">arXiv link<\/a><\/li>\n<li>Immerwahr, S., <strong>Hu, J.<\/strong>, Deng, W. Q., Bholanath, T., Lundy De La Cruz, N. 
and He, F., Disclosure risk evaluation and mitigation solutions for health survey microdata files: application to the New York City Community Health Survey, submitted.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>My research interests lie in developing Bayesian statistical methodology for applied problems that intersect with the social sciences.\u00a0In particular, I focus on Bayesian methods that help statistical agencies release the microdata they collect from surveys to the public in a useful and safe way. Through the synthetic data approach, data disseminators can generate &hellip; <a href=\"https:\/\/pages.vassar.edu\/jihu\/research\/scholarly-research\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Scholarly research<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":5309,"featured_media":0,"parent":22,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-303","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/pages.vassar.edu\/jihu\/wp-json\/wp\/v2\/pages\/303","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pages.vassar.edu\/jihu\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/pages.vassar.edu\/jihu\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/pages.vassar.edu\/jihu\/wp-json\/wp\/v2\/users\/5309"}],"replies":[{"embeddable":true,"href":"https:\/\/pages.vassar.edu\/jihu\/wp-json\/wp\/v2\/comments?post=303"}],"version-history":[{"count":125,"href":"https:\/\/pages.vassar.edu\/jihu\/wp-json\/wp\/v2\/pages\/303\/revisions"}],"predecessor-version":[{"id":1007,"href":"https:\/\/pages.vassar.edu\/jihu\/wp-json\/wp\/v2\/pages\/303\/revisions\/1007"}],"up":[{"embeddable":true,"href":"https
:\/\/pages.vassar.edu\/jihu\/wp-json\/wp\/v2\/pages\/22"}],"wp:attachment":[{"href":"https:\/\/pages.vassar.edu\/jihu\/wp-json\/wp\/v2\/media?parent=303"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}