Please find a collection of useful information and links for Vassar students interested in DSS here!
DSS Mailing List
If you are interested in joining the DSS student community at Vassar and would like to get updates, event notifications, and information on data science-related opportunities, please sign up for the DSS mailing list here.
DataFest
DataFest is a data analysis competition where teams of up to five students have a weekend to attack a large, complex surprise dataset. Work together with your team members and consult with mentors to find and communicate insights into these data. The teams that impress the judges the most will win prizes. Everyone else will have a great experience, learn some new skills, and have fun! Please find more information here: https://pages.vassar.edu/datafest/
Datasets
Without data, there would be no data science. Collecting data ourselves is useful (and often necessary!), and we must always ask where the data we are using come from. Still, working with existing datasets is an important aspect of any data scientist’s work. In addition, a good way to get comfortable with the tools and techniques of data science is to try them yourself, on data that interest you! Challenge yourself to answer important questions, keeping in mind what you have learned in class or through practice.
Below, you can find a list of dataset sources to play around with and use for your own work and projects. Remember to always check with your instructor before using one of these datasets for a class project or assignment.
- R Data Sources for Regression Analysis
- FiveThirtyEight data
- TidyTuesday
- World Health Organization
- The National Bureau of Economic Research
- International Monetary Fund
- General Social Survey
- United Nations Data
- United Nations Statistics Division
- U.K. Data
- U.S. Data
- U.S. Census Data
- European Statistics
- Statistics Canada
- Pew Research
- UNICEF
- CDC
- World Bank
- Election Studies
- IRIS (Repository for “research into languages, including first, second-, and beyond, and signed language learning, multilingualism, language education, language use, and language processing.”)
- Harvard Dataverse
- Redistricting Data Hub
- Integrated Public Use Microdata Series (IPUMS)
- Inter-university Consortium for Political and Social Research (ICPSR)
- National Health and Nutrition Examination Survey (NHANES)
- National Oceanic and Atmospheric Administration (NOAA)
- Environmental Protection Agency (EPA) Fuel Economy Testing
- CERN Open Data
- Opportunity Insights’ Data Library
- List of datasets (political science focused) from Dr. Coll (Assistant Professor at the College of Wooster)
- USAID Development Data Library