Please find a collection of useful information and links for students interested in DSS here!
DSS Student Opportunities Mailing List
Subscribe to our DSS Student Opportunities Mailing List to hear about data science-related summer and career opportunities and events! Note: This mailing list is open to everyone, though it is designed primarily for students. It differs from our general DSS Mailing List, which delivers Vassar DSS newsletters and announcements of Vassar DSS-sponsored events.
DataFest
DataFest is an annual data analysis competition held in the spring, in which teams of up to five students have a weekend to attack a large, complex, surprise dataset. Work together with your team members and consult with mentors to find and communicate insights into these data. The teams that impress the judges the most will win prizes. Everyone else will have a great experience, learn some new skills, and have fun! You can find more information about DataFest here: https://pages.vassar.edu/datafest/
Datasets
Without data, there would be no data science. While it is useful (and often necessary!) to collect data ourselves, and while we must always ask ourselves where the data we are using comes from, working with existing datasets is an important aspect of any data scientist’s work. In addition, a good way to get comfortable with the tools and techniques of data science is to try them yourself, on data that interests you! Challenge yourself to answer important questions, keeping in mind what you have learned in class or through practice.
Below, you can find a list of dataset sources to play around with and use for your own work and projects. Remember to always check with your instructor before using one of these datasets for a class project or assignment.
- R Data Sources for Regression Analysis
- IPUMS Data
- FiveThirtyEight data
- TidyTuesday
- World Health Organization
- The National Bureau of Economic Research
- International Monetary Fund
- General Social Survey
- United Nations Data
- United Nations Statistics Division
- U.K. Data
- U.S. Data
- U.S. Census Data
- European Statistics
- Statistics Canada
- Pew Research
- UNICEF
- CDC
- World Bank
- Election Studies
- IRIS (Repository for “research into languages, including first, second-, and beyond, and signed language learning, multilingualism, language education, language use, and language processing.”)
- Harvard Dataverse
- Redistricting Data Hub
- Integrated Public Use Microdata Series (IPUMS)
- Inter-university Consortium for Political and Social Research (ICPSR)
- National Health and Nutrition Examination Survey (NHANES)
- National Oceanic and Atmospheric Administration (NOAA)
- Environmental Protection Agency (EPA) Fuel Economy Testing
- CERN Open Data
- Opportunity Insights’ Data Library
- List of datasets (political science focused) from Dr. Coll (Assistant Professor at the College of Wooster)
- USAID Development Data Library