Data for Homework

The following data files are used in the module readings or homework problems.



Possible Sources of Portfolio Data

The following sources may be useful for locating data for your portfolio.

  • LTER Data Portal: A comprehensive search engine for data from all NSF Long-Term Ecological Research sites.
    • NTL-LTER: A variety of primarily aquatics-related datasets from the North Temperate Lakes Long-Term Ecological Research site in northern Wisconsin.
  • DRYAD: An international repository of data underlying scientific and medical publications.
  • figshare: A repository for research output. [Use search and then set “type” to “Dataset.”]
  • dataverse: A Harvard archive for sharing research data from many fields.
  • Data is Plural Archive: A structured archive of data stories mentioned in the Data is Plural e-newsletter.
  • Christmas Bird Count: A large database of annual counts of many bird species at many locations for many years.
  • Journal of Fish and Wildlife Management: Most (all?) articles published in this journal include the raw data as supplemental information that appears to be open-source. If you find data here that interest you but that you cannot access, let me know.
  • Journal of Scientific Data: A searchable database of scientific data.
  • ORNL DAAC: A wide variety of “Earth data” at the Oak Ridge National Laboratory Distributed Active Archive Center.
  • U.S. Census data: Data from the U.S. Census Bureau.
  • areavibes.com: Find a wealth of information about each city that you select (you will have to drill down a bit to get the actual data).
  • GasBuddy.com: Search for current gas prices for any U.S. city.
  • USGS Water Data: A large database of water data from throughout the United States.
  • Bridge: An “ocean” of free marine education resources, including links to various NOAA databases.
  • Internet Crossroads in Social Science Data: An annotated list with links to over 825 data-related resources on the internet.
  • HealthData.gov: A large compilation of health-related data sets.
  • NOAA Climate Data: A large compilation of climate-related data.
  • Air Data: Air quality data collected by the EPA.
  • UNICEF: A large compilation of data related to women and children around the world.
  • Walmart sales data: A database of information about a sample of Walmart stores and their sales. [I can help you merge the various datasets available here.]
  • fivethirtyeight data: A list of data behind articles on fivethirtyeight.com.
  • Buzzfeed data: A list of data behind articles on buzzfeed.com.
  • socrata.com: A compendium of open-source clean data files.
  • kaggle.com: A list of datasets used in Kaggle data analysis competitions.
  • datahub.com: Lots of open-source datasets (will likely need to search to find something of interest).
  • sports-reference.com: A group of sites providing basic statistics and resources for sports fans.
  • MLB.com: Lots of data on major league baseball.
  • Forest Service Data Catalog: A list of data files from the USDA Forest Service.
  • Open Units: Open dataset containing units of alcohol in branded drinks in a variety of standard servings.
  • GDP by County: Gross domestic product data by U.S. county distributed by the Bureau of Economic Analysis (BEA).
  • UMESC Fisheries: A database of fisheries information provided by the Upper Midwest Environmental Sciences Center of the U.S. Geological Survey. Most of these data relate to the Mississippi River.
  • Incarceration Trends Dataset: County-level jail data (1970 onward) and prison data (1983 onward).
  • Monitoring the Future (MTF) Public-Use Datasets: A continuing study of the lifestyles of American youth since 1975.
  • Professional Disc Golf Association Standards Data: Data sets for approved disc golf discs and disc golf targets.
  • Bird Egg Shape: Data regarding the shape of bird eggs.