diff --git a/.travis.yml b/.travis.yml deleted file mode 100644 index 23b0500d..00000000 --- a/.travis.yml +++ /dev/null @@ -1,8 +0,0 @@ -language: ruby -rvm: - - 2.2 -before_script: - - gem install awesome_bot -script: - - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu - - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,datamob.org,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov \ No newline at end of file diff --git a/README.rst b/README.rst index ee795d94..473c8eb5 100644 --- a/README.rst +++ b/README.rst @@ -1,506 +1,1556 @@ Awesome Public Datasets ======================= + .. image:: https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg :alt: Awesome :target: https://github.com/sindresorhus/awesome -.. image:: https://travis-ci.org/caesar0301/awesome-public-datasets.svg - :target: https://travis-ci.org/caesar0301/awesome-public-datasets - -`This list of public data sources `_ -are collected and tidied from blogs, answers, and user reponses. -Most of the data sets listed below are free, however, some are not. -Other amazingly awesome lists can be found in the -`awesome-awesomeness `_ and -`sindresorhus's awesome `_ list. - -* `Visit our Google Group on APD `_ -Agriculture ------------- -* `U.S. Department of Agriculture's PLANTS Database `_ +.. |OK_ICON| image:: https://raw.githubusercontent.com/awesomedata/apd-core/master/deploy/ok-24.png +.. |FIXME_ICON| image:: https://raw.githubusercontent.com/awesomedata/apd-core/master/deploy/fixme-24.png -Biology -------- +**NOTICE**: This repo is automatically generated by `apd-core `_. +Please **DO NOT** modify this file directly. We have provided +`a new way `_ +to contribute to Awesome Public Datasets. 
`Join `_ the `slack community `_ for more communication. -* `1000 Genomes `_ -* `American Gut (Microbiome Project) `_ -* `Collaborative Research in Computational Neuroscience (CRCNS) `_ -* `EBI ArrayExrepss `_ -* `ENCODE project `_ -* `Ensembl Genomes `_ -* `Gene Expression Omnibus (GEO) `_ -* `Gene Ontology (GO) `_ -* `Global Biotic Interations (GloBI) `_ -* `Human Microbiome Project (HMP) `_ -* `ICOS PSP Benchmark `_ -* `MIT Cancer Genomics Data `_ -* `NIH Microarray data `_ or `FTP `_ -* `OpenSNP genotypes data `_ -* `Pathguid: Protein-Protein Interactions Catalog `_ -* `Protein Data Bank `_ -* `PubChem Project `_ -* `PubGene (now Coremine Medical) `_ -* `Sequence Read Archive(SRA) `_ -* `Stanford Microarray Data `_ -* `The Catalogue of Life `_ -* `The Personal Genome Project `_ or `PGP `_ -* `UCSC Public Data `_ -* `UniGene `_ - - -Climate/Weather ---------------- - -* `Australian Weather `_ -* `Brazilian Weather - Historical data (In Portuguese) `_ -* `Canadian Meteorological Centre `_ -* `Climate Data from UEA (updated monthly) `_ -* `Global Climate Data Since 1929 `_ -* `NASA Global Imagery Browse Services `_ -* `NOAA Bering Sea Climate `_ -* `NOAA Climate Datasets `_ -* `NOAA Realtime Weather Models `_ -* `The World Bank Open Data Resources for Climate Change `_ -* `UEA Climatic Research Unit `_ -* `WorldClim - Global Climate Data `_ -* `WU Historical Weather Worldwide `_ - - -Complex Networks ----------------- +* |OK_ICON| I am well. +* |FIXME_ICON| Please fix me. 
-* `CrossRef DOI URLs `_ -* `DBLP Citation dataset `_ -* `NBER Patent Citations `_ -* `NIST complex networks data collection `_ -* `Protein-protein interaction network `_ -* `PyPI and Maven Dependency Network `_ -* `Scopus Citation Database `_ -* `Small Network Data `_ -* `Stanford GraphBase (Steven Skiena) `_ -* `Stanford Large Network Dataset Collection `_ -* `The Koblenz Network Collection `_ -* `The Laboratory for Web Algorithmics (UNIMI) `_ -* `The Nexus Network Repository `_ -* `UCI Network Data Repository `_ -* `UFL sparse matrix collection `_ -* `WSU Graph Database `_ - - -Computer Networks ------------------ +`This is a list of topic-centric public data sources `_ +in high quality. They are collected and tidied from blogs, answers, and user responses. +Most of the data sets listed below are free; however, some are not. +Other amazingly awesome lists can be found in `sindresorhus's awesome `_ list. -* `3.5B Web Pages from CommonCraw 2012 `_ -* `53.5B Web clicks of 100K users in Indiana Univ. `_ -* `CAIDA Internet Datasets `_ -* `ClueWeb09 - 1B web pages `_ -* `ClueWeb12 - 733M web pages `_ -* `CommonCrawl Web Data over 7 years `_ -* `CRAWDAD Wireless datasets from Dartmouth Univ. `_ -* `Criteo click-through data `_ -* `Open Mobile Data by MobiPerf `_ -* `UCSD Network Telescope, IPv4 /8 net `_ +.. contents:: **Table of Contents** -Contextual Data + +Agriculture +----------- + +* |OK_ICON| `The global dataset of historical yields for major crops 1981–2016 - The [...] `_ + +* |OK_ICON| `Hyperspectral benchmark dataset on soil moisture - This dataset was [...] `_ + +* |OK_ICON| `Lemons quality control dataset - Lemon dataset has been prepared to [...] `_ + +* |OK_ICON| `Optimized Soil Adjusted Vegetation Index - The IDB is a tool for working [...] `_ + +* |OK_ICON| `U.S. Department of Agriculture's Nutrient Database `_ + +* |OK_ICON| `U.S. Department of Agriculture's PLANTS Database - The Complete PLANTS [...] 
`_ + +Biology +------- + +* |FIXME_ICON| `1000 Genomes - The 1000 Genomes Project ran between 2008 and 2015, [...] `_ [`fixme `_] + +* |OK_ICON| `American Gut (Microbiome Project) - The American Gut project is the [...] `_ + +* |OK_ICON| `Broad Bioimage Benchmark Collection (BBBC) - The Broad Bioimage Benchmark [...] `_ + +* |OK_ICON| `Broad Cancer Cell Line Encyclopedia (CCLE) `_ + +* |OK_ICON| `Cell Image Library - This library is a public and easily accessible [...] `_ + +* |OK_ICON| `Complete Genomics Public Data - A diverse data set of whole human genomes [...] `_ + +* |OK_ICON| `EBI ArrayExpress - ArrayExpress Archive of Functional Genomics Data [...] `_ + +* |OK_ICON| `EBI Protein Data Bank in Europe - The Electron Microscopy Data Bank [...] `_ + +* |OK_ICON| `ENCODE project - The Encyclopedia of DNA Elements (ENCODE) Consortium is [...] `_ + +* |OK_ICON| `Electron Microscopy Pilot Image Archive (EMPIAR) - EMPIAR, the Electron [...] `_ + +* |FIXME_ICON| `Ensembl Genomes `_ [`fixme `_] + +* |OK_ICON| `Gene Expression Omnibus (GEO) - GEO is a public functional genomics data [...] `_ + +* |OK_ICON| `Gene Ontology (GO) - GO annotation files `_ + +* |OK_ICON| `Global Biotic Interactions (GloBI) `_ + +* |OK_ICON| `Harvard Medical School (HMS) LINCS Project - The Harvard Medical School [...] `_ + +* |OK_ICON| `Human Genome Diversity Project - A group of scientists at Stanford [...] `_ + +* |OK_ICON| `Human Microbiome Project (HMP) - The HMP sequenced over 2000 reference [...] `_ + +* |OK_ICON| `ICOS PSP Benchmark - The ICOS PSP benchmarks repository contains an [...] `_ + +* |OK_ICON| `International HapMap Project `_ + +* |FIXME_ICON| `Journal of Cell Biology DataViewer `_ [`fixme `_] + +* |OK_ICON| `KEGG - KEGG is a database resource for understanding high-level functions [...] `_ + +* |OK_ICON| `MIT Cancer Genomics Data `_ + +* |OK_ICON| `NCBI Proteins `_ + +* |OK_ICON| `NCBI Taxonomy - The NCBI Taxonomy database is a curated set of names and [...] 
`_ + +* |OK_ICON| `NCI Genomic Data Commons - The GDC Data Portal is a robust data-driven [...] `_ + +* |OK_ICON| `NIH Microarray data `_ + +* |OK_ICON| `OpenSNP genotypes data - openSNP allows customers of direct-to-customer [...] `_ + +* |OK_ICON| `Palmer Penguins - The goal of palmerpenguins is to provide a great [...] `_ + +* |OK_ICON| `Pathguide - Protein-Protein Interactions Catalog `_ + +* |OK_ICON| `Protein Data Bank - This resource is powered by the Protein Data Bank [...] `_ + +* |OK_ICON| `Psychiatric Genomics Consortium - The purpose of the Psychiatric Genomics [...] `_ + +* |OK_ICON| `PubChem Project - PubChem is the world's largest collection of freely [...] `_ + +* |OK_ICON| `PubGene (now Coremine Medical) - COREMINE™ is a family of tools developed [...] `_ + +* |OK_ICON| `Sanger Catalogue of Somatic Mutations in Cancer (COSMIC) - COSMIC, the [...] `_ + +* |OK_ICON| `Sanger Genomics of Drug Sensitivity in Cancer Project (GDSC) `_ + +* |OK_ICON| `Sequence Read Archive (SRA) - The Sequence Read Archive (SRA) stores raw [...] `_ + +* |OK_ICON| `Stanford Microarray Data `_ + +* |OK_ICON| `Stowers Institute Original Data Repository `_ + +* |OK_ICON| `Systems Science of Biological Dynamics (SSBD) Database - Systems Science [...] `_ + +* |OK_ICON| `The Cancer Genome Atlas (TCGA), available via Broad GDAC `_ + +* |OK_ICON| `The Catalogue of Life - The Catalogue of Life is a quality-assured [...] `_ + +* |OK_ICON| `The Personal Genome Project - The Personal Genome Project, initiated in [...] `_ + +* |OK_ICON| `UCSC Public Data `_ + +* |FIXME_ICON| `UniGene `_ [`fixme `_] + +* |OK_ICON| `Universal Protein Resource (UniProt) - The Universal Protein Resource [...] `_ + +* |OK_ICON| `Rfam - The Rfam database is a collection of RNA families, each [...] 
`_ + +Climate+Weather --------------- - -* `Context-aware data sets from five domains `_ or `GitHub `_ - - -Data Challenges + +* |OK_ICON| `Actuaries Climate Index `_ + +* |OK_ICON| `Australian Weather `_ + +* |OK_ICON| `Aviation Weather Center - Consistent, timely and accurate weather [...] `_ + +* |OK_ICON| `Brazilian Weather - Historical data (In Portuguese) - Data related to [...] `_ + +* |OK_ICON| `Canadian Meteorological Centre `_ + +* |OK_ICON| `Climate Data from UEA (updated monthly) `_ + +* |OK_ICON| `Dutch Weather - The KNMI Data Center (KDC) portal provides access to KNMI [...] `_ + +* |OK_ICON| `European Climate Assessment & Dataset `_ + +* |OK_ICON| `Global Climate Data Since 1929 `_ + +* |OK_ICON| `Charting The Global Climate Change News Narrative 2009-2020 - These four [...] `_ + +* |OK_ICON| `NASA Global Imagery Browse Services `_ + +* |OK_ICON| `NOAA Bering Sea Climate `_ + +* |OK_ICON| `NOAA Climate Datasets `_ + +* |OK_ICON| `NOAA Realtime Weather Models `_ + +* |OK_ICON| `NOAA SURFRAD Meteorology and Radiation Datasets `_ + +* |OK_ICON| `The World Bank Open Data Resources for Climate Change `_ + +* |OK_ICON| `UEA Climatic Research Unit `_ + +* |OK_ICON| `WU Historical Weather Worldwide `_ + +* |OK_ICON| `Washington Post Climate Change - To analyze warming temperatures in the [...] 
`_ + +* |OK_ICON| `WorldClim - Global Climate Data `_ + +ComplexNetworks --------------- - -* `Challenges in Machine Learning `_ -* `CrowdANALYTIX dataX `_ -* `D4D Challenge of Orange `_ -* `DrivenData Competitions for Social Good `_ -* `ICWSM Data Challenge (since 2009) `_ -* `Kaggle Competition Data `_ -* `KDD Cup by Tencent 2012 `_ -* `Localytics Data Visualization Challenge `_ -* `Netflix Prize `_ -* `Space Apps Challenge `_ -* `Telecom Italia Big Data Challenge `_ -* `Yelp Dataset Challenge `_ - - + +* |OK_ICON| `AMiner Citation Network Dataset `_ + +* |OK_ICON| `CrossRef DOI URLs `_ + +* |OK_ICON| `DBLP Citation dataset `_ + +* |OK_ICON| `DIMACS Road Networks Collection `_ + +* |OK_ICON| `NBER Patent Citations `_ + +* |OK_ICON| `NIST complex networks data collection `_ + +* |OK_ICON| `Network Repository with Interactive Exploratory Analysis Tools `_ + +* |OK_ICON| `Protein-protein interaction network `_ + +* |OK_ICON| `PyPI and Maven Dependency Network `_ + +* |OK_ICON| `Scopus Citation Database `_ + +* |OK_ICON| `Small Network Data `_ + +* |OK_ICON| `Stanford GraphBase `_ + +* |OK_ICON| `Stanford Large Network Dataset Collection `_ + +* |FIXME_ICON| `Stanford Longitudinal Network Data Sources `_ [`fixme `_] + +* |FIXME_ICON| `The Koblenz Network Collection `_ [`fixme `_] + +* |OK_ICON| `The Laboratory for Web Algorithmics (UNIMI) `_ + +* |OK_ICON| `UCI Network Data Repository `_ + +* |OK_ICON| `UFL sparse matrix collection `_ + +* |FIXME_ICON| `WSU Graph Database `_ [`fixme `_] + +* |OK_ICON| `Community Resource for Archiving Wireless Data At Dartmouth - Contains [...] `_ + +ComputerNetworks +---------------- + +* |OK_ICON| `3.5B Web Pages from CommonCrawl 2012 `_ + +* |OK_ICON| `53.5B Web clicks of 100K users in Indiana Univ. `_ + +* |OK_ICON| `CAIDA Internet Datasets `_ + +* |FIXME_ICON| `CRAWDAD Wireless datasets from Dartmouth Univ. 
`_ [`fixme `_] + +* |OK_ICON| `ClueWeb09 - 1B web pages `_ + +* |OK_ICON| `ClueWeb12 - 733M web pages `_ + +* |OK_ICON| `CommonCrawl Web Data over 7 years `_ + +* |OK_ICON| `Criteo click-through data `_ + +* |OK_ICON| `Internet-Wide Scan Data Repository `_ + +* |OK_ICON| `MIRAGE-2019 - MIRAGE-2019 is a human-generated dataset for mobile traffic [...] `_ + +* |OK_ICON| `OONI: Open Observatory of Network Interference - Internet censorship data `_ + +* |OK_ICON| `Open Mobile Data by MobiPerf `_ + +* |OK_ICON| `The Peer-to-Peer Trace Archive - Real-world measurements play a key role [...] `_ + +* |OK_ICON| `Rapid7 Sonar Internet Scans `_ + +* |OK_ICON| `UCSD Network Telescope, IPv4 /8 net `_ + +DataChallenges +-------------- + +* |OK_ICON| `Bruteforce Database `_ + +* |OK_ICON| `Challenges in Machine Learning `_ + +* |FIXME_ICON| `CrowdANALYTIX dataX `_ [`fixme `_] + +* |FIXME_ICON| `D4D Challenge of Orange `_ [`fixme `_] + +* |OK_ICON| `DrivenData Competitions for Social Good `_ + +* |OK_ICON| `ICWSM Data Challenge (since 2009) `_ + +* |OK_ICON| `KDD Cup by Tencent 2012 `_ + +* |OK_ICON| `Kaggle Competition Data `_ + +* |OK_ICON| `Localytics Data Visualization Challenge `_ + +* |OK_ICON| `Netflix Prize `_ + +* |OK_ICON| `Space Apps Challenge `_ + +* |OK_ICON| `Telecom Italia Big Data Challenge `_ + +* |OK_ICON| `TravisTorrent Dataset - MSR'2017 Mining Challenge `_ + +* |OK_ICON| `TunedIT - Data mining & machine learning data sets, algorithms, challenges `_ + +* |FIXME_ICON| `Yelp Dataset Challenge `_ [`fixme `_] + +EarthScience +------------ + +* |OK_ICON| `38-Cloud (Cloud Detection) - Contains 38 Landsat 8 scene images and their [...] 
`_ + +* |OK_ICON| `AQUASTAT - Global water resources and uses `_ + +* |OK_ICON| `BODC - marine data of ~22K vars `_ + +* |OK_ICON| `EOSDIS - NASA's earth observing system data `_ + +* |OK_ICON| `Earth Models `_ + +* |OK_ICON| `Integrated Marine Observing System (IMOS) - roughly 30TB of ocean measurements `_ + +* |OK_ICON| `Marinexplore - Open Oceanographic Data `_ + +* |OK_ICON| `Alabama Real-Time Coastal Observing System `_ + +* |OK_ICON| `National Estuarine Research Reserves System-Wide Monitoring Program - [...] `_ + +* |OK_ICON| `Oil and Gas Authority Open Data - The dataset covers 12,500 offshore [...] `_ + +* |OK_ICON| `Smithsonian Institution Global Volcano and Eruption Database `_ + +* |OK_ICON| `USGS Earthquake Archives `_ + Economics --------- - -* `American Economic Ass (AEA) `_ -* `EconData from UMD `_ -* `Internet Product Code Database `_ - - + +* |OK_ICON| `American Economic Association (AEA) `_ + +* |OK_ICON| `EconData from UMD `_ + +* |OK_ICON| `Economic Freedom of the World Data `_ + +* |OK_ICON| `Historical MacroEconomic Statistics `_ + +* |OK_ICON| `INFORUM - Interindustry Forecasting at the University of Maryland `_ + +* |OK_ICON| `DBnomics – the world's economic database - Aggregates hundreds of [...] `_ + +* |OK_ICON| `International Trade Statistics `_ + +* |OK_ICON| `Internet Product Code Database `_ + +* |OK_ICON| `Joint External Debt Data Hub `_ + +* |OK_ICON| `Jon Haveman International Trade Data Links `_ + +* |OK_ICON| `Long-Term Productivity Database - The Long-Term Productivity database was [...] 
`_ + +* |OK_ICON| `OpenCorporates Database of Companies in the World `_ + +* |OK_ICON| `Our World in Data `_ + +* |FIXME_ICON| `SciencesPo World Trade Gravity Datasets `_ [`fixme `_] + +* |OK_ICON| `The Atlas of Economic Complexity `_ + +* |OK_ICON| `The Center for International Data `_ + +* |OK_ICON| `The Observatory of Economic Complexity `_ + +* |FIXME_ICON| `UN Commodity Trade Statistics `_ [`fixme `_] + +* |OK_ICON| `UN Human Development Reports `_ + +Education +--------- + +* |OK_ICON| `College Scorecard Data `_ + +* |OK_ICON| `New York State Education Department Data - The New York State Education [...] `_ + +* |OK_ICON| `Student Data from Free Code Camp `_ + Energy ------ - -* `AMPds `_ -* `BLUEd `_ -* `COMBED `_ -* `Dataport `_ -* `ECO `_ -* `EIA `_ -* `HFED `_ -* `iAWE `_ -* `Plaid `_ -* `REDD `_ -* `UK-Dale `_ - - + +* |OK_ICON| `AMPds - The Almanac of Minutely Power dataset `_ + +* |OK_ICON| `BLUEd - Building-Level fUlly labeled Electricity Disaggregation dataset `_ + +* |OK_ICON| `COMBED `_ + +* |OK_ICON| `DEL - Domestic Electrical Load study datasets for South Africa (1994 - 2014) `_ + +* |OK_ICON| `ECO - The ECO data set is a comprehensive data set for non-intrusive load [...] `_ + +* |OK_ICON| `EIA `_ + +* |OK_ICON| `Global Power Plant Database - The Global Power Plant Database is a [...] `_ + +* |OK_ICON| `HES - Household Electricity Study, UK `_ + +* |OK_ICON| `HFED `_ + +* |OK_ICON| `PEM1 - Proton Exchange Membrane (PEM) Fuel Cell Dataset `_ + +* |OK_ICON| `PLAID - The Plug Load Appliance Identification Dataset `_ + +* |OK_ICON| `The Public Utility Data Liberation Project (PUDL) - PUDL makes US energy [...] `_ + +* |OK_ICON| `REDD `_ + +* |OK_ICON| `SYND - A synthetic energy dataset for non-intrusive load monitoring - [...] `_ + +* |OK_ICON| `Smart Meter Data Portal - The Smart Meter Data Portal is part of the [...] 
`_ + +* |OK_ICON| `Tracebase `_ + +* |OK_ICON| `Ukraine Energy Centre Datasets `_ + +* |OK_ICON| `UK-DALE - UK Domestic Appliance-Level Electricity `_ + +* |OK_ICON| `WHITED `_ + +* |OK_ICON| `iAWE `_ + Finance ------- - -* `CBOE Futures Exchange `_ -* `Google Finance `_ -* `Google Trends `_ -* `NASDAQ `_ -* `OANDA `_ -* `OSU Financial data `_ -* `Quandl `_ -* `St Louis Federal `_ -* `Yahoo Finance `_ - -Geology -------- - -* `Smithsonian Institution Global Volcano and Eruption Database `_ -* `USGS Earthquake Archives `_ - - -GeoSpace/GIS ------------- - -* `BODC - marine data of ~22K vars `_ -* `Cambridge, MA, US, GIS data on GitHub `_ -* `EOSDIS - NASA's earth observing system data `_ -* `Factual Global Location Data `_ -* `Geo Spatial Data from ASU `_ -* `GeoNames Worldwide `_ -* `Global Administrative Areas Database (GADM) `_ -* `Landsat 8 on AWS `_ -* `List of all countries in all languages `_ -* `Natural Earth - vectors and rasters of the world `_ -* `OpenAddresses `_ -* `OpenStreetMap (OSM) `_ -* `Reverse Geocoder using OSM data `_ & `additional high-resolution data files `_ -* `TIGER/Line - U.S. boundaries and roads `_ -* `TwoFishes - Foursquare's coarse geocoder `_ -* `TZ Timezones shapfiles `_ -* `World countries in multiple formats `_ - - + +* |OK_ICON| `BIS Statistics - BIS statistics, compiled in cooperation with central [...] `_ + +* |OK_ICON| `Blockmodo Coin Registry - A registry of JSON formatted information files [...] `_ + +* |OK_ICON| `CBOE Futures Exchange `_ + +* |OK_ICON| `Complete FAANG Stock data - This data set contains all the stock data of [...] 
`_ + +* |OK_ICON| `Google Finance `_ + +* |OK_ICON| `Google Trends `_ + +* |FIXME_ICON| `NASDAQ `_ [`fixme `_] + +* |OK_ICON| `NYSE Market Data `_ + +* |OK_ICON| `OANDA `_ + +* |FIXME_ICON| `OSU Financial data `_ [`fixme `_] + +* |OK_ICON| `Quandl `_ + +* |OK_ICON| `St Louis Federal `_ + +* |OK_ICON| `Yahoo Finance `_ + +GIS +--- + +* |OK_ICON| `ArcGIS Open Data portal `_ + +* |OK_ICON| `Cambridge, MA, US, GIS data on GitHub `_ + +* |OK_ICON| `Database of all continents, countries, States/Subdivisions/Provinces and [...] `_ + +* |OK_ICON| `Factual Global Location Data `_ + +* |OK_ICON| `IEEE Geoscience and Remote Sensing Society DASE Website `_ + +* |OK_ICON| `Geo Maps - High Quality GeoJSON maps programmatically generated `_ + +* |FIXME_ICON| `Geo Spatial Data from ASU `_ [`fixme `_] + +* |OK_ICON| `Geo Wiki Project - Citizen-driven Environmental Monitoring `_ + +* |OK_ICON| `GeoFabrik - OSM data extracted to a variety of formats and areas `_ + +* |OK_ICON| `GeoNames Worldwide `_ + +* |OK_ICON| `Global Administrative Areas Database (GADM) - Geospatial data organized [...] `_ + +* |OK_ICON| `Homeland Infrastructure Foundation-Level Data `_ + +* |OK_ICON| `Landsat 8 on AWS `_ + +* |OK_ICON| `List of all countries in all languages `_ + +* |OK_ICON| `National Weather Service GIS Data Portal `_ + +* |OK_ICON| `Natural Earth - vectors and rasters of the world `_ + +* |OK_ICON| `OpenAddresses `_ + +* |OK_ICON| `OpenStreetMap (OSM) `_ + +* |OK_ICON| `Pleiades - Gazetteer and graph of ancient places `_ + +* |OK_ICON| `Reverse Geocoder using OSM data `_ + +* |OK_ICON| `Robin Wilson - Free GIS Datasets `_ + +* |OK_ICON| `TIGER/Line - U.S. boundaries and roads `_ + +* |OK_ICON| `TZ Timezones shapefile `_ + +* |OK_ICON| `TwoFishes - Foursquare's coarse geocoder `_ + +* |OK_ICON| `UN Environmental Data `_ + +* |OK_ICON| `World boundaries from the U.S. 
Department of State `_ + +* |OK_ICON| `World countries in multiple formats `_ + Government ---------- - -* `Antwerp, Belgium `_ -* `Austin, TX, US `_ -* `Australia (abs.gov.au) `_ -* `Australia (data.gov.au) `_ -* `Austria (data.gv.at) `_ -* `Belgium `_ -* `Brazil `_ -* `Cambridge, MA, US `_ -* `Canada `_ -* `Chicago `_ -* `Dallas Open Data `_ -* `Denver Open Data `_ -* `Durham, NC Open Data `_ -* `England LGInform `_ -* `EuroStat `_ -* `FedStats `_ -* `Finland `_ -* `France `_ -* `Germany `_ -* `Ghent, Belgium `_ -* `Glasgow, Scotland, UK `_ -* `Guardian world governments `_ -* `Houston Open Data `_ -* `Indian Government Data `_ -* `Indonesian Data Portal `_ -* `London Datastore, UK `_ -* `Los Angeles Open Data `_ -* `MassGIS, Massachusetts, U.S. `_ -* `Mexico `_ -* `Netherlands `_ -* `New Zealand `_ -* `NYC betanyc `_ -* `NYC Open Data `_ -* `OECD `_ -* `Oklahoma `_ -* `Open Government Data (OGD) Platform India `_ -* `Oregon `_ -* `Portland, Oregon `_ -* `Puerto Rico Government `_ -* `Rio de Janeiro, Brazil `_ -* `Romania `_ -* `Russia `_ -* `San Francisco Data sets `_ -* `Seattle `_ -* `Singapore Government Data `_ -* `South Africa `_ -* `Switzerland `_ -* `Texas Open Data `_ -* `The World Bank `_ -* `U.K. Government Data `_ -* `U.S. American Community Survey `_ -* `U.S. CDC Public Health datasets `_ -* `U.S. Census Bureau `_ -* `U.S. Department of Housing and Urban Development (HUD) `_ -* `U.S. Federal Government Agencies `_ -* `U.S. Federal Government Data Catalog `_ -* `U.S. Food and Drug Administration (FDA) `_ -* `U.S. National Center for Education Statistics (NCES) `_ -* `U.S. Open Government `_ -* `UK 2011 Census Open Atlas Project `_ -* `United Nations `_ -* `Uruguay `_ -* `Vancouver, BC Open Data Catalog `_ - - + +* |OK_ICON| `Alberta, Province of Canada `_ + +* |OK_ICON| `Antwerp, Belgium `_ + +* |FIXME_ICON| `Argentina (non official) `_ [`fixme `_] + +* |OK_ICON| `Datos Argentina - Portal de datos abiertos de la República Argentina. [...] 
`_ + +* |OK_ICON| `Austin, TX, US `_ + +* |OK_ICON| `Australia (abs.gov.au) `_ + +* |OK_ICON| `Australia (data.gov.au) `_ + +* |OK_ICON| `Austria (data.gv.at) `_ + +* |OK_ICON| `Baton Rouge, LA, US `_ + +* |OK_ICON| `Beersheba, Israel - Open Data Portal (Smart7 OpenData) `_ + +* |OK_ICON| `Belgium `_ + +* |OK_ICON| `Brazil `_ + +* |OK_ICON| `Buenos Aires, Argentina `_ + +* |OK_ICON| `Calgary, AB, Canada `_ + +* |OK_ICON| `Cambridge, MA, US `_ + +* |OK_ICON| `Canada `_ + +* |OK_ICON| `Chicago `_ + +* |OK_ICON| `Chile `_ + +* |FIXME_ICON| `China `_ [`fixme `_] + +* |OK_ICON| `Dallas Open Data `_ + +* |OK_ICON| `DataBC - data from the Province of British Columbia `_ + +* |OK_ICON| `Denver Open Data `_ + +* |OK_ICON| `Durham, NC Open Data `_ + +* |OK_ICON| `Edmonton, AB, Canada `_ + +* |OK_ICON| `England LGInform `_ + +* |OK_ICON| `EuroStat `_ + +* |OK_ICON| `EveryPolitician - Ongoing project collating and sharing data on every [...] `_ + +* |OK_ICON| `Federal Committee on Statistical Methodology (FCSM) (formerly FedStats) `_ + +* |OK_ICON| `Finland `_ + +* |OK_ICON| `France `_ + +* |OK_ICON| `Fredericton, NB, Canada `_ + +* |OK_ICON| `Gatineau, QC, Canada `_ + +* |OK_ICON| `Germany `_ + +* |FIXME_ICON| `Ghent, Belgium `_ [`fixme `_] + +* |FIXME_ICON| `Glasgow, Scotland, UK `_ [`fixme `_] + +* |OK_ICON| `Greece `_ + +* |OK_ICON| `Guardian world governments `_ + +* |OK_ICON| `Halifax, NS, Canada `_ + +* |OK_ICON| `Helsinki Region, Finland `_ + +* |OK_ICON| `Hong Kong, China `_ + +* |OK_ICON| `Houston, TX, US `_ + +* |OK_ICON| `Indian Government Data `_ + +* |OK_ICON| `Indonesian Data Portal `_ + +* |OK_ICON| `Iowa - Welcome to the State of Iowa's data portal. Please explore data [...] `_ + +* |OK_ICON| `Ireland's Open Data Portal `_ + +* |OK_ICON| `Israel's Open Data Portal `_ + +* |OK_ICON| `Istanbul Municipality Open Data Portal `_ + +* |OK_ICON| `Italy - Il Portale dati.gov.it è il catalogo nazionale dei metadati [...] 
`_ + +* |OK_ICON| `Japan `_ + +* |OK_ICON| `Laval, QC, Canada `_ + +* |OK_ICON| `Lexington, KY `_ + +* |OK_ICON| `London Datastore, UK `_ + +* |OK_ICON| `London, ON, Canada `_ + +* |OK_ICON| `Los Angeles Open Data `_ + +* |OK_ICON| `Luxembourg - Luxembourgish Open Data Portal `_ + +* |OK_ICON| `MassGIS, Massachusetts, U.S. `_ + +* |OK_ICON| `Metropolitan Transportation Commission (MTC), California, US `_ + +* |OK_ICON| `Mexico `_ + +* |OK_ICON| `Mississauga, ON, Canada `_ + +* |OK_ICON| `Moldova `_ + +* |OK_ICON| `Moncton, NB, Canada `_ + +* |OK_ICON| `Montreal, QC, Canada `_ + +* |OK_ICON| `Mountain View, California, US (GIS) `_ + +* |FIXME_ICON| `NYC Open Data `_ [`fixme `_] + +* |OK_ICON| `NYC betanyc `_ + +* |OK_ICON| `Netherlands `_ + +* |OK_ICON| `New York Department of Sanitation Monthly Tonnage - DSNY Monthly Tonnage [...] `_ + +* |OK_ICON| `New Zealand `_ + +* |OK_ICON| `OECD `_ + +* |OK_ICON| `Oakland, California, US `_ + +* |OK_ICON| `Oklahoma `_ + +* |OK_ICON| `Open Data for Africa `_ + +* |OK_ICON| `Open Government Data (OGD) Platform India `_ + +* |OK_ICON| `OpenDataSoft's list of 1,600 open data `_ + +* |OK_ICON| `Oregon `_ + +* |OK_ICON| `Ottawa, ON, Canada `_ + +* |OK_ICON| `Palo Alto, California, US `_ + +* |OK_ICON| `OpenDataPhilly - OpenDataPhilly is a catalog of open data in the [...] `_ + +* |OK_ICON| `Portland, Oregon `_ + +* |OK_ICON| `Portugal - Pordata organization `_ + +* |OK_ICON| `Puerto Rico Government `_ + +* |FIXME_ICON| `Quebec City, QC, Canada `_ [`fixme `_] + +* |OK_ICON| `Quebec Province of Canada `_ + +* |OK_ICON| `Regina SK, Canada `_ + +* |OK_ICON| `Rio de Janeiro, Brazil `_ + +* |FIXME_ICON| `Romania `_ [`fixme `_] + +* |OK_ICON| `Russia `_ + +* |OK_ICON| `San Diego, CA `_ + +* |FIXME_ICON| `San Antonio, TX - Community Information Now - CI:Now is a nonprofit [...] 
`_ [`fixme `_] + +* |OK_ICON| `San Francisco Data sets `_ + +* |OK_ICON| `San Jose, California, US `_ + +* |OK_ICON| `San Mateo County, California, US `_ + +* |OK_ICON| `Saskatchewan, Province of Canada `_ + +* |OK_ICON| `Seattle `_ + +* |OK_ICON| `Singapore Government Data `_ + +* |OK_ICON| `South Africa Trade Statistics `_ + +* |OK_ICON| `South Africa `_ + +* |OK_ICON| `State of Utah, US `_ + +* |OK_ICON| `Switzerland `_ + +* |OK_ICON| `Taiwan gov `_ + +* |OK_ICON| `Taiwan `_ + +* |OK_ICON| `Tel-Aviv Open Data `_ + +* |OK_ICON| `Texas Open Data `_ + +* |OK_ICON| `The World Bank `_ + +* |FIXME_ICON| `Toronto, ON, Canada `_ [`fixme `_] + +* |FIXME_ICON| `Tunisia `_ [`fixme `_] + +* |OK_ICON| `U.K. Government Data `_ + +* |OK_ICON| `U.S. American Community Survey `_ + +* |OK_ICON| `U.S. CDC Public Health datasets `_ + +* |OK_ICON| `U.S. Census Bureau `_ + +* |OK_ICON| `U.S. Department of Housing and Urban Development (HUD) `_ + +* |OK_ICON| `U.S. Federal Government Agencies `_ + +* |OK_ICON| `U.S. Federal Government Data Catalog `_ + +* |OK_ICON| `U.S. Food and Drug Administration (FDA) `_ + +* |OK_ICON| `U.S. National Center for Education Statistics (NCES) `_ + +* |OK_ICON| `U.S. Open Government `_ + +* |OK_ICON| `UK 2011 Census Open Atlas Project `_ + +* |OK_ICON| `US Counties - This is a repository of various data, broken down by US [...] `_ + +* |OK_ICON| `U.S. Patent and Trademark Office (USPTO) Bulk Data Products `_ + +* |FIXME_ICON| `Uganda Bureau of Statistics `_ [`fixme `_] + +* |OK_ICON| `Ukraine `_ + +* |OK_ICON| `United Nations `_ + +* |OK_ICON| `Uruguay `_ + +* |OK_ICON| `Valley Transportation Authority (VTA), California, US `_ + +* |FIXME_ICON| `Vancouver, BC Open Data Catalog `_ [`fixme `_] + +* |OK_ICON| `Victoria, BC, Canada `_ + +* |OK_ICON| `Vienna, Austria `_ + +* |FIXME_ICON| `Statistics from the General Statistics Office of Vietnam - Data in [...] `_ [`fixme `_] + +* |OK_ICON| `U.S. 
Congressional Research Service (CRS) Reports `_ + Healthcare ---------- - -* `EHDP Large Health Data Sets `_ -* `Gapminder World, demographic databases `_ -* `Medicare Coverage Database (MCD), U.S. `_ -* `Medicare Data Engine of medicare.gov Data `_ -* `Medicare Data File `_ -* `MeSH, the vocabulary thesaurus used for indexing articles for PubMed `_ -* `Number of Ebola Cases and Deaths in Affected Countries (2014) `_ -* `Open-ODS (structure of the UK NHS) `_ -* `The Cancer Genome Atlas project (TCGA) `_ and `BigQuery table `_ - - -Image Processing ----------------- - -* `10k US Adult Faces Database `_ -* `2GB of Photos of Cats `_ or `Archive version `_ -* `Affective Image Classification `_ -* `Animals with attributes `_ -* `Face Recognition Benchmark `_ -* `ImageNet (in WordNet hierarchy) `_ -* `Indoor Scene Recognition `_ -* `International Affective Picture System, UFL `_ -* `Massive Visual Memory Stimuli, MIT `_ -* `Stanford Dogs Dataset `_ -* `SUN database, MIT `_ -* `The Oxford-IIIT Pet Dataset `_ -* `YouTube Faces Database `_ -* `Several Shape-from-Silhouette Datasets `_ - - -Machine Learning ----------------- - -* `Delve Datasets for classification and regression (Univ. of Toronto) `_ -* `Discogs Monthly Data `_ -* `eBay Online Auctions (2012) `_ -* `IMDb Database `_ -* `Keel Repository for classification, regression and time series `_ -* `Lending Club Loan Data `_ -* `Machine Learning Data Set Repository `_ -* `Million Song Dataset `_ -* `More Song Datasets `_ -* `MovieLens Data Sets `_ -* `RDataMining - "R and Data Mining" ebook data `_ -* `Registered Meteorites on Earth `_ -* `Restaurants Health Score Data in San Francisco `_ -* `UCI Machine Learning Repository `_ -* `Yahoo! Ratings and Classification Data `_ - - + +* |OK_ICON| `AWS COVID-19 Datasets - We're working with organizations who make [...] `_ + +* |OK_ICON| `COVID-19 Case Surveillance Public Use Data - The COVID-19 case [...] 
`_ + +* |OK_ICON| `2019 Novel Coronavirus COVID-19 Data Repository by Johns Hopkins CSSE - [...] `_ + +* |OK_ICON| `Coronavirus (Covid-19) Data in the United States - The New York Times is [...] `_ + +* |OK_ICON| `Composition of Foods Raw, Processed, Prepared USDA National Nutrient Database for Standard [...] `_ + +* |OK_ICON| `The COVID Tracking Project - The COVID Tracking Project collects and [...] `_ + +* |OK_ICON| `EHDP Large Health Data Sets `_ + +* |OK_ICON| `GDC - GDC supports several cancer genome programs for CCG, TCGA, TARGET etc. `_ + +* |OK_ICON| `Gapminder World demographic databases `_ + +* |FIXME_ICON| `MeSH, the vocabulary thesaurus used for indexing articles for PubMed `_ [`fixme `_] + +* |OK_ICON| `Medicare Coverage Database (MCD), U.S. `_ + +* |OK_ICON| `Medicare Data Engine of medicare.gov Data `_ + +* |OK_ICON| `Medicare Data File `_ + +* |OK_ICON| `Number of Ebola Cases and Deaths in Affected Countries (2014) `_ + +* |OK_ICON| `Open-ODS (structure of the UK NHS) `_ + +* |OK_ICON| `OpenPaymentsData, Healthcare financial relationship data `_ + +* |OK_ICON| `PhysioBank Databases - A large and growing archive of physiological data. `_ + +* |OK_ICON| `The Cancer Imaging Archive (TCIA) `_ + +* |OK_ICON| `The Cancer Genome Atlas project (TCGA) `_ + +* |OK_ICON| `World Health Organization Global Health Observatory `_ + +* |OK_ICON| `Yahoo Knowledge Graph COVID-19 Datasets - The Yahoo Knowledge Graph team [...] `_ + +* |OK_ICON| `Informatics for Integrating Biology & the Bedside `_ + +ImageProcessing +--------------- + +* |OK_ICON| `10k US Adult Faces Database `_ + +* |OK_ICON| `2GB of Photos of Cats `_ + +* |OK_ICON| `Audience Unfiltered faces for gender and age classification `_ + +* |OK_ICON| `Affective Image Classification `_ + +* |OK_ICON| `Animals with attributes `_ + +* |FIXME_ICON| `CADDY Underwater Stereo-Vision Dataset of divers' hand gestures - [...] 
`_ [`fixme `_] + +* |OK_ICON| `Cytology Dataset – CCAgT: Images of Cervical Cells with AgNOR Stain [...] `_ + +* |OK_ICON| `Caltech Pedestrian Detection Benchmark `_ + +* |OK_ICON| `Chars74K dataset - Character Recognition in Natural Images (both English [...] `_ + +* |OK_ICON| `Danbooru Tagged Anime Illustration Dataset - A large-scale anime image [...] `_ + +* |FIXME_ICON| `DukeMTMC Data Set - DukeMTMC aims to accelerate advances in multi-target [...] `_ [`fixme `_] + +* |OK_ICON| `Face Recognition Benchmark `_ + +* |FIXME_ICON| `Flickr: 32 Class Brand Logos `_ [`fixme `_] + +* |OK_ICON| `GDXray - X-ray images for X-ray testing and Computer Vision `_ + +* |OK_ICON| `HumanEva Dataset - The HumanEva-I dataset contains 7 calibrated video [...] `_ + +* |OK_ICON| `ImageNet (in WordNet hierarchy) `_ + +* |OK_ICON| `Indoor Scene Recognition `_ + +* |OK_ICON| `International Affective Picture System, UFL `_ + +* |OK_ICON| `KITTI Vision Benchmark Suite `_ + +* |OK_ICON| `Labeled Information Library of Alexandria - Biology and Conservation - [...] `_ + +* |OK_ICON| `MNIST database of handwritten digits, near 1 million examples `_ + +* |OK_ICON| `Multi-View Region of Interest Prediction Dataset for Autonomous Driving - [...] `_ + +* |FIXME_ICON| `Massive Visual Memory Stimuli, MIT `_ [`fixme `_] + +* |OK_ICON| `Open Images From Google - Pictures with segmentation masks for 2.8 [...] `_ + +* |OK_ICON| `RuFa - Contains images of text written in one of two Arabic fonts (Ruqaa [...] `_ + +* |OK_ICON| `SUN database, MIT `_ + +* |OK_ICON| `SVIRO Synthetic Vehicle Interior Rear Seat Occupancy - 25.000 synthetic [...] 
`_ + +* |FIXME_ICON| `Several Shape-from-Silhouette Datasets `_ [`fixme `_] + +* |OK_ICON| `Stanford Dogs Dataset `_ + +* |OK_ICON| `The Action Similarity Labeling (ASLAN) Challenge `_ + +* |OK_ICON| `The Oxford-IIIT Pet Dataset `_ + +* |OK_ICON| `Violent-Flows - Crowd Violence / Non-violence Database and benchmark `_ + +* |OK_ICON| `Visual genome `_ + +* |OK_ICON| `YouTube Faces Database `_ + +MachineLearning +--------------- + +* |OK_ICON| `All-Age-Faces Dataset - Contains 13'322 Asian face images distributed [...] `_ + +* |OK_ICON| `Audi Autonomous Driving Dataset - We have published the Audi Autonomous [...] `_ + +* |OK_ICON| `Context-aware data sets from five domains `_ + +* |OK_ICON| `Delve Datasets for classification and regression `_ + +* |OK_ICON| `Discogs Monthly Data `_ + +* |OK_ICON| `Free Music Archive `_ + +* |OK_ICON| `IMDb Database `_ + +* |OK_ICON| `Keel Repository for classification, regression and time series `_ + +* |OK_ICON| `Labeled Faces in the Wild (LFW) `_ + +* |OK_ICON| `Lending Club Loan Data `_ + +* |FIXME_ICON| `Machine Learning Data Set Repository `_ [`fixme `_] + +* |OK_ICON| `Million Song Dataset `_ + +* |OK_ICON| `More Song Datasets `_ + +* |OK_ICON| `MovieLens Data Sets `_ + +* |OK_ICON| `New Yorker caption contest ratings `_ + +* |OK_ICON| `RDataMining - "R and Data Mining" ebook data `_ + +* |FIXME_ICON| `Registered Meteorites on Earth `_ [`fixme `_] + +* |OK_ICON| `Restaurants Health Score Data in San Francisco `_ + +* |OK_ICON| `UCI Machine Learning Repository `_ + +* |OK_ICON| `Yahoo! 
Ratings and Classification Data `_ + +* |OK_ICON| `YouTube-BoundingBoxes `_ + +* |OK_ICON| `Youtube 8m `_ + +* |OK_ICON| `eBay Online Auctions (2012) `_ + Museums ------- - -* `Cooper-Hewitt's Collection Database `_ -* `Minneapolis Institute of Arts metadata `_ -* `Natural History Museum (London) Data Portal `_ -* `Rijksmuseum Historical Art Collection `_ -* `Tate Collection metadata `_ -* `The Getty vocabularies `_ -* `Canada Science and Technology Museums Corporation's Open Data `_ - - -Natural Language ----------------- - -* `Blogger Corpus `_ -* `ClueWeb09 FACC `_ -* `ClueWeb12 FACC `_ -* `DBpedia - 4.58M things with 583M facts `_ -* `Flickr Personal Taxonomies `_ -* `Google Books Ngrams (2.2TB) `_ -* `Google Web 5gram (1TB, 2006) `_ -* `Gutenberg eBooks List `_ -* `Hansards text chunks of Canadian Parliament `_ -* `Machine Translation of European languages `_ -* `Machine Comprehension Test (MCTest) of text from Microsoft Research `_ -* `SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) `_ -* `SMS Spam Collection in English `_ -* `USENET postings corpus of 2005~2011 `_ -* `Wikidata - Wikipedia databases `_ -* `Wikipedia Links data - 40 Million Entities in Context `_ -* `WordNet databases and tools `_ - - + +* |OK_ICON| `Canada Science and Technology Museums Corporation's Open Data `_ + +* |OK_ICON| `Cooper-Hewitt's Collection Database `_ + +* |OK_ICON| `Minneapolis Institute of Arts metadata `_ + +* |OK_ICON| `Natural History Museum (London) Data Portal `_ + +* |OK_ICON| `Rijksmuseum Historical Art Collection `_ + +* |OK_ICON| `Tate Collection metadata `_ + +* |OK_ICON| `The Getty vocabularies `_ + +NaturalLanguage +--------------- + +* |OK_ICON| `Automatic Keyphrase Extraction `_ + +* |OK_ICON| `The Big Bad NLP Database `_ + +* |OK_ICON| `Blizzard Challenge Speech - The speech + text data comes from [...] 
`_ + +* |OK_ICON| `Blogger Corpus `_ + +* |FIXME_ICON| `CLiPS Stylometry Investigation Corpus `_ [`fixme `_] + +* |OK_ICON| `ClueWeb09 FACC `_ + +* |OK_ICON| `ClueWeb12 FACC `_ + +* |OK_ICON| `DBpedia - 4.58M things with 583M facts `_ + +* |OK_ICON| `Dirty Words - With millions of images in our library and billions of [...] `_ + +* |OK_ICON| `Flickr Personal Taxonomies `_ + +* |FIXME_ICON| `Freebase of people, places, and things `_ [`fixme `_] + +* |OK_ICON| `German Political Speeches Corpus - Collection of political speeches from [...] `_ + +* |OK_ICON| `Google Books Ngrams (2.2TB) `_ + +* |OK_ICON| `Google MC-AFP - Generated based on the public available Gigaword dataset [...] `_ + +* |OK_ICON| `Google Web 5gram (1TB, 2006) `_ + +* |OK_ICON| `Gutenberg eBooks List `_ + +* |OK_ICON| `Hansards text chunks of Canadian Parliament `_ + +* |OK_ICON| `LJ Speech - Speech dataset consisting of 13,100 short audio clips of a [...] `_ + +* |FIXME_ICON| `M-AILabs Speech - The M-AILABS Speech Dataset is the first large dataset [...] `_ [`fixme `_] + +* |OK_ICON| `Microsoft MAchine Reading COmprehension Dataset (or MS MARCO) `_ + +* |OK_ICON| `Machine Comprehension Test (MCTest) of text from Microsoft Research `_ + +* |FIXME_ICON| `Machine Translation of European languages `_ [`fixme `_] + +* |FIXME_ICON| `Making Sense of Microposts 2013 - Concept Extraction `_ [`fixme `_] + +* |OK_ICON| `Making Sense of Microposts 2016 - Named Entity rEcognition and Linking `_ + +* |OK_ICON| `Multi-Domain Sentiment Dataset (version 2.0) `_ + +* |OK_ICON| `Noisy speech database for training speech enhancement algorithms and TTS [...] 
`_ + +* |OK_ICON| `Open Multilingual Wordnet `_ + +* |OK_ICON| `POS/NER/Chunk annotated data `_ + +* |FIXME_ICON| `Personae Corpus `_ [`fixme `_] + +* |OK_ICON| `SMS Spam Collection in English `_ + +* |OK_ICON| `SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) `_ + +* |OK_ICON| `Stanford Question Answering Dataset (SQuAD) `_ + +* |OK_ICON| `USENET postings corpus of 2005~2011 `_ + +* |OK_ICON| `Universal Dependencies `_ + +* |OK_ICON| `Webhose - News/Blogs in multiple languages `_ + +* |OK_ICON| `Wikidata - Wikipedia databases `_ + +* |OK_ICON| `Wikipedia Links data - 40 Million Entities in Context `_ + +* |OK_ICON| `WordNet databases and tools `_ + +* |OK_ICON| `WorldTree Corpus of Explanation Graphs for Elementary Science Questions - [...] `_ + +Neuroscience +------------ + +* |OK_ICON| `Allen Institute Datasets `_ + +* |OK_ICON| `Brain Catalogue `_ + +* |OK_ICON| `Brainomics `_ + +* |FIXME_ICON| `CodeNeuro Datasets `_ [`fixme `_] + +* |OK_ICON| `Collaborative Research in Computational Neuroscience (CRCNS) `_ + +* |OK_ICON| `FCP-INDI `_ + +* |OK_ICON| `Human Connectome Project `_ + +* |OK_ICON| `NDAR `_ + +* |OK_ICON| `NIMH Data Archive `_ + +* |OK_ICON| `NeuroData `_ + +* |OK_ICON| `NeuroMorpho - NeuroMorpho.Org is a centrally curated inventory of [...] `_ + +* |OK_ICON| `Neuroelectro `_ + +* |OK_ICON| `OASIS `_ + +* |OK_ICON| `OpenNEURO `_ + +* |FIXME_ICON| `OpenfMRI `_ [`fixme `_] + +* |OK_ICON| `Study Forrest `_ + Physics ------- - -* `CERN Open Data Portal `_ -* `NASA Exoplanet Archive `_ -* `NSSDC (NASA) data of 550 space spacecraft `_ -* `Sloan Digital Sky Survey (SDSS) - Mapping the Universe `_ - - -Psychology/Cognition + +* |OK_ICON| `CERN Open Data Portal `_ + +* |OK_ICON| `Crystallography Open Database `_ + +* |OK_ICON| `IceCube - South Pole Neutrino Observatory `_ + +* |OK_ICON| `Ligo Open Science Center (LOSC) - Gravitational wave data from the LIGO [...] 
`_ + +* |OK_ICON| `NASA Exoplanet Archive `_ + +* |OK_ICON| `NSSDC (NASA) data of 550 spacecraft `_ + +* |OK_ICON| `Sloan Digital Sky Survey (SDSS) - Mapping the Universe `_ + +ProstateCancer -------------- - -* `OSU Cognitive Modeling Repository Datasets `_ - - -Public Domains + +* |OK_ICON| `EOPC-DE-Early-Onset-Prostate-Cancer-Germany - Early Onset Prostate Cancer [...] `_ + +* |OK_ICON| `GENIE - Data from the Genomics Evidence Neoplasia Information Exchange [...] `_ + +* |OK_ICON| `Genomic-Hallmarks-Prostate-Adenocarcinoma-CPC-GENE - Comprehensive [...] `_ + +* |OK_ICON| `MSK-IMPACT-Clinical-Sequencing-Cohort-MSKCC-Prostate-Cancer - Targeted [...] `_ + +* |OK_ICON| `Metastatic-Prostate-Adenocarcinoma-MCTP - Comprehensive profiling of 61 [...] `_ + +* |OK_ICON| `Metastatic-Prostate-Cancer-SU2CPCF-Dream-Team - Comprehensive analysis of [...] `_ + +* |OK_ICON| `NPCR-2001-2015 - Database from CDC's National Program of Cancer [...] `_ + +* |OK_ICON| `NPCR-2005-2015 - Database from CDC's National Program of Cancer [...] `_ + +* |OK_ICON| `NaF-Prostate - NaF Prostate is a collection of F-18 NaF positron emission [...] `_ + +* |OK_ICON| `Neuroendocrine-Prostate-Cancer - Whole exome and RNA Seq data of [...] `_ + +* |OK_ICON| `PLCO-Prostate-Diagnostic-Procedures - The Prostate Diagnostic Procedures [...] `_ + +* |OK_ICON| `PLCO-Prostate-Medical-Complications - The Prostate Medical Complications [...] `_ + +* |OK_ICON| `PLCO-Prostate-Screening-Abnormalities - The Prostate Screening [...] `_ + +* |OK_ICON| `PLCO-Prostate-Screening - The Prostate Screening dataset (177,315 [...] `_ + +* |OK_ICON| `PLCO-Prostate-Treatments - The Prostate Treatments dataset (13,409 [...] `_ + +* |OK_ICON| `PLCO-Prostate - The Prostate dataset is a comprehensive dataset that [...] `_ + +* |OK_ICON| `PRAD-CA-Prostate-Adenocarcinoma-Canada - Prostate Adenocarcinoma - [...] `_ + +* |OK_ICON| `PRAD-FR-Prostate-Adenocarcinoma-France - Prostate Adenocarcinoma - [...]
`_ + +* |OK_ICON| `PRAD-UK-Prostate-Adenocarcinoma-United-Kingdom - Prostate Adenocarcinoma [...] `_ + +* |OK_ICON| `PROSTATEx-Challenge - Retrospective set of prostate MR studies. All [...] `_ + +* |OK_ICON| `Prostate-3T - The Prostate-3T project provided imaging data to TCIA as [...] `_ + +* |OK_ICON| `Prostate-Adenocarcinoma-Broad-Cornell-2012 - Comprehensive profiling of [...] `_ + +* |OK_ICON| `Prostate-Adenocarcinoma-Broad-Cornell-2013 - Comprehensive profiling of [...] `_ + +* |OK_ICON| `Prostate-Adenocarcinoma-CNA-study-MSKCC - Copy-number profiling of 103 [...] `_ + +* |OK_ICON| `Prostate-Adenocarcinoma-Fred-Hutchinson-CRC - Comprehensive profiling of [...] `_ + +* |OK_ICON| `Prostate Adenocarcinoma (MSKCC/DFCI) - Whole Exome Sequencing of 1013 [...] `_ + +* |OK_ICON| `Prostate-Adenocarcinoma-MSKCC - MSKCC Prostate Oncogenome Project. 181 [...] `_ + +* |OK_ICON| `Prostate-Adenocarcinoma-Organoids-MSKCC - Exome profiling of prostate [...] `_ + +* |OK_ICON| `Prostate-Adenocarcinoma-Sun-Lab - Whole-genome and Transcriptome [...] `_ + +* |OK_ICON| `Prostate-Adenocarcinoma-TCGA-PanCancer-Atlas - Comprehensive TCGA [...] `_ + +* |OK_ICON| `Prostate-Adenocarcinoma-TCGA - Integrated profiling of 333 primary [...] `_ + +* |OK_ICON| `Prostate-Diagnosis - PCa T1- and T2-weighted magnetic resonance images [...] `_ + +* |OK_ICON| `Prostate-Fused-MRI-Pathology - The Prostate Fused-MRI-Pathology [...] `_ + +* |OK_ICON| `Prostate-MRI - The Prostate-MRI collection of prostate Magnetic Resonance [...] `_ + +* |OK_ICON| `Prostate-R - The R package 'ElemStatLearn' contains a prostate cancer [...] `_ + +* |OK_ICON| `QIN-PROSTATE-Repeatability - The QIN-PROSTATE-Repeatability dataset is a [...] `_ + +* |OK_ICON| `QIN-PROSTATE - The QIN PROSTATE collection of the Quantitative Imaging [...] `_ + +* |OK_ICON| `SEER-YR1973_2015.SEER9 - The SEER November 2017 Research Data files from [...] 
`_ + +* |OK_ICON| `SEER-YR1992_2015.SJ_LA_RG_AK - The SEER November 2017 Research Data files [...] `_ + +* |OK_ICON| `SEER-YR2000_2015.CA_KY_LO_NJ_GA - The SEER November 2017 Research Data [...] `_ + +* |OK_ICON| `SEER-YR2000_2015.CA_KY_LO_NJ_GA - The July - December 2005 diagnoses for [...] `_ + +* |OK_ICON| `TCGA-PRAD-US - TCGA Prostate Adenocarcinoma (499 samples). `_ + +Psychology+Cognition +-------------------- + +* |FIXME_ICON| `OSU Cognitive Modeling Repository Datasets `_ [`fixme `_] + +PublicDomains +------------- + +* |OK_ICON| `Amazon `_ + +* |OK_ICON| `Archive.org Datasets `_ + +* |OK_ICON| `Archive-it from Internet Archive `_ + +* |OK_ICON| `CMU JASA data archive `_ + +* |OK_ICON| `CMU StatLab collections `_ + +* |FIXME_ICON| `Data.World `_ [`fixme `_] + +* |FIXME_ICON| `Data360 `_ [`fixme `_] + +* |OK_ICON| `Enigma Public `_ + +* |OK_ICON| `Google `_ + +* |OK_ICON| `Grand Comics Database - The Grand Comics Database (GCD) is a nonprofit, [...] `_ + +* |FIXME_ICON| `Infochimps `_ [`fixme `_] + +* |OK_ICON| `KDNuggets Data Collections `_ + +* |OK_ICON| `Microsoft Azure Data Market Free DataSets `_ + +* |OK_ICON| `Microsoft Data Science for Research `_ + +* |OK_ICON| `Microsoft Research Open Data `_ + +* |FIXME_ICON| `Numbrary `_ [`fixme `_] + +* |OK_ICON| `Open Library Data Dumps `_ + +* |FIXME_ICON| `Reddit Datasets `_ [`fixme `_] + +* |OK_ICON| `RevolutionAnalytics Collection `_ + +* |OK_ICON| `Sample R data sets `_ + +* |OK_ICON| `StatSci.org `_ + +* |OK_ICON| `Stats4Stem R data sets (archived) `_ + +* |OK_ICON| `The Washington Post List `_ + +* |OK_ICON| `UCLA SOCR data collection `_ + +* |OK_ICON| `UFO Reports `_ + +* |OK_ICON| `Wikileaks 911 pager intercepts `_ + +* |OK_ICON| `Yahoo Webscope `_ + +SearchEngines +------------- + +* |OK_ICON| `Academic Torrents of data sharing from UMB `_ + +* |FIXME_ICON| `DataMarket (Qlik) `_ [`fixme `_] + +* |OK_ICON| `Datahub.io `_ + +* |OK_ICON| `Domains Project - Sorted list of Internet domains `_ + +* |OK_ICON|
`Harvard Dataverse Network of scientific data `_ + +* |FIXME_ICON| `ICPSR (UMICH) `_ [`fixme `_] + +* |OK_ICON| `Institute of Education Sciences `_ + +* |OK_ICON| `National Technical Reports Library `_ + +* |OK_ICON| `Open Data Certificates (beta) `_ + +* |OK_ICON| `OpenDataNetwork - A search engine of all Socrata powered data portals `_ + +* |OK_ICON| `Statista.com - statistics and Studies `_ + +* |OK_ICON| `Zenodo - An open dependable home for the long-tail of science `_ + +SocialNetworks -------------- - -* `Amazon `_ -* `Archive.org Datasets `_ -* `CMU JASA data archive `_ -* `CMU StatLab collections `_ -* `Data360 `_ -* `Datamob.org `_ -* `Google `_ -* `Infochimps `_ -* `KDNuggets Data Collections `_ -* `Microsoft Azure Data Market Free DataSets `_ -* `Numbray `_ -* `Reddit Datasets `_ -* `RevolutionAnalytics Collection `_ -* `Sample R data sets `_ -* `Stats4Stem R data sets `_ -* `StatSci.org `_ -* `The Washington Post List `_ -* `UCLA SOCR data collection `_ -* `UFO Reports `_ -* `Wikileaks 911 pager intercepts `_ -* `Yahoo Webscope `_ - - -Search Engines + +* |OK_ICON| `72 hours #gamergate Twitter Scrape `_ + +* |OK_ICON| `CMU Enron Email of 150 users `_ + +* |OK_ICON| `Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape `_ + +* |OK_ICON| `China Biographical Database - The China Biographical Database is a freely [...] `_ + +* |OK_ICON| `A Twitter Dataset of 40+ million tweets related to COVID-19 - Due to the [...] `_ + +* |OK_ICON| `43k+ Donald Trump Twitter Screenshots - This archive contains screenshots [...] 
`_ + +* |OK_ICON| `EDRM Enron EMail of 151 users, hosted on S3 `_ + +* |OK_ICON| `Facebook Data Scrape (2005) `_ + +* |OK_ICON| `Facebook Social Networks from LAW (since 2007) `_ + +* |OK_ICON| `Foursquare from UMN/Sarwat (2013) `_ + +* |OK_ICON| `GitHub Collaboration Archive `_ + +* |FIXME_ICON| `Google Scholar citation relations `_ [`fixme `_] + +* |OK_ICON| `High-Resolution Contact Networks from Wearable Sensors `_ + +* |OK_ICON| `Indie Map: social graph and crawl of top IndieWeb sites `_ + +* |OK_ICON| `Mobile Social Networks from UMASS `_ + +* |OK_ICON| `Network Twitter Data `_ + +* |OK_ICON| `Reddit Comments `_ + +* |OK_ICON| `Skytrax' Air Travel Reviews Dataset `_ + +* |OK_ICON| `Social Twitter Data `_ + +* |OK_ICON| `SourceForge.net Research Data `_ + +* |OK_ICON| `Twitch Top Streamer's Data `_ + +* |OK_ICON| `Twitter Data for Online Reputation Management `_ + +* |OK_ICON| `Twitter Data for Sentiment Analysis `_ + +* |OK_ICON| `Twitter Graph of entire Twitter site `_ + +* |FIXME_ICON| `Twitter Scrape Calufa May 2011 `_ [`fixme `_] + +* |OK_ICON| `UNIMI/LAW Social Network Datasets `_ + +* |OK_ICON| `United States Congress Twitter Data - Daily datasets with tweets of 1100+ [...] `_ + +* |OK_ICON| `Yahoo! 
Graph and Social Data `_ + +* |OK_ICON| `Youtube Video Social Graph in 2007,2008 `_ + +SocialSciences -------------- - -* `Academic Torrents of data sharing from UMB `_ -* `Archive-it from Internet Archive `_ -* `Datahub.io `_ -* `DataMarket (Qlik) `_ -* `Freebase.com of people, places, and things `_ -* `Harvard Dataverse Network of scientific data `_ -* `ICPSR (UMICH) `_ -* `Open Data Certificates (beta) `_ -* `Statista.com - statistics and Studies `_ - - -Social Networks ---------------- - -* `72 hours #gamergate scrape `_ -* `Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape `_ -* `May 2011 Calufa Twitter Scrape `_ -* `Network Twitter Data `_ -* `Social Twitter Data `_ -* `Twitter Data for Sentiment Analysis `_ - - -Social Sciences ---------------- - -* `Ancestry.com Forum Dataset over 10 years `_ -* `CMU Enron Email of 150 users `_ -* `EDRM Enron EMail of 151 users, hosted on S3 `_ -* `Facebook Data Scrape (2005) `_ -* `Facebook Social Networks from LAW (since 2007) `_ -* `FBI Hate Crime 2013 - aggregated data `_ -* `Foursquare from UMN/Sarwat (2013) `_ -* `GDELT Global Events Database `_ -* `General Social Survey (GSS) since 1972 `_ -* `GetGlue - users rating TV shows `_ -* `GitHub Collaboration Archive `_ -* `Google Scholar citation relations `_ -* `MIT Reality Mining Dataset `_ -* `Mobile Social Networks from UMASS `_ -* `PewResearch Internet Survey Project `_ -* `Political Polarity Data `_ -* `Reddit Comments `_ -* `Skytrax' Air Travel Reviews Dataset `_ -* `SourceForge.net Research Data `_ -* `StackExchange Data Explorer `_ -* `Texas Inmates Executed Since 1984 `_ -* `Titanic Survival Data Set `_ -* `Twitter Graph of entire Twitter site `_ -* `UCB's Archive of Social Science Data (D-Lab) `_ -* `UCLA Social Sciences Data Archive `_ -* `UNIMI/LAW Social Network Datasets `_ -* `Universities Worldwide `_ -* `UPJOHN for Labor Employment Research `_ -* `Yahoo! 
Graph and Social Data `_ -* `Youtube Video Social Graph in 2007,2008 `_ - - + +* |OK_ICON| `ACLED (Armed Conflict Location & Event Data Project) `_ + +* |OK_ICON| `Authoritarian Ruling Elites Database - The Authoritarian Ruling Elites [...] `_ + +* |OK_ICON| `Canadian Legal Information Institute `_ + +* |FIXME_ICON| `Center for Systemic Peace Datasets - Conflict Trends, Polities, State Fragility, etc `_ [`fixme `_] + +* |OK_ICON| `Correlates of War Project `_ + +* |OK_ICON| `Cryptome Conspiracy Theory Items `_ + +* |FIXME_ICON| `Datacards `_ [`fixme `_] + +* |OK_ICON| `European Social Survey `_ + +* |OK_ICON| `FBI Hate Crime 2013 - aggregated data `_ + +* |FIXME_ICON| `Fragile States Index `_ [`fixme `_] + +* |OK_ICON| `GDELT Global Events Database `_ + +* |OK_ICON| `General Social Survey (GSS) since 1972 `_ + +* |OK_ICON| `German Social Survey `_ + +* |OK_ICON| `Global Religious Futures Project `_ + +* |OK_ICON| `Gun Violence Data - A comprehensive, accessible database that contains [...] `_ + +* |OK_ICON| `Humanitarian Data Exchange `_ + +* |OK_ICON| `INFORM Index for Risk Management `_ + +* |OK_ICON| `Institute for Demographic Studies `_ + +* |OK_ICON| `International Networks Archive `_ + +* |OK_ICON| `International Social Survey Program ISSP `_ + +* |OK_ICON| `International Studies Compendium Project `_ + +* |OK_ICON| `James McGuire Cross National Data `_ + +* |OK_ICON| `MIT Reality Mining Dataset `_ + +* |OK_ICON| `MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste `_ + +* |OK_ICON| `Mass Mobilization Data Project - The Mass Mobilization (MM) data are an [...] `_ + +* |OK_ICON| `Microsoft Academic Knowledge Graph - The Microsoft Academic Knowledge [...] `_ + +* |OK_ICON| `Minnesota Population Center `_ + +* |OK_ICON| `Notre Dame Global Adaptation Index (ND-GAIN) `_ + +* |OK_ICON| `Open Crime and Policing Data in England, Wales and Northern Ireland `_ + +* |OK_ICON| `OpenSanctions - A global database of persons and companies of political, [...] 
`_ + +* |OK_ICON| `Paul Hensel General International Data Page `_ + +* |OK_ICON| `PewResearch Internet Survey Project `_ + +* |OK_ICON| `PewResearch Society Data Collection `_ + +* |FIXME_ICON| `Political Polarity Data `_ [`fixme `_] + +* |OK_ICON| `StackExchange Data Explorer `_ + +* |OK_ICON| `Terrorism Research and Analysis Consortium `_ + +* |OK_ICON| `Texas Inmates Executed Since 1984 `_ + +* |OK_ICON| `Titanic Survival Data Set `_ + +* |FIXME_ICON| `UCB's Archive of Social Science Data (D-Lab) `_ [`fixme `_] + +* |OK_ICON| `UCLA Social Sciences Data Archive `_ + +* |OK_ICON| `UN Civil Society Database `_ + +* |OK_ICON| `UPJOHN for Labor Employment Research `_ + +* |OK_ICON| `Universities Worldwide `_ + +* |OK_ICON| `Uppsala Conflict Data Program `_ + +* |OK_ICON| `World Bank Open Data `_ + +* |OK_ICON| `WorldPop project - Worldwide human population distributions `_ + +Software +-------- + +* |OK_ICON| `FLOSSmole data about free, libre, and open source software development `_ + +* |OK_ICON| `GHTorrent - Scalable, queryable, offline mirror of data offered through [...] `_ + +* |OK_ICON| `Libraries.io Open Source Repository and Dependency Metadata `_ + +* |OK_ICON| `Public Git Archive - a Big Code dataset for all – dataset of 182,014 top- [...] `_ + +* |OK_ICON| `Code duplicates - 2k Java file and 600 Java function pairs labeled as [...] `_ + +* |OK_ICON| `Commit messages - 1.3 billion GitHub commit messages till March 2019 `_ + +* |OK_ICON| `Pull Request review comments - 25.3 million GitHub PR review comments [...] `_ + +* |OK_ICON| `Source Code Identifiers - 41.7 million distinct splittable identifiers [...] 
`_ + Sports ------ - -* `Betfair Historical Exchange Data `_ -* `Cricsheet Matches (cricket) `_ -* `Ergast Formula 1, from 1950 up to date (API) `_ -* `Football/Soccer resources (data and APIs) `_ -* `Lahman's Baseball Database `_ -* `Retrosheet Baseball Statistics `_ - - -Time Series ------------ - -* `Hard Drive Failure Rates `_ -* `Heart Rate Time Series from MIT `_ -* `Time Series Data Library (TSDL) from MU `_ -* `UC Riverside Time Series Dataset `_ - - + +* |OK_ICON| `American Ninja Warrior Obstacles - Contains every obstacle in the history [...] `_ + +* |OK_ICON| `Betfair Historical Exchange Data `_ + +* |OK_ICON| `Cricsheet Matches (cricket) `_ + +* |OK_ICON| `Ergast Formula 1, from 1950 up to date (API) `_ + +* |OK_ICON| `Football/Soccer resources (data and APIs) `_ + +* |OK_ICON| `Lahman's Baseball Database `_ + +* |OK_ICON| `NFL play-by-play data - NFL play-by-play data sourced from: [...] `_ + +* |OK_ICON| `Pinhooker: Thoroughbred Bloodstock Sale Data `_ + +* |OK_ICON| `Pro Kabadi season 1 to 7 - Pro Kabadi League is a professional-level [...] `_ + +* |OK_ICON| `Retrosheet Baseball Statistics `_ + +* |OK_ICON| `Tennis database of rankings, results, and stats for ATP `_ + +* |OK_ICON| `Tennis database of rankings, results, and stats for WTA `_ + +TimeSeries +---------- + +* |OK_ICON| `3W dataset - To the best of its authors' knowledge, this is the first [...] `_ + +* |OK_ICON| `Databanks International Cross National Time Series Data Archive `_ + +* |OK_ICON| `Hard Drive Failure Rates `_ + +* |OK_ICON| `Heart Rate Time Series from MIT `_ + +* |OK_ICON| `Time Series Data Library (TSDL) from MU `_ + +* |OK_ICON| `Turing Change Point Dataset - Contains 42 annotated time series collected [...] 
`_ + +* |OK_ICON| `UC Riverside Time Series Dataset `_ + Transportation -------------- - -* `Airlines OD Data 1987-2008 `_ -* `Bay Area Bike Share Data `_ -* `Bike Share Systems (BSS) collection `_ -* `GeoLife GPS Trajectory from Microsoft Research `_ -* `German train system by Deutsche Bahn `_ -* `Hubway Million Rides in MA `_ -* `Marine Traffic - ship tracks, port calls and more `_ -* `NYC Taxi Trip Data 2009- `_ -* `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_ -* `NYC Uber trip data April 2014 to September 2014 `_ -* `OpenFlights - airport, airline and route data `_ -* `Plane Crash Database, since 1920 `_ -* `RITA Airline On-Time Performance data `_ -* `RITA/BTS transport data collection (TranStat) `_ -* `Transport for London (TFL) `_ -* `Travel Tracker Survey (TTS) for Chicago `_ -* `U.S. Bureau of Transportation Statistics (BTS) `_ -* `U.S. Domestic Flights 1990 to 2009 `_ -* `U.S. Freight Analysis Framework since 2007 `_ + +* |OK_ICON| `Airlines OD Data 1987-2008 `_ + +* |OK_ICON| `Ford GoBike Data (formerly Bay Area Bike Share Data) `_ + +* |OK_ICON| `Bike Share Systems (BSS) collection `_ + +* |OK_ICON| `Dutch Traffic Information `_ + +* |OK_ICON| `GeoLife GPS Trajectory from Microsoft Research `_ + +* |OK_ICON| `German train system by Deutsche Bahn `_ + +* |FIXME_ICON| `Hubway Million Rides in MA `_ [`fixme `_] + +* |OK_ICON| `Montreal BIXI Bike Share `_ + +* |OK_ICON| `NYC Taxi Trip Data 2009- `_ + +* |OK_ICON| `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_ + +* |OK_ICON| `NYC Uber trip data April 2014 to September 2014 `_ + +* |OK_ICON| `Open Traffic collection `_ + +* |OK_ICON| `OpenFlights - airport, airline and route data `_ + +* |OK_ICON| `Philadelphia Bike Share Stations (JSON) `_ + +* |OK_ICON| `Plane Crash Database, since 1920 `_ + +* |OK_ICON| `RITA Airline On-Time Performance data `_ + +* |OK_ICON| `RITA/BTS transport data collection (TranStat) `_ + +* |FIXME_ICON| `Renfe (Spanish National Railway Network) dataset `_ [`fixme `_] + +* |OK_ICON| `Toronto 
Bike Share Stations (JSON and GBFS files) `_ + +* |OK_ICON| `Transport for London (TFL) `_ + +* |OK_ICON| `Travel Tracker Survey (TTS) for Chicago `_ + +* |OK_ICON| `U.S. Bureau of Transportation Statistics (BTS) `_ + +* |OK_ICON| `U.S. Domestic Flights 1990 to 2009 `_ + +* |OK_ICON| `U.S. Freight Analysis Framework since 2007 `_ + +* |OK_ICON| `U.S. National Highway Traffic Safety Administration - Fatalities since [...] `_ + +eSports +------- + +* |OK_ICON| `FIFA-2021 Complete Player Dataset `_ + +* |OK_ICON| `OpenDota data dump `_ Complementary Collections ------------------------- +* `Data Packaged Core Datasets `_ + +* `Database of Scientific Code Contributions `_ + +* A growing collection of public datasets: `CoolDatasets. `_ + * DataWrangling: `Some Datasets Available on the Web `_ + * Inside-r: `Finding Data on the Internet `_ + * OpenDataMonitor: `An overview of available open data resources in Europe `_ -* OpenDataNetwork: `A search engine of all Socrata powered data portals ranging from small cities to federal agencies and non-profits `_ + * Quora: `Where can I find large datasets open to the public? `_ + * RS.io: `100+ Interesting Data Sets for Statistics `_ -* StaTrek: `Leveraging open data to understand urban lives `_ -* Zenodo: `An open dependable home for the long-tail of science, enabling researchers to share and preserve any research outputs in any size, any format and from any science. `_ + +* StaTrek: `Leveraging open data to understand urban lives `_ + +* CV Papers: `CV Datasets on the web `_ + +* CVonline: `Image Databases `_ +