Some links to datasets
- The UCI Machine Learning Repository is one of the oldest and best sources of datasets on the web.
- Kaggle is a data science community that hosts machine learning competitions. Each competition has its own dataset, which you can download after entering that competition. Kaggle also hosts user-contributed datasets.
- FiveThirtyEight makes the datasets used in its articles available on GitHub.
- Amazon Datasets are also available, but you need an Amazon Web Services (AWS) account, which I think is free.
- Google lists its public datasets (queried through BigQuery) on a single page. You will need to sign up for a Google Cloud Platform (GCP) account, but the first 1 TB of query data you process each month is free.
- Data.gov, launched in 2009, is the US government's open data site and part of a broader US effort towards open government.
- Reddit, a popular community discussion site, has a section devoted to sharing interesting datasets (the r/datasets subreddit).
- Springboard has a list of datasets ranging from US Census data to crime data and the Enron email corpus.