Some links to data sets
UCI Machine Learning Repository is one of the oldest and best sources of datasets on the web.
Kaggle is a data science community that hosts machine learning competitions. You can download data from Kaggle by
entering a competition; each competition has its own associated dataset.
User-contributed datasets can also be found there.
makes the datasets used in its articles available online on GitHub.
Amazon Datasets (you need an Amazon Web Services account, which I think is free).
Google lists all of the datasets on a page. You will need to sign up for a GCP account, but the first 1 TB of queries
you make is free.
Data.gov is a relatively new site that is part of a US effort towards open government.
Reddit, a popular community discussion site, has a section devoted to sharing
interesting datasets.
Springboard has a list of various data sets, ranging from US Census data to Crime data and the Enron email