Datasets to practice Machine Learning

If you are looking for datasets to practice your data analysis and Machine Learning skills, here are a few websites. All the datasets listed are available for free.

CSIE

CSIE’s page contains classification, regression, multi-label and string datasets from different sources, including UCI, Statlog and StatLib. CSIE is the Department of Computer Science and Information Engineering at National Taiwan University. A few links to the original sources are broken, but the datasets themselves can still be downloaded from the CSIE website.
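
Once a file from this page is downloaded, it has to be parsed before it can be used. A minimal sketch follows, assuming the file uses the sparse LIBSVM text layout (one example per line, a label followed by `index:value` pairs). That layout is an assumption about the files, so check a few lines of the download first. The function names and the sample rows are made up for illustration.

```python
def parse_libsvm_line(line):
    """Parse one 'label index:value ...' line into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for token in parts[1:]:
        # Each token is assumed to look like "3:0.75".
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def parse_libsvm_text(text):
    """Parse a whole file's text, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(parse_libsvm_line(line))
    return rows


# Made-up sample rows, not taken from any real dataset.
sample = """\
+1 1:0.5 3:1.0
-1 2:0.25 3:-2.0
"""

rows = parse_libsvm_text(sample)
for label, features in rows:
    print(label, features)
```

For a real file, read its contents with `open(path).read()` and pass the text to `parse_libsvm_text`. Features that are missing from a line are implicitly zero in this layout, which is why a dictionary is used here instead of a fixed-length list.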


UCI

UCI lists more than 400 datasets, covering tasks from binary and multivariate classification to regression. The page is maintained by the Center for Machine Learning and Intelligent Systems at the University of California, Irvine. There is a very interesting article about what’s good and bad about the UCI datasets, with some advice on how best to use them.


Academic Torrents

The torrent site was created in 2014 by two researchers from the University of Massachusetts. The datasets come from a variety of sources (academic, industry): the dataset used for the Netflix challenge, for example, is available for download. You can also download research papers and courses: all the videos of Stanford's CS231n course on deep learning can be downloaded. This is a great source if you want to try your hand at image processing and object recognition algorithms.

https://github.com/curran/data

https://data.world/