Top 5 Free Dataset Sources
You’ve built skills in analytics and now you’re looking to expand them or put them into practice.
Where do you turn for free datasets to use?
Here are my top 5 free dataset sources. There are MANY more resources available than the ones I mention here, but these few will give you access to millions of datasets from around the world.
Data.gov is a large dataset aggregator and the home of the US Government’s open data.
There are 14 major topics (from agriculture to public safety to local government), so you have a good chance of finding a dataset that really interests you.
The search here is simple and you can browse the datasets directly, without registering. You can also filter by topic category, location, tags, file format, organization and more to make your search more effective.
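If you'd rather script that kind of filtered search, here is a minimal sketch. It assumes the Data.gov catalog exposes a CKAN-style `package_search` endpoint at the URL below and that the `res_format` and `organization` filter fields are available; check both against the current Data.gov documentation before relying on them. The helper name `build_search_url` is my own invention for illustration.

```python
from urllib.parse import urlencode

# Assumption: the Data.gov catalog offers a CKAN-style search endpoint here.
BASE_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, file_format=None, organization=None, rows=10):
    """Build a search URL with optional file-format and organization filters."""
    params = {"q": query, "rows": rows}
    filters = []
    if file_format:
        # Assumed filter field name for the resource file format.
        filters.append(f"res_format:{file_format}")
    if organization:
        # Assumed filter field name for the publishing organization.
        filters.append(f"organization:{organization}")
    if filters:
        params["fq"] = " AND ".join(filters)
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Only builds and prints the URL; no network request is made.
    print(build_search_url("air quality", file_format="CSV", rows=5))
```

The sketch only builds the URL, so you can paste it into a browser or hand it to whatever HTTP client you prefer.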
If you’ve spent any time looking at analytical commentary - especially around US elections - you’ve almost certainly heard of FiveThirtyEight.
FiveThirtyEight originally focused on political analysis. The name comes from the number of electors in the US Electoral College (538).
Now they also cover sports and general social topics of interest.
What you might not know about FiveThirtyEight, though, is that they share the datasets behind their work both on their website and on GitHub.
This is a great resource to not only see datasets, but also see how a well-respected analytics organization provides meaningful insights and commentary on the data.
Kaggle is a great resource not only for free datasets, but for data science topics in general.
In addition to a plethora of datasets that are easy to search, you can also build skills and submit your solutions to their competitions for a chance to win some substantial prizes.
Data.world is a bit different from the other resources on this list. Unlike many of the others, which provide training, commentary, and so on, data.world is focused primarily on data collaboration for companies.
While their primary focus is providing a platform for companies to store, organize, and collaborate around their own data, they also offer hundreds of thousands of free datasets to anyone who sets up an account.
The data in data.world covers all kinds of topics like finance, crime, economy, Twitter, NASA and more.
You can write SQL and SPARQL queries to explore numerous files at once and join multiple datasets, and you can also upload your own data.
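To give a feel for what joining two datasets with SQL looks like, here is a small sketch you can run locally. It uses Python's built-in `sqlite3` module rather than data.world itself, so the SQL dialect may differ slightly from theirs, and the tables, cities, and numbers are made up purely for illustration.

```python
import sqlite3

# Hypothetical sample data standing in for two separately downloaded datasets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city_population (city TEXT PRIMARY KEY, population INTEGER);
    CREATE TABLE city_incidents (city TEXT, incidents INTEGER);
    INSERT INTO city_population VALUES ('Springfield', 100000), ('Shelbyville', 50000);
    INSERT INTO city_incidents VALUES ('Springfield', 250), ('Shelbyville', 200);
""")


def incidents_per_1000(connection):
    """Join the two tables on city and compute incidents per 1,000 residents."""
    query = """
        SELECT p.city,
               ROUND(1000.0 * i.incidents / p.population, 2) AS rate
        FROM city_population AS p
        JOIN city_incidents AS i ON i.city = p.city
        ORDER BY rate DESC
    """
    return connection.execute(query).fetchall()


if __name__ == "__main__":
    for city, rate in incidents_per_1000(conn):
        print(city, rate)
```

The join is the interesting part: neither table alone can tell you the rate, but matching rows on the shared `city` column can.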
No list of dataset resources would be complete without mentioning Google Dataset Search.
By indexing thousands of different repositories across the web, it provides access to almost 25 million publicly available datasets.
A lot of the data in the search index comes from government agencies. In total, Google says, there are about 2 million U.S. government datasets in the index right now.
This means there is some overlap with the other sources mentioned, but it’s not a complete overlap.