10 Best Stock Market Datasets for Machine Learning

by Limarc Ambalina, July 9th, 2021

Too Long; Didn't Read

This article introduces 10 stock market and cryptocurrency datasets for machine learning. They range from historical daily prices and volume for US stocks and ETFs trading on NASDAQ, NYSE, and NYSE MKT, to daily exchange rates for 51 currencies reported to the International Monetary Fund, to free Forex data from Histdata.com for platforms such as MetaStock, MetaTrader, Microsoft Excel, and Ninja Trader.

This article, which I originally wrote for the Lionbridge AI blog, is one of my most-viewed articles there and has received a lot of attention in data science forums. Since many people seem to find it useful, I'm happy to share it with the Hacker Noon community.

With the rise of cryptocurrencies around the world, there are now more ways than ever for people to invest their money. However, investing is not as simple as buying low and selling high: if you could accurately predict the stock market, you'd be one of the richest people on earth. Unsurprisingly, researchers have studied how to predict the stock market, for example by using sentiment analysis. For those of you looking to build similar predictive models, this article introduces 10 stock market and cryptocurrency datasets for machine learning.

Stock Market Datasets

1. Historical Stock Market Dataset: This dataset includes historical daily prices and volume information for US stocks and ETFs trading on NASDAQ, NYSE, and NYSE MKT. The data was last updated on November 10th, 2017, and all files are in CSV format.

2. Istanbul Stock Exchange: Compiled from imkb.gov.tr and finance.yahoo.com data, this dataset was created to test predictive algorithms. It includes data from the Istanbul Stock Exchange National 100 Index, the S&P 500, and MSCI indices, as well as the stock market return indexes of Brazil, Germany, Japan, and the UK.

3. News and Stock Data: Originally prepared for a deep learning and NLP class, this dataset was designed for a binary classification task. It includes historical news headlines crawled from Reddit's r/worldnews subreddit from June 8th, 2008 to July 1st, 2016, along with Dow Jones Industrial Average data from August 8th, 2008 to July 1st, 2016.

4. Stock Market from a High Level: This dataset includes historical stock market data for the Dow Jones, NASDAQ, and S&P 500. The data comes in a CSV file and covers 1977 to 2017.

5. Stock Market Turnover Ratio: This data comes from the Federal Reserve Bank of St. Louis. The turnover ratio is the total value of shares traded during a period divided by the average market capitalization for that same period.

6. Uniqlo Stock Price Prediction: The previous items on this list feature general stock market data. This dataset, however, focuses solely on a single company: Uniqlo, one of the largest clothing retailers in Japan, which has been around for decades. It includes the company's stock information from 2012 to 2016.
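To make the turnover ratio from item 5 concrete, here is a minimal Python sketch. It assumes the ratio is the value traded during a period divided by the average market capitalization over that period, and it approximates that average as the mean of the start- and end-of-period capitalizations. That averaging choice, the `PeriodStats` container, and the `turnover_ratio` function are hypothetical and for illustration only; the published series may average differently or report the ratio as a percentage.

```python
from dataclasses import dataclass


@dataclass
class PeriodStats:
    """Hypothetical container for one period's market statistics (same currency units)."""
    value_traded: float      # total value of shares traded during the period
    market_cap_start: float  # market capitalization at the start of the period
    market_cap_end: float    # market capitalization at the end of the period


def turnover_ratio(stats: PeriodStats) -> float:
    """Value traded divided by average market capitalization for the period.

    Assumption: the average is approximated as the mean of the start and end
    capitalizations; the published series may compute it differently.
    """
    avg_cap = (stats.market_cap_start + stats.market_cap_end) / 2
    if avg_cap <= 0:
        raise ValueError("average market capitalization must be positive")
    return stats.value_traded / avg_cap


if __name__ == "__main__":
    # Made-up illustrative numbers, not taken from the dataset.
    example = PeriodStats(value_traded=30e12, market_cap_start=28e12, market_cap_end=32e12)
    print(f"{turnover_ratio(example):.2f}")  # prints 1.00
```

A ratio of 1.0 means the value traded over the period equaled the average market capitalization, so roughly speaking the market's shares changed hands about once during that period.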

National Currencies and Cryptocurrency Datasets

7. CoinMarketCap Dataset: CoinMarketCap is a market analysis website that provides information on thousands of cryptocurrencies. This dataset contains data taken from CoinMarketCap with the following columns: date, symbol, open, high, low, close, volume, and market cap.

8. Currency Exchange Rates: This dataset contains the daily currency exchange rates reported to the International Monetary Fund, covering 51 currencies from January 1st, 1995 to November 4th, 2018.

9. Daily Prices for All Cryptocurrencies: This large dataset contains historical price data for all cryptocurrencies on the market from April 28th, 2013 to November 30th, 2018. Along with coin names and dates, it includes each coin's rank, close ratio, and spread.

10. Free Forex Data: Histdata.com provides free Forex data for multiple currencies, available in formats for the following applications/platforms: General ASCII, MetaStock, MetaTrader, Microsoft Excel, and Ninja Trader.
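Items 7 and 9 mention daily price columns along with derived fields such as spread and close ratio. The sketch below shows one way to compute those two fields from a CSV with the column layout listed for the CoinMarketCap dataset. It assumes spread is the day's high minus its low and close ratio is the close min-max scaled between the day's low and high. These definitions, the exact column names (for example `market_cap`), the `add_features` helper, and the sample rows are assumptions for illustration, not the datasets' documented schema or real data.

```python
import csv
import io
from typing import Dict, List, Optional

# Hypothetical sample rows in the column layout listed for the CoinMarketCap
# dataset. The symbol "XYZ" and every value are made up for illustration.
SAMPLE_CSV = """date,symbol,open,high,low,close,volume,market_cap
2018-01-01,XYZ,10.0,12.0,8.0,11.0,1000,50000
2018-01-02,XYZ,5.0,5.0,5.0,5.0,0,25000
"""


def spread(high: float, low: float) -> float:
    """Assumed definition: difference between the day's high and low prices."""
    return high - low


def close_ratio(close: float, high: float, low: float) -> Optional[float]:
    """Assumed definition: close min-max scaled into the day's [low, high] range.

    0.0 means the close equaled the day's low and 1.0 means it equaled the high.
    Returns None when high == low, where the ratio is undefined.
    """
    day_range = high - low
    if day_range == 0:
        return None
    return (close - low) / day_range


def add_features(csv_text: str) -> List[Dict[str, object]]:
    """Parse CSV text and attach the two derived fields to every row."""
    rows: List[Dict[str, object]] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        high, low, close = float(row["high"]), float(row["low"]), float(row["close"])
        enriched: Dict[str, object] = dict(row)
        enriched["spread"] = spread(high, low)
        enriched["close_ratio"] = close_ratio(close, high, low)
        rows.append(enriched)
    return rows


if __name__ == "__main__":
    for r in add_features(SAMPLE_CSV):
        print(r["date"], r["symbol"], r["spread"], r["close_ratio"])
    # 2018-01-01 XYZ 4.0 0.75
    # 2018-01-02 XYZ 0.0 None
```

Returning None for a zero-range day keeps the gap explicit, so a downstream step can decide whether to drop, impute, or flag those rows before training a model.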