A basic HTML scraper in Python for stock prices that are only available via a website (i.e. not available via a free API).
Start with ...
- Raspbian - has Python and the Thonny IDE pre-installed.
- Ubuntu - Install Thonny from the Software store, or enter 'sudo apt-get install thonny'.
- Windows - don't bother. If you can get Python working under Windows then you're a masochist/God who doesn't need to read this post.
Install packages in Thonny ...
After launching Thonny, go to 'Tools - Manage Packages', and install:
- requests
- lxml
Analyse the Website
Free Japan Exchange (JPX) information through APIs (JSON or otherwise) is hard to come by. Our aim today is to scrape prices from Yahoo Japan. Here's the information for stock code 8306, Mitsubishi UFJ Financial Group (三菱UFJフィナンシャル・グループ). Notice that the URL takes the form [https://stocks.finance.yahoo.co.jp/stocks/detail/?code=8306.T].
- Right-click on the price and select 'Inspect Element'.
- Notice that the price is in a table cell with the class 'stoksPrice'.
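To make the URL pattern concrete, here is a minimal sketch that builds the detail-page address from a stock code. The function name build_url is my own invention for illustration; only the base address and the '.T' suffix come from the URL shown above.

```python
from urllib.parse import urlencode

# Base address taken from the URL pattern in the post.
BASE_URL = 'https://stocks.finance.yahoo.co.jp/stocks/detail/'

def build_url(code):
    """Return the detail-page URL for a stock code (hypothetical helper)."""
    # The '.T' suffix follows the example URL; urlencode escapes anything unusual.
    return BASE_URL + '?' + urlencode({'code': code + '.T'})

print(build_url('8306'))
```

Using urlencode rather than plain string concatenation is a small safety net in case a code ever contains a character that needs escaping.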
We need Python to:
- Take a stock code;
- Grab the webpage for that stock code. (requests);
- Parse the (poorly formed) markup into an XML tree. (lxml);
- Extract the data from the element with class 'stoksPrice' using an XPATH query;
- Convert the data into a number according to the locale. (locale.atof).
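The parsing and XPath steps can be tried without touching the network. The sketch below runs the same XPath query against a made-up HTML fragment. The SAMPLE markup and the extract_price_text name are assumptions for illustration; only the 'stoksPrice' class comes from the page inspection above.

```python
from lxml import html

# Made-up stand-in for the real page; only the class name is from the post.
SAMPLE = (
    '<html><body><table><tr>'
    '<td class="stoksPrice">4,696</td>'
    '</tr></table></body></html>'
)

def extract_price_text(markup):
    """Return the text of the first 'stoksPrice' cell, or '' if there is none."""
    tree = html.fromstring(markup)
    # XPath string() gives the text of the first match, or '' when nothing matches.
    return tree.xpath('string(//td[@class="stoksPrice"])').strip()

print(extract_price_text(SAMPLE))
```

Because string() returns an empty string rather than raising when the cell is missing, the caller still has to check for an empty result before converting it to a number.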
My Code ...
import requests
from lxml import html
import locale
from locale import atof
import csv

# atof() reads thousands separators according to this locale.
locale.setlocale(locale.LC_ALL, 'en_AU.UTF-8')

def yahooLookup(code):
    # Build the detail-page URL for the stock code.
    url = 'https://stocks.finance.yahoo.co.jp/stocks/detail/?code=' + code + '.T'
    pageContent = requests.get(url)
    tree = html.fromstring(pageContent.content)
    # string() yields the text of the first matching cell, or '' if none.
    pricestring = tree.xpath('string(//td[@class="stoksPrice"])')
    try:
        return atof(pricestring)
    except ValueError:
        print('pricestring error:', code, pricestring)
        return '0'

with open('YahooJP.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['Code'], row['Name'], yahooLookup(row['Code']))
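One thing to watch: the script's atof call relies on the 'en_AU.UTF-8' locale being installed, and setlocale raises locale.Error if it is not. As a possible fallback, here is a locale-independent sketch that simply strips commas before converting. This is a swapped-in technique, not what the script above does, and parse_price is a hypothetical name.

```python
def parse_price(text):
    """Convert text such as '4,696' to a float, or return None if it is not a number.

    Assumes commas are only ever used as thousands separators.
    """
    cleaned = text.strip().replace(',', '')
    try:
        return float(cleaned)
    except ValueError:
        # Covers empty strings and placeholders such as '---'.
        return None

for sample in ['4,696', '179.3', '---']:
    print(repr(sample), parse_price(sample))
```

Returning None instead of '0' keeps a failed lookup from being mistaken for a real price of zero.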
Don't just cut, paste, and run! The Python script looks for a 'YahooJP.csv' file in the same directory, formatted so:
Code | Name |
8411 | Mizuho Financial Group |
8604 | Nomura Holdings |
9432 | NTT |
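Since csv.DictReader defaults to comma delimiters, the file on disk would be comma-separated with a Code,Name header row. A minimal sketch for generating such a file follows. It writes to a temporary directory so nothing is overwritten, and write_watchlist and read_watchlist are hypothetical helper names.

```python
import csv
import os
import tempfile

# The three rows from the post.
ROWS = [
    {'Code': '8411', 'Name': 'Mizuho Financial Group'},
    {'Code': '8604', 'Name': 'Nomura Holdings'},
    {'Code': '9432', 'Name': 'NTT'},
]

def write_watchlist(path, rows):
    """Write rows as comma-separated text with a Code,Name header."""
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=['Code', 'Name'])
        writer.writeheader()
        writer.writerows(rows)

def read_watchlist(path):
    """Read the file back the same way the scraper script does."""
    with open(path, newline='', encoding='utf-8') as f:
        return list(csv.DictReader(f))

# Temporary location for the demo; the real script expects the file beside it.
path = os.path.join(tempfile.mkdtemp(), 'YahooJP.csv')
write_watchlist(path, ROWS)
for row in read_watchlist(path):
    print(row['Code'], row['Name'])
```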
Run with the file present, output will look like:
8411 Mizuho Financial Group 179.3
8604 Nomura Holdings 461.2
9432 NTT 4696.0
You can adapt this code for use with other finance websites!
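As a rough sketch of what adapting might involve, the URL pattern and the XPath query can be passed in as parameters, so only those two strings change per site. Everything named here (lookup, fetch_with_requests, fake_fetch, the example.com address and the 10-second timeout) is an assumption for illustration, not part of the original script.

```python
from lxml import html

def lookup(code, url_template, xpath_query, fetch):
    """Fetch the page for a code and return the text selected by the XPath query."""
    page = fetch(url_template.format(code=code))
    tree = html.fromstring(page)
    return tree.xpath(xpath_query).strip()

def fetch_with_requests(url):
    """Real fetcher: import requests lazily, use a timeout, fail on HTTP errors."""
    import requests
    response = requests.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()
    return response.content

def fake_fetch(url):
    """Offline stand-in so the sketch can be tried without hitting any website."""
    return b'<html><body><span id="last">123.4</span></body></html>'

# Hypothetical site: both the URL template and the query are placeholders.
print(lookup('ABC', 'https://example.com/quote/{code}', 'string(//span[@id="last"])', fake_fetch))
```

Passing the fetcher in as an argument keeps the parsing logic testable offline; swap fake_fetch for fetch_with_requests when pointing it at a real site.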
What if I don't want Thonny?
This script runs in Python3 which is preinstalled on Ubuntu. You don't need an IDE like Thonny. In order to run the script we must:
- Install the 'pip' package manager.
- Use pip to install the lxml and requests packages.
sudo apt install python3-pip
sudo pip3 install lxml
sudo pip3 install requests
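To confirm the installs worked before running the scraper, a short check like the following can help. It is only a sketch; missing_packages is a hypothetical helper built on the standard importlib module.

```python
import importlib.util

def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(['requests', 'lxml'])
if missing:
    print('Missing packages:', ', '.join(missing))
else:
    print('requests and lxml are both importable.')
```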
Now run the script with:
python3 YahooJP.py
Notes
This is my first Python project and I am astounded at:
- How Python doubles as a shell language.
- The ease with which packages can be imported to add HTTP, XML, XPATH, and CSV support.
- How fast it runs, even on a Raspberry Pi.