Python Stock Scraper

A basic HTML scraper in Python for stock prices that are only available on a website (i.e. not through a free API).

Start with ...

Raspbian - has Python and the Thonny IDE pre-installed.
Ubuntu - Install Thonny from the Software store, or enter 'sudo apt-get install thonny'.
Windows - don't bother. If you can get Python working under Windows then you're a masochist/God who doesn't need to read this post.

Install packages in Thonny ...

After launching Thonny, go to 'Tools - Manage Packages', and install:

requests
lxml

Analyse the Website

Free Japan Exchange (JPX) price data through APIs (JSON or otherwise) is hard to come by, so our aim today is to scrape prices from Yahoo Japan. Take the page for stock code 8306, Mitsubishi UFJ Financial Group (三菱ＵＦＪフィナンシャル・グループ).
Notice that the URL takes the form [https://stocks.finance.yahoo.co.jp/stocks/detail/?code=8306.T]: the stock code followed by '.T'.
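
To make the URL pattern concrete, here is a minimal sketch that builds the address from a stock code. The names BASE_URL and detail_url are my own for illustration, and the '.T' suffix is simply copied from the URL above.

```python
from urllib.parse import urlencode

# Base address copied from the URL shown in the post.
BASE_URL = 'https://stocks.finance.yahoo.co.jp/stocks/detail/'

def detail_url(code, suffix='.T'):
    """Return the detail-page URL for a stock code (hypothetical helper)."""
    # urlencode escapes anything unsafe; digits and '.' pass through unchanged.
    return BASE_URL + '?' + urlencode({'code': str(code) + suffix})

print(detail_url('8306'))
```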

Right-click on the price and select 'Inspect Element'.
Notice that the price is in a table cell with the class 'stoksPrice' (spelled that way in the page's markup).

We need Python to:

Take a stock code;
Grab the webpage for that stock code (requests);
Parse the (poorly formed) markup into an XML tree (lxml);
Extract the data from the element with class 'stoksPrice' using an XPath query;
Convert the data into a number according to the locale (locale.atof).
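
The parsing steps can be tried offline before touching the real site. This is only a sketch: SAMPLE_PAGE is a hand-written stand-in (not the real Yahoo Japan markup), the price in it is made up, and the helper names are hypothetical. To avoid depending on which locales are installed, it strips commas by hand instead of calling locale.atof, which should give the same result whenever the active locale uses ',' as its thousands separator.

```python
from lxml import html

# Hand-written stand-in page. Only the class name comes from the post;
# the <tr> is deliberately left unclosed to mimic sloppy markup.
SAMPLE_PAGE = (
    b'<html><body><table>'
    b'<tr><td class="stoksPrice">1,234.5</td>'
    b'</table></body></html>'
)

def extract_price_text(page_bytes):
    """Parse the markup and return the text of the first 'stoksPrice' cell."""
    tree = html.fromstring(page_bytes)
    # string() yields '' when no cell matches, so the caller can detect that.
    return tree.xpath('string(//td[@class="stoksPrice"])')

def to_number(text):
    """Convert '1,234.5' to a float (assumes ',' is a thousands separator)."""
    return float(text.replace(',', ''))

price_text = extract_price_text(SAMPLE_PAGE)
print(price_text, to_number(price_text))
```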

My Code ...

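import csv
import locale

import requests
from lxml import html

# atof() parses numbers using this locale, so '1,234.5' becomes 1234.5.
# The locale must be installed on your system; substitute your own if not.
locale.setlocale(locale.LC_ALL, 'en_AU.UTF-8')

def yahooLookup(code):
    # Build the detail-page URL for this stock code.
    url = 'https://stocks.finance.yahoo.co.jp/stocks/detail/?code=' + code + '.T'
    pageContent = requests.get(url, timeout=10)
    tree = html.fromstring(pageContent.content)
    # string() returns the text of the first matching cell, or '' if none.
    pricestring = tree.xpath('string(//td[@class="stoksPrice"])')
    try:
        return locale.atof(pricestring)
    except ValueError:
        print('pricestring error:', code, pricestring)
        # Return a float so callers always get the same type.
        return 0.0

with open('YahooJP.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['Code'], row['Name'], yahooLookup(row['Code']))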

Don't just cut, paste, and run! The Python script looks for a 'YahooJP.csv' file in the directory you run it from, comma-separated with a header row, formatted so:

Code,Name
8411,Mizuho Financial Group
8604,Nomura Holdings
9432,NTT
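
A note on separators: csv.DictReader splits on commas unless told otherwise, so the columns in the file need commas between them. If your file is tab-separated instead, passing delimiter='\t' should do the trick. Here is a small sketch using in-memory text rather than a real file; read_rows is a hypothetical helper.

```python
import csv
import io

# The same row written two ways, for illustration only.
COMMA_TEXT = 'Code,Name\n8411,Mizuho Financial Group\n'
TAB_TEXT = 'Code\tName\n8411\tMizuho Financial Group\n'

def read_rows(text, delimiter=','):
    """Read CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

print(read_rows(COMMA_TEXT)[0]['Code'])
print(read_rows(TAB_TEXT, delimiter='\t')[0]['Name'])
# With the wrong delimiter the header is read as one column,
# which is why row['Code'] would fail with a KeyError.
print(list(read_rows(TAB_TEXT)[0].keys()))
```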

With the file present, the output will look like this (your prices will differ):

8411 Mizuho Financial Group 179.3
8604 Nomura Holdings 461.2
9432 NTT 4696.0

You can adapt this code for use with other finance websites by changing the URL and the XPath query!

What if I don't want Thonny?

This script runs in Python 3, which is preinstalled on Ubuntu, so you don't need an IDE like Thonny. To run the script we must:

Install the 'pip' package manager.
Use pip to install the lxml and requests packages.

sudo apt install python3-pip

sudo pip3 install lxml

sudo pip3 install requests
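
To confirm that Python 3 can actually see both packages, a quick check along these lines may help. It is just a sketch; missing_packages is a hypothetical helper, and an empty list means everything was found.

```python
import importlib.util

def missing_packages(names):
    """Return the names from the list that Python cannot find."""
    # find_spec() returns None when a top-level package is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]

print(missing_packages(['requests', 'lxml']))
```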

Now run the script (saved here as YahooJP.py) with:

python3 YahooJP.py

Notes

This is my first Python project and I am astounded at:

How Python doubles as a shell language.
The ease with which packages can be imported to add HTTP, XML, XPath, and CSV support.
How fast it runs, even on a Raspberry Pi.

dtcwee: The Value of Nothing
