
Python Stock Scraper

A basic HTML scraper in Python for stock prices that are only available via a website (i.e. not available via a free API).


Start with ...

  • Raspbian - has Python and the Thonny IDE pre-installed.
  • Ubuntu - Install Thonny from the Software store, or enter 'sudo apt-get install thonny'.
  • Windows - don't bother. If you can get Python working under Windows then you're a masochist/God who doesn't need to read this post.

Install packages in Thonny ...

After launching Thonny, go to 'Tools - Manage Packages', and install:
  1. requests
  2. lxml

Analyse the Website

Free Japan Exchange (JPX) information through APIs (JSON or otherwise) is hard to come by. Our aim today is to scrape prices from Yahoo Japan. Here's the information for stock code 8306, Mitsubishi UFJ Financial Group (三菱UFJフィナンシャル・グループ).
Notice that the URL takes the form [https://stocks.finance.yahoo.co.jp/stocks/detail/?code=8306.T].
  • Right-click on the price and select 'Inspect Element'.
  • Notice that the price is in a table cell with the class 'stoksPrice'.
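To check the selector in isolation before touching the network, here's a minimal sketch against a hardcoded fragment (the markup below is a stand-in for the relevant part of the page, not the real Yahoo Japan HTML):

from lxml import html

# A stand-in for the part of the page we care about: a table cell with class 'stoksPrice'
snippet = '<table><tr><td class="stoksPrice">4,696.0</td></tr></table>'

tree = html.fromstring(snippet)
# string(...) returns the text content of the first matching element
print(tree.xpath('string(//td[@class="stoksPrice"])'))   # prints 4,696.0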

We need Python to:
  1. Take a stock code;
  2. Grab the webpage for that stock code (requests);
  3. Parse the (poorly formed) markup into an XML tree (lxml);
  4. Extract the data from the element with class 'stoksPrice' using an XPath query;
  5. Convert the data into a number according to the locale (locale.atof).
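Step 5 matters because prices on the page can carry thousands separators (e.g. '4,696.0'), which plain float() rejects. A minimal sketch of the conversion, assuming an English-style locale is installed:

import locale
from locale import atof

# An English-style locale treats ',' as the thousands separator and '.' as the decimal point
locale.setlocale(locale.LC_ALL, 'en_AU.UTF-8')

print(atof('4,696.0'))    # 4696.0
# float('4,696.0') would raise a ValueError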

My Code ...


import requests
from lxml import html
import locale
from locale import atof
import csv

# Use a locale with comma thousands separators so atof can parse prices like '4,696.0'
locale.setlocale(locale.LC_ALL, 'en_AU.UTF-8')

def yahooLookup(code):
    # Build the Yahoo Japan Finance URL for the stock code ('.T' = Tokyo exchange)
    url = 'https://stocks.finance.yahoo.co.jp/stocks/detail/?code=' + code + '.T'
    # Fetch the page and parse the (poorly formed) HTML into a tree
    pageContent = requests.get(url)
    tree = html.fromstring(pageContent.content)
    # Pull the text of the table cell holding the price
    pricestring = tree.xpath('string(//td[@class="stoksPrice"])')
    try:
        # Convert the locale-formatted string (e.g. '4,696.0') to a float
        return atof(pricestring)
    except ValueError:
        print('pricestring error:', code, pricestring)
        return 0.0

# Read the stock list and print code, name, and latest price for each row
with open('YahooJP.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['Code'], row['Name'], yahooLookup(row['Code']))


Don't just cut, paste, and run! The Python script looks for a 'YahooJP.csv' file in the same directory, formatted like this:

Code,Name
8411,Mizuho Financial Group
8604,Nomura Holdings
9432,NTT
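If you'd rather create the file from Python than from a text editor, here's a minimal sketch using csv.DictWriter (the rows are just the codes and names from the example above):

import csv

rows = [
    {'Code': '8411', 'Name': 'Mizuho Financial Group'},
    {'Code': '8604', 'Name': 'Nomura Holdings'},
    {'Code': '9432', 'Name': 'NTT'},
]

with open('YahooJP.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=['Code', 'Name'])
    writer.writeheader()     # writes the 'Code,Name' header row
    writer.writerows(rows)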

Run the script with the file present and the output will look like this:


8411 Mizuho Financial Group 179.3
8604 Nomura Holdings 461.2
9432 NTT 4696.0 

You can adapt this code for use with other finance websites!
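To adapt it, only the URL and the XPath need to change. Here's a minimal sketch of a generalised lookup; the example URL and selector at the bottom are placeholders, not a real site's values:

import locale
from locale import atof
import requests
from lxml import html

# As in the main script, pick a locale whose number format matches the target site
locale.setlocale(locale.LC_ALL, 'en_AU.UTF-8')

def scrapePrice(url, xpath):
    # Fetch the page, parse it, and evaluate the supplied XPath as a string
    page = requests.get(url)
    tree = html.fromstring(page.content)
    pricestring = tree.xpath('string(' + xpath + ')')
    try:
        return atof(pricestring)
    except ValueError:
        print('pricestring error:', url, pricestring)
        return 0.0

# Hypothetical example - substitute the real URL and selector for your target site
# print(scrapePrice('https://example.com/quote/XYZ', '//span[@class="price"]'))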

What if I don't want Thonny?

This script runs on Python 3, which is preinstalled on Ubuntu, so you don't need an IDE like Thonny. To run the script:
  1. Install the 'pip' package manager.
  2. Use pip to install the lxml and requests packages.
sudo apt install python3-pip
sudo pip3 install lxml
sudo pip3 install requests
Now run the script with:
python3 YahooJP.py
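If you want to point the script at a different stock list without editing the code, one option (not part of the original script) is to take the CSV path as a command-line argument. A minimal sketch, replacing the file-reading loop at the bottom of the script (csv and yahooLookup are already imported/defined there):

import sys

# Use the first command-line argument as the CSV path, defaulting to 'YahooJP.csv'
csvpath = sys.argv[1] if len(sys.argv) > 1 else 'YahooJP.csv'

with open(csvpath, newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['Code'], row['Name'], yahooLookup(row['Code']))

Then run it with, for example:
python3 YahooJP.py mylist.csv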

Notes

This is my first Python project and I am astounded at:
  • How Python doubles as a shell language.
  • The ease with which packages can be imported to add HTTP, XML, XPATH, and CSV support.
  • How fast it runs, even on a Raspberry Pi.
