
Python Stock Scraper

A basic HTML scraper in Python for stock prices that are only available on a website (i.e. not through a free API).


Start with ...

  • Raspbian - has Python and the Thonny IDE pre-installed.
  • Ubuntu - Install Thonny from the Software store, or enter 'sudo apt-get install thonny'.
  • Windows - don't bother. If you can get Python working under Windows then you're a masochist/God who doesn't need to read this post.

Install packages in Thonny ...

After launching Thonny, go to 'Tools - Manage Packages', and install:
  1. requests
  2. lxml

Analyse the Website

Free Japan Exchange (JPX) information through APIs (JSON or otherwise) is hard to come by. Our aim today is to scrape prices from Yahoo Japan. Here's the information for stock code 8306, Mitsubishi UFJ Financial Group (三菱UFJフィナンシャル・グループ).
Notice that the URL takes the form [https://stocks.finance.yahoo.co.jp/stocks/detail/?code=8306.T].
  • Right-click on the price and select 'Inspect Element'.
  • Notice that the price is in a table cell with the class 'stoksPrice'.
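
The URL pattern noted above can be captured in a small helper. This is only a sketch: the names BASE_URL and build_url are made up for illustration, and the '.T' suffix is simply copied from the example URL rather than taken from any documentation.

```python
from urllib.parse import urlencode

# Base address copied from the example URL above.
BASE_URL = 'https://stocks.finance.yahoo.co.jp/stocks/detail/'

def build_url(code):
    """Build the detail-page URL for a stock code (hypothetical helper)."""
    # The example URL appends '.T' to the numeric code.
    return BASE_URL + '?' + urlencode({'code': str(code) + '.T'})

print(build_url(8306))
```

Accepting either an int or a string for the code is a convenience choice here, since codes read from a CSV file arrive as strings.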

We need Python to:
  1. Take a stock code;
  2. Grab the webpage for that stock code. (requests);
  3. Parse the (poorly formed) markup into an XML tree. (lxml);
  4. Extract the data from the element with class 'stoksPrice' using an XPath query;
  5. Convert the data into a number according to the locale. (locale.atof).
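
Steps 3 to 5 can be tried without touching the network. The sketch below runs them against a hand-written snippet that stands in for the real page (the SAMPLE string and the extract_price_text name are assumptions for illustration). It also swaps locale.atof for a plain comma strip so that it does not depend on which locales are installed.

```python
from lxml import html

# Hand-written stand-in for the real page; the closing tags are missing on purpose.
SAMPLE = '<html><body><table><tr><td class="stoksPrice">4,696</td>'

def extract_price_text(markup):
    """Return the text of the first td with class 'stoksPrice', or '' if absent."""
    tree = html.fromstring(markup)
    return tree.xpath('string(//td[@class="stoksPrice"])')

price_text = extract_price_text(SAMPLE)
# Locale-independent stand-in for locale.atof: drop the thousands separator.
price = float(price_text.replace(',', ''))
print(price_text, price)
```

Wrapping the XPath expression in string(...) means the query returns plain text rather than a list of elements, and an empty string when nothing matches.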

My Code ...


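import csv
import locale

import requests
from lxml import html

# Prices use ',' as the thousands separator; this locale must be installed on the system.
locale.setlocale(locale.LC_ALL, 'en_AU.UTF-8')

def yahooLookup(code):
    """Return the current price for a JPX stock code as a float (0.0 on failure)."""
    url = 'https://stocks.finance.yahoo.co.jp/stocks/detail/?code=' + code + '.T'
    pageContent = requests.get(url, timeout=10)
    # lxml tolerates poorly formed HTML and builds a tree we can query.
    tree = html.fromstring(pageContent.content)
    # string(...) returns the text of the first matching cell, or '' if none matches.
    pricestring = tree.xpath('string(//td[@class="stoksPrice"])')
    try:
        return locale.atof(pricestring)
    except ValueError:
        print('pricestring error:', code, pricestring)
        return 0.0  # a float, not the string '0', so the return type stays consistent

# Expects YahooJP.csv in the working directory with 'Code' and 'Name' columns.
with open('YahooJP.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['Code'], row['Name'], yahooLookup(row['Code']))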
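
If the 'en_AU.UTF-8' locale is not installed, setlocale raises an error before any scraping happens. One possible workaround, sketched below, is a conversion helper that tries locale.atof first and falls back to stripping commas by hand. The name parse_price is hypothetical, and the fallback assumes prices use ',' for thousands and '.' for decimals, as in the sample output in this post.

```python
import locale

def parse_price(text):
    """Convert a scraped price string such as '4,696' to a float, or None."""
    cleaned = text.strip()
    if not cleaned:
        return None  # the XPath query matched nothing
    try:
        # Honours whatever locale the program has set.
        return locale.atof(cleaned)
    except ValueError:
        pass
    try:
        # Fallback: assume ',' is only ever a thousands separator.
        return float(cleaned.replace(',', ''))
    except ValueError:
        return None  # not a number at all, e.g. a placeholder such as '---'

for sample in ['4,696', '179.3', '', '---']:
    print(repr(sample), parse_price(sample))
```

Returning None for unparseable input keeps "no price" distinct from a real price of zero. That is a design choice in this sketch, not something the original script does.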


Don't just cut, paste, and run! The Python script looks for a 'YahooJP.csv' file in the directory you run it from. csv.DictReader expects comma-separated values with a header row, like so:

Code,Name
8411,Mizuho Financial Group
8604,Nomura Holdings
9432,NTT
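
To check the file format without running the scraper, the reading step can be tried on its own. This sketch assumes the file is comma-separated and uses an in-memory string in place of YahooJP.csv. SAMPLE_CSV and read_watchlist are illustrative names.

```python
import csv
import io

# Same rows as the sample file, comma-separated as csv.DictReader expects by default.
SAMPLE_CSV = 'Code,Name\n8411,Mizuho Financial Group\n8604,Nomura Holdings\n9432,NTT\n'

def read_watchlist(csvfile):
    """Return a list of (code, name) tuples from an open CSV file."""
    reader = csv.DictReader(csvfile)
    return [(row['Code'], row['Name']) for row in reader]

watchlist = read_watchlist(io.StringIO(SAMPLE_CSV))
for code, name in watchlist:
    print(code, name)
```

The header row matters: DictReader takes the column names 'Code' and 'Name' from the first line, so a file without it would raise a KeyError in the lookup loop.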

With the file present, the output will look like:


8411 Mizuho Financial Group 179.3
8604 Nomura Holdings 461.2
9432 NTT 4696.0 

You can adapt this code for use with other finance websites!
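
As a rough idea of what adapting involves, usually only two things change between sites: the URL pattern and the XPath query. The sketch below keeps both in a small settings object. The Site class and the example.com entry are invented for illustration, so check the real page with 'Inspect Element' before relying on any query.

```python
from dataclasses import dataclass

from lxml import html

@dataclass
class Site:
    """Per-site settings: where the page lives and where the price sits in it."""
    url_template: str   # must contain a {code} placeholder
    price_xpath: str    # XPath expression that evaluates to the price text

    def url_for(self, code):
        return self.url_template.format(code=code)

    def extract(self, markup):
        tree = html.fromstring(markup)
        return tree.xpath(self.price_xpath).strip()

# The settings used in this post.
YAHOO_JP = Site(
    url_template='https://stocks.finance.yahoo.co.jp/stocks/detail/?code={code}.T',
    price_xpath='string(//td[@class="stoksPrice"])',
)

# A made-up site, purely to show what changes from one site to the next.
EXAMPLE = Site(
    url_template='https://example.com/quote/{code}',
    price_xpath='string(//span[@id="last-price"])',
)

print(YAHOO_JP.url_for('8306'))
print(EXAMPLE.extract('<div><span id="last-price"> 123.4 </span></div>'))
```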

What if I don't want Thonny?

This script runs in Python 3, which is preinstalled on Ubuntu, so you don't need an IDE like Thonny. To run the script from a terminal we must:
  1. Install the 'pip' package manager.
  2. Use pip to install the lxml and requests packages.
sudo apt install python3-pip
sudo pip3 install lxml
sudo pip3 install requests
Now run the script with:
python3 YahooJP.py

Notes

This is my first Python project and I am astounded at:
  • How Python doubles as a shell language.
  • The ease with which packages can be imported to add HTTP, XML, XPath, and CSV support.
  • How fast it runs, even on a Raspberry Pi.
