I thought it would be cool to be able to run SQL queries on my banking data. RBC supposedly has an API, but I tried signing up months ago and never got an account:
The next best thing seemed to be downloading my transaction history in .csv format from RBC online banking, but the data only goes back a couple months.
I wanted to be able to run queries on all transactions since I set up the account (how much money have I spent on sushi?) so that was no good.
This tool was my last resort: parse PDF e-statements. It works for savings, chequing, and visa accounts. The chequing/savings statements are the same format and the visa ones are different. I expect the chequing/savings parser to work for everyone, but I'm not sure if different visa types will work because the statements might be formatted differently. If you want to create a PR to support a different style of account/statement that'd be cool.
Disclaimer: The data that this tool spits out might not be accurate.
The parsers check the transactions they find against the opening/closing balances reported on the e-statements and will fail out if they notice a discrepancy, so the data is probably good if nothing fails when you run the tool.
An idea for an extension of this project would be to automate downloading/importing the .csv reports from RBC online every X days. Parsing transaction data from the .csvs would be much more robust and might as well replace pdf parsing once you have your historical data.
- Clone this repo
git clone git@github.com:bnwlkr/Teller.git
cd Teller
- (optional, recommended) Use a virtualenv
python3 -m venv venv
source venv/bin/activate
- Install dependencies
(venv) pip install -r requirements.txt
A Java runtime is required to run tabula-py
, which you can download it here.
- Download all your e-statements (v boring)
- Put the downloaded pdfs into the
statements/savings
,statements/chequing
,statements/visa
directories.
Note: You can put the statements anywhere on your computer as long as the statements' parent directories are called chequing
, savings
, visa
.
Future improvement: detect account type. Directory structure looks like this:
statements
βββ chequing
βΒ Β βββ XXXXXXXXX-2020May25-2020Jun25.pdf
| ...
βββ savings
βΒ Β βββ XXXXXXXXX-2020May25-2020Jun25.pdf
| ...
βββ visa
βββ XXXXXXXXX-2020May12-2020Jun10.pdf
...
The tool uses the names of the files to ascertain some date info so pls don't change them.
- Run it!
(venv) python teller.py -d statements teller.db
If you put the statements somewhere else, specify the path to their parent directory with the -d
option.
After a while, teller.db
, a sqlite3 database file, will contain all the transaction data. You can just leave the data there, and later add new statements and rerun with the same .db file - the tool will manage uniqueness of transactions in the database (duplicate files are fine). I recommend rerunning later with only new statements to save time.
You can use the sqlite3 CLI to run queries, but I recommend using DB Browser for SQLite.
Now you can have fun running queries and feeling bad about your spending habits. For example:
SELECT sum(amount) FROM transactions WHERE description LIKE '%Dunbar Sushi%'
P.S. If you know a way to get all RBC transaction data in a computer-readable format instead of this nonsense, just don't tell me.
- How much interest have I earned in my savings account?
SELECT sum(amount) FROM transactions WHERE account_type='savings' AND description LIKE '%interest%'
- What's the biggest purchase I've ever made on my credit card?
SELECT datetime(timestamp, 'unixepoch'), MIN(amount), description FROM transactions WHERE account_type='visa'
- Add more examples
- Change from storing unix timestamps to datetime strings for ease of use
- For now you can see the date of a transaction in a more readable format by including
datetime(timestamp, 'unixepoch')
as a column in your query.
- For now you can see the date of a transaction in a more readable format by including
- Detect account type from statements instead of relying on directory structure
- Automations