Home
The core of this app is written mostly in Python3, but it leverages some additional code services that require familiarity with other languages. Below is a discussion of the code and some key info on its dependencies.
Please note: As described, I am NOT a hard-core Dev/Eng Ninja. I am a Product Strategy leader who knows how to code. Please don't critique my code design, architecture or quality on that basis. Much appreciated in advance.
This is a Python3 app. It runs from the shell (I develop in both Linux & Windows 10, so it runs well in both environments, but I prefer Linux and do most of my testing there). How to run the code...
Looking for screenshots? Here are some examples of the code running on the Linux command line: here
-
Core code - Python3
- I've structured the code in a pure OOP architecture. (I'm a Product Leader & not a Dev/Engineer, so it's not Google-grade production-quality Python3 OOP code, but it stays true to the OOP paradigm, i.e. classes, instances, class methods/attributes, inheritance, etc.)
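For readers new to the terms in that list, here is a minimal, hypothetical sketch of what classes, instances, class attributes, class methods and inheritance look like in Python3. The class names (`DataSource`, `HtmlScraper`) are invented for illustration and are not taken from this app's code.

```python
class DataSource:
    """Hypothetical base class: behaviour shared by every data source."""

    # Class attribute: one value shared by the class and all its instances.
    instance_count = 0

    def __init__(self, name):
        self.name = name  # instance attribute: unique to each object
        DataSource.instance_count += 1

    @classmethod
    def count(cls):
        """Class method: reads class-level state rather than instance state."""
        return cls.instance_count

    def fetch(self):
        """Subclasses are expected to override this."""
        raise NotImplementedError


class HtmlScraper(DataSource):
    """Hypothetical subclass: inherits __init__ and count() from DataSource."""

    def fetch(self):
        return f"{self.name}: scraped HTML"


scraper = HtmlScraper("quotes")  # an instance of the subclass
print(scraper.fetch())           # quotes: scraped HTML
print(DataSource.count())        # 1
```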
-
Data Science code - Pandas and NumPy (API for Python3)
- The basic Pandas DataFrame API code is pretty simple Python code, but selecting & manipulating data within DataFrames is all Pandas and NumPy idiom, and it won't look like Python3 code to anyone who doesn't know Pandas & NumPy.
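As an illustration of why DataFrame selection code reads differently from plain Python, here is a short sketch using a small made-up price table (the symbols, column names and values are invented, not taken from the app). Vectorized arithmetic, `np.where` and boolean-mask selection with `.loc` replace the explicit loops and if/else statements you would write in plain Python3.

```python
import numpy as np
import pandas as pd

# Made-up sample data for illustration only.
df = pd.DataFrame(
    {
        "symbol": ["AAPL", "MSFT", "TSLA", "AMZN"],
        "open": [150.0, 300.0, 700.0, 3300.0],
        "close": [153.0, 297.0, 721.0, 3300.0],
    }
)

# Vectorized arithmetic: computed for every row at once, no for-loop.
df["pct_change"] = (df["close"] - df["open"]) / df["open"] * 100

# np.where is an element-wise if/else applied across the whole column.
df["direction"] = np.where(
    df["pct_change"] > 0,
    "up",
    np.where(df["pct_change"] < 0, "down", "flat"),
)

# Boolean-mask selection with .loc: rows where the price rose, two columns only.
gainers = df.loc[df["pct_change"] > 0, ["symbol", "pct_change"]]
print(gainers)
```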
-
Database injection & CRUD logic code - MongoDB (API for Python3)
- Although the Python3 MongoDB API is 100% Python, once the connection to the MongoDB database is live, the real code being executed is almost exclusively MongoDB's JSON-style document query language (which is not Python3 at all).
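To show what that query language looks like from Python, here is a minimal sketch that builds MongoDB filter and update documents as ordinary Python dicts. The field names (`symbol`, `price`, `updated`) and helper functions are hypothetical. With PyMongo, documents like these are passed to collection methods such as `find(...)` and `update_one(..., upsert=True)`; the sketch only builds the documents so that it runs without a database.

```python
from datetime import datetime, timezone


def price_filter(symbol, min_price):
    """Filter document: symbol equals `symbol` AND price >= min_price.

    Keys such as "$gte" are MongoDB query operators, not Python syntax.
    """
    return {"symbol": symbol, "price": {"$gte": min_price}}


def upsert_update(price):
    """Update document: "$set" overwrites the listed fields on the matched document."""
    return {"$set": {"price": price, "updated": datetime.now(timezone.utc)}}


# With a live PyMongo collection (not created here), these would be used as:
#   collection.find(price_filter("AAPL", 100.0))
#   collection.update_one({"symbol": "AAPL"}, upsert_update(153.0), upsert=True)
print(price_filter("AAPL", 100.0))  # {'symbol': 'AAPL', 'price': {'$gte': 100.0}}
```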
-
Fast HTML data scraping - BeautifulSoup (bs4) for Python3
- Although HTML isn't a programming language (as Python3 is), working with bs4 requires significant familiarity with HTML and its document structure (the tree, tags, objects, attributes, etc.). There's just no getting around this (which is why I hate HTML doc scraping), but sometimes you can't avoid it, and it's the only way to get to the raw data you want.
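As a small illustration of the HTML-tree knowledge bs4 demands, the sketch below pulls rows out of an inline HTML table. The markup, the `id="quotes"` attribute and the column layout are invented for the example; a real page needs selectors matched to its actual structure.

```python
from bs4 import BeautifulSoup

# Invented sample markup; a real scraper would fetch the page HTML first.
html = """
<table id="quotes">
  <tr><th>Symbol</th><th>Last</th></tr>
  <tr><td>AAPL</td><td>153.00</td></tr>
  <tr><td>MSFT</td><td>297.00</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")  # Python's built-in HTML parser
table = soup.find("table", id="quotes")    # locate the tag by name and attribute

rows = []
for tr in table.find_all("tr"):
    cells = tr.find_all("td")  # the header row uses <th>, so it has no <td> cells
    if not cells:
        continue
    symbol, last = [td.get_text(strip=True) for td in cells]
    rows.append((symbol, float(last)))

print(rows)  # [('AAPL', 153.0), ('MSFT', 297.0)]
```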
-
ML and AI capabilities are provided via the Scikit-learn (sklearn) Python API.
- The sklearn code uses a data corpus from the Natural Language Toolkit (https://www.nltk.org) to supply stopwords to its CountVectorizer logic. The stopwords.words("english") corpus is a standard supporting dataset in ML code. The English Stopwords corpus MUST be downloaded onto your file system: the sklearn-related code that uses the nltk stopwords corpus must be able to find and read it when that code loads it (typically at startup). Otherwise NLTK raises a LookupError and the program errors out before any real work gets done (see Dataset #70 here: http://www.nltk.org/nltk_data/). That code looks like this...
`from nltk.corpus import stopwords`
`sw = stopwords.words("english")`
- Since the ML/AI code is new & in heavy development, you can comment it out if you can't figure out the nltk.corpus dataset download/install procedure (see here: https://www.nltk.org/data.html).
-
Realtime exchange market data feeds - Leverages the V2 Alpaca API.
- Alpaca is a great API-first stock-market data service (FINRA registered), designed specifically to handle high-volume financial-market time-series data, trading portfolios & algorithmic trade execution (https://alpaca.markets/docs/about-us/).
- Alpaca supports APIs in multiple languages, of which I'm implementing the Python3 API (https://alpaca.markets/docs/). The Alpaca API is relatively easy and quite Pythonic, but it's not Python per se, so you need to learn their data-manipulation language & schema. The Alpaca API semantics can be a bit odd at times, and some API functions are (annoyingly) poorly documented. Familiarity with Alpaca would be helpful if you wish to focus on real-time data/trading beyond bs4 scraping of exchange-delayed data. My Alpaca code is in the early phases of development, and I'm still deciding where and how to augment the overall application design with real-time Alpaca market data.
- Note: You can't get everything from Alpaca. In general, it's very difficult to get FREE live-streaming, real-time stock-market ticker data from any market-data (MD) provider. I've tried, and you need to pay lots of $$ for this type of MD feed. Alpaca is a good cost/capability compromise (it's free) & is built by real Silicon Valley coders.
- In order for the Alpaca code to work, you WILL need to register an account with Alpaca (which may be difficult for non-US citizens). My Alpaca API account keys have been removed from the code and invalidated.
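Since the keys are stripped from the code, you'll need to supply your own. One common pattern is to read them from environment variables rather than hard-coding them. This stdlib-only sketch assumes the variable names `APCA_API_KEY_ID` and `APCA_API_SECRET_KEY`, the names Alpaca's Python SDK conventionally reads; check the Alpaca docs for the SDK version you use. The function name `load_alpaca_credentials` is hypothetical.

```python
import os


def load_alpaca_credentials(env=os.environ):
    """Return (key_id, secret_key) from the environment, failing loudly if absent.

    The variable names are an assumption based on Alpaca SDK conventions.
    """
    key_id = env.get("APCA_API_KEY_ID")
    secret = env.get("APCA_API_SECRET_KEY")
    missing = [
        name
        for name, value in (("APCA_API_KEY_ID", key_id), ("APCA_API_SECRET_KEY", secret))
        if not value
    ]
    if missing:
        raise RuntimeError(f"Missing Alpaca credentials: {', '.join(missing)}")
    return key_id, secret
```

Set the variables in your shell (e.g. `export APCA_API_KEY_ID=...` on Linux) before running the app, so no keys ever need to live in the source.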
Regards,
~Orville