Commit be404e6 (1 parent: 083e1fb)

PLAT-63: Fix examples; Overhaul documentation

This MR updates the format, content, and structure of the Python API client documentation.

Showing 139 changed files with 25,657 additions and 8,604 deletions.
```
@@ -146,3 +146,4 @@ examples/*.txt
examples/*/*.txt

.idea/
.DS_Store
```
```makefile
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = source
BUILDDIR      = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
```
```css
:root {
  --md-primary-fg-color: #495070;
  --md-primary-fg-color--light: #FFFFFF;
  --md-primary-fg-color--dark: #161823;
  --md-typeset-a-color: #757CA1;
}

/* a:hover {
  text-decoration: underline;
} */

/*
a {
  color: #1E2130;
  text-decoration: none;
} */

a.custom {
  color: var(--primary-color);
  text-decoration: underline;
}
```
# Best Practices

## Parallel Execution
There are times when it may be useful to pull in large amounts of data at once. The most effective way to do this
when calling the CoinMetrics API is to split your request into many different queries. This functionality is now
built into the API Client directly to allow for faster data export:

```python
import os

from coinmetrics.api_client import CoinMetricsClient

if __name__ == '__main__':
    client = CoinMetricsClient(os.environ['CM_API_KEY'])
    coinbase_eth_markets = [
        market['market']
        for market in client.catalog_market_candles(exchange="coinbase", base="eth")
    ]
    start_time = "2022-03-01"
    end_time = "2023-05-01"
    client.get_market_candles(
        markets=coinbase_eth_markets,
        start_time=start_time,
        end_time=end_time,
        page_size=1000
    ).parallel().export_to_json_files()
```
This feature splits the request across multiple threads and either stores the results in separate files (in the case of
`.parallel().export_to_csv_files()` and `.parallel().export_to_json_files()`) or combines them into a single file or data
structure (in the case of `.parallel().to_list()`, `.parallel().to_dataframe()`, and `.parallel().export_to_json()`).
Note that in order to send more requests per second to the Coin Metrics API, this feature uses the
[parallel task execution in Python's concurrent.futures](https://docs.python.org/3/library/concurrent.futures.html)
package. This means that the API Client will use significantly more resources and may approach the Coin Metrics rate limits.

In terms of resource usage and speed, these methods are ordered from most performant to least:
* `.export_to_json_files()`
* `.export_to_csv_files()`
* `.to_list()`
* `.export_to_json()`
* `.to_dataframe()`
### Splitting Parameter Queries
The `time_increment` parameter can be used to split a single query into many queries based on time range, with the
results combined afterwards. Consider this example, where we speed up exporting a year's worth of minute-frequency
ReferenceRateUSD data for three assets by splitting the work across many parallel threads:
```python
import datetime
import os

from coinmetrics.api_client import CoinMetricsClient
from dateutil.relativedelta import relativedelta

client = CoinMetricsClient(os.environ.get("CM_API_KEY"))
assets = ["btc", "eth", "sol"]

if __name__ == '__main__':
    start_time = datetime.datetime.now()
    client.get_asset_metrics(
        assets=assets,
        metrics="ReferenceRateUSD",
        frequency="1m",
        start_time="2022-03-01",
        end_time="2023-03-01",
        page_size=1000,
        end_inclusive=False
    ).parallel(
        time_increment=relativedelta(months=1)
    ).export_to_csv("btcRRs.csv")
    print(f"Time taken parallel: {datetime.datetime.now() - start_time}")

    start_time = datetime.datetime.now()
    client.get_asset_metrics(
        assets=assets,
        metrics="ReferenceRateUSD",
        frequency="1m",
        start_time="2022-03-01",
        end_time="2023-03-01",
        page_size=1000,
        end_inclusive=False
    ).export_to_csv("btcRRsNormal.csv")
    print(f"Time taken normal: {datetime.datetime.now() - start_time}")
```
Notice that we pass `time_increment=relativedelta(months=1)`, which splits the work up by month in addition to by
asset. This runs a total of 36 separate threads: 12 monthly increments × 3 assets. The difference in run time is dramatic:
```commandline
Exporting to dataframe type: 100%|██████████| 36/36 [00:00<00:00, 54.62it/s]
Time taken parallel: 0:00:36.654147
Time taken normal: 0:05:20.073826
```
Note that for short time periods you can pass a `time_increment` as a `datetime.timedelta` to specify increments of up
to several weeks; for larger time frames, use `dateutil.relativedelta.relativedelta` to split requests into increments
of months or years.
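To make the splitting concrete, here is a small illustrative helper showing how an increment partitions a time range. It is not part of the client, and the name `split_range` is made up for this sketch; it only assumes `python-dateutil` is installed:

```python
import datetime

from dateutil.relativedelta import relativedelta


def split_range(start, end, increment):
    """Yield (chunk_start, chunk_end) pairs covering [start, end)."""
    cursor = start
    while cursor < end:
        # relativedelta and timedelta both support datetime + increment
        nxt = min(cursor + increment, end)
        yield cursor, nxt
        cursor = nxt


chunks = list(split_range(
    datetime.datetime(2022, 3, 1),
    datetime.datetime(2022, 5, 1),
    relativedelta(months=1),
))
# Two monthly chunks: March 2022 and April 2022, each handled by its own request.
```

A `timedelta(weeks=1)` increment would partition the same range into weekly chunks instead.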
## General Parallelization Guidelines
* If you are using a small `page_size` while trying to export a large amount of data, this will be your biggest
  bottleneck. Generally, the fastest `page_size` is `1000` to `10000`.
* If you are unsure why an action is taking a long time, running the `CoinMetricsClient` with `verbose=True` or
  `debug=True` can give better insight into what is happening under the hood.
* The parallel feature is best used when you are exporting a large amount of data that can be split by query parameters
  into many smaller requests. A good example is market candles over a long time frame: if you are querying hundreds
  of markets and are sure there will be data, using `.parallel().export_to_csv_files("...")` can yield a huge performance
  increase; if you are querying a single market, you will not see a difference.
* The parallel feature is highly configurable; there are several options that may suit advanced users, such as tweaking
  the `max_workers` parameter or changing the default `ProcessPoolExecutor` to a `ThreadPoolExecutor`.
* Multithreaded code is inherently more complex, and issues with long-running queries are harder to debug in parallel
  exports than in normal single-threaded code. For that reason, this tool is best suited to exporting historical data
  rather than supporting a real-time production system.
* The methods that create a separate file for each thread, `.export_to_csv_files()` and `.export_to_json_files()`, are
  the safest and most performant. The methods that return a single output, `.export_to_csv()`, `.export_to_list()`, and
  `.export_to_dataframe()`, need to join the data from many threads before returning; this may use a lot of memory if
  you are accessing data types like market order books or market trades, and could fail altogether.
* If you use the `export_to_csv_files()`/`export_to_json_files()` functions, note that by default files are saved in the
  directory format `/{endpoint}/{parallelize_on}`. For example, with `export_to_json_files()`,
  `client.get_market_trades("coinbase-eth-btc-spot,coinbase-eth-usdc-spot").parallel("markets")` will create one file
  per market, such as `./market-trades/coinbase-eth-btc-spot.json` and `./market-trades/coinbase-eth-usdc-spot.json`,
  while `client.get_asset_metrics('btc,eth', 'ReferenceRateUSD', start_time='2024-01-01', limit_per_asset=1).parallel("assets,metrics", time_increment=timedelta(days=1))`
  will create one file per combination, such as `./asset-metrics/btc/ReferenceRateUSD/start_time=2024-01-01T00-00-00Z.json`
  and `./asset-metrics/eth/ReferenceRateUSD/start_time=2024-01-01T00-00-00Z.json`.
* If you get the error `BrokenProcessPool`, it [might be because you're missing a main() function](https://stackoverflow.com/questions/15900366/all-example-concurrent-futures-code-is-failing-with-brokenprocesspool).
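The `BrokenProcessPool` point can be illustrated without the client at all. This is a plain `concurrent.futures` sketch (the worker function `fetch_chunk` is a made-up stand-in for one parallelized request) showing the `if __name__ == '__main__':` guard that process pools require:

```python
import concurrent.futures


def fetch_chunk(n):
    # Stand-in for one parallelized API request.
    return n * n


def run_parallel(items):
    # ProcessPoolExecutor re-imports this module in each worker process.
    # Without the __main__ guard below, every worker would try to spawn
    # its own pool on import, which is what breaks the pool.
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as pool:
        return list(pool.map(fetch_chunk, items))


if __name__ == '__main__':
    print(run_parallel([1, 2, 3]))  # [1, 4, 9]
```

The same guard is why the examples above wrap client calls in `if __name__ == '__main__':`.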