
PLAT-63: Fix examples; Overhaul documentation
This MR updates the format, content, and structure of the Python API client documentation.
victoreram committed Jan 8, 2025
1 parent 083e1fb commit be404e6
Showing 139 changed files with 25,657 additions and 8,604 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -146,3 +146,4 @@ examples/*.txt
examples/*/*.txt

.idea/
.DS_Store
3 changes: 2 additions & 1 deletion .gitlab-ci.yml
@@ -96,10 +96,11 @@ generate-docs:
- echo "update version:"
- echo $UPDATE_VERSION
- bash update_version.sh $UPDATE_VERSION
- export PYTHONPATH=/ && pydoc-markdown -m coinmetrics.api_client > docs/docs/api_client.md
- export PYTHONPATH=/ && pydoc-markdown -m coinmetrics.api_client > docs/docs/reference/api_client.md
- cp -f README.md docs/docs/index.md
- cp -f FlatFilesExport.md docs/docs/FlatFilesExport.md
- cp -f CHANGELOG.md docs/docs/CHANGELOG.md
- cp -f examples/README.md docs/docs/user-guide/examples.md
- cd docs && mkdocs build
- git add --all -- :!api-client-python/
- git status
520 changes: 23 additions & 497 deletions README.md

Large diffs are not rendered by default.

20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
Binary file added docs/docs/assets/images/cm-dark-combination.png
520 changes: 23 additions & 497 deletions docs/docs/index.md

Large diffs are not rendered by default.

File renamed without changes.
File renamed without changes.
20 changes: 20 additions & 0 deletions docs/docs/stylesheets/extra.css
@@ -0,0 +1,20 @@
:root {
--md-primary-fg-color: #495070;
--md-primary-fg-color--light: #FFFFFF;
--md-primary-fg-color--dark: #161823;
--md-typeset-a-color: #757CA1;

}
/* a:hover {
text-decoration: underline;
} */
/*
a {
color: #1E2130;
text-decoration: none;
} */

a.custom {
color: var(--primary-color);
text-decoration: underline;
}
File renamed without changes.
111 changes: 111 additions & 0 deletions docs/docs/user-guide/best-practices.md
@@ -0,0 +1,111 @@
# Best Practices

## Parallel Execution
There are times when it is useful to pull in large amounts of data at once. The most effective way to do this with the Coin Metrics API is to split your request into many smaller queries. This functionality is built directly into the API Client to allow for faster data export:

```python
import os
from coinmetrics.api_client import CoinMetricsClient


if __name__ == '__main__':
    client = CoinMetricsClient(os.environ['CM_API_KEY'])
    coinbase_eth_markets = [market['market'] for market in client.catalog_market_candles(exchange="coinbase", base="eth")]
    start_time = "2022-03-01"
    end_time = "2023-05-01"
    client.get_market_candles(
        markets=coinbase_eth_markets,
        start_time=start_time,
        end_time=end_time,
        page_size=1000
    ).parallel().export_to_json_files()
```

This feature splits the request across multiple threads and either stores the results in separate files (in the case of `.parallel().export_to_csv_files()` and `.parallel().export_to_json_files()`)
or combines them into one file or data structure (in the case of `.parallel().to_list()`, `.parallel().to_dataframe()`, and
`.parallel().export_to_json()`). In order to send more requests per second to the Coin Metrics API, this feature uses the
[parallel tasks features in Python's concurrent.futures](https://docs.python.org/3/library/concurrent.futures.html)
package. This means that when using this feature, the API Client will use significantly more resources and may approach
the Coin Metrics rate limits.
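
If the result set is small enough to hold in memory, the same parallel request can instead be collected into a single object. A minimal sketch of the single-output path (the markets and dates are illustrative):

```python
import os
from coinmetrics.api_client import CoinMetricsClient


if __name__ == '__main__':
    client = CoinMetricsClient(os.environ['CM_API_KEY'])
    # Each market is fetched in its own worker, then the pages are joined
    # into one DataFrame. File-based exports scale better for very large pulls.
    df = client.get_market_candles(
        markets=["coinbase-btc-usd-spot", "coinbase-eth-usd-spot"],
        start_time="2023-01-01",
        end_time="2023-02-01",
        page_size=1000
    ).parallel().to_dataframe()
    print(df.head())
```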

In terms of resource usage and speed, these methods are ordered from most performant to least:
* `.export_to_json_files()`
* `.export_to_csv_files()`
* `.to_list()`
* `.export_to_json()`
* `.to_dataframe()`

### Splitting Parameter Queries
The `time_increment` parameter can be used to split a single query into many queries based on time range, which are then
combined afterwards. Consider this example, where we split a year's worth of minute-frequency ReferenceRateUSD data for
several assets into parallel requests by month to speed up the export:
```python
import datetime
import os

from coinmetrics.api_client import CoinMetricsClient
from dateutil.relativedelta import relativedelta

client = CoinMetricsClient(os.environ.get("CM_API_KEY"))
assets = ["btc", "eth", "sol"]

if __name__ == '__main__':
    start_time = datetime.datetime.now()
    client.get_asset_metrics(
        assets=assets,
        metrics="ReferenceRateUSD",
        frequency="1m",
        start_time="2022-03-01",
        end_time="2023-03-01",
        page_size=1000,
        end_inclusive=False).parallel(
        time_increment=relativedelta(months=1)).export_to_csv("btcRRs.csv")
    print(f"Time taken parallel: {datetime.datetime.now() - start_time}")

    start_time = datetime.datetime.now()
    client.get_asset_metrics(
        assets=assets,
        metrics="ReferenceRateUSD",
        frequency="1m",
        start_time="2022-03-01",
        end_time="2023-03-01",
        page_size=1000,
        end_inclusive=False).export_to_csv("btcRRsNormal.csv")
    print(f"Time taken normal: {datetime.datetime.now() - start_time}")
```
Notice that we pass `time_increment=relativedelta(months=1)`, which splits the request up by month in addition to by asset.
This runs a total of 36 separate threads: 12 for each month times 3 for each asset.
The difference in run time is dramatic:
```commandline
Exporting to dataframe type: 100%|██████████| 36/36 [00:00<00:00, 54.62it/s]
Time taken parallel: 0:00:36.654147
Time taken normal: 0:05:20.073826
```

Please note that for short time periods you can pass a `time_increment` as a `datetime.timedelta` to specify increments of up to
several weeks; for larger time frames, use `dateutil.relativedelta.relativedelta` to split requests
up by increments of months or years.
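
As a short sketch of the two options (the asset, dates, and output directories are illustrative):

```python
import os
from datetime import timedelta

from coinmetrics.api_client import CoinMetricsClient
from dateutil.relativedelta import relativedelta


if __name__ == '__main__':
    client = CoinMetricsClient(os.environ['CM_API_KEY'])
    # Short window: timedelta splits one month of data into one request per week.
    client.get_asset_metrics(
        assets="btc",
        metrics="ReferenceRateUSD",
        frequency="1m",
        start_time="2023-01-01",
        end_time="2023-02-01",
        end_inclusive=False
    ).parallel(time_increment=timedelta(weeks=1)).export_to_csv_files("./weekly")
    # Long window: relativedelta splits a full year into one request per month.
    client.get_asset_metrics(
        assets="btc",
        metrics="ReferenceRateUSD",
        frequency="1m",
        start_time="2022-01-01",
        end_time="2023-01-01",
        end_inclusive=False
    ).parallel(time_increment=relativedelta(months=1)).export_to_csv_files("./monthly")
```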


## General Parallelization Guidelines
* If you are using a small `page_size` while trying to export a large amount of data, this will be your biggest bottleneck.
Generally the fastest `page_size` is `1000` to `10000`.
* If you are unsure why an action is taking a long time, running the `CoinMetricsClient` with `verbose=True` or `debug=True`
can give better insight into what is happening under the hood.
* The parallel feature is best used when you are exporting a large amount of data that can be split by query parameters into
many smaller requests. A good example of this is market candles over a long time frame: if you are querying hundreds
of markets and are sure there will be data, using `.parallel().export_to_csv_files("...")` can yield a huge performance
increase. If you are only querying a single market, you will not see a difference.
* The parallel feature is highly configurable; there are several configuration options that may suit advanced
users, such as tweaking the `max_workers` parameter or changing the default `ProcessPoolExecutor` to a `ThreadPoolExecutor` (see the sketch after this list).
* Multithreaded code is inherently more complex; it will be harder to debug issues with long-running queries
when running parallel exports than with normal single-threaded code.
* For that reason, this tool is best suited for exporting historical data rather than supporting a real-time production
system.
* The methods that create a separate file for each thread, `.export_to_csv_files()` and `.export_to_json_files()`, are the
safest and most performant to use. The methods that return a single output (`.export_to_csv()`, `.export_to_json()`, `.to_list()`, and
`.to_dataframe()`) need to join the data from many threads before it can be returned; this may use a lot of memory
if you are accessing data types like market orderbooks or market trades, and could fail altogether.
* If using the `export_to_csv_files()`/`export_to_json_files()` functions, note that by default the files are saved in the directory format `/{endpoint}/{parallelize_on}`.
For example, with `export_to_json_files()`,
`client.get_market_trades("coinbase-eth-btc-spot,coinbase-eth-usdc-spot").parallel("markets")` will create one file per market, e.g. `./market-trades/coinbase-eth-btc-spot.json` and `./market-trades/coinbase-eth-usdc-spot.json`, while
`client.get_asset_metrics('btc,eth', 'ReferenceRateUSD', start_time='2024-01-01', limit_per_asset=1).parallel("assets,metrics", time_increment=timedelta(days=1))`
will create one file per asset/metric combination, e.g. `./asset-metrics/btc/ReferenceRateUSD/start_time=2024-01-01T00-00-00Z.json` and `./asset-metrics/eth/ReferenceRateUSD/start_time=2024-01-01T00-00-00Z.json`.
* If you get the error `BrokenProcessPool`, it [might be because you're missing a main() function](https://stackoverflow.com/questions/15900366/all-example-concurrent-futures-code-is-failing-with-brokenprocesspool).
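
As a minimal sketch of the configuration note above, this assumes `verbose=True` is accepted by the client constructor and `max_workers` by `.parallel()`, as the guidelines suggest; the worker count is illustrative:

```python
import os
from coinmetrics.api_client import CoinMetricsClient


if __name__ == '__main__':
    # verbose=True logs each request, which helps explain slow exports.
    client = CoinMetricsClient(os.environ['CM_API_KEY'], verbose=True)
    # Cap the worker count to stay comfortably within the API rate limits.
    client.get_market_trades(
        "coinbase-eth-btc-spot,coinbase-eth-usdc-spot"
    ).parallel("markets", max_workers=4).export_to_json_files()
```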
