diff --git a/docs/articles/improve-rag-with-raptor.md b/docs/articles/improve-rag-with-raptor.md index c4dcb836d..41baeaca2 100644 --- a/docs/articles/improve-rag-with-raptor.md +++ b/docs/articles/improve-rag-with-raptor.md @@ -1,5 +1,6 @@ # Improving RAG with RAPTOR + Traditional [RAG](https://superlinked.com/vectorhub/articles/retrieval-augmented-generation) setups commonly split documents into fixed-size chunks. But this creates problems. If key concepts span multiple chunks, the embeddings can lose the semantic coherence of the original text. LLM queries that retrieve single chunks frequently _miss_ their relationship to crucial pieces of information buried inside other chunks. This leads to incomplete or misleading responses. **Because its chunk embeddings lack any weighting or hierarchical structure, traditional RAG's flat retrieval returns results based only on similarity or relevance scores. Key insights are often lost.** So, **is there a way of getting our embeddings to preserve the relationships and hierarchical structure that exists within source documents, so that our retrieval can surface key insights, and do it efficiently**? @@ -450,7 +451,7 @@ RAPTOR has two distinct strategies for querying the RAPTOR tree: tree traversal If our query demanded complex multi-level reasoning, and a contextually rich and precise result, it would make sense to use tree traversal. But for specific queries requiring specific factual information - like our financial news query, we want to be able to directly compare our query embedding with the vector embeddings of all nodes (both leaf and summary), efficiently bypassing RAPTOR's hierarchical structure and going straight to the most relevant data points. -But even though the collapsed tree method's retrieval bypasses the RAPTOR tree's hierarchy, it still capitalizes on the RAPTOR tree's hierarchical encapsulation of meaning to retrieve context. 
Because the collapsed tree method treats summarized nodes from higher levels simply as additional (same level) chunks, we can pull in higher-level summaries (the global perspective) alongside granular details with just one pass. We want our retrieval to get both an overall perspective and pinpoint very specific details of a particular company's financial quarter. +But even though the collapsed tree method's retrieval bypasses the RAPTOR tree's hierarchy, it still capitalizes on the RAPTOR tree's hierarchical encapsulation of meaning to retrieve context. Because the collapsed tree method treats summarized nodes from higher levels simply as additional (same level) chunks, we can pull in higher-level summaries (the global perspective) alongside granular details in just one pass. We want our retrieval to get both an overall perspective and pinpoint very specific details of a particular company's financial quarter. For our purposes, the collapsed tree method is a better fit than tree traversal. 
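To make the collapsed tree idea concrete, here is a minimal sketch (not the article's or the RAPTOR library's implementation). It assumes every node, leaf or summary, already has an embedding, and ranks all of them in one flat pass by cosine similarity to the query. The function name `collapsed_tree_retrieve`, the node texts, and the 3-dimensional toy vectors are all hypothetical, invented purely for illustration.

```python
import numpy as np


def collapsed_tree_retrieve(query_vec, node_vecs, node_texts, node_levels, top_k=2):
    """Rank all nodes (leaf chunks and summaries alike) by cosine similarity.

    The tree hierarchy is ignored at query time: summaries from higher
    levels are treated as ordinary candidates next to the leaf chunks.
    """
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(node_vecs, dtype=float)
    # Normalize so that the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    scores = m @ q
    # Indices of the top_k highest-scoring nodes, best first.
    order = np.argsort(-scores)[:top_k]
    return [(node_texts[i], node_levels[i], float(scores[i])) for i in order]


# Toy data (made up): three leaf chunks at level 0 and one summary at level 1.
node_texts = [
    "leaf: quarterly revenue details",
    "leaf: executive appointment",
    "leaf: share buyback details",
    "summary: overview of the quarter",
]
node_levels = [0, 0, 0, 1]
node_vecs = [
    [1.0, 0.1, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.0, 1.0],
    [0.9, 0.2, 0.1],
]
query_vec = [1.0, 0.15, 0.05]

results = collapsed_tree_retrieve(query_vec, node_vecs, node_texts, node_levels, top_k=2)
for text, level, score in results:
    print(f"level={level} score={score:.3f} {text}")
```

With these toy vectors, the two top hits are one leaf chunk and the level-1 summary, which is the behavior described above: granular detail and the higher-level perspective come back from a single similarity search. In a real pipeline the vectors would come from the embedding model and the search would typically run inside a vector store rather than in NumPy.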
diff --git a/docs/assets/use_cases/improve-rag-with-raptor/raptor_with_rag.ipynb b/docs/assets/use_cases/improve-rag-with-raptor/raptor_with_rag.ipynb new file mode 100644 index 000000000..0f8c303a7 --- /dev/null +++ b/docs/assets/use_cases/improve-rag-with-raptor/raptor_with_rag.ipynb @@ -0,0 +1,4933 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "be703a5360454336827d77474de87803": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_ec15c209ef564b899b2467902bb7e9d5", + "IPY_MODEL_211f483b55b84c2eb656bf770ef0e4d4", + "IPY_MODEL_224c78f30362472ab4a68f4d80aa7491" + ], + "layout": "IPY_MODEL_303e08d66f9e43e6afe3e5b75a23bdd9" + } + }, + "ec15c209ef564b899b2467902bb7e9d5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_67b93205f7e3426abc804282d377cb0c", + "placeholder": "​", + "style": "IPY_MODEL_48c5b440c4244a2aa287b630d4f1fc27", + "value": "modules.json: 100%" + } + }, + "211f483b55b84c2eb656bf770ef0e4d4": { + "model_module": 
"@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_070eb5d597104ec59a1c5b32a791a50b", + "max": 349, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_584db1dfb9f04c0db7b3b405027ca5a8", + "value": 349 + } + }, + "224c78f30362472ab4a68f4d80aa7491": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_32dc8c442c8f4c708b023b13e15cd4f2", + "placeholder": "​", + "style": "IPY_MODEL_9e670b17e23d43dca51947973397a24d", + "value": " 349/349 [00:00<00:00, 4.58kB/s]" + } + }, + "303e08d66f9e43e6afe3e5b75a23bdd9": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + 
"grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "67b93205f7e3426abc804282d377cb0c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "48c5b440c4244a2aa287b630d4f1fc27": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": 
"1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "070eb5d597104ec59a1c5b32a791a50b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "584db1dfb9f04c0db7b3b405027ca5a8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": 
null, + "description_width": "" + } + }, + "32dc8c442c8f4c708b023b13e15cd4f2": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9e670b17e23d43dca51947973397a24d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "936efe41e7324e10a759dc261a5a8f6f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_137e355e929b49cf905b22031b0dda0a", + "IPY_MODEL_0ab443d03e3a469dbc47d2ff5e48f905", + "IPY_MODEL_9b6e614242414230bf2078381564a84a" + ], + "layout": "IPY_MODEL_671077413ff74589a2b89df8a88bf34f" + } + }, + "137e355e929b49cf905b22031b0dda0a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b773d7756d16450f8a90c28fa2a18c35", + "placeholder": "​", + "style": "IPY_MODEL_2db0e65ddf604f15bbac4c8bd6b0973e", + "value": "config_sentence_transformers.json: 100%" + } + }, + "0ab443d03e3a469dbc47d2ff5e48f905": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d553db695b4f47dead593c0f38b291d2", + "max": 116, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_b83daf31f0fd48a7afb2d5ef85923fed", + "value": 116 + } + }, + "9b6e614242414230bf2078381564a84a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + 
"model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_b99bbfeb85a74d64ad93afcb895b82dd", + "placeholder": "​", + "style": "IPY_MODEL_560e32d903044ff380d3fb7c5bdb4b31", + "value": " 116/116 [00:00<00:00, 1.85kB/s]" + } + }, + "671077413ff74589a2b89df8a88bf34f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b773d7756d16450f8a90c28fa2a18c35": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": 
"@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2db0e65ddf604f15bbac4c8bd6b0973e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d553db695b4f47dead593c0f38b291d2": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": 
null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "b83daf31f0fd48a7afb2d5ef85923fed": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "b99bbfeb85a74d64ad93afcb895b82dd": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + 
"grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "560e32d903044ff380d3fb7c5bdb4b31": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "4dc9c4c82af54bceb26cbc94ea4819c5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_016d29542c2f497face705bcf140cf8c", + "IPY_MODEL_669bc49d44694ff8a74cf31232ac7665", + "IPY_MODEL_3ab7332a28b24ef2ac6547df38555145" + ], + "layout": "IPY_MODEL_c66544f937b647b99520d6e141a34f1e" + } + }, + "016d29542c2f497face705bcf140cf8c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_9e153049c238446d932e31ff7e8803da", + "placeholder": "​", + "style": "IPY_MODEL_5dfa402c6e184230b2924094f919ff54", + "value": "README.md: 100%" + } + }, + "669bc49d44694ff8a74cf31232ac7665": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_3b2beb3fba3044d3a93bae33fa40d5de", + "max": 10659, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_fb676c541a67413fb6b4cdbf3a8f313a", + "value": 10659 + } + }, + "3ab7332a28b24ef2ac6547df38555145": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_9b9965dd1d76436286f0afd5426f0c17", + "placeholder": "​", + "style": "IPY_MODEL_ea25aedeb8964372bcc66261b969a981", + "value": " 10.7k/10.7k [00:00<00:00, 209kB/s]" + } + }, + "c66544f937b647b99520d6e141a34f1e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": 
"LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9e153049c238446d932e31ff7e8803da": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": 
null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5dfa402c6e184230b2924094f919ff54": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "3b2beb3fba3044d3a93bae33fa40d5de": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fb676c541a67413fb6b4cdbf3a8f313a": { + 
"model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "9b9965dd1d76436286f0afd5426f0c17": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ea25aedeb8964372bcc66261b969a981": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + 
"_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "9dc6c31683ed40a38911f5893488a67e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_1747aa8196cd4e01a1586aec2bd69ce2", + "IPY_MODEL_b769be63458840f78cb9be57e6edd9f8", + "IPY_MODEL_d08536abc0654c95b354ba01caaefc81" + ], + "layout": "IPY_MODEL_9eefaa330b8f4d32ba5232388022314b" + } + }, + "1747aa8196cd4e01a1586aec2bd69ce2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2293028651f7411d85c88dca2ad588aa", + "placeholder": "​", + "style": "IPY_MODEL_c17002f3b6f54d31b6659111c449e33b", + "value": "sentence_bert_config.json: 100%" + } + }, + "b769be63458840f78cb9be57e6edd9f8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": 
"success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_9d4ef040a792497aba63a193a21751ad", + "max": 53, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_292ba06b82424f3c8b50abf77e64e5ff", + "value": 53 + } + }, + "d08536abc0654c95b354ba01caaefc81": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_1325fa645feb4dc0b0116d06235ccd72", + "placeholder": "​", + "style": "IPY_MODEL_8cb04461d32344f1b772ba60352eda56", + "value": " 53.0/53.0 [00:00<00:00, 799B/s]" + } + }, + "9eefaa330b8f4d32ba5232388022314b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, 
+ "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2293028651f7411d85c88dca2ad588aa": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c17002f3b6f54d31b6659111c449e33b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "9d4ef040a792497aba63a193a21751ad": { + "model_module": "@jupyter-widgets/base", + "model_name": 
"LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "292ba06b82424f3c8b50abf77e64e5ff": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "1325fa645feb4dc0b0116d06235ccd72": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": 
"1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8cb04461d32344f1b772ba60352eda56": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "c5719e0174fc4cb999cd822bf56c038d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_35f6cedab9ef41da9da5a1b577c527a5", + "IPY_MODEL_e4663b5598494f0fb641b4df62fb9701", + "IPY_MODEL_f4eb9a5b95ac4de5ab8af9d57f187145" + ], + "layout": 
"IPY_MODEL_0a35ce34e0ea48049cc1aa8377326e8a" + } + }, + "35f6cedab9ef41da9da5a1b577c527a5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0fe20f4256904090b47814d097352de8", + "placeholder": "​", + "style": "IPY_MODEL_2125fd1efcc6413885da83478f7b5c36", + "value": "config.json: 100%" + } + }, + "e4663b5598494f0fb641b4df62fb9701": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_f3d61d97c6a544e793f5cafcc27e8ee3", + "max": 612, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_70722b941b8e4b0d89aa08cbc3db089f", + "value": 612 + } + }, + "f4eb9a5b95ac4de5ab8af9d57f187145": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_bb31660cd76245eca399b39c7cf9a6f8", + 
"placeholder": "​", + "style": "IPY_MODEL_b521b2458ad44b4e9b7f0078aee90f38", + "value": " 612/612 [00:00<00:00, 5.21kB/s]" + } + }, + "0a35ce34e0ea48049cc1aa8377326e8a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0fe20f4256904090b47814d097352de8": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + 
"grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2125fd1efcc6413885da83478f7b5c36": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f3d61d97c6a544e793f5cafcc27e8ee3": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": 
null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "70722b941b8e4b0d89aa08cbc3db089f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "bb31660cd76245eca399b39c7cf9a6f8": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": 
null, + "top": null, + "visibility": null, + "width": null + } + }, + "b521b2458ad44b4e9b7f0078aee90f38": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "4c63c381eee14d138a87b3078662a8a2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_7e5b564e7adb481ebe543786e26c5d99", + "IPY_MODEL_3ec929fc4dbb4b7f9f2ad6426ee96798", + "IPY_MODEL_6587bf6b5fdc43e8865418181404add3" + ], + "layout": "IPY_MODEL_3c268114693e4309af820c89708557b5" + } + }, + "7e5b564e7adb481ebe543786e26c5d99": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_9b1818f243574f3486a94f7c0dde601b", + "placeholder": "​", + "style": "IPY_MODEL_d902c243cef141128e750ea302c1ccfe", + "value": "model.safetensors: 100%" + } + }, + "3ec929fc4dbb4b7f9f2ad6426ee96798": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_322162892edc4ab89bab2b4464ba010f", + "max": 90868376, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_c2ecde8402594b2aa57780978a2736b6", + "value": 90868376 + } + }, + "6587bf6b5fdc43e8865418181404add3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c5af93c4252e44dea822a51c097c6114", + "placeholder": "​", + "style": "IPY_MODEL_22ab0d37c7ac4f838c5683d18e924044", + "value": " 90.9M/90.9M [00:01<00:00, 54.1MB/s]" + } + }, + "3c268114693e4309af820c89708557b5": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + 
"grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9b1818f243574f3486a94f7c0dde601b": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d902c243cef141128e750ea302c1ccfe": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + 
"_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "322162892edc4ab89bab2b4464ba010f": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c2ecde8402594b2aa57780978a2736b6": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + 
"description_width": "" + } + }, + "c5af93c4252e44dea822a51c097c6114": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "22ab0d37c7ac4f838c5683d18e924044": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "23d35eb6c5d84e2498a26d572585dbab": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + 
"_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_f3679e160bd5484292aa5822e6157dd2", + "IPY_MODEL_cf9d6b043bbd49c2a036d5877b03ffd2", + "IPY_MODEL_677616ac487248c79a5488b17640fc6d" + ], + "layout": "IPY_MODEL_24a4d38ae53d4b77b7949e7914e3cd78" + } + }, + "f3679e160bd5484292aa5822e6157dd2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_fe4cf5db788a47b6bc1238a0901bdbf7", + "placeholder": "​", + "style": "IPY_MODEL_786b94bea5bc4e5bac510fa471690b07", + "value": "tokenizer_config.json: 100%" + } + }, + "cf9d6b043bbd49c2a036d5877b03ffd2": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_cb6b79fbe1894dfebda7ca100bfb8a27", + "max": 350, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_fca9216926cd4b9cb4b79361c4e28c8f", + "value": 350 + } + }, + "677616ac487248c79a5488b17640fc6d": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + 
"state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_294ebbf2245d417590768daa5ecbd6b9", + "placeholder": "​", + "style": "IPY_MODEL_ca34b303478745148a8cf2ffb9608050", + "value": " 350/350 [00:00<00:00, 5.32kB/s]" + } + }, + "24a4d38ae53d4b77b7949e7914e3cd78": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fe4cf5db788a47b6bc1238a0901bdbf7": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + 
"_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "786b94bea5bc4e5bac510fa471690b07": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "cb6b79fbe1894dfebda7ca100bfb8a27": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + 
"bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fca9216926cd4b9cb4b79361c4e28c8f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "294ebbf2245d417590768daa5ecbd6b9": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + 
"grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ca34b303478745148a8cf2ffb9608050": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "33d706fc338f4b5190e4d411e7a19ed3": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_02f763bd8c534b6cab5a71c7e4f96d5c", + "IPY_MODEL_2382010f15044771bae63bfc907e6c76", + "IPY_MODEL_e0ec9206e3b1463fbf13ecb1c2c5af28" + ], + "layout": "IPY_MODEL_5e65cc6748c9499db70f566d8c85640c" + } + }, + "02f763bd8c534b6cab5a71c7e4f96d5c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + 
"_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a0bc08ce4fde43e8bf3bb38ab1f6d8ea", + "placeholder": "​", + "style": "IPY_MODEL_8b49e94d81f0463b8f0e127533c0169c", + "value": "vocab.txt: 100%" + } + }, + "2382010f15044771bae63bfc907e6c76": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ed2cfa42d6b84a6b82d4e2d0a15cb68e", + "max": 231508, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_2e0a2884d45f43beb4f2e1766a3aea97", + "value": 231508 + } + }, + "e0ec9206e3b1463fbf13ecb1c2c5af28": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ee402e17a3f14f17b2a444b953bc6c4e", + "placeholder": "​", + "style": "IPY_MODEL_d0fc2fabcb284d528a4a7439b21a3ece", + "value": " 232k/232k [00:00<00:00, 1.93MB/s]" + } + }, + "5e65cc6748c9499db70f566d8c85640c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, 
+ "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a0bc08ce4fde43e8bf3bb38ab1f6d8ea": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + 
"object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8b49e94d81f0463b8f0e127533c0169c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "ed2cfa42d6b84a6b82d4e2d0a15cb68e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2e0a2884d45f43beb4f2e1766a3aea97": { + "model_module": 
"@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "ee402e17a3f14f17b2a444b953bc6c4e": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d0fc2fabcb284d528a4a7439b21a3ece": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + 
"_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "7ca4b89fe55946bbbf4db559883ab88b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_ad3bb94f158f4f6db10318ff4f5fa4f1", + "IPY_MODEL_259215f6bdf541e198747feeb23fe131", + "IPY_MODEL_b4e24a22fae541fbba241cad0ef3c893" + ], + "layout": "IPY_MODEL_7778e2dde0ff4e7083763c8add1568fe" + } + }, + "ad3bb94f158f4f6db10318ff4f5fa4f1": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_301b6d78b8df4835aacfdbd085aa0233", + "placeholder": "​", + "style": "IPY_MODEL_06f3f9ea472d472191ec6df978539d26", + "value": "tokenizer.json: 100%" + } + }, + "259215f6bdf541e198747feeb23fe131": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + 
"description_tooltip": null, + "layout": "IPY_MODEL_690ae35b96304f45a00d8c856b107199", + "max": 466247, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_526f3ff32b134b6d90903ab2c7fb9011", + "value": 466247 + } + }, + "b4e24a22fae541fbba241cad0ef3c893": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_15db9ca50cc34cca89c06e9ac5b56c2c", + "placeholder": "​", + "style": "IPY_MODEL_0fb2805f19f346cea205368fc1047356", + "value": " 466k/466k [00:00<00:00, 6.02MB/s]" + } + }, + "7778e2dde0ff4e7083763c8add1568fe": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + 
"overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "301b6d78b8df4835aacfdbd085aa0233": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "06f3f9ea472d472191ec6df978539d26": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "690ae35b96304f45a00d8c856b107199": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + 
"model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "526f3ff32b134b6d90903ab2c7fb9011": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "15db9ca50cc34cca89c06e9ac5b56c2c": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + 
"_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0fb2805f19f346cea205368fc1047356": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "4b382d8ca91c430e862bb7e54c800596": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_d73101f6560c4ac0b5b8d7eb1ea677a4", + "IPY_MODEL_54e1cd8f1fb04cb698ea68af80d8c234", + "IPY_MODEL_542d51359bca4906ab18672b21c55c48" + ], + "layout": 
"IPY_MODEL_f4f5d9a78cfc41eca41255a8e4607baa" + } + }, + "d73101f6560c4ac0b5b8d7eb1ea677a4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5a0623b782e34075af7520ccabb936c2", + "placeholder": "​", + "style": "IPY_MODEL_ecf89a48bed1412e97e62a2e9209b165", + "value": "special_tokens_map.json: 100%" + } + }, + "54e1cd8f1fb04cb698ea68af80d8c234": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ce427887152f4d07a986b032cd96856a", + "max": 112, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_66c27e12558b4a8d9c0c06d5250d5687", + "value": 112 + } + }, + "542d51359bca4906ab18672b21c55c48": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7c915ca33f464c6382095af94830e5fe", 
+ "placeholder": "​", + "style": "IPY_MODEL_207ec4736935444a999962514544ba4f", + "value": " 112/112 [00:00<00:00, 2.45kB/s]" + } + }, + "f4f5d9a78cfc41eca41255a8e4607baa": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5a0623b782e34075af7520ccabb936c2": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + 
"grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ecf89a48bed1412e97e62a2e9209b165": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "ce427887152f4d07a986b032cd96856a": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": 
null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "66c27e12558b4a8d9c0c06d5250d5687": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "7c915ca33f464c6382095af94830e5fe": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": 
null, + "top": null, + "visibility": null, + "width": null + } + }, + "207ec4736935444a999962514544ba4f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d3976255a98544fcae91e46189da54ed": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_8421f5bf16054eeda179b9664d595312", + "IPY_MODEL_01eadd6ce62b45a0a067e6a59b996840", + "IPY_MODEL_cdd546ed070a4691aa6f4bc14ad214b5" + ], + "layout": "IPY_MODEL_aa58345cc8a74a94bf07449949467792" + } + }, + "8421f5bf16054eeda179b9664d595312": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_144cb8c372b54a0f9e3f7f2d0dfd4abd", + "placeholder": "​", + "style": "IPY_MODEL_e1d0c2d48de34b18999dfe8b12cf568a", + "value": "1_Pooling/config.json: 100%" + } + }, + "01eadd6ce62b45a0a067e6a59b996840": { + "model_module": "@jupyter-widgets/controls", + "model_name": 
"FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_ee13d97e607d4171bf5667bd71d39291", + "max": 190, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_9280f62d04ce45d9bd00c3efb33f3c8e", + "value": 190 + } + }, + "cdd546ed070a4691aa6f4bc14ad214b5": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_df1309e8f1f843b2b5dd1085a8230816", + "placeholder": "​", + "style": "IPY_MODEL_0470c8d796fc46f3a370e74ec8192c0c", + "value": " 190/190 [00:00<00:00, 3.37kB/s]" + } + }, + "aa58345cc8a74a94bf07449949467792": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + 
"grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "144cb8c372b54a0f9e3f7f2d0dfd4abd": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e1d0c2d48de34b18999dfe8b12cf568a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": 
"@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "ee13d97e607d4171bf5667bd71d39291": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9280f62d04ce45d9bd00c3efb33f3c8e": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + 
"df1309e8f1f843b2b5dd1085a8230816": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "0470c8d796fc46f3a370e74ec8192c0c": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + } + } + } + }, + "cells": [ + { + "cell_type": "markdown", + "source": [ + "### Installing the relevant dependencies" + ], + "metadata": { + "id": "pXh2tc9vKHxu" + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "o-2_U0HGaU9S" + }, + "outputs": [], + 
"source": [ + "%%capture\n", + "!pip install lancedb scikit-learn openai torch sentence_transformers tiktoken umap-learn PyPDF2 tantivy" + ] + }, + { + "cell_type": "code", + "source": [ + "import os\n", + "import uuid\n", + "import tiktoken\n", + "import re\n", + "import numpy as np\n", + "import pandas as pd\n", + "import transformers\n", + "import torch\n", + "import umap.umap_ as umap\n", + "import matplotlib.pyplot as plt\n", + "from openai import OpenAI\n", + "from typing import List, Tuple, Optional, Dict\n", + "from sklearn.mixture import GaussianMixture\n", + "from sentence_transformers import SentenceTransformer\n", + "\n", + "openai_api_key = \"******\"\n", + "embedding_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')\n", + "client = OpenAI(api_key=openai_api_key)\n", + "SEED = 1234" + ], + "metadata": { + "id": "XGmxiiQNbNOG", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 426, + "referenced_widgets": [ + "be703a5360454336827d77474de87803", + "ec15c209ef564b899b2467902bb7e9d5", + "211f483b55b84c2eb656bf770ef0e4d4", + "224c78f30362472ab4a68f4d80aa7491", + "303e08d66f9e43e6afe3e5b75a23bdd9", + "67b93205f7e3426abc804282d377cb0c", + "48c5b440c4244a2aa287b630d4f1fc27", + "070eb5d597104ec59a1c5b32a791a50b", + "584db1dfb9f04c0db7b3b405027ca5a8", + "32dc8c442c8f4c708b023b13e15cd4f2", + "9e670b17e23d43dca51947973397a24d", + "936efe41e7324e10a759dc261a5a8f6f", + "137e355e929b49cf905b22031b0dda0a", + "0ab443d03e3a469dbc47d2ff5e48f905", + "9b6e614242414230bf2078381564a84a", + "671077413ff74589a2b89df8a88bf34f", + "b773d7756d16450f8a90c28fa2a18c35", + "2db0e65ddf604f15bbac4c8bd6b0973e", + "d553db695b4f47dead593c0f38b291d2", + "b83daf31f0fd48a7afb2d5ef85923fed", + "b99bbfeb85a74d64ad93afcb895b82dd", + "560e32d903044ff380d3fb7c5bdb4b31", + "4dc9c4c82af54bceb26cbc94ea4819c5", + "016d29542c2f497face705bcf140cf8c", + "669bc49d44694ff8a74cf31232ac7665", + "3ab7332a28b24ef2ac6547df38555145", + "c66544f937b647b99520d6e141a34f1e", + 
"9e153049c238446d932e31ff7e8803da", + "5dfa402c6e184230b2924094f919ff54", + "3b2beb3fba3044d3a93bae33fa40d5de", + "fb676c541a67413fb6b4cdbf3a8f313a", + "9b9965dd1d76436286f0afd5426f0c17", + "ea25aedeb8964372bcc66261b969a981", + "9dc6c31683ed40a38911f5893488a67e", + "1747aa8196cd4e01a1586aec2bd69ce2", + "b769be63458840f78cb9be57e6edd9f8", + "d08536abc0654c95b354ba01caaefc81", + "9eefaa330b8f4d32ba5232388022314b", + "2293028651f7411d85c88dca2ad588aa", + "c17002f3b6f54d31b6659111c449e33b", + "9d4ef040a792497aba63a193a21751ad", + "292ba06b82424f3c8b50abf77e64e5ff", + "1325fa645feb4dc0b0116d06235ccd72", + "8cb04461d32344f1b772ba60352eda56", + "c5719e0174fc4cb999cd822bf56c038d", + "35f6cedab9ef41da9da5a1b577c527a5", + "e4663b5598494f0fb641b4df62fb9701", + "f4eb9a5b95ac4de5ab8af9d57f187145", + "0a35ce34e0ea48049cc1aa8377326e8a", + "0fe20f4256904090b47814d097352de8", + "2125fd1efcc6413885da83478f7b5c36", + "f3d61d97c6a544e793f5cafcc27e8ee3", + "70722b941b8e4b0d89aa08cbc3db089f", + "bb31660cd76245eca399b39c7cf9a6f8", + "b521b2458ad44b4e9b7f0078aee90f38", + "4c63c381eee14d138a87b3078662a8a2", + "7e5b564e7adb481ebe543786e26c5d99", + "3ec929fc4dbb4b7f9f2ad6426ee96798", + "6587bf6b5fdc43e8865418181404add3", + "3c268114693e4309af820c89708557b5", + "9b1818f243574f3486a94f7c0dde601b", + "d902c243cef141128e750ea302c1ccfe", + "322162892edc4ab89bab2b4464ba010f", + "c2ecde8402594b2aa57780978a2736b6", + "c5af93c4252e44dea822a51c097c6114", + "22ab0d37c7ac4f838c5683d18e924044", + "23d35eb6c5d84e2498a26d572585dbab", + "f3679e160bd5484292aa5822e6157dd2", + "cf9d6b043bbd49c2a036d5877b03ffd2", + "677616ac487248c79a5488b17640fc6d", + "24a4d38ae53d4b77b7949e7914e3cd78", + "fe4cf5db788a47b6bc1238a0901bdbf7", + "786b94bea5bc4e5bac510fa471690b07", + "cb6b79fbe1894dfebda7ca100bfb8a27", + "fca9216926cd4b9cb4b79361c4e28c8f", + "294ebbf2245d417590768daa5ecbd6b9", + "ca34b303478745148a8cf2ffb9608050", + "33d706fc338f4b5190e4d411e7a19ed3", + "02f763bd8c534b6cab5a71c7e4f96d5c", + 
"2382010f15044771bae63bfc907e6c76", + "e0ec9206e3b1463fbf13ecb1c2c5af28", + "5e65cc6748c9499db70f566d8c85640c", + "a0bc08ce4fde43e8bf3bb38ab1f6d8ea", + "8b49e94d81f0463b8f0e127533c0169c", + "ed2cfa42d6b84a6b82d4e2d0a15cb68e", + "2e0a2884d45f43beb4f2e1766a3aea97", + "ee402e17a3f14f17b2a444b953bc6c4e", + "d0fc2fabcb284d528a4a7439b21a3ece", + "7ca4b89fe55946bbbf4db559883ab88b", + "ad3bb94f158f4f6db10318ff4f5fa4f1", + "259215f6bdf541e198747feeb23fe131", + "b4e24a22fae541fbba241cad0ef3c893", + "7778e2dde0ff4e7083763c8add1568fe", + "301b6d78b8df4835aacfdbd085aa0233", + "06f3f9ea472d472191ec6df978539d26", + "690ae35b96304f45a00d8c856b107199", + "526f3ff32b134b6d90903ab2c7fb9011", + "15db9ca50cc34cca89c06e9ac5b56c2c", + "0fb2805f19f346cea205368fc1047356", + "4b382d8ca91c430e862bb7e54c800596", + "d73101f6560c4ac0b5b8d7eb1ea677a4", + "54e1cd8f1fb04cb698ea68af80d8c234", + "542d51359bca4906ab18672b21c55c48", + "f4f5d9a78cfc41eca41255a8e4607baa", + "5a0623b782e34075af7520ccabb936c2", + "ecf89a48bed1412e97e62a2e9209b165", + "ce427887152f4d07a986b032cd96856a", + "66c27e12558b4a8d9c0c06d5250d5687", + "7c915ca33f464c6382095af94830e5fe", + "207ec4736935444a999962514544ba4f", + "d3976255a98544fcae91e46189da54ed", + "8421f5bf16054eeda179b9664d595312", + "01eadd6ce62b45a0a067e6a59b996840", + "cdd546ed070a4691aa6f4bc14ad214b5", + "aa58345cc8a74a94bf07449949467792", + "144cb8c372b54a0f9e3f7f2d0dfd4abd", + "e1d0c2d48de34b18999dfe8b12cf568a", + "ee13d97e607d4171bf5667bd71d39291", + "9280f62d04ce45d9bd00c3efb33f3c8e", + "df1309e8f1f843b2b5dd1085a8230816", + "0470c8d796fc46f3a370e74ec8192c0c" + ] + }, + "outputId": "dfea26c3-cb8c-4c56-f36a-13bb2e36eb5a" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "modules.json: 0%| | 0.00/349 [00:00 np.ndarray:\n", + " if clustering_type == \"local\":\n", + " n_neighbors = max(2, min(10, len(embeddings) - 1))\n", + " min_dist = 0.01\n", + " elif clustering_type == \"global\":\n", + " 
n_neighbors = max(2, min(int((len(embeddings) - 1) ** 0.5), len(embeddings) // 10, len(embeddings) - 1))\n", + " min_dist = 0.1\n", + " else:\n", + " raise ValueError(\"clustering_type must be either 'local' or 'global'\")\n", + "\n", + " umap_model = umap.UMAP(\n", + " n_neighbors=n_neighbors,\n", + " min_dist=min_dist,\n", + " n_components=target_dim,\n", + " metric=metric,\n", + " )\n", + " return umap_model.fit_transform(embeddings)" + ], + "metadata": { + "id": "ZJ_LvhKYgepc" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Optimal Number of Clusters\n", + "\n" + ], + "metadata": { + "id": "QRRbuxB5qpUa" + } + }, + { + "cell_type": "markdown", + "source": [ + "We’ll use both the Elbow Method and the Bayesian Information Criterion (BIC) to estimate the optimal number of clusters, and keep the larger of the two estimates." + ], + "metadata": { + "id": "LRy5NHTkq2kl" + } + }, + { + "cell_type": "code", + "source": [ + "def compute_inertia(embeddings: np.ndarray, labels: np.ndarray, centroids: np.ndarray) -> float:\n", + " return np.sum(np.min(np.sum((embeddings[:, np.newaxis] - centroids) ** 2, axis=2), axis=1))\n", + "\n", + "def optimal_cluster_number(\n", + " embeddings: np.ndarray,\n", + " max_clusters: int = 50,\n", + " random_state: int = SEED\n", + ") -> int:\n", + " max_clusters = min(max_clusters, len(embeddings))\n", + " number_of_clusters = np.arange(1, max_clusters + 1)\n", + " inertias = []\n", + " bic_scores = []\n", + "\n", + " for n in number_of_clusters:\n", + " gmm = GaussianMixture(n_components=n, random_state=random_state)\n", + " labels = gmm.fit_predict(embeddings)\n", + " centroids = gmm.means_\n", + " inertia = compute_inertia(embeddings, labels, centroids)\n", + " inertias.append(inertia)\n", + " bic_scores.append(gmm.bic(embeddings))\n", + "\n", + " inertia_changes = np.diff(inertias)\n", + " elbow_optimal = number_of_clusters[np.argmin(inertia_changes) + 1]\n", + " bic_optimal = 
number_of_clusters[np.argmin(bic_scores)]\n", + "\n", + " return max(elbow_optimal, bic_optimal)\n", + "\n", + "def gmm_clustering(\n", + " embeddings: np.ndarray,\n", + " threshold: float,\n", + " random_state: int = SEED\n", + ") -> Tuple[List[np.ndarray], int]:\n", + " n_clusters = optimal_cluster_number(embeddings, random_state=random_state)\n", + " gm = GaussianMixture(n_components=n_clusters, random_state=random_state, n_init=2)\n", + " gm.fit(embeddings)\n", + " probs = gm.predict_proba(embeddings)\n", + " labels = [np.where(prob > threshold)[0] for prob in probs]\n", + " return labels, n_clusters" + ], + "metadata": { + "id": "zsQA9H0vgs5o" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "### Tree Construction\n", + "\n", + "Now that we’ve wrapped up the clustering part, let’s look at how we build our hierarchical tree. After several rounds of clustering and summarization (while keeping track of the current level), here’s what we have:\n", + "\n", + "- **Leaf Nodes:** These are our original text chunks, forming the base of the tree.\n", + "- **Summary Nodes:** As we go up the tree, each node is a short summary of its child nodes, capturing the main idea of the cluster.\n", + "- **Hierarchical Embeddings:** Each summary node gets its own vector embedding, representing the summarized meaning, and the summaries produced at one level become the texts that are clustered at the next level. So, we’re adding new nodes to the tree alongside the original chunks rather than replacing them.\n", + "\n", + "At each level, the process is the same: we embed the texts, reduce their dimensionality with UMAP, cluster them with Gaussian Mixture Models (first globally for a broad grouping, then locally within each global cluster for finer groups), and summarize each resulting cluster."
+ ], + "metadata": { + "id": "-w-MJXDqrQxl" + } + }, + { + "cell_type": "code", + "source": [ + "def clustering_algorithm(\n", + " embeddings: np.ndarray,\n", + " target_dim: int,\n", + " threshold: float,\n", + " random_state: int = SEED\n", + ") -> Tuple[List[np.ndarray], int]:\n", + " if len(embeddings) <= target_dim + 1:\n", + " return [np.array([0]) for _ in range(len(embeddings))], 1\n", + "\n", + " # Global clustering\n", + " reduced_global_embeddings = dimensionality_reduction(embeddings, target_dim, \"global\")\n", + " global_clusters, n_global_clusters = gmm_clustering(reduced_global_embeddings, threshold, random_state=random_state)\n", + "\n", + " all_local_clusters = [np.array([]) for _ in range(len(embeddings))]\n", + " total_clusters = 0\n", + "\n", + " # Local clustering within each global cluster\n", + " for i in range(n_global_clusters):\n", + " global_cluster_mask = np.array([i in gc for gc in global_clusters])\n", + " global_cluster_embeddings = embeddings[global_cluster_mask]\n", + "\n", + " if len(global_cluster_embeddings) <= target_dim + 1:\n", + " # Assign all points in this global cluster to a single local cluster\n", + " for idx in np.where(global_cluster_mask)[0]:\n", + " all_local_clusters[idx] = np.append(all_local_clusters[idx], total_clusters)\n", + " total_clusters += 1\n", + " continue\n", + "\n", + " try:\n", + " reduced_local_embeddings = dimensionality_reduction(global_cluster_embeddings, target_dim, \"local\")\n", + " local_clusters, n_local_clusters = gmm_clustering(reduced_local_embeddings, threshold, random_state=random_state)\n", + "\n", + " # Assign local cluster IDs\n", + " for j in range(n_local_clusters):\n", + " local_cluster_mask = np.array([j in lc for lc in local_clusters])\n", + " global_indices = np.where(global_cluster_mask)[0]\n", + " local_indices = global_indices[local_cluster_mask]\n", + " for idx in local_indices:\n", + " all_local_clusters[idx] = np.append(all_local_clusters[idx], j + total_clusters)\n", + 
"\n", + " total_clusters += n_local_clusters\n", + " except Exception as e:\n", + " print(f\"Error in local clustering for global cluster {i}: {str(e)}\")\n", + " # Assign all points in this global cluster to a single local cluster\n", + " for idx in np.where(global_cluster_mask)[0]:\n", + " all_local_clusters[idx] = np.append(all_local_clusters[idx], total_clusters)\n", + " total_clusters += 1\n", + "\n", + " return all_local_clusters, total_clusters\n", + "\n", + "def generate_summary(context: str) -> str:\n", + " prompt = f\"\"\"\n", + " Provide the Summary for the given context. Here are some additional instructions for you:\n", + "\n", + " Instructions:\n", + " 1. Don't make things up, Just use the contexts and generate the relevant summary.\n", + " 2. Don't mix the numbers, Just use the numbers in the context.\n", + " 3. Don't try to use fancy words, stick to the basics of the language that is being used in the context.\n", + "\n", + " Context: {context}\n", + " \"\"\"\n", + " response = client.chat.completions.create(\n", + " model=\"gpt-4\",\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": \"You are a helpful assistant that summarizes text.\"},\n", + " {\"role\": \"user\", \"content\": prompt}\n", + " ],\n", + " max_tokens=200,\n", + " n=1,\n", + " stop=None,\n", + " temperature=0.7\n", + " )\n", + " summary = response.choices[0].message.content.strip()\n", + " return summary\n", + "\n", + "def embed_clusters(\n", + " texts: List[str],\n", + " target_dim: int = 10,\n", + " threshold: float = 0.1\n", + ") -> pd.DataFrame:\n", + " textual_embeddings = np.array(embedding_model.encode(texts))\n", + " clusters, number_of_clusters = clustering_algorithm(textual_embeddings, target_dim, threshold)\n", + " print(f\"Number of clusters: {number_of_clusters}\")\n", + " return pd.DataFrame({\n", + " \"texts\": texts,\n", + " \"embedding\": list(textual_embeddings),\n", + " \"clusters\": clusters\n", + " })\n", + "\n", + "def embed_cluster_summaries(\n", + 
" texts: List[str],\n", + " level: int,\n", + " target_dim: int = 10,\n", + " threshold: float = 0.1\n", + ") -> Tuple[pd.DataFrame, pd.DataFrame]:\n", + " df_clusters = embed_clusters(texts, target_dim, threshold)\n", + " main_list = []\n", + "\n", + " for _, row in df_clusters.iterrows():\n", + " for cluster in row[\"clusters\"]:\n", + " main_list.append({\n", + " \"text\": row[\"texts\"],\n", + " \"embedding\": row[\"embedding\"],\n", + " \"clusters\": cluster\n", + " })\n", + "\n", + " main_df = pd.DataFrame(main_list)\n", + " unique_clusters = main_df[\"clusters\"].unique()\n", + " if len(unique_clusters) == 0:\n", + " return df_clusters, pd.DataFrame(columns=[\"summaries\", \"level\", \"clusters\"])\n", + "\n", + " print(f\"--Generated {len(unique_clusters)} clusters--\")\n", + "\n", + " summaries = []\n", + " for cluster in unique_clusters:\n", + " text_in_df = main_df[main_df[\"clusters\"] == cluster]\n", + " unique_texts = text_in_df[\"text\"].tolist()\n", + " text = \"------\\n------\".join(unique_texts)\n", + " summary = generate_summary(text)\n", + " summaries.append(summary)\n", + "\n", + " df_summaries = pd.DataFrame({\n", + " \"summaries\": summaries,\n", + " \"level\": [level] * len(summaries),\n", + " \"clusters\": unique_clusters\n", + " })\n", + "\n", + " return df_clusters, df_summaries\n", + "\n", + "\n", + "def recursive_embedding_with_cluster_summarization(\n", + " texts: List[str],\n", + " number_of_levels: int = 3,\n", + " level: int = 1,\n", + " target_dim: int = 10,\n", + " threshold: float = 0.1\n", + ") -> Dict[int, Tuple[pd.DataFrame, pd.DataFrame]]:\n", + " if level > number_of_levels:\n", + " return {}\n", + "\n", + " results = {}\n", + " df_clusters, df_summaries = embed_cluster_summaries(texts, level, target_dim, threshold)\n", + " results[level] = (df_clusters, df_summaries)\n", + "\n", + " if df_summaries.empty or len(df_summaries['clusters'].unique()) == 1:\n", + " print(f\"No more unique clusters found at level {level}. 
Stopping recursion.\")\n", + " return results\n", + "\n", + " if level < number_of_levels:\n", + " next_level_texts = df_summaries['summaries'].tolist()\n", + " next_level_results = recursive_embedding_with_cluster_summarization(\n", + " next_level_texts,\n", + " number_of_levels,\n", + " level + 1,\n", + " target_dim,\n", + " threshold\n", + " )\n", + " results.update(next_level_results)\n", + "\n", + " return results\n" + ], + "metadata": { + "id": "FcAEBwFIiKNX" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def process_text_hierarchy(\n", + " texts: List[str],\n", + " number_of_levels: int = 3,\n", + " target_dim: int = 10,\n", + " threshold: float = 0.1\n", + ") -> Dict[str, pd.DataFrame]:\n", + " hierarchy_results = recursive_embedding_with_cluster_summarization(\n", + " texts, number_of_levels, target_dim=target_dim, threshold=threshold\n", + " )\n", + "\n", + " processed_results = {}\n", + " for level, (df_clusters, df_summaries) in hierarchy_results.items():\n", + " if df_clusters.empty or df_summaries.empty:\n", + " print(f\"No data for level {level}. Skipping.\")\n", + " continue\n", + " processed_results[f\"level_{level}_clusters\"] = df_clusters\n", + " processed_results[f\"level_{level}_summaries\"] = df_summaries\n", + "\n", + " return processed_results\n", + "\n", + "\n", + "results = process_text_hierarchy(chunks, number_of_levels=3)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "M5Cq7FziHFjc", + "outputId": "2f3362a0-59a7-4ab3-da87-e7ef3dc4d18f" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Number of clusters: 4\n", + "--Generated 4 clusters--\n", + "Number of clusters: 1\n", + "--Generated 1 clusters--\n", + "No more unique clusters found at level 2. 
Stopping recursion.\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "### RAG Formation\n", + "\n", + "Now it’s smooth sailing! We’ll set up a LanceDB vector database to store our embeddings and serve queries for our RAG setup. To compare the results retrieved by RAPTOR RAG and vanilla RAG, we’ll configure both systems." + ], + "metadata": { + "id": "Wyu9ec6nrWPz" + } + }, + { + "cell_type": "code", + "source": [ + "raptor_texts = []\n", + "for level, row in results.items():\n", + " if level.endswith(\"clusters\"):\n", + " raptor_texts.extend(row[\"texts\"])\n", + " else:\n", + " raptor_texts.extend(row[\"summaries\"])\n", + "\n", + "raptor_embeddings = embedding_model.encode(raptor_texts) # new raptor embeddings\n", + "normal_embeddings = embedding_model.encode(chunks) # default chunks from our data\n", + "print(raptor_embeddings)\n", + "print(normal_embeddings)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "JRi7Z254IKOb", + "outputId": "5856f39c-1a64-4f0d-c6f2-523af7755832" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "[[-0.02867434 -0.03404145 0.03183713 ... -0.11637738 -0.03496148\n", + " 0.02426555]\n", + " [-0.02260254 -0.05349045 0.01649414 ... -0.07463644 0.06521226\n", + " 0.02849963]\n", + " [-0.04479891 -0.05722785 0.07431144 ... -0.12457766 -0.07554325\n", + " 0.08023182]\n", + " ...\n", + " [-0.00617281 -0.02009849 -0.00419744 ... -0.09979296 -0.10421827\n", + " -0.01685225]\n", + " [-0.0961774 -0.01634985 -0.02820352 ... -0.08988537 -0.03870759\n", + " -0.0209068 ]\n", + " [-0.07575317 -0.00782455 -0.00337196 ... -0.11301762 -0.02062772\n", + " 0.02530929]]\n", + "[[-2.8674338e-02 -3.4041446e-02 3.1837128e-02 ... -1.1637738e-01\n", + " -3.4961481e-02 2.4265550e-02]\n", + " [-2.2602541e-02 -5.3490449e-02 1.6494140e-02 ... 
-7.4636437e-02\n", + " 6.5212265e-02 2.8499626e-02]\n", + " [-4.4798914e-02 -5.7227850e-02 7.4311443e-02 ... -1.2457766e-01\n", + " -7.5543255e-02 8.0231816e-02]\n", + " ...\n", + " [-6.6004984e-02 -7.7177368e-02 2.0053916e-02 ... -8.4871434e-02\n", + " 5.2336227e-02 -1.4862130e-06]\n", + " [ 1.5349649e-02 -9.5823780e-02 1.1927984e-02 ... -8.5330136e-02\n", + " 6.6217750e-02 3.6647685e-02]\n", + " [-8.2652569e-02 -2.1616790e-02 -1.9847509e-02 ... -8.3741397e-02\n", + " 4.1552845e-02 2.0205928e-02]]\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "With RAPTOR, we now have more chunks than before, because the cluster-level summary nodes are stored alongside the original chunks." + ], + "metadata": { + "id": "5G0mDUuRs3Ac" + } + }, + { + "cell_type": "code", + "source": [ + "# Compute lengths of the embeddings\n", + "raptor_length = len(raptor_embeddings)\n", + "normal_length = len(normal_embeddings)\n", + "\n", + "# Bar graph data\n", + "labels = ['Raptor RAG', 'Normal RAG']\n", + "lengths = [raptor_length, normal_length]\n", + "\n", + "# Create the bar graph\n", + "plt.bar(labels, lengths, color=['blue', 'green'])\n", + "\n", + "# Add title and labels\n", + "plt.title('Comparison of Number of Chunks')\n", + "plt.ylabel('Number of Chunks')\n", + "\n", + "# Show the plot\n", + "plt.show()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 452 + }, + "id": "GdxyN80FYarp", + "outputId": "aebcc6f5-d178-42e6-864b-0ef7df4f7542" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjIAAAGzCAYAAAA1yP25AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA6XElEQVR4nO3deXQUZf7+/asTsi8NgRCIxrBKhGAUVHDYBQmLIoIiIBJckNGgbAITkdWFRUUEAXF+QBBZHJVFVFB2vgrCoCCigLKJCgTUkIaAAZL7+cOTfmyydUOHpDLv1zl9DnVX1d2f6qa6r1TdVW0zxhgBAABYkE9JFwAAAHC5CDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDIAAMCyCDJAKWaz2TRmzJiSLuOKzZ8/X3FxcfLz81P58uVLupzLkpqaKpvNpu3bt5d0KW758ccf1bZtW9ntdtlsNi1btswr/bZs2VLx8fFe6ctTGzZskM1m0/vvv18iz4/SiSCDUu3AgQPq16+fatSoocDAQIWHh6tJkyZ6/fXXde7cuZIuD27Yu3ev+vTpo5o1a+rf//633nrrrQKXHTNmjGw2m6KionT27Nk886tVq6a77rqrOMstM5KSkvTtt9/qxRdf1Pz583XLLbcUurzD4dDYsWOVkJCg0NBQBQUFKT4+XsOHD9fRo0evUtWA58qVdAFAQT7++GPdf//9CggIUO/evRUfH6/z58/r888/19ChQ/Xdd98V+qVYFpw7d07lyll7N92wYYNycnL0+uuvq1atWm6tc+LECc2cOVNDhgwp5urKpnPnzmnLli0aMWKE+vfvX+TyBw8eVJs2bXTkyBHdf//9evzxx+Xv769du3Zp9uzZWrp0qX744YerUDngOWt/QqLMOnTokLp3767Y2FitW7dOVatWdc5LTk7W/v379fHHH5dghcUnJydH58+fV2BgoAIDA0u6nCt24sQJSfLolNJNN92kl19+WU8++aSCgoKKqbLSKTMzUyEhIVfUx8mTJyW595pfvHhRXbp0UVpamjZs2KCmTZu6zH/xxRc1ceLEK6oHKE6cWkKpNGnSJJ05c0azZ892CTG5atWqpQEDBjinL168qOeff141a9ZUQECAqlWrpmeffVZZWVku6+WemtiwYYNuueUWBQUFqX79+tqwYYMkacmSJapfv74CAwPVsGFD7dixw2X9Pn36KDQ0VAcPHlRiYqJCQkIUHR2tcePG6dIfkn/llVf0j3/8QxUrVlRQUJAaNmyY77l9m82m/v37a8GCBapXr54CAgK0atUq57y/j5E5ffq0Bg4cqGrVqikgIECVK1fWnXfeqa+//tqlz/fee08NGzZUUFCQKlWqpF69eunXX3/Nd1t+/fVXde7cWaGhoYqMjNQzzzyj7OzsAt4ZVzNmzHDWHB0dreTkZJ06dcrl9R49erQkKTIy0u0xP6NGjVJaWppmzpxZ6HK5YyZy379chw8fls1mU2pqap7tPXLkiO666y6Fhobqmmuu0fTp0yVJ3377re644w6FhIQoNjZWCxcuzPc5z549q379+qlixYoKDw9X7969lZ6enme5lStXqlmzZgoJCVFYWJg6duyo7777zmWZ3JoOHDigDh06KCwsTA8++GCh27xjxw61b99e4eHhCg0NVevWrfXll186548ZM0axsbGSpKFDh8pms6latWoF9vfBBx/om2++0YgRI/KEGEkKDw/Xiy++mKf9+++/V6tWrRQcHKxrrrlGkyZNcpmfO6bo8OHDLu35vWe5426K6jM/WVlZuuuuu2S327V582ZJ7u8nKBsIMiiVVqxYoRo1augf//iHW8s/9thjGjVqlBo0aKDXXntNLVq00Pjx49W9e/c8y+7fv189e/bU3XffrfHjxys9PV133323FixYoEGDBqlXr14aO3asDhw4oG7duiknJ8dl/ezsbLVr1
05RUVGaNGmSGjZsqNGjRzu/sHO9/vrruvnmmzVu3Di99NJLKleunO6///58jyStW7dOgwYN0gMPPKDXX3+9wC+ef/7zn5o5c6a6du2qGTNm6JlnnlFQUJD27NnjXCY1NVXdunWTr6+vxo8fr759+2rJkiVq2rSpS8jI3ZbExERVrFhRr7zyilq0aKFXX33VrVN2Y8aMUXJysqKjo/Xqq6+qa9eumjVrltq2basLFy5IkqZMmaJ7771XkjRz5kzNnz9fXbp0KbLvZs2a6Y477tCkSZO8OhYqOztb7du3V0xMjCZNmqRq1aqpf//+Sk1NVbt27XTLLbdo4sSJCgsLU+/evXXo0KE8ffTv31979uzRmDFj1Lt3by1YsECdO3d2CbLz589Xx44dFRoaqokTJ2rkyJH6/vvv1bRp0zxf7BcvXlRiYqIqV66sV155RV27di2w/u+++07NmjXTN998o2HDhmnkyJE6dOiQWrZsqa1bt0qSunTpotdee02S1KNHD82fP19TpkwpsM8PP/xQkvTQQw+5+zIqPT1d7dq1U0JCgl599VXFxcVp+PDhWrlypdt9eKPPc+fO6e6779bmzZu1Zs0a5+eFO/sJyhADlDIZGRlGkrnnnnvcWn7nzp1Gknnsscdc2p955hkjyaxbt87ZFhsbaySZzZs3O9s+/fRTI8kEBQWZn376ydk+a9YsI8msX7/e2ZaUlGQkmaeeesrZlpOTYzp27Gj8/f3NyZMnne1nz551qef8+fMmPj7e3HHHHS7tkoyPj4/57rvv8mybJDN69GjntN1uN8nJyQW+FufPnzeVK1c28fHx5ty5c872jz76yEgyo0aNyrMt48aNc+nj5ptvNg0bNizwOYwx5sSJE8bf39+0bdvWZGdnO9vfeOMNI8nMmTPH2TZ69GgjyeW1Kcjfl924caORZCZPnuycHxsbazp27OicXr9+fZ73yBhjDh06ZCSZuXPn5tnel156ydmWnp5ugoKCjM1mM4sXL3a27927N89rP3fuXCPJNGzY0Jw/f97ZPmnSJCPJLF++3BhjzOnTp0358uVN3759XWo6fvy4sdvtLu25Nf3rX/8q8rUxxpjOnTsbf39/c+DAAWfb0aNHTVhYmGnevHme7X/55ZeL7PPmm282drvdrec3xpgWLVoYSebtt992tmVlZZkqVaqYrl27OttyX69Dhw65rJ/fe+Zun7nrvvfee+b06dOmRYsWplKlSmbHjh0uz1HUfoKyhSMyKHUcDockKSwszK3lP/nkE0nS4MGDXdpzB4peegSkbt26uv32253TjRo1kiTdcccduu666/K0Hzx4MM9z/n0AZe6pofPnz2vNmjXO9r+P7UhPT1dGRoaaNWuW7+HtFi1aqG7dukVs6V9jHrZu3VrgVSTbt2/XiRMn9OSTT7qMr+nYsaPi4uLyPRr0z3/+02W6WbNm+W7z361Zs0bnz5/XwIED5ePz/3+M9O3bV+Hh4V4Zv9S8eXO1atXK60dlHnvsMee/y5cvrzp16igkJETdunVzttepU0fly5fP93V4/PHH5efn55x+4oknVK5cOef/w9WrV+vUqVPq0aOHfvvtN+fD19dXjRo10vr16/P0+cQTTxRZd3Z2tj777DN17txZNWrUcLZXrVpVPXv21Oeff+7cdzzhcDjc3tdyhYaGqlevXs5pf39/3XbbbUX+v/FWnxkZGWrbtq327t2rDRs26KabbnKZX9R+grKFIINSJzw8XNJf57nd8dNPP8nHxyfPFTFVqlRR+fLl9dNPP7m0/z2sSJLdbpckxcTE5Nt+6fgHHx8fly8SSbr++uslyeW0wUcffaTGjRsrMDBQERERioyM1MyZM5WRkZFnG6pXr17UZkr6a+zQ7t27FRMTo9tuu01jxoxx+aDP3dY6derkWTcuLi7PaxEYGKjIyEiXtgoVKuQ75uPvCnoef39/1ahRI8/zXK4xY8bo+PHjevPNN73SX37ba7fbde2118pms
+Vpz+91qF27tst0aGioqlat6nzvf/zxR0l/BePIyEiXx2effeYc/JyrXLlyuvbaa4us/eTJkzp79my+7+0NN9ygnJwc/fzzz0X2c6nw8HC397Vc+b1e7vy/8VafAwcO1H//+1+tWbNG9erVyzO/qP0EZQtBBqVOeHi4oqOjtXv3bo/Wu/RDsCC+vr4etZtLBvG64//+7//UqVMnBQYGasaMGfrkk0+0evVq9ezZM9/+3L0yp1u3bjp48KCmTZum6Ohovfzyy6pXr95lj00oaJtLi+bNm6tly5YFHpUp6D0vaLDy1Xjvc8dUzZ8/X6tXr87zWL58ucvyAQEBLke1rra4uDhlZGR4FILceb289d7k9x7cc889MsZowoQJecawSd7fT1C6EWRQKt111106cOCAtmzZUuSysbGxysnJcf4lnCstLU2nTp1yXsHhLTk5OXn+usu9x0buIN0PPvhAgYGB+vTTT/XII4+offv2atOmjVeev2rVqnryySe1bNkyHTp0SBUrVnReVZK7rfv27cuz3r59+7z2WhT0POfPn9ehQ4e8+prnHpWZNWtWnnkVKlSQpDyDmL11RCg/l/4/O3PmjI4dO+Z872vWrClJqly5stq0aZPn0bJly8t63sjISAUHB+f73u7du1c+Pj55jiq64+6775YkvfPOO5dVV0GK873p3Lmz5syZo4ULFyo5OTnfZQrbT1C2EGRQKg0bNkwhISF67LHHlJaWlmf+gQMH9Prrr0uSOnToIEl5rsyYPHmypL/Gh3jbG2+84fy3MUZvvPGG/Pz81Lp1a0l//XVps9lc/vo8fPjwFd0mPjs7O89pqcqVKys6Otp5mfktt9yiypUr680333S59HzlypXas2eP116LNm3ayN/fX1OnTnX5i3n27NnKyMjw6mveokULtWzZUhMnTtSff/7pMi82Nla+vr7atGmTS/uMGTO89vyXeuutt5xXZUl/XY118eJFtW/fXpKUmJio8PBwvfTSSy7L5cq9x4unfH191bZtWy1fvtzlFGZaWpoWLlyopk2bOk/LeuK+++5T/fr19eKLL+b7h8Pp06c1YsQIj/vNDXR/f2+ys7O9dhPL3r17a+rUqXrzzTc1fPhwl+coaj9B2cIN8VAq1axZUwsXLtQDDzygG264weXOvps3b9Z7772nPn36SJISEhKUlJSkt956S6dOnVKLFi20bds2zZs3T507d1arVq28WltgYKBWrVqlpKQkNWrUSCtXrtTHH3+sZ5991jn+omPHjpo8ebLatWunnj176sSJE5o+fbpq1aqlXbt2Xdbznj59Wtdee63uu+8+523k16xZo//+97969dVXJUl+fn6aOHGiHn74YbVo0UI9evRQWlqa85LuQYMGeeU1iIyMVEpKisaOHat27dqpU6dO2rdvn2bMmKFbb73VZdCmN4wePTrf99Fut+v+++/XtGnTZLPZVLNmTX300Ud5xqF40/nz59W6dWt169bNuc1NmzZVp06dJP11anTmzJl66KGH1KBBA3Xv3l2RkZE6cuSIPv74YzVp0sQlCHvihRde0OrVq9W0aVM9+eSTKleunGbNmqWsrCy37rmSHz8/Py1ZskRt2rRR8+bN1a1bNzVp0kR+fn767rvvtHDhQlWoUMHjoxn16tVT48aNlZKSoj/++EMRERFavHixLl68eFl15qd///5yOBwaMWKE7Ha7nn32Wbf2E5QxJXfBFFC0H374wfTt29dUq1bN+Pv7m7CwMNOkSRMzbdo08+effzqXu3Dhghk7dqypXr268fPzMzExMSYlJcVlGWPyXr6bS1KeyzXzu4Q1KSnJhISEmAMHDpi2bdua4OBgExUVZUaPHu1yGbIxxsyePdvUrl3bBAQEmLi4ODN37lzn5cVFPfff5+VeApyVlWWGDh1qEhISTFhYmAkJCTEJCQlmxowZedZ79913zc0332wCAgJMRESEefDBB80vv/ziskzutlwqvxoL8sYbb5i4u
Djj5+dnoqKizBNPPGHS09Pz7c/Ty68vlXuJ7qXv38mTJ03Xrl1NcHCwqVChgunXr5/ZvXt3vpdf57e9LVq0MPXq1cvTfun/ldzLiTdu3Ggef/xxU6FCBRMaGmoefPBB8/vvv+dZf/369SYxMdHY7XYTGBhoatasafr06WO2b99eZE2F+frrr01iYqIJDQ01wcHBplWrVi63EzDGs8uvc6Wnp5tRo0aZ+vXrm+DgYBMYGGji4+NNSkqKOXbsmHO5gl6vpKQkExsb69J24MAB06ZNGxMQEGCioqLMs88+a1avXp3v5dfu9Pn3y6//btiwYUaSeeONNzzaT1A22Iy5jNFswP+oPn366P3339eZM2dKuhQAgBgjAwAALIwgAwAALIsgAwAALIsxMgAAwLI4IgMAACyLIAMAACyrzN8QLycnR0ePHlVYWJjbv8UDAABKljFGp0+fVnR0dKG/R1bmg8zRo0cv6/dHAABAyfv5558L/YX4Mh9kwsLCJP31QlzO75AAAICrz+FwKCYmxvk9XpAyH2RyTyeFh4cTZAAAsJiihoUw2BcAAFgWQQYAAFgWQQYAAFgWQQYAAFgWQQYAAFgWQQYAAFgWQQYAAFgWQQYAAFgWQQYAAFgWQQYAAFgWQQYAAFgWQQYAAFgWQQYAAFgWQQYAAFhWuZIuwMqK+GVx4H+eMSVdAYCyjiMyAADAsggyAADAsggyAADAsggyAADAsggyAADAsggyAADAsggyAADAsggyAADAsggyAADAsko0yIwfP1633nqrwsLCVLlyZXXu3Fn79u1zWaZly5ay2Wwuj3/+858lVDEAAChNSjTIbNy4UcnJyfryyy+1evVqXbhwQW3btlVmZqbLcn379tWxY8ecj0mTJpVQxQAAoDQp0d9aWrVqlct0amqqKleurK+++krNmzd3tgcHB6tKlSpXuzwAAFDKlaoxMhkZGZKkiIgIl/YFCxaoUqVKio+PV0pKis6ePVtgH1lZWXI4HC4PAABQNpWaX7/OycnRwIED1aRJE8XHxzvbe/bsqdjYWEVHR2vXrl0aPny49u3bpyVLluTbz/jx4zV27NirVTYAAChBNmOMKekiJOmJJ57QypUr9fnnn+vaa68tcLl169apdevW2r9/v2rWrJlnflZWlrKyspzTDodDMTExysjIUHh4uFdrttm82h1Q5pSOTxcAVuRwOGS324v8/i4VR2T69++vjz76SJs2bSo0xEhSo0aNJKnAIBMQEKCAgIBiqRMAAJQuJRpkjDF66qmntHTpUm3YsEHVq1cvcp2dO3dKkqpWrVrM1QEAgNKuRINMcnKyFi5cqOXLlyssLEzHjx+XJNntdgUFBenAgQNauHChOnTooIoVK2rXrl0aNGiQmjdvrhtvvLEkSwcAAKVAiY6RsRUwyGTu3Lnq06ePfv75Z/Xq1Uu7d+9WZmamYmJidO+99+q5555ze7yLu+fYLgdjZIDCMUYGwOWyxBiZojJUTEyMNm7ceJWqAQAAVlOq7iMDAADgCYIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwrBINMuPHj9ett96qsLAwVa5cWZ07d
9a+fftclvnzzz+VnJysihUrKjQ0VF27dlVaWloJVQwAAEqTEg0yGzduVHJysr788kutXr1aFy5cUNu2bZWZmelcZtCgQVqxYoXee+89bdy4UUePHlWXLl1KsGoAAFBa2IwxpqSLyHXy5ElVrlxZGzduVPPmzZWRkaHIyEgtXLhQ9913nyRp7969uuGGG7RlyxY1bty4yD4dDofsdrsyMjIUHh7u1XptNq92B5Q5pefTBYDVuPv9XarGyGRkZEiSIiIiJElfffWVLly4oDZt2jiXiYuL03XXXactW7bk20dWVpYcDofLAwAAlE2lJsjk5ORo4MCBatKkieLj4yVJx48fl7+/v8qXL++ybFRUlI4fP55vP+PHj5fdbnc+YmJiirt0AABQQkpNkElOTtbu3bu1ePHiK+onJSVFGRkZzsfPP//spQoBAEBpU66kC5Ck/v3766OPPtKmTZt07bXXOturVKmi8+fP69SpUy5HZdLS0lSlSpV8+woICFBAQEBxlwwAAEqBEj0iY4xR//79tXTpUq1bt07Vq1d3md+wYUP5+flp7dq1zrZ9+/bpyJEjuv322692uQAAoJQp0SMyycnJWrhwoZYvX66wsDDnuBe73a6goCDZ7XY9+uijGjx4sCIiIhQeHq6nnnpKt99+u1tXLAEAgLKtRC+/thVw/fLcuXPVp08fSX/dEG/IkCFatGiRsrKylJiYqBkzZhR4aulSXH4NlBwuvwZwudz9/i5V95EpDgQZoOSU7U8XAMXJkveRAQAA8ARBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWJbHQebnn3/WL7/84pzetm2bBg4cqLfeesurhQEAABTF4yDTs2dPrV+/XpJ0/Phx3Xnnndq2bZtGjBihcePGeb1AAACAgngcZHbv3q3bbrtNkvSf//xH8fHx2rx5sxYsWKDU1FRv1wcAAFAgj4PMhQsXFBAQIElas2aNOnXqJEmKi4vTsWPHvFsdAABAITwOMvXq1dObb76p//u//9Pq1avVrl07SdLRo0dVsWJFrxcIAABQEI+DzMSJEzVr1iy1bNlSPXr0UEJCgiTpww8/dJ5yAgAAuBpsxhjj6UrZ2dlyOByqUKGCs+3w4cMKDg5W5cqVvVrglXI4HLLb7crIyFB4eLhX+7bZvNodUOZ4/ukCAH9x9/vb4yMyixYtkq+vr0uIkaRq1arp5Zdf9rxSAACAy+RxkHniiSe0cuXKPO2DBg3SO++845WiAAAA3OFxkFmwYIF69Oihzz//3Nn21FNP6T//+Y/z/jIAAABXg8dBpmPHjpoxY4Y6deqkr776Sk8++aSWLFmi9evXKy4urjhqBAAAyFe5y1mpZ8+eOnXqlJo0aaLIyEht3LhRtWrV8nZtAAAAhXIryAwePDjf9sjISDVo0EAzZsxwtk2ePNk7lQEAABTBrSCzY8eOfNtr1aolh8PhnG/jemQAAHAVuRVkGMQLAABKI48H+wIAAJQWHg/2zczM1IQJE7R27VqdOHFCOTk5LvMPHjzoteIAAAAK43GQeeyxx7Rx40Y99NBDqlq1KuNiAJR5trF8zgEFMaNL9rdIPA4yK1eu1Mcff6wmTZoURz0AAABu83iMTIUKFRQREVEctQAAAHjE4yDz/PPPa9SoUTp79mxx1AMAAOA2j08tvfrqqzpw4ICioqJUrVo1+fn5ucz/+uuvvVYcAABAYTwOMp07dy6GMgAAADzncZAZPXp0cdQBAADgMW6IBwAALMvjIzI+Pj6F3jsmOzv7igoCAABwl8dBZunSpS7TFy5c0I4dOzRv3jyNHTvWa4UBAAAUxeMgc8899+Rpu++++1SvXj29++67evTRR71SGAAAQFG8NkamcePGWrt2rbe6AwAAKJJXgsy5c+c0depUXXPNNd7oDgAAw
C0en1qqUKGCy2BfY4xOnz6t4OBgvfPOO14tDgAAoDAeB5kpU6a4TPv4+CgyMlKNGjVShQoVvFUXAABAkTwOMklJScVRBwAAgMc8DjKSdOrUKW3btk0nTpxQTk6Oy7zevXt7pTAAAICieBxkVqxYoQcffFBnzpxReHi4y3gZm81GkAEAAFeNx1ctDRkyRI888ojOnDmjU6dOKT093fn4448/iqNGAACAfHkcZH799Vc9/fTTCg4OLo56AAAA3OZxkElMTNT27duLoxYAAACPuDVG5sMPP3T+u2PHjho6dKi+//571a9fX35+fi7LdurUybsVAgAAFMCtINO5c+c8bePGjcvTZrPZ+PVrAABw1bh1aiknJ8eth6chZtOmTbr77rsVHR0tm82mZcuWuczv06ePbDaby6Ndu3YePQcAACi7vPajkZcjMzNTCQkJmj59eoHLtGvXTseOHXM+Fi1adBUrBAAApZnbQWbdunWqW7euHA5HnnkZGRmqV6+eNm3a5NGTt2/fXi+88ILuvffeApcJCAhQlSpVnA9+BgEAAORyO8hMmTJFffv2VXh4eJ55drtd/fr102uvvebV4iRpw4YNqly5surUqaMnnnhCv//+e6HLZ2VlyeFwuDwAAEDZ5HaQ+eabbwodn9K2bVt99dVXXikqV7t27fT2229r7dq1mjhxojZu3Kj27dsXOhZn/PjxstvtzkdMTIxXawIAAKWH2z9RkJaWludSa5eOypXTyZMnvVJUru7duzv/Xb9+fd14442qWbOmNmzYoNatW+e7TkpKigYPHuycdjgchBkAAMoot4/IXHPNNdq9e3eB83ft2qWqVat6paiC1KhRQ5UqVdL+/fsLXCYgIEDh4eEuDwAAUDa5HWQ6dOigkSNH6s8//8wz79y5cxo9erTuuusurxZ3qV9++UW///57sQcmAABgDW6fWnruuee0ZMkSXX/99erfv7/q1KkjSdq7d6+mT5+u7OxsjRgxwqMnP3PmjMvRlUOHDmnnzp2KiIhQRESExo4dq65du6pKlSo6cOCAhg0bplq1aikxMdGj5wEAAGWT20EmKipKmzdv1hNPPKGUlBQZYyT9dTffxMRETZ8+XVFRUR49+fbt29WqVSvndO7YlqSkJM2cOVO7du3SvHnzdOrUKUVHR6tt27Z6/vnnFRAQ4NHzAACAsslmchOJB9LT07V//34ZY1S7du1SfW8Xh8Mhu92ujIwMr4+Xsdm82h1Q5nj+6VI62cayswMFMaOLZ0d39/vb7SMyf1ehQgXdeuutl10cAACAN5ToTxQAAABcCYIMAACwLIIMAACwLLeCTIMGDZSeni5JGjdunM6ePVusRQEAALjDrSCzZ88eZWZmSpLGjh2rM2fOFGtRAAAA7nDrqqWbbrpJDz/8sJo2bSpjjF555RWFhobmu+yoUaO8WiAAAEBB3AoyqampGj16tD766CPZbDatXLlS5crlXdVmsxFkAADAVeNWkKlTp44WL14sSfLx8dHatWtVuXLlYi0MAACgKB7fEC8nJ6c46gAAAPDYZd3Z98CBA5oyZYr27NkjSapbt64GDBigmjVrerU4AACAwnh8H5lPP/1UdevW1bZt23TjjTfqxhtv1NatW1WvXj2tXr26OGoEAADIl8dHZP71r39p0KBBmjBhQp724cOH68477/RacQAAAIXx+IjMnj179Oijj+Zpf+SRR/T99997pSgAAAB3eBxkIiMjtXPnzjztO3fu5EomAABwVXl8aqlv3756/PHHdfDgQf3jH/+QJH3xxReaOHGiBg8e7PUCAQAACuJxkBk5cqTCwsL06quvKiUlRZIUHR2tMWPG6Omnn/Z6gQAAAAWxGWPM5a58+vRpSVJYWJjXCvI2h8Mhu92ujIwMhYeHe7Vvm82r3QFlzuV/upQutrHs7EBBzOji2dHd/f6+rPvI5CrNAQYAAJR9Hg/2BQAAKC0IMgAAwLIIMgAAwLI8CjIXLlxQ69at9eOPP
xZXPQAAAG7zKMj4+flp165dxVULAACARzw+tdSrVy/Nnj27OGoBAADwiMeXX1+8eFFz5szRmjVr1LBhQ4WEhLjMnzx5steKAwAAKIzHQWb37t1q0KCBJOmHH35wmWfjDnEAAOAq8jjIrF+/vjjqAAAA8NhlX369f/9+ffrppzp37pwk6Qp+6QAAAOCyeBxkfv/9d7Vu3VrXX3+9OnTooGPHjkmSHn30UQ0ZMsTrBQIAABTE4yAzaNAg+fn56ciRIwoODna2P/DAA1q1apVXiwMAACiMx2NkPvvsM3366ae69tprXdpr166tn376yWuFAQAAFMXjIzKZmZkuR2Jy/fHHHwoICPBKUQAAAO7wOMg0a9ZMb7/9tnPaZrMpJydHkyZNUqtWrbxaHAAAQGE8PrU0adIktW7dWtu3b9f58+c1bNgwfffdd/rjjz/0xRdfFEeNAAAA+fL4iEx8fLx++OEHNW3aVPfcc48yMzPVpUsX7dixQzVr1iyOGgEAAPLl8REZSbLb7RoxYoS3awEAAPDIZQWZ9PR0zZ49W3v27JEk1a1bVw8//LAiIiK8WhwAAEBhPD61tGnTJlWrVk1Tp05Venq60tPTNXXqVFWvXl2bNm0qjhoBAADy5fERmeTkZD3wwAOaOXOmfH19JUnZ2dl68sknlZycrG+//dbrRQIAAOTH4yMy+/fv15AhQ5whRpJ8fX01ePBg7d+/36vFAQAAFMbjINOgQQPn2Ji/27NnjxISErxSFAAAgDvcOrW0a9cu57+ffvppDRgwQPv371fjxo0lSV9++aWmT5+uCRMmFE+VAAAA+bAZY0xRC/n4+Mhms6moRW02m7Kzs71WnDc4HA7Z7XZlZGQoPDzcq33bbF7tDihziv50sQbbWHZ2oCBmdPHs6O5+f7t1RObQoUNeKwwAAMBb3AoysbGxxV0HAACAxy7rhnhHjx7V559/rhMnTignJ8dl3tNPP+2VwgAAAIricZBJTU1Vv3795O/vr4oVK8r2t4EiNpuNIAMAAK4aj4PMyJEjNWrUKKWkpMjHx+OrtwEAALzG4yRy9uxZde/enRADAABKnMdp5NFHH9V7771XHLUAAAB4xONTS+PHj9ddd92lVatWqX79+vLz83OZP3nyZK8VBwAAUJjLCjKffvqp6tSpI0l5BvsCAABcLR4HmVdffVVz5sxRnz59iqEcAAAA93k8RiYgIEBNmjQpjloAAAA84nGQGTBggKZNm1YctQAAAHjE41NL27Zt07p16/TRRx+pXr16eQb7LlmyxGvFAQAAFMbjIzLly5dXly5d1KJFC1WqVEl2u93l4YlNmzbp7rvvVnR0tGw2m5YtW+Yy3xijUaNGqWrVqgoKClKbNm30448/eloyAAAoozw+IjN37lyvPXlmZqYSEhL0yCOPqEuXLnnmT5o0SVOnTtW8efNUvXp1jRw5UomJifr+++8VGBjotToAAIA1XdaPRnpL+/bt1b59+3znGWM0ZcoUPffcc7rnnnskSW+//baioqK0bNkyde/e/WqWCgAASiGPg0z16tULvV/MwYMHr6igXIcOHdLx48fVpk0bZ5vdblejRo20ZcuWAoNMVlaWsrKynNMOh8Mr9QAAgNLH4yAzcOBAl+kLFy5ox44dWrVqlYYOHeqtunT8+HFJUlRUlEt7VFSUc15+xo8fr7Fjx3qtDgAAUHp5HGQGDBiQb/v06dO1ffv2Ky7oSqWkpGjw4MHOaYfDoZiYmBKsCAAAFBev/YR1+/bt9cEHH3irO1WpUkWSlJaW5tKelpbmnJefgIAAhYeHuzwAAEDZ5LUg8/777ysiIsJb3al69eqqUqWK1q5d62xzOBzaunWrbr/9dq89DwAAsC6PTy3dfPPNLoN9jTE6fvy4Tp48qRkzZnjU15kzZ7R//37n9KFDh7Rz505FRETouuuu08CBA/XCCy+odu3azsuvo6Oj1blzZ0/LBgAAZZDHQebSEOHj46PIyEi1bNlScXFxH
vW1fft2tWrVyjmdO7YlKSlJqampGjZsmDIzM/X444/r1KlTatq0qVatWsU9ZAAAgCTJZowxJV1EcXI4HLLb7crIyPD6eJlCrkIHIKmsfLrYxrKzAwUxo4tnR3f3+9trY2QAAACuNrdPLfn4+BR6IzxJstlsunjx4hUXBQAA4A63g8zSpUsLnLdlyxZNnTpVOTk5XikKAADAHW4HmdzfO/q7ffv26V//+pdWrFihBx98UOPGjfNqcQAAAIW5rDEyR48eVd++fVW/fn1dvHhRO3fu1Lx58xQbG+vt+gAAAArkUZDJyMjQ8OHDVatWLX333Xdau3atVqxYofj4+OKqDwAAoEBun1qaNGmSJk6cqCpVqmjRokX5nmoCAAC4mty+j4yPj4+CgoLUpk0b+fr6FrjckiVLvFacN3AfGaDkcB8ZoOwr6fvIuH1Epnfv3kVefg0AAHA1uR1kUlNTi7EMAAAAz3FnXwAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFmlOsiMGTNGNpvN5REXF1fSZQEAgFKiXEkXUJR69eppzZo1zuly5Up9yQAA4Cop9amgXLlyqlKlSkmXAQAASqFSfWpJkn788UdFR0erRo0aevDBB3XkyJFCl8/KypLD4XB5AACAsqlUB5lGjRopNTVVq1at0syZM3Xo0CE1a9ZMp0+fLnCd8ePHy263Ox8xMTFXsWIAAHA12YwxpqSLcNepU6cUGxuryZMn69FHH813maysLGVlZTmnHQ6HYmJilJGRofDwcK/WY7N5tTugzLHOp0vhbGPZ2YGCmNHFs6M7HA7Z7fYiv79L/RiZvytfvryuv/567d+/v8BlAgICFBAQcBWrAgAAJaVUn1q61JkzZ3TgwAFVrVq1pEsBAAClQKkOMs8884w2btyow4cPa/Pmzbr33nvl6+urHj16lHRpAACgFCjVp5Z++eUX9ejRQ7///rsiIyPVtGlTffnll4qMjCzp0gAAQClQqoPM4sWLS7oEAABQipXqU0sAAACFIcgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLIsgAAADLskSQmT59uqpVq6bAwEA1atRI27ZtK+mSAABAKVDqg8y7776rwYMHa/To0fr666+VkJCgxMREnThxoqRLAwAAJazUB5nJkyerb9++evjhh1W3bl29+eabCg4O1pw5c0q6NAAAUMLKlXQBhTl//ry++uorpaSkONt8fHzUpk0bbdmyJd91srKylJWV5ZzOyMiQJDkcjuItF
kAeZWa3+7OkCwBKr+L6fs3t1xhT6HKlOsj89ttvys7OVlRUlEt7VFSU9u7dm+8648eP19ixY/O0x8TEFEuNAApmt5d0BQCKm31C8e7op0+flr2QD5NSHWQuR0pKigYPHuyczsnJ0R9//KGKFSvKZrOVYGUobg6HQzExMfr5558VHh5e0uUAKAbs5/87jDE6ffq0oqOjC12uVAeZSpUqydfXV2lpaS7taWlpqlKlSr7rBAQEKCAgwKWtfPnyxVUiSqHw8HA+4IAyjv38f0NhR2JylerBvv7+/mrYsKHWrl3rbMvJydHatWt1++23l2BlAACgNCjVR2QkafDgwUpKStItt9yi2267TVOmTFFmZqYefvjhki4NAACUsFIfZB544AGdPHlSo0aN0vHjx3XTTTdp1apVeQYAAwEBARo9enSeU4sAyg72c1zKZoq6rgkAAKCUKtVjZAAAAApDkAEAAJZFkAEAAJZFkAEAAJZFkAEAWN6GDRtks9l06tSpki4FVxlBBh7r06ePbDabbDab/Pz8VL16dQ0bNkx//un9X9arVq2apkyZ4vV+c3myLb/88ov8/f0VHx+fb1/GGP373//W7bffrvDwcIWGhqpevXoaMGCA9u/fX2zbAHhT7j4xYcIEl/Zly5ZZ/mdeqlWr5tzfg4ODVb9+ff2///f/8l120aJF8vX1VXJycr7zHQ6HRo4cqXr16ikoKEgVK1bUrbfeqkmTJik9Pb04NwOXIMjgsrRr107Hjh3TwYMH9dprr2nWrFkaPXp0SZdVoPPnzxc4z91tSU1NVbdu3eRwOLR161aXecYY9ezZU08//bQ6dOigzz77TN9//71mz56twMBAvfDCC17fJqC4BAYGauLEiV7/Qi5sP7xaxo0bp2PHjmn37t3q1auX+vbtq5UrV+ZZbvbs2Ro2bJgWLVqU5w+bP/74Q40bN9bcuXP1zDPPaOvWrfr666/14osvaseOHVq4cOHV2hxIkgE8lJSUZO655x6Xti5dupibb77ZOf3bb7+Z7t27m+joaBMUFGTi4+PNwoULXdZp0aKFSU5ONsnJySY8PNxUrFjRPPfccyYnJ8c5X5LLI9f7779v6tata/z9/U1sbKx55ZVXXPqOjY0148aNMw899JAJCwszSUlJl70txhiTk5NjatSoYVatWmWGDx9u+vbt6zJ/0aJFRpJZvnx5vs+Tu01AaZeUlGTuuusuExcXZ4YOHepsX7p0qbn0K+Ny9sO5c+cau91uVqxYYa6//noTFBRkunbtajIzM01qaqqJjY015cuXN0899ZS5ePGis6+3337bNGzY0ISGhpqoqCjTo0cPk5aW5py/fv16I8mkp6cXuG2xsbHmtddec2mLiIgwgwYNcmk7ePCgCQoKMqdOnTKNGjUyCxYscJnfr18/ExISYn799dd8n4f9/eriiAyu2O7du7V582b5+/s72/788081bNhQH3/8sXbv3q3HH39cDz30kLZt2+ay7rx581SuXDlt27ZNr7/+uiZPnuw81LtkyRJde+21zr+gjh07Jkn66quv1K1bN3Xv3l3ffvutxowZo5EjRyo1NdWl71deeUUJCQnasWOHRo4cednbIknr16/X2bNn1aZNG/Xq1UuLFy9WZmamc/6iRYtUp04dderUKd9+rX5IHv9bfH199dJLL2natGn65Zdf8l3mSvbDs2fPaurUqVq8eLFWrVqlDRs26N5779Unn3yiTz75RPPnz9esWbP0/vvvO/u5cOGCnn/+eX3zzTdatmyZDh8+rD59+lz2Nubk5OiDDz5Qenp6nv197ty56tixo+x2u3r16qXZs2e7rPfuu++qV69eBf4qM/v7VVbSSQrWk5SUZHx9fU1ISIgJCAgwkoyPj495//33C12vY8eOZsiQIc7pFi1amBtuuMHlr5fhw4ebG264wTmd319QPXv2NHfeeadL29ChQ03dunVd1uvcubPXtqVnz55m4MCBzumEhAQzd+5c53RcX
Jzp1KmTyzoDBgwwISEhJiQkxFxzzTVF1gKUBn8/Stm4cWPzyCOPGGPyHpG53P1w7ty5RpLZv3+/s61fv34mODjYnD592tmWmJho+vXrV2Cd//3vf40k5zruHpHx9/c3ISEhply5ckaSiYiIMD/++KNzmezsbBMTE2OWLVtmjDHm5MmTxt/f3xw8eNAYY8zx48eNJDN58mSXvhs0aODc37t3715gDfA+jsjgsrRq1Uo7d+7U1q1blZSUpIcfflhdu3Z1zs/Oztbzzz+v+vXrKyIiQqGhofr000915MgRl34aN27s8tfL7bffrh9//FHZ2dkFPveePXvUpEkTl7YmTZrkWe+WW27xyracOnVKS5YsUa9evZxtl/6Vlp8RI0Zo586dGjVqlM6cOeNWLUBpMnHiRM2bN0979uzJM+9K9sPg4GDVrFnTOR0VFaVq1aopNDTUpe3EiRPO6a+++kp33323rrvuOoWFhalFixaSlOczpShDhw7Vzp07tW7dOjVq1EivvfaaatWq5Zy/evVqZWZmqkOHDpKkSpUq6c4779ScOXMK7Xfp0qXauXOnEhMTde7cOY9qwpUp9T8aidIpJCTEufPPmTNHCQkJmj17th599FFJ0ssvv6zXX39dU6ZMUf369RUSEqKBAwde1cF+ISEhbi9X2LYsXLhQf/75pxo1auRcxxijnJwc/fDDD7r++utVu3Zt7du3z6XfyMhIRUZGqnLlyl7aIuDqat68uRITE5WSknLZp3Hy2w/9/PxcpnOvGry0LScnR5KUmZmpxMREJSYmasGCBYqMjNSRI0eUmJjo8WdKpUqVVKtWLdWqVUvvvfee6tevr1tuuUV169aV9Ncg3z/++ENBQUHOdXJycrRr1y6NHTtWkZGRKl++fJ79/brrrpMkhYWFcQn4VcYRGVwxHx8fPfvss3ruueecf4l88cUXuueee9SrVy8lJCSoRo0a+uGHH/Kse+nVP19++aVq164tX19fSZK/v3+eozM33HCDvvjiC5e2L774Qtdff71zPW9uy+zZszVkyBDt3LnT+fjmm2/UrFkz519pPXr00L59+7R8+fIren6gtJkwYYJWrFihLVu2uLQX5354qb179+r333/XhAkT1KxZM8XFxbkcrblcMTExeuCBB5SSkiJJ+v3337V8+XItXrzYZX/fsWOH0tPT9dlnn8nHx0fdunXTO++8o6NHj15xDbhyBBl4xf333y9fX19Nnz5dklS7dm2tXr1amzdv1p49e9SvXz+lpaXlWe/IkSMaPHiw9u3bp0WLFmnatGkaMGCAc361atW0adMm/frrr/rtt98kSUOGDNHatWv1/PPP64cfftC8efP0xhtv6JlnnvH6tuzcuVNff/21HnvsMcXHx7s8evTooXnz5unixYvq3r277rvvPnXv3l3jxo3T1q1bdfjwYW3cuFHvvvuu1z/Ygaulfv36evDBBzV16lSX9uLeD//uuuuuk7+/v6ZNm6aDBw/qww8/1PPPP++VvgcMGKAVK1Zo+/btmj9/vipWrKhu3bq57OsJCQnq0KGD83TySy+9pGuuuUa33Xab5syZo127dunAgQNaunSptmzZwv5+lRFk4BXlypVT//79NWnSJGVmZuq5555TgwYNlJiYqJYtW6pKlSrq3LlznvV69+6tc+fO6bbbblNycrIGDBigxx9/3Dl/3LhxOnz4sGrWrKnIyEhJUoMGDfSf//xHixcvVnx8vEaNGqVx48Zd0RUMBW3L9OnTVbduXcXFxeVZ7t5779WJEyf0ySefyGaz6d1339WUKVP0ySefqHXr1qpTp44eeeQRxcTE6PPPP/dKbUBJGDdunPM0T67i3g//LjIyUqmpqXrvvfdUt25dTZgwQa+88opX+q5bt67atm2rUaNGac6cObr33nvzveqoa9eu+vDDD/Xbb7+pYsWK2rZtm3r37q2XX35Zt912m+rXr68xY8bogQce0L///W+v1Ab32IwxpqSLwP+ml
i1b6qabbirWO/cCAMo2jsgAAADLIsgAAADL4tQSAACwLI7IAAAAyyLIAAAAyyLIAAAAyyLIAAAAyyLIAAAAyyLIAAAAyyLIAAAAyyLIAAAAy/r/AEOemN2SBj3eAAAAAElFTkSuQmCC\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "code", + "source": [ + "raptor_dict = {\"texts\": [], \"embeddings\": []}\n", + "for texts, embeddings in zip(raptor_texts, raptor_embeddings):\n", + " raptor_dict[\"texts\"].append(texts)\n", + " raptor_dict[\"embeddings\"].append(embeddings.tolist())\n", + "\n", + "normal_dict = {\"texts\": [], \"embeddings\": []}\n", + "for texts, embeddings in zip(chunks, normal_embeddings):\n", + " normal_dict[\"texts\"].append(texts)\n", + " normal_dict[\"embeddings\"].append(embeddings.tolist())" + ], + "metadata": { + "id": "To8QBJ-MIMqD" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "rag_raptor_df = pd.DataFrame(raptor_dict)\n", + "print(rag_raptor_df.shape)\n", + "\n", + "rag_normal_df = pd.DataFrame(normal_dict)\n", + "print(rag_normal_df.shape)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "dyKzF8L8IiMI", + "outputId": "a1587ef6-3654-442a-9e0b-d1708830ddd9" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(26, 2)\n", + "(17, 2)\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "import lancedb\n", + "import pyarrow as pa\n", + "from lancedb.pydantic import Vector, LanceModel\n", + "\n", + "uri = \"lancedb_database\"\n", + "db = lancedb.connect(uri)\n", + "\n", + "class RAG(LanceModel):\n", + " texts : str\n", + " embeddings : Vector(384)\n", + "\n", + "table_name = \"rag_with_raptor\"\n", + "raptor_table = db.create_table(table_name, schema = RAG, mode=\"overwrite\")\n", + "raptor_table.add(rag_raptor_df)\n", + "raptor_table.create_fts_index(\"texts\", replace=True)\n", + "\n", + "table_name = \"rag_without_raptor\"\n", + "normal_table = db.create_table(table_name, schema = RAG, mode=\"overwrite\")\n", + 
"normal_table.add(rag_normal_df)\n", + "normal_table.create_fts_index(\"texts\", replace=True)" + ], + "metadata": { + "id": "682ORFjhIPQL" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "def generate_results(\n", + " query : str,\n", + " context_text : str\n", + ") -> str:\n", + "\n", + " prompt = f\"\"\"\n", + " Use the context provided to answer the query.\n", + "\n", + " query : {query}\n", + "\n", + " Instructions:\n", + " 1. Don't make things up. Use only the context to generate a relevant answer.\n", + " 2. Don't mix up the numbers. Use only the numbers in the context.\n", + " 3. Don't use fancy words. Stick to the basic language used in the context.\n", + "\n", + " {context_text}\n", + " \"\"\"\n", + " response = client.chat.completions.create(\n", + " model=\"gpt-4\",\n", + " messages=[\n", + " {\"role\": \"system\", \"content\": \"You are a helpful assistant that answers queries using the provided context.\"},\n", + " {\"role\": \"user\", \"content\": prompt}\n", + " ],\n", + " max_tokens=200,\n", + " n=1,\n", + " stop=None,\n", + " temperature=0.7\n", + " )\n", + " answer = response.choices[0].message.content.strip()\n", + " return answer" + ], + "metadata": { + "id": "q45B_5TyIRcD" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "query = \"NTT DATA's net income attributable to shareholders increased from ¥69,227 million in Q3 FY2020 to ¥110,191 million in Q3 FY2021. 
How does this growth align with their acquisition strategy, particularly considering their stated reasons for acquiring Nexient, LLC and the provisional goodwill recorded in this transaction?\"\n", + "raptor_contexts = raptor_table.search(query).limit(5).select([\"texts\"]).to_list()\n", + "raptor_context_text = \"------\\n\\n\".join([context[\"texts\"] for context in raptor_contexts])\n", + "raptor_context_text = \"------\\n\\n\" + raptor_context_text\n", + "\n", + "normal_contexts = normal_table.search(query).limit(5).select([\"texts\"]).to_list()\n", + "normal_context_text = \"------\\n\\n\".join([context[\"texts\"] for context in normal_contexts])\n", + "normal_context_text = \"------\\n\\n\" + normal_context_text\n", + "\n", + "raptor_answer = generate_results(query, raptor_context_text)\n", + "normal_answer = generate_results(query, normal_context_text)" + ], + "metadata": { + "id": "aoVxfQRjIT6J" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "raptor_answer" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 165 + }, + "id": "_0zQwN9LJWtH", + "outputId": "a64d2cf0-b4dd-40b3-f926-803a50653ea5" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "\"The growth in NTT DATA's net income attributable to shareholders from ¥69,227 million in Q3 FY2020 to ¥110,191 million in Q3 FY2021 aligns with their acquisition strategy as it suggests a positive financial performance that can support such growth initiatives. The acquisition of Nexient, LLC, for instance, is expected to enhance NTT DATA's capabilities and better meet clients' needs, which could potentially lead to further growth in income. The cost of this acquisition was 286 million yen, indicating that the company is investing in strategic acquisitions to drive growth. 
The fact that no significant changes were made in subsidiaries, accounting policies, or accounting estimates, nor were there revisions to the forecasts of dividends and consolidated results, further indicates a stable financial condition that can support such growth-oriented strategies.\"" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 41 + } + ] + }, + { + "cell_type": "code", + "source": [ + "normal_answer" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 109 + }, + "id": "ITtuQpERZgZh", + "outputId": "47cae890-88cc-4150-e880-3434d0b7ccd4" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "\"Based on the context provided, it can be seen that NTT DATA's net income attributable to shareholders has significantly increased from ¥69,227 million in Q3 FY2020 to ¥110,191 million in Q3 FY2021. However, the context does not provide information regarding how this growth aligns with their acquisition strategy. Similarly, there is no specific mention of their reasons for acquiring Nexient, LLC or the provisional goodwill recorded in this transaction. Therefore, it is not possible to provide a detailed answer to the query without making assumptions or adding additional information.\"" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 42 + } + ] + } + ] +} \ No newline at end of file