Revert "Release 1.4.0" #148

Open
wants to merge 1 commit into
base: main
Binary file modified docs/.doctrees/api.doctree
Binary file not shown.
Binary file modified docs/.doctrees/chat.doctree
Binary file not shown.
Binary file modified docs/.doctrees/mwes.doctree
Binary file not shown.
Binary file removed docs/_images/chat_mwe.png
Binary file not shown.
Binary file removed docs/_images/chat_stats.png
Binary file not shown.
43 changes: 3 additions & 40 deletions docs/_sources/chat.rst.txt
Original file line number Diff line number Diff line change
Expand Up @@ -19,51 +19,14 @@ call the `chat` method to interact with the data and get insights from it via Na

from wordview.text_analysis import TextStatsPlots
imdb_df = pd.read_csv("data/IMDB_Dataset_sample_5k.csv")
with open("your_secrets_dir/openai_api_key.json", "r") as f:
with open("wordview/chat/secrets/openai_api_key.json", "r") as f:
credentials = json.load(f)

tsp = TextStatsPlots(df=imdb_df, text_column="review")
tsp.chat(api_key=credentials.get("openai_api_key"))

The chat UI is available under http://127.0.0.1:5000/

|chat_stats|
|chat|

Chat with MWEs
~~~~~~~~~~~~~~

After allowing Wordview to extract MWEs, you can call the `chat` method to get insights from this extraction through Natural Language.

.. code:: python

import json

import pandas as pd

from wordview.mwe_extraction import MWEs
from wordview.preprocessing import NgramExtractor

imdb_df = pd.read_csv("data/IMDB_Dataset_sample_5k.csv")
with open("your_secrets_dir/openai_api_key.json", "r") as f:
credentials = json.load(f)

extractor = NgramExtractor(imdb_df, "review")
extractor.extract_ngrams()
extractor.get_ngram_counts(ngram_count_file_path="ngram_counts.json")

mwe_obj = MWE(imdb_df, 'review',
ngram_count_file_path='ngram_counts.json',
language='EN',
custom_patterns="NP: {<DT>?<JJ>*<NN>}",
only_custom_patterns=False,
)
mwe_obj.extract_mwes(sort=True, top_n=10)
mwe_obj.chat(api_key=credentials.get("openai_api_key"))

The chat UI for MWEs is available under http://127.0.0.1:5001/

|chat_mwe|

.. |chat_stats| image:: ../figs/chat_stats.png

.. |chat_mwe| image:: ../figs/chat_mwe.png
.. |chat| image:: ../figs/chat.png
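Review note: both documented snippets pass `credentials.get("openai_api_key")` straight to `chat`, and `dict.get` returns `None` without complaint when the field is absent. The following is a minimal sketch of a stricter loader, assuming the secrets file is a JSON object with an `openai_api_key` field as in the snippet above. `load_api_key` is a hypothetical helper for illustration, not part of Wordview, and the key value is a dummy.

```python
import json
import tempfile
from pathlib import Path


def load_api_key(path, field="openai_api_key"):
    """Read an API key from a JSON secrets file; fail loudly if it is absent."""
    with open(path, "r", encoding="utf-8") as f:
        credentials = json.load(f)
    key = credentials.get(field)
    if not key:
        # A missing or empty field is reported here instead of surfacing later.
        raise KeyError(f"{field!r} missing or empty in {path}")
    return key


# Demonstration with a throwaway file and a dummy value (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    secrets_path = Path(tmp) / "openai_api_key.json"
    secrets_path.write_text(
        json.dumps({"openai_api_key": "sk-dummy"}), encoding="utf-8"
    )
    demo_key = load_api_key(secrets_path)
```

The returned value could then be passed as `api_key=` in place of the `credentials.get(...)` call, so a misconfigured secrets file is caught before the chat UI starts.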
12 changes: 4 additions & 8 deletions docs/_sources/mwes.rst.txt
Original file line number Diff line number Diff line change
Expand Up @@ -32,20 +32,16 @@ the documentation.
custom_patterns="NP: {<DT>?<JJ>*<NN>}",
only_custom_patterns=False,
)
mwe_obj.extract_mwes(sort=True, top_n=10)
json.dump(mwe_obj.mwes, open('data/mwes.json', 'w'), indent=4)
mwes = mwe_obj.extract_mwes(sort=True, top_n=10)
json.dump(mwes, open('data/mwes.json', 'w'), indent=4)


The above returns the results in a dictionary, that in this example we stored in a json file called `data/mwes.json`.
The above returns the results in a dictionary, that in this example we stored in `mwes.json` file.
You can also return the result in a table:

.. code-block:: python

mwe_obj.print_mwe_table()

Which will return a table like this:

.. code-block:: text

╔═════════════════════════╦═══════════════╗
║ LVC ║ Association ║
╠═════════════════════════╬═══════════════╣
Expand Down
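Review note: both versions of the documented snippet call `json.dump(..., open('data/mwes.json', 'w'), indent=4)`, which never closes the file handle explicitly. A minimal sketch of the same write with a context manager follows. The `mwes` dictionary here is dummy data shaped like the nested "MWE type to instance to association measure" structure described in this PR, and the temporary path is only for demonstration.

```python
import json
import tempfile
from pathlib import Path

# Dummy stand-in for the extracted MWEs; real values would come from Wordview.
mwes = {"LVC": {"SHOOT the binding": 26.02}, "NP": {}}

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "mwes.json"
    # The context manager closes and flushes the file even if dump raises.
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(mwes, f, indent=4)
    # Read the file back to confirm the round trip.
    with open(out_path, "r", encoding="utf-8") as f:
        reloaded = json.load(f)
```

The same pattern applies whether the dictionary comes from the return value of `extract_mwes` or from an attribute on the object.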
29 changes: 5 additions & 24 deletions docs/api.html

Large diffs are not rendered by default.

36 changes: 2 additions & 34 deletions docs/chat.html
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,6 @@
<li class="toctree-l1"><a class="reference internal" href="clustering.html">Cluster Analysis</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">Chat with Wordview</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#chat-with-textstatsplots">Chat with TextStatsPlots</a></li>
<li class="toctree-l2"><a class="reference internal" href="#chat-with-mwes">Chat with MWEs</a></li>
</ul>
</li>
</ul>
Expand Down Expand Up @@ -116,46 +115,15 @@ <h2>Chat with TextStatsPlots<a class="headerlink" href="#chat-with-textstatsplot

<span class="kn">from</span> <span class="nn">wordview.text_analysis</span> <span class="kn">import</span> <span class="n">TextStatsPlots</span>
<span class="n">imdb_df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s2">&quot;data/IMDB_Dataset_sample_5k.csv&quot;</span><span class="p">)</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;your_secrets_dir/openai_api_key.json&quot;</span><span class="p">,</span> <span class="s2">&quot;r&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;wordview/chat/secrets/openai_api_key.json&quot;</span><span class="p">,</span> <span class="s2">&quot;r&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="n">credentials</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>

<span class="n">tsp</span> <span class="o">=</span> <span class="n">TextStatsPlots</span><span class="p">(</span><span class="n">df</span><span class="o">=</span><span class="n">imdb_df</span><span class="p">,</span> <span class="n">text_column</span><span class="o">=</span><span class="s2">&quot;review&quot;</span><span class="p">)</span>
<span class="n">tsp</span><span class="o">.</span><span class="n">chat</span><span class="p">(</span><span class="n">api_key</span><span class="o">=</span><span class="n">credentials</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&quot;openai_api_key&quot;</span><span class="p">))</span>
</pre></div>
</div>
<p>The chat UI is available under <a class="reference external" href="http://127.0.0.1:5000/">http://127.0.0.1:5000/</a></p>
<p><img alt="chat_stats" src="_images/chat_stats.png" /></p>
</section>
<section id="chat-with-mwes">
<h2>Chat with MWEs<a class="headerlink" href="#chat-with-mwes" title="Permalink to this heading"></a></h2>
<p>After allowing Wordview to extract MWEs, you can call the <cite>chat</cite> method to get insights from this extraction through Natural Language.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">json</span>

<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>

<span class="kn">from</span> <span class="nn">wordview.mwe_extraction</span> <span class="kn">import</span> <span class="n">MWEs</span>
<span class="kn">from</span> <span class="nn">wordview.preprocessing</span> <span class="kn">import</span> <span class="n">NgramExtractor</span>

<span class="n">imdb_df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s2">&quot;data/IMDB_Dataset_sample_5k.csv&quot;</span><span class="p">)</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;your_secrets_dir/openai_api_key.json&quot;</span><span class="p">,</span> <span class="s2">&quot;r&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="n">credentials</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>

<span class="n">extractor</span> <span class="o">=</span> <span class="n">NgramExtractor</span><span class="p">(</span><span class="n">imdb_df</span><span class="p">,</span> <span class="s2">&quot;review&quot;</span><span class="p">)</span>
<span class="n">extractor</span><span class="o">.</span><span class="n">extract_ngrams</span><span class="p">()</span>
<span class="n">extractor</span><span class="o">.</span><span class="n">get_ngram_counts</span><span class="p">(</span><span class="n">ngram_count_file_path</span><span class="o">=</span><span class="s2">&quot;ngram_counts.json&quot;</span><span class="p">)</span>

<span class="n">mwe_obj</span> <span class="o">=</span> <span class="n">MWE</span><span class="p">(</span><span class="n">imdb_df</span><span class="p">,</span> <span class="s1">&#39;review&#39;</span><span class="p">,</span>
<span class="n">ngram_count_file_path</span><span class="o">=</span><span class="s1">&#39;ngram_counts.json&#39;</span><span class="p">,</span>
<span class="n">language</span><span class="o">=</span><span class="s1">&#39;EN&#39;</span><span class="p">,</span>
<span class="n">custom_patterns</span><span class="o">=</span><span class="s2">&quot;NP: {&lt;DT&gt;?&lt;JJ&gt;*&lt;NN&gt;}&quot;</span><span class="p">,</span>
<span class="n">only_custom_patterns</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">mwe_obj</span><span class="o">.</span><span class="n">extract_mwes</span><span class="p">(</span><span class="n">sort</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">top_n</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
<span class="n">mwe_obj</span><span class="o">.</span><span class="n">chat</span><span class="p">(</span><span class="n">api_key</span><span class="o">=</span><span class="n">credentials</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&quot;openai_api_key&quot;</span><span class="p">))</span>
</pre></div>
</div>
<p>The chat UI for MWEs is available under <a class="reference external" href="http://127.0.0.1:5001/">http://127.0.0.1:5001/</a></p>
<p><img alt="chat_mwe" src="_images/chat_mwe.png" /></p>
<p><img alt="chat" src="_images/chat.png" /></p>
</section>
</section>

Expand Down
6 changes: 1 addition & 5 deletions docs/genindex.html
Original file line number Diff line number Diff line change
Expand Up @@ -153,12 +153,8 @@ <h2 id="B">B</h2>
<h2 id="C">C</h2>
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="api.html#wordview.mwes.MWE.chat">chat() (wordview.mwes.MWE method)</a>

<ul>
<li><a href="api.html#wordview.text_analysis.TextStatsPlots.chat">(wordview.text_analysis.TextStatsPlots method)</a>
<li><a href="api.html#wordview.text_analysis.TextStatsPlots.chat">chat() (wordview.text_analysis.TextStatsPlots method)</a>
</li>
</ul></li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="api.html#wordview.clustering.cluster.Cluster">Cluster (class in wordview.clustering.cluster)</a>
Expand Down
14 changes: 5 additions & 9 deletions docs/mwes.html
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,6 @@
<li class="toctree-l1"><a class="reference internal" href="bias.html">Bias Analysis</a></li>
<li class="toctree-l1"><a class="reference internal" href="anomalies.html">Analysis of Anomalies &amp; Outliers</a></li>
<li class="toctree-l1"><a class="reference internal" href="clustering.html">Cluster Analysis</a></li>
<li class="toctree-l1"><a class="reference internal" href="chat.html">Chat with Wordview</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Utilities</span></p>
<ul>
Expand Down Expand Up @@ -127,17 +126,14 @@ <h1>Analysis &amp; Extraction of Multiword Expressions (MWEs)<a class="headerlin
<span class="n">custom_patterns</span><span class="o">=</span><span class="s2">&quot;NP: {&lt;DT&gt;?&lt;JJ&gt;*&lt;NN&gt;}&quot;</span><span class="p">,</span>
<span class="n">only_custom_patterns</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">mwe_obj</span><span class="o">.</span><span class="n">extract_mwes</span><span class="p">(</span><span class="n">sort</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">top_n</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
<span class="n">json</span><span class="o">.</span><span class="n">dump</span><span class="p">(</span><span class="n">mwe_obj</span><span class="o">.</span><span class="n">mwes</span><span class="p">,</span> <span class="nb">open</span><span class="p">(</span><span class="s1">&#39;data/mwes.json&#39;</span><span class="p">,</span> <span class="s1">&#39;w&#39;</span><span class="p">),</span> <span class="n">indent</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
<span class="n">mwes</span> <span class="o">=</span> <span class="n">mwe_obj</span><span class="o">.</span><span class="n">extract_mwes</span><span class="p">(</span><span class="n">sort</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">top_n</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
<span class="n">json</span><span class="o">.</span><span class="n">dump</span><span class="p">(</span><span class="n">mwes</span><span class="p">,</span> <span class="nb">open</span><span class="p">(</span><span class="s1">&#39;data/mwes.json&#39;</span><span class="p">,</span> <span class="s1">&#39;w&#39;</span><span class="p">),</span> <span class="n">indent</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
</pre></div>
</div>
<p>The above returns the results in a dictionary, that in this example we stored in a json file called <cite>data/mwes.json</cite>.
<p>The above returns the results in a dictionary, that in this example we stored in <cite>mwes.json</cite> file.
You can also return the result in a table:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">mwe_obj</span><span class="o">.</span><span class="n">print_mwe_table</span><span class="p">()</span>
</pre></div>
</div>
<p>Which will return a table like this:</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>╔═════════════════════════╦═══════════════╗
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>mwe_obj.print_mwe_table()
╔═════════════════════════╦═══════════════╗
║ LVC ║ Association ║
╠═════════════════════════╬═══════════════╣
║ SHOOT the binding ║ 26.02 ║
Expand Down
Binary file modified docs/objects.inv
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/searchindex.js

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[tool.poetry]
name = "wordview"
version = "1.4.0"
version = "1.3.0"
description = """Wordview is a Python package for Exploratory Data Analysis of text and provides many statistics about your data in the form of plots, tables, and descriptions allowing you to have both a high-level and detailed overview of your data."""
authors = ["meghdadFar <meghdad.farahmand@gmail.com>"]
include = ["CHANGES.rst"]
Expand Down
Binary file added sphinx-docs/figs/chat.png
Binary file removed sphinx-docs/figs/chat_mwe.png
Binary file not shown.
Binary file removed sphinx-docs/figs/chat_stats.png
Binary file not shown.
6 changes: 2 additions & 4 deletions sphinx-docs/source/chat.rst
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,6 @@ call the `chat` method to interact with the data and get insights from it via Na

The chat UI is available under http://127.0.0.1:5000/

|chat_stats|

Chat with MWEs
~~~~~~~~~~~~~~
Expand Down Expand Up @@ -62,8 +61,7 @@ After allowing Wordview to extract MWEs, you can call the `chat` method to get i

The chat UI for MWEs is available under http://127.0.0.1:5001/

|chat_mwe|

.. |chat_stats| image:: ../figs/chat_stats.png
|chat|

.. |chat_mwe| image:: ../figs/chat_mwe.png
.. |chat| image:: ../figs/chat.png
2 changes: 1 addition & 1 deletion wordview/chat_ui/chat.html
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@
} */
.message-container {
overflow-y: auto; /* Enables vertical scrolling */
max-height: 850px; /* Set a max-height that fits your design */
max-height: 500px; /* Set a max-height that fits your design */
padding: 10px;
margin-bottom: 10px;
width: 100%; /* Ensure it fills the container */
Expand Down
5 changes: 2 additions & 3 deletions wordview/mwes/mwe.py
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ def __init__(
language: str = "EN",
custom_patterns: Optional[str] = None,
only_custom_patterns: bool = False,
mwe_frequency_threshold: int = 10,
mwe_frequency_threshold: int = 3,
association_threshold: float = 1.0,
) -> None:
"""Initializes a new instance of MWE class.
Expand All @@ -64,7 +64,7 @@ def __init__(
ADJP: {<RB|RBR|RBS>*<JJ>} # Adjective phrase
ADVP: {<RB.*>+<VB.*><RB.*>*} # Adverb phrase'''
only_custom_pattern: If True, only the custom pattern will be used to extract MWEs, otherwise, the default patterns will be used as well.
mwe_frequency_threshold: The minimum frequency of an MWE to be considered for extraction. Defaults to 10.
mwe_frequency_threshold: The minimum frequency of an MWE to be considered for extraction. Defaults to 3.
association_threshold: A threshold value for the association measure. Only MWEs with an association measure above this threshold will be returned.

Returns:
Expand Down Expand Up @@ -151,7 +151,6 @@ def chat(self, api_key: str = ""):
"MWE Type": "MWE instance 1": "Association measure", "MWE instance 2": "Association measure", ...\n
- There could be other custom types in which case you should just mention the dictionary key.\n
- Depending on a parameter N set by the user, each MWE type contains at most N instances. But it can contain less or even 0.
- Return the association measures that you read from the dictionary with only two decimal places.
"""
chat_history = [
{"role": "system", "content": base_content},
Expand Down
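Review note: this hunk changes the default of `mwe_frequency_threshold` between 10 and 3, which changes which candidates survive extraction. The sketch below illustrates the effect under one reading of the docstring: a candidate is kept when its count is at least the frequency threshold and its association score is strictly above the association threshold. `filter_candidates` and the sample data are hypothetical and are not taken from Wordview's implementation.

```python
def filter_candidates(counts, associations,
                      frequency_threshold=3, association_threshold=1.0):
    """Keep candidates that are frequent enough and associated strongly enough."""
    return {
        candidate: score
        for candidate, score in associations.items()
        if counts.get(candidate, 0) >= frequency_threshold
        and score > association_threshold
    }


# Hypothetical candidate counts and association scores.
counts = {"take a look": 12, "make a decision": 4, "have a cow": 2, "of the": 50}
associations = {
    "take a look": 8.31,
    "make a decision": 3.2,
    "have a cow": 5.0,
    "of the": 0.4,
}

kept_at_3 = filter_candidates(counts, associations, frequency_threshold=3)
kept_at_10 = filter_candidates(counts, associations, frequency_threshold=10)
```

With this toy data, the lower threshold keeps two candidates and the higher one keeps a single candidate, so reverting the default is a behavior change for callers who rely on it.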
2 changes: 1 addition & 1 deletion wordview/text_analysis/wrapper.py
Original file line number Diff line number Diff line change
Expand Up @@ -118,7 +118,7 @@ def chat(self, api_key: str = ""):
------------------------------
{self.return_stats()}
\n\n
Do NOT say according to Wordview Analysis dictionary.
Answer the questions without adding According to or Based on to the Wordview Analysis dictionary.
"""
chat_history = [
{"role": "system", "content": base_content},
Expand Down