diff --git a/README.md b/README.md
index 258f57c..28db8e7 100644
--- a/README.md
+++ b/README.md
@@ -54,6 +54,30 @@ It supports a variety of interactive workflows, each designed to address a speci
 
 All tutorial data and examples used in Intelligence Toolkit were created for this purpose using the [`Generate Mock Data`](https://github.com/microsoft/intelligence-toolkit/blob/main/app/workflows/generate_mock_data/README.md) workflow.
 
+### What workflow should I use?
+Use the diagram to identify an appropriate workflow, which can be opened from the left sidebar while running the application.
+
+```mermaid
+%%{init: {
+  "flowchart": {"htmlLabels": true}} }%%
+flowchart TD
+    NoData["Input: None"] --> |"Generate Mock Data<br/>workflow"| MockData["AI-Generated Records"]
+    NoData["Input: None"] --> |"Generate Mock Data<br/>workflow"| MockText["AI-Generated Texts"]
+    MockText["AI-Generated Texts"] --> TextDocs["Input: Text Data"]
+    MockData["AI-Generated Records"] --> PersonalData["Input: Personal Case Records"]
+    MockData["AI-Generated Records"] --> CaseRecords["Input: Case Records"]
+    MockData["AI-Generated Records"] --> EntityData["Input: Entity Records"]
+    PersonalData["Input: Personal Case Records"] ----> |"Anonymize Case Data<br/>workflow"| AnonData["Anonymous Case Records"]
+    CaseRecords["Input: Case Records"] ---> HasTime{"Time<br/>Attributes?"}
+    HasTime{"Time<br/>Attributes?"} --> |"Detect Case Patterns<br/>workflow"| CasePatterns["AI Pattern Reports"]
+    CaseRecords["Input: Case Records"] ---> HasGroups{"Grouping<br/>Attributes?"}
+    HasGroups{"Grouping<br/>Attributes?"} --> |"Compare Case Groups<br/>workflow"| MatchedEntities["AI Group Reports"]
+    EntityData["Input: Entity Records"] ---> HasInconsistencies{"Inconsistent<br/>Attributes?"} --> |"Match Entity Records<br/>workflow"| RecordLinking["AI-Matched Records"]
+    EntityData["Input: Entity Records"] ---> HasIdentifiers{"Identifying<br/>Attributes?"} --> |"Detect Entity Networks<br/>workflow"| NetworkAnalysis["AI Network Reports"]
+    TextDocs["Input: Text Data"] ---> NeedRecords{"Need<br/>Records?"} --> |"Extract Record Data<br/>workflow"| ExtractedRecords["AI-Extracted Records"]
+    TextDocs["Input: Text Data"] ---> NeedAnswers{"Need<br/>Answers?"} --> |"Query Text Data<br/>workflow"| AnswerReports["AI Answer Reports"]
+```
+
 ### How was Intelligence Toolkit evaluated?
 
 The Intelligence Toolkit was designed, refined, and evaluated in the context of the [Tech Against Trafficking (TAT)](https://techagainsttrafficking.org/) accelerator program with [Issara Institute](https://www.issarainstitute.org/) and [Polaris](https://polarisproject.org/) (2023-2024). It includes and builds on prior accelerator outputs developed with [Unseen](https://www.unseenuk.org/) (2021-2022) and [IOM](https://www.iom.int/)/[CTDC](https://www.ctdatacollaborative.org/) (2019-2020). See this [launch blog](https://www.microsoft.com/en-us/research/blog/empowering-ngos-with-generative-ai-in-the-fight-against-human-trafficking/) for more information.
@@ -106,50 +130,6 @@ All use of Intelligence Toolkit should be consistent with this documentation. In
 
 ## Getting Started
 
-### Setting up the AI model
-
-Intelligence Toolkit can be used with either OpenAI or Azure OpenAI as the generative AI API.
-
-The [`Generate Mock Data`](https://github.com/microsoft/intelligence-toolkit/blob/main/app/workflows/generate_mock_data/README.md) and [`Extract Record Data`](https://github.com/microsoft/intelligence-toolkit/blob/main/app/workflows/extract_record_data/README.md) workflows additionally use OpenAI's Structured Outputs API, which requires a gpt-4o model as follows:
-
-- `gpt-4o-mini`
-- `gpt-4o`
-
-You can access the `Settings` page on the left sidebar when running the application:
-
-- For OpenAI, you will need an active OpenAI account ([create here](https://platform.openai.com/login)) and API key ([create here](https://platform.openai.com/account/api-keys)).
-
-- For Azure OpenAI, you will need an active Azure account ([create here](https://portal.azure.com/)), endpoint, key and version for the AI Service ([create here](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/OpenAI)).
-
-### Selecting the right workflow for the data and task
-
-Use the diagram to identify an appropriate workflow, which can be opened from the left sidebar while running the application.
-
-```mermaid
-%%{init: {
-  "flowchart": {"htmlLabels": true}} }%%
-flowchart TD
-    NoData["Input: None"] --> |"Generate Mock Data<br/>workflow"| MockData["AI-Generated Records"]
-    NoData["Input: None"] --> |"Generate Mock Data<br/>workflow"| MockText["AI-Generated Texts"]
-    MockText["AI-Generated Texts"] --> TextDocs["Input: Text Data"]
-    MockData["AI-Generated Records"] --> PersonalData["Input: Personal Case Records"]
-    MockData["AI-Generated Records"] --> CaseRecords["Input: Case Records"]
-    MockData["AI-Generated Records"] --> EntityData["Input: Entity Records"]
-    PersonalData["Input: Personal Case Records"] ----> |"Anonymize Case Data<br/>workflow"| AnonData["Anonymous Case Records"]
-    CaseRecords["Input: Case Records"] ---> HasTime{"Time<br/>Attributes?"}
-    HasTime{"Time<br/>Attributes?"} --> |"Detect Case Patterns<br/>workflow"| CasePatterns["AI Pattern Reports"]
-    CaseRecords["Input: Case Records"] ---> HasGroups{"Grouping<br/>Attributes?"}
-    HasGroups{"Grouping<br/>Attributes?"} --> |"Compare Case Groups<br/>workflow"| MatchedEntities["AI Group Reports"]
-    EntityData["Input: Entity Records"] ---> HasInconsistencies{"Inconsistent<br/>Attributes?"} --> |"Match Entity Records<br/>workflow"| RecordLinking["AI-Matched Records"]
-    EntityData["Input: Entity Records"] ---> HasIdentifiers{"Identifying<br/>Attributes?"} --> |"Detect Entity Networks<br/>workflow"| NetworkAnalysis["AI Network Reports"]
-    TextDocs["Input: Text Data"] ---> NeedRecords{"Need<br/>Records?"} --> |"Extract Record Data<br/>workflow"| ExtractedRecords["AI-Extracted Records"]
-    TextDocs["Input: Text Data"] ---> NeedAnswers{"Need<br/>Answers?"} --> |"Query Text Data<br/>workflow"| AnswerReports["AI Answer Reports"]
-```
-
-## Diving Deeper
-
-### Getting started
-
 You can start using the Intelligence Toolkit as either a web application (with a tool called Docker) or a Python package (via PyPI). Choose one of the options below based on your needs.
 
 **Option 1: Using Intelligence Toolkit as a Web Application (via Docker)**
@@ -191,6 +171,22 @@ Open [http://localhost:80](http://localhost:80) in your web browser to start usi
 
 **Note:** Docker Desktop App may enter sleep mode if inactive. In this case, open Docker Desktop, select Container in the left menu, then press play on intelligence-toolkit.
 
+**6. Setting up the AI model:**
+
+Intelligence Toolkit can be used with either OpenAI or Azure OpenAI as the generative AI API.
+
+The [`Generate Mock Data`](https://github.com/microsoft/intelligence-toolkit/blob/main/app/workflows/generate_mock_data/README.md) and [`Extract Record Data`](https://github.com/microsoft/intelligence-toolkit/blob/main/app/workflows/extract_record_data/README.md) workflows additionally use OpenAI's Structured Outputs API, which requires a gpt-4o model as follows:
+
+- `gpt-4o-mini`
+- `gpt-4o`
+
+You can access the `Settings` page on the left sidebar when running the application:
+
+- For OpenAI, you will need an active OpenAI account ([create here](https://platform.openai.com/login)) and API key ([create here](https://platform.openai.com/account/api-keys)).
+
+- For Azure OpenAI, you will need an active Azure account ([create here](https://portal.azure.com/)), endpoint, key and version for the AI Service ([create here](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/OpenAI)).
+
+
 **Option 2: Using Intelligence Toolkit as a Python Package (via PyPI)**
 
 If you prefer to use Intelligence Toolkit as a Python package, install it directly from PyPI:
diff --git a/app/Home.py b/app/Home.py
index dc42e2e..2857e54 100644
--- a/app/Home.py
+++ b/app/Home.py
@@ -31,8 +31,26 @@ def get_readme_and_mermaid():
     )
     parts = content.split("")
     parts = "# Intelligence Toolkit" + parts[1]
-    parts = parts.split("```mermaid")
-    return parts[0], parts[1].split("## Diving Deeper")[0].replace("```", "")
+    parts_text_original = parts.split("## Getting Started")[0]
+    parts = parts_text_original.split("```mermaid")
+    mermaid = (
+        parts[1]
+        .split("### How was Intelligence Toolkit evaluated?")[0]
+        .replace("```", "")
+    )
+
+    parts_text = parts_text_original.split("### What workflow should I use?")[0]
+    parts_text += parts_text_original.split(
+        "### How was Intelligence Toolkit evaluated?"
+    )[1]
+
+    openai_text = content.split("")[1].split("**6. Setting up the AI model:**")[1]
+    parts_text += "## Getting Started"
+    parts_text += openai_text.split(
+        "**Option 2: Using Intelligence Toolkit as a Python Package (via PyPI)**"
+    )[0]
+
+    return parts_text, mermaid
 
 
 def main():
@@ -48,6 +66,7 @@ def main():
 
     st.markdown(transparency_faq)
 
+    st.markdown("### What workflow should I use?")
     mermaid.mermaid(
         code=mermaid_text,
         height=1000,
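
The marker-based slicing that `get_readme_and_mermaid()` performs on the README can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in `app/Home.py`: the helper names `section_between` and `remove_section` and the sample README string are hypothetical, and the sketch uses `str.partition` with explicit errors in place of the indexed `str.split(...)[i]` calls in the diff, so that a renamed heading fails with a clear message rather than an `IndexError`.

```python
# Hypothetical sketch: helper names and sample text are illustrative only.
FENCE = "`" * 3  # built at runtime so this example can itself sit inside a fence


def section_between(text: str, start: str, end: str) -> str:
    """Return the text after the first `start` marker and before the next `end`."""
    _, found, rest = text.partition(start)
    if not found:
        raise ValueError(f"start marker not found: {start!r}")
    body, found, _ = rest.partition(end)
    if not found:
        raise ValueError(f"end marker not found: {end!r}")
    return body


def remove_section(text: str, start: str, end: str) -> str:
    """Drop everything from `start` up to, but not including, `end`."""
    head, found, rest = text.partition(start)
    if not found:
        raise ValueError(f"start marker not found: {start!r}")
    _, found, tail = rest.partition(end)
    if not found:
        raise ValueError(f"end marker not found: {end!r}")
    return head + end + tail


# A tiny stand-in for the README, using the headings named in the diff.
sample_readme = "\n".join(
    [
        "# Intelligence Toolkit",
        "Intro text.",
        "### What workflow should I use?",
        "Use the diagram.",
        FENCE + "mermaid",
        "flowchart TD",
        "    A --> B",
        FENCE,
        "### How was Intelligence Toolkit evaluated?",
        "Evaluation text.",
    ]
)

# Diagram source to hand to the mermaid component.
mermaid_text = section_between(sample_readme, FENCE + "mermaid", FENCE).strip()

# Page text with the diagram section cut out.
page_text = remove_section(
    sample_readme,
    "### What workflow should I use?",
    "### How was Intelligence Toolkit evaluated?",
)
```

One deliberate difference from the diff: `remove_section` keeps the end heading in the returned text, whereas `.split("### How was Intelligence Toolkit evaluated?")[1]` returns only what follows that heading.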