Knowledge Infused AI

This is a research initiative by Intuit AI Research focused on techniques and methodologies for incorporating various forms of knowledge to improve the performance of AI systems. Our work in this space includes:

Synthetic Knowledge Ingestion - Ski: Towards Knowledge Refinement and Injection for Enhancing Large Language Models
RAG Context Ranking - HyQE: Ranking Contexts with Hypothetical Query Embeddings

🔥 News

[2024.11] Blog: Enhancing LLMs with Synthetic Knowledge Ingestion: A Novel Approach from Intuit AI Research at EMNLP 2024
[2024.10] HyQE paper accepted at EMNLP 2024
[2024.10] Ski paper accepted at EMNLP 2024

Synthetic Knowledge Ingestion

Ski: Towards Knowledge Refinement and Injection for Enhancing Large Language Models

[Paper] [Code]

Large language models (LLMs) are proficient in capturing factual knowledge across various domains. However, refining their capabilities on previously seen knowledge or integrating new knowledge from external sources remains a significant challenge. In this work, we propose a novel synthetic knowledge ingestion method called Ski, which leverages fine-grained synthesis, interleaved generation, and assemble augmentation strategies to construct high-quality data representations from raw knowledge sources. We then integrate Ski and its variations with three knowledge injection techniques: Retrieval Augmented Generation (RAG), Supervised Fine-tuning (SFT), and Continual Pre-training (CPT) to inject and refine knowledge in language models. Extensive empirical experiments are conducted on various question-answering tasks spanning finance, biomedicine, and open-generation domains to demonstrate that Ski significantly outperforms baseline methods by facilitating effective knowledge injection. We believe that our work is an important step towards enhancing the factual accuracy of LLM outputs by refining knowledge representation and injection capabilities.

RAG Context Ranking

HyQE: Ranking Contexts with Hypothetical Query Embeddings

[Paper] [Code]

In retrieval-augmented systems, context ranking techniques are commonly employed to reorder the retrieved contexts based on their relevance to a user query. A standard approach is to measure this relevance through the similarity between contexts and queries in the embedding space. However, such similarity often fails to capture the relevance. Alternatively, large language models (LLMs) have been used for ranking contexts. However, they can encounter scalability issues when the number of candidate contexts grows and the context window sizes of the LLMs remain constrained. Additionally, these approaches require fine-tuning LLMs with domain-specific data. In this work, we introduce a scalable ranking framework that combines embedding similarity and LLM capabilities without requiring LLM fine-tuning. Our framework uses a pre-trained LLM to hypothesize the user query based on the retrieved contexts and ranks the contexts based on the similarity between the hypothesized queries and the user query. Our framework is efficient at inference time and is compatible with many other retrieval and ranking techniques. Experimental results show that our method improves the ranking performance across multiple benchmarks. This work was done in collaboration with Boston University.
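The ranking idea described above can be illustrated with a minimal sketch. This is not the code from the HyQE repository: the function names (`embed`, `cosine`, `rank_contexts`, `fake_llm`), the toy bag-of-words embedding, the canned "LLM" outputs, and the choice to score each context by its best-matching hypothesized query are all assumptions made for illustration. In practice, `embed` would be a real embedding model and `hypothesize` a call to a pre-trained LLM.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_contexts(
    user_query: str,
    contexts: List[str],
    hypothesize: Callable[[str], List[str]],
) -> List[Tuple[str, float]]:
    """Rank contexts by similarity between the user query and the
    queries hypothesized for each context.

    Assumption: a context's score is the maximum similarity over its
    hypothesized queries; other aggregations are possible.
    """
    query_vec = embed(user_query)
    scored = []
    for ctx in contexts:
        hypothetical_queries = hypothesize(ctx)
        score = max(
            (cosine(query_vec, embed(h)) for h in hypothetical_queries),
            default=0.0,
        )
        scored.append((ctx, score))
    # Highest-scoring contexts first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical canned outputs standing in for a pre-trained LLM that
# is prompted with a context and asked which queries it could answer.
CANNED = {
    "The refund policy allows returns within a fixed window.": [
        "What is the refund policy?",
    ],
    "Invoices can be exported as PDF from the billing page.": [
        "How do I export an invoice as PDF?",
    ],
}


def fake_llm(context: str) -> List[str]:
    return CANNED.get(context, [])


ranked = rank_contexts("How can I export my invoice?", list(CANNED), fake_llm)
for ctx, score in ranked:
    print(f"{score:.3f}  {ctx}")
```

Because the hypothesized queries depend only on the contexts, they could be generated and embedded ahead of time, leaving only the user-query embedding and the similarity computation for inference; this is one plausible reading of why such a scheme stays cheap at query time.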
