Most recent releases are shown at the top. Each release shows:
- **New**: New classes, methods, functions, etc.
- **Changed**: Additional parameters, changes to inputs or outputs, etc.
- **Fixed**: Bug fixes that don't change documented behaviour
- support for custom metadata in vectorstore (#126)
- use `os.walk` instead of `glob` for `extract_files` and remove dot from extensions (#127)
- Add `batch_size` parameter to `LLM.ingest` (#128) (sketch below)
- use generators in `load_documents` (#129)
- Changed `split_list` to `batch_list`
- N/A
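Touching on the `batch_size` entry above, a minimal sketch of batched ingestion (the directory and batch value are placeholders, not defaults):

```python
from onprem import LLM

llm = LLM()
# Add documents to the vector store in batches rather than all at
# once (#128); the batch value here is illustrative.
llm.ingest("./sample_data", batch_size=1000)
```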
- Support for using self-ask prompt strategy with RAG (#120)
- Improved table understanding when invoking `LLM.ask` (#124)
- helpers for document metadata (#121)
- Added `k` and `score_threshold` arguments to `LLM.ask` (#122) (sketch below)
- Added `n_proc` parameter to control the number of CPUs used by `LLM.ingest` (ee09807)
- Upgrade version of `chromadb` (#125)
- Ensure table-processing is sequential and not parallelized (#123)
- Fixes to support newer version of `langchain_community` (#125)
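As a hedged sketch of the `k`/`score_threshold` and `n_proc` entries above (values are illustrative, not defaults):

```python
from onprem import LLM

llm = LLM()
llm.ingest("./sample_data", n_proc=4)  # cap ingestion at 4 CPU processes (ee09807)

# Return at most 4 retrieved chunks and drop low-similarity matches (#122).
result = llm.ask("What is onprem?", k=4, score_threshold=0.0)
print(result["answer"])
```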
- Added `HFClassifier` to `pipelines.classifier` module (#119)
- Added `SKClassifier` to `pipelines.classifier` module (#118) (sketch below)
- `sk` "helper" module to fit simple scikit-learn text models (#117)
- Added `process_documents` function (#117)
- Pass `autodetect_encoding` argument to `TextLoader` (#116)
- N/A
- N/A
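As a rough sketch of the classifier additions above: the import path follows the `pipelines.classifier` module named in the entries, but the method names (`train`, `predict`) are assumptions for illustration, not confirmed signatures:

```python
from onprem.pipelines.classifier import SKClassifier  # added in #118

clf = SKClassifier()                            # wraps a simple scikit-learn text model
clf.train(["good film", "awful film"], [1, 0])  # hypothetical method name
print(clf.predict(["a great movie"]))           # hypothetical method name
```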
- Fix for HF chat template issue (#113/#114)
- Support for structured outputs (#110)
- Support for table extraction (#106, #107)
- Facilitate identifying tables extracted as HTML (#112)
- Remove dependency on deprecated `RetrievalQA` (#108)
- Refactored code base (#109)
- Use new JSON-safe formatting of prompt templates (#109)
- Added `utils.format_string` function to help format template strings with embedded JSON (#105)
- support stop strings with transformers (#111)
- N/A
- Changed `pdf_use_unstructured` to `pdf_unstructured` and `pdf2md` to `pdf_markdown` (#102)
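When upgrading past this release, calls using the old keyword names need the renames applied; a minimal sketch, assuming both remain `LLM.ingest` keyword arguments:

```python
from onprem import LLM

llm = LLM()
# formerly: llm.ingest("./docs", pdf_use_unstructured=True)
llm.ingest("./docs", pdf_unstructured=True)  # renamed in #102; pdf2md -> pdf_markdown likewise
```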
- N/A
- Improved PDF text extraction including optional markdown conversion, table inference, and OCR (#100)
- N/A
- Add support for HF training (#98)
- Default to localhost in Web app (#99)
- N/A
- N/A
- Allow all Hugging Face pipeline/model arguments to be supplied (#96)
- N/A
- Refactored Hugging Face transformers backend (#95)
- Suppress swig deprecation warning (#93)
- Raise error if summarizers encounter bad document (#94)
- Support for Hugging Face transformers as LLM engine instead of Llama.cpp
- `LLM.prompt` now accepts OpenAI-style messages in the form of a list of dictionaries
- Remove unused imports (#92)
- Added `default_model` parameter to `LLM` to more easily use `Llama-3.1-8B-Instruct`
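A minimal sketch of the OpenAI-style message input to `LLM.prompt`; the `default_model` value shown is a guess at the accepted identifier, not a confirmed one:

```python
from onprem import LLM

llm = LLM(default_model="llama")  # identifier is illustrative; see docs for accepted values

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Say hello."},
]
print(llm.prompt(messages))  # plain strings are still accepted as well
```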
- N/A
- N/A
- N/A
- Added key-value pair, `ocr:True`, to `Document.metadata` when PDF is OCR'ed (#91)
- removed dead code in `pipelines.summarizer` (#88)
- N/A
- Removed `include_surrounding` parameter from `summarize_by_concept`
- N/A
- Support for concept-focused summarization (#87) (sketch below)
- Replace `use_larger` parameter with `use_zephyr`
- Replace deprecated `CallbackManager` (#86)
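A hedged sketch of the concept-focused summarization noted above; the import path and the `concept_description` keyword are assumptions based on these entries, not confirmed signatures:

```python
from onprem import LLM
from onprem.pipelines import Summarizer

llm = LLM()
summ = Summarizer(llm)
# Summarize only the passages related to a concept of interest (#87);
# the keyword name below is illustrative.
summary = summ.summarize_by_concept("report.pdf", concept_description="supply chain risks")
```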
- N/A
- N/A
- Check if `docs` is `None` (#85)
- N/A
- N/A
- Fixed error when raising Exceptions in `Ingester` (#84)
- N/A
- N/A
- Resolve issues with PDFs that mix OCR/not-OCR (#83)
- N/A
- Auto set some unstructured settings based on input (#81)
- Ensure any supplied `unstructured` kwargs do not persist (#81)
- Better PDF OCR support and table-handling (#75, #80)
- add `pdf_use_unstructured` argument to `LLM.ingest` for PDF OCR and better table-handling (#79)
- Allow configuration of unstructured for PDFs from `LLM.ingest` (#80)
- N/A
- OCR support (#75)
- Added `Ingester.store_documents` method (#36, #77)
- switch to `langchain_huggingface` and `langchain_chroma` (#78)
- N/A
- N/A
- Added `preproc_fn` to `Extractor.apply` (#74)
- N/A
- N/A
- Segment needs to accept arguments in extractor pipeline (#70)
- N/A
- Add clean function to `Extractor.apply` (#69)
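For context, a rough sketch of the extraction pipeline with `preproc_fn` (#74); the `{text}` placeholder and the overall `Extractor.apply` signature are assumptions for illustration:

```python
from onprem import LLM
from onprem.pipelines import Extractor

llm = LLM()
extractor = Extractor(llm)
prompt = "Extract any dates from the following sentence: {text}"
# preproc_fn transforms each unit of text before it is sent to the model.
results = extractor.apply(prompt, fpath="document.pdf", preproc_fn=str.strip)
```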
- Remove BOS token from default prompt (#67)
- Remove call to `db.persist` (#68)
- Use OnPrem.LLM with OpenAI-compatible REST APIs (#61)
- information extraction pipeline (#64)
- experimental support for Azure OpenAI (#63)
- Docker support
- Few-Shot classification pipeline (#66)
- change default model to Mistral (#65)
- allow installation of onprem without llama-cpp-python for easier use with LLMs served through REST APIs (#62)
- Added `ignore_fn` argument to `LLM.ingest` to allow more control over ignoring certain files (#58) (sketch below)
- Added `Ingester.get_ingested_files` to show files ingested into vector database (#59)
- If encountering a loading error when processing a file, skip and continue instead of halting (#60)
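A minimal sketch of `ignore_fn`; the convention that returning True skips a file is an assumption from the parameter's name:

```python
from onprem import LLM

llm = LLM()
# ignore_fn receives a file path; files for which it returns True
# are excluded from ingestion (#58).
llm.ingest("./sample_data", ignore_fn=lambda path: "draft" in path.lower())
```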
- Add check for partially downloaded files (#49)
- Support for OpenAI models (#55)
- `LLM.prompt`, `LLM.ask`, and `LLM.chat` now accept extra `**kwargs` that are sent directly to the model (#54)
- N/A
- N/A
- Updates for `langchain>=0.1.0` (which is now the minimum version)
- N/A
- Uses Zephyr-7B as default model in `webapp.yml` (#52)
- Added `stop` parameter to `LLM.prompt` (overrides `stop` parameter supplied to constructor) (#53)
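A minimal sketch of the per-call override (stop values are illustrative):

```python
from onprem import LLM

llm = LLM(stop=["###"])  # default stop strings supplied to the constructor
# A per-call stop list overrides the constructor-supplied one (#53).
text = llm.prompt("List three colors:", stop=["\n\n"])
```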
- N/A
- N/A
- Added `prompt_template` parameter to `LLM` constructor (#51)
- Added `update_max_tokens` and `update_stop` methods to `LLM` for dynamic adjustments during prompt experiments
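A minimal sketch of these additions, assuming the `{prompt}` placeholder convention for templates:

```python
from onprem import LLM

# {prompt} marks where the user's text is inserted at call time.
llm = LLM(prompt_template="### Instruction:\n{prompt}\n\n### Response:")

llm.update_max_tokens(512)  # tweak generation length between experiments
llm.update_stop(["###"])    # swap stop strings without rebuilding the LLM
```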
- Explicitly set `offload_kqv` to ensure GPUs are fully utilized (#50)
- Summarization pipeline (#35)
- Upgrades to all dependencies, but pin `chromadb==0.4.15` to retain compatibility with older langchain
- Default `n_ctx` (context window) changed to 3900
- N/A
- The `guider` module, a simplistic interface to Guidance (#34)
- N/A
- N/A
- N/A
- progress bar for embeddings creation (#46)
- Support model-specific prompt templates in `LLM.ask` method (#47)
- Added `python-docx` as dependency (#43)
- Added `python-pptx` as dependency (#44)
- Pass `prompt_template` to `ask` method in Web app (#47)
- Skip files beginning with '~$' in `LLM.ingest` (#45)
- N/A
- N/A
- Added warning if URL is not pointing to a GGUF model file (#40)
- N/A
- N/A
- Changed default value for `verbose` in `LLM` from `False` to `True` due to `llama-cpp-python` bug (#37)
- N/A
- Remove pin for `llama-cpp-python` so latest is always used (#33)
- N/A
- N/A
- Include `prompt_template` variable in YAML (#32)
- N/A
- N/A
- **Breaking Change**: The `LLM.ask` method now returns a dictionary with keys `answer`, `source_documents`, and `question` (#31)
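Code that previously treated the return value as a bare answer must now index into the dictionary; a minimal sketch:

```python
from onprem import LLM

llm = LLM()
result = llm.ask("What is onprem?")
print(result["answer"])                 # the generated answer
for doc in result["source_documents"]:  # supporting chunks from the vector store
    print(doc.metadata.get("source"))
print(result["question"])               # the original question, echoed back
```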
- N/A
- N/A
- Added `rag_text_path` and `verbose` to default `webapp.yml`
- Moving `load_llm` to constructor seems to prevent model loading issues in `Llamacpp` (#30)
- N/A
- round scores in web app to 3 decimal places (#29)
- N/A
- attempt to auto-create symlinks for serving source documents
- N/A
- N/A
- Support for hyperlinks to sources in RAG screen of Web app (#28)
- N/A
- `LLM.ingest` converts relative paths to absolute paths during ingestion
- Support for `GGUF` format as the default LLM format (#1)
- All default models have been changed to `GGUF` models
- updated pin for `llama-cpp-python` to support GGUF format
- Misc adjustments and bug fixes for built-in Web app
- Built-in Web app for both RAG and general prompting
- **Possible Breaking Change**: Support for `score_threshold` in `LLM.ask` and `LLM.chat` (#26)
- Use `CallbackManager` (#24)
- N/A
- N/A
- `LLM.chat` now includes `source_documents` in output (#23)
- N/A
- The `LLM.chat` method supports question-answering with conversational memory (#20)
- `LLM` now accepts a `callbacks` parameter for custom callbacks (#21)
- added additional examples
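For context, a hedged sketch of conversational question-answering with `LLM.chat` (dictionary-style output is assumed, matching the later #23 entry):

```python
from onprem import LLM

llm = LLM()
llm.ingest("./sample_data")                      # index documents for RAG
first = llm.chat("What is onprem?")              # starts a fresh conversation
follow = llm.chat("Can you elaborate on that?")  # draws on conversational memory (#20)
```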
- N/A
- Support for prompt templates in `ask` (#17)
- Added `LLM.load_qa` method
- batchify input to `Chroma` (#18)
- N/A
- N/A
- pass `embedding_model_kwargs` and `embedding_encode_kwargs` to `HuggingFaceEmbeddings` (#16)
- N/A
- Added `Ingester.get_embeddings` method to access instance of `HuggingFaceEmbeddings`
- Added `chunk_size` and `chunk_overlap` parameters to `Ingester.ingest` and `LLM.ingest` (#13)
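A minimal sketch of tuning the splitter (values are illustrative, not defaults):

```python
from onprem import LLM

llm = LLM()
# Larger chunks keep more context per embedding; overlap limits
# information lost at chunk boundaries (#13).
llm.ingest("./sample_data", chunk_size=500, chunk_overlap=50)
```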
- Check to ensure `source_directory` is a folder in `LLM.ingest` (#15)
- N/A
- Accept extra `kwargs` and supply them to `langchain.llms.Llamacpp` (#12)
- Add optional argument to specify custom path to vector DB (#11)
- N/A
- N/A
- Add optional argument to specify custom path to download LLM (#5), thanks to @rabilrbl
- Fixed capitalization in download confirmation (#9), thanks to @rabilrbl
- Insert dummy replacement of decorator into numpy
- N/A
- Print `persist_directory` when creating new vector store
- Revert `numpy` pin
- N/A
- N/A
- Pin to `numpy==1.23.3` due to `_no_nep50` error in some environments
- N/A
- Last release without CHANGELOG updates