[Feature Request] Allow using a tag to exclude a document from processing and auto-add the tag to the document when processing fails #77
Comments
In a comment you wrote that you increased it from 10000 to 40000. I don't know what to do other than shorten the content. What is your suggestion?
I just looked into the Ollama documentation. You have to configure the context sizes there; it doesn't matter what is passed as a parameter via the API.
I disagree. I'm editing it in the JS file I attached a screenshot of, and I'm seeing changes in the context window size without any changes on the Ollama side. I had attempted to change it in Ollama, but it stayed at whatever is set in that JS file.
The way I'm editing it is via `docker exec -it paperless-ai /bin/bash`, then `apt update && apt install vim -y`, then editing the config file inside the container itself. This doesn't persist between container rebuilds, but it works if I just stop and start the container.
What if you increase the context size to 128k? That's the maximum llama3.2 can handle. It would be interesting to see what happens and whether the doc processes.
The doc is so large that I actively run out of VRAM with it set this high on those documents; Ollama apparently isn't smart enough to keep itself from running out of memory. I also had to shut down Stable Diffusion to give Ollama the full 12 GB of VRAM I've got.
Hmmm, okay. I will remove the 10k ctx value. But there will be no future solution for processing these files if they are so big. I will mark them as processed after a failure so they are not retried later.
I don't mind the value being there, but it would be good to have it be configurable rather than hard-coded, just like the Ollama URL is: just another field there that can set the parameter, with a sane default (like 10000).
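A configurable context size along these lines could look like the sketch below. This is a minimal sketch, not Paperless-AI's actual code: `OLLAMA_NUM_CTX` is a hypothetical environment variable name, and `resolveNumCtx` is a hypothetical helper.

```javascript
// Minimal sketch: read the context size from an environment variable,
// falling back to the current hard-coded default of 10000.
// OLLAMA_NUM_CTX is a hypothetical name, not an existing Paperless-AI option.
function resolveNumCtx(env = process.env) {
  const raw = parseInt(env.OLLAMA_NUM_CTX, 10);
  return Number.isFinite(raw) && raw > 0 ? raw : 10000;
}

// The resolved value would then be passed to Ollama in the request
// options, e.g. { model, prompt, options: { num_ctx: resolveNumCtx() } },
// since Ollama's generate API accepts num_ctx in the options object.
```

This keeps the 10000 default for anyone who doesn't set the variable, while letting users with more VRAM raise it per deployment instead of editing files inside the container.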
For OpenAI it's quite simple, as I can use the tiktoken library to truncate to the maximum token size. I will integrate your thoughts as a feature in the next release. Probably the best solution.
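The truncation idea can be sketched as below. To keep the example self-contained, the whitespace splitter here is only a stand-in for tiktoken's real encoder; in practice the `encode`/`decode` pair would come from tiktoken (or a JS port of it).

```javascript
// Sketch of truncating document content to a maximum token budget.
// The encode/decode pair would be tiktoken's in practice; the whitespace
// splitter below is only a self-contained stand-in for illustration.
function truncateToTokenLimit(text, maxTokens, encode, decode) {
  const tokens = encode(text);
  if (tokens.length <= maxTokens) return text;
  return decode(tokens.slice(0, maxTokens));
}

// Stand-in tokenizer (NOT tiktoken): one token per whitespace-separated word.
const encode = (text) => text.split(/\s+/).filter(Boolean);
const decode = (tokens) => tokens.join(" ");
```

Truncating on token boundaries (rather than characters) is what makes the OpenAI case simple: the budget can be set to the model's context limit minus the prompt and expected response sizes.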
I have multiple documents that I'm unable to process due to context tokens, and attempting them repeatedly crashes Ollama. This returns a 500 error to Paperless-AI, but Paperless-AI retries the document on the next run. It would be good to have a way to mark documents that were attempted but failed processing, and to not re-attempt them automatically unless the tag is removed to put them back in the queue.
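The requested skip-and-tag behaviour could be sketched like this. It is only a sketch: `FAILED_TAG_ID` and `markAsFailed` are hypothetical names, and the PATCH call is based on the standard Paperless-ngx REST API, where a document's tags are a writable array of tag IDs.

```javascript
// Sketch of the requested behaviour: skip documents carrying a
// "failed" tag, and add that tag when processing fails.
// FAILED_TAG_ID is a hypothetical placeholder for a Paperless-ngx tag ID.
const FAILED_TAG_ID = 42;

// A Paperless-ngx document object carries its tags as an array of IDs,
// so skipping is a simple membership check before queueing the document.
function shouldSkip(doc, failedTagId = FAILED_TAG_ID) {
  return Array.isArray(doc.tags) && doc.tags.includes(failedTagId);
}

// On failure, add the tag via the Paperless-ngx REST API
// (PATCH /api/documents/{id}/ accepts a partial update of `tags`).
async function markAsFailed(baseUrl, token, doc, failedTagId = FAILED_TAG_ID) {
  await fetch(`${baseUrl}/api/documents/${doc.id}/`, {
    method: "PATCH",
    headers: {
      Authorization: `Token ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ tags: [...doc.tags, failedTagId] }),
  });
}
```

With this shape, removing the tag in the Paperless-ngx UI is all it takes to re-queue a document, which matches the manual-retry workflow asked for above.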