Examine possible bugs with processing larger files (part of new resource upload) #50

moodler · 2024-02-16T12:39:09Z

No description provided.

aleclofabbro · 2024-02-19T15:07:48Z

Yesterday the moodle.net process hung due to memory/CPU overload.
Eduard noticed it by chance 7 hours later and rebooted manually.

the issue is in the file text-extraction procedure, which gets stuck on certain files
in this case a 44MB .pptx file halted the system
after the reboot the system resumed from that file, and didn't hang this time
though it didn't manage to extract text, due to a long list of similar format errors like:

@#[line:4,col:500] [xmldom error]        element parse error: Error: Hierarchy request error: Only one element can be added and only after doctype

it's not the first time this issue occurs, and surely it's not going to be the last

we can probably prevent hangs in the future by giving the VM more RAM, but that looks like a temporary, non-scalable, non-sustainable patch

the real problem seems to be that the text extraction tools we're using are not reliable/efficient
they're some old npm libs, and I didn't find any newer ones.

we'd need to discuss this issue at both levels:

at the functional layer: to figure out reliable and efficient tools for file analysis
at the arch/infra layer: separation of services, with distinct instances of the software performing different tasks, so that a failure in one service doesn't halt the whole system.

in any case we can mitigate with the current setup by finishing the work on the containerized deployments, with ICT standardized kubernetes+docker management

aleclofabbro · 2024-02-19T15:08:32Z

ha !
a little good news about that pptx resource:
the metadata generation process after reboot actually provided something, if only a little

"title": "Community Mental Health and Folk Psychiatry in Tribal India",
"description": "ATSUKO IBATA",
"learningOutcomes": [],
"language": { "code": "eng" },
"level": {"code": "ED5"},
"subject": {"code": "F0914"},
"type": null

plus a generated image

aleclofabbro · 2024-02-19T15:08:56Z

I tested on my laptop the scenario that halted moodle.net, uploading the same resource:
a couple of observations:

it produces the same errors and similar outcomes
there's a 200MB memory peak during processing (350MB from a 150MB base)

so, not such a huge peak.
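
For repeatable numbers, a peak like the one above could also be sampled from inside the process. This is only a minimal sketch: `measurePeakRss` is a hypothetical helper, not part of the MoodleNet code, and it misses any peak that happens while the event loop is blocked by synchronous work.

```typescript
// Hypothetical helper: samples the process RSS while an async task runs
// and reports the base and the highest sampled value, in MB.
async function measurePeakRss<T>(
  task: () => Promise<T>,
  sampleEveryMs = 50,
): Promise<{ result: T; baseMb: number; peakMb: number }> {
  const toMb = (bytes: number) => Math.round(bytes / (1024 * 1024));
  const base = process.memoryUsage().rss;
  let peak = base;

  // Periodic sampling; this only fires when the event loop is free.
  const timer = setInterval(() => {
    peak = Math.max(peak, process.memoryUsage().rss);
  }, sampleEveryMs);

  try {
    const result = await task();
    // Take one last sample right after the task completes.
    peak = Math.max(peak, process.memoryUsage().rss);
    return { result, baseMb: toMb(base), peakMb: toMb(peak) };
  } finally {
    clearInterval(timer);
  }
}
```

Wrapping the extraction call in such a helper would log base and peak per file, which makes the laptop and VM runs easier to compare.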

aleclofabbro · 2024-02-19T15:11:40Z

I just ran some manual tests directly on moodle.net, processing the same file that halted it yesterday multiple times.
it works correctly, with peaks like on my laptop
but on about the third attempt the process's memory usage spikes to 90% !! and halts the whole VM

so it's quite random, and I couldn't replicate it on my laptop so far

here's the test setup and observations:

fresh restart
the procedure worked as expected twice: with growing memory peaks, from ~5% to ~10% during processing, settling back to ~5% afterwards
third attempt: almost immediate memory explosion to >90% (probably more, but top simply stopped refreshing)
VM RAM: 4GB

aleclofabbro · 2024-02-19T17:57:38Z

some relevant issues from textract

takeaways:

check current nodejs mem options
run it in separate process (like webpack compiler)
run it in a Worker ?

... though that won't help with memory runaways .. unless we run a mem-capped child process ?
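
A mem-capped child process could look roughly like the sketch below. It is only an illustration under assumptions: `runCapped` and its parameters are hypothetical names, and the script is passed inline with `-e` to keep the example self-contained (a real setup would point at an extraction script). Note that `--max-old-space-size` caps the V8 heap only; Buffers and other native allocations are not covered, so the timeout acts as a second safety net.

```typescript
import { spawn } from "node:child_process";

// Outcome of one capped run: the child's stdout, or a failure reason.
type CappedResult =
  | { ok: true; output: string }
  | { ok: false; reason: "timeout" | "crashed"; detail: string };

// Hypothetical helper: runs a Node.js script in a separate process with a
// V8 heap cap (in MB) and a hard timeout, so a runaway cannot take down
// the parent process.
function runCapped(
  script: string,
  maxOldSpaceMb: number,
  timeoutMs: number,
): Promise<CappedResult> {
  return new Promise((resolve) => {
    const child = spawn(process.execPath, [
      `--max-old-space-size=${maxOldSpaceMb}`,
      "-e",
      script,
    ]);
    let out = "";
    let err = "";
    let timedOut = false;
    child.stdout.on("data", (chunk: Buffer) => (out += chunk.toString()));
    child.stderr.on("data", (chunk: Buffer) => (err += chunk.toString()));

    // Kill the child if it runs longer than allowed.
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGKILL");
    }, timeoutMs);

    child.on("error", (e) => {
      clearTimeout(timer);
      resolve({ ok: false, reason: "crashed", detail: String(e) });
    });
    child.on("close", (code, signal) => {
      clearTimeout(timer);
      if (timedOut) {
        resolve({ ok: false, reason: "timeout", detail: `killed after ${timeoutMs}ms` });
      } else if (code === 0) {
        resolve({ ok: true, output: out });
      } else {
        // A heap overflow aborts the child; the parent only sees a failure.
        const detail = `exit code ${code}, signal ${signal}: ${err.slice(-500)}`;
        resolve({ ok: false, reason: "crashed", detail });
      }
    });
  });
}
```

With this shape a failed extraction becomes an ordinary `{ ok: false }` result the caller can log and skip, instead of a hang of the whole VM.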

aleclofabbro · 2024-03-12T15:28:59Z

finally modified the openai-autofill package to require and rely on an external Tika service for file text extraction
tikaUrl is now a required package config
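
For illustration, a call to an external Tika service might look like the sketch below. This is not the actual openai-autofill code: `extractTextViaTika`, `FetchLike` and the timeout value are assumptions, and only the `PUT /tika` endpoint with `Accept: text/plain` comes from the Apache Tika Server REST API. The fetch function is injected so that the sketch can be exercised without a running Tika instance.

```typescript
// Minimal shape of the fetch function the sketch relies on (hypothetical type),
// so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: {
    method: string;
    headers: Record<string, string>;
    body: Uint8Array;
    signal?: AbortSignal;
  },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Hypothetical helper: sends the file bytes to a Tika server and returns the
// extracted plain text, or null if extraction fails, times out or is empty.
async function extractTextViaTika(
  tikaUrl: string,
  file: Uint8Array,
  fetchImpl: FetchLike,
  timeoutMs = 30_000,
): Promise<string | null> {
  // Tika Server extracts text on PUT /tika; Accept selects the output format.
  const endpoint = tikaUrl.replace(/\/+$/, "") + "/tika";
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(endpoint, {
      method: "PUT",
      headers: { Accept: "text/plain" },
      body: file,
      signal: controller.signal,
    });
    if (!res.ok) return null;
    const text = (await res.text()).trim();
    return text.length > 0 ? text : null;
  } catch {
    // Network errors and aborts are treated as "no text extracted".
    return null;
  } finally {
    clearTimeout(timer);
  }
}
```

In a real deployment the third argument would be an actual HTTP client, for example Node's global `fetch`. Since parsing runs in the Tika service, a problematic file can at worst stall that service, not the main process.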

moodler added this to MoodleNet Current Work Jan 31, 2024

moodler converted this from a draft issue Feb 16, 2024

moodler assigned aleclofabbro Feb 16, 2024

moodler moved this from 🆕 New to ⏭️ Next in MoodleNet Current Work Feb 16, 2024

aleclofabbro moved this from ⏭️ Next to 🏗 In progress in MoodleNet Current Work Feb 20, 2024

aleclofabbro closed this as completed Mar 12, 2024

github-project-automation bot moved this from 🏗 In progress to ✅ Done in MoodleNet Current Work Mar 12, 2024
