python linux architecture embedchain[dataloaders] 6GB bloat #1302
amber-beasley-liatrio
started this conversation in
General
Replies: 2 comments 1 reply
-
Hello! As far as I know, Embedchain does not currently provide pre-compiled binary versions of the embedchain package or its dependencies. Installing embedchain does compile several dependencies, which can significantly increase the image size. I'm sorry I can't be more helpful!
-
Thanks! We ended up limiting the install to the bare minimum packages required to ingest the sitemap data_type. ~~This reduced the image size from ~6GB to ~1.5GB~~ The image size is still large (around 6GB).
-
Hello!
context
I am using Embedchain for a Python Slack app. I want to use the xml data type and others. As the code linked below shows, this requires pip install embedchain[dataloaders].
https://github.com/embedchain/embedchain/blob/faacfeb8913b3727747b6d9af4deca753caa4848/embedchain/loaders/xml.py#L7
problem
When you pip install embedchain[dataloaders], the image size bloats by ~6GB on a Linux architecture because it has to compile a bunch of dependencies for dataloaders.
Pulling images of this size takes quite a long time and can cause problems in unit testing on runners that do not have enough room to do the pip install.
question
Are there any plans to provide binary versions of embedchain[dataloaders] (or its dependencies) so we don't have to compile during a pip install?