Reflection #317

Open
Onaffair opened this issue Jul 22, 2024 · 1 comment

Comments

@Onaffair

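Example:
from ebooklib import epub

book = epub.read_epub(epub_file)  # epub_file: path to the input EPUB
for item in book.items:
    pass  # compare item.get_content() with item.content here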

I found that item.get_content() removes the external CSS stylesheet references from the XHTML, while item.content keeps them.

@aerkalov
Owner

When the library was created, the idea was that it would be used to produce 100% valid EPUB3 files. At that time most EPUB files were invalid EPUB 2 files, with some EPUB 3 files among them. So the idea was that you would read the input EPUB file into an object and, instead of cleaning the garbage out of that book to make it a valid EPUB3 file, you would create a new book and copy into it only the things you need and know are correct.

That is why you have item.content, which holds the original content; you can parse it with lxml and find the things you need (for instance in the header). You also have item.get_content(), which should be used for the books you are creating. That method will always return clean and valid content. That is why, for newly created pages, you use item.add_item() to add style sheet or JS files and don't put them in the header content (because it will be ignored).

Why? A lot of input files have .css/.js/.png/.jpeg files located who knows where, and a lot of the content can be invalid. The idea of the library has always been to create very structured and unified EPUB3 files, with fonts in the ./Fonts/ directory, images in ./Images/, style sheet files in ./Styles/, and so on, which always pass epubcheck validation.
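As a rough illustration of the item.content route described above, here is a minimal sketch that pulls the stylesheet references out of the raw XHTML header. It is not part of the library: the helper name stylesheet_hrefs and the SAMPLE document are made up for this example, and it uses the standard library's xml.etree.ElementTree as a stand-in for lxml so that it runs without extra dependencies.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def stylesheet_hrefs(raw_xhtml):
    """Return the href of every <link rel="stylesheet"> found in <head>.

    raw_xhtml is the untouched document, e.g. the bytes held in item.content.
    (Hypothetical helper, not an ebooklib function.)
    """
    root = ET.fromstring(raw_xhtml)
    hrefs = []
    # Try the namespaced XHTML path first, then the same path without a
    # namespace for input files that omit the xmlns declaration.
    for path in (f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}link", "head/link"):
        for link in root.findall(path):
            rel = link.get("rel", "").lower().split()
            if "stylesheet" in rel and link.get("href"):
                hrefs.append(link.get("href"))
    return hrefs


# Hypothetical sample standing in for the bytes in item.content.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Chapter 1</title>
    <link rel="stylesheet" type="text/css" href="../Styles/main.css"/>
  </head>
  <body><p>Hello</p></body>
</html>"""

print(stylesheet_hrefs(SAMPLE))
```

With a real book this would presumably be called as stylesheet_hrefs(item.content) for each document item, and the collected paths could then be used to decide which style sheet items to attach to the new page with item.add_item(). Note that ElementTree is a strict XML parser, so malformed input of the kind mentioned above may need lxml's more tolerant parsers instead.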

This sucks if you just want to read an EPUB file that is 100% valid, change something, and write it back out, but 11 years ago, when the library was first written, valid input files were not really the norm.
