diff --git a/BACKLOG.md b/BACKLOG.md
index e369bbb..7d607ba 100644
--- a/BACKLOG.md
+++ b/BACKLOG.md
@@ -1,17 +1,11 @@
 # List of features and bugfixes I'm considering to add
 
 ## Known bugs
-- [ ] sometimes images are not correctly scrapped and replaced, like in this post: [modernistyczny-poznan.blogspot.com](https://modernistyczny-poznan.blogspot.com/2021/08/wiepofama-10lat.html)
-- [ ] app is not resistant to http errors, which is embarrassing
+..
 
 ## Scraping in general:
 - [ ] stop with keeping content in RAM - save it as ready to use ebook chapters
-- [ ] use sitemaps.xml for scraping!
-- [ ] replace blog url's in article content to actual chapters in ebook
-- [ ] major refactor of Crawler class:
-  - [ ] use data models
-  - [ ] more common methods in crawler class
-  - [ ] expand crawler abstract
+- [ ] replace blog internal url's in article content to actual chapters in ebook
 - [ ] support for blog categories, tags and pages
 - [ ] manually decide which crawler should be used
 - [ ] blog2epub.yaml - this might be too ambitious, but what if user could compose he's/hers own book, with custom
@@ -25,6 +19,5 @@
 
 ## Additional crawlers:
 - [ ] [nrdblog.cmosnet.eu](https://nrdblog.cmosnet.eu/)
-- [ ] [zeissikonveb.de](zeissikonveb.de)
 - [ ] [scigacz.pl](https://www.scigacz.pl/)
 - [ ] [jednoslad.pl](https://www.jednoslad.pl)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1c29051..90745df 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,19 @@
 # ChangeLog
 
+### [v1.5.0](https://github.com/bohdanbobrowski/blog2epub/releases/tag/v1.5.0) - ?
+- [X] integration testing
+- [X] increase unit test coverage
+- [X] use sitemaps.xml for scraping
+- [X] crawlers refactor
+  - [X] use data models
+  - [X] more common methods in crawler class
+  - [X] expand crawler abstract
+- [X] cli interface refactor
+- [X] greek alphabet support
+- [X] image download and attachment bug solved (ex. modernistyczny-poznan.blogspot.com)
+- [X] improved resistance to http errors
+- [X] dedicated crawler class for zeissikonveb.de
+
 ### [v1.4.0](https://github.com/bohdanbobrowski/blog2epub/releases/tag/v1.4.0) - 2024-11-01
 - [X] custom destination folder
 - [X] UI improvements (better scaling, more rely on KivyMD default features)
diff --git a/README.md b/README.md
index b5a470e..04491fd 100755
--- a/README.md
+++ b/README.md
@@ -149,10 +149,16 @@ Example:
 ### v1.5.0
 - [X] integration testing
 - [X] increase unit test coverage
+- [X] use sitemaps.xml for scraping
 - [X] crawlers refactor
-- [X] add more crawlers
+  - [X] use data models
+  - [X] more common methods in crawler class
+  - [X] expand crawler abstract
 - [X] cli interface refactor
-- [X] greek alphabet support
+- [X] greek alphabet support
+- [X] image download and attachment bug solved (ex. modernistyczny-poznan.blogspot.com)
+- [X] improved resistance to http errors
+- [X] dedicated crawler class for zeissikonveb.de
 
 [» Complete Change Log here «](https://github.com/bohdanbobrowski/blog2epub/blob/master/CHANGELOG.md)
 
diff --git a/blog2epub/common/downloader.py b/blog2epub/common/downloader.py
index e00360a..f278dae 100644
--- a/blog2epub/common/downloader.py
+++ b/blog2epub/common/downloader.py
@@ -155,7 +155,7 @@ def download_image(self, image_obj: ImageModel) -> bool:
         img_hash = self.get_urlhash(image_obj.url)
         img_type = os.path.splitext(image_obj.url)[1].lower()
         img_type = img_type.split("?")[0]
-        if img_type not in [".jpeg", ".jpg", ".png", ".bmp", ".gif", ".webp"]:
+        if img_type not in [".jpeg", ".jpg", ".png", ".bmp", ".gif", ".webp", ".heic"]:
             return False
         original_fn = os.path.join(self.dirs.originals, img_hash + "." + img_type)
         resized_fn = os.path.join(self.dirs.images, img_hash + ".jpg")
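
Review note on the `download_image` hunk: the extension check runs `os.path.splitext` on the whole URL and only then strips the query string, so a dot inside the query (for example `?file=b.txt`) would be picked up as the extension. The sketch below shows one possible way to isolate that check, assuming the URL is parsed first. `ALLOWED_IMAGE_TYPES`, `image_extension` and `is_supported_image` are hypothetical names for illustration and are not part of blog2epub.

```python
import os
from urllib.parse import urlparse

# Mirrors the extension list from the hunk, including the newly added ".heic".
# The constant name is hypothetical; the project keeps the list inline.
ALLOWED_IMAGE_TYPES = frozenset(
    {".jpeg", ".jpg", ".png", ".bmp", ".gif", ".webp", ".heic"}
)


def image_extension(url: str) -> str:
    """Return the lowercased extension of the URL path, ignoring query and fragment."""
    path = urlparse(url).path
    return os.path.splitext(path)[1].lower()


def is_supported_image(url: str) -> bool:
    """Return True when the URL path ends in one of the allowed image extensions."""
    return image_extension(url) in ALLOWED_IMAGE_TYPES
```

The helper returns the extension with its leading dot, the same shape as the entries in the list from the hunk, so it could stand in for the two `img_type` assignments if the author finds the stricter parsing useful.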