Documentation and *.heic support
bohdanbobrowski committed Nov 15, 2024
1 parent ff42c65 commit 0e7737d
Showing 4 changed files with 25 additions and 12 deletions.
11 changes: 2 additions & 9 deletions BACKLOG.md
@@ -1,17 +1,11 @@
# List of features and bugfixes I'm considering to add

## Known bugs
- [ ] sometimes images are not correctly scraped and replaced, like in this post: [modernistyczny-poznan.blogspot.com](https://modernistyczny-poznan.blogspot.com/2021/08/wiepofama-10lat.html)
- [ ] app is not resistant to http errors, which is embarrassing
..

## Scraping in general:
- [ ] stop with keeping content in RAM - save it as ready to use ebook chapters
- [ ] use sitemaps.xml for scraping!
- [ ] replace blog url's in article content to actual chapters in ebook
- [ ] major refactor of Crawler class:
- [ ] use data models
- [ ] more common methods in crawler class
- [ ] expand crawler abstract
- [ ] replace blog internal url's in article content to actual chapters in ebook
- [ ] support for blog categories, tags and pages
- [ ] manually decide which crawler should be used
- [ ] blog2epub.yaml - this might be too ambitious, but what if the user could compose his/her own book, with custom
@@ -25,6 +19,5 @@

## Additional crawlers:
- [ ] [nrdblog.cmosnet.eu](https://nrdblog.cmosnet.eu/)
- [ ] [zeissikonveb.de](zeissikonveb.de)
- [ ] [scigacz.pl](https://www.scigacz.pl/)
- [ ] [jednoslad.pl](https://www.jednoslad.pl)
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,19 @@
# ChangeLog

### [v1.5.0](https://github.com/bohdanbobrowski/blog2epub/releases/tag/v1.5.0) - ?
- [X] integration testing
- [X] increase unit test coverage
- [X] use sitemaps.xml for scraping
- [X] crawlers refactor
- [X] use data models
- [X] more common methods in crawler class
- [X] expand crawler abstract
- [X] cli interface refactor
- [X] greek alphabet support
- [X] image download and attachment bug solved (ex. modernistyczny-poznan.blogspot.com)
- [X] improved resistance to http errors
- [X] dedicated crawler class for zeissikonveb.de

### [v1.4.0](https://github.com/bohdanbobrowski/blog2epub/releases/tag/v1.4.0) - 2024-11-01
- [X] custom destination folder
- [X] UI improvements (better scaling, more rely on KivyMD default features)
10 changes: 8 additions & 2 deletions README.md
@@ -149,10 +149,16 @@ Example:
### v1.5.0
- [X] integration testing
- [X] increase unit test coverage
- [X] use sitemaps.xml for scraping
- [X] crawlers refactor
- [X] add more crawlers
- [X] use data models
- [X] more common methods in crawler class
- [X] expand crawler abstract
- [X] cli interface refactor
- [X] greek alphabet support
- [X] greek alphabet support
- [X] image download and attachment bug solved (ex. modernistyczny-poznan.blogspot.com)
- [X] improved resistance to http errors
- [X] dedicated crawler class for zeissikonveb.de


[» Complete Change Log here «](https://github.com/bohdanbobrowski/blog2epub/blob/master/CHANGELOG.md)
2 changes: 1 addition & 1 deletion blog2epub/common/downloader.py
@@ -155,7 +155,7 @@ def download_image(self, image_obj: ImageModel) -> bool:
         img_hash = self.get_urlhash(image_obj.url)
         img_type = os.path.splitext(image_obj.url)[1].lower()
         img_type = img_type.split("?")[0]
-        if img_type not in [".jpeg", ".jpg", ".png", ".bmp", ".gif", ".webp"]:
+        if img_type not in [".jpeg", ".jpg", ".png", ".bmp", ".gif", ".webp", ".heic"]:
             return False
         original_fn = os.path.join(self.dirs.originals, img_hash + "." + img_type)
         resized_fn = os.path.join(self.dirs.images, img_hash + ".jpg")
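
The extension check in this hunk can be illustrated in isolation. The following is only a minimal sketch, not the project's actual code: `ALLOWED_IMAGE_TYPES`, `get_image_type` and `is_supported_image` are hypothetical names, and parsing the URL with `urlsplit` before calling `os.path.splitext` is an assumption about one way to drop the query string, where the hunk instead splits on `"?"` after taking the extension.

```python
import os
from urllib.parse import urlsplit

# Same extensions as the updated list in the hunk, including ".heic".
# The constant name is hypothetical; the project uses an inline list.
ALLOWED_IMAGE_TYPES = {".jpeg", ".jpg", ".png", ".bmp", ".gif", ".webp", ".heic"}


def get_image_type(url: str) -> str:
    """Return the lowercase extension of the URL path, including the leading dot."""
    # urlsplit() separates the path from "?query" and "#fragment",
    # so only the path part is passed to splitext().
    path = urlsplit(url).path
    return os.path.splitext(path)[1].lower()


def is_supported_image(url: str) -> bool:
    """Return True when the URL's extension is in the allowed set."""
    return get_image_type(url) in ALLOWED_IMAGE_TYPES
```

Note that the extension returned here already contains the leading dot, as does `img_type` in the hunk, so joining it to a file name with an extra `"."` yields a double dot.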