A real-world, open-source book review aggregator, something like Metacritic / Digg for books. It also lets you compare book prices across different shops.
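Price comparison comes down to picking the lowest offer for each book across shops. Below is a minimal TypeScript sketch of that step. The BookOffer shape is hypothetical and is not the project's actual entity. Prices are integer grosze (1/100 PLN) so the comparison never runs into floating-point rounding.

```typescript
// Hypothetical offer shape (not the project's entity); prices in grosze (1/100 PLN).
interface BookOffer {
  bookId: number;
  shop: string;
  price: number;
}

// Returns the cheapest offer for every book, keyed by book ID.
// On a price tie the first offer seen is kept.
function cheapestOffers(offers: BookOffer[]): Map<number, BookOffer> {
  const best = new Map<number, BookOffer>();
  for (const offer of offers) {
    const current = best.get(offer.bookId);
    if (current === undefined || offer.price < current.price) {
      best.set(offer.bookId, offer);
    }
  }
  return best;
}

const offers: BookOffer[] = [
  { bookId: 1, shop: "publio.pl", price: 3990 },
  { bookId: 1, shop: "woblink.com", price: 3450 },
  { bookId: 2, shop: "matras.pl", price: 2500 },
];

console.log(cheapestOffers(offers).get(1)?.shop); // woblink.com
```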
Supported websites:
🇵🇱 Poland
- Wykop.pl (#bookmeter tag)
- Gildia.pl
- Literatura Gildia
- Granice.pl
- Matras.pl
- Bonito.pl
- Skupszop.pl
- Dadada.pl
- Aros.pl
- Publio.pl
- Hrosskar.blogspot.com
- krytycznymokiem.blogspot.com
- Madbooks.pl
- Gandalf.com.pl
- ibuk.pl
- Woblink.com
- Taniaksiazka.pl
- Bryk.pl
- Streszczenia.pl
- klp.pl
- legimi.pl
To be added soon:
🇵🇱 Poland
- polskina5.pl
- Virtualo.pl
- tantis.pl
- Znak.com.pl
- Swiatksiazki.pl
- wbibliotece.pl
- Wolnelektury.pl
- LitRes.pl
- audible.com
- Chodnikliteracki.pl
- czeskieklimaty.pl
- paskarz.pl
- selkar.pl
- promocjeksiazkowe.pl (Blog Post)
- eczytanie-eksiazki.blogspot.com (Blog Post)
- Gandalf.com
- Booklips.pl
- Allegro.pl
- Cyfroteka.pl
- Amazon.com
- Nieprzeczytane.pl
- bookbook.pl
- nakanapie.pl
- opracowania.pl
- ksiegarnia-armoryka.pl
World
- Goodreads
- wykop #książki (as a blog)
TODO:
- Book summary
- Changes history
- Mark as school reading
- Book summary aggregation
- Download button for free readings
- Discover all of an author's books (link discovery queue, discover all books in a series, all books by an author)
- Add article scraping (Wykop, Reddit, etc.)
- Book series tree (à la tree box)
- Allegro.pl / Amazon.pl / Skąpiec.pl price synchronization integration
- Wikipedia-style proposals for editing book info
- Automatic daily summary posts in the #bookmeter tag on wykop.pl
- Notifications about new reviews
- Front page customization (pin sections)
- Reading list
- Category books RSS
- Price and activity charts, notifications
- Category filters
- Trending books
- Emoji reactions
- After publishing an entry on wykop.pl, add a comment with links to shops and a comment to verify the matched book
- Add a link to the current user's library in the Wykop comment
- Add website spiders (as a separate module that appends content to Redis)
- Facebook bot that publishes posts with top offers
- Product basket: compare prices of multiple books in a table and summarize the basket price per shop
- RSS integration
- E-Book readers price section and reviews
- Section: Top Books/Reviews from Wykop.pl
- Machine learning for book (review) picking
- "Users who bought this book also bought" section
- Automatic blog posts
- SEO links on blog posts / reviews
- A Tinder alternative, but for books
- Wykop charts in comments
- Add trending stats
- Books summaries
- Dynamically create e-leaflets from books grouped by shop
- Add an "add store link" button to the availability table and, when a user adds a link, try to parse it
- Video reviews
- Users can create their own bookshelves
- Allow users to add a bookstore by configuring it in JSON / XML (https://news.ycombinator.com/item?id=27739568)
- Add e-leaflets
- YouTube reviews
- Add coupons
- Book cons table
- Look up books in Empik Go and Legimi
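The planned product basket (compare prices of several books and summarize the basket price per shop) could be computed roughly as in the TypeScript sketch below. Everything here is an assumption, not the project's code: the BookOffer and ShopBasket shapes, integer prices in grosze, and the ranking rule that lists shops offering the whole basket before incomplete ones.

```typescript
// Hypothetical offer shape; prices are integers in grosze (1/100 PLN).
interface BookOffer {
  bookId: number;
  shop: string;
  price: number;
}

interface ShopBasket {
  shop: string;
  total: number;     // sum of this shop's lowest price for each basket book it sells
  missing: number[]; // basket book IDs this shop does not offer
}

function basketPerShop(basket: number[], offers: BookOffer[]): ShopBasket[] {
  // shop -> (bookId -> lowest price of that book in that shop)
  const byShop = new Map<string, Map<number, number>>();
  for (const { bookId, shop, price } of offers) {
    if (!basket.includes(bookId)) continue;
    const prices = byShop.get(shop) ?? new Map<number, number>();
    const current = prices.get(bookId);
    if (current === undefined || price < current) prices.set(bookId, price);
    byShop.set(shop, prices);
  }

  const rows: ShopBasket[] = [];
  for (const [shop, prices] of byShop) {
    let total = 0;
    for (const price of prices.values()) total += price;
    rows.push({ shop, total, missing: basket.filter((id) => !prices.has(id)) });
  }

  // Complete baskets first, then the cheapest total.
  return rows.sort((a, b) => a.missing.length - b.missing.length || a.total - b.total);
}

const offers: BookOffer[] = [
  { bookId: 1, shop: "publio.pl", price: 3990 },
  { bookId: 1, shop: "woblink.com", price: 3450 },
  { bookId: 2, shop: "woblink.com", price: 2700 },
  { bookId: 2, shop: "publio.pl", price: 2800 },
  { bookId: 2, shop: "matras.pl", price: 2500 },
];

for (const row of basketPerShop([1, 2], offers)) {
  console.log(row.shop, row.total, row.missing);
}
// woblink.com 6150 []
// publio.pl 6790 []
// matras.pl 2500 [ 1 ]
```

Ranking incomplete baskets last keeps a shop that sells only one cheap book from looking like the best deal for the whole basket.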
Installation:
cp .env.example .env # edit .env config
yarn install
yarn run migration:run
yarn run seed:run
gulp entity:reindex:all
[yarn run console]:
await app.select(ScrapperModule).get('BookParentCategoryService').findAndAssignMissingParentCategories();
await app.select(ScrapperModule).get('BookCategoryRankingService').refreshCategoryRanking();
await app.select(ScrapperModule).get('BookStatsService').refreshAllBooksStats();
[/console]
yarn run develop
gulp scrapper:refresh
Proxy local port 9201 to the remote Elasticsearch (ES) instance:
ssh -g -L 9201:localhost:9200 -f -N deploy@upolujksiazke.pl
A NestJS application context is exposed on window as app. All entities are exported to the context.
yarn console
Remove a book:
app.select(ScrapperModule).get('BookService').delete([13])
Reindex all records of a particular type (e.g. after changing the index structure):
app.select(ScrapperModule).get('EsBookIndex').reindexAllEntities();
Sitemap:
gulp sitemap:refresh
Fetchers:
# Reindex all records
gulp entity:reindex:all
# Fetches a single review by ID
gulp scrapper:refresh:single --kind BOOK_REVIEW --remoteId 123 --website wykop.pl
# Fetches a single book by URL
gulp scrapper:refresh:single --remoteId szepty-spoza-nicosci-remigiusz-mroz,p697692.html --website www.publio.pl
# Fetches all reviews from a single website
gulp scrapper:refresh:all --kind BOOK_REVIEW --website wykop.pl
# Refreshes only the first page of remote reviews, using all scrapers (or only one with --website)
gulp scrapper:refresh:latest --kind BOOK_REVIEW
gulp scrapper:refresh:latest --kind BOOK_REVIEW --website wykop.pl
# Fetches all review pages from all websites using all scrapers
gulp scrapper:refresh:all --kind BOOK_REVIEW
# Fetches missing favicons
gulp entity:website:fetch-missing-logos
# Refreshes promotion value in categories
gulp entity:category:refresh-ranking
# After adding a new scraper, fetch book availability for its scrapper group
gulp scrapper:loader:fetch-availability --scrapperGroupId=26
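The difference between the latest and all refresh modes above (first remote page only versus every page, optionally starting from a given page) can be sketched in TypeScript as follows. The PageFetcher type and both functions are hypothetical. The "stop at the first empty page" rule and the maxPages safety cap are assumptions, not taken from the project.

```typescript
// Hypothetical page fetcher: resolves to the records found on one remote listing page.
type PageFetcher<T> = (page: number) => Promise<T[]>;

// Like "refresh:latest": only the first remote page is fetched.
async function refreshLatest<T>(fetchPage: PageFetcher<T>): Promise<T[]> {
  return fetchPage(1);
}

// Like "refresh:all": walk pages starting at initialPage until a page comes back
// empty. The stop condition and the maxPages safety cap are assumptions.
async function refreshAll<T>(
  fetchPage: PageFetcher<T>,
  initialPage = 1,
  maxPages = 1000,
): Promise<T[]> {
  const records: T[] = [];
  for (let page = initialPage; page < initialPage + maxPages; page++) {
    const batch = await fetchPage(page);
    if (batch.length === 0) break;
    records.push(...batch);
  }
  return records;
}

// Fake remote listing with three pages of review IDs.
const remotePages: Record<number, string[]> = { 1: ["r1", "r2"], 2: ["r3"], 3: ["r4"] };
const fetchFakePage: PageFetcher<string> = async (page) => remotePages[page] ?? [];

refreshAll(fetchFakePage, 2).then((ids) => console.log(ids)); // [ 'r3', 'r4' ]
```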
Analyzers:
# Iterates over all records and re-parses them. Dangerous!
# Records that are no longer classified as reviews after analysis are removed
gulp scrapper:reanalyze:all --kind BOOK_REVIEW
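The reason scrapper:reanalyze:all is dangerous is that it both re-parses and deletes records. The TypeScript sketch below shows that behavior against an in-memory store. The record shape, the Analyzer type and the toy #bookmeter-based classifier are all hypothetical stand-ins for the project's real analyzers.

```typescript
// Hypothetical analyzer result and stored record shapes (not the project's entities).
interface AnalyzeResult {
  isReview: boolean;
}

interface ScrappedRecord {
  id: number;
  content: string;
  result?: AnalyzeResult;
}

type Analyzer = (content: string) => AnalyzeResult;

// Re-parses every stored record. Records that are no longer classified as
// reviews are deleted. Returns the IDs of the removed records.
function reanalyzeAll(records: Map<number, ScrappedRecord>, analyze: Analyzer): number[] {
  const removed: number[] = [];
  // Iterate over a snapshot so deleting entries cannot affect the loop.
  for (const [id, record] of [...records]) {
    const result = analyze(record.content);
    if (result.isReview) {
      record.result = result;
    } else {
      records.delete(id);
      removed.push(id);
    }
  }
  return removed;
}

// Toy analyzer: treat anything tagged #bookmeter as a review.
const toyAnalyzer: Analyzer = (content) => ({ isReview: content.includes("#bookmeter") });

const store = new Map<number, ScrappedRecord>([
  [1, { id: 1, content: "Great read! #bookmeter" }],
  [2, { id: 2, content: "Off-topic post" }],
]);

console.log(reanalyzeAll(store, toyAnalyzer)); // [ 2 ]
```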
# Re-parses a single record
gulp scrapper:reanalyze:single --remoteId szepty-spoza-nicosci-remigiusz-mroz,p697692.html --website www.publio.pl
Stats (console):
app.select(BookModule).get('BookStatsService').refreshBooksStats(R.pluck('id', books))
Spiders:
gulp scrapper:spider:run
Scrapers:
Refresh all reviews from a single website:
node_modules/.bin/gulp scrapper:refresh:all --kind BOOK_REVIEW --initialPage 1 --website wykop.pl
node_modules/.bin/gulp scrapper:refresh:all --kind BOOK_REVIEW --website hrosskar.blogspot.com
To prevent Redis from being flushed during warmup (useful for long-running tasks), create this lock file:
dist/locks/redis_warmup_flushdb.lock
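A warmup step that honors this lock file only needs an existence check before flushing. The TypeScript sketch below shows that check. The function name is hypothetical, and it assumes the path is resolved relative to the process working directory (normally the project root). The lock file can be created with touch before a long task and deleted afterwards.

```typescript
import { existsSync } from "node:fs";

// Lock file path from the README, resolved relative to the current working directory.
const WARMUP_FLUSH_LOCK = "dist/locks/redis_warmup_flushdb.lock";

// Hypothetical warmup guard: Redis is flushed during warmup only while the
// lock file is absent; creating the file skips the flush.
function shouldFlushRedisOnWarmup(lockPath: string = WARMUP_FLUSH_LOCK): boolean {
  return !existsSync(lockPath);
}

if (shouldFlushRedisOnWarmup()) {
  console.log("no lock file: warmup would run FLUSHDB");
} else {
  console.log("lock file present: keeping Redis data");
}
```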
How scraping works:
- Running scrapper tasks such as refreshLatest or refreshSingle triggers fetching new records into the scrapper_metadata table. All of these functions live in ServiceModule -> ScrapperService. After successfully fetching a page of scraped content, ScrapperService creates a new background job, stored in Redis, that runs the database and book matchers.
- Each job is later executed, and MetadataDbLoaderService tries to match the book in the database and saves it.
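The flow above can be sketched in TypeScript with in-memory stand-ins: an array instead of the scrapper_metadata table, a plain queue instead of Redis jobs, and a loader function instead of MetadataDbLoaderService. The record shapes, matching by normalized title, and creating a new book when nothing matches are assumptions made for illustration only.

```typescript
// In-memory stand-ins for the scrapper_metadata table, the Redis-backed job
// queue and MetadataDbLoaderService. All shapes here are hypothetical.
interface ScrapperMetadata {
  remoteId: string;
  website: string;
  title: string; // hypothetical field used for matching
}

interface Book {
  id: number;
  title: string;
}

const scrapperMetadataTable: ScrapperMetadata[] = [];
const jobQueue: ScrapperMetadata[][] = []; // one job per fetched page
const books: Book[] = [{ id: 1, title: "Szepty spoza nicości" }];
const bookByRemoteId = new Map<string, number>(); // remoteId -> book ID

// Step 1 (ScrapperService): store the fetched page, then enqueue a background job.
function onPageFetched(page: ScrapperMetadata[]): void {
  scrapperMetadataTable.push(...page);
  jobQueue.push(page);
}

const normalizeTitle = (title: string): string => title.trim().toLowerCase();

// Step 2 (loader): match a record to an existing book by normalized title,
// or save it as a new book when nothing matches (assumption).
function loadMetadata(record: ScrapperMetadata): number {
  let book = books.find((b) => normalizeTitle(b.title) === normalizeTitle(record.title));
  if (book === undefined) {
    book = { id: books.length + 1, title: record.title.trim() };
    books.push(book);
  }
  bookByRemoteId.set(record.remoteId, book.id);
  return book.id;
}

// Worker loop: drain the queue, running the loader on every record of each job.
function runPendingJobs(): void {
  let job: ScrapperMetadata[] | undefined;
  while ((job = jobQueue.shift()) !== undefined) {
    job.forEach(loadMetadata);
  }
}

onPageFetched([
  { remoteId: "123", website: "wykop.pl", title: "  SZEPTY spoza nicości " },
  { remoteId: "124", website: "wykop.pl", title: "Inna książka" },
]);
runPendingJobs();
console.log(bookByRemoteId); // Map(2) { '123' => 1, '124' => 2 }
```

Splitting fetching from matching means a slow or failing matcher never blocks the scraper. Fetched pages simply wait in the queue until a worker picks them up.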
Adding a new scraper:
- Create the scraper file:
cd ./src/server/modules/importer/sites/
mkdir example-scrapper/
touch example-scrapper/ExampleScrapperGroup.ts
- Assign the scraper to the scrappersGroups variable inside ScrapperService
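A rough TypeScript skeleton for the two steps above follows. The ScrapperGroup interface, its id/website/fetchLatest members and the ID value are hypothetical. The project's real base class and method names may differ, so check an existing group in src/server/modules/importer/sites/ before copying anything. The numeric ID mirrors the --scrapperGroupId flag used by fetch-availability.

```typescript
// Hypothetical contract; the project's real scrapper group base class may differ.
interface ScrapperGroup {
  readonly id: number;              // hypothetical numeric group ID
  readonly website: string;         // e.g. "example.com"
  fetchLatest(): Promise<string[]>; // hypothetical: resolves to remote IDs
}

// Would live in src/server/modules/importer/sites/example-scrapper/ExampleScrapperGroup.ts
export class ExampleScrapperGroup implements ScrapperGroup {
  readonly id = 999; // hypothetical value
  readonly website = "example.com";

  async fetchLatest(): Promise<string[]> {
    // A real scraper would download and parse the remote listing page here.
    return ["example-remote-id"];
  }
}

// Inside ScrapperService: register the group so the gulp tasks can find it.
const scrappersGroups: ScrapperGroup[] = [
  // ...existing groups
  new ExampleScrapperGroup(),
];

const registered = scrappersGroups.find((group) => group.website === "example.com");
console.log(registered?.id); // 999
```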
A real-world NestJS + TypeORM app. Stack:
- Node.js
- NestJS
- TypeORM
- React
- nginx
- Nomad