Add support for 'language' field in scraped recipes #144

Merged: 26 commits, Apr 29, 2020
94d6544
Allow fallback from schema.org data to abstract scraper functions
jayaddison Apr 27, 2020
79c7573
Implement schema.org, HTML tag, and META tag language scraping
jayaddison Apr 27, 2020
1467f2d
Fixup: remove duplicate space
jayaddison Apr 27, 2020
fc223a0
Add test coverage for schema.org inLanguage field
jayaddison Apr 27, 2020
9541cc4
Only accept first language from meta http-equiv content-language
jayaddison Apr 27, 2020
7e7784f
Add BCP47 validation via language-tags library
jayaddison Apr 27, 2020
6b5c0ca
Return formatted language code tags
jayaddison Apr 27, 2020
1d9a1cc
Cleanup
jayaddison Apr 27, 2020
64eb829
Remove 'en' special-casing
jayaddison Apr 27, 2020
3e03be6
Remove 'en' if a more-specific alternative candidate language exists
jayaddison Apr 27, 2020
f86a2ae
Add explanatory comment
jayaddison Apr 27, 2020
b32d18b
Return first element from set iteration (safe for empty sets)
jayaddison Apr 27, 2020
793393a
Refactor schemaorg decorator
jayaddison Apr 28, 2020
b656516
Remove empty newline
jayaddison Apr 28, 2020
18c7e78
Cleanup
jayaddison Apr 28, 2020
96e4cce
Restore support for python3.5
jayaddison Apr 28, 2020
2182f7d
Add missing 'return' statement
jayaddison Apr 28, 2020
a74dc1f
Allow caller to enable meta http-equiv parsing
jayaddison Apr 28, 2020
45a8d71
Merge branch 'master' into language-field
jayaddison Apr 28, 2020
6698e36
Remove kwargs from scrape_me
jayaddison Apr 29, 2020
56214ea
Refactor WebsiteNotImplementedError
jayaddison Apr 29, 2020
4dc845d
wip
jayaddison Apr 29, 2020
39c529b
Introduce experimental 'harvest' method
jayaddison Apr 29, 2020
447e0d7
Argument ordering consistency
jayaddison Apr 29, 2020
666a8d5
Update dependencies in setup.py
jayaddison Apr 29, 2020
51d2352
Merge branch 'master' into language-field
jayaddison Apr 29, 2020
Remove kwargs from scrape_me
jayaddison committed Apr 29, 2020
commit 6698e362efca2213f57cea8419a900fa17790381
4 changes: 2 additions & 2 deletions recipe_scrapers/__init__.py
@@ -138,7 +138,7 @@ class WebsiteNotImplementedError(NotImplementedError):
     pass


-def scrape_me(url_path, **kwargs):
+def scrape_me(url_path):

     host_name = url_path_to_dict(url_path.replace('://www.', '://'))['host']

@@ -149,7 +149,7 @@ def scrape_me(url_path, **kwargs):
             "Website ({}) is not supported".format(host_name)
         )

-    return scraper(url_path, **kwargs)
+    return scraper(url_path)


 __all__ = ['scrape_me']
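
To illustrate the effect of this commit, here is a minimal, self-contained sketch of a host-based dispatcher with the same signature. It is an approximation under stated assumptions, not the library's actual module: `SCRAPERS` and `ExampleScraper` are hypothetical stand-ins, and `urllib.parse.urlparse` is used in place of the project's `url_path_to_dict` helper.

```python
from urllib.parse import urlparse


class WebsiteNotImplementedError(NotImplementedError):
    pass


class ExampleScraper:
    """Hypothetical scraper; a real one would fetch and parse the page."""

    def __init__(self, url):
        self.url = url


# Hypothetical registry mapping host names to scraper classes.
SCRAPERS = {"example.com": ExampleScraper}


def scrape_me(url_path):
    # Strip a leading 'www.' so both host variants map to the same scraper.
    host_name = urlparse(url_path.replace("://www.", "://")).netloc

    try:
        scraper = SCRAPERS[host_name]
    except KeyError:
        raise WebsiteNotImplementedError(
            "Website ({}) is not supported".format(host_name)
        )

    # No **kwargs are forwarded: the scraper receives only the URL.
    return scraper(url_path)
```

With this signature, passing an extra keyword argument to `scrape_me` raises `TypeError` rather than being forwarded silently to the scraper.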