If you're running in production, you should set these securely.
However, if you just want to experiment, set the following values
These are all Django settings, defined in history4feed/settings.py
DJANGO_SECRET
:insecure_django_secret
DJANGO_DEBUG
:True
DJANGO_ALLOWED_HOSTS
: BLANKDJANGO_CORS_ALLOW_ALL_ORIGINS
:True
DJANGO_CORS_ALLOWED_ORIGINS
: LEAVE EMPTY
These are all Django settings, defined in history4feed/settings.py
POSTGRES_HOST
:pgdb
POSTGRES_PORT
: BLANKPOSTGRES_DB
:postgres
POSTGRES_USER
:postgres
POSTGRES_PASSWORD
:postgres
CELERY_BROKER_CONNECTION_RETRY_ON_STARTUP
:1
These define how the API behaves.
MAX_PAGE_SIZE
:50
- This is the maximum number of results the API will ever return before pagination
DEFAULT_PAGE_SIZE
:50
- The default page size of result returned by the API
Search index mode, uses the Serper API Key to scrape search results.
SERPER_API_KEY
EARLIEST_SEARCH_DATE
:2020-01-01T00:00:00Z
- determines how far history4feed will backfill posts for newly added feeds. e.g.
EARLIEST_SEARCH_DATE=2020-01-01T00:00:00Z
will import all posts with a publish date >=2020-01-01T00:00:00Z
- determines how far history4feed will backfill posts for newly added feeds. e.g.
SCRAPFILE_APIKEY
: YOUR_API_KEY- We strongly recommend using the ScrapFly proxy service with history4feed. Though we have no affiliation with them, it is the best proxy service we've tested and thus built in support for it to history4feed.
If you're not using a Proxy it is very likely you'll run into rate limits on the WayBack Machine and the blogs you're requesting the full text from. You should therefore consider the following options
WAYBACK_SLEEP_SECONDS
:45
- This is useful when a large amount of posts are returned. This sets the time between each request to get the full text of the article to reduce servers blocking robotic requests.
REQUEST_RETRY_COUNT
:3
- This is useful when a large amount of posts are returned. This sets the number of retries when a non-200 response is returned.