Environmental file info

If you're running in production, you should set these securely.

However, if you just want to experiment, set the following values:

Django Settings

These are all Django settings, defined in history4feed/settings.py

  • DJANGO_SECRET: insecure_django_secret
  • DJANGO_DEBUG: True
  • DJANGO_ALLOWED_HOSTS: (leave blank)
  • DJANGO_CORS_ALLOW_ALL_ORIGINS: True
  • DJANGO_CORS_ALLOWED_ORIGINS: (leave blank)
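
As an illustration of how values like these are typically interpreted, here is a minimal sketch that turns the experimental values above into Python settings. The helper names `env_bool` and `env_list`, the set of accepted truthy strings, and the comma-separated list format are assumptions made for this example; the actual parsing in history4feed/settings.py may differ.

```python
import os


def env_bool(name, default=False, environ=os.environ):
    """Hypothetical helper: treat strings such as 'True' or '1' as True."""
    raw = environ.get(name, "").strip()
    if raw == "":
        return default
    return raw.lower() in {"1", "true", "yes", "on"}


def env_list(name, environ=os.environ):
    """Hypothetical helper: split a comma-separated value; blank gives []."""
    raw = environ.get(name, "")
    return [item.strip() for item in raw.split(",") if item.strip()]


# The experimental values from the list above, held in a plain dict so the
# sketch runs without touching the real process environment.
example_env = {
    "DJANGO_SECRET": "insecure_django_secret",
    "DJANGO_DEBUG": "True",
    "DJANGO_ALLOWED_HOSTS": "",
    "DJANGO_CORS_ALLOW_ALL_ORIGINS": "True",
    "DJANGO_CORS_ALLOWED_ORIGINS": "",
}

SECRET_KEY = example_env["DJANGO_SECRET"]
DEBUG = env_bool("DJANGO_DEBUG", environ=example_env)
ALLOWED_HOSTS = env_list("DJANGO_ALLOWED_HOSTS", environ=example_env)
CORS_ALLOW_ALL_ORIGINS = env_bool("DJANGO_CORS_ALLOW_ALL_ORIGINS", environ=example_env)
CORS_ALLOWED_ORIGINS = env_list("DJANGO_CORS_ALLOWED_ORIGINS", environ=example_env)
```

With these values, debug mode is on and both list settings are empty, which is convenient for local experiments but not appropriate for production.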

Postgres Settings

These configure the Postgres connection and are also defined in history4feed/settings.py

  • POSTGRES_HOST: pgdb
  • POSTGRES_PORT: (leave blank)
  • POSTGRES_DB: postgres
  • POSTGRES_USER: postgres
  • POSTGRES_PASSWORD: postgres
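
To show how the Postgres values could map onto a Django database configuration, here is a minimal sketch. The function name `build_databases` is hypothetical and the exact structure used in history4feed/settings.py is an assumption; the dictionary keys and the `django.db.backends.postgresql` engine are standard Django. In Django, an empty `PORT` string means the default port is used.

```python
def build_databases(environ):
    """Hypothetical helper: build a Django DATABASES dict from env values."""
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": environ.get("POSTGRES_DB", ""),
            "USER": environ.get("POSTGRES_USER", ""),
            "PASSWORD": environ.get("POSTGRES_PASSWORD", ""),
            "HOST": environ.get("POSTGRES_HOST", ""),
            # An empty string tells Django to use the default port.
            "PORT": environ.get("POSTGRES_PORT", ""),
        }
    }


# The experimental values from the list above.
DATABASES = build_databases(
    {
        "POSTGRES_HOST": "pgdb",
        "POSTGRES_PORT": "",
        "POSTGRES_DB": "postgres",
        "POSTGRES_USER": "postgres",
        "POSTGRES_PASSWORD": "postgres",
    }
)
```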

Celery settings

  • CELERY_BROKER_CONNECTION_RETRY_ON_STARTUP: 1

history4feed API settings

These define how the API behaves.

  • MAX_PAGE_SIZE: 50
    • This is the maximum number of results the API will ever return before pagination
  • DEFAULT_PAGE_SIZE: 50
    • The default number of results returned per page by the API
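
The relationship between the two page size settings can be sketched as follows. This is an illustration only: the function name `effective_page_size` is hypothetical, and clamping an oversized request down to the maximum (rather than rejecting it) is an assumption about how the API behaves.

```python
def effective_page_size(requested, default_size=50, max_size=50):
    """Hypothetical helper: pick the page size for one API request.

    default_size stands in for DEFAULT_PAGE_SIZE and max_size for
    MAX_PAGE_SIZE.
    """
    if requested is None:
        # The client did not ask for a page size, so use the default.
        return default_size
    # Never return more than the maximum, and never fewer than one result.
    return max(1, min(requested, max_size))


no_preference = effective_page_size(None)
small_page = effective_page_size(10)
oversized = effective_page_size(500)
```

Under these assumptions, a request without a page size gets 50 results per page, a request for 10 gets 10, and a request for 500 is capped at 50.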

Search Index Mode (Serper)

Search index mode uses the Serper API key to scrape search results.

Scrape backfill settings

  • EARLIEST_SEARCH_DATE: 2020-01-01T00:00:00Z
    • Determines how far back history4feed will backfill posts for newly added feeds, e.g. EARLIEST_SEARCH_DATE=2020-01-01T00:00:00Z will import all posts with a publish date >= 2020-01-01T00:00:00Z
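
The `>=` comparison described above can be sketched like this. The helper `parse_utc` and the sample posts are invented for illustration; how history4feed parses and applies the date internally is not shown here.

```python
from datetime import datetime, timezone


def parse_utc(value):
    """Hypothetical helper: parse a timestamp like 2020-01-01T00:00:00Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


earliest = parse_utc("2020-01-01T00:00:00Z")  # EARLIEST_SEARCH_DATE

# Made-up posts as (title, publish date) pairs.
posts = [
    ("older post", parse_utc("2019-12-31T23:59:59Z")),
    ("boundary post", parse_utc("2020-01-01T00:00:00Z")),
    ("newer post", parse_utc("2021-06-15T08:30:00Z")),
]

# Keep only posts published on or after the earliest search date.
imported = [title for title, published in posts if published >= earliest]
```

Because the comparison is inclusive, a post published exactly at the configured timestamp is imported, while one published a second earlier is not.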

Proxy settings

  • SCRAPFILE_APIKEY: YOUR_API_KEY
    • We strongly recommend using the Scrapfly proxy service with history4feed. Though we have no affiliation with them, it is the best proxy service we've tested, so we have built support for it into history4feed.

Settings to avoid rate limits if not using Scrapfly

If you're not using a proxy, it is very likely you'll run into rate limits on the Wayback Machine and on the blogs you're requesting the full text from. You should therefore consider the following options:

  • WAYBACK_SLEEP_SECONDS: 45
    • This is useful when a large number of posts are returned. It sets the time, in seconds, between each request for the full text of an article, which reduces the chance of servers blocking automated requests.
  • REQUEST_RETRY_COUNT: 3
    • This is useful when a large number of posts are returned. It sets the number of retries when a non-200 response is returned.
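
How a pause between requests and a retry count can work together is sketched below. This is not history4feed's implementation: the function `fetch_all`, the stand-in `fake_fetch`, and the example URLs are hypothetical, and pausing before retries as well as between posts is an assumption.

```python
import time


def fetch_all(urls, fetch, sleep_seconds=45, retry_count=3, sleep=time.sleep):
    """Hypothetical helper: fetch each URL, pausing between requests.

    sleep_seconds stands in for WAYBACK_SLEEP_SECONDS and retry_count for
    REQUEST_RETRY_COUNT. fetch(url) must return (status_code, body).
    """
    results = {}
    first_request = True
    for url in urls:
        status, body = None, None
        # One initial attempt plus up to retry_count retries.
        for _attempt in range(1 + retry_count):
            if not first_request:
                sleep(sleep_seconds)  # pause before every request but the first
            first_request = False
            status, body = fetch(url)
            if status == 200:
                break
        results[url] = (status, body)
    return results


# Stand-in for a real HTTP client: the first URL fails twice, then succeeds.
responses = {
    "https://example.com/a": [503, 503, 200],
    "https://example.com/b": [200],
}
pauses = []


def fake_fetch(url):
    status = responses[url].pop(0)
    body = "full text" if status == 200 else ""
    return status, body


# Record the pauses in a list instead of actually sleeping.
results = fetch_all(list(responses), fake_fetch, sleep=pauses.append)
```

Passing `sleep=pauses.append` keeps the demonstration instant; with the default `time.sleep`, each of those pauses would last 45 seconds.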