# TODO

- Job features

  - Implement `Job.uniq_by/n`: when this callback is defined, there should be at most one pending job per key at any time (think subscribed services). We might want to implement this together with the `group_by` feature, with something like `group_by(key), do: key` plus `uniq, do: true` (see the sketch after this list).
  - Job retries
    - Change the WAL to retry jobs after reading all events, rather than simply failing them
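
  A rough sketch of how the `uniq_by` callback could look; `JobQueue.Job` and the argument shape are placeholders, not this library's actual API:

  ```elixir
  defmodule MyApp.RefreshSubscriptionJob do
    use JobQueue.Job  # placeholder for this library's job module

    # Hypothetical optional callback: when defined, at most one pending
    # job may exist per returned key (e.g. one refresh per service).
    def uniq_by(args), do: args.service_id

    def perform(args) do
      # ... talk to the subscribed service ...
      :ok
    end
  end
  ```
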
- Add a way to import jobs from Redis (Sidekiq, Resque, etc.), probably in a separate mix package.

- Add a way to wait for a job to finish; once we have this, get rid of all the `:timer.sleep` calls in tests.
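
  One possible shape for that API; the function names are hypothetical:

  ```elixir
  # Enqueue returns a reference that tests (and callers) can block on,
  # replacing the `:timer.sleep/1` calls we use today.
  {:ok, ref} = JobQueue.enqueue(MyApp.RefreshSubscriptionJob, %{service_id: 42})
  :ok = JobQueue.await(ref, _timeout = 5_000)
  ```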

- Rotate `WAL.State` logs

- Control back-pressure

  - Be able to run in an open loop (no back pressure; normal job behaviour)
  - Be able to apply back pressure: jobs take longer to enqueue, preventing the system from overflowing (see the sketch below).
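
  A hypothetical configuration sketch; the option names are assumptions, not an existing API:

  ```elixir
  import Config

  config :job_queue, MyApp.Pipeline,
    # :open_loop   -> enqueue never blocks (current behaviour)
    # :closed_loop -> enqueue slows down once max_pending is reached,
    #                 pushing back on producers instead of overflowing
    back_pressure: :closed_loop,
    max_pending: 10_000
  ```
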
- Refactor:

  - `GroupedQueue` is currently a mess, implemented during a spike just to make the tests pass.
    - Make the code cleaner
    - Measure and improve performance: we have linear-time operations all over the place, and we are probably putting a lot of pressure on the GC as well.
  - The queue interface should include a way to access the processed and failed counts, so that we can implement them differently for different queues (currently each is just a field in the struct); see the behaviour sketch after this list.
    - Maybe even move all queue metrics into their own structure with increment and getter functions?
    - Or implement it by listening to WAL events.
  - Start pipelines automatically using metaprogramming
  - Implement `TestJob` in a helper that can be reused in every test to reduce duplication.
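
  A sketch of what the queue behaviour could look like with the counts behind the interface; the module and callback names are assumptions:

  ```elixir
  defmodule JobQueue.Queue do
    @callback enqueue(queue :: term, job :: term) :: term
    @callback dequeue(queue :: term) :: {job :: term, queue :: term} | :empty

    # Counts move behind the interface so each queue implementation
    # (or a WAL-event listener) can track them differently.
    @callback processed_count(queue :: term) :: non_neg_integer
    @callback failed_count(queue :: term) :: non_neg_integer
    @callback increment(queue :: term, :processed | :failed) :: term
  end
  ```
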
- WAL

  - Configurable buffer size (`0` = consistent; anything larger = faster but prone to inconsistency); see the config sketch after this list.
  - Allow `WAL.File` to write to the log asynchronously (behind a config flag). This gets us another 20-50% performance improvement, at the cost of durability and at-least-once guarantees.
    - Maybe write appends synchronously, but all other events asynchronously?
    - Make the WAL payload smaller
  - Snapshots
    - Run from memory; when shutting down, write a snapshot that includes the current state.
  - Reduce the size of each WAL event by serializing smaller payloads
  - Move to one WAL per pipeline, rather than one WAL for everything!
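
  A sketch of the WAL tuning options discussed above; the option names are assumptions:

  ```elixir
  import Config

  config :job_queue, JobQueue.WAL,
    # 0 flushes every event (consistent); larger values batch writes
    # (faster, but the buffered tail can be lost on a crash).
    buffer_size: 0,
    # When true, appends stay synchronous but other event types are
    # written asynchronously, trading at-least-once guarantees for speed.
    async_writes: false
  ```
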
- Support distributed workers

  - If we separate the Queue from the Source, we can have one machine with all the queues and move the worker pipelines to different machines. This seems like the most viable option.
  - First thought: implement a leader + followers setup
  - Raft or similar for leader election
  - Do we want to optimize for speed or for safety?
    - i.e., strong or eventual consistency?
    - Eventual consistency prevents us from guaranteeing at-least-once delivery if a node fails.
    - Strong consistency might be too slow, especially if clusters are located in different zones.
- Support deploys where the store runs in a separate process. This would help deploys, since we wouldn't need to pay the price of writing a snapshot and then reloading everything. It would also be the first step toward supporting distributed workers, because we'd already be able to run the store separately (see the sketch below).
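
  A minimal sketch of that separation, assuming the store is a GenServer; `JobQueue.Store` and the `:global` name are assumptions:

  ```elixir
  # On the long-lived "store" node: register the store globally so it
  # survives worker deploys without a snapshot/reload cycle.
  {:ok, _pid} = GenServer.start_link(JobQueue.Store, [], name: {:global, :job_store})

  # On a worker node (after Node.connect/1): talk to the same store.
  GenServer.call({:global, :job_store}, {:dequeue, :default})
  ```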