Reduce number of Puma processes and threads
Using a simple Suspenders application, I profiled memory usage for our default configuration, as well as a few others. I found the following:

* A Puma cluster uses a master process and multiple worker processes. The amount of memory used by a cluster is equal to the memory usage of the master process plus the possible bloated size of a worker process times the number of worker processes.
* At boot, a simple Suspenders application uses about 117MB for the master process and 109MB for each worker.
* After the first request is served, a process increases to around 117MB, like the master process.
* The amount of potential bloat increases with each thread, because it's possible for every thread to be handling a bloated request at once.
* Using [siege], I determined that the expected bloat in a simple scenario is around 10MB per thread. This will be much worse in some applications.

This provides the following formula for maximum memory usage under load:

    master_usage + worker_count * (worker_usage + bloat * thread_count)

For this simple Suspenders application, this formula provides the following worst-case usage:

    117 + 3 * (117 + 10 * 5) = 618

This is over the 512MB limit for a 1x Heroku dyno, and the application is very simple.

I recommend changing to a default of two worker processes and two threads per dyno, changing the usage to:

    117 + 2 * (117 + 10 * 2) = 391

This provides reasonable performance with a high memory ceiling. When applications begin to show troublesome performance characteristics under load, developers can tune the application's process and thread count according to its real-world memory usage, possibly upgrading the dyno size as appropriate.

[siege]: https://www.joedog.org/siege-home/
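
To try the formula against other process and thread counts, it can be sketched as a small Ruby helper. This is only an illustration: the method name `max_memory_mb`, its keyword arguments, and the `DYNO_LIMIT_MB` constant are hypothetical names chosen here, and the inputs are the measurements quoted above for this one simple application, not values that will hold for other apps.

```ruby
# Worst-case memory estimate (in MB) for a Puma cluster, using the formula
# from this commit message. All names here are illustrative assumptions.
def max_memory_mb(master_usage:, worker_usage:, bloat:, worker_count:, thread_count:)
  master_usage + worker_count * (worker_usage + bloat * thread_count)
end

# Memory limit of a 1x Heroku dyno, as quoted in the commit message.
DYNO_LIMIT_MB = 512

# The previous default (3 workers, 5 threads) and the proposed default
# (2 workers, 2 threads).
configs = {
  "3 workers, 5 threads" => { worker_count: 3, thread_count: 5 },
  "2 workers, 2 threads" => { worker_count: 2, thread_count: 2 }
}

configs.each do |label, counts|
  # 117 MB per process and 10 MB of bloat per thread are the measured
  # values for the simple Suspenders application only.
  total = max_memory_mb(master_usage: 117, worker_usage: 117, bloat: 10, **counts)
  status = total > DYNO_LIMIT_MB ? "over" : "under"
  puts "#{label}: #{total} MB (#{status} the #{DYNO_LIMIT_MB} MB limit)"
end
```

Replacing the measured values with numbers profiled from a real application gives a rough ceiling to compare against the dyno size before raising the worker or thread count.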