The manager is the second element of the chain (starting from the bottom). Its task is to receive a file path on standard input, split the file into multiple units of work, and assign them to workers. A user or another component communicates directives to the manager in this specific form:
- a string that gives the path of the file
- a number that gives how many workers the manager must handle
The manager is essentially composed of two threads. One reads new directives from standard input; the other communicates with the workers.
The task of the first thread, which reads directives, is very simple:
- read from standard input the path
- read from standard input the number of workers
- post-process the input
- enqueue the new directive in a pending list
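The steps above can be sketched as follows. This is only an illustration, not the project's actual code: the `Directive` and `PendingList` types, the function names, and the assumption that the path and the worker count each arrive on their own line are all hypothetical.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PATH_MAX_LEN 4096 /* assumed upper bound for a path line */

/* Hypothetical directive: a file path plus the requested worker count. */
typedef struct Directive {
    char path[PATH_MAX_LEN];
    int workers;
    struct Directive *next;
} Directive;

/* Hypothetical pending list: a singly linked FIFO queue. */
typedef struct {
    Directive *head;
    Directive *tail;
} PendingList;

/* Read one directive (a path line, then a worker-count line) from `in`.
 * Returns 0 on success, -1 on end of input or malformed data. */
int read_directive(FILE *in, Directive *d) {
    char number[32];
    char *end;
    long n;

    if (fgets(d->path, sizeof d->path, in) == NULL) return -1;
    if (fgets(number, sizeof number, in) == NULL) return -1;

    /* Post-processing: strip the trailing newline, validate the count. */
    d->path[strcspn(d->path, "\n")] = '\0';
    n = strtol(number, &end, 10);
    if (end == number || n <= 0 || n > INT_MAX || d->path[0] == '\0') {
        return -1;
    }

    d->workers = (int)n;
    d->next = NULL;
    return 0;
}

/* Append a copy of `d` to the tail of the pending list.
 * Returns 0 on success, -1 if memory allocation fails. */
int enqueue_directive(PendingList *list, const Directive *d) {
    Directive *node = malloc(sizeof *node);
    if (node == NULL) return -1;
    *node = *d;
    node->next = NULL;
    if (list->tail != NULL) {
        list->tail->next = node;
    } else {
        list->head = node;
    }
    list->tail = node;
    return 0;
}
```

Because the pending list is shared between the two threads, a real implementation would presumably protect it with a lock (for example a `pthread_mutex_t`); that part is left out of the sketch.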
The task of the second thread, which communicates with the workers, consists of a few sequential steps:
- check if new directives were added
- assign work to a pending worker
- read the workers' work
- send a summary to standard output
If a new directive was added, the size of the file is determined, and the total size is then split into as many units of work as there are workers at that moment.
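A minimal sketch of such a split, assuming each unit of work is a contiguous byte range and that any remainder is spread over the first ranges. The `Work` type and `split_work` function are hypothetical names, and the source does not say how the remainder is actually handled.

```c
#include <stddef.h>

/* Hypothetical unit of work: the byte range [start, end) of the file. */
typedef struct {
    long start; /* offset of the first byte */
    long end;   /* offset one past the last byte */
} Work;

/* Split [0, file_size) into `workers` contiguous ranges written to `out`.
 * The first (file_size % workers) ranges are one byte longer than the rest.
 * Returns the number of ranges written, or 0 if the arguments are invalid. */
size_t split_work(long file_size, size_t workers, Work *out) {
    long base, extra, offset = 0;
    size_t i;

    if (file_size < 0 || workers == 0 || out == NULL) return 0;

    base = file_size / (long)workers;
    extra = file_size % (long)workers;

    for (i = 0; i < workers; i++) {
        long len = base + ((long)i < extra ? 1 : 0);
        out[i].start = offset;
        out[i].end = offset + len;
        offset += len;
    }
    return workers;
}
```

For a 10-byte file and four workers, this yields the ranges [0, 3), [3, 6), [6, 8) and [8, 10).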
If the number of workers changes during execution, all work in progress is invalidated and split again into multiple units of work. This is done for the sake of efficiency and parallelism.
Each worker's work is read once per cycle. For efficiency, the manager tries to read everything a worker has produced in a single pass. After sending all of its work, the worker sends a control word to signal whether everything went well. If "done" is received, the manager marks the worker's work as ended; otherwise the work is moved back to the to-do list.
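The read-and-acknowledge cycle might look like the sketch below. It assumes the manager reads from a non-blocking pipe and already has the control word isolated as a string; how the word is delimited on the wire is not described here. `drain_fd`, `apply_control_word` and the `WorkState` values are hypothetical names.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical states of a unit of work inside the manager. */
typedef enum { WORK_TODO, WORK_ASSIGNED, WORK_ENDED } WorkState;

/* Read everything currently available from `fd` and append it to `buf`
 * (capacity `cap`, current fill level `*len`). Stops at end of file, when
 * the buffer is full, or when a non-blocking read would block.
 * Returns the number of bytes appended, or -1 on a real read error. */
ssize_t drain_fd(int fd, char *buf, size_t cap, size_t *len) {
    ssize_t total = 0;

    while (*len < cap) {
        ssize_t n = read(fd, buf + *len, cap - *len);
        if (n > 0) {
            *len += (size_t)n;
            total += n;
            continue;
        }
        if (n == 0) break;            /* end of file */
        if (errno == EINTR) continue; /* interrupted, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK) break; /* no data now */
        return -1;
    }
    return total;
}

/* Decide the fate of a unit of work from the worker's control word:
 * "done" marks it as ended, anything else sends it back to the to-do list. */
WorkState apply_control_word(const char *word) {
    return strcmp(word, "done") == 0 ? WORK_ENDED : WORK_TODO;
}
```

Looping until the read would block is one way to take everything a worker has produced in a single pass, without stalling the cycle for the other workers.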
By default a manager spawns four workers. The spawn process consists of:
- create two pipes (one for read, another for write)
- set pipes as non blocking
- fork a child
- redirect the child's standard input and output to the pipes
- replace the child's code by executing the worker binary
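The spawn sequence above can be sketched with standard POSIX calls. The `Worker` struct, the `spawn_worker` name, and the `worker_path` parameter are hypothetical. The sketch also assumes that only the manager's ends of the pipes are made non-blocking, which the steps above do not specify.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle the manager keeps for each spawned worker. */
typedef struct {
    pid_t pid;
    int to_worker;   /* manager writes here, worker reads on stdin */
    int from_worker; /* manager reads here, worker writes on stdout */
} Worker;

/* Add O_NONBLOCK to the flags of `fd`. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Spawn the binary at `worker_path` with its stdin and stdout connected
 * to the manager through two pipes. Returns 0 on success, -1 on error. */
int spawn_worker(const char *worker_path, Worker *w) {
    int to_child[2], from_child[2];
    pid_t pid;

    /* 1. Create two pipes: one per direction. */
    if (pipe(to_child) == -1) return -1;
    if (pipe(from_child) == -1) goto fail_first;

    /* 2. Make the manager's ends non-blocking (assumption: only these). */
    if (set_nonblocking(to_child[1]) == -1) goto fail_both;
    if (set_nonblocking(from_child[0]) == -1) goto fail_both;

    /* 3. Fork a child. */
    pid = fork();
    if (pid == -1) goto fail_both;

    if (pid == 0) {
        /* 4. Child: redirect standard input and output to the pipes. */
        if (dup2(to_child[0], STDIN_FILENO) == -1) _exit(127);
        if (dup2(from_child[1], STDOUT_FILENO) == -1) _exit(127);
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        /* 5. Replace the child's code with the worker binary. */
        execl(worker_path, worker_path, (char *)NULL);
        _exit(127); /* reached only if execl failed */
    }

    /* Manager: close the child's ends and remember the worker. */
    close(to_child[0]);
    close(from_child[1]);
    w->pid = pid;
    w->to_worker = to_child[1];
    w->from_worker = from_child[0];
    return 0;

fail_both:
    close(from_child[0]);
    close(from_child[1]);
fail_first:
    close(to_child[0]);
    close(to_child[1]);
    return -1;
}
```

Closing the unused pipe ends in both processes matters: otherwise the worker never sees end of file on its standard input when the manager closes its write end.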
After the spawn process, the manager saves the process ID of the spawned worker. This PID is used to check whether the worker has died for any reason. If so, the worker's work is moved back to the to-do list and a new worker is spawned.
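One common way to perform such a check is a non-blocking `waitpid`. Whether the manager uses this call or another mechanism is an assumption, and `worker_is_dead` is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>

/* Check, without blocking, whether the worker with the given PID has
 * terminated. Returns 1 if it has (the child is reaped as a side effect),
 * 0 if it is still running, and -1 on error (for example, if the PID is
 * not a child of the caller or has already been reaped). */
int worker_is_dead(pid_t pid) {
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);

    if (r == 0) return 0;   /* child exists and has not changed state */
    if (r == pid) return 1; /* child exited or was killed by a signal */
    return -1;
}
```

When this returns 1, the manager would move the worker's work back to the to-do list and spawn a replacement.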
Under very heavy load the manager can appear slow and resource-hungry. The bottleneck is the size limit of the pipe, not the code itself. To improve throughput, edit the file that contains the pipe size limit.