- Last time: OpenMP nested parallelism, work sharing (for loops, sections)
- Today
- OpenMP: nested parallelism, work sharing (tasks)
- OpenMP: variable scoping, synchronization, loose ends
Ending example
- Pros: Allows parallelization of irregular problems
- Unbounded loops
- Recursive algorithms
- Producer/consumer
- Cons: Tasks are relatively tricky to use correctly & introduce some runtime overhead
- Motivations
- OpenMP was originally tailored to large array-based applications
- For example, for a long time a dynamic list traversal could not be parallelized cleanly in OpenMP; the available workarounds were unsatisfactory:
- Storing pointers to the list elements in an array: high overhead for array construction (not easy to parallelize)
- Using single nowait inside a parallel region: high cost of the single construct. Also, each thread needs to traverse the entire list to determine whether another thread has already processed a given element
- Who does what and when?
- The developer
- Uses a pragma to specify where & what the tasks are
- Ensures that there are no dependencies (that is, tasks can be executed independently)
- The OpenMP runtime system
- Generates a new task whenever a thread encounters a task construct
- Decides when the task executes (immediately or delayed)
- Definition: A task is a specific instance of executable code together with its data environment (the shared & private data the task manipulates) and its ICVs (internal control variables: thread scheduling and environment settings associated with OpenMP)
- Synchronization issues. Solution: use task barriers (#pragma omp barrier, #pragma omp taskwait) to ensure the completion of tasks
- Threads have access to a pool of memory that is shared
- Threads can also have private data
- Basic rule: Any variable declared prior to a parallel region is shared in that parallel region
- The private clause creates, for each thread, a separate copy of every variable listed in it
- Some variables are treated as private by default in OpenMP
- Stack (local) variables in functions called from within parallel regions
- Loop iteration variables
- Automatic variables within a statement block
- When in doubt, always explicitly indicate something to be private
- firstprivate: Specifies that each thread should have its own instance of a variable. Moreover, the variable is initialized with the value it had in the master thread on entry to the region
- Usage: #pragma omp parallel num_threads(4) firstprivate(i)
- lastprivate: The enclosing context's version of the variable is set equal to the private version of whichever thread executes the final iteration of the work-sharing construct (for or sections)
- Data scoping is a common source of errors in OpenMP. It is the programmer's responsibility to make sure data dependencies do not lead to race conditions
- Explicit barrier: #pragma omp barrier
- Implicit barriers: at the end of parallel, for, single, and sections regions
- Unnecessary barriers hurt performance and can be removed with the nowait clause (applicable to for, single, sections)