Technical Tips: More on OpenMP

Title - More on OpenMP

Details - OpenMP is an API (a set of compiler directives, library routines and environment variables) for multithreading, a method of parallelization whereby the master "thread" (a series of instructions executed consecutively) "forks" a specified number of slave "threads" and divides a task among them. The threads then run concurrently, with the runtime environment allocating threads to different processors.

The section of code that is meant to run in parallel is marked with a compiler directive (in C/C++, the preprocessor directive #pragma omp parallel), which causes the threads to be forked before the section is executed. Each thread has an integer ID, which can be obtained by calling omp_get_thread_num(); the master thread has ID 0. After the parallelized code has executed, the threads "join" back into the master thread, which continues onward to the end of the program.
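A minimal fork-join sketch in C, assuming the file is compiled with the compiler's OpenMP switch (for example -fopenmp for GCC). The function name fork_join_ids and its array argument are hypothetical, chosen for illustration; #pragma omp parallel, the num_threads clause, omp_get_thread_num() and omp_get_num_threads() are standard OpenMP. The #ifdef _OPENMP guards let the same file also compile serially, in which case only the master thread (ID 0) runs.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: forks a team of up to `requested` threads, has each
   thread record its ID in seen[id], and returns the team size used.
   `seen` must have room for at least `requested` entries. */
int fork_join_ids(int requested, int *seen)
{
    assert(requested > 0);
    int team_size = 1;                     /* serial fallback */
    for (int i = 0; i < requested; i++)
        seen[i] = -1;                      /* -1 means "no thread had this ID" */

    /* Fork: the master thread creates the team at this directive. */
#pragma omp parallel num_threads(requested)
    {
        int id = 0;                        /* the only thread in a serial build */
#ifdef _OPENMP
        id = omp_get_thread_num();         /* 0 for the master thread */
        if (id == 0)
            team_size = omp_get_num_threads();
#endif
        seen[id] = id;                     /* each thread writes only its own slot */
        printf("hello from thread %d\n", id);
    }
    /* Join: implicit barrier; only the master thread continues past here. */

    return team_size;
}
```

Calling fork_join_ids(4, ids) from main typically prints four "hello" lines in an unpredictable order, since the threads run concurrently; the line for thread 0 comes from the master thread.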

By default, each thread executes the parallelized section of code independently. "Work-sharing constructs" (such as the for construct for loops and the sections construct for independent blocks of code) can be used to divide a task among the threads so that each thread executes its allocated part of the code. Both task parallelism and data parallelism can be achieved using OpenMP in this way.
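Both styles of work sharing can be sketched in C as follows. The function names are hypothetical; #pragma omp parallel for and #pragma omp parallel sections are the standard OpenMP combined constructs. The loop indices are kept as signed integers because older OpenMP versions (such as 2.0, the level supported by Visual C++) require a signed loop variable.

```c
#include <assert.h>

/* Data parallelism: the `for` work-sharing construct divides the loop
   iterations among the threads of the team. */
void scale(double *x, long n, double factor)
{
#pragma omp parallel for
    for (long i = 0; i < n; i++)
        x[i] *= factor;
}

/* Task parallelism: the `sections` work-sharing construct hands each
   independent section to one thread of the team. */
void min_and_max(const double *x, long n, double *lo, double *hi)
{
    assert(n > 0);
    double mn = x[0], mx = x[0];   /* shared; each section writes only one of them */
#pragma omp parallel sections
    {
#pragma omp section
        for (long i = 1; i < n; i++)
            if (x[i] < mn) mn = x[i];
#pragma omp section
        for (long i = 1; i < n; i++)
            if (x[i] > mx) mx = x[i];
    }
    *lo = mn;
    *hi = mx;
}
```

In scale, every thread runs the same code on a different range of indices; in min_and_max, two different pieces of code run at the same time, each on the whole array.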

The runtime environment allocates threads to processors depending on usage, machine load and other factors. The number of threads can be set through environment variables (such as OMP_NUM_THREADS) or in code by calling library functions (such as omp_set_num_threads()). In C/C++, the OpenMP functions are declared in the header file "omp.h".
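As a sketch of setting the thread count in code, the wrappers below (hypothetical names) call the standard omp_set_num_threads() and omp_get_max_threads() routines from omp.h. Without recompiling, the same program can instead be launched with the environment variable set, for example OMP_NUM_THREADS=4; a call to omp_set_num_threads() takes precedence over that variable, and a num_threads clause on a directive takes precedence over both.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical wrapper: asks the runtime to use n threads in later parallel
   regions that have no num_threads clause. Overrides OMP_NUM_THREADS. */
void request_threads(int n)
{
    assert(n > 0);
#ifdef _OPENMP
    omp_set_num_threads(n);
#else
    (void)n;                        /* serial build: always one thread */
#endif
}

/* Hypothetical wrapper: the team size the runtime will use by default for
   the next parallel region (an upper bound if dynamic adjustment is on). */
int default_team_size(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```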

Implementations

OpenMP has been implemented in many commercial compilers. For instance, Visual C++ 2005, 2008 and 2010 support it (in their Professional, Team System, Premium and Ultimate editions), as does Intel Parallel Studio for various processors.
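These compilers enable OpenMP only when asked to, through a switch such as /openmp in Visual C++ (GCC uses -fopenmp). When it is enabled, the compiler defines the standard _OPENMP macro as the yyyymm date of the specification it supports. The small check below, with hypothetical function names, reports whether a given build actually has OpenMP turned on.

```c
#include <stdio.h>

/* Returns the value of _OPENMP (e.g. 200203 for OpenMP 2.0), or 0 when
   this translation unit was compiled without OpenMP support. */
long openmp_spec_date(void)
{
#ifdef _OPENMP
    return (long)_OPENMP;
#else
    return 0L;
#endif
}

/* Prints a one-line summary of the build's OpenMP status. */
void report_openmp_support(void)
{
    long date = openmp_spec_date();
    if (date == 0)
        printf("compiled without OpenMP: directives are ignored\n");
    else
        printf("OpenMP enabled, specification date %ld\n", date);
}
```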

Pros and cons

Pros

Data layout and decomposition are handled automatically by directives.
Incremental parallelism: one portion of the program can be parallelized at a time, with no dramatic change to the code.
Unified code for both serial and parallel applications: OpenMP directives are ignored (treated like comments) when the code is compiled without OpenMP support.
Original (serial) code statements need not, in general, be modified when parallelized with OpenMP. This reduces the chance of inadvertently introducing bugs.
Both coarse-grained and fine-grained parallelism are possible.
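The incremental, unified-code style can be seen in a serial dot product that becomes parallel by adding a single directive. The function name is hypothetical; reduction(+:sum) is the standard OpenMP clause that gives each thread a private partial sum and combines the partial sums when the threads join.

```c
/* Serial loop parallelized with one pragma; the loop body is unchanged.
   Compiled without OpenMP, the pragma is ignored and the loop runs serially. */
double dot(const double *a, const double *b, long n)
{
    double sum = 0.0;
#pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Because floating-point addition is not associative, the parallel result can differ from the serial one in the last bits for general inputs, since the partial sums are added in a different order.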

Cons

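Risk of introducing difficult-to-debug synchronization bugs and race conditions.
Currently only runs efficiently on shared-memory multiprocessor platforms.
Requires a compiler that supports OpenMP.
Scalability is limited by memory architecture.
No support for compare-and-swap in OpenMP versions before 5.1, which added an atomic compare form.
Reliable error handling is missing.
Lacks fine-grained mechanisms to control thread-processor mapping.
No GPU support in OpenMP versions before 4.0, which added the target constructs for offloading work to accelerators.
High chance of accidentally writing code that suffers from false sharing, where threads repeatedly write to different variables that share a cache line.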
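The race-condition risk can be illustrated with a sketch (hypothetical function names) that counts even values. A plain count++ inside a parallel loop would be a race: two threads can read the same old value and one increment is lost. Both versions below avoid that with standard OpenMP constructs, the atomic directive and the reduction clause.

```c
/* Correct but contended: every increment is an indivisible update of the
   single shared counter, so threads queue up on the same memory location. */
long count_even_atomic(const int *x, long n)
{
    long count = 0;
#pragma omp parallel for
    for (long i = 0; i < n; i++) {
        if (x[i] % 2 == 0) {
#pragma omp atomic
            count++;
        }
    }
    return count;
}

/* Each thread increments a private copy of count; the copies are summed
   when the threads join, so no thread writes to a shared counter. */
long count_even_reduction(const int *x, long n)
{
    long count = 0;
#pragma omp parallel for reduction(+:count)
    for (long i = 0; i < n; i++)
        if (x[i] % 2 == 0)
            count++;
    return count;
}
```

The reduction form is usually the faster of the two, because the threads do not contend for one variable (or one cache line) on every update.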

Performance expectations

One might expect to get an N-times speedup when running a program parallelized using OpenMP on an N-processor platform. However, this is seldom the case, for the following reasons:

N processors in a symmetric multiprocessing (SMP) system may have N times the computation power, but the memory bandwidth usually does not scale up N times. Quite often, the original memory path is shared by multiple processors, and performance degradation may be observed when they compete for the shared memory bandwidth.
Many other common problems affecting the final speedup in parallel computing also apply to OpenMP, like load balancing and synchronization overhead.
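To see how far a particular machine falls short of N-fold, the speedup can be measured directly. The sketch below (hypothetical function names) times a memory-bound summation with omp_get_wtime(), the standard OpenMP wall-clock timer, once with one thread and once with a chosen number of threads; it assumes a C11 compiler for the serial fallback timer. Because the kernel does almost no arithmetic per byte loaded, it is the kind of code where shared memory bandwidth usually limits the speedup, but the actual figure depends entirely on the machine.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock time in seconds; C11 timespec_get() in a serial build. */
static double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

/* Memory-bound kernel: one load and one add per element. */
static double sum_array(const double *x, long n, int threads)
{
    double s = 0.0;
    (void)threads;                          /* unused in a serial build */
#pragma omp parallel for num_threads(threads) reduction(+:s)
    for (long i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Hypothetical helper: returns T(1 thread) / T(threads) for summing x. */
double measured_speedup(const double *x, long n, int threads)
{
    sum_array(x, n, threads);               /* warm-up: starts the thread team */
    double t0 = wall_seconds();
    volatile double s1 = sum_array(x, n, 1);
    double t1 = wall_seconds();
    volatile double sN = sum_array(x, n, threads);
    double t2 = wall_seconds();
    (void)s1;
    (void)sN;
    return (t1 - t0) / (t2 - t1);
}
```

Repeating each timing several times and taking the best or median value gives a steadier figure than a single run.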

Posted By: Binu M D