Argobots 1.1 maintains ABI and API compatibility with Argobots 1.0 while adding several new features, optimizing the core Argobots implementation, and fixing bugs. The following summarizes the changes.
Changes in Argobots 1.1
Tool Interface for Debugging and Profiling
Argobots 1.1 exposes an interface that lets a tool capture internal Argobots events such as thread creation, synchronization, and yielding; it can be used much like MPI's PMPI or OpenMP's OMPT. ABTX_prof is a header-based profiler built on this interface that measures ULT-specific performance metrics such as the average execution time of each ULT, the number of created ULTs, and the number of yield operations. See ABTX_prof for details.
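The callback pattern that such a tool interface relies on can be sketched in plain C. This is a hypothetical, single-threaded sketch: tool_register, runtime_thread_create, runtime_thread_yield, and prof_cb are illustrative names, not the Argobots tool API, whose callback signature is not shown in this section. It only shows how a profiler in the spirit of ABTX_prof could count creation and yield events that a runtime reports through a registered callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds; the real tool interface defines its own. */
enum tool_event { EV_THREAD_CREATE, EV_THREAD_YIELD, EV_COUNT };

typedef void (*tool_cb)(enum tool_event ev, void *user_arg);

/* The single registered callback and its user argument. */
static tool_cb g_cb = NULL;
static void *g_cb_arg = NULL;

/* Registers (or, with NULL, unregisters) the tool callback. */
void tool_register(tool_cb cb, void *user_arg) {
    g_cb = cb;
    g_cb_arg = user_arg;
}

/* Called by the hypothetical runtime at each instrumented point. */
static void fire_event(enum tool_event ev) {
    if (g_cb != NULL)
        g_cb(ev, g_cb_arg);
}

/* Hypothetical runtime operations instrumented with events. */
void runtime_thread_create(void) { fire_event(EV_THREAD_CREATE); }
void runtime_thread_yield(void) { fire_event(EV_THREAD_YIELD); }

/* Profiler side: a callback that counts events by kind. */
typedef struct {
    unsigned long counts[EV_COUNT];
} prof_data;

void prof_cb(enum tool_event ev, void *user_arg) {
    prof_data *p = (prof_data *)user_arg;
    assert(ev < EV_COUNT);
    p->counts[ev]++;
}
```

A real profiler would also need thread-safe counters, since events can arrive concurrently from multiple execution streams.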
Stack Unwinding for Debugging
Argobots can be compiled with libunwind to enable stack unwinding. This feature is especially useful when the user dumps Argobots information with ABT_info_trigger_print_all_thread_stacks(), which can be invoked in a signal handler.
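Doing real work inside a signal handler is unsafe, so a trigger-style call from a handler is commonly paired with a flag that the handler sets and that normal code acts on later. The sketch below shows only that general POSIX pattern under stated assumptions: on_sigusr1, install_dump_handler, and scheduler_poll are hypothetical names, SIGUSR1 is an arbitrary choice of signal, and nothing here describes how ABT_info_trigger_print_all_thread_stacks() is implemented internally.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set from the signal handler; only async-signal-safe work happens there. */
static volatile sig_atomic_t g_dump_requested = 0;

static void on_sigusr1(int signo) {
    (void)signo;
    g_dump_requested = 1; /* defer the actual (unsafe) printing */
}

/* Installs the handler; returns 0 on success, -1 on error. */
int install_dump_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Hypothetical scheduling point: checks the flag and does the dump outside
 * the handler, where stdio is safe to use. Returns 1 if it dumped. */
int scheduler_poll(FILE *out) {
    if (!g_dump_requested)
        return 0;
    g_dump_requested = 0;
    fprintf(out, "dumping all thread stacks...\n");
    return 1;
}
```

For example, after install_dump_handler(), running kill -USR1 <pid> from a shell would make the next scheduler_poll() call print the message.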
Static Initializers for
ABT_cond support static initializers so that developers can easily port existing applications that are multithreaded with POSIX threads. These static initializers can also speed up creation of ABT_cond, since statically initialized ABT_cond needs neither
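Because these initializers are aimed at ports from POSIX threads, it may help to see the pthread pattern being ported. The sketch below uses only the standard PTHREAD_MUTEX_INITIALIZER and PTHREAD_COND_INITIALIZER; the corresponding Argobots initializer names are not given in this section, so they are deliberately not shown, and producer and wait_for_ready are hypothetical helpers.

```c
#include <assert.h>
#include <pthread.h>

/* Statically initialized: usable without any runtime init or destroy call. */
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t g_ready_cv = PTHREAD_COND_INITIALIZER;
static int g_ready = 0;

static void *producer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&g_lock);
    g_ready = 1;
    pthread_cond_signal(&g_ready_cv);
    pthread_mutex_unlock(&g_lock);
    return NULL;
}

/* Starts a producer thread and waits until it reports readiness.
 * Returns 1 on success, -1 if the thread could not be created. */
int wait_for_ready(void) {
    pthread_t t;
    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;
    pthread_mutex_lock(&g_lock);
    while (!g_ready) /* the loop guards against spurious wakeups */
        pthread_cond_wait(&g_ready_cv, &g_lock);
    pthread_mutex_unlock(&g_lock);
    pthread_join(t, NULL);
    return g_ready;
}
```

The point of the pattern is that g_lock and g_ready_cv are ready from the very first call, without any initialization code running first.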
Extended Work Unit-Specific Data
Previously, work unit-specific data (similar to thread-local storage, or TLS) in Argobots was accessible only from its owner work unit. Argobots 1.1 allows the user to access work unit-specific data via ABT_task handles, which makes it convenient to "attach" data to a work unit. The user can also register an optional destructor that is automatically called on ABT_task_free() to release the attached data.
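The attach-and-destroy behavior described above can be modeled with a small table of per-unit values and per-key destructors. This is a conceptual sketch with hypothetical names (wu_key_create, wu_set_specific, wu_get_specific, wu_free, counting_free); it is not the Argobots implementation or API, and it ignores thread safety and key deletion.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_KEYS 8

typedef int wu_key; /* hypothetical key: an index into each unit's table */

/* Hypothetical work unit that owns a small table of per-unit values. */
typedef struct {
    void *values[MAX_KEYS];
} work_unit;

static void (*g_dtors[MAX_KEYS])(void *); /* optional destructor per key */
static int g_num_keys = 0;

int wu_key_create(void (*dtor)(void *), wu_key *key) {
    if (g_num_keys >= MAX_KEYS)
        return -1;
    g_dtors[g_num_keys] = dtor; /* may be NULL: no cleanup for this key */
    *key = g_num_keys++;
    return 0;
}

work_unit *wu_create(void) { return calloc(1, sizeof(work_unit)); }

/* Access goes through the unit's handle, so any code holding the handle
 * (not only the unit itself) can attach or read data. */
void wu_set_specific(work_unit *wu, wu_key key, void *value) {
    assert(key >= 0 && key < g_num_keys);
    wu->values[key] = value;
}

void *wu_get_specific(const work_unit *wu, wu_key key) {
    assert(key >= 0 && key < g_num_keys);
    return wu->values[key];
}

/* Freeing the unit runs each key's destructor on its non-NULL value. */
void wu_free(work_unit *wu) {
    for (int k = 0; k < g_num_keys; k++)
        if (wu->values[k] != NULL && g_dtors[k] != NULL)
            g_dtors[k](wu->values[k]);
    free(wu);
}

/* Example destructor: releases the attached data and counts the call. */
static int g_freed = 0;
void counting_free(void *p) {
    free(p);
    g_freed++;
}
```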
New Utility Functions
Argobots 1.1 adds several new setter/getter functions that Argobots 1.0 lacks.
Extended Affinity Interface
Affinity plays an important role in high-performance computing. Argobots 1.1 extends the affinity interface to enable complex affinity settings via the ABT_SET_AFFINITY environment variable, whose grammar is similar to that of OpenMP's OMP_PLACES. See this for details.
Note that Argobots 1.1 disables the affinity setting by default; the --enable-affinity option is needed to turn on the affinity feature.
Argobots 1.1 improves the performance of the following components in particular.
- Work Unit-Specific Data
  - Argobots 1.1 significantly reduces (by 10x or more) the overheads of operations that access work unit-specific data by using a unit-specific data cache and a redesigned hash table. See this PR for details.
- ULT Stack Pool
  - Argobots 1.1 improves the performance and scalability of ULT stack allocation by adopting a bucket-based lock-free LIFO pool with a per-execution-stream local cache. See this PR for details.
- Synchronization Objects over Tasklets and External Threads
  - Argobots 1.1 supports and optimizes synchronization operations called by a tasklet or an external thread (i.e., a POSIX thread) by implementing them with either futex (on Linux systems) or pthread_cond_t (on non-Linux systems). Specifically, an external thread that waits on an Argobots synchronization object sleeps instead of spinning, similar to pthread_cond_wait(). See this PR for details.
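The combination of a shared bucket pool with a per-execution-stream cache can be sketched as follows. This is a simplified model, not the Argobots code: a mutex-protected LIFO stands in for the lock-free one (a correct lock-free version would also need ABA protection), and stack_alloc, stack_free, local_cache, BUCKET_SIZE, and STACK_SIZE are hypothetical. Stacks move between an execution stream's local cache and the shared pool only in whole buckets, so most allocations and frees touch only the local cache.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define STACK_SIZE (64 * 1024) /* hypothetical ULT stack size */
#define BUCKET_SIZE 4          /* stacks moved per shared-pool operation */

/* A free stack stores the list links in its own (unused) memory. */
typedef struct free_stack {
    struct free_stack *next;        /* next stack in the same bucket */
    struct free_stack *next_bucket; /* used only by a bucket's first stack */
} free_stack;

/* Shared pool: LIFO of full buckets. A mutex stands in for lock-free code. */
static free_stack *g_buckets = NULL;
static pthread_mutex_t g_pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-execution-stream cache: owned by one ES, so no locking is needed. */
typedef struct {
    free_stack *top;
    int count;
} local_cache;

void *stack_alloc(local_cache *lc) {
    if (lc->top == NULL) {
        /* Local miss: take one whole bucket from the shared pool. */
        pthread_mutex_lock(&g_pool_lock);
        free_stack *b = g_buckets;
        if (b != NULL)
            g_buckets = b->next_bucket;
        pthread_mutex_unlock(&g_pool_lock);
        if (b == NULL)
            return malloc(STACK_SIZE); /* pool empty: make a fresh stack */
        lc->top = b;
        lc->count = BUCKET_SIZE;
    }
    free_stack *s = lc->top; /* local hit: pop without synchronization */
    lc->top = s->next;
    lc->count--;
    return s;
}

void stack_free(local_cache *lc, void *p) {
    if (lc->count == BUCKET_SIZE) {
        /* Local cache full: publish its whole chain as one bucket. */
        pthread_mutex_lock(&g_pool_lock);
        lc->top->next_bucket = g_buckets;
        g_buckets = lc->top;
        pthread_mutex_unlock(&g_pool_lock);
        lc->top = NULL;
        lc->count = 0;
    }
    free_stack *s = (free_stack *)p;
    s->next = lc->top;
    lc->top = s;
    lc->count++;
}
```

Moving whole buckets keeps the shared structure small and contended only on cache misses and overflows, which is the property the per-execution-stream cache is meant to provide.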
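The futex-based sleeping wait mentioned for Linux can be illustrated with a one-shot event. This is a Linux-only sketch, not the Argobots implementation: event, event_wait, event_set, futex_op, and setter_thread are hypothetical names, and the raw futex system call is invoked through syscall(). A waiting thread sleeps in the kernel instead of spinning, and FUTEX_WAIT returns immediately if the value has already changed, so a wakeup cannot be lost between the check and the sleep.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* One-shot event: state is 0 until set, then 1. */
typedef struct {
    atomic_int state;
} event;

/* Thin wrapper over the raw futex system call (Linux only). */
static long futex_op(atomic_int *addr, int op, int val) {
    return syscall(SYS_futex, (int *)addr, op, val, NULL, NULL, 0);
}

/* Blocks the caller without spinning until the event is set. */
void event_wait(event *ev) {
    /* FUTEX_WAIT sleeps only if the word still equals 0, so an event_set()
     * racing with the load is not missed. The loop also absorbs spurious
     * wakeups and EINTR. */
    while (atomic_load(&ev->state) == 0)
        futex_op(&ev->state, FUTEX_WAIT_PRIVATE, 0);
}

/* Publishes the event and wakes every sleeping waiter. */
void event_set(event *ev) {
    atomic_store(&ev->state, 1);
    futex_op(&ev->state, FUTEX_WAKE_PRIVATE, INT_MAX);
}

/* Example external (POSIX) thread that sets the event. */
void *setter_thread(void *arg) {
    event_set((event *)arg);
    return NULL;
}
```

On non-Linux systems, the same blocking behavior would come from a mutex plus pthread_cond_t, at the cost of a lock around every state change.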
Better API Documentation
Argobots 1.1 enriches the API documentation, which now clarifies the following.
- Which parallel entity can legally call a function (i.e., a ULT, a tasklet, or an external thread).
- What input causes an error.
- What error code is returned.
- What input causes undefined behavior.
More Supported Platforms
Argobots 1.1 officially supports the following compilers.
- GNU Compiler (gcc) (>= 4.8)
- Clang/LLVM
- Intel C Compiler
- IBM XL compiler (xlc) (>= 16.1.1)
- PGI compiler (pgcc) (>= 20.9)
Bug Fixes with Thorough Testing
Argobots 1.1 employs a new testing framework called "rtrace" to check for memory leaks not only in successful paths but also in failure paths. For example, ABT_init() internally calls 10-20 resource allocation functions (e.g., mmap()). The rtrace library tests all possible success/failure patterns of these calls to check that ABT_init() either succeeds or returns an error after freeing all the memory allocated during initialization. We tested major Argobots functions and fixed bugs so that every Argobots routine either succeeds or returns an error without side effects. See this for details.
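The idea of exercising every success/failure pattern can be sketched with a fault-injecting allocator that fails the k-th allocation and then checks for leaks. This is a hypothetical illustration of the technique, not the rtrace interface: test_malloc, test_free, runtime_init, and check_all_failure_paths are made-up names, and runtime_init stands in for a routine like ABT_init() with just three allocations.

```c
#include <assert.h>
#include <stdlib.h>

/* Fault injection state: the call numbered g_fail_at returns NULL. */
static int g_calls = 0, g_fail_at = -1, g_live = 0;

static void *test_malloc(size_t n) {
    if (g_calls++ == g_fail_at)
        return NULL; /* injected failure */
    void *p = malloc(n);
    if (p != NULL)
        g_live++; /* track live allocations to detect leaks */
    return p;
}

static void test_free(void *p) {
    if (p != NULL) {
        g_live--;
        free(p);
    }
}

/* Example initializer that must undo partial work on any failure. */
typedef struct { void *a, *b, *c; } runtime;

int runtime_init(runtime *rt) {
    if ((rt->a = test_malloc(64)) == NULL)
        goto fail_a;
    if ((rt->b = test_malloc(128)) == NULL)
        goto fail_b;
    if ((rt->c = test_malloc(256)) == NULL)
        goto fail_c;
    return 0;
fail_c:
    test_free(rt->b);
fail_b:
    test_free(rt->a);
fail_a:
    return -1;
}

void runtime_finalize(runtime *rt) {
    test_free(rt->c);
    test_free(rt->b);
    test_free(rt->a);
}

/* Fails each allocation in turn; every run must either succeed or return
 * an error with no live allocations. Returns how many failure points were
 * exercised before a run without injected failure succeeded. */
int check_all_failure_paths(void) {
    for (int k = 0;; k++) {
        runtime rt;
        g_calls = 0;
        g_fail_at = k;
        g_live = 0;
        if (runtime_init(&rt) == 0) {
            runtime_finalize(&rt);
            assert(g_live == 0);
            return k;
        }
        assert(g_live == 0); /* the failure path leaked nothing */
    }
}
```

Dropping any one of the fail_* cleanup steps makes the corresponding assert fire, which is exactly the class of bug this style of testing is meant to catch.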
We also started checking Argobots 1.1 with Coverity to ensure its software quality. In addition to Valgrind, Argobots 1.1 supports the GCC and Clang address sanitizers, so users can apply address sanitizers to their programs that use Argobots.
- The producer-consumer check of ABT_pool was removed because the check was broken. Users should not rely on errors returned by Argobots to check whether pool accesses are correct. Users who are unsure should use an MPMC pool, which does not have this access concern at all.
- The work-unit migration target check was simplified, so some errors that users previously saw may no longer be reported. It is now the user's responsibility to keep migration targets alive. Note that the check mechanism in Argobots 1.0 was not fully functional.
- The ABT_task type is now the same as the ABT_thread type. This should not cause any problem even if the user mixes ULTs and tasklets. However, it might cause a compilation issue when a user program relies on type information, for example, in C++ templates.
- ABT_unit_set_associated_pool() becomes optional. In Argobots 1.1, unit-pool association is automatically updated by other routines such as
- Other minor changes (e.g., passing num_waiters = 0 to ABT_barrier_create()) are described in the code documentation.
- The Argobots logo has been renewed; please use the new one. For research presentations, we would truly appreciate it if you could also cite our Argobots paper.