Changes in Argobots 1.1

Argobots 1.1 maintains API and ABI compatibility with Argobots 1.0 while adding several new features, optimizing the core Argobots implementation, and fixing bugs. The changes are summarized below.

New Features

Tool Interface for Debugging and Profiling

Argobots 1.1 exposes an interface that allows a tool to capture internal Argobots events such as thread creation, synchronization, and yielding. This interface can be used in the same way as MPI's PMPI or OpenMP's OMPT. ABTX_prof is a header-based profiler built on this interface. It measures ULT-specific performance metrics such as the average execution time of each ULT, the number of created ULTs, and the number of yield operations. See ABTX_prof for details.

Stack Unwinding for Debugging

Argobots can be compiled with libunwind to enable stack unwinding. This feature is especially useful when the user dumps Argobots information with ABT_info_trigger_print_all_thread_stacks(), which can be called in a signal handler.

Static Initializers for ABT_mutex and ABT_cond

ABT_mutex and ABT_cond support static initializers, so developers can easily port existing applications that are multithreaded with POSIX threads. Static initializers also speed up the creation of ABT_mutex and ABT_cond, since a statically initialized ABT_mutex or ABT_cond needs neither ABT_mutex/cond_create() nor ABT_mutex/cond_free().

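#include <abt.h>

/* Statically initialized mutex: no ABT_mutex_create()/ABT_mutex_free() needed. */
ABT_mutex_memory mutex_mem = ABT_MUTEX_INITIALIZER;
int g_protected_value = 0;

void inc_protected_value(void)
{
    /* Obtain an ABT_mutex handle that refers to the static mutex memory. */
    ABT_mutex mutex = ABT_MUTEX_MEMORY_GET_HANDLE(&mutex_mem);
    ABT_mutex_lock(mutex);
    g_protected_value++;
    ABT_mutex_unlock(mutex);
}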

Extended Work Unit-Specific Data

In Argobots 1.0, work unit-specific data (similar to thread-local storage, or TLS) is accessible only from its owner work unit. Argobots 1.1 also allows the user to access work unit-specific data through ABT_thread or ABT_task handles, which makes it convenient to "attach" data to a work unit. The user can also register an optional destructor, which is automatically called on ABT_thread_free() or ABT_task_free() to release the attached data.

int ABT_thread_set_specific(ABT_thread thread, ABT_key key, void *value);
int ABT_thread_get_specific(ABT_thread thread, ABT_key key, void **value);
int ABT_task_set_specific(ABT_task task, ABT_key key, void *value);
int ABT_task_get_specific(ABT_task task, ABT_key key, void **value);

New Utility Functions

Argobots 1.1 adds several new setter/getter functions that Argobots 1.0 lacks.

int ABT_unit_get_thread(ABT_unit unit, ABT_thread *thread);
int ABT_thread_get_last_xstream(ABT_thread thread, ABT_xstream *xstream);
int ABT_thread_get_unit(ABT_thread thread, ABT_unit *unit);
int ABT_thread_is_unnamed(ABT_thread thread, ABT_bool *is_unnamed);
int ABT_thread_get_thread_func(ABT_thread thread, void (**thread_func)(void *));
int ABT_task_is_unnamed(ABT_task task, ABT_bool *is_unnamed);
int ABT_self_is_unnamed(ABT_bool *is_unnamed);
int ABT_self_get_last_pool(ABT_pool *pool);
int ABT_self_set_associated_pool(ABT_pool pool);
int ABT_self_get_unit(ABT_unit *unit);
int ABT_self_get_thread_func(void (**thread_func)(void *));
int ABT_mutex_get_attr(ABT_mutex mutex, ABT_mutex_attr *attr);
int ABT_mutex_attr_get_recursive(ABT_mutex_attr attr, ABT_bool *recursive);

Extended Affinity Interface

Affinity plays an important role in high-performance computing. Argobots 1.1 extends the affinity interface to allow complex affinity settings through the ABT_SET_AFFINITY environment variable, whose grammar is similar to that of OpenMP's OMP_PLACES. See this for details.

Note that affinity is disabled by default in Argobots 1.1; Argobots must be configured with --enable-affinity to turn on the affinity feature.

Performance Optimization

Argobots 1.1 improves the performance of the following components in particular.

Work Unit-Specific Data (ABT_key)
Argobots 1.1 uses a unit-specific data cache and a redesigned hash table to significantly reduce (by 10x or more) the overhead of operations that access work unit-specific data. See this PR for details.
ULT Stack Pool
Argobots 1.1 improves the performance and scalability of ULT stack allocation by adopting a bucket-based lock-free LIFO pool with a per-execution-stream local cache. See this PR for details.
Synchronization Objects over Tasklets and External Threads
Argobots 1.1 supports and optimizes synchronization operations called on a tasklet or an external thread (i.e., a POSIX thread) by implementing them with either futex (on Linux systems) or pthread_cond_t (on non-Linux systems). In particular, an external thread that waits on an Argobots synchronization object sleeps instead of spinning, similar to pthread_cond_wait(). See this PR for details.

Better API Documentation

Argobots 1.1 enriches the API documentation, which now clarifies the following:

  • Which parallel entities (i.e., a ULT, a tasklet, or an external thread) can legally call a function.
  • What input causes an error.
  • What error code is returned.
  • What input causes undefined behavior.
See Doxygen for details.

More Supported Platforms

Argobots 1.1 officially supports the following compilers.

  • GNU Compiler (gcc) (>= 4.8)
  • Clang/LLVM (clang)
  • Intel C Compiler (icc)
  • IBM XL compiler (xlc) (>= 16.1.1)
  • PGI compiler (pgcc) (>= 20.9)
Argobots is tested regularly with these compilers on several platforms. See this page for the latest results.

Bug Fixes with Thorough Testing

Argobots 1.1 employs a new testing framework called "rtrace" to check for memory leaks not only on success paths but also on failure paths. For example, ABT_init() internally calls 10-20 resource allocation functions (e.g., malloc() and mmap()). The rtrace library tests all possible success/failure patterns of these calls to check that ABT_init() either succeeds or returns an error after freeing all the memory allocated during initialization. We tested the major Argobots functions in this way and fixed bugs so that every Argobots routine either succeeds or returns an error without side effects. See this for details.
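The idea behind rtrace can be sketched with a small failure-injection harness. The code below does not use rtrace's actual API. test_malloc(), ctx_init(), and check_all_failure_paths() are hypothetical stand-ins that make the k-th allocation fail for k = 0, 1, 2, ... and check after each run that either initialization succeeded or no allocation was left behind.

```c
#include <stdlib.h>

/* Failure-injection state: which call should fail, how many allocation calls
 * were made, and how many allocations are still live. */
static int g_fail_at = -1;
static int g_num_calls = 0;
static int g_num_live = 0;

/* Hypothetical allocation wrapper: deterministically fails at call g_fail_at. */
static void *test_malloc(size_t size)
{
    if (g_num_calls++ == g_fail_at)
        return NULL;
    void *p = malloc(size);
    if (p)
        g_num_live++;
    return p;
}

static void test_free(void *p)
{
    if (p) {
        g_num_live--;
        free(p);
    }
}

/* Hypothetical init routine with three allocations; on failure it must
 * release everything it has allocated so far. */
typedef struct {
    void *a, *b, *c;
} ctx_t;

static int ctx_init(ctx_t *ctx)
{
    if (!(ctx->a = test_malloc(16))) goto fail_a;
    if (!(ctx->b = test_malloc(32))) goto fail_b;
    if (!(ctx->c = test_malloc(64))) goto fail_c;
    return 0;
fail_c:
    test_free(ctx->b);
fail_b:
    test_free(ctx->a);
fail_a:
    return -1;
}

static void ctx_fini(ctx_t *ctx)
{
    test_free(ctx->c);
    test_free(ctx->b);
    test_free(ctx->a);
}

/* Fails the 0th, 1st, 2nd, ... allocation in turn until initialization
 * succeeds. Returns the number of failure paths checked, or -1 on a leak. */
int check_all_failure_paths(void)
{
    for (int k = 0;; k++) {
        ctx_t ctx;
        g_fail_at = k;
        g_num_calls = 0;
        g_num_live = 0;
        if (ctx_init(&ctx) == 0) {
            ctx_fini(&ctx);
            return g_num_live == 0 ? k : -1;
        }
        if (g_num_live != 0)
            return -1; /* leak on this failure path */
    }
}
```

Because ctx_init() performs three allocations, the harness checks three failure paths before the fourth run succeeds, so check_all_failure_paths() returns 3.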

We have also started checking Argobots with Coverity to ensure its software quality. In addition to Valgrind, Argobots 1.1 supports the GCC and Clang address sanitizers, so users can apply address sanitizers to their programs that use Argobots.

Miscellaneous Changes

  • The producer-consumer check of ABT_pool was removed because the check was broken. Users should not rely on errors returned by Argobots to check whether pool accesses are correct. Users who are not sure should use an MPMC pool, which has no such access restriction.
  • The check of work-unit migration targets was simplified, so some users might no longer see certain errors. Keeping migration targets alive is now the user's responsibility. Note that the check mechanism in Argobots 1.0 was not fully functional.
  • The ABT_task type is now the same as the ABT_thread type. This should not cause any problem even if the user mixes ULTs and tasklets. However, it might cause compilation issues when a user program relies on type information, for example, in C++ templates.
  • ABT_unit_set_associated_pool() is now optional. In Argobots 1.1, the unit-pool association is automatically updated by other routines such as ABT_xstream_run_unit() and ABT_pool_push().
  • Other minor changes (e.g., passing num_waiters = 0 to ABT_barrier_create()) are described in the code documentation.
  • The Argobots logo has been renewed; please use the new one. For research presentations, we would truly appreciate it if you could also cite our Argobots paper.