The ASTG supports the scientific user community of the NASA Center for Climate Simulation (NCCS). This support covers the management of user environments and the use of the available computing resources, including migration to and adoption of cloud resources through the NCCS Science Managed Cloud Environment.
NCCS Task Farming (NCCS-TF) is a Python application that allows users to execute independent tasks concurrently across nodes on multicore clusters.
The package consists of a set of Python scripts working together through two simple text-based interfaces.
NCCS-TF does not require any knowledge of the individual tasks (serial or even parallel) and makes no assumptions about the underlying applications. In fact, the tasks to be executed can come from different applications.
NCCS-TF can be seen as a task parallelism tool that runs multiple independent tasks concurrently.
NCCS-TF consists of two independent front-end Python scripts through which a user provides a list of tasks to be performed.
The front ends are:
gpa_tf.py: Relies on GNU Parallel.
srun_tf.py: Relies on native SLURM srun commands.
Regardless of the front end used, NCCS-TF determines the list of nodes reserved by the user and connects to individual nodes to distribute the workload (independent tasks).
If tasks are available, each node receives as many of them as it has cores (if the user chooses to employ all the cores within the node).
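The node discovery and first round of distribution described above can be illustrated with a minimal sketch. It is not the NCCS-TF implementation: the function names reserved_nodes and first_wave are hypothetical, and the sketch assumes the job runs inside a SLURM allocation where the SLURM_JOB_NODELIST variable is set and the scontrol command is on the PATH.

```python
import os
import subprocess


def reserved_nodes():
    """Return the hostnames in the current SLURM allocation.

    Assumption: we are inside a SLURM job, so SLURM_JOB_NODELIST is set
    and `scontrol show hostnames` can expand it to one hostname per line.
    """
    nodelist = os.environ["SLURM_JOB_NODELIST"]
    result = subprocess.run(
        ["scontrol", "show", "hostnames", nodelist],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def first_wave(tasks, nodes, cores_per_node):
    """Hand each node up to cores_per_node tasks, in order.

    Returns (assignment, leftover): a dict mapping node -> list of tasks,
    and the tasks that must wait for a core to become free.
    """
    assignment = {}
    remaining = list(tasks)
    for node in nodes:
        assignment[node] = remaining[:cores_per_node]
        remaining = remaining[cores_per_node:]
    return assignment, remaining
```

With five tasks, two nodes, and two cores per node, the first four tasks start immediately and the fifth waits until a core is released.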
Scientists run their applications on high-performance computers and generate a large number of data files. The data need to be processed in order to extract meaningful scientific results. In general, the manipulation of output files consists of executing a series of independent serial tasks on single processors on the same platform where the initial data were produced. This can take a significant amount of time and leads to an inefficient use of available resources.
Portable Distributed Scripts (PoDS) is an open-source Python application that allows users to execute independent serial tasks concurrently across nodes on multicore clusters. The package consists of a set of scripts working together through a simple text-based interface. A user only needs to provide minimal information to perform the desired tasks.
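To make the idea of a text-based task list concrete, here is a small sketch of how such a list might be read. The file layout is an assumption (one shell command per line, blank lines and lines starting with # ignored), and parse_task_list is a hypothetical helper, not part of PoDS.

```python
def parse_task_list(text):
    """Return the commands found in a task list, one per non-empty line.

    Assumed format: one shell command per line; blank lines and lines
    starting with '#' are skipped.
    """
    tasks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tasks.append(line)
    return tasks


# Hypothetical task list mixing commands from different applications.
example = """
# post-processing tasks
python postprocess.py out_001.nc
python postprocess.py out_002.nc

ncdump -h out_003.nc
"""
tasks = parse_task_list(example)
```

Because each line is treated as an opaque command, the tool needs no knowledge of what the tasks do, and commands from different applications can sit side by side in the same list.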
PoDS does not require any knowledge of the individual tasks and does not make any assumptions about the underlying application. As a matter of fact, the tasks to be executed can be from different applications. PoDS can be seen as a task parallelism tool where concurrent independent jobs are executed in parallel.
PoDS consists of a front-end Python script through which a user provides a list of tasks to be performed. In a practical sense, PoDS determines the list of nodes reserved by the user and connects (through a password-less ssh command) to individual nodes to distribute the workload (independent tasks). As long as tasks are available, each node receives as many of them as it has cores (if the user chooses to employ all the cores within the node). PoDS internally monitors the progress of each task and moves to the next available one as soon as one is completed. At any given time, all the nodes (in fact all the processors) remain busy until there is no more work to do.
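The scheduling behavior described above (every core takes a new task as soon as its previous one finishes) can be sketched with a shared queue and one worker thread per core slot. This is a simplified illustration, not the PoDS code: run_task_farm and ssh_runner are hypothetical names, and the ssh invocation assumes password-less access to each node is already configured.

```python
import queue
import subprocess
import threading


def ssh_runner(node, command):
    """Run one task on a node over ssh and return its exit code (assumed transport)."""
    return subprocess.run(["ssh", node, command]).returncode


def run_task_farm(tasks, nodes, cores_per_node, runner=ssh_runner):
    """Keep every core slot busy until the task queue is empty.

    Returns a list of (node, task, exit_code) tuples in completion order.
    """
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = []
    lock = threading.Lock()

    def worker(node):
        # Each worker represents one core on one node.
        while True:
            try:
                task = pending.get_nowait()
            except queue.Empty:
                return  # no more work for this slot
            code = runner(node, task)
            with lock:
                results.append((node, task, code))

    threads = [
        threading.Thread(target=worker, args=(node,))
        for node in nodes
        for _ in range(cores_per_node)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Passing a different runner (for example, one that only records its arguments) allows the scheduling logic to be exercised without any cluster access. Pulling tasks from a shared queue, rather than splitting the list evenly up front, is what keeps cores busy when task durations vary.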
Here are additional features of PoDS:
PoDS is actively maintained and can easily be modified to meet users’ needs as they emerge.
To use PoDS on NCCS computers, visit the PoDS webpage.