Using thousands of processors is a walk in the park (with OpenMOLE)

This tutorial will teach practitioners how to leverage distributed computing to explore the effect of the parameters of their programs and simulation models. Attendees will also learn how our free and open-source tool, OpenMOLE, can shield them from the technical burden of using the distributed computing environments at their disposal (such as multi-core servers, clusters and the European grid EGI) to run extensive Designs of Experiments (DoE) and Genetic Algorithms (GA) on their own executables.

Parameter tuning is a daily problem in every scientific community that uses complex algorithms, such as image processing, text processing, graph analysis or simulation models. Increases in the performance of computer architectures have led to more and more ambitious algorithms with a growing number of parameters. Exploring high-dimensional spaces to tune parameters for specific problems has therefore become a central problem in many communities. Stochastic algorithms add another dimension to explore, as different random sources should generally be tested for each set of parameters in order to obtain statistically sound evaluations.
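
To make the size of such an exploration concrete, here is a minimal Python sketch of a full-factorial design with replications. It is not OpenMOLE code: `toy_model`, `full_factorial`, the parameter names `alpha` and `beta`, and the use of integer seeds as the "random sources" are all assumptions made for illustration.

```python
import itertools
import random
import statistics


def toy_model(alpha, beta, seed):
    """Hypothetical stochastic model: a noisy function of two parameters."""
    rng = random.Random(seed)
    return alpha * beta + rng.gauss(0.0, 0.1)


def full_factorial(grid, seeds):
    """Yield one run description per (parameter set, seed) pair."""
    names = sorted(grid)
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        for seed in seeds:
            yield {**params, "seed": seed}


# Assumed parameter grid: 3 x 2 parameter sets, 10 replications each.
grid = {"alpha": [0.5, 1.0, 2.0], "beta": [1.0, 3.0]}
seeds = range(10)
runs = list(full_factorial(grid, seeds))
print(len(runs))  # 60

# Aggregate the replications of each parameter set into a mean.
results = {}
for run in runs:
    key = (run["alpha"], run["beta"])
    results.setdefault(key, []).append(toy_model(**run))
summary = {key: statistics.mean(values) for key, values in results.items()}
```

Even this tiny grid produces 60 runs; each additional parameter or replication multiplies that count, which is why the number of executions grows so quickly.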

This wide range of parameters to tune, combined with the intrinsic execution time of the algorithms, makes it impossible to run a significant design of experiments on a desktop computer. The runs generated from a DoE are completely independent of each other, which makes them perfect candidates for distributed computing platforms. In an ideal world, with as many processors as runs, the time required to process the whole DoE would be almost equivalent to the execution time of a single run. However, the methodological and technical costs of using distributed execution environments mean that most parameter space explorations are carried out on a single desktop computer, occasionally on a multi-core server with shared memory, rarely on clusters, and almost never on worldwide computing infrastructures such as the EGI.
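
Because the runs are independent, they can be handed to a pool of workers with no coordination between them. The sketch below illustrates this with a local thread pool from the Python standard library standing in for a remote environment. It does not show how OpenMOLE distributes work, and `run_once` and `explore` are hypothetical names; the model is replaced by a short sleep to mimic execution time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_once(params):
    """Hypothetical stand-in for one model execution (sleeps to mimic work)."""
    time.sleep(0.05)
    return params["alpha"] * params["beta"]


def explore(runs, workers):
    """Execute independent runs on a pool of workers, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_once, runs))


# Assumed design: six independent parameter sets.
runs = [{"alpha": a, "beta": b} for a in (0.5, 1.0, 2.0) for b in (1.0, 3.0)]

start = time.perf_counter()
sequential = [run_once(r) for r in runs]
t_seq = time.perf_counter() - start

start = time.perf_counter()
parallel = explore(runs, workers=len(runs))  # one worker per run
t_par = time.perf_counter() - start

print(f"sequential: {t_seq:.2f}s, parallel: {t_par:.2f}s")
```

With one worker per run, the parallel wall-clock time should approach that of a single run, which is the ideal case described above. On real infrastructures, submission, data transfer and queueing overheads reduce that gain.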

This tutorial will train attendees in using OpenMOLE, a workflow distribution platform which aims at offering scientists from any field a tool to model their experiments as workflows and to distribute the resulting workload transparently to a wide range of computing environments. Compared to other workflow processing engines, it promotes a zero-deployment approach: it accesses the computing environments from bare metal and copies on-the-fly all the software components required for a reliable remote execution. OpenMOLE also encourages the use of software components developed in heterogeneous programming languages and enables its users to easily replace the elements involved in the workflow. Workflows can be designed using either a GUI or a Scala DSL which exposes advanced workflow design constructs. For more details regarding the core implementation and features of OpenMOLE, the interested reader can refer to [Reuillon et al., 2010, Reuillon et al., 2013] and the OpenMOLE website.

This tutorial will consist of a short presentation of OpenMOLE and a practical session on attendees' computers. It will give attendees the basic skills to use OpenMOLE with their own simulation models or programs. At the end of the tutorial, attendees will know how to write their DoE using OpenMOLE and distribute the resulting workload to a wide range of distributed computing environments.
