Convert a CWL workflow (<- this is what we ask from EBRAINS users) into the streamflow.yml format (<- this is what StreamFlow expects) to run via StreamFlow in k8s
Prior steps made by Nikos and Michalis:
Although we have successfully tested the StreamFlow Workflow Manager with streamflow-examples, there is a need to automate the process of running CWL with StreamFlow, as is done by the CWL-tes tool.
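A minimal sketch of what that automation could look like: given a user's CWL file and its job (inputs) file, generate the StreamFlow document that wraps them. The function names, the deployment name `k8s-env`, the workflow name `main` and the manifest file name are hypothetical. The `kubernetes` deployment type and its `files` key reflect my reading of the StreamFlow schema and should be checked against the installed version. The output is written as JSON, which YAML parsers accept, so only the standard library is needed.

```python
import json
from pathlib import Path


def build_streamflow_config(cwl_file, job_file, step="/",
                            deployment="k8s-env",
                            manifests=("manifest.yaml",)):
    """Return a StreamFlow document binding one CWL step to one deployment.

    All default names are illustrative assumptions, not project conventions.
    """
    return {
        "version": "v1.0",
        "workflows": {
            "main": {
                "type": "cwl",
                # The user's CWL workflow and its inputs file.
                "config": {"file": cwl_file, "settings": job_file},
                # "/" targets the whole workflow; a path such as
                # "/compile" would target a single step.
                "bindings": [
                    {"step": step, "target": {"deployment": deployment}}
                ],
            }
        },
        "deployments": {
            deployment: {
                # Assumed connector type and config key; verify against
                # the StreamFlow version in use.
                "type": "kubernetes",
                "config": {"files": list(manifests)},
            }
        },
    }


def write_streamflow_file(config, path="streamflow.yml"):
    """Serialize the document as JSON, which is also valid YAML."""
    Path(path).write_text(json.dumps(config, indent=2) + "\n")
    return path


if __name__ == "__main__":
    cfg = build_streamflow_config("main.cwl", "config.yml")
    print(write_streamflow_file(cfg))
```

The generated file could then be passed to `streamflow run streamflow.yml`. Per-step bindings and several deployments would need a richer input than the two file names used here.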
Do we have an issue for the multi-site aspect of workflows? E.g. orchestration (co-scheduling?), data transfer?
Related: Do we also track aspects of workflow step reliability (automated retries, moving "steps" to a different site (maybe even "interactively") etc.) and other features for complex workflows (i.e. things that a plain "make" (or the build flow tool of your choice) cannot do)? Somewhat similar to what Ganga supports (e.g. job manipulation and data transfer aspects).
I know that @noelp started collecting information in task 6.4 but I'm not sure if these things already hit the issue tracker. (In the last T6.4 meeting I suggested to create a cheatsheet that provides the collected information — maybe a feature matrix for the known tools.)
Thanks for sharing your progress.
I'm curious how explicit WMS like StreamFlow and implicit WMS like Parsl compare in terms of usability, flexibility and reproducibility.
I'm currently working on porting the demonstrator_workflow.cwl to Parsl and Globus, i.e. I will reuse the CWL CommandLineTools, but rewrite the CWL Workflow itself as a Parsl script.
I've already automated the creation of Globus Compute and Globus Transfer endpoints. This makes it easier to add new execution sites.
Still, I need to figure out container management and log file collection to make the workflow execution visible to the user.
Hello @noelp, thank you for your updates as well. Is this related to multisite HPC workflows? @vgeo is currently working on setting up a CWL workflow engine (or at least a CWL-compatible one), and we suggested starting to experiment with StreamFlow. In general, as a platform, we should provide EBRAINS users with a way to run CWL workflows (for now in the cloud / via k8s). We are certainly interested in your work, and once we reach a mature stage we would like to align with you on providing the possibility to run complex multisite HPC workflows.
@npappas created tc-workflows namespace at rke2-1-cineca-dev and granted access to @vgeo
@npappas configured the PVC and service account as explained here - the service account seems to be misconfigured and gives errors when attempting any of the examples shared
Moving this to the blocked column until the service account is configured correctly
It seems the streamflow service account cannot be bound via the streamflow-edit RoleBinding in the tc-workflows namespace. It works in the default one.
@vgeo will run his tests in the default namespace, and I will investigate further later if we want the tests to be production-ready in another namespace.
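For the later investigation, one thing that may be worth checking (an assumption, not a diagnosis): a RoleBinding's subject carries its own `namespace` field, and a binding copied from the default namespace keeps pointing at the service account there. The sketch below generates a binding in which every namespace field agrees. The binding name is hypothetical, and whether `streamflow-edit` is a Role or a ClusterRole is not stated in this thread, so `role_kind` is a parameter.

```python
import json


def role_binding(namespace, service_account="streamflow",
                 role="streamflow-edit", role_kind="Role",
                 name="streamflow-edit-binding"):
    """Build a RoleBinding manifest whose subject lives in `namespace`.

    `name` and the default `role_kind` are illustrative assumptions.
    """
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": service_account,
                # Must match the namespace where the service account
                # was actually created.
                "namespace": namespace,
            }
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": role_kind,
            "name": role,
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(role_binding("tc-workflows"), indent=2))
```

The output can be piped to `kubectl apply -f -`, and `kubectl auth can-i list pods --as=system:serviceaccount:tc-workflows:streamflow -n tc-workflows` should then show whether the permissions took effect.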
I have successfully run in Kubernetes everything that was tested locally. The next step is to run a complex workflow, similar to the ones that will be used in production, to determine which assumptions we need to make about the structure of the CWL workflows and whether any further development is necessary.