Technical Hackathon : Tools, Workflow and Workbenches
A hackathon bringing together developers and Galaxy instance providers to promote collaboration & technical developments
From 18-05-2016 to 20-05-2016 in Paris
Registration deadline: No information

A hackathon bringing together developers from the ELIXIR Tools & Data Services Registry (bio.tools), Galaxy, Taverna, Arvados, CWL, BioExcel, ReGaTE and EDAM ontology, with Galaxy instance providers from ELIXIR and beyond, to promote collaboration and technical developments.

 

When

18-20 May 2016

Where

The Institut Pasteur, Paris, France

How to get there

 

Please go to the main reception: you will be directed to the meeting room.

 

Please do not forget your ID or passport. Without a valid ID document, you will be denied access to the campus.

Registration

The event is fully booked. If you would like to participate nevertheless, please contact hmenager@pasteur.fr directly.

Online registration is free, but mandatory. Please register here:

http://tinyurl.com/registertoolshackathon

Aims

We will discuss and develop mechanisms by which information about tools and workflows - which may be described by using EDAM, biotoolsXSD and CWL - can be maintained, exchanged and reused by the ELIXIR Tools & Data Services Registry (bio.tools) and workbench and workflow systems, such as Galaxy, Taverna and others.  The aim is to coordinate our developments to improve such technologies, avoid redundant efforts and support useful new applications in the future.  For example:

  • Sharing the burden of maintaining rich tool descriptions using EDAM     
  • Automated registration of tool & workflow descriptions in bio.tools, e.g. registration of tools installed on a Galaxy instance using ReGaTE
  • Export from bio.tools of boilerplate tool execution-layer specification, e.g. CWL file (“workbench enabler service”)
  • Connection from bio.tools entries to registries of containers / VMs
  • EDAM developments in support of the above
  • What practical steps can we take - during the hackathon and afterwards?

Program (tentative - will be adapted based on expectations)

 

Day 1

10:00-12:00    Welcoming - setup and informal discussions

12:00-14:00    Lunch

14:00-16:00    Session 1: Introduction and Presentation

  • “Welcome, introduction and practical information” - Hervé Ménager
  • “bio.tools registry, biotoolsXSD and EDAM” - Jon Ison
  • “bio.tools : plans for 2016” - Emil Rydza

16:00-16:30    Coffee

16:30-17:30    Session 2: Short Presentations

    Participants can present projects that are related to workshop topics. Presentations should be short (no more than 10 minutes) and stimulate discussions and collaborations.

17:30-18:00    Organising ourselves into groups for Day 2 & Day 3

19:30        Dinner - pay your own way (venue t.b.d.)

Day 2

09:00-10:00    Presentations & discussions

  • "Benchmarking of Bioinformatics Analysis Tools" - Piotr Wojciech Dabrowski
  • “Debian Med” - Andreas Tille
  • “Creating institutional change without creating new institutions: a proposal for building upon existing technical and social infrastructure in service of the bioinformatics community” - Michael Crusoe

10:00-13:00    Informal hacking

coffee @ 11AM

13:00-14:00    Lunch

14:00-18:00    Informal hacking

coffee @ 4PM

19:30        Dinner (provided for you by the IFB)

Day 3

09:00-10:00    Presentations & discussions

  • “Interactive workflow diagrams” - Frederik Coppens

10:00-13:00    Informal hacking

coffee @ 11AM

13:00-14:00    Lunch

coffee @ 2PM

14:00-15:00    Hackathon summary

  • Please complete the outcomes table
  • Practical next steps
  • General comments

Topics (non exhaustive)

Possible topics for work & discussions are below.  We cannot hope to cover everything here - it is meant as food for thought.  Please feel free to add relevant topics below.

 

Tool & workflow descriptions
  • tool documentation
    • documentation process in various scenarios
    • how to coordinate / share the burden?
  • using EDAM to improve user experience
    • in workbench / workflow systems (Galaxy, Taverna …)
    • creating Galaxy Pages
  • Tool/Workflow descriptions and provenance
  • How to identify and relate tools?
    • Identifiers across versions/variants
    • Tools that are part of other tools (e.g. “bigtool --method actual-tool” is part of “bigtool”)
    • Tool families
    • Following 10 simple rules for identifiers

 

Registration and usage of tools & workflows
  • from workbenches, e.g. Galaxy instances
  • from Galaxy ToolShed
  • from Taverna (TAVERNA-880)
  • from Debian-Med
  • how to describe workflow in bio.tools (biotoolsXSD extensions?)

 

Workbench enabler service
  • use of EDAM in CWL
  • storage of complete CWL files in bio.tools or cross-linking bio.tools to relevant repo(s)?
  • what is the boilerplate, e.g. the overlap between biotoolsXSD and CWL ?

 

Linking to container/ VM repos
  • Linking bio.tools entries to VMs, or Docker images, e.g. in Docker Hub, BioShadock, EGI AppDb
  • Embedding or linking to CWL tool description?

 

EDAM development
  • Curation priorities?
    • improved coverage of formats used by Galaxy et al.
  • improve existing EDAM Format concepts
    • validate the name and synonyms are sensible
    • validate the is-a (generalisation/specialisation) relations
    • add missing links to format specifications
    • annotation with media type (if available)
  • how to distinguish stably specified “exchange” formats (within EDAM scope) and implicit tool-specific ones (out of scope)?
  • how to facilitate annotation using EDAM
    • EDAM requirements, e.g. stable human-readable IDs via <stable_id> annotation?
    • tooling (“tool annotator”)

People attending

 

First name Last name Affiliation
Thomas Cokelaer Pasteur
Dominique Batista IFB
Fabien Mareuil Pasteur
Emil Rydza ELIXIR-DK
Hervé Ménager Pasteur
Matúš Kalaš Uni. Bergen
Sandrine Perrin IFB
Vincent Lefort LIRMM
Christophe Antoniewski UPMC
Nebojsa Tijanic SBG
Maja Nedeljkovic SBG
Mate Balo Bordas SBG
Ivan Batic SBG
Sinisa Ivkovic SBG
Janko Simonovic SBG
Jovana Avalic SBG
Luka Stojanovic SBG
Fabrice Leclerc U-PSUD
Hedi Peterson ELIXIR-EE
Andreas Tille Robert Koch Institute (Debian)
Lilian Janin Illumina
Hector Countouris APHP
William Digan APHP
Damien Correia Pasteur
Bernd Jagla Pasteur
Asmaa Toumi  
Francois Moreews IRISA
Bruno Costa UNL
Christophe Blanchet IFB
Marius van den Beek UPMC
Jacques van Helden U-AMU
Adrien Josso Genoscope
Chao Zhang VU Amsterdam
Piotr Wojciech Dabrowski Robert Koch Institute
Nicola Soranzo TGAC
João Cardoso ELIXIR-PT (INESC-ID)
Jon Ison ELIXIR-DK
Sarah Cohen-Boulakia LRI
Michael Crusoe CWL
Stian Soiland-Reyes Manchester
Alan Williams Manchester
Frederik Coppens ELIXIR-BE
Olivia Doppelt-Azeroual Pasteur
Lukas Berger ELIXIR-DK

Tool relations (Jon Ison, in the main room)

Types of tools and software in bio.tools, and relations with collections and places

The contents of the registry show inconsistencies and redundancies, for example separate entries for different versions of the same tool. The goal is to define links between bio.tools entries so that versions, collections and suites are represented consistently and without redundancy.

Interested: Jon Ison, Matúš Kalaš, Dominique Batista, Jacques van Helden, Bruno Costa, João Cardoso, Hervé Ménager

Outcome: https://docs.google.com/spreadsheets/d/1_KGr2DkulwtAjFJzNjTm08zXVphFlVZ8p29Id6XFlxc/edit
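
As an illustration only, links between registry entries could be held as typed relations. The sketch below is not the bio.tools data model: the relation names ("isPartOf", "isVersionOf") and the entry IDs are hypothetical, and the “bigtool” example is borrowed from the topics list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Relation:
    source: str  # ID of the entry the relation starts from (hypothetical IDs)
    kind: str    # hypothetical relation type, e.g. "isPartOf" or "isVersionOf"
    target: str  # ID of the entry the relation points to


def related(relations: List[Relation], entry_id: str, kind: str) -> List[str]:
    """Return the IDs of entries that point at entry_id with the given kind."""
    return [r.source for r in relations if r.target == entry_id and r.kind == kind]


# Hypothetical entries: a sub-tool and a specific version of "bigtool".
RELATIONS = [
    Relation("bigtool-actual-tool", "isPartOf", "bigtool"),
    Relation("bigtool-2.0", "isVersionOf", "bigtool"),
]

if __name__ == "__main__":
    print(related(RELATIONS, "bigtool", "isPartOf"))
    print(related(RELATIONS, "bigtool", "isVersionOf"))
```

Keeping relations as separate records, rather than as fields on each entry, would let one entry carry several links (version, part, collection) without duplicating the entry itself.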

 

 

Collaboration model (Jon Ison, PARTIALLY POSTPONED)

Sharing rights, ownership and permissions within bio.tools and with CWL and Debian-Med

  • Testing | comparing | benchmarking infrastructure implications

Interested: Matúš Kalaš, Dominique Batista, Michael R. Crusoe, Olivia Doppelt-Azeroual, Jon Ison

 

 

Rabix & CWL Java and JS Bindings (7 Bridges)

CWL is a YAML encoded object model. Discuss bindings in Java and JS

Interested: Ivan Batić (JS), Maya Nedeljkovich (JS), Mate Balo Bordas (JS), Janko Simonović, Lilian Janin (I walked with Michael and friends to a room for the group below, don’t wait for me), Sinisa Ivkovic, Jovana Avalic, Wojtek Dabrowski

Outcome: Created TypeScript bindings for CWL CommandLineTool: CWL TS

Opened a few issues on the CWL GitHub repository: #215, #216, #217

Set up a conformance test build for Bunny: https://ci.commonwl.org/job/rabix-conformance/
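
To illustrate what a language binding for the CWL object model involves, here is a minimal sketch in Python (not the Java, JS or TypeScript bindings produced by this group). It maps a few top-level CommandLineTool fields onto a typed object. The `CommandLineTool` dataclass, the `load_tool` helper and the example document are assumptions made for illustration. The example is written in JSON, which CWL also accepts, only because the Python standard library has no YAML parser.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class CommandLineTool:
    """Typed view of a few top-level CommandLineTool fields (sketch only)."""
    cwl_version: str
    base_command: List[str]
    inputs: Any   # left untyped in this sketch
    outputs: Any  # left untyped in this sketch


def load_tool(text: str) -> CommandLineTool:
    """Parse a JSON-encoded CWL document into a CommandLineTool object."""
    doc: Dict[str, Any] = json.loads(text)
    if doc.get("class") != "CommandLineTool":
        raise ValueError("not a CommandLineTool document")
    base = doc.get("baseCommand", [])
    # baseCommand may be a single string or a list of strings.
    if isinstance(base, str):
        base = [base]
    return CommandLineTool(
        cwl_version=doc["cwlVersion"],
        base_command=list(base),
        inputs=doc.get("inputs", []),
        outputs=doc.get("outputs", []),
    )


# Illustrative document; the version string and the input are examples only.
EXAMPLE = """
{
  "cwlVersion": "v1.0",
  "class": "CommandLineTool",
  "baseCommand": "echo",
  "inputs": {"message": {"type": "string", "inputBinding": {"position": 1}}},
  "outputs": {}
}
"""

if __name__ == "__main__":
    tool = load_tool(EXAMPLE)
    print(tool.base_command, sorted(tool.inputs))
```

A full binding would also type the inputs and outputs and validate them against the CWL schema; this sketch deliberately stops at the top level.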

 

 

CWL Specification and execution models (Michael Crusoe, in Duclaux building)

Discussions on how a CWL metadata profile could relate to the Research Object model and to Debian’s package descriptions available as RDF, schema.org and provenance. This should form a “best practice” guiding profile, similar to (or added to) Galaxy’s Planemo.

Interested: Nebojsa Tijanic, Jacques van Helden, Stian Soiland-Reyes

 

Outcomes: Many CWL Draft 4 proposals evaluated; several critical decisions made, including #204 and #214

 

Interoperability of workflow specs. (Jacques van Helden)

Implement a minimal workflow in CWL, Snakemake and Galaxy, and study the feasibility of interoperability between them

Interested: Thomas Cokelaer, Luka Stojanovich, Nebojsa Tijanic, Adrien Josso, Christophe Antoniewski, Sandrine Perrin, Janko Simonović, Chao Zhang, Wojtek Dabrowski

 

https://docs.google.com/document/d/1M4H1bcebzAdLKfvXPXvDIypIjmx1APW3VB7FyWxE97A/edit?usp=sharing

 

 

Workbench enabler (Hervé Ménager, PARTIALLY POSTPONED)

Bio.tools to Galaxy, CWL, Research Objects

Interested: Marius van den Beek, Chao Zhang, Nebojsa Tijanic, Hector Countouris, William Digan, Matúš Kalaš

 

 

Galaxy to bio.tools publishing (ReGaTE) (Olivia Doppelt)

We can start to use ReGaTE on any of your Galaxy instances to generate JSON and XML that should need almost no adaptation to fit the bio.tools format. We may also register some of the tools.

The format of the “tool” files can also be discussed (add EDAM operation tags to tool XML?)

 

Outcomes:

Proposal for adding EDAM operations to Galaxy tool XML (under review):

https://github.com/galaxyproject/galaxy/pull/2379

 

<tool>
  ...
  <edam_operations>
    <edam_operation>operation_nnnn</edam_operation>
    <edam_operation>operation_nnnx</edam_operation>
  </edam_operations>
</tool>
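
As a rough illustration of how a client could read these annotations, the following Python sketch extracts the operation IDs from a tool XML string laid out as in the proposal above. It is not code from Galaxy or ReGaTE; the `edam_operations` helper, the tool `id` and `name`, and the operation IDs are placeholders chosen for the example.

```python
import xml.etree.ElementTree as ET
from typing import List

# Illustrative tool XML; the id, name and operation IDs are placeholders.
TOOL_XML = """
<tool id="example_tool" name="Example tool">
  <edam_operations>
    <edam_operation>operation_0001</edam_operation>
    <edam_operation>operation_0002</edam_operation>
  </edam_operations>
</tool>
"""


def edam_operations(tool_xml: str) -> List[str]:
    """Return the EDAM operation IDs listed in a tool XML string."""
    root = ET.fromstring(tool_xml.strip())
    ids = []
    for elem in root.findall("./edam_operations/edam_operation"):
        text = (elem.text or "").strip()
        if text:  # skip empty elements
            ids.append(text)
    return ids


if __name__ == "__main__":
    print(edam_operations(TOOL_XML))
```

A tool without an `<edam_operations>` element simply yields an empty list, so unannotated tools need no special handling.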

 

Addition of EDAM data and format annotations to Galaxy datatypes (merged):

https://github.com/galaxyproject/galaxy/pull/2387

 

Rebased, reviewed and improved proposal of a biotools-compatible API endpoint to Galaxy (to be used by ReGaTE to submit Galaxy or ToolShed tools to bio.tools) (WIP):

https://github.com/galaxyproject/galaxy/pull/1891

 


ReGaTE is being tested on Hector and William's Galaxy instance.

So far we have dockerized ReGaTE and tested it against our own local Galaxy instance. The test was not successful due to a proxy problem.

Interested: Chao Zhang, Hector Countouris, William Digan, Nicola Soranzo

 

 

Packaging training (Andreas Tille, in the main room)

Interested: Jacques van Helden

 

Outcome: Andreas presented the practical aspects of Debian packaging, and Jacques and Andreas started to prepare a Debian package for the software suite Regulatory Sequence Analysis Tools (http://rsat.eu/). This preparation work has already led to several suggestions to improve the deployment of the suite.

This effort will be pursued to achieve the defined goal: make it possible to obtain a fully working version of this software suite, with all its dependencies, with a single command: “apt-get install rsat”.

Hint from Andreas: If there is any software that you are missing in Debian, please approach the Debian Med team on the mailing list debian-med@lists.debian.org and let's find a way to get the packaging done.

 

Docker image publishing for tools (Francois Moreews)

Dockerfile with metadata + CWL + bio.tools - docker on clusters...

 
