
About

Data sharing and uniform data access across e-infrastructures and community centres

Participants of the working group are representatives or experts from 

  • Infrastructures: EGI, EUDAT, PRACE
  • Technologies: iRODS, EMI/FTS, Globus Online, Globus/GridFTP, EUDAT File Manager
  • Communities: VIP4VPH, MAPPER, VERCE, EPOS, DRIHM (TBC)

Objectives

This working group is piloting the collaboration of EGI, EUDAT and PRACE with user communities in order to give users a more consistent way of accessing data (reading, writing, discovering, transferring) across the three e-infrastructures.

Objective 1: demonstrate real benefit for users (user groups) who are collaborating in this working group. Technological and operational barriers shall be removed or mitigated as far as possible.

Objective 2: identify common data access and transfer tools and protocols which can be provided by all three e-infrastructures and which are useful for the collaborating user communities.

Objective 3: identify technology and/or organisational gaps and suggest measures for improvement when use cases cannot be realized across the three e-infrastructures.

These objectives shall be met by an action plan which will be developed during the lifetime of this pilot.

Pilot Activity Lifetime

The lifetime of the pilot activity shall be limited to approximately 6 months after the kickoff meeting, i.e. it should finish successfully before August 2013.

Meetings

Date | Location | Type | Comment (Agenda, Minutes)
24 June 2013, 2pm CEST | EEPPILOT2VC1 | | agenda
3 May 2013, 2pm CEST | EEPPILOT2VC1 | pilot 1+2 | agenda and material
12 Mar 2013, after 1 pm BST | London, Thistle Hotel, UK | pilot1,2, ? | after the EUDAT User Forum, planning the EEP f2f + minute
22 Feb 2013, 2pm CET | EEPPILOT2VC1 | pilot2 | agenda
18 Feb 2013, 2pm | EEPPILOT2VC1 | pilot1 | doodle, agenda
14 Feb 2013, 4pm CET | EEPPILOT2VC1 | pilot2 | minutes: status use cases, next actions
4 Feb 2013, 2pm CET | EEPPILOT2VC1 | pilot2 | doodle, minutes: kickoff, objectives, action plan, meeting
26-27 Nov 2012 | Amsterdam, SARA | EGI/EUDAT/PRACE | agenda and material

Action plan

(status 27 June 2013)

We are starting the collaboration pilot with the analysis of a few initial use cases: what factors are blocking interoperability? Which options do we have to realise a use case? Then we identify the strategy for their implementation.

# | Milestone | Who | Date | State | Ref
1 | Objectives, lifetime of this activity, and an action plan defined | all | 04 Feb 2013 | done |
1 | Upload one representative, well explained Mapper use case to the pilot2 wiki | Ilya | 22 Feb 2013 | done |
1 | Upload one representative, well explained VPH use case to the pilot2 wiki | Stefan Zasada | 14 Feb 2013 | done | link
2 | Identify one (or a few) committed users for the Mapper use case who can be contacted by e-infrastructure supporters | Ilya | 22 Feb 2013 | done (Derek Groen) |
2 | Identify one (or a few) committed users for the VPH use case who can be contacted by e-infrastructure supporters | Stefan Zasada | 14 Feb 2013 | done - I (Stefan Zasada) will be the initial user. Once a working system is in place, other selected users will be invited. |
3 | Create a table for names of participants for the EEP f2f meeting on 12 Mar 2013 in London. | Johannes | 14 Feb 2013 | done | link
4 | Ask VIP developers at CNRS whether they can participate and provide the data (see slide #11) | Stefan Zasada | 22 Feb 2013 | done. 12-03-2013: All EGI sites supporting the BIOMED VO can be used to store VPH data; this amounts to about 100 storage instances distributed across the EGI infrastructure. No additional storage is needed. VPH users need to be registered with the BIOMED VO to access these resources. |
5 | Clarify with EPCC their resources and involvement as EUDAT data node for VPH and as PRACE execution site | EUDAT/PRACE, Johannes | 1 Mar 2013 | done |

Chat with Rob Baxter, who is trying to get the right human and hardware resources in place. (Albert Heyrovsky was appointed to follow this use case.)

 
6 | Appoint one person from the EUDAT Service team who can support this activity. E.g. EPCC needs to get a PID prefix; SARA needs to register the prefix and create an account. | EUDAT, Claudio/CINECA | since Mar 2013 | done | VPH/MAPPER Use cases

Elena Erastova/RZG (supporting the VPH use case in EUDAT)
7 | Investigate technical options for returning the assigned PIDs to the VIP client. How could this work with just GridFTP (Griffin)? | EUDAT PID Service team (CINECA) | 29 Mar 2013 | done | VPH/MAPPER Use cases

This topic was discussed during the f2f meeting in London. A possible solution would be to extend the data staging script to support this functionality. However, the effort needed to extend the script must be further investigated.

Update (April 18th)

A first attempt to return PID information for ingested data sets has been made. A specific iRODS rule was created for this purpose and passed to the EUDAT Service Team for testing. Furthermore, CINECA is currently working to extend the data-staging script to support PID-based transfer. A preliminary set of functionalities will be realised in the coming days.
8 | How to stage data from the EPCC EUDAT node to the PRACE storage of the HPC execution machine. Test data staging between EUDAT/PRACE and EUDAT/EPCC | PRACE | | in progress | VPH/MAPPER Use cases

Either the local iRODS tools (icommands) or any GridFTP client might be used here.

Update (May 16th)

The EUDAT data staging script was extended for this purpose, so data can be staged in and out by PID.
9 | Check which sites will be involved in the testing of data transfer between EGI and EUDAT, and the Auth framework used at those sites | EGI, Tiziana | 12 Mar 2013 | done |

In progress (06-03-2013). T. Ferrari: we are in contact with the French National Grid Initiative (H. Cordier) and Denis Friboulet at CNRS to organize their participation in VPH.

Stefan: VIP data is already on EGI; we don't need any further storage space on EGI resources. We want to try moving data onto EUDAT and PRACE, but we do need interoperability in terms of a security solution. We need to know:
* If EUDAT is ok to support some EGI VOs (e.g. biomed), then there will be no problem. Since the number of final users is very limited (say 2 or 3), it has been decided to manually ingest them and their respective DNs at the EUDAT sites involved (namely EPCC).
* If a new VO has to be created to cover both EGI and EUDAT *or* if two VOs are used, then applications should handle it.

======

2013-05-02 Update: waiting for T. Glatard (BIOMED) to appoint two sites for testing. Any site supporting BIOMED can be used for testing. Around 100 sites currently support BIOMED in EGI.

Authorization: access to local resources is based on X.509 certificates

Update (May 17th)

VPH EGI testbed
===============
All sites supporting BIOMED can be used.
In particular, the following two sites were appointed for testing:

* SRM end-point (DPM):
. marsedpm.in2p3.fr
. Site: IN2P3-CPPM
. Contact: Edith Knoops knoops@cppm.in2p3.fr

* SRM end-point (DPM):
. clrlcgse01.in2p3.fr
. Site: IN2P3-LPC
. Contact: Jean-Claude Chevaleyre Jean-Claude.Chevaleyre@clermont.in2p3.fr

Ref: VPH/MAPPER Use cases
10 | Refining the workflow for the VPH use case | Stefan | 12 Mar 2013 | done | VPH Use cases
11 | Evaluate the possibility to use the EPIC web interface as File Catalog for ingested files | EUDAT PID Service Team | 29 Mar 2013 | done | VPH/MAPPER Use cases
12 | Make a proposal on how to extend the Data Staging script to instrument transfers through PIDs. | EUDAT Data Staging Service Team (CINECA) | 29 Mar 2013 | done | VPH/MAPPER Use cases

Claudio developed a preliminary mechanism that returns PIDs when ingesting files at EPCC via Griffin/GridFTP.
13 | Organize a meeting with VERCE people to share experience/thoughts on iRODS adoption | CINECA | 29 Mar 2013 | done | VERCE Use Case

Update (April 4th)

A video conference was held on March 27th. VERCE people will provide a list of data management needs; in collaboration with CINECA, the possible adoption of iRODS to meet them will be evaluated.
15 | Find a safe GridFTP configuration for DPM storages, so DPM sites in EGI could be accessed as GridFTP endpoints. (DPM is the most popular SRM storage type in EGI) | EGI - Gergely & Karolis | 12 Mar | done |

Background:

Any client that uses SRM storages via GridFTP can cause instability in those storages. This was discussed at the EGI Technology Coordination Board meeting in December (see slides and minutes: https://indico.egi.eu/indico/conferenceDisplay.py?confId=1170). We (EGI.eu) are currently waiting for the DPM community to release a new configuration mode that would make access to DPM via GridFTP possible.

Update March 28th

The DPM community hope to complete development by the end of April, which would mean a release around June all being well.

Update May 2

Short meeting at EGI CF 2013, Manchester (DPM developers and EGI.eu):

GridFTP endpoints behind the SRM, where the storage is DPM or dCache, can be discovered and used with globusonline.eu, FTS and globus-url-copy. Only scalability might be affected if SRM is bypassed, but FTS3 supports srm:// <-> gsiftp:// transfers.

Ref: VPH/MAPPER Use cases
16 | Try out the new FTS3 service to transfer files between an EUDAT and an EGI site: access the EUDAT site with GridFTP, access the EGI site with SRM | EGI - Gergely & Karolis | 12 Mar | done |

Update (June 24th)

To discover GridFTP endpoints within EGI, please take a look at:

https://wiki.egi.eu/wiki/Globus_Online_cookbook_for_EGI_VOs


Update (May 17th)

Many thanks for their work to:

Griffin technology (Shunde Zhang)

FTS3 (Michail Salichos, CERN)

EUDAT Data Staging service (Giacomo Mariani)


FTS3 - The issues were fixed yesterday and transfer was successfully performed in both directions using FTS3:

  Source:      gsiftp://irods-dev.cineca.it:2811/CINECA/home/keigelis/1M.rand (EUDAT)
  Destination: srm://srm.grid.sara.nl:8443/pnfs/grid.sara.nl/data/dteam/1M.rand (EGI)
  State:       FINISHED

  Source:      srm://srm.grid.sara.nl:8443/pnfs/grid.sara.nl/data/dteam/1M.rand (EGI)
  Destination: gsiftp://irods-dev.cineca.it:2811/CINECA/home/keigelis/1M.rand (EUDAT)
  State:       FINISHED
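
The endpoint URLs in the FTS3 report above can be split into protocol, host, port and path with standard URL parsing, which makes it easy to see that each transfer crosses from one protocol to the other. A minimal sketch in Python using only the standard library; the helper name `describe_endpoint` and the dictionary layout are illustrative assumptions and not part of FTS3:

```python
from urllib.parse import urlsplit

def describe_endpoint(url):
    """Split a transfer URL into the parts that matter for interoperability."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # gsiftp (GridFTP) or srm
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }

# URLs copied from the first FTS3 transfer above (EUDAT -> EGI).
source = describe_endpoint(
    "gsiftp://irods-dev.cineca.it:2811/CINECA/home/keigelis/1M.rand")
destination = describe_endpoint(
    "srm://srm.grid.sara.nl:8443/pnfs/grid.sara.nl/data/dteam/1M.rand")

# A transfer crosses protocols when the two schemes differ (here gsiftp -> srm).
crosses_protocols = source["protocol"] != destination["protocol"]
print(source["protocol"], "->", destination["protocol"], crosses_protocols)
```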


GlobusOnline.eu - Transfers now work as well for EUDAT <-> EGI BIOMED VO:

SRM end-point (DPM at EGI):


EUDAT endpoint (irods/Griffin)

  • irods-dev.cineca.it


Status: SUCCEEDED
Origin: ekarolis#EUDAT_DEV (EUDAT)
Destination: ekarolis#marsedpm_in2p3_fr (EGI)
bytes=1048576 mbps=3.995

Status: SUCCEEDED
Origin: ekarolis#marsedpm_in2p3_fr (EGI)
Destination: ekarolis#EUDAT_DEV (EUDAT)
bytes=1048576 mbps=7.626
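
The reported figures allow a rough estimate of how long each 1 MiB test transfer took. The Python sketch below assumes that `mbps` means megabits per second with 1 megabit = 10^6 bits; that reading is an assumption, not something the Globus Online report states, and the function name is illustrative.

```python
def transfer_seconds(num_bytes, mbps):
    """Estimate the transfer duration from size and reported rate.

    Assumes mbps = 10**6 bits per second (an assumption about the report).
    """
    return (num_bytes * 8) / (mbps * 1_000_000)

# Values copied from the two SUCCEEDED transfers above.
eudat_to_egi = transfer_seconds(1048576, 3.995)
egi_to_eudat = transfer_seconds(1048576, 7.626)

print(f"EUDAT -> EGI: {eudat_to_egi:.2f} s")
print(f"EGI -> EUDAT: {egi_to_eudat:.2f} s")
```

Under that assumption the two transfers took roughly 2.1 s and 1.1 s. With a 1 MiB file such numbers mostly reflect connection setup rather than sustained throughput.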


  • Possible next step: try FTS3 and Globus Online between EGI and PRACE, and between EUDAT and PRACE.
  • Possible next step: set up an FTS3 service for the communities (single, or separate instance).


Update May 2

1) globus-url-copy (appver 8.6, Globus Toolkit 5.2.2)

File transfers via gsiftp were successful from EGI to EUDAT and in the opposite direction.

Endpoints used:

*EGI
VERCE VO, glite-se.scai.fraunhofer.de (GridFTP Server 6.14, Globus Toolkit 5.2.1), DPM 1.8.6

*EUDAT
data.repo.cineca.it (GridFTP Server (Griffin, Java, 0.9.0))

Details:
globus-url-copy -vb -dbg gsiftp://glite-se.scai.fraunhofer.de/dpm/scai.fraunhofer.de/home/verce.eu/1M.rand gsiftp://data.repo.cineca.it:2812/CINECA01/home/EUDAT_STAFF/keigelis/1M.rand
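
The same command can be assembled programmatically, for instance to repeat the test in both directions. A minimal Python sketch: it only builds the argument lists, using the `-vb` and `-dbg` options and the two URLs from the command above. The helper name `build_copy_command` is an assumption, and actually running the commands requires an installed `globus-url-copy` and a valid proxy certificate.

```python
import shlex

EGI_URL = ("gsiftp://glite-se.scai.fraunhofer.de"
           "/dpm/scai.fraunhofer.de/home/verce.eu/1M.rand")
EUDAT_URL = ("gsiftp://data.repo.cineca.it:2812"
             "/CINECA01/home/EUDAT_STAFF/keigelis/1M.rand")

def build_copy_command(source_url, destination_url):
    """Return the globus-url-copy argument list for one transfer."""
    return ["globus-url-copy", "-vb", "-dbg", source_url, destination_url]

# One command per direction: EGI -> EUDAT and EUDAT -> EGI.
commands = [
    build_copy_command(EGI_URL, EUDAT_URL),
    build_copy_command(EUDAT_URL, EGI_URL),
]

for command in commands:
    # Print a shell-safe rendering; nothing is executed here.
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` on a machine where the client is available.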


2) FTS3 server/client latest

FTS3 transfers of the files were successful within EGI (between DPM and dCache) using srm:// <-> gsiftp://.

Transfers from EGI to EUDAT and in the opposite direction have some issues; the developers of FTS3 and Griffin are currently working on them.

People involved:
Griffin technology (Shunde Zhang)
FTS3 (Michail Salichos, CERN)
EUDAT Data Staging service (Giacomo Mariani)

REF: https://code.google.com/p/datafabric-griffin/issues/detail?id=7


3) GlobusOnline.eu

Files are browseable on both endpoints using the File Transfer web interface, but only the transfer from EUDAT to EGI was successful; the transfer from EGI to EUDAT failed. Error log from globusonline.eu below:

From EGI to EUDAT:
Error (make directories)
Server: ekarolis#EUDAT (data.repo.cineca.it:2812)
File: /CINECA01/home/EUDAT_STAFF/keigelis/1M.rand
Command: MKD /CINECA01 Message: Fatal FTP response --- 550 Requested action not taken. No permission.

Also, the CWD cannot be listed; maybe that is why globusonline.eu wants to MKD /CINECA01:

[karolis@test13 ~]$ uberftp data.repo.cineca.it -P 2812 "ls"
220 rp04 GridFTP Server (Griffin, Java, 0.9.0) Ready.
230 User keigelis logged in.
550 Requested action not taken. Path unavailable (e.g., path not found, no access).

[karolis@test13 ~]$ uberftp data.repo.cineca.it -P 2812 "ls /CINECA01/home/EUDAT_STAFF/keigelis"
220 rp04 GridFTP Server (Griffin, Java, 0.9.0) Ready.
230 User keigelis logged in.
total 1
drwxr-xr-x 2 keigelis nobody           0 Apr 03 16:10 .
d--x--x--x 2 keigelis nobody           0 Feb 26 15:00 ..
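
The listing above is consistent with the failed `ls` of the default directory: the parent entry `..` shows mode `d--x--x--x`, i.e. it can be traversed but not read. The Python sketch below decodes such a mode string under standard POSIX semantics. Whether Griffin and iRODS enforce these bits in exactly this way is an assumption, and the helper name is illustrative.

```python
def directory_access(mode):
    """Decode the owner bits of an ls-style mode string such as 'drwxr-xr-x'."""
    if len(mode) != 10:
        raise ValueError("expected a 10-character mode string")
    return {
        "is_directory": mode[0] == "d",
        "listable": mode[1] == "r",      # read bit: contents can be listed
        "traversable": mode[3] == "x",   # execute bit: path can be entered
    }

home = directory_access("drwxr-xr-x")     # '.'  in the listing above
parent = directory_access("d--x--x--x")   # '..' in the listing above

print("home:", home)
print("parent:", parent)
```

A directory that is traversable but not listable lets a client reach a known full path, while a plain `ls` at that level is refused.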

 

Ref: VPH/MAPPER Use cases

17 | (considered irrelevant) Investigate whether (and how) the EUDAT Data Staging service could be used to transfer files between EUDAT and EGI sites. What should be done: i. Answer the question: can the EUDAT Data Staging service be extended with an SRM interface? How? ii. Does this service work with EGI X.509 certificate proxies (personal and/or robot)? | EGI - Gergely & Karolis | 12 Mar | done |

The EUDAT File Staging service is currently an extension of iRODS with a GridFTP interface. The GridFTP extension uses Griffin technology. If the DPM community finds a safe GridFTP configuration for DPM storages, a few EGI sites could be reconfigured and the current EUDAT service could then be used to access EGI DPM sites with GridFTP. (See action 15.)

Update May 2

considered irrelevant

 
18 | Find suitable sites from EGI for the MAPPER Nano-usecase workflow. Configure the sites according to the MAPPER workflow needs. | EGI - Gergely & Tiziana & Karolis | 12 Mar | done |

qcg.inula.man.poznan.pl and qcg.reef.man.poznan.pl, site: PSNC
qcg.grid.icm.edu.pl, site: ICM
qcg.grid.cyf-kr.edu.pl, site: CYFRONET-LCG2
qcg.grid.task.gda.pl, site: TASK
endor.wcss.wroc.pl, site: WCSS64

 
 

https://goc.egi.eu/portal/index.php?Page_Type=Service_Endpoints&serviceType=QCG.Computing&searchTerm=&production=Y&monitored=Y&egiVisible=EGI

Tomasz Piontek (piontek at man.poznan.pl)

REQUIREMENTS. See slide 7 of the attached EUDAT-Nano-usecase.pdf that was presented at the London F2F meeting by Derek Groen. Summary of needs:

  1. MPI sites with 16-256 cores.
  2. 1-4h execution / job
  3. 75MB-4GB I/O data per job
  4. 25MB-1GB data transfer between workflow manager and site / job
  5. 20-40 jobs per simulation. Jobs run one after the other.
  6. Allow the installation of the QosCosGrid middleware service next to the local job queue (Sites in Poland and the UK have experience on how this can be done – talk to them. Tiziana knows the contacts)
  7. Advanced reservation. This can be achieved by
    1. Having many sites, so that at least one of them is very likely to be available immediately for the jobs. OR
    2. Pilot job manager (e.g. DIRAC) may be able to cancel non-MAPPER jobs and start MAPPER jobs immediately. Can DIRAC be used with QosCosGrid?
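
The per-job figures above translate into per-simulation totals, because the jobs of one simulation run one after the other. A rough worked estimate in Python, using only the ranges from the list; treating 1 GB as 1000 MB and combining the minima and the maxima independently are simplifying assumptions, and queue waiting time is not included.

```python
# Ranges taken from the requirements list, as (min, max).
JOBS_PER_SIMULATION = (20, 40)
HOURS_PER_JOB = (1, 4)
IO_MB_PER_JOB = (75, 4000)        # 75 MB to 4 GB, assuming 1 GB = 1000 MB
TRANSFER_MB_PER_JOB = (25, 1000)  # 25 MB to 1 GB

def total_range(per_job, jobs=JOBS_PER_SIMULATION):
    """Multiply a per-job (min, max) range by the job-count range."""
    return (per_job[0] * jobs[0], per_job[1] * jobs[1])

wall_hours = total_range(HOURS_PER_JOB)
io_mb = total_range(IO_MB_PER_JOB)
transfer_mb = total_range(TRANSFER_MB_PER_JOB)

print(f"Execution time per simulation: {wall_hours[0]}-{wall_hours[1]} h")
print(f"I/O data per simulation: {io_mb[0] / 1000:.1f}-{io_mb[1] / 1000:.0f} GB")
print(f"Transfer per simulation: {transfer_mb[0] / 1000:.1f}-{transfer_mb[1] / 1000:.0f} GB")
```

Under these assumptions one simulation needs between 20 and 160 hours of sequential execution, which is why immediate availability of a site (requirement 7) matters.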

2013-05-02 Update: contacted CYFRONET (Poland) for support of the nano use case. T. Ferrari is waiting for feedback from NGI UK about the installation of QCG software at one test site in the UK. Possible candidate: Durham University.

Update (May 17th)

Two sites in Poland will support this use case. This requires the activation of a compute grant.

Derek is checking with the MAPPER project partners whether applying an existing MAPPER grant to the sites will be sufficient.

Ref: MAPPER use case
19 | Elaborate a strategy to make use of registered data (PID) | Ilya/Derek | 28 June (was 17 May) | | MAPPER Use Case

Update (June 25th)

Derek will elaborate a proposal for the adoption of registered data. The proposal will then be reviewed and, if appropriate, implemented by Albert (EPCC) and Giacomo (CINECA).

20 | Give an estimation of how much storage space would be needed on EGI resources | Stefan | 17 May | done | VPH
21 | Investigate the possibility to extend the VERCE use case including EPOS requirements | Luca/David/Alberto | 17 May | pending | VERCE/EPOS
22 | Contact Albert Heyrovsky (EPCC) to understand if EPCC (EUDAT) can allow MAPPER to use their resources | Giuseppe | 25 June | done | MAPPER

Update (June 27th)

EPCC will provide space to MAPPER. Data will be registered using the same VPH EPIC prefix.

23 | Ingest data on EPCC; test access on EGI resources; make test transfers among EUDAT-PRACE-EGI; test job submission | Stefan | 15 July | | VPH

 

Use Cases

one wiki page per use case

For each infrastructure, the contacts column lists the e-infra service responsible first, then the site service responsible.

# | use case | community | user | EGI contacts | EUDAT contacts | PRACE contacts | technologies | comment
1 | VPH use case.ppt | VPH | Stefan Zasada and ?? | tbd | SR/DS/PID service team: Claudio Cacciari, eudat-safereplication@postit.csc.fi; EPCC: Albert Heyrovsky | Data services: Frank Scheiner; EPCC: Rob Baxter (tbc) | |
2 | MAPPER use case, STEP3: nano-materials | UCL/VPH | Derek Groen and ?? | tbd | SR/DS/PID service team: Claudio Cacciari, eudat-safereplication@postit.csc.fi; EPCC: Albert Heyrovsky; SiteB (tbc): | Data services: Frank Scheiner; SURFsara (Huygens): (tbc); LRZ (SuperMUC): (tbc) | |
3 | VERCE | EPCC/UEDIN | Iraklys | tbd | SR/DS/PID service team: Claudio Cacciari, eudat-safereplication@postit.csc.fi; CINECA: Giuseppe Fiameni | Data services: Frank Scheiner; CINECA (PLX): Giuseppe Fiameni; LRZ (SuperMUC): Ilya Saverchenko | |
4 | EPOS | INGV | Alberto Michelini | tbd | SR/DS/PID service team: Claudio Cacciari, eudat-safereplication@postit.csc.fi | Data services: Frank Scheiner | |

References

Conclusions of the initial EGI/EUDAT/PRACE workshop on data management
