General introduction to the B2SAFE service
Modified: 29 January 2018
B2SAFE is EUDAT's service for the secure long-term preservation of research data. The safety of the data is ensured by means of B2SAFE's replication mechanism which automatically replicates the data to one or several backup sites and which maintains all information on replicas in an additional system to guarantee the findability of data.
This document outlines B2SAFE’s functionality. For more insight into the technical details and testing, see the extensive, modular, hands-on tutorial available at the EUDAT training repository on GitHub.
What is B2SAFE?
B2SAFE is EUDAT's service for secure long-term preservation of research data. Data in B2SAFE is kept safe by replicating it to one or several other EUDAT sites, that is, by creating redundant copies of the data which are maintained by different administrative units.
In addition to the replication workflow, the B2SAFE technology offers a framework to implement community-specific data policies. B2SAFE can store and replicate large amounts of data. It is meant to be used by repositories to preserve and back up their data collections. Moreover, B2SAFE can replicate reference datasets to various compute sites, which are usually co-located with the B2SAFE endpoints.
Managing copies (replicas) of data across different sites requires a mechanism to verify data integrity and to manage the different data endpoints. To this end B2SAFE employs Persistent Identifiers (PIDs). PIDs usually guarantee the identity of data and provide the means to cite data. B2SAFE employs them in a slightly different way. Each data object, original and replica, is assigned a PID. The PID record itself contains all necessary information to link a data object to its parent if it is a replica, or it lists all direct children if the data object is an original data object. EUDAT designed a specific PID profile which structures this metadata.
B2SAFE can be employed in two modes: 1) a community centre can join the B2SAFE network, which requires deploying the full B2SAFE software stack, or 2) a community can use B2SAFE as run by an EUDAT site. For details, please see the section "More Information" at the bottom of this page.
In this section we will explain how B2SAFE works in detail, which software it is based on and which policies and workflows are supported and can be configured.
B2SAFE is based on the data management software iRODS and is implemented as a specific set of policies in iRODS called rules. The B2SAFE rule set integrates data handling in iRODS with tracking data across iRODS instances by means of Persistent Identifiers.
While iRODS itself lets users create workflows and rules with which they can work directly on data, data in iRODS that is subject to the B2SAFE ruleset is not meant to be accessed directly by the user (scientist); that is, users should not be allowed to change data. Data stewards should be careful when changing data and, if they do so, make sure that all necessary replicas and information on the data are updated and propagated through the replication chain. It is advisable to have an iRODS expert on site when running the service in joining mode.
Short introduction to iRODS
iRODS is a data management framework. It consists of storage, which can be configured individually per iRODS instance; a metadata database called iCAT, which keeps information on users, access rights, and additional information on files and folders; and a rule engine to implement and execute data management policies.
An iRODS instance is called an iRODS zone and is defined by a metadata database called the iCAT. This database contains all metadata on users, data objects (files), data collections (folders) and storage systems in the iRODS zone. Metadata which the system creates automatically are data size, checksums, last access date and creation date. Moreover, users and systems can add their own metadata structured as key-value-unit triples. This feature is used by B2SAFE to create links to replicas in other iRODS zones.
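The key-value-unit triples described above can be pictured with a small, self-contained sketch. It models metadata attached to a logical path in plain Python and does not use the iRODS API. The class names, the example path and the use of EUDAT/REPLICA as a metadata key inside iRODS are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import List, NamedTuple


class AVU(NamedTuple):
    """One metadata entry: attribute (key), value and optional unit."""
    attribute: str
    value: str
    unit: str = ""


class MetadataCatalogue:
    """Toy stand-in for a metadata catalogue keyed by logical path."""

    def __init__(self) -> None:
        self._entries = defaultdict(list)

    def add(self, logical_path: str, avu: AVU) -> None:
        self._entries[logical_path].append(avu)

    def values(self, logical_path: str, attribute: str) -> List[str]:
        """Return all values stored under one attribute for a path."""
        return [a.value for a in self._entries[logical_path]
                if a.attribute == attribute]


catalogue = MetadataCatalogue()
path = "/zoneX/home/repo/data.txt"  # hypothetical logical path
catalogue.add(path, AVU("size", "1024", "bytes"))
# Hypothetical key and PID: a link from this object to a replica elsewhere.
catalogue.add(path, AVU("EUDAT/REPLICA", "12345/replica-pid"))
replica_pids = catalogue.values(path, "EUDAT/REPLICA")
```

Because the link is stored as ordinary metadata on the logical path, it can be looked up without knowing anything about the storage system behind that path.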
iRODS abstracts from the actual storage system and location and provides a so-called logical path. This feature makes it possible to replicate data in a uniform way between iRODS zones without knowing anything about the configured storage media, and it is one of the concepts B2SAFE relies on.
The rule engine executes iRODS rules - iRODS implementations of low-level data policies. Rules can be called by command line (client side calls), they can be automatically invoked by a certain action in the iRODS system or can be executed on a regular basis (server side calls). B2SAFE implements its data policies as iRODS rules which can be combined to achieve the appropriate behaviour for a community (see Section Example workflows).
B2SAFE's replication mechanism
B2SAFE replicates data from one iRODS zone to another, i.e. it copies data to another administrative domain. This lowers the risk of complete data loss; however, it increases the need for proper management of replicas across zones.
The replication sites can be configured in iRODS itself as federated iRODS zones (see the iRODS manual on federations).
In a long replication chain not all iRODS zones are directly federated. That means that from the original site one does not have the means to check the integrity of replicas by using iRODS mechanisms (see Figure 1). Hence, there is the need for an external system to log the replication chain and provide some minimal information to ensure data integrity across sites.
Figure 1: Replication across three iRODS zones; one community centre X replicates its repository data to an EUDAT centre Y, which in turn replicates the data to another EUDAT centre Z. In this way three independent copies of the data are made. The blue arrows indicate direct access to the information in iRODS. That is to say, the community centre can ensure the data’s integrity in the EUDAT centre Y, while the EUDAT centre Y can ensure the data’s integrity with EUDAT centre Z. However, the community centre cannot verify the data’s integrity with EUDAT centre Z by means of iRODS’s mechanisms (black arrow).
Tracking replicas across sites
To track replicas and record the whole replication chain of data in B2SAFE an external service is needed. B2SAFE uses the EUDAT persistent identifier service B2HANDLE.
In general PIDs are used to reliably identify and cite data objects throughout their lifecycle and they are thus a vital part of long-term data management. More specifically, B2SAFE employs PIDs and has designed a specific PID profile to reliably find and identify replicas.
A persistent identifier is an opaque string which is usually resolvable via the HTTP protocol; the PID record thus contains the mapping from the opaque string to a URL. Upon creation one can add more information to the PID. In the case of EUDAT's B2SAFE service, the PIDs of direct replicas, the direct parent's PID and a link (usually also a PID) to the very first data repository are added.
Figure 2: B2SAFE replication with creation of PIDs. Assume we are replicating data from a community centre to one other EUDAT centre. The figure shows the creation of PIDs and their additional information. 1 - The community calls the B2SAFE rule which creates a PID (opaque and unique string) for the data object (DO1) and 2 - creates an entry in the PID system using B2HANDLE containing the additional information EUDAT/CHECKSUM and the identifier for the data in the community repository, EUDAT/ROR. 3 - Subsequently the B2SAFE rule for replication is called, which creates a copy of DO1 at EUDAT Centre Y. 4 - The same rule registers the new data copy with a new PID and 5 - creates an entry in the PID system with the following information: EUDAT/CHECKSUM, the PID of the direct parent (EUDAT/PARENT), the EUDAT/ROR and the PID to the first EUDAT centre that holds the data (EUDAT/FIO). 6 - Finally the PID of the direct parent at Community Centre X is updated with the location of its replica (EUDAT/REPLICA).
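The steps in Figure 2 can be sketched with an in-memory stand-in for the PID system. This is an illustration under stated assumptions, not the B2SAFE or B2HANDLE implementation: the handle prefix, the URL scheme, the use of SHA-256 for EUDAT/CHECKSUM and the storage of EUDAT/REPLICA as a list are all hypothetical, and EUDAT/FIO is left out for brevity.

```python
import hashlib
import itertools
from typing import Optional

PREFIX = "12345"  # hypothetical handle prefix
_suffixes = itertools.count(1)
pid_system = {}  # in-memory stand-in for the PID service: PID -> record


def register(url: str, data: bytes, ror: str,
             parent: Optional[str] = None) -> str:
    """Create a PID and its record (steps 1-2 and 4-5 of Figure 2)."""
    pid = f"{PREFIX}/{next(_suffixes):04d}"
    record = {
        "URL": url,
        # The checksum algorithm is an assumption of this sketch.
        "EUDAT/CHECKSUM": hashlib.sha256(data).hexdigest(),
        "EUDAT/ROR": ror,
        "EUDAT/REPLICA": [],
    }
    if parent is not None:
        record["EUDAT/PARENT"] = parent
    pid_system[pid] = record
    return pid


def replicate(parent_pid: str, data: bytes, target_url: str) -> str:
    """Register a copy and link it to its parent (steps 3-6 of Figure 2)."""
    ror = pid_system[parent_pid]["EUDAT/ROR"]
    replica_pid = register(target_url, data, ror, parent=parent_pid)
    # Step 6: the parent's record learns about its new replica.
    pid_system[parent_pid]["EUDAT/REPLICA"].append(replica_pid)
    return replica_pid


data = b"example payload"
do1 = register("irods://centreX/zoneX/do1", data, ror="repo-id-001")
do1_copy = replicate(do1, data, "irods://centreY/zoneY/do1")
```

After these two calls the record of DO1 lists its replica, and the record of the replica points back to DO1, so the link can be followed in both directions.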
The PID generated by B2SAFE and all information stored in the PID system are publicly accessible. Figure 2 describes how the B2SAFE module integrates the replication of data with the PID system.
With the integration of PIDs a data centre can now track all replicas across the replication chain as shown in Figure 3.
Figure 3: Integration of the PID system and iRODS. Blue arrows indicate data replication between iRODS zones, black arrows indicate PID registration. With the fields EUDAT/PARENT and EUDAT/REPLICA one can follow the full replication chain in the PID system and retrieve the actual location of data (URL field in the PID entry). In addition to the automatic metadata which is created in iRODS, B2SAFE also creates entries for the PID of the data itself, its parent and its replicas. Thus, having access to one data replica in the replication chain one can enter the PID system and retrieve all information to follow the replication chain.
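Following the replication chain from any one replica, as described for Figure 3, could look like the sketch below. It works on a hand-written dictionary of PID records rather than on a real PID service. The PIDs, URLs and the list representation of EUDAT/REPLICA are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List

# Hypothetical PID records for a chain X -> Y -> Z.
records = {
    "12345/x": {"URL": "irods://centreX/zoneX/do1",
                "EUDAT/REPLICA": ["12345/y"]},
    "12345/y": {"URL": "irods://centreY/zoneY/do1",
                "EUDAT/PARENT": "12345/x",
                "EUDAT/REPLICA": ["12345/z"]},
    "12345/z": {"URL": "irods://centreZ/zoneZ/do1",
                "EUDAT/PARENT": "12345/y",
                "EUDAT/REPLICA": []},
}


def find_root(pid: str, records: Dict[str, dict]) -> str:
    """Walk EUDAT/PARENT links upwards until a record has no parent."""
    seen = set()
    while "EUDAT/PARENT" in records[pid]:
        if pid in seen:
            raise ValueError("cycle in EUDAT/PARENT links")
        seen.add(pid)
        pid = records[pid]["EUDAT/PARENT"]
    return pid


def chain_locations(start_pid: str, records: Dict[str, dict]) -> List[str]:
    """Return the URLs of all copies in the chain, starting at the root."""
    queue = deque([find_root(start_pid, records)])
    visited, locations = set(), []
    while queue:
        pid = queue.popleft()
        if pid in visited:
            continue
        visited.add(pid)
        locations.append(records[pid]["URL"])
        queue.extend(records[pid].get("EUDAT/REPLICA", []))
    return locations


# Starting from the last replica still yields the whole chain.
all_copies = chain_locations("12345/z", records)
```

The sketch first walks up to the original via EUDAT/PARENT and then down again via EUDAT/REPLICA, which is why access to a single replica is enough to find every copy.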
The B2SAFE module is a set of iRODS rules which can be put together in workflows enabling data replication and PID management. In this section we describe several typical B2SAFE workflows. The full documentation of workflows and their respective code examples can be found on the service’s wiki.
Creating PIDs for files and folders is essential to track replicas across different administrative domains. Figure 4 shows how the B2SAFE rule attaches a PID to either a data object or a collection. The rule EUDATCreatePID from the B2SAFE ruleset takes as input the iRODS logical path to the collection or data object which should receive a PID. The rule will then connect to the PID service (B2HANDLE), create the PID and store it in the iRODS metadata catalogue. This establishes the link between iRODS and the PID.
At the same time metadata such as the checksum is stored in the PID system to enable cross-domain integrity checks. Optionally one can provide a link to the original data (ROR) or the PID of the direct parent of a data object. This information is stored as iRODS metadata and in the newly created PID, and thereby establishes the upper connection in the replication chain.
Figure 4: A B2SAFE client rule gathers the iRODS logical path and optionally a PID pointing to the original data (ROR) or the direct parent (PARENT) and propagates these to the B2SAFE EUDATCreatePID rule. In turn this rule establishes the connection to the PID service, creates the PID with respective metadata and stores the created PID as metadata in iRODS.
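A cross-domain integrity check based on the checksum stored in the PID system might be sketched as follows. The record layout and the choice of SHA-256 are assumptions of this sketch; the checksum algorithm actually used by a given B2SAFE installation is not stated here.

```python
import hashlib
import io
from typing import BinaryIO


def stream_checksum(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large files need little memory."""
    digest = hashlib.sha256()  # assumed algorithm
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_intact(stream: BinaryIO, pid_record: dict) -> bool:
    """Compare a local copy with the checksum published in its PID record."""
    return stream_checksum(stream) == pid_record["EUDAT/CHECKSUM"]


original = b"research data"
# Hypothetical PID record as it would be read from the PID system.
pid_record = {"EUDAT/CHECKSUM": hashlib.sha256(original).hexdigest()}

good_copy_ok = is_intact(io.BytesIO(original), pid_record)
bad_copy_ok = is_intact(io.BytesIO(b"research dat4"), pid_record)
```

Because the reference checksum lives in the PID system rather than in a particular iRODS zone, a site can run this comparison on its own copy without direct access to the zone that holds the parent.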
The rule EUDATCreatePID from the B2SAFE rulebase can be called by another rule or by an event hook in the iRODS system itself. In both cases the input for the PID creation needs to be propagated to the PID creation rule.
The replication according to the EUDAT policies is triggered by the rule EUDATReplication which is part of the B2SAFE ruleset. This rule steers the replication across iRODS zones. It takes as input parameters the iRODS path to the source and the destination object or collection.
One can execute the replication and suppress the creation of PIDs (Figure 5, upper panel), or PIDs can be created synchronously with the replication (Figure 5, lower panel). In the first case there will be no link in the iRODS metadata database or in the PID system from which to build the replication chain. The PID creation, and thus the creation of metadata to build the replication chain, is triggered by setting the flag registered to true.
Since the PID registration costs some time it can be advantageous to decouple the data replication and PID creation when transferring large collections of data. In such a case the replication rule needs to be combined with the EUDATPIDRegistration rule from the B2SAFE rulebase. This rule ensures that after data transfer, PIDs are created and the replication chain is built in both the iRODS metadata database and the PID system.
Figure 5: The EUDAT replication. Upper panel: The EUDATReplication rule steers the replication of data across iRODS zones. In this case only minimal information on the data is stored in the iRODS metadata database; the link between the original data and its replica is not introduced. The PID creation upon replication can be triggered by setting the flag ‘registered’ (see lower panel). Here PIDs are generated as soon as the data is replicated (synchronous PID registration) and the link between the original data and the replica is introduced in the iRODS metadata database and the PID system.
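The difference between synchronous and decoupled PID registration can be illustrated with a toy model. The functions below only mimic the behaviour described for EUDATReplication and EUDATPIDRegistration; they are not those rules and do not reflect their signatures. The path mapping and the PID format are hypothetical.

```python
from typing import Dict, Iterable, List


def make_pid(path: str) -> str:
    """Placeholder for a call to the PID service (hypothetical format)."""
    return "pid:" + path


def replicate(paths: Iterable[str], replicas: Dict[str, str],
              pids: Dict[str, str], registered: bool) -> None:
    """Copy each object to the target zone; register PIDs only if asked."""
    for source in paths:
        target = source.replace("/zoneX/", "/zoneY/", 1)
        replicas[source] = target  # stands in for the data transfer
        if registered:  # synchronous PID registration
            pids[target] = make_pid(target)


def register_pending(replicas: Dict[str, str],
                     pids: Dict[str, str]) -> List[str]:
    """Second pass after the transfer: register replicas without a PID."""
    newly_registered = []
    for target in replicas.values():
        if target not in pids:
            pids[target] = make_pid(target)
            newly_registered.append(target)
    return newly_registered


replicas: Dict[str, str] = {}
pids: Dict[str, str] = {}
replicate(["/zoneX/coll/a.dat", "/zoneX/coll/b.dat"],
          replicas, pids, registered=False)
pids_after_transfer = dict(pids)  # still empty: no replication chain yet
late = register_pending(replicas, pids)  # decoupled PID registration
```

In the decoupled variant the transfer loop never waits for the PID service, at the cost of a window in which replicas exist without any entry in the replication chain.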
The B2SAFE module also offers rules for integrity checks across zones, for recovering failed transfers and for updating the information on data location in the PID system when the iRODS path to the data changes. Furthermore, the ruleset contains experimental features such as community metadata handling and messaging.
EUDAT communities can deploy B2SAFE or let an EUDAT site run B2SAFE for them.
For an extensive, modular, hands-on training course on B2SAFE, please see the EUDAT training repository on GitHub.
Support for B2SAFE is available via the EUDAT ticketing system through the webform.
If you have comments on this page, please submit them through the EUDAT ticketing system.
Service Level Agreement
This document is a service level agreement (SLA) between EUDAT and a CUSTOMER to cover the provision and support of the B2SAFE SERVICE.
This agreement is made between:
EUDAT, [address – city –country]
[Customer name], [address – city –country]
represented by [Customer representative].
This SLA is valid from
[date] to [date].
The purpose of this SLA is to agree on quality and reliability requirements and targets for a B2SAFE SERVICE. In addition, this document defines the processes, roles, and contact addresses that are used to ensure that the required service level can be maintained.
EUDAT services are provided by EUDAT Service Providers listed in Annex A. These Providers are contractually committed to provide services for EUDAT CUSTOMERs. EUDAT Service Providers that provide services to CUSTOMER are defined in a contract between EUDAT and CUSTOMER.
B2SAFE is a data management service which allows community and departmental repositories to store their research data in a trustworthy manner.
B2SAFE service packages allow the number of replicas of the data to be selected. By default, two copies of data are created. Both of these copies are located at the same site.
Additional service packages:
The service is available on a 24x7 basis excluding time required for service maintenance and incident management. Service Desk hours are as follows:
09-16 CET, normal working days.
Normal working days are Monday to Friday, excluding generic holidays, i.e. New Year's Day, Good Friday, Christmas Day and the 26th of December. The service hours and holidays of each EUDAT Service Provider are described in detail on the EUDAT service information web page described in section 7.1.
The following exceptions apply:
EUDAT B2SAFE has maintenance breaks during which the service is not available or the service level is degraded. The duration of the service break, the day of the week and the time of day depend on the EUDAT Service Providers used. The location of detailed information is given in section 7.1 (EUDAT service information).
Maintenance windows are advertised, using the channels described on the EUDAT service information web page, at least 14 calendar days before regular maintenance. The CUSTOMER (Customer contact for the service provider) will be notified by email. As part of the incident management process, an announcement can be given later. Service breaks announced at least 7 calendar days before the maintenance break are considered planned and do not affect the Service Reliability (see section 5). Service breaks announced at short notice (less than 7 calendar days before the maintenance) are considered emergency service breaks.
An unplanned service break due to emergency maintenance is only carried out as part of the incident management process, when it is required to ensure the integrity of data or to fix a security vulnerability. Unplanned service breaks are taken into account in the Service Reliability.
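The notice periods above amount to a simple classification, sketched below. The function names are hypothetical, and the sketch assumes that an emergency service break is treated like an unplanned break when Service Reliability is calculated, which the text implies but does not state outright.

```python
from datetime import date

PLANNED_NOTICE_DAYS = 7  # threshold given in this SLA


def classify_break(announced: date, starts: date) -> str:
    """Classify a service break by the notice given in calendar days."""
    notice_days = (starts - announced).days
    return "planned" if notice_days >= PLANNED_NOTICE_DAYS else "emergency"


def affects_reliability(announced: date, starts: date) -> bool:
    """Assumption: only breaks that are not 'planned' count against reliability."""
    return classify_break(announced, starts) != "planned"


exactly_a_week = classify_break(date(2018, 1, 1), date(2018, 1, 8))
short_notice = classify_break(date(2018, 1, 2), date(2018, 1, 8))
```

A break announced exactly 7 calendar days ahead still counts as planned, because the text says "at least 7 calendar days".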
More detailed information specific to each EUDAT Service Provider is given on the EUDAT service information web page described in section 7.1.
The services covered by the scope of this SLA are provided with the following level of support.
Service Desk Response Time
Service Desk Resolution time
Service Desk Response Time is the target time for a Service Desk to respond to a service request. Service Desk Resolution time is the maximum time for a Service Desk to fulfil a standard service request. In most cases the actual resolution time is considerably shorter than the Service Desk Resolution time given above. Non-standard service requests include service requests from new customers and large capacity requirements.
Disruptions to the agreed service operation will be handled according to an appropriate severity classification of the incident. In this context, the following guidelines apply:
Incident Response Time
Total loss of a Service: 4 hours during normal working time
Service is degraded: 1 working day
Minor incident / warning: 4 working days
EUDAT helpdesk agents acknowledge receipt of the information about the incident, notify the severity level and provide, if possible, a first, non-binding estimate of the time required to restore normal operational status. If an incident cannot be solved within the mentioned timeframe, i.e. the Incident Resolution time, a statement explaining the reasons is issued. This statement will contain a plan for how the incident will be solved during the following working days.
Security incident resolution is coordinated by the EUDAT security officer. A security incident is an event indicating possible unauthorized access to data, applications, services, networks and/or devices by bypassing their underlying security mechanisms, or a situation in which a known vulnerability and tools to exploit it exist.
The following are the agreed service level minimums for the B2SAFE Service:
Service Level target
Service Reliability is measured using the EUDAT monitoring system, which checks that the B2SAFE service responds correctly and that a test file can be accessed. Availability checks are done at 5-minute intervals.
The B2SAFE service level target is the quarterly average of Service Reliability. The quarterly average is calculated for the following periods: January-March, April-June, July-September and October-December.
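One plausible reading of this measurement is sketched below: reliability per quarter is taken as the share of successful availability checks, with checks that fall into planned service breaks left out. Both the formula and the input format are assumptions of this sketch; the SLA itself does not spell out the calculation.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

Check = Tuple[datetime, bool, bool]  # (timestamp, check ok, in planned break)


def quarter_of(ts: datetime) -> Tuple[int, int]:
    """Map a timestamp to (year, quarter), with quarter 1 = January-March."""
    return ts.year, (ts.month - 1) // 3 + 1


def quarterly_reliability(checks: Iterable[Check]) -> Dict[Tuple[int, int], float]:
    """Share of successful checks per quarter, ignoring planned breaks."""
    counts = defaultdict(lambda: [0, 0])  # quarter -> [successful, total]
    for ts, ok, in_planned_break in checks:
        if in_planned_break:
            continue
        bucket = counts[quarter_of(ts)]
        bucket[0] += 1 if ok else 0
        bucket[1] += 1
    return {q: good / total for q, (good, total) in counts.items()}


checks = [
    (datetime(2018, 1, 10, 12, 0), True, False),
    (datetime(2018, 1, 10, 12, 5), False, False),  # unplanned outage
    (datetime(2018, 3, 31, 23, 55), True, False),
    (datetime(2018, 2, 1, 2, 0), False, True),     # planned break, ignored
    (datetime(2018, 4, 1, 0, 0), True, False),
]
reliability = quarterly_reliability(checks)
```

With checks every 5 minutes a full quarter contains roughly 26,000 samples, so a single failed check changes the quarterly figure only slightly.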
The provisioning of the service under the agreed service level targets is subject to the following limitations and constraints:
The following contacts will be generally used for communications related to the service in the scope of this SLA:
Contacts are defined in Annex B. Contact information.
The EUDAT service information address is a web page that contains information about service hours, generic holidays, service provider specific holidays, and service maintenance breaks.
EUDAT commits to inform the CUSTOMER if this SLA is violated or a violation is anticipated. The message contains detailed information about which parts of the SLA were violated, when the violation occurred, and what effects it has had on the user experience. The message will also contain an initial plan of what will be done to ensure that SLA violations do not happen again.
The following rules are agreed for communication in the event of SLA violation:
Customer contact for the service provider will be contacted by EUDAT by e-mail.
EUDAT is committed to providing high-quality services. If an incident has not been solved within the incident resolution time, or if other issues in the service prevent or disturb normal usage, an incident escalation (or complaint) can be initiated by contacting the EUDAT contact for escalations.
The escalation process has the following two levels:
1) EUDAT contact for escalations
2) EUDAT operations coordinator
The EUDAT contact for escalations is a person responsible for EUDAT service production at one site. If needed, he/she will work in collaboration with other EUDAT Service Providers involved in B2SAFE service production. The EUDAT contact for escalations contacts the customer directly, makes a new analysis of the situation, makes a plan and schedule for how the issue will be solved, and informs the Customer about the status of the escalation process. The EUDAT contact for escalations keeps the EUDAT operations coordinator informed.
If the escalation process stalls for longer than three normal working days, responsibility for the escalation process will pass to the EUDAT operations coordinator. In addition to this internal escalation, the CUSTOMER can also raise the escalation level to the EUDAT operations coordinator if the escalation process seems to be stalled and the service disturbances pose a high risk to critical customer operations.
The following rules for information security and data protection apply:
The parties recognize the legal requirements, including the European General Data Protection Regulation on the protection of personal data, in their roles of Data Processor and Data Controller, and:
The EUDAT B2ACCESS service provider is the data controller.
The EUDAT B2SAFE service provider is the data processor.
Data preservation by EUDAT does not include or imply any transfer of ownership or immaterial property rights, unless otherwise agreed upon in writing.
The Customer is responsible for ascertaining that he is authorized to transfer the data, documents, or other content to EUDAT. This means the Customer has to secure the relevant permission from third parties, for example, from data producers, authors, and right holders.
EUDAT ensures:
The CUSTOMER has to:
The following documentation of the B2SAFE service is available:
EUDAT provides some hands-on material to support the customer in the deployment and integration of the B2SAFE service.
For the purpose of this SLA, the following terms and definitions apply: See Annex C: Glossary
[Unique document identifier]
SLA for EUDAT B2SAFE service
Definitive storage location
[Storage location, e.g. URL of the file on a server or document management system]
[Name of the person primarily responsible for maintaining and reviewing this document]
Last date of change
Next review due date
Version & change tracking
CSC EUDAT B2SAFE
CSC - Finnish IT Centre for Science Ltd
P.O. Box 405 (Keilaranta 14)
tel. +358 9 457 2821 (operator)
Business ID: 0920632-0
(hereinafter referred to as "we" or "CSC")
This service is one of the services developed and maintained by EUDAT limited company (Y-tunnus) together with EUDAT service providers.
If a customer has a contract with CSC, CSC uses EUDAT limited company as a subcontractor. If a customer has a contract with EUDAT limited company, CSC acts as a subcontractor for EUDAT limited company.
CSC Service Desk
tel. 09 457 2821 (operator)
Data Protection Officer Marita Pajulahti
CSC EUDAT B2SAFE Service
The service holds and processes the following personal data of customers or other registered persons:
Following personal data may also have been retrieved from a remote identity provider, i.e. from EUDAT B2ACCESS:
In addition, we collect technical information like logfiles on the service activity.
The lawful basis for processing your data is either the performance of a contract between you and CSC, or CSC’s legitimate interests based on the relationship between you and CSC.
We process your personal data to
The personal data retrieved from the remote identity provider is needed to map you to the local account, to contact you and to provide a comfortable interface.
Cookies are text files that are stored in a computer system via a web browser.
You may, at any time, prevent the setting of cookies through our service by means of a corresponding setting of the web browser used, and may thus permanently deny the setting of cookies. Furthermore, already set cookies may be deleted at any time via a web browser or other software programs. This is possible in all popular web browsers. If you deactivate the setting of cookies in the web browser used, not all functions of our service may be entirely usable.
We acquire data primarily from the following sources
Upon request, we hand over the personal data to the Ministry of Education and Culture, Finnish institutions of higher education and research funders for statistical and reporting purposes and for fulfilling our commitments and obligations contained in contracts or other agreements.
In addition, to enable EUDAT service management and development in the EUDAT community, personal data may be given to EUDAT limited company. EUDAT limited company may share the data with other EUDAT service providers.
Neither we nor EUDAT limited company transfer personal data outside of the EU/EEA.
As a data subject, you have the right to inspect the data about yourself that has been saved in the B2SAFE and B2ACCESS services and to demand the correction of inaccurate data or its removal, provided that there is a legal justification for its removal. You also have the right to withdraw your approval or change it.
As a data subject, you have the right under the General Data Protection Regulation (as of 25.5.2018) to object to the collection of your data or to request that it be restricted, and to make a complaint about the processing of personal data to the supervisory authority.
As a data subject, you also have the right, at any time and without cost, to object to data processing wherever it relates to direct marketing.