Sample lifecycle

Let's imagine a scenario in which we have a Clepsydra Storage instance and three more services. The first one is responsible for extracting data records from some data sources and putting them into Clepsydra Storage. The second one observes all new data records in Clepsydra Storage and applies data processing to those records, according to some preconfigured rules. The last one periodically checks whether there are new data records in a specific format and gets such records for its own purposes. The potential interaction between these services is described below (refer to the REST API for more details):

  1. An agent from the Clepsydra Aggregation component (or any other external service) submits a POST request to the /records URL. It provides the data record itself and three metadata elements: the identifier of the data format (e.g. "USMARC"), the identifier of the data source (e.g. "BNF-Gallica-Manuscripts"), and the identifier of the object which the data record represents (e.g. "1234"). The object identifier should be unique within the scope of the given data source identifier. A similar operation with the same object identifier and source identifier can be repeated for representations of the given object in other data formats (e.g. "thumbnail"). The data record is stored and receives its own globally unique identifier assigned by Clepsydra Storage.
  2. Clepsydra Storage sends a JMS notification about the new data record.
  3. An external processing service (which may be part of the Clepsydra Processing component) that has subscribed to notifications about all MARC data records is informed that a new MARC record is available. The same approach can be used for modified records.
  4. Using the data record identifier, the processing service submits a GET /records/{id} request and obtains the data record for processing. In this example, the result of the processing is the data record transformed to the DublinCore format.
  5. The processing service submits a PUT /records request, reusing the object identifier and data source identifier from the first request in this example but providing a different data format identifier (e.g. "DublinCore").
  6. Clepsydra Storage sends a JMS notification about the new data record.
  7. Another external processing service (again, possibly part of the Clepsydra Processing component) that has subscribed to notifications about all DublinCore data records is informed that the new record is available.
  8. Using the data record identifier, this processing service submits a GET /records/{id} request and obtains the data record for processing. This time the result of the processing is also in the DublinCore schema, but the data has been semantically enriched by the processing service.
  9. The processing service now has two options. It can update the original DublinCore record, replacing the non-enriched one, or it can submit the enriched data as a new representation using a new schema identifier (e.g. "DublinCore enriched"). In this example the service uses the second option and submits a new data record to Clepsydra Storage.
  10. Finally, another external service uses a GET /records request to ask for all data records in the "DublinCore enriched" schema that were added or modified since a specific time. It receives a number of records, which it uses for its other activities.
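
The flow above can be illustrated with a small in-memory model. This is only a sketch of the interaction pattern, not the Clepsydra REST API or its JMS integration: the class ToyStorage and the names submit, subscribe, get and list_records are hypothetical, notifications are delivered as synchronous callbacks rather than JMS messages, and a logical counter stands in for real timestamps. The format identifier "USMARC" is used for both submission and subscription to keep the example simple.

```python
import itertools
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    record_id: int   # globally unique, assigned by the storage
    source: str      # data source identifier
    object_id: str   # unique within the scope of the source
    fmt: str         # data format (schema) identifier
    data: str
    modified: int    # logical time of the last add or update


class ToyStorage:
    """Hypothetical in-memory stand-in; not the real Clepsydra Storage."""

    def __init__(self):
        self._records = {}                      # record_id -> Record
        self._by_key = {}                       # (source, object_id, fmt) -> record_id
        self._subscribers = defaultdict(list)   # fmt -> list of callbacks
        self._ids = itertools.count(1)
        self._clock = itertools.count(1)

    def subscribe(self, fmt, callback):
        """Stand-in for a notification subscription filtered by format."""
        self._subscribers[fmt].append(callback)

    def submit(self, source, object_id, fmt, data):
        """Store or update a representation (steps 1, 5 and 9), then notify."""
        key = (source, object_id, fmt)
        record_id = self._by_key.get(key)
        if record_id is None:
            record_id = next(self._ids)
            self._by_key[key] = record_id
        self._records[record_id] = Record(
            record_id, source, object_id, fmt, data, next(self._clock)
        )
        for callback in list(self._subscribers[fmt]):
            callback(record_id)                 # steps 2-3 and 6-7
        return record_id

    def get(self, record_id):
        """Fetch one record by its global identifier (steps 4 and 8)."""
        return self._records[record_id]

    def list_records(self, fmt, since):
        """Records in a format added or modified after `since` (step 10)."""
        return [r for r in self._records.values()
                if r.fmt == fmt and r.modified > since]


def run_lifecycle():
    storage = ToyStorage()

    def transform(record_id):
        # First processing service: MARC -> DublinCore (placeholder logic).
        rec = storage.get(record_id)
        storage.submit(rec.source, rec.object_id, "DublinCore",
                       "dc(" + rec.data + ")")

    def enrich(record_id):
        # Second processing service: stores the result as a new representation.
        rec = storage.get(record_id)
        storage.submit(rec.source, rec.object_id, "DublinCore enriched",
                       "enriched(" + rec.data + ")")

    storage.subscribe("USMARC", transform)
    storage.subscribe("DublinCore", enrich)

    # Step 1: the aggregation agent submits the original record.
    storage.submit("BNF-Gallica-Manuscripts", "1234", "USMARC", "marc-data")

    # Step 10: a consumer asks for enriched records changed since time 0.
    return storage.list_records("DublinCore enriched", since=0)
```

Calling run_lifecycle() returns a single enriched record for object "1234". In a real deployment the notifications would arrive asynchronously, so the consuming service in step 10 would poll with the time of its previous query rather than expect the enriched record to exist immediately.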