Interoperability between data formats/metadata/applications has always been a research subject that interests me. Without searching very hard, I’ve found 4 of my publications with interoperability in the title! There are so many solutions covering different aspects that it is very hard to choose the best data-format-application-framework-query-language-etc.
Together with an MSc student (Leandro Pulgatti), we defined a simple solution in which we didn’t want to define any new format-standard-framework, but rather to use JSON as an interoperability format between NoSQL databases (just a few of them!). Although JSON is not expressive enough to cover some data formats (e.g., an ontology), we thought it could be used as an integration format among several NoSQL databases. We added an extra restriction: we focused on JSON documents without cross-element references, since otherwise other input formats would probably be more suitable.
This restriction was very useful: it enabled us to implement transformations that take JSON as input and produce Key-Value stores, Column stores, Document stores and Graph databases as output. We covered 12 different NoSQL representation strategies (such as the ones presented in related work: Bugiotti 2013).
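To give a feel for what such a transformation can look like, here is a minimal Python sketch that flattens a reference-free JSON document into key-value pairs, using the path to each leaf as the key. This is only an illustration: the function name flatten and the path syntax (dots and [index]) are my assumptions here, and it is not necessarily one of the 12 strategies covered in the paper.

```python
import json
from typing import Any, Dict


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Map every leaf of a JSON value to a path-like key (hypothetical scheme)."""
    pairs: Dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            # Nested object members are joined with a dot.
            path = f"{prefix}.{key}" if prefix else key
            pairs.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            # Array elements are addressed by position.
            pairs.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Strings, numbers, booleans and null are stored as-is.
        pairs[prefix] = value
    return pairs


document = json.loads(
    '{"Person": {"firstName": "John", "age": 25,'
    ' "phoneNumber": [{"type": "home", "number": "212 555-1234"}]}}'
)
key_value_pairs = flatten(document)
for key, val in key_value_pairs.items():
    print(key, "=", val)
```

Because the document has no cross-element references, every value can be reached by exactly one path, which is what makes such a simple mapping possible. Note that empty objects and empty arrays produce no pairs in this sketch.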
The restriction also enabled us to use the pull-parser programming model, which has already been used to process XML documents in different scenarios. In this model, the JSON objects can be read from a stream on a per-object basis (with a loop and a next() method!), without needing to be kept in main memory after being processed. The code reads the input objects, categorizes them into 4 categories, 1) object boundaries, 2) collection boundaries, 3) object identification and 4) object types, and then applies the implemented rules. An example of a categorized JSON is below.
{ START_OBJECT
  "Person" KEY_NAME : { START_OBJECT
    "firstName" KEY_NAME : "John" VALUE_STRING,
    "lastName" KEY_NAME : "Smith" VALUE_STRING,
    "age" KEY_NAME : 25 VALUE_NUMBER,
    "phoneNumber" KEY_NAME : [ START_ARRAY
      { START_OBJECT "type" KEY_NAME : "home" VALUE_STRING, "number" KEY_NAME : "212 555-1234" VALUE_STRING } END_OBJECT,
      { START_OBJECT "type" KEY_NAME : "fax" VALUE_STRING, "number" KEY_NAME : "646 555-4567" VALUE_STRING } END_OBJECT
    ] END_ARRAY
  } END_OBJECT
} END_OBJECT
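The loop-and-next() style of consumption can be sketched in a few lines of Python. This is a simplified illustration and not our implementation: the generator name events is hypothetical, the labels VALUE_TRUE, VALUE_FALSE and VALUE_NULL are assumed by analogy with the ones in the example, and for brevity the sketch walks a document that was already parsed with json.loads, whereas a real pull parser produces the same events while reading incrementally from the stream.

```python
import json
from typing import Any, Iterator, Optional, Tuple

Event = Tuple[str, Optional[Any]]


def events(value: Any) -> Iterator[Event]:
    """Yield (category, payload) pairs in document order."""
    if isinstance(value, dict):
        yield ("START_OBJECT", None)          # object boundary
        for key, child in value.items():
            yield ("KEY_NAME", key)           # object identification
            yield from events(child)
        yield ("END_OBJECT", None)
    elif isinstance(value, list):
        yield ("START_ARRAY", None)           # collection boundary
        for child in value:
            yield from events(child)
        yield ("END_ARRAY", None)
    elif isinstance(value, str):
        yield ("VALUE_STRING", value)
    elif value is True:                       # checked before numbers: bool is an int in Python
        yield ("VALUE_TRUE", None)
    elif value is False:
        yield ("VALUE_FALSE", None)
    elif value is None:
        yield ("VALUE_NULL", None)
    else:
        yield ("VALUE_NUMBER", value)


document = json.loads(
    '{"Person": {"firstName": "John", "age": 25,'
    ' "phoneNumber": [{"type": "home"}]}}'
)

parser = events(document)
trace = []
while True:
    event = next(parser, None)                # pull the next event on demand
    if event is None:
        break
    trace.append(event)                       # a transformation rule would be applied here
    print(event)
```

The consumer decides when to ask for the next event, so a rule only needs to keep whatever state it cares about (for example, the current key path) rather than the whole document.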
More details about our solution, with all the rules and formats supported, are explained in this paper, part of the program at ICEIS 2018 (presentation here).