2 Background and Related Work
2.1 From Bitcoin to Blockchains
2.2 Open and Permissionless Blockchains
2.3 Interoperability Between Blockchains
3 Cross-Chain Query Language
3.1 Integrated Data Model
3.2 Grammar and Query Processing Architecture
4 Evaluation of Implementation Feasibility
4.1 Software and Hardware Configuration
5 Conclusion and Outlook
Acknowledgment
References
The language syntax is rooted in well-established concepts of data query languages, specifically the Structured Query Language (SQL). On the one hand, SQL and similar languages permit a formal representation of queries through relational algebra; on the other hand, they make queries and their execution comprehensible to domain experts without deep knowledge of the underlying concepts.
The syntax of SQL is structured around the 'SELECT-FROM-WHERE' block (SFW block). Its clauses are based on English keywords: the 'SELECT' clause performs a projection in the underlying relational model, i.e., it selects columns; the 'FROM' clause names the source relations; and the 'WHERE' clause selects tuples using conditions. In the relational model, set operations, notably the Cartesian product, form the foundation of all queries. This strong theoretical foundation motivates the application of the relational model. Relational algebra allows multiple blockchain sources to be processed in a SOURCE clause such that (a) data can be combined, e.g., with associative operations, (b) attributes can be selected in a QUERY clause through efficient projection operations, and (c) arbitrary combinations can be produced and filtered efficiently in a FILTER clause. For the cross-chain query language, these concepts are applied as follows.
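The SFW block 'SELECT A FROM R1, R2 WHERE C' corresponds to the relational algebra expression π_A(σ_C(R1 × R2)): a Cartesian product, followed by a selection, followed by a projection. The minimal Python sketch below evaluates such an expression over two small in-memory relations to make this correspondence concrete. It is an illustration only; the relation names, the class-qualified attribute names, and all values are hypothetical.

```python
from itertools import product
from typing import Callable

# A relation is modeled as a list of tuples, each a dict from attribute name to value.
Relation = list[dict[str, object]]


def cartesian_product(r1: Relation, r2: Relation) -> Relation:
    """R1 x R2: pair every tuple of r1 with every tuple of r2.

    Attribute names are assumed to be class-qualified (e.g. 'Block.number'),
    so merging two tuples never overwrites a value.
    """
    return [{**t1, **t2} for t1, t2 in product(r1, r2)]


def select(r: Relation, condition: Callable[[dict[str, object]], bool]) -> Relation:
    """sigma_C(R): keep only the tuples that satisfy the condition."""
    return [t for t in r if condition(t)]


def project(r: Relation, attributes: list[str]) -> Relation:
    """pi_A(R): keep only the listed attributes (duplicates are kept, as in SQL)."""
    return [{a: t[a] for a in attributes} for t in r]


# Hypothetical relations with class-qualified attribute names.
blocks = [{"Block.number": 1, "Block.hash": "0xa1"},
          {"Block.number": 2, "Block.hash": "0xb2"}]
transactions = [{"Transaction.hash": "0xt1", "Transaction.block": 1},
                {"Transaction.hash": "0xt2", "Transaction.block": 2},
                {"Transaction.hash": "0xt3", "Transaction.block": 2}]

# SELECT Block.hash, Transaction.hash FROM blocks, transactions
# WHERE Block.number = Transaction.block
result = project(
    select(cartesian_product(blocks, transactions),
           lambda t: t["Block.number"] == t["Transaction.block"]),
    ["Block.hash", "Transaction.hash"],
)
print(result)
```

Of the six combinations produced by the product, only the three pairs whose block numbers match survive the selection, and the projection then reduces each of them to the two requested columns.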
Requirements for Queries. Query statements consist of query (Q), source (S), and filter (F) clauses as follows:
Q Query attributes can be any attributes of the data model classes. Each attribute must be specified together with its class and establishes one column of the query result for each source. Qualifying attributes by class prevents ambiguity between attributes of the same name in different classes and allows users to select exactly the attributes they require.
S Sources specify, in terms of blockchain and network classes, where data is extracted from. A source can be refined with additional parameters, including specific blocks, transactions, and accounts along with their associated assets, tokens, and data.
To specify each source, attribute values of the identifying attributes from the Chain, Network, and ChainDescriptor classes must be given. This forms the base of the data source from where extraction will begin. Further specificity can be achieved by providing additional classes, attributes, and attribute values of identifying attributes from other classes such as Block, Transaction, Account, Asset, Token, or Data. This level of granularity allows for data queries targeted at one or more blockchains.
F Filters optionally refine the results of a query based on conditions. Using filters, specific subsets of data can be removed from the query result based on their attributes and attribute values. A filter is specified by a filter function containing a comparison operation that takes two query attributes as inputs. At run-time, filter functions compare the values of these attributes. Filter functions are applied sequentially, each to the result of the previous one. Due to this sequential filtering, the query result only contains data meeting all specified filter conditions.
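The three requirements can be captured as plain data structures. The Python sketch below is a minimal, hypothetical encoding: the class names Chain, Network, ChainDescriptor, Block, Transaction, Account, Asset, Token, and Data come from the requirements above, while the field names, the operator set, and the validation rules are assumptions made for illustration, not the published grammar.

```python
from dataclasses import dataclass, field

# Hypothetical encoding of the Q/S/F requirements; only the class names are from the text.
IDENTIFYING_CLASSES = ("Chain", "Network", "ChainDescriptor")  # required for every source
REFINING_CLASSES = ("Block", "Transaction", "Account", "Asset", "Token", "Data")  # optional
COMPARISONS = ("=", "!=", "<", "<=", ">", ">=")  # assumed set of comparison operators


@dataclass(frozen=True)
class QueryAttribute:
    """Q: an attribute qualified by its class, so equal names in different classes stay distinct."""
    class_name: str
    attribute: str

    @property
    def column(self) -> str:
        return f"{self.class_name}.{self.attribute}"


@dataclass
class SourceSpec:
    """S: identifying attribute values, keyed by data model class."""
    values: dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        missing = [c for c in IDENTIFYING_CLASSES if c not in self.values]
        if missing:
            raise ValueError(f"missing identifying values for {missing}")
        unknown = [c for c in self.values if c not in IDENTIFYING_CLASSES + REFINING_CLASSES]
        if unknown:
            raise ValueError(f"unknown source classes {unknown}")


@dataclass(frozen=True)
class FilterFunction:
    """F: a comparison operation taking two query attributes as inputs."""
    left: QueryAttribute
    op: str
    right: QueryAttribute

    def __post_init__(self) -> None:
        if self.op not in COMPARISONS:
            raise ValueError(f"unsupported comparison {self.op!r}")


@dataclass
class QueryStatement:
    query: list[QueryAttribute]
    sources: list[SourceSpec]
    filters: list[FilterFunction] = field(default_factory=list)  # filters are optional

    def validate(self) -> None:
        if not self.query or not self.sources:
            raise ValueError("a statement needs at least one query attribute and one source")
        for source in self.sources:
            source.validate()


# Usage with hypothetical values: one Ethereum source narrowed to a single block.
stmt = QueryStatement(
    query=[QueryAttribute("Block", "number"), QueryAttribute("Transaction", "hash")],
    sources=[SourceSpec({"Chain": "Ethereum", "Network": "Mainnet",
                         "ChainDescriptor": "PoS", "Block": "17000000"})],
)
stmt.validate()  # passes; omitting "Network" from the source would raise ValueError
```

Keeping the class and attribute as separate fields, rather than a single string, mirrors the requirement that every query attribute is specified alongside its class.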
Grammar and Syntax. In the EBNF (Extended Backus-Naur Form) syntax provided in Listing 1.1, the structure of a query is divided into a series of clauses. These clauses define the aforementioned aspects of each query and are detailed and fully specified in the complete grammar [14]. Query clauses specify projections on the data returned from source clauses, where each source clause relates to the extraction of data as described in the requirements. Finally, filter clauses enable selection by attributes and attribute values through comparison functions. Specifying multiple values within a clause yields multiple result sets. In the case of SourceSpec, multiple values trigger the collection of data from multiple sources, each optionally narrowed to a block, transaction, or account, as per the requirements and the data model. Accounts with assets, tokens, or data are likewise specified according to the data model. For an implementation as a domain-specific language, the concrete syntax might be adapted according to the language's design guidelines and further usability considerations.
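How a statement is divided into clauses, and how multiple SourceSpec values lead to multiple collections, can be sketched with a small clause splitter. The keywords QUERY, SOURCE, and FILTER follow the clause names used in this section, but the separators ('.' between class and attribute, ',' between values, '/' and '=' inside a SourceSpec) and the attribute names value and fee are hypothetical; the authoritative syntax is the EBNF grammar referenced in [14].

```python
import re

# Hypothetical surface syntax, loosely following the clause structure described above:
#   QUERY <Class.attribute>, ...  SOURCE <SourceSpec>, ...  [FILTER <Class.attr op Class.attr>, ...]
CLAUSE = re.compile(r"\b(QUERY|SOURCE|FILTER)\b")


def split_clauses(statement: str) -> dict[str, list[str]]:
    """Split a statement into clauses; each clause holds its comma-separated values."""
    # With a capturing group, re.split returns ['', 'QUERY', body, 'SOURCE', body, ...].
    parts = CLAUSE.split(statement.strip())
    if parts[0].strip():
        raise SyntaxError(f"unexpected text before first clause: {parts[0]!r}")
    clauses: dict[str, list[str]] = {}
    for keyword, body in zip(parts[1::2], parts[2::2]):
        if keyword in clauses:
            raise SyntaxError(f"duplicate {keyword} clause")
        clauses[keyword] = [v.strip() for v in body.split(",") if v.strip()]
    for required in ("QUERY", "SOURCE"):  # FILTER is optional
        if required not in clauses:
            raise SyntaxError(f"missing {required} clause")
    return clauses


def parse_source(spec: str) -> dict[str, str]:
    """Parse a hypothetical SourceSpec 'Chain/Network/ChainDescriptor[/Class=value...]'."""
    # Unpacking raises ValueError if fewer than the three identifying values are given.
    chain, network, descriptor, *refinements = spec.split("/")
    source = {"Chain": chain, "Network": network, "ChainDescriptor": descriptor}
    for item in refinements:
        cls, value = item.split("=", 1)
        source[cls] = value
    return source


stmt = ("QUERY Block.number, Transaction.hash "
        "SOURCE Ethereum/Mainnet/PoS/Block=17000000, Bitcoin/Mainnet/PoW "
        "FILTER Transaction.value > Transaction.fee")
clauses = split_clauses(stmt)
sources = [parse_source(s) for s in clauses["SOURCE"]]
print(len(sources))  # two SourceSpec values, hence two separate data collections
```

A real implementation would more likely generate its parser from the grammar; the regular-expression split only serves to show the clause boundaries.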
Query Processing Architecture. Figure 2 shows the steps involved in query processing as part of an application architecture. An application initiates the process by issuing query statements to the parser component, where clauses are constructed for further query processing in conjunction with a number of connected local nodes. In the query processing component, the source clause is processed for each specific source, i.e., each SourceSpec with network and chain data and their respective attributes leads to the collection of data from the connected nodes. The results are stored as instances of the data model classes. In the next stage, the query attribute clause is processed: each data model class instance is read to establish a newly appended column in the result table of the query. In the final stage, the filter clause is applied with each of the specified filter functions, filtering the existing result table.
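The three processing stages after parsing can be sketched as a small pipeline. In this minimal Python sketch, the connected local nodes are replaced by a hypothetical in-memory dictionary (a real implementation would query the nodes' interfaces), the Transaction class is a hypothetical, simplified stand-in for the data model class, and the statement is assumed to be already parsed into plain Python values.

```python
import operator
from dataclasses import dataclass


@dataclass
class Transaction:  # hypothetical, simplified stand-in for a data model class
    hash: str
    value: int
    fee: int


# Hypothetical stand-in for the connected local nodes, keyed by (Chain, Network).
NODES = {
    ("Ethereum", "Mainnet"): [Transaction("0x01", 50, 3), Transaction("0x02", 1, 2)],
    ("Bitcoin", "Mainnet"): [Transaction("b1", 10, 1)],
}

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def collect(source):
    """Stage 1: collect data for one SourceSpec; results are data model instances."""
    return NODES[(source["Chain"], source["Network"])]


def project(instances, query_attributes):
    """Stage 2: read the instances and append one column per (class, attribute).

    Assumes every instance belongs to the queried class.
    """
    table = [{} for _ in instances]
    for cls, attr in query_attributes:
        column = f"{cls}.{attr}"
        for row, inst in zip(table, instances):
            row[column] = getattr(inst, attr)
    return table


def apply_filters(table, filters):
    """Stage 3: apply filter functions one after another on the existing table."""
    for left, op, right in filters:
        compare = OPS[op]
        table = [row for row in table if compare(row[left], row[right])]
    return table


def process(sources, query_attributes, filters):
    """Run stage 1 for every source, then stage 2, then stage 3, in the described order."""
    collected = {(s["Chain"], s["Network"]): collect(s) for s in sources}
    tables = {k: project(v, query_attributes) for k, v in collected.items()}
    return {k: apply_filters(t, filters) for k, t in tables.items()}


results = process(
    sources=[{"Chain": "Ethereum", "Network": "Mainnet", "ChainDescriptor": "PoS"},
             {"Chain": "Bitcoin", "Network": "Mainnet", "ChainDescriptor": "PoW"}],
    query_attributes=[("Transaction", "hash"), ("Transaction", "value"), ("Transaction", "fee")],
    filters=[("Transaction.value", ">", "Transaction.fee")],
)
for key, rows in results.items():
    print(key, rows)
```

Because each filter only sees the rows kept by its predecessors, a row survives exactly when it satisfies all filter functions, which matches the sequential filtering described in the requirements.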
Author:
(1) Felix Härer [0000-0002-2768-2342], Digitalization and Information Systems Group, University of Fribourg, Switzerland.
[14] Available at https://github.com/fhaer/CCQL/tree/main/grammar