Challenges in building an efficient relational architecture for MASHQL
Author: Georgiou, Michael A.
Publisher: University of Cyprus, Faculty of Pure and Applied Sciences
Place of publication: Cyprus
MashQL is a query-by-diagram mashup language. It collects web data expressed in the Resource Description Framework (RDF) and stores them in a backend database, so that people can query the data very easily. MashQL assumes that web data sources are represented in RDF and can be queried with the SPARQL query language. RDF is a language for representing information (metadata) about resources on the World Wide Web. In this paper we present the design and implementation of two important MashQL modules. The first is the RDF Loader, which downloads RDF data from the web and loads it into an Oracle RDF model database. The second is the Query Optimizer, which is designed to execute all MashQL queries successfully, efficiently and in a timely fashion. For the RDF Loader, we designed and implemented a concrete system that combines the latest technologies on the market for the Extract-Transform-Load (ETL) process for RDF data, such as Oracle, Java and Jena. On the basis of these technologies, we created a powerful, stable and intelligent RDF loader that loads RDF data in any format and of any size in a very short time. The Query Optimizer implements our optimization solution, whose goal is to execute MashQL queries as fast as possible. Our optimization solution includes the creation of data summaries on top of the RDF data that have already been loaded into the database, and the BR-Algorithm, which caches the results of the most important MashQL queries. With the database summaries, the data do not have to be scanned and sorted while a query executes, because they have already been sorted and pre-computed according to the requirements of MashQL queries. With the BR-Algorithm, the most important MashQL queries achieve low response times, since their results have already been cached in the database.
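The abstract does not detail how the data summaries are structured. As a minimal sketch, assuming a summary that lists, for each RDF class, the properties its instances use (the kind of list a query-by-diagram editor needs when suggesting properties), the Python below contrasts answering that question by scanning and sorting all triples at query time with a single lookup in a summary pre-computed at load time. The function names, the `ex:` sample data and the in-memory dictionary standing in for database tables are hypothetical; this is not the thesis's Oracle implementation, and it does not model the BR-Algorithm.

```python
from collections import defaultdict
from collections.abc import Sequence

# Full IRI of rdf:type, used to find the classes of each subject.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A triple is (subject, predicate, object); plain strings keep the sketch small.
Triple = tuple[str, str, str]


def build_property_summary(triples: Sequence[Triple]) -> dict[str, list[str]]:
    """Hypothetical summary: computed once at load time, maps class -> sorted properties."""
    # Pass 1: collect the classes of every typed subject.
    classes_of: dict[str, set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            classes_of[s].add(o)
    # Pass 2: attribute every non-type property to all classes of its subject.
    props_of_class: dict[str, set[str]] = defaultdict(set)
    for s, p, _ in triples:
        if p != RDF_TYPE:
            for cls in classes_of.get(s, ()):
                props_of_class[cls].add(p)
    # Store sorted lists so that query time needs neither a scan nor a sort.
    return {cls: sorted(props) for cls, props in props_of_class.items()}


def properties_by_scan(triples: Sequence[Triple], cls: str) -> list[str]:
    """Baseline: answer the same question by scanning and sorting at query time."""
    members = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    return sorted({p for s, p, _ in triples if s in members and p != RDF_TYPE})


def properties_from_summary(summary: dict[str, list[str]], cls: str) -> list[str]:
    """Query-time lookup against the pre-computed summary: one dictionary access."""
    return summary.get(cls, [])


# Hypothetical sample data.
triples: list[Triple] = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:bob", "ex:email", "bob@example.org"),
    ("ex:paper1", RDF_TYPE, "ex:Article"),
    ("ex:paper1", "ex:title", "RDF"),
]
summary = build_property_summary(triples)
print(properties_from_summary(summary, "ex:Person"))  # ['ex:email', 'ex:knows', 'ex:name']
```

The trade-off is the usual one for pre-computation: the summary has to be rebuilt or updated whenever new RDF data is loaded, in exchange for query-time work that no longer grows with the total number of triples.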
To obtain the best performance from the algorithm, we used a novel graph-partitioning idea that divides the original graph into three parts, reducing its size by a factor of three on average. This partitioning helps the BR-Algorithm run faster and produce less, more carefully selected cached data. Finally, our optimization solution for MashQL queries was compared with Oracle's corresponding technology and shows very good results: our solution runs MashQL queries 10 times faster, and MashQL's most important queries 45 times faster.