Continuous decaying of telco big data with data postdiction
Mokbel, Mohamed F.
In this paper, we present two novel decaying operators for Telco Big Data (TBD), coined TBD-DP and CTBD-DP, that are founded on the notion of Data Postdiction. Unlike data prediction, which aims to make a statement about the future value of some tuple, our formulated data postdiction term aims to make a statement about the past value of some tuple that no longer exists because it had to be deleted to free up disk space. TBD-DP relies on existing Machine Learning (ML) algorithms to abstract TBD into compact models that can be stored and queried when necessary. Our proposed TBD-DP operator has the following two conceptual phases: (i) in an offline phase, it uses an LSTM-based hierarchical ML algorithm to learn a tree of models (coined TBD-DP tree) over time and space; and (ii) in an online phase, it uses the TBD-DP tree to recover data within a certain accuracy. Additionally, we provide three decaying focus methods that can be plugged into the operators we propose, namely: (i) FIFO-amnesia, which is based on the time the tuple was created; (ii) SPATIAL-amnesia, which is based on the location of the cellular tower associated with the tuple; and (iii) UNIFORM-amnesia, which randomly picks the tuples to be decayed. Similarly, CTBD-DP enables the decaying of streaming data, using the TBD-DP tree to extend and update the stored models. In our experimental setup, we measure the efficiency of the proposed operator using a ∼10 GB anonymized real telco network trace. Our experimental results in TensorFlow over HDFS are extremely encouraging, as they show that TBD-DP saves an order of magnitude of storage space while maintaining high accuracy on the recovered data. Our experiments also show that CTBD-DP improves the accuracy over streaming data.
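The three decaying focus methods differ only in how they choose which tuples to decay. The following is a minimal sketch of that selection step, not the authors' implementation. It assumes each tuple carries a creation timestamp and the coordinates of its cell tower, and that a budget of `k` tuples must be decayed. The `Record` type, the `focus` point, and the function names are hypothetical. Reading SPATIAL-amnesia as "decay the tuples whose tower is nearest a chosen location" is also an assumption, since the abstract says only that the method is location-based.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical TBD tuple holding only the fields the focus methods need."""
    rid: int
    created: float               # creation timestamp (e.g., seconds since epoch)
    tower: Tuple[float, float]   # (x, y) location of the associated cell tower


def fifo_amnesia(records: List[Record], k: int) -> List[Record]:
    """FIFO-amnesia: select the k oldest tuples (earliest creation time)."""
    return sorted(records, key=lambda r: r.created)[:k]


def spatial_amnesia(records: List[Record], k: int,
                    focus: Tuple[float, float]) -> List[Record]:
    """SPATIAL-amnesia (assumed reading): select the k tuples whose tower
    lies closest to a hypothetical focus location."""
    return sorted(records, key=lambda r: math.dist(r.tower, focus))[:k]


def uniform_amnesia(records: List[Record], k: int,
                    seed: Optional[int] = None) -> List[Record]:
    """UNIFORM-amnesia: select k distinct tuples uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))


if __name__ == "__main__":
    # Ten synthetic tuples: higher rid means older tuple and a tower farther from (0, 0).
    records = [Record(i, created=float(100 - i), tower=(float(i), 0.0)) for i in range(10)]
    print([r.rid for r in fifo_amnesia(records, 3)])                 # oldest three
    print([r.rid for r in spatial_amnesia(records, 2, (0.0, 0.0))])  # nearest two
    print(sorted(r.rid for r in uniform_amnesia(records, 4, seed=0)))
```

In the operators described above, the tuples returned by such a selector would be the ones abstracted into the TBD-DP models and then deleted. Keeping the policy as a separate function mirrors the idea that the focus methods are pluggable.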