IoT Data Discovery and Learning







The massive number of Internet-of-Things (IoT) devices creates a torrent of data. These data may be stored and hosted by nodes dispersed over the edge of the Internet, forming peer-to-peer (p2p) IoT database networks (IoT-DBNs) that can be dynamically discovered and used to enhance daily operations and solve real-world problems. Making use of this massive amount of IoT data raises two issues: how to discover IoT data streams in the IoT-DBN, and how to learn and extract useful knowledge from the discovered data to cope with dynamically arising tasks. In this dissertation, we consider these two problems and develop solutions for them.

First, we consider the IoT data discovery problem in growing IoT-DBNs. We show the benefits of p2p unstructured routing for IoT data discovery and point out the space efficiency issue that has been overlooked in keyword-based routing algorithms. This work is the first in the field to investigate routing table designs and various compression techniques to support effective and space-efficient IoT data discovery routing. We propose novel summarization algorithms, including alphabetical-based, hash-based, and meaning-based summarization, along with their corresponding coding schemes. We also consider routing table designs that support summarization without degrading lookup efficiency for discovery query routing. To evaluate our approach, we collected 100K IoT data streams from various IoT resources and distributed them over a simulated Internet. We then applied our data discovery routing with the summarization techniques to handle discovery queries. The results show that our summarization solutions can reduce the routing table size by 20 to 30 times with a 2-5% increase in latency compared with other peer-to-peer discovery routing algorithms. Our approach outperforms DHT-based approaches by a factor of 2 to 6 in latency and communication cost.
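To make the idea of hash-based summarization concrete, the following is a minimal sketch and not the dissertation's actual algorithm or coding scheme. It assumes a Bloom filter as the summary structure, so each neighbor's advertised keywords collapse into a fixed number of bits regardless of how many keywords are reachable through it. The names `BloomSummary`, `candidate_neighbors`, and the sample keywords are hypothetical.

```python
import hashlib


class BloomSummary:
    """Fixed-size summary of the keywords reachable through one neighbor.

    Assumption: a Bloom filter stands in for hash-based summarization.
    """

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, keyword):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{keyword}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, keyword):
        for pos in self._positions(keyword):
            self.bits |= 1 << pos

    def might_contain(self, keyword):
        # True may be a false positive; False is always correct.
        return all((self.bits >> pos) & 1 for pos in self._positions(keyword))


def candidate_neighbors(routing_table, query_keywords):
    """Return neighbors whose summary may match every query keyword."""
    return [
        neighbor
        for neighbor, summary in routing_table.items()
        if all(summary.might_contain(k) for k in query_keywords)
    ]


# Hypothetical routing table with two neighbors.
routing_table = {"neighbor_a": BloomSummary(), "neighbor_b": BloomSummary()}
for kw in ("temperature", "chicago", "celsius"):
    routing_table["neighbor_a"].add(kw)
for kw in ("traffic", "dallas"):
    routing_table["neighbor_b"].add(kw)

matches = candidate_neighbors(routing_table, ["temperature", "chicago"])
print(matches)
```

In this sketch a query is always forwarded to `neighbor_a`, because a Bloom filter never produces false negatives. It may occasionally also be forwarded to a neighbor that does not hold the data, which illustrates the kind of trade-off between table size and routing overhead described above.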

After IoT data discovery and retrieval, a prominent problem is how to learn from the data to address real-world tasks. Since different applications require different learning schemes, we focus on one example application, the estimated time of arrival (ETA) problem, which is very important in intelligent transportation systems and has received a lot of attention recently. Though many tools exist for ETA, ETA for special vehicles, such as ambulances and fire engines, is still challenging due to the scarcity or non-existence of data. To tackle this, we propose TLETA, a deep transfer learning framework for the ETA of special vehicles. TLETA constructs cellular-level spatial-temporal knowledge for fine-grained extraction of driving patterns. The learning network contains transferable layers to support knowledge transfer between different categories of vehicles. Importantly, our transfer models train only the last layers to map the transferred knowledge, significantly reducing the training time to achieve real-time learning. We also introduce an inter-region transfer method that builds a mapping function between vehicle domains within a region. The mapping functions of the top-k regions by spatial-temporal similarity are then used to construct the predictor in regions where target data is unavailable. The experimental studies show that our model outperforms many state-of-the-art approaches in accuracy and training time.
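The idea of training only the last layers can be illustrated with a toy sketch. This is not the TLETA architecture: the function `frozen_features` is a hand-written stand-in for layers already learned on a data-rich vehicle category, and the travel-time data is synthetic. Only the final linear layer is fitted, using plain gradient descent on the scarce target data. All names and numbers are hypothetical.

```python
def frozen_features(x):
    """Stand-in for transferred layers; never updated during training.

    Assumption: x is (distance_km, hour_of_day), and 7-9 is rush hour.
    """
    distance_km, hour = x
    rush = 1.0 if 7 <= hour <= 9 else 0.0
    return [distance_km, distance_km * rush, 1.0]


def train_last_layer(samples, targets, lr=0.05, epochs=3000):
    """Fit only the final linear layer by gradient descent on squared error."""
    feats = [frozen_features(x) for x in samples]  # computed once, layers frozen
    weights = [0.0] * len(feats[0])
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for f, y in zip(feats, targets):
            err = sum(w * v for w, v in zip(weights, f)) - y
            for j, v in enumerate(f):
                grad[j] += 2.0 * err * v / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def predict(weights, x):
    return sum(w * v for w, v in zip(weights, frozen_features(x)))


# Synthetic target-domain data: a handful of special-vehicle trips.
samples = [(1, 8), (2, 8), (3, 12), (4, 12), (5, 8), (2, 12)]
targets = [1.5 * d + 1.0 * d * (1.0 if 7 <= h <= 9 else 0.0) + 2.0
           for d, h in samples]

weights = train_last_layer(samples, targets)
print(round(predict(weights, (3, 8)), 2))
```

Because the frozen features are computed once and only three weights are updated, training is cheap even when repeated frequently, which is the property the paragraph above attributes to training only the last layers.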



Computer Science