The Internet of Things (IoT) is becoming ubiquitous in our everyday lives, which means that more and more devices will generate data. IoT devices use sensors to monitor attributes of the environment such as temperature, humidity, and light.
These sensors produce data periodically, and storing this massive volume of data in a database is becoming a major challenge for data storage infrastructure.
Prior research has proposed compression algorithms and signature techniques to reduce data storage but does not specify how the data patterns are defined. Because the environment exhibits similar patterns every day, daily sensing produces largely the same information. Therefore, in this study, we propose a system that stores data models rather than raw data points.
Instead of storing each data point individually, we develop and store data models, together with their corresponding time periods, that capture the behavior of the sensor data. This reduces data storage requirements.
The data models developed are mathematical polynomial models that fit a sample data set. In addition, we propose a sensor database structure that addresses the issues of data redundancy as well as temporal constraints in the database.
The integration of physical devices into data networks has progressed considerably in recent years and is setting a new pattern in the world of IoT. The data collected from devices in sensor networks takes the form of physical environment measurements, which are communicated to end-user devices via the Internet.
Data communication between the sensor network and the Internet can be handled by a gateway node. A gateway node can convert disparate raw data formats into standardized formats, so any further reduction in transmission depends on the processing done at the gateway node.
Figure 2.1 shows an overview of the main components of an IoT system. In the following subsections, we briefly describe the function of each component.
As IoT devices increasingly become part of our daily lives, interaction between these devices produces more and more data. In wireless sensor networks, sensors generate data periodically, so the data grows rapidly. Generating the data is easy; the challenge is to manage and store this large volume of data for the desired application.
Sensor Data Models:
To avoid redundant data generated by the sensors, data models in prior work are created using polynomials while the sensor node is providing new samples. As samples arrive, the algorithm tries to fit as many points as possible to the current polynomial, raising its degree when needed to fit the data.
If the data does not fit, the algorithm keeps raising the degree until the maximum degree is reached. If the new point still does not fit at the maximum degree, the polynomial is stored with its timestamp, a new polynomial of degree zero is created to fit the next sample, and the process restarts.
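The procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original authors' implementation: the text does not say how "fit" is judged, so the sketch assumes a least-squares fit that is accepted when every point lies within a maximum absolute error `tol`. The names `fit_poly`, `eval_poly`, `segment`, `max_degree`, and `tol` are hypothetical, timestamps are assumed to be strictly increasing, and time inside each segment is measured from the segment start.

```python
from typing import List, Sequence, Tuple


def fit_poly(ts: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares fit; returns coefficients a_0..a_degree (assumed criterion)."""
    n = degree + 1
    # Normal equations (V^T V) a = V^T y, with V[j][k] = ts[j] ** k.
    A = [[sum(t ** (r + c) for t in ts) for c in range(n)] for r in range(n)]
    b = [sum(y * t ** r for t, y in zip(ts, ys)) for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (b[r] - s) / A[r][r]
    return coeffs


def eval_poly(coeffs: Sequence[float], t: float) -> float:
    """Evaluate a_0 + a_1 t + ... with Horner's rule."""
    value = 0.0
    for a in reversed(coeffs):
        value = value * t + a
    return value


def segment(samples, max_degree: int = 2, tol: float = 0.5) -> List[Tuple[float, float, List[float]]]:
    """Return (t_start, t_end, coeffs) per segment for a stream of (t, y) samples."""
    segments = []
    t0, ts, ys, coeffs = 0.0, [], [], []
    for t, y in samples:
        if not ts:
            # Start a new degree-zero polynomial on the first sample.
            t0, ts, ys, coeffs = t, [0.0], [y], [y]
            continue
        cand_ts, cand_ys = ts + [t - t0], ys + [y]
        fitted = None
        # Try the current degree first, then raise it up to the maximum.
        # The degree is also capped at (number of points - 1).
        top = min(max_degree, len(cand_ts) - 1)
        for degree in range(len(coeffs) - 1, top + 1):
            c = fit_poly(cand_ts, cand_ys, degree)
            if all(abs(eval_poly(c, x) - v) <= tol for x, v in zip(cand_ts, cand_ys)):
                fitted = c
                break
        if fitted is not None:
            ts, ys, coeffs = cand_ts, cand_ys, fitted
        else:
            # The new point does not fit: store the model and restart at degree zero.
            segments.append((t0, t0 + ts[-1], coeffs))
            t0, ts, ys, coeffs = t, [0.0], [y], [y]
    if ts:
        segments.append((t0, t0 + ts[-1], coeffs))
    return segments


# Illustrative readings (made-up values, not measured data).
readings = [(0, 20.0), (1, 20.5), (2, 21.0), (3, 21.5), (4, 30.0), (5, 30.1)]
for start, end, model in segment(readings):
    print(start, end, model)
```

A larger `tol` yields fewer, longer segments at the cost of accuracy; a smaller one does the opposite. The normal-equation solver is adequate for the low degrees used here but would need a better-conditioned method for high degrees or long segments.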
This algorithm performs online segment construction based on live machine learning. Versions of model elements are created whenever an attribute of the IoT object changes, but this research does not reuse its models for future time intervals.
Thantriwatte et al. developed a query processing system for wireless sensor networks (WSNs) based on a NoSQL database, but the work is fairly elementary. It mainly addresses WSNs rather than the IoT, and it does not address how to store such a huge amount of IoT data or how to adequately organize and manage it. In addition, its query optimization is not as effective as that of relational databases.
SENSOR DATA MODELS
Sensor data models provide an efficient way to represent data and minimize storage space while preserving data utility. The actual readings that sensors gather from their environment are raw data points. Instead of storing these raw data points in the database, we can use storage space efficiently by representing groups of similar raw data points as mathematical equations.
Therefore, we manage storage space by storing these mathematical equations (data models) in the database. To retrieve a raw data reading, instead of fetching a raw data point, we retrieve the data model that corresponds to that data point and calculate the data value from it.
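As a rough illustration of this retrieval step, the sketch below looks up the model whose time interval contains a requested timestamp and evaluates it. The in-memory list, the tuple layout, the function names `evaluate` and `read_value`, and the choice to measure time from the start of each interval are all assumptions made for the example; the coefficient values are invented.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: (t_start, t_end, coefficients a_0..a_d).
Model = Tuple[float, float, List[float]]


def evaluate(coeffs: Sequence[float], t: float) -> float:
    """Evaluate a_0 + a_1 t + ... + a_d t^d with Horner's rule."""
    value = 0.0
    for a in reversed(coeffs):
        value = value * t + a
    return value


def read_value(models: List[Model], t: float) -> Optional[float]:
    """Reconstruct the reading at time t, or None if no model covers t.

    Assumes models are sorted by t_start and do not overlap.
    """
    starts = [m[0] for m in models]
    i = bisect_right(starts, t) - 1
    if i < 0:
        return None
    t_start, t_end, coeffs = models[i]
    if t > t_end:
        return None
    # Assumption: each model is expressed in time relative to its interval start.
    return evaluate(coeffs, t - t_start)


# Invented example models covering two consecutive intervals.
models: List[Model] = [
    (0.0, 60.0, [20.0, 0.05]),
    (60.0, 120.0, [23.0, -0.01]),
]

print(read_value(models, 4.0))
print(read_value(models, 500.0))
```

The binary search keeps lookup logarithmic in the number of stored models, which matters more than evaluation cost because each polynomial has only a handful of coefficients.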
Generating Data Models:
The data models are generated from a set of raw data points and are stored in the database against a time interval. Each data model is considered an effective representation of the raw data points observed at timestamps within its time interval. The mathematical models, M1, M2, M3, …, MN, are polynomial equations stored in the database as the numeric coefficients of the equation, with the corresponding time periods T1, T2, T3, …, TN.
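One way to write this down, using notation that is assumed here for illustration (the degree d_i and the coefficient symbols a_{i,k} do not appear in the original), is:

```latex
% Model i is a polynomial of degree d_i that is valid on its time period T_i.
M_i(t) = \sum_{k=0}^{d_i} a_{i,k}\, t^{k},
\qquad t \in T_i, \quad i = 1, \dots, N

% What is stored for model i: the time period and the numeric coefficients only.
\bigl(T_i,\; a_{i,0},\, a_{i,1},\, \dots,\, a_{i,d_i}\bigr)
```

Under this reading, a period containing many raw readings is represented by d_i + 1 coefficients plus the bounds of T_i.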
Nowadays, sensors are everywhere and the data they produce is growing at a phenomenal rate. Sensors regularly gather information about various phenomena in their environment, so a large number of readings are generated every day. It is therefore becoming difficult to store, manage, and analyze this large volume of data.
As a solution, we have developed an algorithm that converts these raw readings into data models. However, storing this large number of models still requires a large amount of space.
IoT Hierarchical Structure:
We are moving from a time in which millions of devices are connected to a network to a time in which billions of devices will be connected. We need to create a hierarchical structure that makes query processing easier by creating a logical flow between IoT objects.
Our model-based IoT database is a database management system (DBMS) built for IoT objects and their various sensors. It presents a set of relational database operations that help create the database and answer complex data queries.
For the IoT database, we use standard relational database terminology. A relational database is a collection of information related to a particular topic or purpose. It specifies the data types, structures, and constraints of the data to be stored. A database management system is a collection of programs that enables users to create and maintain a database.
A relation or a table is a format of rows and columns that displays related information. An attribute is a specific item of information that contains a homogeneous set of values throughout the table. Attributes appear as columns in a table. A record is an individual listing of related information that contains a number of related attributes stored in a table.
Query IoT Model Database:
Queries play a major role in retrieving data from a database. A relation is defined by using a set of operations. It consists of a set of attributes, their data types, and a set of constraints on the records to be inserted in the relation. The CREATE TABLE operation specifies the layout of a relation.
Once a relation is defined, data records are inserted into it. On insertion, the database verifies that each attribute value satisfies the domain, or data type, of the corresponding attribute. The SELECT operation retrieves a set of records from a relation. The REMOVE operation deletes records from a relation.
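The operations above can be tried out with a short sketch using Python's built-in `sqlite3` module. The `models` relation, its column names, and the sample rows are hypothetical and chosen only for illustration. Standard SQL has no REMOVE statement, so the sketch uses `DELETE` in that role, and it uses a `CHECK` constraint to show a record being rejected on insertion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE TABLE: attributes, data types, and a constraint on inserted records.
conn.execute(
    """
    CREATE TABLE models (
        modelID   INTEGER PRIMARY KEY,
        startTime REAL NOT NULL,
        endTime   REAL NOT NULL,
        coeffs    TEXT NOT NULL,          -- comma-separated coefficients (assumed encoding)
        CHECK (startTime <= endTime)
    )
    """
)

# INSERT: records that satisfy the constraints are accepted.
conn.executemany(
    "INSERT INTO models (modelID, startTime, endTime, coeffs) VALUES (?, ?, ?, ?)",
    [(1, 0.0, 60.0, "20.0,0.05"), (2, 61.0, 120.0, "23.0,-0.01")],
)

# A record that violates the CHECK constraint is rejected by the database.
try:
    conn.execute("INSERT INTO models VALUES (3, 200.0, 100.0, '0.0')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# SELECT: retrieve the model whose time interval covers t = 30.
rows = conn.execute(
    "SELECT modelID FROM models WHERE startTime <= ? AND endTime >= ?",
    (30.0, 30.0),
).fetchall()

# DELETE: remove a record from the relation.
conn.execute("DELETE FROM models WHERE modelID = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM models").fetchone()[0]

print(rejected, rows, remaining)
```

SQLite applies type affinity rather than strict type checking by default, so a system that must enforce attribute domains as described above would need explicit constraints or a stricter engine.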
This chapter presents the experimental results obtained from implementing our proposed Generation of Sensor Data Model algorithm, which develops data models from a set of raw data points. We also developed a relational database to store these data models and execute user queries.
For this experiment, we used a TM4C1294 Connected LaunchPad Evaluation Kit, a low-cost development platform for ARM Cortex-M4F-based microcontrollers. We also used a Texas Instruments Sensor Hub BoosterPack (BOOSTXL-SENSHUB), an add-on board that provides a platform for evaluating the use of ARM Cortex-M4F-based TM4C devices in sensor fusion applications.
IoT Model Database:
In our relational database, we construct a relation known as objects, as shown in Table 6.1, that contains all the information about an IoT object, such as its ID, name, and type. This relation is created using a set of queries, as shown in Figure 6.1, where objects is the name of the relation in the database. It contains three attributes, objectID, objectName, and objectType, where objectID is the primary key of the table. These attributes represent the identification number, name, and type of each IoT object.
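A relation with this shape could be declared as in the following sketch, which again uses Python's `sqlite3` module. Only the relation name and the three attribute names come from the text; the column types, the NOT NULL constraints, and the sample rows are assumptions, and the actual queries in Figure 6.1 may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# objectID is the primary key, so each IoT object is stored exactly once.
conn.execute(
    """
    CREATE TABLE objects (
        objectID   INTEGER PRIMARY KEY,
        objectName TEXT NOT NULL,
        objectType TEXT NOT NULL
    )
    """
)

# Placeholder rows for illustration only.
conn.executemany(
    "INSERT INTO objects (objectID, objectName, objectType) VALUES (?, ?, ?)",
    [(1, "board-1", "microcontroller"), (2, "temp-1", "temperature sensor")],
)

# Reusing an existing objectID violates the primary key and is rejected.
try:
    conn.execute("INSERT INTO objects VALUES (1, 'duplicate', 'sensor')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

names = [row[0] for row in conn.execute("SELECT objectName FROM objects ORDER BY objectID")]
print(duplicate_rejected, names)
```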
To generate the models, we used past data from an SHT21 temperature sensor that returns the temperature in degrees Celsius. Then, we applied the Generation of Sensor Data Model algorithm to produce our data models. The algorithm was applied to 12 days' worth of data to capture all the behaviors that the data exhibits throughout the day.
With the growth of information and communication technologies, data is being generated at very high rates. Data is becoming very hard to manage, and organizing it efficiently in databases is an important issue.
IoT model databases are becoming an important way to cope with data growth by decreasing the space that data consumes while maintaining the same information. Data models can fit many raw data points from sensors with negligible error. These models are created by fitting a function to the data points.
In this research, we used polynomials of different orders (first order, second order, and so on) to fit the data points. Our algorithm, Generation of Sensor Data Model, finds a polynomial curve whose parameters are the coefficients of the polynomial equation.
These parameters cover many raw data points within a time range. In other words, with data models we can represent an enormous number of data points without overfilling databases or sacrificing data utility. In addition, an IoT storage management architecture is proposed to meet the needs of massive IoT data.
It addresses not only how to store big IoT data reasonably and effectively but also how to respond to queries that satisfy temporal and spatial correlation constraints.
As future work, more robust algorithms can be created to segment data into more accurate models using mathematical functions other than higher-degree polynomials, such as logarithmic functions. Finding the most probable model efficiently will also help the system save energy.
Source: University of Miami
Author: Parul Maheshwari