**ABSTRACT:**

Aggregation of data from multiple sensor nodes is usually done by simple methods such as averaging or by more sophisticated iterative filtering methods. However, such aggregation methods are highly vulnerable to malicious attacks in which the attacker knows all sensed values and has the ability to alter some of the readings.

In this work, we develop and evaluate algorithms that eliminate or minimize the influence of altered readings. The basic idea is to treat altered data as outliers and to find algorithms that effectively identify and remove them. Once the outliers have been removed, a standard technique can be used to estimate the true value.

We calculate the local outlier factor (LOF) for each data point and propose methods for computing threshold values from these LOF values. Data points whose LOF exceeds the calculated threshold are removed before an estimated signal is computed from the readings reported by the sensors.
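As an illustration of the thresholding step, the sketch below computes a threshold from a list of LOF values and drops the corresponding readings. The mean-plus-multiple-of-standard-deviation rule and the constant `c=1.5` are illustrative assumptions, not the thesis's actual threshold derivation.

```python
import statistics

def lof_threshold(lof_values, c=1.5):
    """Illustrative threshold: mean + c * stdev of the LOF values.
    The thesis derives its own thresholds; this rule is a stand-in."""
    mu = statistics.mean(lof_values)
    sigma = statistics.pstdev(lof_values)
    return mu + c * sigma

def remove_outliers(readings, lof_values, threshold):
    """Keep only readings whose LOF is at or below the threshold."""
    return [r for r, s in zip(readings, lof_values) if s <= threshold]

lofs = [1.02, 0.98, 1.05, 1.01, 3.40]      # last point is a clear outlier
readings = [20.1, 20.3, 19.9, 20.2, 35.0]
t = lof_threshold(lofs)
kept = remove_outliers(readings, lofs, t)   # the 35.0 reading is dropped
```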

Thus, the proposed data aggregation algorithm operates in two phases: removal of outliers, followed by computation of an estimated true value from the remaining sensor data. Extensive evaluation of the proposed algorithms shows that they significantly outperform existing methods. For simple cases, we have developed closed-form expressions for calculating LOF values.

**BACKGROUND**

**Model:**

The sensor network topology used for our work is an abstract model proposed in earlier work (see Figure 2.1). A sensor network consists of a base station and a set of sensor clusters. Each cluster has a cluster head that gathers data from all sensor nodes connected to it.

A cluster head is also known as an aggregator node, because it gathers readings from multiple sensor nodes. The main functions of an aggregator node are collecting data from its sensor nodes, aggregating the raw data to produce an estimated reading, and communicating the processed data to the base station.

**Problem Statement:**

Sensor networks often operate in unattended environments and are deployed in a distributed manner, which makes them highly susceptible to failures and physical attacks. This creates security threats to sensor networks.

Although security in general can be defined as the combination of availability, confidentiality, and integrity, in our work we focus only on integrity. In addition, it is assumed that the aggregator nodes and the base station are not compromised. Under these assumptions, we identify possible threats to the sensor network and propose solutions to overcome them.

**RELATED WORK**

The extensive use of cheap and unreliable sensors mandates the development of new algorithms for aggregating data securely: these sensors are not only unreliable but also easily compromised by malicious attackers.

It is also necessary to take into account that new types of attacks continue to emerge over time. Thus, a robust algorithm that is resistant to known attacks should also be resistant to emerging ones. There are a number of research approaches and works related to our topic.

**Iterative Filtering Algorithms:**

Certainly the most studied method related to our research is the iterative filtering (IF) algorithm. In general, the majority of this work has been dedicated to rating networks, in which users assign ratings to objects.

Mizzaro proposed an algorithm for the assessment of scholarly papers. Yu et al. and, more recently, Zhou et al. presented iterative algorithms for rating networks. However, these reputation-based algorithms may not always converge.
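To make the iterative-filtering idea concrete, here is a generic sketch (not any specific published variant): alternate between a weighted estimate of the true signal and per-sensor weights that are reciprocal to each sensor's distance from the current estimate. The reciprocal-distance weight update is an illustrative assumption.

```python
def iterative_filter(readings, iters=20, eps=1e-9):
    """Generic iterative-filtering sketch.

    readings: list of per-sensor reading lists (one row per sensor).
    Returns the estimated signal, one value per time instant.
    """
    n = len(readings)
    t_len = len(readings[0])
    weights = [1.0 / n] * n                 # uniform initial weights
    estimate = None
    for _ in range(iters):
        # weighted estimate of the signal at each time instant
        total = sum(weights)
        estimate = [
            sum(w * readings[i][t] for i, w in enumerate(weights)) / total
            for t in range(t_len)
        ]
        # squared distance of each sensor from the current estimate
        dists = [
            sum((readings[i][t] - estimate[t]) ** 2 for t in range(t_len))
            for i in range(n)
        ]
        # sensors far from the consensus get small weights
        weights = [1.0 / (d + eps) for d in dists]
    return estimate

# three agreeing sensors and one far-off sensor
est = iterative_filter([[10.0, 10.1], [9.9, 10.0], [10.1, 10.2], [20.0, 20.0]])
```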

**Enhanced Iterative Filtering Algorithm:**

To reduce the effect of collusion attacks, Rezvani et al. proposed the Robust Data Aggregation Method (RDAM), an improvement to the IF method. In this method, unequal initial weights are calculated from the readings available to the aggregator.

This enhancement not only makes the IF algorithm resistant to collusion attacks, but also reduces the number of iterations required to converge to an estimated value. The method is based on the assumption that the distribution of the stochastic components of the sensor errors is known or can be estimated.

**Statistical Estimation:**

If the readings from the sensors contain no outliers, a statistical method can be used to estimate the true values. The main assumption behind the method is that every sensor has a certain error in its readings, because no sensor gives accurate readings continuously.

This suggests finding an estimation method in which all readings are considered, but their weights vary based on the temporal variation pattern of the readings. The sensor with the least variance is considered to have provided the best readings. We propose to use the variance of a sensor's readings to determine the weight applied to that sensor's readings when a true value is estimated from the readings of all sensors.
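The variance-based weighting idea can be sketched as follows; the reciprocal-of-variance weight is an illustrative choice, and the exact weighting used in the thesis may differ.

```python
import statistics

def variance_weighted_estimate(readings):
    """Weight each sensor by the reciprocal of the variance of its own
    readings over time, then form a weighted average per time instant.

    readings: list of per-sensor reading lists (one row per sensor).
    """
    eps = 1e-9                                   # guard against zero variance
    weights = [1.0 / (statistics.pvariance(r) + eps) for r in readings]
    total = sum(weights)
    t_len = len(readings[0])
    return [
        sum(w * readings[i][t] for i, w in enumerate(weights)) / total
        for t in range(t_len)
    ]

# sensor 0 is stable, sensor 1 is noisy; the estimate follows sensor 0
est = variance_weighted_estimate([[10.0, 10.1, 9.9], [8.0, 12.0, 10.0]])
```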

**Outlier Detection Algorithms:**

Detection of outliers in a dataset is a well-studied subject. However, the information age, especially big data and the Internet of Everything (IoE), has renewed interest in the subject to fit current needs. In our study we use a recently proposed method, of which this section provides a brief overview.

**AN ALGORITHM FOR LOF AND ITS APPLICATION**

**Steps for Calculations of LOF:**

In this section we present basic definitions and the steps for computing the local outlier factor (LOF). The idea of LOF is based on the concept of local density, and its computation involves six steps.

Since each step is essential for obtaining the final value for each data object, and since we will reuse these definitions to develop novel computation algorithms for special cases, we formally define each step.
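The six steps can be followed directly in code. The sketch below implements the standard LOF definitions (pairwise distances, k-distance, k-distance neighborhood, reachability distance, local reachability density, and the LOF ratio) for one-dimensional readings; it mirrors the textbook definitions rather than the thesis's specialized algorithms.

```python
def lof(points, k):
    """Local outlier factor for 1-D points, following the standard
    definitions step by step."""
    n = len(points)
    # Step 1: pairwise distances
    d = [[abs(points[i] - points[j]) for j in range(n)] for i in range(n)]

    def k_dist_and_neighbors(i):
        # Step 2: k-distance = distance to the k-th nearest other point
        others = sorted((d[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        # Step 3: k-distance neighborhood (includes ties at k-distance)
        nbrs = [j for dist, j in others if dist <= kd]
        return kd, nbrs

    kdist, nbrs = zip(*(k_dist_and_neighbors(i) for i in range(n)))

    def lrd(i):
        # Step 4: reachability distance reach(i, j) = max(kdist(j), d(i, j))
        reach = [max(kdist[j], d[i][j]) for j in nbrs[i]]
        # Step 5: local reachability density = inverse mean reachability
        return len(nbrs[i]) / (sum(reach) or 1e-12)

    dens = [lrd(i) for i in range(n)]
    # Step 6: LOF = average density ratio of neighbors to the point itself
    return [sum(dens[j] for j in nbrs[i]) / (len(nbrs[i]) * dens[i])
            for i in range(n)]

vals = lof([1.0, 1.1, 0.9, 1.05, 5.0], k=2)   # last point is isolated
```

Points inside a dense cluster get LOF values near 1, while the isolated point gets a much larger value.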

**Selecting k for Computing LOF in the Presence of Collusion Attack:**

Weaknesses of the local outlier factor method are its O(n²) complexity and the fact that selecting the right k is not obvious. According to prior work, the optimal value of k is k = ng − 1, where ng is the number of reliable objects. In our case, the simplest way to select k is based on the assumption that there are considerably more 'normal' observations than 'abnormal' observations (outliers/anomalies) in the data.
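A small helper can make this selection rule explicit. The fallback k = ⌈n/2⌉ is an illustrative consequence of the minority-of-outliers assumption, not a rule stated in the thesis; the known-ng branch follows the cited optimum.

```python
import math

def choose_k(n, n_good=None):
    """Heuristic choice of k for LOF.

    If the number of reliable readings n_good is known, use the cited
    optimum k = n_good - 1; otherwise assume outliers are a minority
    and take k = ceil(n / 2), so most neighbors of a normal point are
    themselves normal.
    """
    if n_good is not None:
        return n_good - 1
    return math.ceil(n / 2)
```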

**PROPOSED TWO PHASE DATA AGGREGATION ALGORITHMS**

In this chapter we present our algorithms. Logically and functionally, our algorithms consist of two phases: 1) detection and removal of outliers, and 2) true-value estimation from the remaining data (see Figure 5.1). Table 5.1 contains the notation used in the algorithm descriptions.

**Detection of Outliers:**

Following the structure of our algorithms, in this section we describe the methods applied in the first phase, which is dedicated to the detection of outliers.

**Combinations of Algorithms:**

After the first phase of the algorithm is completed, the readings suspected to be outliers have been removed. In the second phase, a true value is estimated from the remaining sensor readings for each time instant. For this phase we can choose among IF, RDAM, and the statistical estimation method.
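The two-phase composition can be sketched as a small pipeline with pluggable components; `lof_fn`, `threshold_fn`, and `estimator` are placeholders for the concrete choices (the LOF scorer, a threshold rule, and IF, RDAM, or the variance-weighted estimator), not the thesis's exact interfaces.

```python
import statistics

def two_phase_aggregate(readings, lof_fn, threshold_fn,
                        estimator=statistics.mean):
    """Two-phase aggregation sketch: phase 1 drops readings flagged as
    outliers; phase 2 estimates the true value from what remains."""
    scores = lof_fn(readings)
    t = threshold_fn(scores)
    kept = [r for r, s in zip(readings, scores) if s <= t]
    return estimator(kept)

# stand-in scorer: absolute deviation from the median; fixed threshold
scores = lambda xs: [abs(x - statistics.median(xs)) for x in xs]
est = two_phase_aggregate([10.0, 10.2, 9.9, 30.0], scores, lambda s: 1.0)
```

Because the phases are decoupled, the outlier detector and the estimator can be swapped independently, which is how the combinations of algorithms above are formed.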

**ESTIMATION OF LOF FOR SIMPLE DATASETS**

Let G = {g1, g2, ···, gng} be a set of readings from ng 'good' sensors. In our study, a reading from a 'good' sensor means that the sensor, the embedded device, and the communication from the embedded device to the destination are all normal; no hardware, software, or network issue has changed or altered the reading. Similarly, let B = {b1, b2, ···, bnb} be a set of 'bad' readings from nb sensors.

**EXPERIMENTAL RESULTS**

The experimental evaluation of the proposed algorithms aims to verify their reliability and efficiency in the presence of a collusion attack. Since the RDAM performs best in the presence of collusion attack, we use this algorithm’s performance as a benchmark for measuring that of the proposed algorithms.

In order to evaluate the accuracy of the investigated algorithms in the presence of a collusion attack, we assume that an attacker compromises c sensor nodes out of n nodes in a cluster, with c < n/2.
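For reproducing such an experiment, one time instant of cluster readings can be simulated as below. The Gaussian noise model, the skew value, and all parameter names are illustrative assumptions about the attack scenario, not the thesis's exact experimental setup.

```python
import random

def simulate_cluster(n, c, true_value=25.0, noise=0.5, skew=10.0, seed=0):
    """Readings for a cluster of n sensors at one time instant, where
    c colluding sensors (c < n/2) report a consistently skewed value."""
    assert c < n / 2
    rng = random.Random(seed)
    honest = [true_value + rng.gauss(0, noise) for _ in range(n - c)]
    colluders = [true_value + skew + rng.gauss(0, noise) for _ in range(c)]
    return honest + colluders

readings = simulate_cluster(10, 3)   # 7 honest sensors, 3 colluders
```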

**CONCLUSION**

In this work, we concentrated on the development of outlier detection methods. We described the concept of the local outlier factor, which plays a key role in our methods. We then proposed several two-phase outlier detection algorithms.

The main feature of our algorithms is the detection and removal of outliers before the true value is estimated. First, this increases accuracy by removing the influence of outliers on the aggregated result. Second, with only reliable data remaining, true values can be estimated non-iteratively, which decreases the computational cost. We used RDAM as the benchmark for comparison since it has shown the best results against the original collusion attack. We tested RDAM and our algorithms against the collusion attacks considered.

Moreover, we created examples of collusion attack scenarios that had not been considered previously. Under these novel attack scenarios, the proposed algorithms performed much better than RDAM. We have presented experimental results showing that (1) the estimates have higher accuracy than those of RDAM and (2) the algorithms are more efficient than RDAM. We observed that detecting local outliers in sensor readings using LOF is efficient.

Since computation of the local outlier factor is not a low-cost method, we have found expressions that are useful for calculating outlier factors for simple data. While conducting this research, we modified some original definitions and discovered promising opportunities that can lead to new clustering methods, which will be the scope of future work.

Source: University of Miami

Author: Anes Yessembayev