Iscte

Master's degree

Computer Engineering

Title

Big data analytics applied to sensor data of engineering structures: Automatic detection of outliers

Author

Antunes, António Lorvão Ferreira

Abstract

To ensure structural safety, civil engineering structures are monitored by many kinds of sensors. The collected sensor data is used to build statistical and predictive models, but it must be processed first. Detecting and treating outliers (errors in the data) is a slow and costly process that we aim to improve using data mining and machine learning techniques. As Big Data grows, outliers will appear more often, and without an automated way to detect them we may not be able to anticipate problems and act in time. In this dissertation we aim to identify and treat outliers in sensor data (using real datasets from a dam), comparing against and trying to improve on the current baseline methods. Because no labeled datasets are available, we use clustering methods, which group the data and so help us understand which points should be classified as outliers. We introduce an algorithm that makes use of Manual Acquisition System measurements and combine it with a clustering algorithm (DBSCAN) and the baseline methods, creating a method that is able to identify and remove most of the outliers in the datasets used for demonstration.
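As a rough illustration of how a density-based clustering algorithm such as DBSCAN can flag outliers without labeled data, the sketch below implements a minimal textbook DBSCAN in plain Python and treats the points it labels as noise as outlier candidates. It is not the dissertation's method: the readings are invented, the `eps` and `min_samples` values are arbitrary assumptions, and the Manual Acquisition System step and the baseline methods are not modelled.

```python
from math import dist

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            # Not a core point; may still be claimed later as a border point.
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Hypothetical sensor readings (made up for illustration), with two spikes.
readings = [10.1, 10.2, 10.0, 10.3, 25.0, 10.2, 10.1, -3.0, 10.4, 10.2]
points = [(r,) for r in readings]  # 1-D points, one per reading

# eps and min_samples are illustrative choices, not tuned values.
labels = dbscan(points, eps=0.5, min_samples=3)
outliers = [r for r, label in zip(readings, labels) if label == NOISE]
print(outliers)
```

In this toy series the dense group of readings around 10 forms a single cluster, while the two isolated spikes have too few neighbours to join it and are left as noise. On real sensor data the choice of `eps` and `min_samples` largely determines what counts as an outlier, so both would need to be chosen per sensor.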