Abstract
Rapid developments in hardware, software, and communication technologies have facilitated the emergence of Internet-connected sensory devices that provide observations and data measurements from the physical world. By 2020, it is estimated that the total number of Internet-connected devices in use will be between 25 and 50 billion. As these numbers grow and the technologies mature, the volume of published data will increase. The technology of Internet-connected devices, referred to as the Internet of Things (IoT), continues to extend the current Internet by providing connectivity and interaction between the physical and cyber worlds. In addition to its volume, the big data generated by the IoT is characterized by its velocity, which depends on time and location, by a variety of modalities, and by varying data quality. Intelligent processing and analysis of this big data are key to developing smart IoT applications. This article assesses the machine learning methods that address the challenges presented by IoT data, with smart cities as the main use case. The key contribution of this study is a taxonomy of machine learning algorithms that explains how different techniques are applied to the data to extract higher-level information. The potential and challenges of machine learning for IoT data analytics are also discussed. A use case applying a Support Vector Machine (SVM) to Aarhus smart city traffic data is presented for a more detailed exploration.
1. Introduction
Emerging technologies in recent years and major enhancements to Internet protocols and computing systems have made communication between different devices easier than ever before. According to various forecasts, around 25–50 billion devices are expected to be connected to the Internet by 2020. This has given rise to the concept of the Internet of Things (IoT). IoT is a combination of embedded technologies including wired and wireless communications, sensor and actuator devices, and the physical objects connected to the Internet [1], [2]. One of the long-standing objectives of computing is to simplify and enrich human activities and experiences (e.g., see the visions associated with “The Computer for the 21st Century” [3] or “Computing for Human Experience” [4]). To accomplish this intelligently, IoT needs data, either to provide better services to users or to improve the performance of the IoT framework itself. To that end, systems should be able to access raw data from different sources over the network and analyze it to extract knowledge.
Since IoT will be among the most significant sources of new data, data science will make a considerable contribution to making IoT applications more intelligent. Data science combines different scientific fields and uses data mining, machine learning, and other techniques to find patterns and new insights in data. These techniques include a broad range of algorithms applicable in different domains. Applying data analytics methods to a particular area involves identifying the data characteristics, such as volume, variety, and velocity; choosing data models, such as neural networks, classification, and clustering methods; and applying efficient algorithms that match the data characteristics. Our review leads to three observations. First, because data is generated by different sources with specific data types, it is important to adopt or develop algorithms that can handle the data characteristics. Second, the large number of sources generating data in real time raises problems of scale and velocity. Finally, finding the data model that best fits the data is one of the most important issues for pattern recognition and for better analysis of IoT data. These issues open a wide range of opportunities for new developments. Big data is defined as high-volume, high-velocity, and high-variety data that demands cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation [5].
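Choosing the data model that best fits the data is commonly approached with hold-out validation: each candidate model is fitted on one part of the data and scored on a held-out part, and the model with the lowest validation error is kept. The Python sketch below illustrates this idea under simplifying assumptions. The two candidate models (a constant predictor and a least-squares line), the 70/30 chronological split, the mean-squared-error criterion, and the synthetic sensor readings are all hypothetical, illustrative choices and are not the methods evaluated in this article.

```python
from statistics import mean


def fit_constant(xs, ys):
    """Hypothetical candidate 1: always predict the training mean."""
    c = mean(ys)
    return lambda x: c


def fit_linear(xs, ys):
    """Hypothetical candidate 2: ordinary least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def select_model(xs, ys, candidates, split=0.7):
    """Fit each candidate on the first `split` fraction of the data
    (a chronological split, as suits time-ordered sensor data), score it
    on the remainder, and return (name, validation_mse) of the best one."""
    k = int(len(xs) * split)
    train_x, train_y = xs[:k], ys[:k]
    val_x, val_y = xs[k:], ys[k:]
    scores = {
        name: mse(fit(train_x, train_y), val_x, val_y)
        for name, fit in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Synthetic, hypothetical readings: a linear trend plus small alternating noise.
    hours = list(range(10))
    readings = [2.0 * h + 1.0 + (0.1 if h % 2 else -0.1) for h in hours]
    candidates = {"constant": fit_constant, "linear": fit_linear}
    print(select_model(hours, readings, candidates))
```

Because the synthetic readings follow a trend, the linear candidate generalizes to the held-out hours while the constant predictor does not. In practice the candidate set would contain the richer models discussed in this article, and the split and error metric would be chosen to suit the data characteristics.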
With respect to the challenges posed by big data, it is necessary to introduce a new concept termed smart data, which means “realizing productivity, efficiency, and effectiveness gains by using semantics to transform raw data into Smart Data” [6]. A more recent definition of this concept is: “Smart Data provides value from harnessing the challenges posed by volume, velocity, variety, and veracity of Big Data, and in turn providing actionable information and improving decision making.” [7]. In this sense, smart data can serve as a good representation of IoT data.