I Dream of IoT/Chapter 5: IoT and Big Data

Introduction to big data
The Internet of Things (IoT) is an integrated part of the future internet in which physical and virtual things, which can interact with objects, animals, or people, receive unique identifiers. These things also have self-configuring capabilities and can transfer data over an internet network without requiring human interaction. The Internet of Things has been in development for decades, but the concept was not named until 1999. The first internet appliance was a Coke machine at Carnegie Mellon University in the early 1980s. Programmers could connect to the machine over the internet, check its status, and determine whether a cold drink would be waiting for them before deciding to make the trip down to the machine. Nowadays, IoT is applied to many systems that can benefit humans, including machine-to-machine systems, cloud systems, human-to-machine systems, and big data systems.

"Big data" is a buzzword used to describe the massive volume of both structured and unstructured data that accumulates across many enterprises today, data that remains difficult to process and manage with traditional database and software techniques. Big data has the potential to help companies make faster and more intelligent decisions while also improving their operations. A simple example of big data in use: retailers can track user web clicks to identify behavioral trends that improve campaigns, pricing, and product stocking.
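The retailer click-tracking example can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the click log, the `(user_id, product, week)` record layout, and the function names `weekly_counts` and `click_trends` are all hypothetical. The idea is simply to count clicks per product in two periods and rank products by how much their click counts changed.

```python
from collections import Counter

# Hypothetical click log: (user_id, product, week) tuples.
clicks = [
    ("u1", "umbrella", 1), ("u2", "umbrella", 1), ("u3", "sunscreen", 1),
    ("u1", "umbrella", 2), ("u4", "sunscreen", 2), ("u5", "sunscreen", 2),
    ("u6", "sunscreen", 2), ("u2", "raincoat", 2),
]

def weekly_counts(log, week):
    """Count clicks per product for a single week."""
    return Counter(product for _, product, w in log if w == week)

def click_trends(log, prev_week, this_week):
    """Change in clicks per product between two weeks, largest growth first."""
    prev, curr = weekly_counts(log, prev_week), weekly_counts(log, this_week)
    products = set(prev) | set(curr)
    # Counter returns 0 for missing keys, so new or vanished products work too.
    deltas = {p: curr[p] - prev[p] for p in products}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    for product, delta in click_trends(clicks, 1, 2):
        print(f"{product}: {delta:+d}")
```

With the sample log, sunscreen shows the largest week-over-week growth and umbrella declines. At real big data volumes, the same counting would be distributed across many machines rather than done in one process.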

Internet of Things and big data
With respect to commercial, industrial, and other applications, IoT and big data are two distinct topics. IoT refers to the world of internet-connected devices, which is the means by which much of big data is collected, stored, and managed. Big data additionally covers the analysis of this information to produce useful results. In short, big data is about data, plain and simple, while IoT is about data, devices, and connectivity.

IoT consists of three main components: the things (or assets) themselves, the communication networks that connect them, and the computing systems that make use of the data flowing to and from our things. Using this structure, assets can communicate with each other and optimize activities between them based on the analysis of data streaming through the network. Big data, on the other hand, relates to data creation, storage, retrieval, and analysis that is remarkable in terms of the following characteristics:

Volume

Aside from its inherent value and potential, the sheer quantity of structured and unstructured data largely determines whether or not it can be considered big data. IBM estimated in 2014 that most U.S. companies have at least 100 terabytes of data stored.

Variety

Big data does not come from just one industry in one format. From healthcare to social media, the variety of data types and formats is as daunting as the volume.

Velocity

This refers to how quickly big data is generated and analyzed to meet demand.

Veracity

The quality of data being captured can vary greatly. Accuracy of analysis depends on the veracity of the source data. IBM estimated in 2014 that poor data quality costs the U.S. economy $3.1 trillion per year.

Variability

This refers to the inconsistency the data can show at times, which hampers the ability to handle and manage it effectively.

Complexity

The large volumes of data we generate need to be linked, connected, and correlated in order to retain some level of usefulness. Complexity refers to the attributes of big data that make that task more difficult.
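Several of these characteristics can be measured directly on an incoming batch of data. The Python sketch below, a hypothetical helper named `batch_profile`, reports volume as a record count, velocity as records per second over the capture window, and veracity as the share of readings that are present and inside a plausible range. The `Reading` type and the default range of -40 to 125 are assumptions chosen for illustration (think degrees Celsius for a temperature sensor). Variety, variability, and complexity are harder to reduce to a single number and are not covered.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Reading:
    timestamp: float        # seconds since capture started
    sensor_id: str
    value: Optional[float]  # None stands in for a dropped or corrupted value

def batch_profile(readings: List[Reading],
                  valid_range: Tuple[float, float] = (-40.0, 125.0)) -> dict:
    """Measure volume, velocity, and veracity for one batch of readings.

    valid_range is an assumed plausibility window; choose it per sensor type.
    """
    if not readings:
        return {"volume": 0, "velocity": 0.0, "veracity": 0.0}
    times = [r.timestamp for r in readings]
    span = max(times) - min(times)
    lo, hi = valid_range
    valid = sum(1 for r in readings
                if r.value is not None and lo <= r.value <= hi)
    return {
        # Volume: how many records arrived in this batch.
        "volume": len(readings),
        # Velocity: records per second over the capture window
        # (infinite if every record shares one timestamp).
        "velocity": len(readings) / span if span > 0 else float("inf"),
        # Veracity: fraction of records that are present and plausible.
        "veracity": valid / len(readings),
    }
```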

Despite these challenges, IoT and big data can be used to improve operations. Together they help determine where data is produced and collected across a wide array of vertical markets, including but not limited to agriculture, electricity, forestry, water treatment, and almost every type of manufacturing facility. IoT and big data can potentially be used to improve predictive health monitoring, reduce downtime, lower reject rates, improve quality, increase throughput, improve safety, streamline labor, and enable mass customization in manufacturing and related vertical industries. Ideally, these operational improvements result in better products, higher output, and lower costs.

Big data operations
Big data operations vary from system to system, but they all essentially capture and store incoming data, which is analyzed later to gain insights, improve operations, or make discoveries. This processing is based on three major steps: data intake, storage, and analytics. The data is managed using newer technologies such as Hadoop and MapReduce. These tools become necessary as the volume of data continues to increase, particularly as IoT transforms the environment with ever more connectable devices. As this happens, better and faster processing technologies need to be introduced so that all this information can be analyzed.
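To make the intake, storage, and analytics steps and the MapReduce idea more concrete, here is a single-process Python sketch of the map, shuffle, and reduce phases. It does not use Hadoop's actual API; the `sensor_id,value` line format and all function names are assumptions. The job computes the mean reading per sensor and drops malformed lines during the map phase.

```python
from collections import defaultdict
from functools import reduce

# Intake: raw lines as they might arrive from devices (hypothetical format).
RAW = [
    "pump-1,3.0", "pump-2,5.0", "pump-1,5.0",
    "pump-2,7.0", "pump-1,bad", "pump-3,1.5",
]

def map_phase(line):
    """Emit (sensor_id, (value, 1)); malformed lines emit nothing."""
    sensor_id, _, raw_value = line.partition(",")
    try:
        yield sensor_id, (float(raw_value), 1)
    except ValueError:
        return

def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(values):
    """Merge partial (sum, count) pairs; the merge is associative."""
    return reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), values)

def mean_per_sensor(lines):
    """Run map -> shuffle -> reduce, then turn each (sum, count) into a mean."""
    mapped = (pair for line in lines for pair in map_phase(line))
    result = {}
    for key, values in shuffle(mapped).items():
        total, count = reduce_phase(values)
        result[key] = total / count
    return result
```

The reducer combines `(sum, count)` pairs instead of averaging means directly, because sums and counts can be merged in any order across partial results, whereas an average of averages is generally wrong when groups have different sizes.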

Manufacturing
According to the TCS 2013 Global Trend Study, improvements in supply planning and product quality are the greatest benefits of big data for manufacturing. Big data provides an infrastructure for transparency in the manufacturing industry, with the ability to unravel uncertainties such as inconsistent component performance and availability. Predictive manufacturing, an applicable approach toward near-zero downtime and transparency, requires vast amounts of data and advanced prediction tools to systematically process that data into useful information. A conceptual framework of predictive manufacturing begins with data acquisition, in which different types of sensor data are acquired, including acoustic, vibration, pressure, current, voltage, and controller data. This vast amount of sensor data, together with historical data, makes up big data in manufacturing. The resulting big data serves as the input to predictive tools and preventive strategies.
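One common way to turn raw vibration data into a maintenance signal is to compare the root-mean-square (RMS) amplitude of recent sample windows against a baseline learned from historical data. The Python sketch below illustrates that idea under stated assumptions: the window format, the threshold of mean plus `k` standard deviations, and `k = 3.0` are illustrative choices, not part of the framework described above.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence

def rms(window: Sequence[float]) -> float:
    """Root-mean-square amplitude of one window of vibration samples."""
    return math.sqrt(sum(x * x for x in window) / len(window))

def alarm_threshold(historical: List[Sequence[float]], k: float = 3.0) -> float:
    """Threshold = mean + k * sample standard deviation of historical RMS levels.

    Needs at least two historical windows (stdev requires two data points).
    k = 3.0 is an illustrative assumption, not a value from the source.
    """
    levels = [rms(w) for w in historical]
    return mean(levels) + k * stdev(levels)

def flag_windows(live: List[Sequence[float]], threshold: float) -> List[int]:
    """Indices of live windows whose RMS exceeds the threshold."""
    return [i for i, w in enumerate(live) if rms(w) > threshold]

if __name__ == "__main__":
    historical = [[1.0, -1.0, 1.0, -1.0], [1.1, -1.1, 1.1, -1.1], [0.9, -0.9, 0.9, -0.9]]
    live = [[1.0, -1.0, 1.0, -1.0], [2.0, -2.0, 2.0, -2.0]]
    limit = alarm_threshold(historical)
    print(f"threshold={limit:.2f}, flagged={flag_windows(live, limit)}")
```

A real predictive-maintenance system would combine several sensor channels and more sophisticated models; this only shows how historical and live sensor data can feed a simple decision rule.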

Internet of Things
The second most popular use of big data is in IoT-connected devices managed by hardware, sensor, and information security companies. These devices sit in their customers' environments and "phone home" with information about the device's use, health, or security.

Storage manufacturer NetApp, for instance, uses Pentaho software to collect and organize messages that arrive from more than 250,000 NetApp devices deployed at its customers' sites. This unstructured machine data is then structured, put into Hadoop, and then pulled out for analysis by NetApp.
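The step of structuring unstructured machine data can be illustrated with a small Python parser. The message format below (a timestamp followed by `key=value` pairs) is entirely hypothetical; the source does not describe NetApp's actual message format or its Pentaho configuration. The function `structure` turns one free-text message into a flat dictionary that could then be loaded into a tabular store.

```python
import re
from typing import Iterable, List, Optional

# Hypothetical "phone home" format: "<timestamp> key=value key=value ..."
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<body>.*)$")
PAIR_RE = re.compile(r"(\w+)=(\S+)")

def structure(message: str) -> Optional[dict]:
    """Turn one free-text device message into a flat record, or None if unparseable."""
    m = LINE_RE.match(message.strip())
    if not m:
        return None
    record = {"timestamp": m.group("ts")}
    # findall returns (key, value) tuples, which dict.update accepts directly.
    record.update(PAIR_RE.findall(m.group("body")))
    # A message with a timestamp but no key=value pairs carries no usable data.
    return record if len(record) > 1 else None

def structure_all(messages: Iterable[str]) -> List[dict]:
    """Structure a batch of messages, skipping the ones that cannot be parsed."""
    return [r for r in (structure(msg) for msg in messages) if r is not None]
```

Values are kept as strings here; converting fields such as temperatures to numbers would be a separate, schema-specific step.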

Information security
Large enterprises typically have sophisticated information security architectures, and both they and security vendors are looking for more efficient ways to store petabytes of event or machine data. In the past, these companies stored this information in relational databases, but such traditional systems tend not to scale well in terms of either performance or cost. Hadoop, mentioned earlier, is a better option for storing such machine data.

Advantages
1. Big data can give access to more data than ever before. Unstructured data that would otherwise have been considered dead and of no value can be collected and analysed. This creates the opportunity to discover correlations and patterns in data that would previously have remained hidden, which means that organisations have access to more accurate information.

2. Big data can help provide new products and services. The most interesting use of big data analytics is creating new products and services for customers. Many companies have made major investments in new service models for their industrial products using big data analytics.

3. Businesses can become more agile and make better decisions. Big data is not just a matter of storing petabytes or exabytes of data. It is also about making better decisions and taking action at the right time through analysis and interpretation of that data.

4. It has the potential to create cost savings. Big data technologies such as Hadoop and cloud-based analytics can provide substantial cost advantages. The problem with traditional relational database management systems is that scaling them to process such massive volumes of data is prohibitively expensive. Hadoop, by contrast, is designed as a scale-out architecture that can affordably store all of a company’s data for later use.
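Scale-out storage spreads data across many commodity machines, so capacity grows by adding nodes instead of buying a bigger server. The Python sketch below shows one simple way to assign records to nodes: hash each record key and take the result modulo the node count. This illustrates the general idea only. HDFS itself splits files into replicated blocks whose placement is decided by the NameNode, not by hashing keys, and the function names here are hypothetical.

```python
import hashlib
from collections import defaultdict

def node_for(key: str, num_nodes: int) -> int:
    """Pick a node for a record key using a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used to get the same answer every run.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def partition(records, num_nodes):
    """Spread (key, payload) records across num_nodes storage nodes."""
    nodes = defaultdict(list)
    for key, payload in records:
        nodes[node_for(key, num_nodes)].append((key, payload))
    return nodes
```

Plain modulo hashing reassigns most keys whenever the number of nodes changes; systems that rebalance often tend to use consistent hashing instead.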

Disadvantages
1. Big data requires more security checkpoints. With more data stored in, and moving between, more places than ever before, there are also vastly more ways to hack into that data.

2. Upfront management and analysis mean a short-term loss of agility. Transactional, e-mail, analytical, and other data is housed on multiple platforms, and if the data isn't evaluated, organized, and stored properly, critical information can be difficult or impossible to use. Therefore, it takes more time up front to build the infrastructure and manage the data to get the most out of it.

3. Only a few people have the skills needed to use big data tools properly. Big data is a rapidly evolving technology area, but it is not taught at most universities and is typically learned reactively, as the need arises. That makes finding the right people all the more crucial.
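The first disadvantage, the need for more security checkpoints, can be made concrete with a message-authentication check at the point where device data enters the system. The Python sketch below uses the standard library's `hmac` module: each device tags its payload with HMAC-SHA256 under a shared secret, and the ingestion checkpoint rejects messages from unknown devices or with tags that do not match. The device table and key are placeholders; a real deployment would also encrypt data in transit (for example with TLS) and store and rotate keys securely.

```python
import hashlib
import hmac

# Placeholder per-device secrets; real keys belong in a secure key store.
DEVICE_KEYS = {"sensor-17": b"example-secret-do-not-use"}

def sign(device_id: str, payload: bytes) -> str:
    """Device side: compute an HMAC-SHA256 tag over the payload."""
    return hmac.new(DEVICE_KEYS[device_id], payload, hashlib.sha256).hexdigest()

def checkpoint(device_id: str, payload: bytes, tag: str) -> bool:
    """Ingestion side: accept only known devices whose tag matches the payload."""
    key = DEVICE_KEYS.get(device_id)
    if key is None:
        return False
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, tag)
```

A tampered payload or a message from an unregistered device fails the check, so it can be dropped or quarantined before it reaches storage.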

Conclusion
At its heart, big data is about data, plain and simple, while IoT is about data, devices, and connectivity. IoT and big data are reshaping the relationships between people and information. Many new hardware and software technologies have been developed to bring field sensor information from the very edge of the process, collect it in a distributed or centralized manner, and curate it through databases and historians. Each of these data-harvesting tasks is becoming more automated, which removes the delays and errors associated with manual readings and data entry. Improving and automating data collection, concentration, and curation enables end users to take full advantage of visualization and analysis software to make their operations more efficient.