Forte Business Solutions

Big Data – Concept & Terminology

By: Shahzad Ali, CEO – Forte Business Solutions

With the proliferation of Big Data and the expansion of its use, beginners and students often confuse the terms data, information, knowledge and Big Data.

Big Data is becoming more widely used these days and is a focus for many specialists. This article introduces the concepts and definitions of some related terms that must be well understood before diving into the sea of Big Data.

Data, Information and Knowledge:

What is data, what is information, what is knowledge and what is the difference between them?

Data is a description of a particular thing or event in numbers, words, or symbols. It can be collected through a program or through paper or electronic forms, and it can be packaged in different ways. Simple examples are customer data, patient records, employee data, or someone’s tweets on Twitter. These are all data. Data is usually stored in a given context, and its size usually grows rapidly over time.

On its own, data like this is of little use: you have a huge archive that you can retrieve later, but it cannot serve as a basis for decisions, and no indicators can be read from it.

In order to benefit from this data, it must be converted to information.

What is information:

If the data is studied and analyzed, and logical relations are established between the tables that hold it, information is obtained. Data is therefore the raw material for information. How?

For example, you have a spreadsheet that contains the patients’ basic data (patient number, name, gender, date of birth, address, contact information, etc.) and another table of patient visits (patient number, date of visit, condition, name of the treating physician, medicines, etc.).

Everything mentioned here is data distributed across tables.

By examining these spreadsheets, linking the two tables (patient table + patient visit record table) and computing some statistics, you will be able to obtain the following information:

·       The number and percentage of cases received by the hospital within a period of time

·       The most common diseases according to time period, age group, geographic region, or gender

·       Relationship of symptoms to diseases

·       Most effective medicine for treating every disease

·       Most efficient doctors

·       Most visited doctors

·       And much more ….

The six items above are what we call information. As a researcher or doctor, you can base your decisions on this information. If the data stays in its raw tables, you will not be able to base any decision on it.
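As a rough illustration of linking the two tables described above, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are all hypothetical and were invented for this example; they only loosely follow the patient and visit tables mentioned in the text.

```python
import sqlite3

# In-memory database with two hypothetical tables: patients and visits.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (patient_no INTEGER PRIMARY KEY, name TEXT, gender TEXT);
    CREATE TABLE visits (patient_no INTEGER, visit_date TEXT,
                         diagnosis TEXT, physician TEXT);
""")

# Invented sample rows, standing in for the raw data.
conn.executemany("INSERT INTO patients VALUES (?, ?, ?)", [
    (1, "Patient A", "F"),
    (2, "Patient B", "M"),
    (3, "Patient C", "F"),
])
conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", [
    (1, "2020-01-05", "flu", "Dr. X"),
    (2, "2020-01-07", "flu", "Dr. Y"),
    (3, "2020-01-09", "asthma", "Dr. X"),
    (1, "2020-02-11", "asthma", "Dr. X"),
])

# Information 1: number of visits per physician (most visited first).
visits_per_physician = conn.execute("""
    SELECT physician, COUNT(*) AS n
    FROM visits
    GROUP BY physician
    ORDER BY n DESC, physician
""").fetchall()

# Information 2: diagnoses by gender, which requires linking both tables.
diagnoses_by_gender = conn.execute("""
    SELECT p.gender, v.diagnosis, COUNT(*) AS n
    FROM visits AS v
    JOIN patients AS p ON p.patient_no = v.patient_no
    GROUP BY p.gender, v.diagnosis
    ORDER BY p.gender, v.diagnosis
""").fetchall()

print(visits_per_physician)
print(diagnoses_by_gender)
```

The raw rows alone answer no question. The grouped and joined results are the kind of information a decision can be based on.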

Knowledge:

Knowledge is the best use of information, obtained by combining it with the accumulated experience of the reader. Information is a tool that supports the decision-maker in leading his organization, but experience remains an important input into how this information is used and transformed into knowledge that can later be documented.

From the above, we find that data is a key input, and that its quality is very important for the soundness of the information and knowledge built on it.

Back to massive data … What is Big Data?

We deal with a huge amount of data every day. Imagine the volume of data you read from social networking sites, e-mail, news, mobile messages and WhatsApp. Imagine also the data collected by GPS systems and by the Internet of Things (IoT), whose sensors collect huge amounts of data.

According to statistics published by Forbes Middle East, by 2020 there will be more than 31 billion devices connected to the Internet (imagine the amount of information generated by these devices). The same statistics put each individual's share of data at 1.7 megabytes per second, and the data volume on the network at 4.4 zettabytes (4,400,000,000,000 gigabytes)!

These data do not share the same format. Some are text, some are tables, some are files, etc. A mechanism is needed to deal with all these types of data and to extract information from them in a comprehensive manner.

The primary objective of dealing with these massive data is to study the past and present and to predict the future.
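To get a feel for these magnitudes, the unit conversions can be checked with a few lines of Python. This sketch assumes decimal (SI) units, where 1 gigabyte is 10^9 bytes and 1 zettabyte is 10^21 bytes. The per-day figure is simply derived from the quoted 1.7 megabytes per second and is not a statistic from the source.

```python
from fractions import Fraction

# Decimal (SI) unit sizes in bytes (an assumption; binary units would differ).
BYTES_PER_MB = 10**6
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21

# 4.4 zettabytes expressed in gigabytes, using exact arithmetic.
total_gb = int(Fraction("4.4") * BYTES_PER_ZB / BYTES_PER_GB)

# 1.7 MB per second, accumulated over one day (86,400 seconds), in gigabytes.
SECONDS_PER_DAY = 24 * 60 * 60
per_person_per_day_gb = (
    Fraction("1.7") * SECONDS_PER_DAY * BYTES_PER_MB / BYTES_PER_GB
)

print(f"{total_gb:,} GB")                          # 4,400,000,000,000 GB
print(f"{float(per_person_per_day_gb):.2f} GB per person per day")
```

At the quoted rate, one person's share adds up to roughly 147 gigabytes per day.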

Many techniques help to collect this data in its raw form. One of them is MQTT, a lightweight publish/subscribe messaging protocol used to send and receive data such as sensor readings; it typically carries less overhead than general-purpose protocols such as HTTP. Cloud computing is very important in dealing with massive data, as it is very difficult and expensive to provide a fast local environment that keeps up with the vast amount of data flowing in.

One of the most important features of Big Data is that it is largely unstructured, meaning that no particular format is imposed on the data. In Big Data the sources differ, and the data from each source differs as well. Big Data is also usually distributed over more than one place, unlike ordinary data, which is usually stored on one server.

Dealing with all this distributed data as if it were in one place is the concept of Distributed Data Processing.

One of the most important techniques used in storing Big Data is the document-based database. The most widely used document format is JavaScript Object Notation (JSON), which stores each record as a self-contained document (unlike regular SQL data, which is based on structures, tables and relationships).
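MQTT is built around a publish/subscribe model: devices publish messages to named topics, and interested parties subscribe to those topics through a broker. The sketch below is only a toy, in-process model of that idea. It is not an MQTT client and does no networking; the ToyBroker class and the topic names are hypothetical. The matching function follows a simplified reading of MQTT's topic wildcards, where "+" stands for one topic level and "#" for all remaining levels.

```python
from collections import defaultdict


def topic_matches(pattern, topic):
    """Simplified MQTT-style matching: '+' = one level, '#' = the rest."""
    pattern_levels = pattern.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(pattern_levels):
        if level == "#":
            return True                      # matches everything from here on
        if i >= len(topic_levels):
            return False                     # pattern is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False                     # literal level does not match
    return len(pattern_levels) == len(topic_levels)


class ToyBroker:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self):
        self._subscriptions = defaultdict(list)

    def subscribe(self, pattern, callback):
        self._subscriptions[pattern].append(callback)

    def publish(self, topic, payload):
        # Deliver the message to every subscriber whose pattern matches.
        for pattern, callbacks in self._subscriptions.items():
            if topic_matches(pattern, topic):
                for callback in callbacks:
                    callback(topic, payload)


received = []
broker = ToyBroker()
broker.subscribe("sensors/+/temperature",
                 lambda topic, payload: received.append((topic, payload)))

broker.publish("sensors/room1/temperature", "21.5")   # delivered
broker.publish("sensors/room1/humidity", "40")        # no matching subscriber
print(received)
```

In a real deployment the broker is a separate server and the devices connect to it over the network with an MQTT client library. The point here is only that publishers and subscribers never address each other directly.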
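One common way to treat distributed data as if it were in one place is to let each location compute a partial result on its own data and then merge the partial results. The sketch below assumes three hypothetical partitions, standing in for three servers, and runs everything in one process purely for illustration. Real distributed frameworks add scheduling, networking and fault tolerance on top of this idea.

```python
from collections import Counter

# Hypothetical partitions: each list stands in for the records on one server.
partitions = [
    ["flu", "asthma", "flu"],
    ["flu", "diabetes"],
    ["asthma", "asthma", "flu"],
]


def count_locally(records):
    """Step 1: each location counts its own records independently."""
    return Counter(records)


def merge(partial_counts):
    """Step 2: combine the partial counts into one global result."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


partial_counts = [count_locally(records) for records in partitions]
total_counts = merge(partial_counts)
print(total_counts.most_common())
```

The merged result is the same as if all records had been counted in a single place, yet only the small partial counts had to be moved, not the raw data.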
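To make the contrast with tables concrete, here is a small sketch of a single JSON document built with Python's standard json module. The field names and values are hypothetical. The document nests a patient's visits inside the patient record itself, instead of spreading them over two related tables, and the two visits do not need to have the same fields.

```python
import json

# A hypothetical patient record as one self-contained document.
patient_document = {
    "patient_no": 1,
    "name": "Patient A",
    "visits": [
        # The first visit lists medicines; the second has no such field.
        {"date": "2020-01-05", "diagnosis": "flu", "medicines": ["paracetamol"]},
        {"date": "2020-02-11", "diagnosis": "asthma"},
    ],
}

# Serialize to JSON text (what a document store would keep) and read it back.
json_text = json.dumps(patient_document, indent=2)
restored = json.loads(json_text)

diagnoses = [visit["diagnosis"] for visit in restored["visits"]]
print(json_text)
print(diagnoses)
```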

Visualization of Big Data

Once the data is available, it must be presented with tools that suit the needs of the organization and support decision makers visually.

Some of the most popular Big Data visualization applications are:

·       PowerBI

·       Tableau

·       Jupyter

·       QlikView

Get in touch with Us

    Forte Technologies, Dubai, United Arab Emirates.

    +971 4 346 5555

    info@forte.tech

    www.forte.tech