
Big Data: A Brief Historical Description

Big data is a voluminous dataset, drawn from a variety of sources and of varying types, that is generated continuously at high velocity. Working with it requires specialized processes that differ from traditional relational database processing tools. Those processes are needed both to infer relevance or insight from the data and to establish its veracity.

In other words, big data is a huge collection of data from different sources and of varying types. It grows exponentially over time and cannot be stored or processed by traditional means, yet it is valuable for insight.

The description of big data above draws on the historical use of the term, up to the more recent definition of parameters (the V's) used to capture the various aspects of this kind of dataset. The history I considered starts with uses of the term between 1997 and 2000 that resemble its current meaning of an information explosion. These begin with Michael Cox and David Ellsworth in 1997, followed by John Mashey in 1998 and Francis Diebold in 2000.

The description also considers the volume, variety, and velocity of the data. These three dimensions are credited to Doug Laney's early attempt, as an analyst at META Group (later acquired by Gartner), to describe the term big data. My description further considers the value that can be inferred from the data and the veracity of the data. Value derived from trustworthy data can benefit product development, predictive maintenance, fraud detection and compliance, and operational efficiency, and can drive innovation in general.

Big data can be structured, semi-structured, or unstructured. As noted above, big data requires non-traditional processing, but the data itself can still be structured like traditional relational data. This is referred to as structured data, meaning data with a fixed format. Semi-structured big data has some structure, but that structure is not fixed or defined in advance. The last type, unstructured big data, has no known form or definition.
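To make the three categories concrete, here is a minimal Python sketch that roughly labels a single raw record. Several details are assumptions chosen only for illustration: the three-field sensor schema, the `classify` function name, and the rule that any JSON object counts as semi-structured. Real pipelines usually know the category from the data source rather than by inspecting each record.

```python
import csv
import io
import json

# Hypothetical fixed schema (field name, type), used only for this illustration.
SCHEMA = [("sensor_id", str), ("timestamp", int), ("temperature_c", float)]


def classify(raw: str) -> str:
    """Roughly label one raw record as structured, semi-structured, or unstructured."""
    # Semi-structured: a self-describing JSON object whose keys may differ between records.
    try:
        if isinstance(json.loads(raw), dict):
            return "semi-structured"
    except json.JSONDecodeError:
        pass

    # Structured: exactly one CSV row whose fields match the fixed schema's count and types.
    rows = list(csv.reader(io.StringIO(raw)))
    if len(rows) == 1 and len(rows[0]) == len(SCHEMA):
        try:
            for value, (_, field_type) in zip(rows[0], SCHEMA):
                field_type(value)  # raises ValueError if the field does not fit its type
            return "structured"
        except ValueError:
            pass

    # Unstructured: no known form, e.g. free text.
    return "unstructured"
```

With these assumptions, a row like `s-17,1700000000,21.5` fits the fixed format and is labelled structured. A JSON object with arbitrary keys is labelled semi-structured. A free-text comment is labelled unstructured even if it happens to contain commas.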

In conclusion, examples of real-time big data include data from Internet of Things devices such as Alexa and home security cameras. They also include interactions on social media sites and news recommendation sites, where big data is used to provide real-time recommendations based on current views. Other examples are weather prediction from continuous weather-sensor and atmospheric data, health information from wearables, live transport information, and fuel management in autonomous vehicles.
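As a rough illustration of recommendations based on current views, the Python sketch below keeps a sliding time window over a stream of view events. It reports the most viewed items within that window. The `TrendingViews` class, the window length, and the assumption that events arrive in timestamp order are all hypothetical. Production systems typically run this kind of windowed aggregation on a distributed stream-processing engine rather than in a single process.

```python
from collections import Counter, deque


class TrendingViews:
    """Count item views inside a sliding time window and report the most viewed items."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, item_id) in arrival order
        self.counts = Counter()  # item_id -> views currently inside the window

    def record(self, timestamp: float, item_id: str) -> None:
        # Assumes events arrive in non-decreasing timestamp order.
        self.events.append((timestamp, item_id))
        self.counts[item_id] += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Keep only events in the window (now - window_seconds, now]; drop older ones.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_item = self.events.popleft()
            self.counts[old_item] -= 1
            if self.counts[old_item] == 0:
                del self.counts[old_item]

    def top(self, n: int, now: float) -> list[str]:
        # Evict stale events first so the ranking reflects only current views.
        self._evict(now)
        return [item for item, _ in self.counts.most_common(n)]


views = TrendingViews(window_seconds=60)
for ts, item in [(0, "a"), (5, "a"), (10, "b"), (65, "b"), (66, "b"), (67, "c")]:
    views.record(ts, item)
print(views.top(2, now=67))
```

Here item `a` was popular early on but has dropped out of the 60-second window, so the ranking at time 67 favours `b` and `c`. That is the difference between recommending on current views and recommending on all-time totals.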


Reference and Further Reading
