Unstructured and semi-structured data types typically don’t fit well in traditional data warehouses that are based on relational databases oriented to structured data sets. Further, data warehouses may not be able to handle the processing demands posed by sets of big data that need to be updated frequently or even continually, as in the case of real-time data on stock trading, the online activities of website visitors or the performance of mobile applications.
As a result, many organizations that collect, process and analyze big data turn to NoSQL databases, as well as to Hadoop and its companion data analytics tools.
In some cases, Hadoop clusters and NoSQL systems are used primarily as landing pads and staging areas for data before it gets loaded into a data warehouse or analytical database for analysis — usually in a summarized form that is more conducive to relational structures.
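Here is a minimal Python sketch of that staging pattern. It makes several assumptions: the raw click events, the `page_summary` table and the `summarize`/`load` helpers are all hypothetical, and an in-memory SQLite database stands in for the data warehouse. The raw events are aggregated into one row per page, and only that summary is loaded into a relational table.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw events as they might land in a staging area:
# (page, visitor_id, seconds_on_page)
raw_events = [
    ("/home", "v1", 12),
    ("/home", "v2", 30),
    ("/pricing", "v1", 45),
    ("/home", "v1", 8),
    ("/pricing", "v3", 15),
]

def summarize(events):
    """Aggregate raw events into one row per page: visits, unique visitors, total seconds."""
    visits = defaultdict(int)
    visitors = defaultdict(set)
    seconds = defaultdict(int)
    for page, visitor, secs in events:
        visits[page] += 1
        visitors[page].add(visitor)
        seconds[page] += secs
    return [(page, visits[page], len(visitors[page]), seconds[page])
            for page in sorted(visits)]

def load(rows, conn):
    """Load summarized rows into a relational table (SQLite stands in for the warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_summary ("
        "page TEXT PRIMARY KEY, visits INTEGER, "
        "unique_visitors INTEGER, total_seconds INTEGER)"
    )
    conn.executemany("INSERT INTO page_summary VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(summarize(raw_events), conn)
for row in conn.execute("SELECT * FROM page_summary ORDER BY page"):
    print(row)
# ('/home', 3, 2, 50)
# ('/pricing', 2, 2, 60)
```

The warehouse ends up with two compact rows rather than five raw events, which is the "summarized form" that fits relational structures.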
More frequently, however, big data analytics users are adopting the concept of a Hadoop data lake that serves as the primary repository for incoming streams of raw data. In such architectures, data can be analyzed directly in a Hadoop cluster or run through a processing engine like Spark. As in data warehousing, sound data management is a crucial first step in the big data analytics process. Data stored in the Hadoop Distributed File System (HDFS) must be organized, configured and partitioned properly to get good performance out of both extract, transform and load (ETL) integration jobs and analytical queries.
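The following minimal sketch shows what partitioning means on disk. It assumes a local temporary directory stands in for HDFS and uses hypothetical records and helper names (`write_partitioned`, `read_dates`). Files are grouped into `column=value` directories, the Hive-style layout, so a query that filters on date can skip whole directories (partition pruning) instead of scanning every file.

```python
import csv
import os
import tempfile
from collections import defaultdict

# Hypothetical raw records: (event_date, user_id, action)
records = [
    ("2024-01-01", "u1", "view"),
    ("2024-01-01", "u2", "click"),
    ("2024-01-02", "u1", "click"),
    ("2024-01-03", "u3", "view"),
]

def write_partitioned(rows, root):
    """Write rows into directories such as root/event_date=2024-01-01/part-00000.csv."""
    groups = defaultdict(list)
    for event_date, user, action in rows:
        groups[event_date].append((user, action))
    for event_date, group in groups.items():
        part_dir = os.path.join(root, f"event_date={event_date}")
        os.makedirs(part_dir, exist_ok=True)
        with open(os.path.join(part_dir, "part-00000.csv"), "w", newline="") as f:
            csv.writer(f).writerows(group)

def read_dates(root, wanted_dates):
    """Partition pruning: open only the directories whose date was requested."""
    out = []
    for name in sorted(os.listdir(root)):
        key, _, value = name.partition("=")
        if key != "event_date" or value not in wanted_dates:
            continue  # skip the whole partition without reading its files
        with open(os.path.join(root, name, "part-00000.csv"), newline="") as f:
            out.extend((value, user, action) for user, action in csv.reader(f))
    return out

root = tempfile.mkdtemp()
write_partitioned(records, root)
print(sorted(os.listdir(root)))
print(read_dates(root, {"2024-01-01"}))
```

Choosing a partition column that matches common query filters, such as a date, is what lets pruning pay off; partitioning on a high-cardinality column like a user ID would instead produce many tiny files.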
Once the data is ready, it can be analyzed with the software commonly used for advanced analytics processes.
Text mining and statistical analysis software can also play a role in the big data analytics process, as can mainstream business intelligence software and data visualization tools. For both ETL and analytics applications, queries can be written as MapReduce jobs or in programming languages such as R, Python and Scala, as well as in SQL, the standard language for relational databases, which is supported on Hadoop via SQL-on-Hadoop technologies.
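The MapReduce model mentioned above splits a job into a map step, a shuffle that groups intermediate key/value pairs by key, and a reduce step. The single-process Python sketch below walks through the classic word-count example to illustrate that flow. It is not Hadoop's API, and the function names are hypothetical; on a real cluster the framework distributes the map and reduce calls across nodes and performs the shuffle itself.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all counts emitted for one word."""
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))

lines = ["big data needs big clusters", "Data lakes hold raw data"]
print(word_count(lines))
# {'big': 2, 'clusters': 1, 'data': 3, 'hold': 1, 'lakes': 1, 'needs': 1, 'raw': 1}
```

Because each map call sees only its own line and each reduce call sees only one key's values, both steps can run in parallel; that independence is what makes the model scale across a cluster.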