Posts

Big Data Analytics

Author: Emilia Colonese

What is Big Data Analytics? Let's divide and conquer.

Big Data is defined as collections of datasets whose volume, velocity, or variety is so large that it is difficult to store, manage, process, and analyze the data using traditional databases and data-processing tools. Analytics is the process of extracting and creating information from raw data by filtering, processing, categorizing, condensing, and contextualizing it. The resulting information is then organized and structured to infer knowledge about the system and/or its users, its environment, and its operations and progress toward its objectives, thus making systems smarter and more efficient.

Big Data Analytics focuses on big data tools and frameworks, programming models, data management, and implementation aspects of big data applications. I teach Big Data Analytics in the Data Science Specialization course at Instituto Tecnológico de Aeronáutica (ITA). The objective of this subjec...
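The analytics steps described above (filter, categorize, condense, contextualize) can be sketched in a few lines of Python. This is only an illustration under assumed data: the records and field names (station, temp_c) are hypothetical, not taken from the post.

```python
# Minimal sketch of an analytics flow: filter raw records,
# categorize them, then condense per-group into a summary.
from collections import defaultdict

raw = [
    {"station": "A", "temp_c": 21.5},
    {"station": "A", "temp_c": None},   # malformed reading
    {"station": "B", "temp_c": 35.2},
    {"station": "B", "temp_c": 36.8},
]

# Filter: drop malformed records
clean = [r for r in raw if r["temp_c"] is not None]

# Categorize: label each reading
for r in clean:
    r["label"] = "hot" if r["temp_c"] >= 30 else "mild"

# Condense: aggregate per station, turning raw numbers into
# contextualized information (per-station averages)
groups = defaultdict(list)
for r in clean:
    groups[r["station"]].append(r["temp_c"])
summary = {s: sum(v) / len(v) for s, v in groups.items()}

print(summary)  # per-station average temperature
```

Each step maps to one phase of the definition; real pipelines apply the same shape at far larger scale with frameworks such as Spark.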

Data Pipeline Framework

Author: Emilia Colonese

Figure 1: Data Pipeline Framework (by author).

The figure above presents a data pipeline framework. It encompasses the generic data-processing phases used by Business Information (BI) systems, in either a traditional or a big data context.

Traditional systems: they use structured data, relational schemas, and relational databases.

Big Data systems: they use structured, semi-structured, and/or unstructured data; relational and/or non-relational schemas; and SQL and/or NoSQL data stores and database servers.

These BI systems use data modeling processes and tools for both on-premises and cloud environments. The fundamentals of data modeling are tied to the data storage. A traditional system uses the relational (SQL) data model, for either OLTP (transactional/operational) or OLAP (analytical) systems. The data schema contains the tables, and their relationships, that will store the data. On the other hand, a big data system uses a non-relational data model or NoSQL ...
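The contrast above between relational and non-relational modeling can be sketched concretely. The following is an illustration, not part of the framework in Figure 1; the table and field names (customer, orders, total) are hypothetical. It shows the same data modeled relationally, as two tables linked by a foreign key, and as a single NoSQL-style document that embeds the relationship.

```python
# Relational model: tables and a relationship, queried with SQL (OLTP style).
import sqlite3
import json

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        total REAL
    );
    INSERT INTO customer VALUES (1, 'Ana');
    INSERT INTO orders VALUES (10, 1, 99.9);
""")

# The relationship lives in the schema and is resolved by a JOIN.
row = db.execute("""
    SELECT c.name, o.total
    FROM customer c JOIN orders o ON o.customer_id = c.id
""").fetchone()
print(row)  # ('Ana', 99.9)

# Non-relational (document) model: the relationship is embedded
# directly in one nested document, as a document store would hold it.
doc = {"name": "Ana", "orders": [{"id": 10, "total": 99.9}]}
print(json.dumps(doc))
```

The relational version enforces structure up front in the schema; the document version pushes that responsibility to the application, which is the trade-off big data systems often accept for flexibility and scale.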