
Slowly Changing Dimensions in a Data Warehouse Using Hive

This project was developed to predict, in real time, the turnover (churn) of a telecom company's customers, providing metrics that let the company act more quickly to prevent customer losses and to retain satisfied consumers.

[Image: HiveHDFS.png]
[Image: SCDs.png]

A Slowly Changing Dimension will be applied to a Data Warehouse using Hive running on Apache Hadoop. To make that possible, a pseudo-distributed cluster has to be fully configured to store and process the data.
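As an illustration of what such a dimension can look like in Hive, the sketch below defines a customer dimension with the usual SCD Type 2 tracking columns (start_date, end_date, is_current). The database, table, and column names are assumptions made for the example, not the project's actual schema.

```sql
-- Illustrative SCD Type 2 customer dimension in Hive.
-- Database, table, and column names are assumptions, not the project's real schema.
CREATE DATABASE IF NOT EXISTS dw;

CREATE TABLE IF NOT EXISTS dw.dim_customer (
    customer_id    INT,       -- business key coming from the source system
    customer_name  STRING,    -- tracked attribute
    customer_city  STRING,    -- tracked attribute
    start_date     DATE,      -- when this version of the row became valid
    end_date       DATE,      -- when it stopped being valid (NULL = still open)
    is_current     BOOLEAN    -- convenience flag marking the active version
)
STORED AS ORC;
```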

The cluster runs in a Linux environment inside a Virtual Machine, that is, the environment is virtualized on my physical machine. I therefore also configured the Virtual Machine from scratch so it could host the Hadoop ecosystem.

With the source database running in the Virtual Machine, work on the business problem starts with Sqoop, which loads a table from the source database into a data lake on HDFS (Hadoop Distributed File System).
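Sqoop lands the imported table as delimited files in an HDFS directory, and a common next step is to expose that directory to Hive as an external staging table. The sketch below assumes a comma-delimited import written to /user/hive/datalake/customers; the path, delimiter, and columns are assumptions for illustration.

```sql
-- Illustrative external staging table over the Sqoop target directory.
-- Path, delimiter, and columns are assumptions about the import settings.
CREATE DATABASE IF NOT EXISTS staging;

CREATE EXTERNAL TABLE IF NOT EXISTS staging.customers (
    customer_id    INT,
    customer_name  STRING,
    customer_city  STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/datalake/customers';
```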

Finally, I developed a sequence of HiveQL scripts so that Hive can maintain the Slowly Changing Dimension: it reads the incoming data, compares it with the existing dimension, and adds new information to the table, handling changes to dimension attributes intelligently whenever they occur.
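The query below is a minimal sketch of that read-compare-add step, assuming the dw.dim_customer and staging.customers tables defined above (which are illustrative names, not the project's own): it carries history forward, closes the current version of any customer whose attributes changed, and inserts new or changed customers as fresh open versions.

```sql
-- Sketch of an SCD Type 2 load in HiveQL, under the assumed table names above.
-- In production it is safer to materialize into a temporary table and then swap,
-- or to use MERGE on an ACID table in Hive 2.2+; this full rebuild keeps the idea simple.
INSERT OVERWRITE TABLE dw.dim_customer
SELECT customer_id, customer_name, customer_city, start_date, end_date, is_current
FROM (
    -- 1) Historical (already closed) versions are carried over unchanged.
    SELECT customer_id, customer_name, customer_city, start_date, end_date, is_current
    FROM dw.dim_customer
    WHERE is_current = false

    UNION ALL

    -- 2) Current versions: close them if the source attributes changed, else keep them open.
    SELECT d.customer_id, d.customer_name, d.customer_city, d.start_date,
           CASE WHEN s.customer_id IS NOT NULL
                     AND (d.customer_name <> s.customer_name OR d.customer_city <> s.customer_city)
                THEN current_date ELSE d.end_date END AS end_date,
           CASE WHEN s.customer_id IS NOT NULL
                     AND (d.customer_name <> s.customer_name OR d.customer_city <> s.customer_city)
                THEN false ELSE d.is_current END AS is_current
    FROM dw.dim_customer d
    LEFT JOIN staging.customers s ON d.customer_id = s.customer_id
    WHERE d.is_current = true

    UNION ALL

    -- 3) Brand-new keys and changed rows enter as fresh, open versions.
    SELECT s.customer_id, s.customer_name, s.customer_city,
           current_date AS start_date,
           CAST(NULL AS DATE) AS end_date,
           true AS is_current
    FROM staging.customers s
    LEFT JOIN (SELECT customer_id, customer_name, customer_city
               FROM dw.dim_customer
               WHERE is_current = true) d
      ON s.customer_id = d.customer_id
    WHERE d.customer_id IS NULL
       OR d.customer_name <> s.customer_name
       OR d.customer_city <> s.customer_city
) merged;
```

The full-rebuild approach is shown here because it works on a plain (non-ACID) Hive table in a pseudo-distributed cluster; with transactional ORC tables the same comparison logic could instead drive a single MERGE statement.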

