Log Processing from NASA (Space Agency) with Apache Flume and HBase
This project was developed to process, in a Hadoop cluster, log data from HTTP requests made to the NASA Kennedy Space Center web servers in Florida, United States.
For the project, a cluster was built in a Linux environment running inside a virtual machine on my physical machine. I also performed the full configuration of that virtual machine, preparing it to host the Hadoop ecosystem.
To handle the large volume of logs from the NASA server I used Apache Flume. This framework from the Hadoop ecosystem is a free, reliable service for collecting, aggregating and moving large amounts of log data into Apache Hadoop HDFS, driven entirely by its configuration file, as sketched below.
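As an illustration only, a minimal Flume agent configuration for this kind of pipeline could look like the sketch below; the agent name, local directory, HDFS URL and roll settings are placeholder assumptions, not the project's actual values.

# Hypothetical Flume agent "nasa": local log directory -> memory channel -> HDFS
nasa.sources  = logsrc
nasa.channels = memch
nasa.sinks    = hdfssink

# Source: watch a local directory where the NASA access_log files are dropped
nasa.sources.logsrc.type     = spooldir
nasa.sources.logsrc.spoolDir = /home/hadoop/nasa_logs
nasa.sources.logsrc.channels = memch

# Channel: buffer events in memory between source and sink
nasa.channels.memch.type                = memory
nasa.channels.memch.capacity            = 10000
nasa.channels.memch.transactionCapacity = 1000

# Sink: write the raw log lines into HDFS as plain text, rolling files every 5 minutes
nasa.sinks.hdfssink.type              = hdfs
nasa.sinks.hdfssink.channel           = memch
nasa.sinks.hdfssink.hdfs.path         = hdfs://localhost:9000/user/hadoop/nasa_logs
nasa.sinks.hdfssink.hdfs.fileType     = DataStream
nasa.sinks.hdfssink.hdfs.writeFormat  = Text
nasa.sinks.hdfssink.hdfs.rollInterval = 300

An agent like this would be started with the standard launcher, for example: flume-ng agent --conf conf --conf-file nasa-hdfs.conf --name nasa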
With Apache HBase I created a table to receive the logs directly from Flume. This part of the project was challenging and interesting at the same time, because it showed the power of both tools: with a small amount of configuration it was possible to split each log line into separate columns, which simplifies management and future analysis.
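A minimal sketch of the HBase side of such a pipeline is shown below, assuming a table named nasa_logs with a column family log, created beforehand in the HBase shell with: create 'nasa_logs', 'log'. The sink here is an assumed alternative to the HDFS sink above (or it can be fed from a second channel); it uses Flume's RegexHbaseEventSerializer to break each Common Log Format line (host - - [timestamp] "request" status bytes) into separate columns.

# Hypothetical HBase sink for the same "nasa" agent; table and column names are assumptions
nasa.sinks = hbasesink

nasa.sinks.hbasesink.type         = hbase
nasa.sinks.hbasesink.channel      = memch
nasa.sinks.hbasesink.table        = nasa_logs
nasa.sinks.hbasesink.columnFamily = log
nasa.sinks.hbasesink.serializer   = org.apache.flume.sink.hbase.RegexHbaseEventSerializer

# Map each capture group of the regex to a named column in the "log" family
nasa.sinks.hbasesink.serializer.regex    = ([^ ]+) - - \\[([^\\]]+)\\] "([^"]+)" ([^ ]+) ([^ ]+)
nasa.sinks.hbasesink.serializer.colNames = host,timestamp,request,status,bytes

With a configuration along these lines, each incoming log event is written as one HBase row whose host, timestamp, request, status and bytes fields land in their own columns, which is what makes per-column filtering and later analysis straightforward.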
Download Full Project