Data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It is a subject-oriented, integrated, time-variant, and non-volatile collection of data. This data helps analysts to take informed decisions in an organization but relational data volumes are increased day by day.
To overcome the challenges, distributed data warehouse system shares data across multiple data repositories for the purpose of Online Analytical Processing(OLAP). Each data warehouse may belong to one or more organizations. It performs load balancing and scalability. Metadata is replicated and centrally distributed.
Apache Tajo is a distributed data warehouse system which uses Hadoop Distributed File System (HDFS) as the storage layer and has its own query execution engine instead of MapReduce framework.
Content this book include 22 chapter guide to learn Apache Tajo Step by step.