On-Demand Business Intelligence Reports for a US-Based Publication Company
The client is a California-based online publication company with 1,800 publishers that generates terabytes of digital data (logs). The acquisition, storage, access, and analysis of this digital information provide knowledge and strategic insights to the senior management of each publisher. Each time users visit a website or make a transaction, the company can log everything they do: every link they click, each form they submit, what time they log in and out, any errors they encounter, and just about anything else one can imagine. The requirement was to generate on-demand reports and graphs from this data, which help management develop and incorporate new business strategies and plan for the future.
The existing system collected the logs, parsed them, and stored them in an RDBMS, from which various BI reports were extracted. As the business grew, the storage and access aspects (prerequisites for later BI analytics) started to pose challenges, because the data could accumulate to gigabytes and even terabytes in a relatively short period of time. Data management became a greater obstacle as more and more data needed to be collected and stored, and as the data sets grew, a single system could run out of space.
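To make the collect, parse, and store flow concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the client's actual system: the log layout is assumed to be the Common Log Format, SQLite stands in for the unnamed RDBMS, and the table name hits and the functions parse_line, load_logs, and hits_per_path are hypothetical.

```python
import re
import sqlite3

# Assumption: lines follow the Common Log Format. The real log layout
# is not described in the source.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a tuple of fields for one log line, or None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    size = 0 if m["size"] == "-" else int(m["size"])
    return (m["host"], m["ts"], m["method"], m["path"], int(m["status"]), size)


def load_logs(conn, lines):
    """Parse the lines and store the valid ones; return how many rows were stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        "host TEXT, ts TEXT, method TEXT, path TEXT, status INTEGER, size INTEGER)"
    )
    rows = [row for row in map(parse_line, lines) if row is not None]
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def hits_per_path(conn):
    """A simple BI-style report: number of requests per path, busiest first."""
    return conn.execute(
        "SELECT path, COUNT(*) FROM hits "
        "GROUP BY path ORDER BY COUNT(*) DESC, path"
    ).fetchall()
```

Every row lands in one database on one machine, which is why this design works well at small volumes and runs into the storage limits described above as the logs grow.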
We studied the existing data flow and the rate at which data was being generated, and came up with a system that can handle very large amounts of data, keep scaling with growth, and provide the input/output operations per second (IOPS) necessary to deliver data to analytic tools. We chose the Hadoop ecosystem, which not only solved the scalability issue but also made the system highly available.
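The source does not say which Hadoop components or jobs were used, so the following is only a sketch of the MapReduce programming model that Hadoop implements, written in plain Python rather than against any Hadoop API. The record layout (publisher, path) and the function names map_hit, shuffle, reduce_counts, and run_job are hypothetical. Each partition stands for a slice of the logs that a separate node could process.

```python
from collections import defaultdict
from itertools import chain


def map_hit(record):
    """Map step: emit (publisher, 1) for each log record."""
    publisher, _path = record
    yield publisher, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts for one publisher."""
    return key, sum(values)


def run_job(partitions):
    """Run map, shuffle, and reduce over every partition (sequentially here)."""
    mapped = chain.from_iterable(
        map_hit(record) for partition in partitions for record in partition
    )
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())
```

Because the map step looks at one record at a time and the reduce step looks at one key at a time, the work can be spread over more machines as data grows, which is the scalability property the paragraph above refers to.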