Big Data – Publication

The client was facing issues storing its growing data, which can accumulate to gigabytes and even terabytes in a relatively short period.
Client Overview
The client is a California-based publication house with 1,800 publishers that generate terabytes of digital data logs. They generate on-demand reports and graphs from the collected data to develop and incorporate new business strategies and to direct planning for the future.
Agile Soft Systems provided a Big Data-powered solution to resolve the client's scalability issue and make their system highly available.
The system helps the client handle large volumes of data and keep scaling storage in line with business growth.

Project Details

Challenges They Were Facing!

Business Challenges

The client collects data in the form of logs for different transactions. The acquisition, storage, access, and analysis of this digital information provide knowledge and strategic insights to the senior management of each publisher. Each time a user visits a website or makes a transaction, the company can log everything they do: every link they click, each form they submit, when they log in and out, any errors they encounter, and just about anything else imaginable. As business data grew, the client struggled to process it in their existing system.
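As a minimal sketch of what one such structured log entry could look like, here is an illustrative event-logging helper. The field names (`event_id`, `user_id`, `action`, `details`) are assumptions for illustration, not the client's actual schema:

```python
import json
import time
import uuid

def log_event(user_id, action, **details):
    """Build one structured log entry as a JSON line.

    Field names are illustrative, not the client's actual schema.
    """
    entry = {
        "event_id": str(uuid.uuid4()),   # unique id for this event
        "timestamp": time.time(),        # Unix time of the event
        "user_id": user_id,
        "action": action,                # e.g. "click", "form_submit", "login"
        "details": details,              # free-form context (link, form name, error)
    }
    return json.dumps(entry)

# Example: a user clicking a link
print(log_event("u42", "click", link="/pricing"))
```

One JSON line per event keeps each record self-describing, which is what makes the later parse-and-store step straightforward.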

Technical Challenges

In the existing system, the client collected logs, parsed them, and stored them in an RDBMS, from which various BI reports were extracted. As the business grew, the storage and access aspects (prerequisites for later BI analytics) started imposing challenges on how to store this growing data. Data management becomes a greater obstacle as more and more data must be collected and stored, and as the data sets increased in size, it was possible to run out of space on a single system.
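The parse step of that pipeline can be sketched as follows. The log format here is a hypothetical access-log layout (the case study does not specify the client's actual schema), and the output dict is what would feed the RDBMS insert:

```python
import re
from datetime import datetime

# Hypothetical log format: "<ip> [<timestamp>] "<action> <path>" <status>"
# The client's real log schema is not described in the case study.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \[(?P<ts>[^\]]+)\] "(?P<action>\w+) (?P<path>\S+)" (?P<status>\d+)'
)

def parse_log_line(line):
    """Parse one raw log line into a dict ready for RDBMS insertion.

    Returns None for malformed lines so the caller can skip them.
    """
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    record = m.groupdict()
    record["status"] = int(record["status"])
    record["ts"] = datetime.strptime(record["ts"], "%d/%b/%Y:%H:%M:%S")
    return record

print(parse_log_line('10.0.0.1 [21/Mar/2020:10:15:32] "GET /report" 200'))
```

This per-line approach works well at modest volume; the challenge described above is that both the raw logs and the parsed rows keep growing, and a single RDBMS host eventually runs out of storage and I/O headroom.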

What We Delivered!

Experts at Agile Soft Systems studied the existing data flow and the rate at which data was being generated. We then designed a system that can handle larger volumes of data, keep scaling with business growth, and provide the input/output operations per second (IOPS) needed to deliver data to analytics tools. We chose the Hadoop ecosystem, which not only solved the scalability issue but also made the system highly available.
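The case study does not detail the delivered jobs, but as an illustrative sketch, a Hadoop Streaming-style job (one common entry point to the Hadoop ecosystem) that counts log events per publisher could look like this. The tab-separated field layout is an assumption, not the client's actual format:

```python
from itertools import groupby

def mapper(lines):
    """Emit (publisher_id, 1) for every log line.

    Assumes the publisher id is the first tab-separated field;
    this layout is hypothetical.
    """
    for line in lines:
        publisher = line.split("\t", 1)[0].strip()
        if publisher:
            yield publisher, 1

def reducer(pairs):
    """Sum counts per publisher.

    Hadoop delivers mapper output sorted by key, so consecutive
    pairs with the same key can be grouped and summed.
    """
    for publisher, group in groupby(pairs, key=lambda kv: kv[0]):
        yield publisher, sum(count for _, count in group)

if __name__ == "__main__":
    # In a real Streaming job, mapper and reducer each read sys.stdin
    # in separate processes; here we demo the flow on sample lines.
    sample = ["pub1\tclick /home", "pub2\tlogin", "pub1\terror 500"]
    for publisher, total in reducer(sorted(mapper(sample))):
        print(f"{publisher}\t{total}")
```

Because each mapper works on an independent slice of the logs stored across HDFS nodes, adding nodes adds both storage and processing capacity, which is the scaling property the client needed.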