Big Data – How Do We Get There?

by Colin Hart

January 31, 2017.

Many of our clients have begun to consider dipping their proverbial toes into the ‘Big Data’ pool. Recently, Groundswell and some of our partners held well-received one-day workshops in both Calgary and Vancouver. Big data was, understandably, a very hot topic. One question that came up often was:

“We’re interested in this but how do we get there?”

The truth is that there isn’t a one-size-fits-all strategy, as numerous factors come into play for any one application. There are, however, strategies that we’ve employed in the past that have been particularly successful. In this post, I’ll review a recent implementation of our 4-step, iterative approach.

Background

It started as an interesting challenge: the enterprise had a huge amount of primarily unstructured data stored outside of the database, with a metadata catalog held in a Relational Database Management System (RDBMS). This data was mission critical due to related regulatory requirements and was served to multiple clients multiple times a day. The data was growing at an exponential rate and, ultimately, the RDBMS began to experience scalability-related performance issues to such an extent that the enterprise started to lose subscribers.

 
4 steps to implement a NoSQL ‘Big Data’ solution

1. Evaluate options

We didn’t approach this in a “when all you have is a hammer, every problem looks like a nail” fashion. We didn’t go straight to a NoSQL solution; instead, we were careful to evaluate RDBMS scaling options as well, to prove that introducing a new technology to our environment would be beneficial.

Here is a simplified outcome of the evaluation for this implementation:

 

Criterion                    | RDBMS                                | NoSQL
Can solve current challenge  | Yes                                  | Yes
Hardware                     | Proprietary, expensive               | Commodity, inexpensive
Future bottleneck potential  | Higher                               | Low
Licensing fees               | Higher                               | Lower
Scaling increments           | Scale up – big increments, expensive | Scale out – small increments, cheap
Cloud solution               | No                                   | Yes
Distributed                  | No                                   | Yes

 

With our recommendation in hand, it was decided that we would move to the next phase of proving out an example on a distributed, NoSQL model.

2. P.O.C.

The next step was to develop a Proof of Concept (POC) to verify that our conceptual application would work in real life. First, we selected a specific set of data to use for the POC. We wanted the smallest subset of data that was sufficient to give confidence in the solution. Then we designed a data model (i.e. the relevant metadata elements we wanted to capture) and implemented it in a NoSQL cluster in a hosted cloud service. This included three full environments: Dev (1 node), Test (1 node) and Prod (3 nodes). This approach was completed quickly and was also very cost effective because:
A. The NoSQL licenses were free; fees are incurred only once the solution moves to Prod
B. We only had to focus on fulfilling the requirements for the smaller subset
C. Fulfilling the smaller scope of a subset required less architectural and development work compared to an end-to-end solution

3. Deploy a Subset

‘Succeed fast’ – Once the environment was ready, we wrote conversion scripts to get a small (but viable) subset of the data into the NoSQL architecture. By keeping the API of the data layer unchanged, we didn’t have to spend any time adjusting the Business & Presentation layers. With this approach, we saved significant time because we only had to rewrite the corresponding portions of the existing data layer to fit the new platform.
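To illustrate the idea of keeping the data layer API unchanged, here is a minimal Python sketch. It is not the code from this project: the names `MetadataStore`, `InMemoryStore`, `RoutingStore` and `describe` are hypothetical, the in-memory stores stand in for real RDBMS and NoSQL drivers, and routing by an ID prefix is only one assumed way to let migrated and unmigrated subsets coexist.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Set


class MetadataStore(ABC):
    """Hypothetical data-layer API; the upper layers only ever see this."""

    @abstractmethod
    def get(self, doc_id: str) -> Optional[dict]:
        ...

    @abstractmethod
    def put(self, doc_id: str, metadata: dict) -> None:
        ...


class InMemoryStore(MetadataStore):
    """Stand-in backend; a real one would wrap an RDBMS or NoSQL driver."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}

    def get(self, doc_id: str) -> Optional[dict]:
        return self._items.get(doc_id)

    def put(self, doc_id: str, metadata: dict) -> None:
        self._items[doc_id] = dict(metadata)


class RoutingStore(MetadataStore):
    """Routes each call by subset (assumed here to be the ID prefix before '/')."""

    def __init__(self, legacy: MetadataStore, modern: MetadataStore,
                 migrated_subsets: Set[str]) -> None:
        self._legacy = legacy
        self._modern = modern
        self._migrated = migrated_subsets

    def _backend(self, doc_id: str) -> MetadataStore:
        subset = doc_id.split("/", 1)[0]
        return self._modern if subset in self._migrated else self._legacy

    def get(self, doc_id: str) -> Optional[dict]:
        return self._backend(doc_id).get(doc_id)

    def put(self, doc_id: str, metadata: dict) -> None:
        self._backend(doc_id).put(doc_id, metadata)


def describe(store: MetadataStore, doc_id: str) -> str:
    """Business-layer code: unaware of which backend answers the call."""
    metadata = store.get(doc_id)
    return "missing" if metadata is None else metadata["title"]
```

Because `describe` depends only on the `MetadataStore` contract, swapping the backend for a migrated subset requires no change above the data layer.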

The first subset was tested and ready for release into Production in just one month.

4. Iterate

From this point we followed the same pattern, iterating over these steps:
A. Select a subset
B. Move data & RDBMS metadata to NoSQL
C. Rewrite relevant portions of data layer
D. Test
E. Deploy to Production
F. Goto A
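The loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not the conversion scripts used on the project: the row fields (`id`, `subset`), the document shape produced by `to_document`, and the function names are all hypothetical, and a plain dictionary stands in for the NoSQL cluster. Step C (rewriting the data layer) is manual development work, so it appears only as a comment.

```python
from typing import Dict, List

Row = Dict[str, str]


def to_document(row: Row) -> dict:
    """Convert one flat RDBMS row into a nested document (hypothetical shape)."""
    metadata = {k: v for k, v in row.items() if k not in ("id", "subset")}
    return {"_id": row["id"], "subset": row["subset"], "metadata": metadata}


def migrate_subset(subset: str, legacy_rows: List[Row],
                   documents: Dict[str, dict]) -> int:
    """Run steps A, B and D for one subset; return the number of rows moved."""
    # A. Select a subset
    rows = [row for row in legacy_rows if row["subset"] == subset]

    # B. Move data & RDBMS metadata to NoSQL (a dict stands in for the cluster)
    for row in rows:
        documents[row["id"]] = to_document(row)

    # C. Rewriting the relevant portions of the data layer is done by developers,
    #    outside this script.

    # D. Test: every selected row must be present and correctly converted
    for row in rows:
        if documents.get(row["id"]) != to_document(row):
            raise RuntimeError("verification failed for row " + row["id"])
    return len(rows)


def migrate_all(subsets: List[str], legacy_rows: List[Row],
                documents: Dict[str, dict]) -> List[str]:
    """Iterate over the subsets, releasing each one only after it verifies."""
    released: List[str] = []
    for subset in subsets:  # F. Go to A for the next subset
        migrate_subset(subset, legacy_rows, documents)
        released.append(subset)  # E. Deploy to Production (recorded here only)
    return released
```

Migrating one subset at a time keeps each release small, so a failed verification stops the loop before the affected subset is marked as released.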

In the end, it took only a few months to migrate all documents to the cloud-hosted NoSQL platform and into Production. The end result performed significantly faster, drastically reduced client complaints & client attrition, and easily accommodated inexpensive scaling for the growing data demands of the business. It was so successful that the client is looking for other opportunities to leverage a cloud, NoSQL, ‘Big Data’ solution.