How to deal with Dark and Dirty data in a legacy migration project

August 28, 2019

The amount of data we create every day is mind-blowing.
According to a Forbes report from May 2018, we were generating 2.5 quintillion bytes (2.5 exabytes) of data per day at that time, and the rate keeps growing. The total amount of data is expected to reach a staggering 44 zettabytes by 2020. That is 40 times as many bytes as there are stars in the part of the universe that we can observe from Earth.

A large part of this data is stored by businesses and organizations, which presents them with considerable Big Data challenges. Although much of the data these organizations gather is used for business improvement, analysis or marketing, a huge amount of stored data still serves no real purpose. This data is called Dark Data. Gartner defines Dark Data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.”

A wide range of data can qualify as Dark Data, in particular outdated or unstructured data such as log files, customer data, account information, financial data, email, research data and presentations.

So what?

Why should we worry about this Dark Data? There are three important issues. The first is space: this large amount of ‘unused’ data takes up a correspondingly large amount of storage, and storage is expensive. The second is security: although this dark data serves no purpose at present, there is a high probability that it contains sensitive proprietary information that could leak.

Last but not least, dark data may contain hidden treasures. Alongside all the ‘useless’ material, this unattended data most likely also contains valuable business logic. But how do you find these hidden treasures in such a giant amount of unorganized legacy data?

Legacy transformation – Moving the dark into the light

In the many large legacy transformation projects we have done, we have worked with giant amounts of data, including a lot of dark and dirty data. In our experience, there is no point in trying to sort out and clean the data before starting the migration. It is a time-consuming and cumbersome task that is not worth spending your resources on.

We strongly believe in a like-for-like migration approach: migrate ALL data to the new environment. There you will have access to a whole world of new technology, integrations and methodologies that let you sort, clean and analyze the migrated data more easily and cost-effectively.
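To make the idea of sorting data after the migration a little more concrete, here is a minimal Python sketch of what a first profiling pass in the new environment could look like. It is an illustration only, not our migration tooling: the Record fields, the one-year staleness threshold and the simple email check are all assumptions made for this example, and a real project would define "dark" and "dirty" according to its own data.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    # Hypothetical shape of a migrated row; real schemas will differ.
    record_id: str
    email: Optional[str]
    last_accessed: Optional[date]


def classify(record: Record, today: date, stale_after_days: int = 365) -> set[str]:
    """Return the labels ("dark", "dirty") that apply to one record."""
    labels = set()
    # Assumed rule for "dark": never accessed, or not accessed within the window.
    if record.last_accessed is None or (today - record.last_accessed).days > stale_after_days:
        labels.add("dark")
    # Assumed rule for "dirty": a required field is missing or clearly malformed.
    if not record.email or "@" not in record.email:
        labels.add("dirty")
    return labels


def profile(records: list[Record], today: date) -> dict[str, list[str]]:
    """Group record ids by label so each group can be reviewed separately."""
    report: dict[str, list[str]] = {"dark": [], "dirty": [], "clean": []}
    for record in records:
        labels = classify(record, today)
        if not labels:
            report["clean"].append(record.record_id)
        for label in sorted(labels):
            report[label].append(record.record_id)
    return report
```

Because everything was migrated first, a pass like this can be rerun with different rules at any time without touching the legacy system, and a record can carry both labels at once.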

If you are interested in our migration solutions, feel free to contact us.