Can Data Lake design enhance business capabilities by ironing out the data chaos?
Over the past year, one of our teams supported a client with a rather complex challenge. The company wanted to better understand its data, given the industrial ecosystem in which it operates, and to find the best ways to extract value from the hundreds of gigabytes of data accumulated from its devices and platforms. The goal was to then build smart algorithms that could automate processes still requiring human intervention, such as maintenance or device verification.
Supporting the purpose
To help, we offered:
- Data Lake Design and Implementation - We designed a data lake that allowed our client to import data from all of its platforms and databases. This led to faster analysis and faster transformation of that data into reports for the business team.
- Statistical analysis and reporting - We provided our client with an analysis of the data they had acquired, giving them initial insights into the quality of the data and the initial value that could be extracted from the datasets.
- Intelligent algorithms and AI - Once we had built the datasets and identified the most valuable data, we used it to train machine learning models that make our client's devices work smarter and handle more scenarios automatically, with less human intervention.
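To make the "initial insights into the quality of the data" step more concrete, here is a minimal sketch of a per-column quality summary in Python. It is not the code used in the project: the function name `column_quality`, the field names and the sample readings are all hypothetical and invented for illustration, and a real analysis would run against the data lake rather than an in-memory list.

```python
def column_quality(rows):
    """Return, per column, the share of missing values and the count of distinct values.

    A value counts as missing if the key is absent, or the value is None or "".
    """
    columns = sorted({key for row in rows for key in row})
    total = len(rows)
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        report[col] = {
            "missing_ratio": (total - len(present)) / total if total else 0.0,
            "distinct": len(set(present)),
        }
    return report


# Hypothetical device readings, invented for illustration only.
sample_rows = [
    {"device_id": "A1", "temperature": 21.5, "status": "ok"},
    {"device_id": "A2", "temperature": None, "status": "ok"},
    {"device_id": "A3", "temperature": 19.0, "status": ""},
    {"device_id": "A1", "temperature": 22.1},
]

quality = column_quality(sample_rows)
for column, stats in quality.items():
    print(column, stats)
```

A summary like this is typically a first pass: columns with a high share of missing values or very few distinct values are candidates for closer inspection before any model is trained on them.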
Our general way of working is defined by a genuine desire to put our passion for technology into action and support clients in achieving their digital evolution goals. More precisely:
- We take a proactive approach, continually analysing the client's data and suggesting newer solutions.
- We understand the big picture and the client's real need, not just the technical problem that must be solved.
- We keep the overall perspective and the real problem in mind and find the best solution, sometimes with improvements in areas the client has not explicitly asked about.
- We implement new techniques for cost reduction and operational efficiency.
We worked on every phase of software development: first the data lake infrastructure, ETL jobs and data processing scripts; then the business-oriented reports and data analysis scripts; and finally the statistical and machine learning models that improve the way our client's devices work.
What we quickly learned through hands-on experience is that, to build a clear picture of our client's business needs, each new team member should attend several knowledge-sharing sessions. We apply this in many projects. The sessions start with general topics (the client's industry, the data lake infrastructure, the goals) and move on to topics specific to the new member's area of expertise. These discussions have led to results such as making ETL and data processing more than five times faster.
Thanks to our Data Lake and ETL jobs, our client now gets answers from their data more than 10 times faster. The implementation also makes it much easier to connect PowerBI reports and to obtain updated insights sooner.
Building on the data analysis, we trained machine learning and statistical models that allow our client to tune their devices more efficiently while enabling those devices to handle more scenarios with less human intervention.
What are the Top 3 struggles when it comes to making data-driven decisions?
The project described above touches on a series of challenges that companies face today when making data-driven decisions, such as:
- A lack of long-term vision - If an organization doesn't have a clear intended use for its data lake and can't explain how it will benefit the business, the lack of long-term commitment can lead to poor business outcomes.
- Not leveraging integrated data management - To use a data lake to its full potential, a plan for integrated data management must be in place to ensure the right governance model.
- An absence of the right technology match - If the technology stack and infrastructure are lagging, this can lead to poor integration and add complexity and cost to data management solutions.
Even though the landscape can be challenging, building and improving processes based on the information in the company's devices, sensors, tools and platforms is worth it in the long run. There are business drivers that can be supported by a data lake.
What are the Top 3 business drivers supported by a data lake?
- Complexity reduction and cost efficiency – Data environments become aligned and more consistent, which reduces IT costs and effort and brings in more value.
- Self-service BI and new sources of data – Smart reports are delivered fast and in a simple form that has a direct, positive impact on business and operations.
- Advanced Analytics to provide agility and flexibility – A new way to use data of all types, from both internal and external sources, in strategic management processes.