Data Engineering at Lingokids
Lingokids is a data-driven company, and a pretty serious one at that. As such, it has a strong, young and modern Data Engineering team, which we treat as a Software Engineering specialization. Because of this, we strive to automate as much of our day-to-day work as possible, focusing on bringing value to the company quickly and reliably with each batch process. That’s why we lean heavily on sound practices for monitoring, testing our code, checking our data quality and deploying everything into production as flawlessly as possible.
Data Engineering is tightly integrated with the rest of the company. It plays a crucial role for the Data Science team by surfacing the data that is buried deep within the App and creating an environment where they can merge it with other sources, in order to provide insights and analytics with high precision. We call this environment the Data Platform.
Lingokids’ current Data Platform
Lingokids’ current Data Platform is our own take on the Business Intelligence Architecture (1) (2). It is organized and planned around the Data Warehouse, in our case Redshift. We rely heavily on dbt to execute our transformations, which helps us ensure that our processing engine delivers the data as intended and that the data is accessible to our Data Scientists. By doing this, we let them play and fiddle with the data in a safe environment while we concentrate on what Data Engineering is really for: ensuring the system’s reliability and making as much data as possible available to the company.
To the right of the above diagram, we can see that our main sources are the operational data and whatever external sources our teams use. All of this lands in the Data Lake, which is currently our staging area for the data. All our extractions are written in Python and run either as micro-services on AWS Fargate or as Apache Spark jobs powered by AWS Glue.
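To make the idea of a staging area more concrete, here is a minimal, illustrative sketch of a Python extraction that lands records in a date-partitioned staging folder. It is not our production code: the function names, the `staging/<source>/dt=<date>/` layout and the file name are all assumptions made for the example, and a local directory stands in for the Data Lake.

```python
import json
from datetime import date
from pathlib import Path
from typing import Iterable


def staging_path(lake_root: Path, source: str, run_date: date) -> Path:
    """Build a date-partitioned staging location for one source.

    The layout (staging/<source>/dt=<YYYY-MM-DD>/part-0000.jsonl) is a
    hypothetical convention chosen for this example.
    """
    partition = f"dt={run_date.isoformat()}"
    return lake_root / "staging" / source / partition / "part-0000.jsonl"


def extract_to_staging(
    records: Iterable[dict], lake_root: Path, source: str, run_date: date
) -> Path:
    """Write extracted records as newline-delimited JSON and return the file path."""
    target = staging_path(lake_root, source, run_date)
    # Create the partition folder if this is the first run for that date.
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", encoding="utf-8") as fh:
        for record in records:
            # One JSON document per line; sorted keys keep reruns byte-identical.
            fh.write(json.dumps(record, sort_keys=True) + "\n")
    return target
```

Partitioning by run date means that rerunning a batch overwrites only that day's file, which keeps each extraction repeatable. In a real job the same function would write to object storage rather than to a local path.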
To the left, we have our BI Tools. The dashboards and reports generated by our Data Scientists let the company understand how everything is really working.
The future of our Data Platform
In spite of having a very modern architecture and technological stack, Lingokids is evolving and growing, and because of this its needs are also changing. We are currently working on the evolution of this architecture: the Lakehouse Architecture (1). We are moving away from a Data Warehouse-centric model to a Data Lake-centric one. With this we aim to make our platform cheaper, more flexible and more reliable, allowing us to incorporate new data sources faster, deliver data artifacts to Data Scientists sooner and ensure our data quality across the board.
As is visible from the above diagram, the architecture becomes more complex, but its operational capabilities grow as well. To get there, we must look into the newest technologies on the market and venture into applying the latest best practices to ensure the success of this endeavour.
The bottom line
As hopefully transpired at the beginning, working in the Data Engineering team at Lingokids is exciting. We are working with the most up-to-date tech stack, but we are not standing still. We want to improve and be better. We want to make our Data Platform the best there is for this company, while staying at the tip of the spear of the technologies out there and applying the best practices that a Software Engineer can apply. After all, that’s what we are: a passionate group of Software Engineers with a knack for data.