About
Information per se can be seen as anything measurable: a tangible fact about a substance or its actions. Recording and analysing these measurements in digital form allows us to understand and predict what we observe. Whether the purpose is medical, scientific or business, the aim is to guide and guard our decision making.
Data engineering is at the centre of information collection and processing. My (mainly engineering) values have evolved from professional experience and from widely adopted methodologies such as Agile1 and DevOps2. The order of the values roughly follows the size and complexity of an organisation: the larger and more complex it becomes, the more of these values its data solution should adhere to, with the exception of the last point.
-
Accuracy.
Value-delivering analytics can only be built on top of accurate data. Accurate descriptive statistics are, in most cases, the essential foundation of a decision-making process. -
Security.
Data is the lifeblood of an organisation or system, and it ought to be stored and processed securely. Engineers should understand concepts such as file systems and permissions, IAM, networking fundamentals, secrets management, authentication protocols and encryption, and exercise best practices around them, in order to provide flexible and secure access to users and applications. -
Versioning.
Everything from infrastructure to analytics and insights is best stored as code, so versioning is a must. It enables teams to build more robust processes, review and collaborate on changes, and support ownership. -
Reproducibility.
Analytical systems should, just like other systems, be fault tolerant, fully recoverable and reproducible on demand. If a disaster occurs, engineers should be able to recover the systems and bring them up to date within reasonable SLAs, without hindering the organisation's users and stakeholders. -
Transformation logic.
Business rules and requirements evolve over time, and data engineers and analysts must be prepared to accommodate frequent change requests from stakeholders. Data transformations should therefore be designed with agility in mind and logically resemble the processes behind the data and its stakeholders. -
Orchestration3.
Orchestration starts to play a crucial role when the complexity of cloud services, environments and workloads grows beyond what can be managed via consoles or UIs. Successful organisations use IaC tools to coordinate and manage their computing resources. -
Continuous Deployment4.
When organisations and teams grow in size, the need for common practices and conventions arises. Incorporating test suites (continuous integration) and deployment processes (continuous delivery), in order to ship features as often as needed, helps to maintain the integrity of teams and their product. Tests should ideally be present at every layer of an analytical product: infrastructure, data pipelines, transformations and business logic. -
Observability5.
Modern data solutions should, just like other applications, be monitored: they should send metrics, store logs and alert on errors and failures, so that the system owners and engineers are the first to discover and act upon them (and enjoy their sleep at night). -
Separation of computation and storage resources.
Once distributed file systems and processing frameworks are in use, a plug-and-play scenario in which these decoupled technologies can be used interchangeably is desirable, alongside other important benefits such as scalability, availability and cost efficiency. -
Heroism and egos are anachronisms.
Varying technical abilities and debts within a team can only be turned into an advantage when an open knowledge-sharing culture is present. Code reviews, pull requests, pair coding sessions and egoless programming6 should be part of every healthy working environment.
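To make the Accuracy and Continuous Deployment values above a little more concrete, here is a minimal sketch of a transformation that carries its own business rules and can be covered by a test in a CI pipeline. Everything in it is an assumption made for illustration: the `Order` record, the `total_revenue_pounds` function and the rules it enforces (no negative amounts, duplicates counted once) are hypothetical and not taken from any real project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical input record, used only for this illustration."""
    order_id: str
    amount_pence: int


def total_revenue_pounds(orders):
    """Sum order amounts in pounds, counting each order_id only once.

    Raises ValueError on a negative amount, so that inaccurate data
    fails loudly instead of flowing silently into a report.
    """
    seen = set()
    total_pence = 0
    for order in orders:
        if order.amount_pence < 0:
            raise ValueError(f"negative amount in order {order.order_id}")
        if order.order_id in seen:
            continue  # skip duplicate records
        seen.add(order.order_id)
        total_pence += order.amount_pence
    return total_pence / 100


def test_total_revenue_ignores_duplicates():
    # A test like this would run on every change, before deployment.
    orders = [Order("a", 1050), Order("b", 250), Order("a", 1050)]
    assert total_revenue_pounds(orders) == 13.0
```

A test runner such as pytest would pick up the `test_` function automatically; the same idea extends to the infrastructure and pipeline layers, with different tooling.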
You may ask yourself, why are the stakeholders left out? I see every aforementioned point as an essential part of delivering an analytical product to them. Check out my favourite Tools for the data journey.
I am a passionate and delivery-oriented data professional who enjoys deploying reliable and elegant solutions, following best practices, sharing knowledge and promoting product ownership.