AWS Athena QuickDive

Outline: Athena is a managed PrestoDB offering from AWS. It primarily serves query execution and data exploration against databases (namespaces) predefined with Hive DDL. To quick-dive into Athena, we’ll use Terraform to provision the resources needed for a basic functional setup, then process and convert a dimensional-like data set from CSV to Parquet format, create and apply DDL statements on top of our data, run some analytical queries, and finally look at further recommendations and other common use cases.
Read more →
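
As a rough illustration of the "create and apply DDL statements" step from the Athena outline above, here is a minimal Python sketch that builds a Hive-style `CREATE EXTERNAL TABLE` statement for Parquet files. The helper name `parquet_table_ddl`, the database, table, and column names, and the `s3://example-bucket/...` location are all hypothetical and are not taken from the post; the post itself may structure its DDL differently.

```python
def parquet_table_ddl(database, table, columns, s3_location):
    """Build a Hive-style CREATE EXTERNAL TABLE statement for Parquet data.

    `columns` is a list of (column_name, hive_type) pairs.
    All names passed in are the caller's; nothing here talks to AWS.
    """
    column_lines = ",\n".join(
        f"  {name} {hive_type}" for name, hive_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_lines}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical example values, for illustration only.
ddl = parquet_table_ddl(
    "quickdive",
    "dim_customer",
    [("customer_id", "bigint"), ("country", "string")],
    "s3://example-bucket/dim_customer/",
)
print(ddl)
```

The generated statement could then be pasted into the Athena query editor or submitted through whichever client is in use; the sketch only assembles the text and does not validate identifiers or types.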

Postgres Role Based Data Access Policy Design

Outline: User management and data access are reasonably popular topics in the current era of data and GDPR-like policies. Many companies in the earlier (and even later) stages of data maturity choose Postgres as their data warehousing database. Now, to add some complexity, let’s assume that our company has multiple departments (analysts taking care of their respective business subjects) and operates in multiple countries or regions.
Read more →

Reporting DB User Lifecycle Management with Postgres

Outline: Postgres is traditionally deployed as an application’s backend RDBMS, serving OLTP workloads, but it is also not uncommon to find it used as a reporting/analytical database, serving OLAP workloads. The aim here is to illustrate the most common tasks performed when administering such a database server (or cluster, in Postgres jargon) with good practices in mind. We are not concerned with the user authentication process, which is a separate topic in itself; the assumption held here is that users are not connecting to a publicly exposed database hostname but are still required to submit their password as an extra security feature.
Read more →

Good Reads

On engineering ladders framework: Great collection of resources, @bmoeskau’s repo
On basic code writing: https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-steps-to-better-code/
On Computer Science basics: https://github.com/aspittel/coding-cheat-sheets
On streaming: https://www.oreilly.com/radar/the-world-beyond-batch-streaming-101/ https://www.oreilly.com/radar/the-world-beyond-batch-streaming-102/
On data platform engineering: http://shop.oreilly.com/product/0636920032175.do
Publicly accessible datasets: https://registry.opendata.aws/
On DevOps: https://www.terraform-best-practices.com/
On AWS: http://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_testing-policies.html https://github.com/doingcloudright
On application networking: https://medium.com/containers-on-aws/using-aws-application-load-balancer-and-network-load-balancer-with-ec2-container-service-d0cb0b1d5ae5
On GCP: https://robmorgan.id.au/posts/deploy-a-serverless-cicd-pipeline-on-gcp-using-cloud-run-and-terraform/
On CI/CD: https://circleci.
Read more →