Postgres Role Based Data Access Policy Design
Outline
User management and Data access is a reasonably popular topic in the current advent of data and GDPR like policies. Many companies in the earlier (and even later) stages of data maturity choose Postges to be their Data Warehousing database. Now, to add some complexity, let’s assume that our company has multiple departments (analysts taking care of their respective business subjects) and operates in multiple countries or regions.
A typical scenario would involve some employees building data pipelines (Data engineers) from different sources, and offering these data to their stakeholders for analytical purposes. From the RDBMs objects perspective these are likely to be Tables, Views and Materialised Views stored across different schemas and available to be queried. This loose guide suggests a strategy which builds atop the available fine grained Postgres permissions and shows how to achieve a flexible access strategies ranging from data sets being accessible to all to strictly only accessible to the selected users.
Postgres roles and permissions contain all the atomic pieces needed to design data access policies which can satisfy any scenario
Contents:
Predesigned policies
From an administrative perspective, we should starting at the top, we’ll always need Superusers, to carry out DBA tasks, henceforth Admins. From a data ownership perspective, we’ll need users who you’ll own and integrate data sources themselves as well as including creating new data sources (such as reports) from an already existing one’s, henceforth Owners. And lastly the type of users who’ll only be engaged in consuming the data, henceforward Readers (or Read only).
Role type | DDL | DML |
---|---|---|
Admin | All | All |
Owner | CREATE/DROP (Table, View, MatView, Sequence, Index) | SELECT, INSERT, UPDATE, DELETE |
Reader | - | SELECT |
The most commonly applied access strategies are combinations of RBAC (Role-Based Access Control) and ABAC (Attribute-Based Access Control). The majority of this post focuses on RBAC and the end shows a simple implementation example of ABAC via row security.
Role inheritance
Role inheritance is the basis for flexible permission sharing.
GRANT group_role_ro TO analyst_1;
AND REVOVE group_role_ro FROM analyst_1;
I’ll the terms role and group interchangeably, although there is no real distinction between, we can simply perceive
a role as a placeholder for privileges.
The main assumption here is going to be the use of namespaces (schemas SELECT * FROM pg_namespace
) and generic access roles assigned to them.
In other words, as many different data sources (either by their origin or other logical grouping) as many schemas there will be.
For instance, the marketing department, they are likely to request channel attribution, adcosts and other data sets to be stored.
In order
If we wish to inspect the each grant
SELECT * FROM information_schema.role_table_grants WHERE table_schema = 'marketing'
The INHERITS role attribute is the basic building block here, it allows us to outsource the object (table) ownership (or it access privileges) to another role (you can also think of it as the deprecated group context) and share that with other roles (users in our case).
Let’s assume that our table to be marketing.order_tracking
containing some standard fields (channelGrouping, source, medium, campaign, keyword, transactionId, deviceCategory
).
Now it’s we there are two options, since each schema should posses two roles (Owner & Reader), we’re going to need two roles to cater for this.
The generic naming convention pattern can resemble <schema>_<role_type>
the owner role => marketing_group and the reader role => marketing_group_ro (_ro suffix for read only Reader role).
The GRANT command,
When we need to restore a database, all grants will simply be restored from a backup. To find out which members, to find
These clauses determine whether a role “inherits” the privileges of roles it is a member of. A role with the INHERIT attribute can automatically use whatever database privileges have been granted to all roles it is directly or indirectly a member of. Without INHERIT, membership in another role only grants the ability to SET ROLE to that other role; the privileges of the other role are only available after having done so. If not specified, INHERIT is the default.
Read only and editor type of roles. Admin, owner and read only roles permission separation. General object Owner granting guidelines
Grantable roles
Naming conventions Row access policies proposal Other functions Other Users/Roles
Here are the guidelines for lifecycle of users in the operational Postgresql data-warehouse. Here we mainly follow user management guide recommended by Postgresql documentation as well as some best practices from several nines and 2nd-quadrant. To manage the growing number of DWH users from different departments we use the concept group roles. Group roles evolve around the idea of collecting different privileges on different database objects based on the needs of the given group. These administrative actions can be performed by dwh superusers only. The snippets of SQL found here are versioned in this GitHub repo.
General user lifecycle evolves around user on-boarding, the management of his data access rights, and if the user is creating other data in a form of tables, views, or materialised views, it is his responsibility to further manage access to these objects by granting the privileges on other users. The implementation of functions designed to simplify the user on-boarding (access credentials creation), data access and off-boarding is discussed across the document.
To request a new user access please create a ticket @RDMBS board (TEMPLATE LINKs), stating which schemas/namespaces will you need to access and whether you will be publishing tables/views under some of these schemas or not, for access to personally identifiable attributes in some read only schemas tag your direct manager (auditing practise).
Read only and editor type of roles. Schemas in current DWH (Postgres) serve mainly the purpose of logical data separation (namespaces as schemas), hence this can also serve logical data access. The proposition here is to create primary roles in schema to hold default permissions on all object in a given schema. These two types of users are a common use pattern. An analyst might often only need a read (SELECT) privilege to perform analysis or exploration of available data, as reporting queries are published for instance in Tableau, and reporting needs grow, the ability to create tables, views and materialised views and run DML or functions in order to populate reporting schemas is essential. (Currently analysts who publish tableau sources, BR payops department, and others).
Admin, owner and read only roles permission separation.
Admin SUPERUSER,CREATEDB,CREATEROLE, ALL DDL + DML All
Owner DELETE,INSERT,REFERENCES,SELECT,TRIGGER,TRUNCATE,UPDATE,SELECT
on other objects
Read only SELECT, REFRESH MATERIALIZED VIEW
None
Admin is responsible for user and schema creation, admin’s role has the rolsuper attribute (superuser), which means all access data security features are by bypassed. Privileges of administrative type such as SUPERUSER, CREATEDB, and CREATEROLE are only reserved for admins.
Owner roles collectively share the ownerships of objects in reporting schemas. This includes analysts who publish their reports and data engineers whose ETL processes populate schemas from third party data sources (although this is not longer the general ETL strategy see #DATA-IN squad). To give enough flexibility to users in for example reporting schemas, users should be able to perform all statements such as CREATE/DROP objects, perform DML statements and GRANT privileges to other users/group on objects they are owners of (these are namely DELETE,INSERT,REFERENCES,SELECT,TRIGGER,TRUNCATE,UPDATE), this is their cascading responsibility.
Read only roles simply fulfil the purpose of digesting the data, either for creation of other reports or retrieval of the data for analysis, dashboards etc. For instance the tableau user (Application credentials) in our case only holds other read only and SELECT on table GRANTs.
General object Owner granting guidelines To demonstrate the most common use cases we’ll the example of reports schema. For creating/asserting the basic set of privileges for schema see functions db_utils.assert_schema_privileges(, <read_only>)
Tables
An analyst who wishes to publish a table under the reports schema must be the member of a reports_group role, should the report be visible to all users who are members of reports_group_ro, the publisher should include GRANT SELECT ON reports. TO reports_group_ro; ATLER TALBE reports. OWNER TO reports_group.
Views
If an analyst is let’s say a member of multiple read only roles (public_group_ro, referrals_group_ro), and she wishes to publish a view which depends on read permissions from multiple schemas, the SECURITY DEFINER clause should be added to the DDL statement, should she want to expose such data to someone who is read only user and has the reports_group_ro role membership.
Otherwise changing the ownership to the schema’s group role releases the control over it to all who own schema and granting a select on the view to a corresponding schema read only role makes the report accessible to all.
Materialized Views
Postgres documentation states that a user must be the owner of a MatView in order to execute REFRESH MATERIALIZED VIEW <mat-view-name>;
This leaves the data refresh to the owner, and since other users (including script executors/schedulers) are likely to have to refresh it too, we have to overcome this limitation (or rather a security feature)
via another UDF.
In order to refresh materialized views when needed (scheduled jobs) or by other users who are not the owner of the object the db_utils.refresh_matview(matviewschema VARCHAR, matviewname VARCHAR)
function can be used.
CREATE OR REPLACE FUNCTION db_utils.refresh_matview(matviewschema VARCHAR, matviewname VARCHAR)
RETURNS VOID AS
$$
BEGIN
RAISE NOTICE 'REFRESHING %.% ',matviewschema, matviewname;
EXECUTE 'REFRESH MATERIALIZED VIEW ' || matviewschema ||'.'|| matviewname || ';';
END
$$
LANGUAGE plpgsql
SECURITY DEFINER;
-- Usage
SELECT db_utils.refresh_matview('matviewschema', 'matviewname');
The most important important piece of that stored function is the SECURITY DEFINER;
clause, the definer of the function
must be our Admin role with the ROLSUPER
attribute in order to effectively bypass it
The EXECUTE
on a function has to be explicitly granted to them. The SELECT
permissions are identical to the above Views, so they have to granted separately, either allowing the reports_group_ro to SELECT from the view, or alternatively just the users who were granted the select individually.
See GRANT
command for definition of access privileges.
The role membership describes the basic mechanism of permission inheritance. The role attribute rolinherit is by default set to INHERIT, meaning a role can be granted all permissions of other role, this allows for indirect role membership, meaning if a group role (of schema br_payops) br_payops_group has been granted all the permissions of a group every user belonging to the br_payops_group has also all the permissions which users of qa_group_ro have.
Group roles group object privileges so no direct privileges are granted to users (although they may be, if that is the desired effect, performed by schema owner roles).
Our current Postgres instance currently contains ~ 40 different schema (could see them as different namespaces), out which some are being replicated from different instances (public, poslite, plugger, argo). Here we only wish to perform reads. Other schemas (ods, olap, data_mart, br_data_mart, salesforce, external) serve analytical purposes and DML statements have to be performed. A read only role for a schema is denoted as _group_ro such as (_public_group_ro or data_mart_group_ro) and owner role contain only the group suffix such as (olap_group, reports_group, ods_group)
User on-boarding The administrative operations below should ideally be performed using the default postgresql client psql corresponding to major version of the dwh instance currently in use.
To provide consistency across company and systems dwh user names should mirror employee’s email addresses local part (everything before @) where dots are replaced by underscores, i.e john.doe@example.com results in john_doe username.
From the process perspective we are concerned with user on-boarding and off-boarding. For the DWH and BI teams it is assumed that a user will have to be created. For other teams such QA and dev, if access is needed
To create a new user:
CREATE USER <username> WITH ENCRYPTED PASSWORD '<*******random*ascii*chars*****>';
GRANT CONNECT ON DATABASE <database> TO <username>;
GRANT <group_role> TO <username>
When Roles and Users are created the INHERIT role attribute is the default behaviour. Each member of a group is granted all privileges.
Assertion of read only and owner roles privileges:
CREATE OR REPLACE FUNCTION db_utils.assert_schema_privileges(schema VARCHAR, read_only BOOLEAN DEFAULT TRUE)
RETURNS VOID AS
$$
DECLARE schema_group VARCHAR;
DECLARE schema_group_ro VARCHAR;
BEGIN
-- PREPARE schema ROLEs
schema_group := schema || '_group';
schema_group_ro := schema || '_group_ro';
-- CHECK IF schema EXISTS
IF (SELECT nspname FROM pg_namespace WHERE nspname = quote_ident(schema)) IS NULL
THEN
RAISE EXCEPTION 'SCHEMA % DOESN''T EXIST!',schema;
END IF;
-- CHECK IF SCHEMA read only ROLE EXISTS, if not create it
IF (SELECT rolname FROM pg_roles WHERE rolname = quote_ident(schema_group_ro)) IS NULL
THEN
RAISE NOTICE 'CREATING read only ROLE: %',schema_group_ro;
EXECUTE 'CREATE ROLE ' || schema_group_ro || ';';
EXECUTE 'GRANT USAGE ON SCHEMA ' || schema || ' TO ' || schema_group_ro || ';';
END IF;
-- read only GRANT's statements for existing AND DEFAULT PRIVILEGES for future objects
EXECUTE 'GRANT SELECT ON ALL TABLES IN SCHEMA ' || schema || ' TO ' || schema_group_ro;
EXECUTE 'GRANT SELECT ON ALL SEQUENCES IN SCHEMA ' || schema || ' TO ' || schema_group_ro;
EXECUTE 'ALTER DEFAULT PRIVILEGES FOR ROLE ' || schema_group_ro|| ' IN SCHEMA ' || schema || ' GRANT SELECT ON TABLES TO ' || schema_group_ro;
EXECUTE 'ALTER DEFAULT PRIVILEGES FOR ROLE ' || schema_group_ro|| ' IN SCHEMA ' || schema || ' GRANT SELECT ON SEQUENCES TO ' || schema_group_ro;
-- schema editor/owner GRANT's statements for existing AND DEFAULT PRIVILEGES for future objects
IF read_only IS NOT TRUE
-- CHECK IF SCHEMA owner ROLE EXISTS
THEN
IF (SELECT rolname FROM pg_roles WHERE rolname = quote_ident(schema_group)) IS NULL
THEN
RAISE NOTICE 'CREATING editor ROLE: %',schema_group;
EXECUTE 'CREATE ROLE ' || schema_group || ';';
EXECUTE 'GRANT USAGE ON SCHEMA ' || schema || ' TO ' || schema_group || ';';
END IF;
RAISE NOTICE 'CREATED editor ROLE: %',schema_group;
EXECUTE 'GRANT ALL PRIVILEGES ON SCHEMA ' || schema || ' TO ' || schema_group;
EXECUTE 'GRANT ALL ON ALL SEQUENCES IN SCHEMA ' || schema || ' TO ' || schema_group;
EXECUTE 'GRANT ALL ON ALL TABLES IN SCHEMA ' || schema || ' TO ' || schema_group;
EXECUTE 'ALTER DEFAULT PRIVILEGES FOR ROLE ' || schema_group|| ' IN SCHEMA ' || schema || ' GRANT ALL ON TABLES TO ' || schema_group;
EXECUTE 'ALTER DEFAULT PRIVILEGES FOR ROLE ' || schema_group|| ' IN SCHEMA ' || schema || ' GRANT ALL PRIVILEGES ON SEQUENCES TO ' || schema_group;
END IF;
END
$$
LANGUAGE plpgsql;
Naming conventions The group role naming convention should follow a set of simple rules. First part of the role name has to start with schema name followed by _group suffix in case of the main’s schema object owner group and for the readonly _group_ro, Example for . the reports schema would look like reports_group and reports_group_ro. User’s created in order to provide a programmatic access (let’s say to another application) should be appended with clear suffix (proposition _app) in order to have a clear distinction between personal accounts and those serving a programmatic access.
Row access policies proposal The proposal here is to keep the groups more granular(schema oriented) and provide more custom access rights to individuals, let’s say that a BI manager from Europe needs access to accounting to accounting schema or other data sources, then he is simply granted these groups.
– HERE provide an example of country_id based row access policy, and how such policy is applied to a user.
The Data Warehouse Developer and Data Engineer roles The data-warehouse team collectively shares ownerships of all objects in the database and hence should have the ability to perform all DDL and DML statements across the schema’s. The proposition is to maintain a dwh_group role which should inherit permissions from all other schema roles so that members of the team can perform tasks interchangeably, or in other words having sharing ownership through _group or _group_ro in all schemas.
To review all group roles a user or a role is member of, the procedure below can be used.
To inspect group memberships
CREATE OR REPLACE FUNCTION db_utils.role_memberships()
RETURNS TABLE (
rolname VARCHAR,
member_of TEXT[]
) AS
$$
BEGIN
RETURN QUERY
SELECT r.rolname::VARCHAR,
ARRAY(SELECT b.rolname
FROM pg_auth_members m
JOIN pg_roles b ON (m.roleid = b.oid)
WHERE m.member = r.oid)::TEXT[]
AS member_of
FROM pg_roles r;
END;
$$
LANGUAGE plpgsql;
-- Usage
SELECT username,member_of FROM db_utils.all_role_memberships();
Reassignment of database objects owned by users/group roles which are to be discarded should always assign the ownership of these objects to trusted roles such as superusers or the initial postgres user: Note REASSIGN cannot across different databases, as of now that is not an issue (only sumup_dwh), and has to be run for each database separately.
Other functions Renaming of users should also be standardised, the main purpose here is to provide consistency for moving all privileges between old and new user names
(A case of an operations analyst per different region, such as Latin America and Europe). Other Users/Roles Other users whether applications/scripts or postgresql internals should be kept track of.
Row security was added in 11, in advent of GDPR.
It is a powerful feature, which let’s say enables you to bake the required security with your data. I’d like to put it to a contrast of a classic scenario.
The multi region analyst teams mentioned earlier, let’s assume we two regions, AsiaPacific (AP) and Latin America (LA). The traditional approach
would involve a creation of Views, which would be differentiated via WHERE
clauses and the View name would likely be suffixed with its region (or other defining attribute).
Now with the growing number of regions the number of View definitions grows too, of course we could come up with a templated solution in order to keep drier.
So I leave the decision to the cook, although my north star has decided for me, nevertheless one thing has to be given to the Postgres itself, it has the flexibility to allow developers to mix their own cocktail of access policies and satisfy the legal requirements present in business and product.