Greenplum Object Metadata Queries to Get Table Sizes, Statistics, and Lineage

This post covers how a combination of Greenplum user-defined and built-in functions can be used to build a dashboard that provides insight into the periodic growth (daily, weekly, monthly) of your database at the schema and object level, with information such as table size, row count, column count, column population, and lineage.

The code samples below can be leveraged to derive this information.

Step 1: The dynamic function count_rows can be used to capture the row count of a table by passing the schema and table name.

CREATE OR REPLACE FUNCTION public.count_rows(schema text, tablename text)
RETURNS int8
LANGUAGE plpgsql
VOLATILE
AS $$
declare
    result bigint;
    query varchar;
begin
    query := 'SELECT count(1) FROM ' || schema || '.' || tablename;
    execute query into result;
    return result;
end;
$$;
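Once created, the function can be invoked directly. For example (the schema and table names here are placeholders, substitute your own):

```sql
-- Hypothetical example: row count of table public.sales
SELECT public.count_rows('public', 'sales');
```

Note that the function interpolates identifiers directly into dynamic SQL; wrapping the arguments with quote_ident() would guard against unusual or malformed names.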

Step 2: The dynamic function cnt_cols can be used to count the number of columns by passing the schema and table name.

CREATE OR REPLACE FUNCTION public.cnt_cols(schema text, tablename text)
RETURNS int4
LANGUAGE plpgsql
VOLATILE
AS $$
declare
    result int;
    query varchar;
begin
    query := 'SELECT count(column_name) FROM information_schema.columns'
          || ' WHERE table_schema=''' || schema || ''' AND table_name=''' || tablename || '''';
    execute query into result;
    return result;
end;
$$;
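This function is called the same way. For example (placeholder names again):

```sql
-- Hypothetical example: number of columns in table public.sales
SELECT public.cnt_cols('public', 'sales');
```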

Step 3: The function has_nonnulls can be used to verify whether a column contains any non-null values by passing the schema, table, and column name iteratively.

CREATE OR REPLACE FUNCTION public.has_nonnulls(p_schema text, p_table text, p_column text)
RETURNS bool
LANGUAGE plpgsql
VOLATILE
AS $$
declare
    b boolean;
begin
    execute 'select exists(select 1 from ' || p_schema || '.' ||
            p_table || ' where "' || p_column || '" is not null limit 1)' into b;
    return b;
end;
$$;
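A quick check of a single column looks like this (placeholder names; the function returns false when the column is entirely null):

```sql
-- Hypothetical example: does public.sales.customer_id hold any non-null value?
SELECT public.has_nonnulls('public', 'sales', 'customer_id');
```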

Step 4: Now that the basic functions are created, they can be used inside SQL queries to capture information about specific schemas and objects. For example, the query below captures the size, row count, and column count of every table in a list of schemas.

SELECT
    table_schema,
    table_name,
    table_schema || '.' || table_name AS Schema_TableName,
    pg_size_pretty(pg_total_relation_size('"' || table_schema || '"."' || table_name || '"')) AS TableSize,
    count_rows(table_schema, table_name) AS Row_Count,
    cnt_cols(table_schema, table_name) AS Column_Count
FROM information_schema.tables
WHERE table_schema IN ('') -- provide the schema(s) here as parameter
  AND table_type = 'BASE TABLE';

Step 5: The SQL query below can be used to count, per table, how many columns are completely null.

select
    table_schema,
    table_name,
    sum(case when Is_Null_Check = false then 1 else 0 end) as Is_Null_count
from
(
    select
        table_schema,
        table_name,
        table_schema || '.' || table_name AS Schema_TableName,
        has_nonnulls(table_schema, table_name, column_name) as Is_Null_Check
    from information_schema.columns
    where table_schema in (
        ''
    ) -- pass the schemas here
) a
group by 1, 2;

The above set of queries and functions can be leveraged to capture periodic snapshots and persist them to user-defined tables, specifically to monitor database growth, statistics, and data quality issues via a dashboard that gives quick insights.
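As a minimal sketch of that persistence step, a scheduled job could append one row per table per day. The snapshot table name, its columns, and the schema list below are assumptions for illustration, not part of the original:

```sql
-- Hypothetical snapshot table; adjust names, types, and distribution to your environment
CREATE TABLE public.growth_snapshot (
    snapshot_date date,
    table_schema  text,
    table_name    text,
    table_size    text,
    row_count     bigint,
    column_count  int
);

-- Run periodically (e.g. daily) to accumulate growth history
INSERT INTO public.growth_snapshot
SELECT
    current_date,
    table_schema,
    table_name,
    pg_size_pretty(pg_total_relation_size('"' || table_schema || '"."' || table_name || '"')),
    public.count_rows(table_schema, table_name),
    public.cnt_cols(table_schema, table_name)
FROM information_schema.tables
WHERE table_schema IN ('public') -- adjust the schema list
  AND table_type = 'BASE TABLE';
```

Trend queries for the dashboard can then group this table by snapshot_date to show daily, weekly, or monthly growth.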

My name is Praveen Kadipikonda. I work on designing and implementing data warehouse systems as a Solution Architect and Data Architect, with a focus on BI reporting.