Greenplum Objects Metadata Queries to get table sizes ,statistics and its Lineage
I would like to cover how we can use combination of Greenplum user defined and built-in functions to create dashboard that can provide insights into the periodic Growth(Daily, Weekly, Monthly) of your database, schema and object level with information like table size, row count, column count, column population as well as it’s lineage.
Below provided code samples can be leveraged to derive the information..
Step 1: Dynamic Function (count_rows)can be used to pass the schema and table name to capture the row count.
CREATE OR REPLACE FUNCTION public.count_rows(schema text,tablename text)
query := ‘SELECT count(1) FROM ‘ || schema || ‘.’ || tablename;
execute query into result;
Step 2 : Dynamic Function (cnt_cols) can be used to count the number of columns by passing the schema and table name.
CREATE OR REPLACE FUNCTION public.cnt_cols(schema text,tablename text)
query := ‘SELECT count(column_name) FROM information_schema.columns where table_schema=’’’ || schema || ‘’’ and table_name=’’’ || tablename ||’’’’;
execute query into result;
Step 3 : Function(has_nonulls) can be used to verify if a column has no null values by passing the schema , table and column name iteratively.
CREATE OR REPLACE FUNCTION public.has_nonnulls(p_schema text,p_table text,p_column text)
execute ‘select exists(select 1 from ‘|| p_schema ||’.’||
p_table||’ where “‘||p_column||’” is not null limit 1)’ into b;
Step 4 : Now that the basic functions are created, they can be used inside the SQL Queries to capture the information around the specific schemas and objects. For example below query can be used to capture number of rows and columns in a table for a list of schema(s).
table_schema,table_name,table_schema || ‘.’ || table_name AS Schema_TableName,
pg_size_pretty(pg_total_relation_size(‘“‘ || table_schema || ‘“.”’ || table_name || ‘“‘)) AS TableSize
,count_rows(table_schema, table_name) AS Row_Count
,cnt_cols(table_schema, table_name) AS Column_Count
table_schema in (provide the schema here as parameter)
and table_type = ‘BASE TABLE’
group by table_schema,table_name
Step 5 : Below SQL Query can be used to verify if any column in a table is completely null
,sum(case when Is_Null_Check=’False’ then 1 else 0 end) as Is_Null_count
select table_schema,table_name,table_schema || ‘.’ || table_name AS Schema_TableName
,has_nonnulls(table_schema, table_name, column_name) as Is_Null_Check
where table_schema in (
) — pass the schemas here
group by 1,2
The above set of queries and functions can be leveraged to capture periodic snapshot information and persist to user defined tables specifically to monitor database growth , statistics, data quality issues via Dashboard to give Quick insights.