Shared Nothing / MPP Database Architecture
Greenplum database is a “Shared Nothing” database in which data is spread out across many segments located on multiple segment hosts. No two segments have the same data and data is distributed based on a distribution key.
For example, let’s say you have a customer table in Greenplum with the distribution key of (customer_id). Your Greenplum cluster has 16 segments so your data would actually be physically located like this:
Sement | Customer ID |
---|---|
0 | 1 |
1 | 2 |
2 | 3 |
3 | 4 |
4 | 5 |
5 | 6 |
6 | 7 |
7 | 8 |
8 | 9 |
9 | 10 |
10 | 11 |
11 | 12 |
12 | 13 |
13 | 14 |
14 | 15 |
15 | 16 |
When a query is executed like “SELECT COUNT(*) FROM customer”, there will be 16 processes created (one for each segment) where the number of rows are counted in parallel across all segments. This is the design behind a Shared Nothing or Massively Parallel Processing (MPP) database.
Skew
So what is skew? In my customer table example, I have even distribution. That is, each segment has the same number of rows. If I distributed instead by state_code and all of my customers are in Tennessee, then all of the data would be in a single segment and thus be skewed.
Having tables that are skewed will slow down processing significantly because it isn’t taking advantage of the entire cluster.
Do I need to Check For Skew?
If you set your distribution key to the logical or physical Primary Key of the table, checking for skew isn’t needed. This will give you great distribution of data.
There are times when you don’t have a PK on a table or you want to set the distribution key to something else because you commonly join two very large tables together. For example, customer and orders tables may perform better if distributed by customer_id so that the data is collocated. However, if the majority of your orders are by one customer, you will have skew in the orders table.
Checking for Skew Method 1
This is the most common and straight forward way to check for skew.
SELECT gp_segment_id, COUNT(*) FROM customer GROUP BY gp_segment_id ORDER BY 1;
This will count the rows per segment and give you a good indication of the skew. The downside to this is it physically counts the rows for this one table at at time. This takes time especially if you have a large number of tables.
Checking for Skew Method 2
I developed this second approach which looks at the file sizes for each table for each segment. It then will output only the tables that have at least one segment with more than 20% more bytes than expected.
Execute the following script in your database.
CREATE OR REPLACE FUNCTION fn_create_db_files() RETURNS void AS $$ DECLARE v_function_name text := 'fn_create_db_files'; v_location int; v_sql text; v_db_oid text; v_num_segments numeric; v_skew_amount numeric; v_version text; BEGIN v_location := 1000; SELECT oid INTO v_db_oid FROM pg_database WHERE datname = current_database(); v_location := 2000; v_sql := 'DROP VIEW IF EXISTS vw_file_skew'; v_location := 2100; EXECUTE v_sql; v_location := 2200; v_sql := 'DROP EXTERNAL TABLE IF EXISTS db_files'; v_location := 2300; EXECUTE v_sql; v_location := 3000; v_sql := 'CREATE EXTERNAL WEB TABLE db_files ' || '(segment_id int, relfilenode text, filename text, ' || 'size numeric) ' || 'execute E''ls -l $GP_SEG_DATADIR/base/' || v_db_oid || ' | ' || 'grep gpadmin | ' || E'awk {''''print ENVIRON["GP_SEGMENT_ID"] "\\t" $9 "\\t" ' || 'ENVIRON["GP_SEG_DATADIR"] "/' || v_db_oid || E'/" $9 "\\t" $5''''}'' on all ' || 'format ''text'''; v_location := 3100; EXECUTE v_sql; v_location := 4000; SELECT count(*) INTO v_num_segments FROM gp_segment_configuration WHERE preferred_role = 'p' AND content >= 0; v_location := 4100; v_skew_amount := 1.2*(1/v_num_segments); v_location := 4200; SELECT CASE WHEN POSITION('Greenplum Database 5' in v) > 0 THEN 'GPDB5' WHEN POSITION('Greenplum Database 4' in v) > 0 THEN 'GPDB4' ELSE 'other' END INTO v_version FROM version() AS v; v_sql := 'CREATE OR REPLACE VIEW vw_file_skew AS ' || 'SELECT schema_name, ' || 'table_name, ' || 'max(size)/sum(size) as largest_segment_percentage, ' || 'sum(size) as total_size ' || 'FROM ( ' || 'SELECT n.nspname AS schema_name, ' || ' c.relname AS table_name, ' || ' sum(db.size) as size ' || ' FROM db_files db '; IF v_version = 'GPDB4' THEN v_sql := v_sql || ' JOIN pg_class c ON '; ELSE v_sql := v_sql || ' JOIN gp_dist_random(''pg_class'') c ON '; END IF; v_sql := v_sql || ' (split_part(db.relfilenode, ''.'', 1))::text = c.relfilenode::text ' || ' JOIN pg_namespace n ON c.relnamespace = n.oid ' || ' WHERE c.relkind = ''r'' and c.relstorage not in (''x'', ''v'', ''f'')' || ' GROUP BY n.nspname, c.relname, db.segment_id ' || ') as sub ' || 'GROUP BY schema_name, table_name ' || 'HAVING sum(size) > 0 and max(size)/sum(size) > ' || v_skew_amount::text || ' ' || 'ORDER BY largest_segment_percentage DESC, schema_name, ' || 'table_name'; v_location := 4300; EXECUTE v_sql; EXCEPTION WHEN OTHERS THEN RAISE EXCEPTION '(%:%:%)', v_function_name, v_location, sqlerrm; END; $$ language plpgsql;
Now execute the function so it will create the necessary objects in your database.
SELECT fn_create_db_files();
Now select from the view to see the tables that are skewed.
SELECT * FROM vw_file_skew ORDER BY 3 DESC;
Summary
When tuning a database, a Greenplum DBA will typically only need to worry with skew instead of tuning queries like other databases. Once the tables are created with good distribution keys, the tuning job is 90% or more done. If you think one particular table is not distributed well, use Method 1. If you need to validate an entire database quickly, use Method 2.
Note: Edited to support GPDB5.
Note: Edited again to fix for GPDB5.