This is the third in a series of “How do I do X in Greenplum database” and today’s topic is How do I determine the language for a function.
Inline SQL
A common way to use a function is to transform data directly in the SELECT portion of a query. Here is a quick example.
CREATE TABLE person (id int NOT NULL, fname text, lname text, company_name text) DISTRIBUTED BY (id); --sample data INSERT INTO person VALUES (1, 'Jon', 'Roberts', null); INSERT INTO person VALUES (2, null, 'Smith', null); INSERT INTO person VALUES (3, null, null, 'ABC Company');
Now the use case is to retrieve the “name” of the person but it might be a combination of the three text columns. A business decision was made to do format the name as follows:
SELECT CASE WHEN company_name IS NULL AND fname IS NOT NULL AND lname IS NOT NULL THEN lname || ', ' || fname WHEN fname IS NULL AND lname IS NOT NULL THEN lname WHEN company_name IS NOT NULL AND fname IS NOT NULL AND lname IS NOT NULL THEN company_name || ': ' || lname || ', ' || fname WHEN company_name IS NOT NULL AND fname IS NULL AND lname IS NOT NULL THEN company_name || ': ' || lname WHEN company_name IS NOT NULL AND fname IS NULL AND lname IS NULL THEN company_name END as name FROM person; name -------------- Roberts, Jon ABC Company Smith (3 rows)
It would be much easier and consistent to add this logic to a function to be executed so it centralizes the logic to a single place and makes writing SQL a bit easier. This is when using the language “SQL” is preferred. Here is an example and notice that parameters are referred to $1, $2, $3 rather than the name of the parameter. The SQL language does not support named variables.
CREATE OR REPLACE function fn_get_name(p_fname text, p_lname text, p_company_name text) RETURNS text AS $$ SELECT CASE WHEN $3 IS NULL AND $1 IS NOT NULL AND $2 IS NOT NULL THEN $2 || ', ' || $1 WHEN $1 IS NULL AND $2 IS NOT NULL THEN $2 WHEN $3 IS NOT NULL AND $1 IS NOT NULL AND $2 IS NOT NULL THEN $3 || ': ' || $2 || ', ' || $1 WHEN $3 IS NOT NULL AND $1 IS NULL AND $2 IS NOT NULL THEN $3 || ': ' || $2 WHEN $3 IS NOT NULL AND $1 IS NULL AND $2 IS NULL THEN $3 END as name $$ LANGUAGE sql;
And now using this function.
SELECT fn_get_name(fname, lname, company_name) AS name FROM person; name -------------- Roberts, Jon ABC Company Smith (3 rows)
Transformation
The next use case is when transforming data and PL/pgSQL does a great job at this. You can reference parmeters by name, declare variables, handle errors, etc. The code looks a lot like Oracle PL/SQL too.
CREATE OR REPLACE FUNCTION fn_update_person() RETURNS void AS $$ DECLARE v_rowcount int; BEGIN UPDATE person SET fname = initcap(fname), lname = initcap(lname), company_name = initcap(company_name); GET DIAGNOSTICS v_rowcount = ROW_COUNT; RAISE INFO '% Rows Updated.', v_rowcount; END; $$ LANGUAGE plpgsql VOLATILE;
Notice how I’m not looping through the data or issuing a commit after so many records. There are two reasons for this. First, Greenplum is designed for Big Data so manipulating lots of data in a single transaction is ideal. Secondly, a function in Greenplum is executed in a single transaction. You can’t create a transaction within a function because executing a function is by default in a transaction.
Summary
I hope this helps. When transforming data inline, use the SQL language and use the more robust PL/pgSQL language when transforming data.