Here is another technique worth mentioning in regards to building a date dimension. One way to do this is to create a loop and iterate over this to insert the records. This is slow in virtually every database and we have a way to do this with a single INSERT statement.
First create the table:
DROP TABLE if exists hr.date_dim; CREATE TABLE hr.date_dim ( date_key timestamp without time zone NOT NULL, date_week timestamp without time zone NOT NULL, date_month timestamp without time zone NOT NULL, date_quarter timestamp without time zone NOT NULL, date_year timestamp without time zone NOT NULL ) DISTRIBUTED BY (date_key);
And now the magic:
INSERT INTO hr.date_dim SELECT mydate AS date_key, date_trunc('week', mydate) AS date_week, date_trunc('month', mydate) AS date_month, date_trunc('quarter', mydate) AS date_quarter, date_trunc('year', mydate) AS date_year FROM ( SELECT '1995-01-01'::timestamp + interval '1 day' * (generate_series(0, (EXTRACT('days' FROM '2007-04-16'::timestamp - '1995-01-01'::timestamp)::int))) AS mydate ) AS sub;
My example builds the date dimension with every day populated between January 1, 1995 through April 16, 2007. I did this as it was the minimum and maximum dates in my HR table with job history information.
This solution is similar to the other post I made about using generate_series to avoid a nested loop because I’m using generate_series again. It is a very powerful and easy way to dynamically create a dataset without using a loop to do so.