Platform Extension Framework (PXF) – Azure Blob Store

Greenplum can read and write data in Azure Blob Storage with PXF, just as it can with AWS S3. Here is a quick demo of using Azure.

Step 1 – Start PXF

Run this as gpadmin on the Master (mdw) host.

pxf cluster start

Step 2 – Create Azure Storage Account

Log into the Azure portal, navigate to Storage Accounts, and create a new one.


The Storage Account name must be globally unique within Azure. I picked “greenplumguru” for this demo, so you will need to choose a different name.
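Azure enforces a simple naming rule for storage accounts: 3–24 characters, lowercase letters and digits only. A quick way to pre-check a candidate name (the helper function is my own, not part of any Azure SDK):

```python
import re

def is_valid_storage_account_name(name: str) -> bool:
    # Azure rule: 3-24 characters, lowercase letters and digits only
    return re.fullmatch(r"[a-z0-9]{3,24}", name) is not None

print(is_valid_storage_account_name("greenplumguru"))   # True
print(is_valid_storage_account_name("Greenplum-Guru"))  # False: uppercase and hyphen
```

Note that this only checks the format; the portal will still reject a name that is already taken.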

Step 3 – Retrieve the Storage Account Key

After the Storage Account has been created, go to the resource, click on “Access Keys”, and copy one of the account keys.

Step 4 – Configure PXF

On the master node (mdw), execute the following commands.

mkdir $PXF_CONF/servers/demo
cp $PXF_CONF/templates/wasbs-site.xml $PXF_CONF/servers/demo/
vi $PXF_CONF/servers/demo/wasbs-site.xml

Change YOUR_AZURE_BLOB_STORAGE_ACCOUNT_NAME to the name of your Storage Account from Step 2. Change YOUR_AZURE_BLOB_STORAGE_ACCOUNT_KEY to the key value from Step 3.
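After editing, the relevant property in wasbs-site.xml should look roughly like this (property name as it appears in the Greenplum PXF template; the account name shown is the demo value, and the key placeholder is left as-is):

```xml
<property>
    <name>fs.azure.account.key.greenplumguru.blob.core.windows.net</name>
    <value>YOUR_AZURE_BLOB_STORAGE_ACCOUNT_KEY</value>
</property>
```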

Step 5 – Sync PXF

pxf cluster sync

Step 6 – Create a Container in your Storage Account

In your Storage Account, click on “Containers” and create a new container.

Step 7 – Create Writable External Table

CREATE WRITABLE EXTERNAL TABLE ext_write
(id int, description text)
LOCATION ('pxf://')
FORMAT 'csv';

In the above example, I used the following:

  • “greenplumguru” for the Storage Account from step 2
  • “demo” as the name of the server configuration from step 4
  • “demo-container” which is the container name from step 6
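The LOCATION URI is assembled from those three values, plus an object path inside the container (the path “demo-dir” below is illustrative, not from the original) and the wasbs:text profile. A sketch of how the pieces fit together:

```python
def pxf_wasbs_location(container: str, account: str, path: str, server: str) -> str:
    """Assemble a PXF LOCATION URI for Azure Blob Storage (wasbs:text profile).

    The object path inside the container ("demo-dir" in the usage below)
    is a made-up example value.
    """
    return (
        f"pxf://{container}@{account}.blob.core.windows.net/{path}"
        f"?PROFILE=wasbs:text&SERVER={server}"
    )

print(pxf_wasbs_location("demo-container", "greenplumguru", "demo-dir", "demo"))
```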

Step 8 – Insert Some Data

INSERT INTO ext_write SELECT i, 'foo_' || i 
FROM generate_series(1,100000) as i;
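Because the external table uses FORMAT 'csv', what lands in the container is plain comma-separated lines. A quick sketch of the rows the INSERT above produces:

```python
# Mimic INSERT ... SELECT i, 'foo_' || i FROM generate_series(1,100000)
rows = [f"{i},foo_{i}" for i in range(1, 100001)]

print(rows[0])    # first CSV line: 1,foo_1
print(len(rows))  # 100000 rows inserted
```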

Step 9 – Create Readable External Table

CREATE EXTERNAL TABLE ext_read
(id int, description text)
LOCATION ('pxf://')
FORMAT 'csv';

Step 10 – Select from External Table

SELECT * FROM ext_read LIMIT 10;
