Using Microsoft Azure’s Data Factory you can pull data from Amazon S3 and Google Cloud Storage to extract into your data pipeline (ETL workflow). However, Microsoft does not allow you to load (put/upload) files back into these platforms at the end of your extract, transform, load cycle.
To get past this and to enable the ability to load files back into these platforms you can utilize the SFTP connector and Couchdrop. Couchdrop is a cloud SFTP / FTP conduit that acts as a fabric on top of cloud storage and offers webhooks, an API and supports web portal uploads, etc.
In this case Couchdrop supports Google Cloud Storage, Amazon S3, SharePoint, Dropbox, and anything in between. Using Couchdrop with Azure Data Factory you can pull data from any cloud storage platform, transform it and then load it to the same or a different cloud platform, all through SFTP.
Configuring Couchdrop is straightforward and only takes a couple of steps. You can create users who are locked to specific buckets and are limited to specific file operations (upload only, download only, read/write, etc.). You can also configure webhooks based on upload/download events on certain folders. This enables you to trigger different workflows based on the uploaded folder and user. Couchdrop also offers an API to assist with onboarding users programmatically.
To get up and running with Couchdrop’s cloud SFTP server and integrate it into your ETL processes, follow these steps:
Navigate to your storage portal and configure a new storage connector. Below we are configuring Google Cloud Storage.
Next, configure a user. When making the user, be sure to isolate them to the folder configured above (/Google Cloud in our case). In theory this could be an external party uploading data. You could create another user who has read/write access who can pull the data down based on the webhook event of the user uploading a file and extract it into your workflow.
Under the specific folder in Couchdrop’s SFTP virtual file system you wish to send a webhook on — select the event you wish the webhook to be sent on and the URL and save.
Sample Couchdrop SFTP webhook output:
As Couchdrop SFTP works as a standard SFTP server, you simply need the hostname (sftp.couchdrop.io) and your Couchdrop SFTP’s user credentials.
On a final note, Couchdrop is simply a conduit and does not store data, nor does it ‘sync’ data to storage platforms. It processes transfers in memory directly to your endpoint which is overwritten.
Want to try Couchdrop for yourself? You can get a 14-day free trial with no credit card and no required sales calls or other hoops to jump through. Simply sign up and go. Start your trial today.