Integrating Databricks with Azure DW, Cosmos DB & Azure SQL (part 1 of 2)

I tweeted a data flow earlier today that walks through an end-to-end ML scenario using the new Databricks on Azure service (currently in preview). It also includes the orchestration pattern for ETL (populating tables, transforming data, loading into Azure DW etc), as well as the SparkML model creation stored on CosmosDB along with the recommendations output. Here is a refresher:

Some nuances that are really helpful to understand: reading data in as CSV but writing results as Parquet. This Parquet file is then the input for populating an Azure SQL DB table as well as the normalized DIM table in Azure SQL DW, both by the same name.

Selecting the latest Databricks on Azure runtime version (4.0 as of 2/10/18).

Using #ADLS (Azure Data Lake Storage, my preference) and/or Blob storage.

Azure #ADFv2 (Data Factory v2) makes it incredibly easy to orchestrate data movement to Azure from third-party clouds like S3, or from on-premises data sources in a hybrid scenario, with the scheduling / tumbling windows one needs for effective data pipelines in the cloud.

I love how easy it is to connect BI tools as well. Power BI Desktop can connect to any ODBC data source, and specifically to your Databricks clusters by using the Databricks ODBC driver. Power BI Service is a fully managed web application running in Azure. As of November 2017, it only supports Spark running on HDInsight. However, you can create a report using Power BI Desktop and upload it to the Power BI service.
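As a sketch of what that ODBC hookup looks like, the snippet below assembles a Databricks-style ODBC connection string. Every value (host, HTTP path, token placeholder) is a placeholder assumption, not something from this post; a BI tool, or `pyodbc.connect`, would consume a string shaped like this:

```python
# Sketch: assembling an ODBC connection string for a Databricks cluster.
# All values below are placeholder assumptions, not real endpoints.
params = {
    "Driver": "Simba Spark ODBC Driver",   # the driver Databricks distributes
    "Host": "eastus.azuredatabricks.net",  # hypothetical workspace host
    "Port": "443",
    "SSL": "1",
    "ThriftTransport": "2",                # 2 = HTTP transport
    "HTTPPath": "sql/protocolv1/o/0/my-cluster",  # cluster-specific HTTP path
    "AuthMech": "3",                       # username/password authentication
    "UID": "token",                        # literal "token" when using a PAT
    "PWD": "<personal-access-token>",      # placeholder; never hard-code real tokens
}
conn_str = ";".join(f"{k}={v}" for k, v in params.items())
print(conn_str)

# pyodbc.connect(conn_str) would open the connection; not executed here,
# since it needs a live cluster and an installed driver.
```

The host and HTTP path for a real cluster come from the cluster's JDBC/ODBC settings in the Databricks workspace UI.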

The next post will cover using @databricks on @Azure with #EventHubs!

Respect Paid to Pitney Bowes – How PBBI Turned This Blogger’s Opinion Around

I posted the Gartner Magic Quadrant last week (Gartner Magic Quadrant for Data Integration – Delta Comparing 2007-2009) and shared my opinion on the choices for winners and losers (big surprise, but really, I was shocked).

For one, BI practitioners tend to believe they are experts at their domain, and rightfully so, if they are good at what they do and have been doing it for a few years. In my case, 11 years of my life have been spent learning, upgrading, relearning and immersing myself in business intelligence tools and platforms.

So, this year, I was surprised by their ETL quadrant because of Pitney Bowes – Here is my comment:

[Laura Edell comment] Ummm, I thought Pitney Bowes provided corporations with stamps and other business-related supplies…How does one leap from that genre to not just business intelligence, but data integration…? Maybe to compete with the former Business Objects Data Quality Zip Code Cleanser? j/k – but I thought that was eye catching enough to call out.

A little while later, I was most surprised to receive an email from Pitney Bowes’ VP of Communication following up on my comment in a most professional manner. He also offered to chat further about their PBBI solution and walk me through the history and evolution of the PBBI product stack:

Here is a list of ETL / BI-related links from the Pitney Bowes site @

Improve Operational Efficiency

Automated Address Management for Improving Operational Efficiency

Kudos, Pitney Bowes! I stand corrected!