CICD (DevOps) for Data Science: Part Deux

It’s been some time since I presented Part 1 of this DevOps for Data Science short anthology. Since then I have been working on scripting out this solution for a series of presentations I am doing as part of a Discovery & Insights Roadshow.

The example draws inference (pun intended) from anomaly detection and predictive maintenance scenarios, since I seem to spend a chunk of time working in this space. Specifically, it implements a Continuous Integration (CI) / Continuous Delivery (CD) pipeline for the applications behind those scenarios. Typically, when one is developing an AI application, two parallel motions take place:

a) A data science team works on building out the machine learning (ML) models; the handoff often occurs when they publish an endpoint that gets integrated into the app.

b) An app-dev team builds the application itself and is typically responsible for exposing it to end users via a web or mobile app, or for taking the pre-trained model endpoint and calling it from within an internal business-process application.

The example use case for this CI/CD pipeline is a fairly standard anomaly detection and predictive maintenance machine learning scenario.

In short, the pipeline is designed to kick off on each new commit: it runs the test suite, feeding in the model parameters required as input (features, e.g.) and testing the outputs generated (usually via a score.py file). If the tests pass, the latest build branch is merged into master, which gets packaged into a Docker container, and deployed onto Kubernetes if operational requirements demand cluster-scale support.
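To make that concrete, here is a minimal sketch of the kind of test the pipeline might run on each commit. The init()/run() contract, the feature names, and the expected output shape are my assumptions for illustration, not the exact test suite from the repo:

```python
# test_score.py - a minimal smoke test the CI pipeline could run per commit.
# Assumes a score.py exposing an init()/run(raw_data) contract; the sample
# feature payload and output shape below are illustrative only.
import json

import score  # the scoring script produced by the data science team


def test_score_returns_valid_prediction():
    score.init()  # load the trained model once, as the web service would

    # A sample payload with the model's input features (hypothetical names/values).
    payload = json.dumps({"data": [[0.02, 71.3, 0.9, 1450.0]]})

    result = json.loads(score.run(payload))

    # Fail the build if the output is missing or malformed.
    assert "prediction" in result
    assert result["prediction"][0] in (0, 1)  # anomaly / no anomaly
```

If the assertions pass, the pipeline proceeds to the merge and the Docker build; if not, the merge is blocked.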

DevOps for AI/ML Architecture



I have included a #Jupyter notebook for you to check out the process of deploying to @Docker and then productionalizing with @Kubernetes. Part of the latter productionalized code set includes the process for gathering data on how your users interact with your model. This is important from a retraining perspective, because you need a closed-loop architecture to enable that functionality.
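As a rough illustration of what that closed loop looks like inside the scoring script, here is a sketch that logs every request and prediction for later retraining. The append-to-a-JSON-lines-log approach, the file path, and the model artifact name are my assumptions, not the notebook's exact code (Azure ML also offers a built-in model data collector you could substitute):

```python
# score.py (sketch) - log inputs and predictions so production traffic
# can feed retraining. The log path is an illustrative choice; in AKS you
# would mount or stream this to durable storage instead.
import datetime
import json

import joblib

model = None
LOG_PATH = "/var/log/model/scoring_log.jsonl"  # hypothetical mount point


def init():
    global model
    model = joblib.load("model.pkl")  # assumed model artifact name


def run(raw_data):
    data = json.loads(raw_data)["data"]
    preds = model.predict(data).tolist()

    # Closed loop: persist what the model saw and what it answered.
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps({
            "ts": datetime.datetime.utcnow().isoformat(),
            "inputs": data,
            "predictions": preds,
        }) + "\n")

    return json.dumps({"prediction": preds})
```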

Using Azure to Retrain Production Models Deployed Using Kubernetes

As a side note, Azure Databricks is a managed Spark offering on Azure that customers already use for advanced analytics. It provides a collaborative notebook-based environment with CPU- or GPU-based compute clusters, much like (if not identical to) the Jupyter notebook shown above. I often get asked about DevOps specifically for Databricks in addition to Azure ML, so I wanted to explicitly note that you can use either or both.

In this section, you will find additional information on how to use the Azure Machine Learning SDK with Azure Databricks. You can train a model using Spark MLlib and then deploy it to ACI/AKS from within Azure Databricks, just like the example above. In addition, you can use the Automated ML capability (public preview) of the Azure ML SDK with or without Azure Databricks. A natural cloud convergence between ML/AI service offerings is currently underway across all vendors. This enables things like:

  • Customers who use Azure Databricks for advanced analytics can now use the same cluster to run experiments with or without automated machine learning.
  • You can keep the data within the same cluster.
  • You can leverage the local worker nodes with autoscale and auto termination capabilities.
  • You can use multiple cores of your Azure Databricks cluster to perform simultaneous training.
  • You can further tune the model generated by automated machine learning if you choose to.
  • Every run (including the best run) is available as a pipeline, which you can tune further if needed.
  • The model trained using Azure Databricks can be registered in your Azure ML workspace and then deployed to Azure managed compute (ACI or AKS) using the Azure Machine Learning SDK, as shown in the sketch below.
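To ground that last bullet, here is a minimal sketch using the Azure ML SDK (the v1-style API). The workspace config, model path, service name, and curated environment are placeholders for illustration:

```python
# Sketch: register a model trained on Databricks and deploy it to ACI
# with the Azure ML SDK (v1). Names and paths are illustrative.
from azureml.core import Environment, Model, Workspace
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()  # reads config.json for your workspace

# Register the serialized model exported from the Databricks training run.
model = Model.register(workspace=ws,
                       model_path="outputs/anomaly_model.pkl",
                       model_name="anomaly-detector")

inference_config = InferenceConfig(
    entry_script="score.py",
    environment=Environment.get(ws, "AzureML-Minimal"))

# One core / 1 GB suits a dev-test endpoint; use AKS for production scale.
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(ws, "anomaly-svc", [model],
                       inference_config, aci_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```

Swapping the ACI configuration for an AKS one is the main change needed to move the same model to cluster-scale serving.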

Integrating Databricks with Azure DW, Cosmos DB & Azure SQL (part 1 of 2)

I tweeted a data flow earlier today that walks through an end-to-end ML scenario using the new Databricks on Azure service (currently in preview). It also includes the orchestration pattern for ETL (populating tables, transforming data, loading into Azure DW, etc.), as well as SparkML model creation, with the model stored in Cosmos DB along with the recommendations output. Here is a refresher:

Some nuances that are really helpful to understand:

  • Reading data in as CSV but writing results as Parquet. That Parquet file is then the input for populating a SQL DB table as well as the normalized DIM table in SQL DW, both by the same name (see the sketch after this list).
  • Selecting the latest Databricks on Azure runtime version (4.0 as of 2/10/18).
  • Using #ADLS (Data Lake Storage, my preference) and/or Blob storage.
  • Azure #ADFv2 (Data Factory v2) makes it incredibly easy to orchestrate data movement from third-party clouds like S3, or from on-premises data sources in a hybrid scenario, into Azure, with the scheduling / tumbling windows one needs for effective data pipelines in the cloud.
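Here is roughly what the CSV-in / Parquet-out step looks like in a Databricks notebook. The mount points, table name, and transformation are illustrative stand-ins, not the exact code from the tweeted flow:

```python
# Databricks notebook sketch: land raw CSV, persist curated results as Parquet.
# `spark` is the session a Databricks notebook provides automatically.
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/mnt/adls/raw/sensor_readings.csv"))

curated = df.dropna()  # stand-in for the real transformations

# Parquet keeps the schema and compresses well; downstream, the same file
# populates both the SQL DB table and the DIM table in SQL DW.
curated.write.mode("overwrite").parquet("/mnt/adls/curated/sensor_readings")
```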

I love how easy it is to connect BI tools as well. Power BI Desktop can connect to any ODBC data source, and specifically to your Databricks clusters, by using the Databricks ODBC driver. Power BI Service is a fully managed web application running in Azure; as of November 2017, it only supports Spark running on HDInsight. However, you can create a report with Power BI Desktop and upload it to the Power BI service.
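For the curious, querying the cluster from Python over that same ODBC driver looks like the sketch below. The DSN name and table are assumptions; you would first configure the DSN with the Databricks (Simba Spark) ODBC driver and your cluster's host and token:

```python
# Sketch: query a Databricks cluster over ODBC, the same driver Power BI
# Desktop uses. Assumes a DSN named "Databricks" is already configured.
import pyodbc

conn = pyodbc.connect("DSN=Databricks", autocommit=True)
cursor = conn.cursor()
for row in cursor.execute("SELECT * FROM sensor_readings LIMIT 10"):
    print(row)
conn.close()
```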

The next post will cover using @databricks on @Azure with #EventHubs!

KPIs in Retail & Store Analytics

I like this post. While I added some KPIs to their list, I think it is a good list to get retailers on the right path…

KPIs in Retail and Store Analytics (a continuation of a post by Abhinav on kpisrus.wordpress.com):
A) If it is a classic brick and mortar retailer:

Retail / Merchandising KPIs:

-Average Time on Shelf

-Item Staleness

-Shrinkage % (includes things like spoilage, shoplifting/theft and damaged merchandise)

Marketing KPIs:

-Coupon Breakage and Efficacy (which coupons drive desired purchase behavior vs. detract)

-Net Promoter Score (“How likely are you to recommend xx company to a friend or family member” – this is typically measured during customer satisfaction surveys and depending on your organization, it may fall under Customer Ops or Marketing departments in terms of responsibility).

-Number of trips (in person) vs. e-commerce site visits per month (tells you if your website is more effective than your physical store at generating shopping interest)

B) If it is an e-retailer:

Marketing KPIs:

-Shopping Cart Abandonment %

-Page with the Highest Abandonment

-Dwell time per page (indicates interest)

-Clickstream path for purchasers (as Jamie mentioned, do they arrive via email, promotion, or a flash-sale source like Groupon, and if so, what clickstream paths do they take? This should look like an upside-down funnel: the visitors / unique users who enter your site are at the top, followed by the various paths (pages) they view en route to a purchase).

-Clickstream path for visitors (take Expedia for example: many people use it as a travel search engine but then jump off the site to buy directly from the travel vendor; understanding this behavior can help you monetize the value of the content you provide as an alternate source of revenue).

-Visit to Buy %

-If direct email marketing is part of your strategy, analyzing click rate is a close second to measuring conversion rate. They are two different KPIs, one the king, the other the queen, and both are necessary to understand how effective your email campaign was and whether it warranted the associated campaign cost.

Site Operations KPIs / Marketing KPIs:

-Error % Overall

-Error % by Page (this is highly correlated with the pages that have the highest abandonment, which means you can fix the underlying cause of the error and have a direct path to measuring the success of the change).

Financial KPIs:

-Average order size per transaction

-Average sales per transaction

-Average number of items per transaction

-Average profit per transaction

-Return on capital invested

-Margin %

-Markup % (not the same as Margin %; a quick illustration follows this list)
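Since margin and markup are routinely confused, here is the arithmetic side by side; the numbers are made up for illustration:

```python
# Margin vs. markup on a single illustrative item.
price, cost = 25.00, 20.00
gross_profit = price - cost          # 5.00

margin = gross_profit / price        # 0.20 -> 20% of the selling price
markup = gross_profit / cost         # 0.25 -> 25% on top of the cost

print(f"Margin: {margin:.0%}, Markup: {markup:.0%}")
```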

I hope this helps. Let me know if you have any questions.

You can reach me at mailto:lauraedell@me.com, or you can visit my blog, where I have many posts listing KPIs by industry and how best to aggregate them for reporting and executive presentation purposes ( http://www.lauraedell.com ).

It was very likely that I would write on KPIs in retail and store analytics after my last post on marketing and customer analytics. The main motive behind retailers looking into BI is the customer: how they can quickly react to changes in customer demand (or better, predict it), remove wasteful spending through targeted marketing, exceed customer expectations, and hence improve customer retention.

I did some quick research on what companies have been using as measures of performance in the retail industry and compiled a list of KPIs that I would recommend for consideration.

Customer Analytics

With the customer being the key for this industry, it is important to segment customers, especially for strategic campaigns, and to develop relationships for maximum customer retention. Understanding customer requirements and dealing with ever-changing market conditions is key for a retailer to survive the competition.

  • Average order size per transaction
  • Average sales per transaction


MicroStrategy World 2012 – Miami

Our internal SKO (sales kickoff) meeting was the beginning of this year’s MSTR World conference (held in Miami, FL at the Intercontinental Hotel on Chopin Plaza). As with every year, the kickoff meeting is the preliminary gathering of the salesforce, an effort to “rah-rah” the troops who work the front lines around the world (myself included).

What I find most intriguing is the fact that MicroStrategy is materializing, for BI, all of those pipe dreams we ALL have. You know the ones I mean: I didn’t buy socialintelligence.co for my health several years ago. It was because I saw the vision of a future where business intelligence and social networking were married. Or take cloud intelligence, aka BI in the cloud. Looking back to 2008, I remember my soapbox discussion of BI mashups, a la My Google, supported in a drag-and-drop, off-premises environment. And everyone hollered that I was too visionary, or too far ahead; that everyone wanted reporting, and if I was lucky, maybe even dashboards.

But the acceleration continued, whether adoption grew or not.

Then I pushed the envelope again: I wanted to take my previous thought of the mashup and morph it into an app integrated with BI tools. Write-back to transactional systems or web services was key.

What is a dashboard without the ability, within the same interface, to take action? Everyone talks about actionable metrics/KPIs. Well, I will tell you that a KPI, BY DEFINITION, is actionable.

But making your end users go to a separate ERP or CRM to make the changes necessary to affect a KPI will drive your users away. What benefit can you offer them in that instance? Going to a dashboard or an Excel sheet is no different: it is one application to view, and if they are lucky, to analyze their data. If they were using Excel before, they will still be using Excel, especially if your dashboard isn’t useful to day-to-day operations.

Why? They still have to go to a second application to take action.

Instead, integrate them into one.

Your dashboard will become meaningful and useful to the larger audience of users.
Pipe dream, right?

NO. I have proved this out many times now and it works.

Back in 2007-2008, it was merely a theory I pontificated with you, my dear readers.

Since then, I have proved it out several times over and demonstrated the success that can be achieved by taking that next step with your BI platform.

Folks, if you haven’t done it, do it. Don’t waste any more time. It took me less than 3 days to write the web-services code to consume the Salesforce APIs, including Chatter (business “Twitter,” according to SFDC), into my BI dashboard (a mobile dashboard, in fact).
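For a sense of how small that lift is, the integration amounts to little more than authenticated REST calls like the sketch below (shown in Python rather than the web-services stack I used then). The instance URL, API version, and token handling are placeholders; in practice the token comes from an OAuth flow:

```python
# Sketch: pull open opportunities from the Salesforce REST API, the same
# data the dashboard surfaced. Org URL and token below are placeholders.
import requests

INSTANCE = "https://yourorg.my.salesforce.com"   # hypothetical org URL
TOKEN = "<access-token-from-oauth>"

soql = "SELECT Name, Amount, StageName FROM Opportunity WHERE IsClosed = false"
resp = requests.get(
    f"{INSTANCE}/services/data/v52.0/query",
    params={"q": soql},
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()

for record in resp.json()["records"]:
    print(record["Name"], record["StageName"], record["Amount"])
```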

And suddenly, a sales dashboard becomes relevant. No longer does the sales team have to view their opportunities and quota achievement in one place, only to leave or open a new browser to access their salesforce.com portal in order to update where they are mid-quarter.

But wait: now they have forgotten which KPIs they need to add comments to, because the red ones were on the dashboard that is now closed, and their sales GM is screaming at them on the phone. Oh wait, they are on the road while this is happening, their iPad data plan has expired, and no wireless connection is found.

What do you do?

Integrating salesforce.com into their dashboard eliminates at least one step (opening a new browser) in the process. Offline mobile transactions are a new feature of MicroStrategy’s mobile application, allowing those sales folks to make the comments they need to make while offline, on the road, with the changes queued until they are online again.

One stop, one dashboard to access and take action through, even when offline, using their mobile (Android, iPad/iPhone, or BlackBerry) device.

This is why I’m excited to see MicroStrategy pushing the envelope on mobile BI futures.