<h1 id="data-literacy-workshops">Data Literacy Workshops</h1>
<p>I am hosting a series of Data Literacy Workshops for the MTC.</p>
<p><em>Data Literacy</em> is a fundamental skill for the Digital Age. While companies have lots of data, very few folks know how to leverage data to be an <em>insights-driven</em> company. At the MTC, we want to change that. We created a workshop series for December to help evangelize data fluency.</p>
<p>Feel free to sign up for one, or all, of the events.</p>
<p>The Microsoft Technology Center (MTC) delivers immersive engagements focused on business outcomes. This workshop is hosted by <a href="https://www.linkedin.com/in/dwentzel">Dave Wentzel</a>, the Philadelphia MTC’s resident Decision Architect. Dave has been working with all things data for more than 25 years. Many technologists have “know-how”, but few have “know-what” and “know-why”. Dave’s concentration is in data science, prescriptive analytics, and data literacy.</p>
<blockquote>
<p>“We have a data lake, a data warehouse, and Power BI…but still no one has been able to show us how to leverage our data assets to provide actionable intelligence.” – <em>heard during a recent MTC engagement</em></p>
</blockquote>
<p>Sessions:</p>
<ul>
<li>Thinking Like a Data Scientist</li>
<li>Prescriptive Analytics: The Future of Data in the Enterprise</li>
<li>How to NOT make analytical cognitive mistakes</li>
<li>Design Thinking as a way to avoid data project failures</li>
<li>Building a Data CoE: Analytics Best Practices</li>
</ul>
<h1 id="rsvp-today">RSVP Today!</h1>
<p><a href="https://davew-msft.github.io/">Please RSVP</a>. We promise we won’t spam you. <strong>A Microsoft Teams meeting invite will be sent to you a few days before the event.</strong> We are not yet sure we will record these sessions.</p>
<hr />
<p>Are you convinced your data or cloud project will be a success?</p>
<p>Most companies aren’t. I have lots of experience with these projects. I speak at conferences, host hackathon events, and am a prolific open source contributor. I love helping companies with data problems. If that sounds like someone you can trust, contact me.</p>
<p>Thanks for reading. If you found this interesting please <a href="https://davewentzel.com/blog/feed/">subscribe to my blog</a>.</p>
<h3 class="t60" id="related-posts">Related Posts</h3>
<ul class="side-nav">
</ul>
<h1 id="software-implementation-decision-calculus">Software Implementation Decision Calculus</h1>
<p>Here is the decision matrix I use when evaluating software purchasing, implementation, and deployment decisions.</p>
<p>As a Microsoft Technology Center technical evangelist, and former CTO, I get asked every day:</p>
<ul>
<li>What are your opinions on software package x?</li>
<li>We need a solution for ETL. Should we purchase a commercial software package or deploy some open-source project?</li>
<li>Should we use a first-party (native) Azure service, or the underlying open source code the Azure service is built on?</li>
</ul>
<p>Basically, folks are asking me for my unbiased opinions on software purchasing and deployment decisions based on my experiences. I’m going to assume you’ve already determined the most critical decision points:</p>
<ul>
<li>What are the requirements of your project? You certainly wouldn’t purchase an ETL tool if your requirement was a self-service data lake query tool.</li>
<li>What are the existing skill sets within your organization? If you are looking for a self-service analytics tool and your analysts have no knowledge of SQL, then you might want to start with a graphical querying tool.</li>
<li>What is your timeline, budget, etc.? If you have zero available budget for software, you certainly wouldn’t want to purchase a COTS solution.</li>
</ul>
<blockquote>
<p>Disclosures: I am a Microsoft employee. I tend to recommend first-party native Azure services for most data and analytics needs, but not because I’m paid to; our services are really pretty good. I pride myself on always giving my unvarnished opinions, and it has gotten me into trouble at times. I’m a technologist who prefers writing a few lines of python versus using a no-code/low-code tool, because I value long-term flexibility over the perceived short-term velocity gains of these tools. I prefer writing a little SQL over having a point-and-click query writer.</p>
</blockquote>
<p>So, let’s assume you know what type of software you need and you are evaluating what is available on the market. For every package and vendor, here are the key decision points I believe you should consider:</p>
<ul>
<li>Does the software have an open-source or commercial license?</li>
<li>Who are the package’s reference customers?</li>
<li>Community: Does the software have a presence on StackOverflow and what is the state of the user community? Is there an open user forum?</li>
<li>Does the software run as a PaaS, SaaS, or containerized offering? Or will I need to deploy the solution on VMs and deal with backups, patching, security, etc?</li>
<li>What do Gartner and the other research advisories think?</li>
<li>What are the development paradigms? Is it customizable? Is there an API? Can I <em>shell out</em> to run a custom script? What is the learning curve?</li>
<li>Do a rapid prototype. Was your team comfortable using the package?</li>
</ul>
<p>Let’s dive deeper into each of these questions.</p>
<h2 id="licensing-modelsopen-source">Licensing Models/Open Source</h2>
<p>Is the software open source (OSS)? If it isn’t, I strongly consider looking elsewhere. Although I rarely look at the source code, sometimes you have to, such as when you come across a bug and the vendor’s on-call support is weak. Grokking source code scares a lot of technologists. It should not. Many times I have coded a simple solution that throws errors because the software does not appear to work as the documentation suggests. Looking at the source code is a quick way to confirm my assumptions.</p>
<blockquote>
<p>Be careful! Not all OSS licenses are created equally. If you are an ISV it is critical you understand the OSS license of all software you deploy. Some OSS licenses will require YOU to also open source ALL of your IP as part of their terms. I’ve written about <a href="https://davewentzel.com/content/open-source-licensing/">these concerns</a>.</p>
</blockquote>
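<p>If your stack is Python-based, one quick first pass at understanding the license obligations you are inheriting is to read the license metadata your installed dependencies declare. Below is a minimal sketch using only the standard library; treat its output as a starting point for a proper license review, not as legal advice.</p>

```python
# Minimal sketch: list the licenses declared by installed Python packages.
# Standard library only; packages that declare nothing show "not declared".
from importlib import metadata

def declared_licenses():
    for dist in metadata.distributions():
        name = dist.metadata.get("Name", "unknown")
        license_field = dist.metadata.get("License") or "not declared"
        # Trove classifiers often carry the license when the License field is empty.
        classifiers = [c for c in dist.metadata.get_all("Classifier", [])
                       if c.startswith("License ::")]
        yield name, license_field, classifiers

if __name__ == "__main__":
    for name, license_field, classifiers in sorted(declared_licenses()):
        print(f"{name:30} {license_field:25} {'; '.join(classifiers)}")
```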
<p>Some organizations would rather not deal with “self-supporting” OSS and would rather have a software vendor with commercial licenses that can be called at 2am when emergency issues arise (think Microsoft SQL Server). If this is your organization’s culture then certainly stick with commercially-licensed software. There’s even a hybrid model that is the best of both worlds you may want to consider: some companies will provide commercial support for OSS offerings. An example: Microsoft provides commercially-supported and cloud-hosted versions of many OSS offerings…including <a href="https://azure.microsoft.com/en-us/services/postgresql/">PostgreSQL</a>. PostgreSQL is a well-known and well-liked OSS database, but commercial support offerings are difficult to find.</p>
<h2 id="reference-customers">Reference Customers</h2>
<p>ISVs will always provide a list of reference customers on their website, usually a little banner of their biggest customers. Here’s one I just scraped off an ISV’s website (I won’t disclose <em>who</em>):</p>
<p><img align="center" width="500" src="https://davewentzel.com/images/software02.png" /></p>
<p>What value am I getting from this data?</p>
<p>If these are reference customers then I want to speak to a user at one of these organizations, hopefully in my industry, and have a frank conversation about the software I’m about to purchase. In all my years in IT, I have never once had an ISV allow me to speak directly to a reference customer.</p>
<p>I wonder why.</p>
<p>I don’t want to read a testimonial or see a quote. I want to speak to an actual user and ask questions about their satisfaction, implementation hurdles, on-call support nightmares, etc.</p>
<blockquote>
<p>While I’m surfing a vendor’s website I also like to look for spelling mistakes and similar issues that make me think the vendor does not value quality and user experience.</p>
</blockquote>
<h2 id="stackoverflow">Stackoverflow</h2>
<p>I always research software I’m not familiar with on <a href="https://stackoverflow.com">stackoverflow</a>. SO is the one website that most technologists can’t live without. If a software package has a weak presence on SO then I won’t deploy that tool. Period.</p>
<p><img align="right" width="300" src="https://davewentzel.com/images/so.png" />
I like to spend at least an hour reading the top entries on SO for the tool. What is the quality of the questions? What are the recurring questions? Are users generally satisfied, or are there a lot of comments that indicate the software is functionally incomplete?</p>
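<p>One rough way to quantify a tool’s Stack Overflow footprint is the public Stack Exchange API. Here is a minimal sketch; it assumes the <code class="highlighter-rouge">requests</code> package is installed and that the tool you are evaluating has its own SO tag (the tag below is just a placeholder).</p>

```python
# Rough sketch: gauge a tool's Stack Overflow footprint via the public
# Stack Exchange API (light, anonymous use does not require an API key).
import requests

API = "https://api.stackexchange.com/2.3"

def tag_question_count(tag: str) -> int:
    """Total number of questions carrying the tag."""
    r = requests.get(f"{API}/tags/{tag}/info", params={"site": "stackoverflow"})
    r.raise_for_status()
    items = r.json().get("items", [])
    return items[0]["count"] if items else 0

def recent_questions(tag: str, pagesize: int = 5) -> list:
    """Titles of the most recently active questions for the tag."""
    r = requests.get(f"{API}/questions", params={
        "site": "stackoverflow", "tagged": tag,
        "sort": "activity", "order": "desc", "pagesize": pagesize})
    r.raise_for_status()
    return [q["title"] for q in r.json().get("items", [])]

if __name__ == "__main__":
    tag = "apache-spark"   # placeholder: use the tag for the tool you are evaluating
    print(tag, "question count:", tag_question_count(tag))
    for title in recent_questions(tag):
        print(" -", title)
```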
<blockquote>
<p>Thanks to stackoverflow I haven’t written an original line of code in at least 5 years. I copy/paste all of my code from stackoverflow. My guess is most developers do too!</p>
</blockquote>
<h2 id="community-interest">Community Interest</h2>
<p>Does the software package have a thriving user community? Does the vendor host user groups, sponsor meetups, or speak at industry conferences? Years ago you could gauge community interest by looking at the forums on the vendor’s website. I’m less interested in these forums since most discourse is occurring on stackoverflow these days. However, if I see a forum is “closed”, requires a subscription or paid support, or is heavily moderated…it raises my suspicions. Why should I pay a fee to have access to a forum where the users are primarily solving problems I have with the software?</p>
<p>One benefit of forums is that participants tend to be more honest about their true feelings on the software, and especially on the support team (or lack thereof). This discourse is often eye-opening. Not all vendors will allow this much openness. <em>Be wary of paywalled support forums.</em></p>
<h2 id="paas-saas-containerization">PaaS, SaaS, containerization</h2>
<p>In today’s world I don’t want to deploy software that requires my team to manage infrastructure. Minimally I expect the software to run in a container that I can customize and scale on Kubernetes. I don’t want to deal with backups and VM patching. I always prefer first and third party Azure offerings for these reasons. I want to focus on solving business problems, not operations problems.</p>
<h2 id="research-advisories-opinions">Research Advisories’ Opinions</h2>
<p><img align="right" width="300" src="https://davewentzel.com/images/gartner.jpg" />It’s always worthwhile to see what Gartner, Forrester, and the other industry analysts think of a software vendor and where it ranks versus its peers. Minimally, you will get a list of alternative software packages and vendors to research.</p>
<blockquote>
<p>To be honest, I don’t put much faith in industry analyst opinions. In most cases the analysts have never actually used the software and are generating their opinions based on interview data and feedback from large corporate customers. Many of these big customers are, in fact, defending their purchasing decisions instead of objectively evaluating the software under review. And then of course there are the uncomfortable relationships between vendors and analysts. It’s not always clear who is funding a research report.</p>
</blockquote>
<h2 id="development-concerns">Development Concerns</h2>
<ul>
<li>Customizability: When I hit a limitation with the tool, how easy is it to “shell out” and build something custom? Can I do that with python or C# or am I locked in to the vendor’s bespoke scripting language which I now must learn?</li>
<li>APIs and scriptability: Can the software be operated via automation? If this software is going to participate in your DevOps pipelines it needs to be scriptable (see the sketch after this list).</li>
<li>Training: Frankly, if I need training to use the software, I’m immediately concerned. I don’t need my team taking even more training. And, honestly, most developers don’t want to take more “tool-based” training. Tool-based knowledge is not aspirational for most developers. Most developers don’t want to learn another language, they want to learn another <em>domain</em>. Domain-based knowledge is what most technologists really want. Example: Most .NET developers don’t want to learn python, they want to learn what differentiates python from .NET, why it is the <em>lingua franca</em> of data science, and how they can use domain-based data science training to become better .NET developers.</li>
<li>What is more important: Fit for purpose vs fit for use? It’s the former. If I have a team of python developers, I don’t want to force a low-code/no-code solution onto them. It will feel constraining…like a toy, not a productivity tool.</li>
</ul>
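<p>To make the “APIs and scriptability” test concrete, here is the kind of automation I expect a tool to support from a DevOps pipeline: trigger a job, poll for completion, and fail the build if the job fails. The endpoint, routes, token, and job name below are entirely hypothetical; the point is that something equivalent should be possible without touching a GUI.</p>

```python
# Hypothetical sketch: drive an ETL tool through its REST API from a pipeline.
# The base URL, routes, and payload fields are placeholders, not a real product's API.
import os
import time

import requests

BASE_URL = "https://etl.example.internal/api/v1"      # hypothetical endpoint
TOKEN = os.environ.get("ETL_API_TOKEN", "")           # injected by your secrets store

def run_job(job_id: str, timeout_s: int = 1800) -> str:
    headers = {"Authorization": f"Bearer {TOKEN}"}
    run = requests.post(f"{BASE_URL}/jobs/{job_id}/runs", headers=headers)
    run.raise_for_status()
    run_id = run.json()["run_id"]

    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = requests.get(f"{BASE_URL}/runs/{run_id}", headers=headers)
        status.raise_for_status()
        state = status.json()["state"]
        if state in ("SUCCEEDED", "FAILED"):
            return state
        time.sleep(30)                                 # poll every 30 seconds
    raise TimeoutError(f"run {run_id} did not finish within {timeout_s}s")

if __name__ == "__main__":
    # Fail the pipeline if the job fails -- that is the whole point of scriptability.
    assert run_job("nightly-load") == "SUCCEEDED"
```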
<blockquote>
<p>I see this a lot…salesmen present the wrong tools to development teams. Don’t do this. Understand your audience.</p>
</blockquote>
<p>Most developers just want to know if the tool is going to add more friction to their day.</p>
<h2 id="do-a-rapid-prototypebake-off">Do a Rapid Prototype/Bake-Off</h2>
<p>Most software vendors want to send you to free training. Training is great, but it is not a substitute for hands-on experience. <em>Demand a rapid prototype.</em> Use the business problem you want to solve with this new software as the use case to do a 3-day rapid prototype. You won’t solve the business problem in 3 days, but you want your team to walk away with a better idea of “a day in the life” of using this software. Is the software cumbersome? Does it handle your team’s development concerns? What are the rough edges? A few pointers on rapid prototypes:</p>
<ul>
<li>Don’t waste time installing or setting up the software. This is a “once and done” activity and is not something that is done daily.</li>
<li>Let the vendor know this is a rapid prototype and they are being evaluated on how well the software fits the needs of your team. This isn’t training. You are expecting results.</li>
<li>Tell the vendor you intend to run similar bake-offs with their competitors. The vendor likely knows the rough edges of their competition.</li>
<li>Conduct a retrospective after the RP. Is this a tool your team <em>wants</em> to use? Did it solve the problem it claims to solve?</li>
</ul>
<h2 id="how-the-microsoft-technology-center-can-help">How the Microsoft Technology Center can help</h2>
<p>The Microsoft Technology Center helps customers make technology decisions. The MTC mandate is to be the Trusted Advisor for our customers. We do that by showing how data can add business value. The tech is easy; what’s hard is understanding the processes that work. We never make technology recommendations until we understand your business problem and the culture and capabilities of your team. (I’ve been told this is refreshing.)</p>
<p>I recommend to every customer that they run a small-scale Rapid Prototype <img align="right" width="250" src="https://davewentzel.com/images/mtc1.jpg" />before making <em>any</em> technology decisions. Training is great, but there is no replacement for hands-on experience. At the MTC we advocate 3-day rapid prototypes. We know we can’t solve your business problem in a few days, but we want you to walk away with a better understanding of our best practices and processes, as well as how our software will fit with your team.</p>
<p>MTC architects are seasoned technology veterans, consultants, and former executives. We’ve used our tools for years, we’ve worked with our product teams to make them better, and we can evaluate which tools will work best for your unique team. We also know the tool is less important than understanding the underlying domain-knowledge, patterns, and processes that lead to successfully solving complex business problems.</p>
<p>Does that sound like a partner you can trust? Does “3 days” for a rapid prototype sound like a good investment of your time? Before you start your next technology project, contact me on LinkedIn and let me show you a different approach to solving problems.</p>
<h1 id="mtc-data-science-as-a-service">MTC Data Science-as-a-Service</h1>
<p>At the MTC we offer a service for our customers that I call Data Science-as-a-Service. You could also call it Rent-a-Data-Scientist. In this article I will show you some ways to leverage this offering.</p>
<p>I get a little bored this time of year. I’m a data scientist (among other things) for the Microsoft Technology Center. The MTC is a service our customers can leverage to learn about Azure, modern data analytics, data science, and a bevy of other topics. One offering that I particularly love is something I call “Data Science-as-a-Service”, or what I formerly called “Rent-a-Data-Scientist”. The fact is, our customers could always use a little help with data science and analytics. We’re here to help, and here are a few ways we can.</p>
<h2 id="design-thinking">Design Thinking</h2>
<blockquote>
<p>“Does this AI/ML/Data Science use case make sense?”</p>
</blockquote>
<p>Customers ask me this question all the time. There is a lot of hype around what AI can do and the MTC is here to help you. One approach to teasing out a good use case is to have a <a href="https://davewentzel.com/content/DesignThinking/">human-centered design thinking workshop</a>. These sessions are a lot of fun. We work together to solve a business problem by utilizing data science. We take the business problem and have a dialog about how we can improve outcomes by building empathy into the solutions we create. We want you to walk away from these workshops inspired and energized. We help you create a prototype that you can evangelize with your leadership after the session.</p>
<p><img align="right" width="200" src="https://davewentzel.com/images/ds11.png" />When customers experience our DT workshops the common response is, “We didn’t know Microsoft did these things.” Exactly! Data scientists do more than just oil their slide rules and polish their pocket protectors. We actually use <em>science</em> to solve problems in fun ways. Even if you have an army of data scientists it’s likely they could use a little help with DT.</p>
<blockquote>
<p>We can always spot customers who have strong data scientists, but who are not trained in Design Thinking. They will build brilliant solutions to problems that may not exist. The MTC can help. Many customers allow us to host their DT sessions so they can learn our methods. We love to do this! And we’ve actually <a href="https://davewentzel.com/content/ForresterAward/">won awards</a> for our DT sessions.</p>
</blockquote>
<h2 id="we-arent-getting-the-value-we-thought-wed-get-from-our-data-scientists">“We aren’t getting the value we thought we’d get from our data scientists”</h2>
<p>We hear this a lot. Often this is due to the wrong work being assigned to your data scientists. As an example: most younger data scientists are well-versed in software development practices (such as DevOps, git, and scrum), but some are not software developers and struggle when they need to build their own DevOps processes.</p>
<blockquote>
<p>Does it really make sense to have a data scientist doing software development (or data engineering) if she isn’t really good at it?</p>
</blockquote>
<p>The MTC can help you focus your data scientists and ensure they are actually doing <em>data science</em>. Examples:</p>
<ul>
<li>MLOps: This is an off-shoot of DevOps. By automating mundane CI/CD tasks we can allow data <img align="right" width="200" src="https://davewentzel.com/images/devops.png" />scientists to do what they do best…<em>experiment</em>. I have an MLOps workshop that I do with data science teams. It’s about 2 days and we show both the data science team and the Ops team how to work together to show business value. This is my 2nd most popular workshop at the MTC and it’s a lot of fun.</li>
<li>Assist with other aspects of data: Many times data scientists are asked to build data engineering pipelines, perform ETL tasks, or simply acquire data from 3rd parties. Some data scientists are really good at this, and others need a little assistance. We’re here to help.</li>
<li>Learning the latest technologies: Microsoft has <a href="https://azure.microsoft.com/en-us/free/machine-learning">Azure Machine Learning Service</a> that helps data scientists improve their velocity by <img align="right" width="200" src="https://davewentzel.com/images/amls.jpeg" />integrating newer technologies like <code class="highlighter-rouge">automl</code> with monitoring facilities that will detect model drift and data skew. This service (which is basically “enhanced MLFlow”) is invaluable to data scientists that are not trained in software development. The MTC can help you discover the patterns that work.</li>
</ul>
<h2 id="help-us-scale-our-data-science-practice">“Help us scale our data science practice”</h2>
<p>Most of the companies I talk to have a data science team. Often they are swamped, working on extracting value from high priority use cases. Often what is needed is something like “Rent-a-Data-Scientist”. But, throwing bodies at a problem, or outsourcing it, rarely works. What we like to do with customers is take on a riskier project and then add value by creating and building processes that are re-usable and will be beneficial to the rest of the team. Some examples:</p>
<ul>
<li>building processes that deploy inferencing endpoints to Kubernetes</li>
<li>showing “Citizen Data Scientists” how to be more productive using our patterns</li>
<li>building MLOps pipelines (a minimal tracking sketch follows this list)</li>
<li>creating human feedback loops to ensure what we build gets even more valuable over time.</li>
</ul>
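<p>As one small, concrete illustration of MLOps plumbing, here is a minimal experiment-tracking sketch using the open-source MLflow tracking API (which Azure Machine Learning can also act as a backend for). It assumes the <code class="highlighter-rouge">mlflow</code> and <code class="highlighter-rouge">scikit-learn</code> packages are installed; the dataset, parameters, and metric are placeholders, not a recommended model.</p>

```python
# Minimal sketch: record parameters, a metric, and the trained model with MLflow
# so experiments are reproducible instead of living in a notebook on a laptop.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="rf-baseline"):
    params = {"n_estimators": 200, "max_depth": 5}
    mlflow.log_params(params)

    model = RandomForestRegressor(**params, random_state=42).fit(X_train, y_train)
    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5

    mlflow.log_metric("rmse", rmse)
    mlflow.sklearn.log_model(model, "model")   # versioned artifact, ready to deploy
```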
<blockquote>
<p>Companies don’t fail at data science due to a lack of use cases; they fail because they have too many.</p>
</blockquote>
<p>The MTC can help you learn and experiment with <em>technology</em> and <em>processes</em> while you stay focused on core use cases.</p>
<h2 id="help-us-think-like-a-data-scientist">“Help us Think Like a Data Scientist”</h2>
<blockquote>
<p>“We hired smart data scientists, but we don’t know how to get started. Can you help us with a good first use case?”</p>
</blockquote>
<p><img align="right" width="200" src="https://davewentzel.com/images/ds20.png" />This is another mantra we hear from customers. Some organizations have tons of possible use cases (see above), others are struggling to come up with their first use case. Data scientists do more than just statistics and neural network building. While these are important, other <em>soft skills</em> like Design Thinking can prove invaluable, especially when you want to find a compelling use case.</p>
<p>The fact is, it’s very difficult to know if a use case is a <em>good</em> use case and how much time should be invested in it. Data science is not typical software engineering where we can discuss the basics of the product and then go and build it. Data science projects are a series of small experiments and deductive reasoning. We need to constantly evaluate when we are <em>done</em>, which could mean “deploy the model to production” or “let’s scrap this project”. The MTC has the calculus that you can leverage.</p>
<blockquote>
<p>“Don’t let <em>perfect</em> be the enemy of <em>good enough</em>.” We’ve seen TOO MANY data science projects fail because the team feared putting an imperfect model into production and continued to tweak the model in development. The problem is: training data is never the same as real, production data. Deploy the model as soon as you can and then monitor it. “No model is perfect, but many are <em>useful</em>”.</p>
</blockquote>
<p>Even Business Intelligence teams see radical throughput improvements when they begin to <em>think like a data scientist</em>. The time-to-value is shifted to the left and data projects become less risky with a higher success rate.</p>
<blockquote>
<p>Pro tip: At the MTC we’ve learned that agile/scrum rarely works for analytics and data science projects. Instead, consider using <em>Lean</em> management principles (fail-fast, MVPs) to control your initiatives at the program level and then kanban to control the day-to-day activities.</p>
<h2 id="we-dont-have-experience-managing-data-science-projects">“We don’t have experience managing data science projects”</h2>
</blockquote>
<p>As mentioned above, data science projects are not typical software projects. We have experience managing these projects and can help you.</p>
<p>Here’s one way we can help: assisting you with your outsourced analytics projects.</p>
<p>Many of our customers outsource their data science projects. This could be because their in-house talent is swamped or they want to leverage a consultancy to solve a particularly difficult analytics problem.</p>
<p>The problem is most consultancies want to manage these engagements as fixed scope, fixed fee engagements…much like a traditional software project. But, as I’ve said many times already, data projects are not linear, they are iterative. In many cases we have to “start over” because we had a flaw in our deductive reasoning. With a fixed scope SOW it’s in the consultancy’s best interest to deliver a flawed product that meets the SOW, yet provides no value.</p>
<p>Ugh!</p>
<blockquote>
<p>Think about it: how can a consultancy provide a SOW for a data science project that is filled with unknowns? They can’t. The only way this can work is if the contract is structured like a staff augmentation engagement. <a href="https://davewentzel.com/content/OutsourceAnalytics/">I’ve written about this extensively</a>.</p>
</blockquote>
<p><img align="right" width="300" src="https://davewentzel.com/images/ds12.png" />So, you’ve outsourced your project for a fixed scope SOW to a consultancy and now the project is faltering and not providing the value you thought it would. How can the MTC help? We can conduct a Design Thinking workshop to determine the true problem, evaluate possible solutions, and then determine the best solution using “Cost Impact Analysis” (as well as many other tricks). We can then work with your consultants to transition into truly valuable work. We’ve never failed doing these exercises and the consultancies never mind altering their SOWs <em>if</em> they see the value in delivering the <em>right</em> product to a happy customer.</p>
<h2 id="can-you-help-us-get-started-with-a-project">“Can you help us get started with a project?”</h2>
<p>Yes. Here’s one way: let us host your next hackathon!</p>
<p>Hackathons bring diversity of thought. They are like a combination of <em>Design Thinking</em> and <em>Rapid Prototypes</em>. Not only are you solving business problems but your staff is learning new approaches to analytics and new technologies.</p>
<p>Most companies provide time for their techies to run their own hackathons. While this is a good start, if you truly understand how to run a successful hackathon, you know it’s more than just a Saturday where your developers write some code. If hackathons are done correctly they are integral to your sprint and product planning. Hackathons should also be a little competitive. Motivation comes from competition. Gamification is a big piece of that. We can help you create hackathons that drive business outcomes yet feel highly creative and fun, while providing the structure to ensure success.</p>
<blockquote>
<p>We love hackathons. But they are totally underused in Corporate America. If you are a member of a software development scrum team consider replacing your sprint planning sessions with hackathons. Instead of <em>talking</em> about what we want to solve, why not actually <em>solve</em> something?</p>
</blockquote>
<h2 id="can-you-help-us-save-this-failing-data-science-initiative">“Can you help us save this failing data science initiative?”</h2>
<p>Anecdotally, the two biggest reasons data science projects fail are:</p>
<ul>
<li>data acquisition: Ingesting data and preparing it for analytics is difficult. We have patterns that will remove this risk.</li>
<li>feedback loops: Many data scientists struggle with designing feedback loops. A feedback loop is how an AI or ML algorithm truly learns. Designing a good feedback loop is an art and I’ll write about it in a future article. We can help you design and code feedback loops which will make your algorithms BETTER over time. (A minimal sketch follows this list.)</li>
</ul>
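<p>To make the feedback-loop idea a little more concrete, here is a minimal sketch of the plumbing: log every prediction with an ID, record the real outcome when it eventually arrives, and join the two so the next training run learns from production reality. The storage (plain pandas DataFrames) and column names are purely illustrative; in practice these would be tables in your lake.</p>

```python
# Minimal sketch of a feedback loop: predictions are logged with an ID,
# ground-truth outcomes are recorded when known, and the join of the two
# becomes fresh, labeled training data for the next retraining run.
import uuid
from datetime import datetime, timezone

import pandas as pd

prediction_log: list = []   # what the model said
outcome_log: list = []      # what actually happened

def log_prediction(features: dict, score: float) -> str:
    pid = str(uuid.uuid4())
    prediction_log.append({"prediction_id": pid, **features, "score": score,
                           "predicted_at": datetime.now(timezone.utc)})
    return pid

def log_outcome(prediction_id: str, actual: float) -> None:
    outcome_log.append({"prediction_id": prediction_id, "actual": actual,
                        "observed_at": datetime.now(timezone.utc)})

def labeled_training_data() -> pd.DataFrame:
    """Predictions joined to outcomes: the input for the next retraining run."""
    return pd.DataFrame(prediction_log).merge(
        pd.DataFrame(outcome_log), on="prediction_id", how="inner")

# Example: score a customer today, learn the true outcome a week later.
pid = log_prediction({"tenure_months": 14, "monthly_spend": 42.0}, score=0.81)
log_outcome(pid, actual=1.0)
print(labeled_training_data())
```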
<h2 id="can-you-help-us-build-a-data-innovation-lab">“Can you help us build a data innovation lab?”</h2>
<p><img align="right" width="300" src="https://davewentzel.com/images/innovation.jpg" />Yes! We have an entire workshop on this. We think our approach is unique: We don’t just <em>build</em> an innovation lab with you, instead we focus on the processes that are most effective to drive growth. Leveraging the cloud is integral to our approach. Nothing stifles innovation more than having a great business idea that has to wait a month for infrastructure provisioning. We can teach you these processes, you can evaluate them, determine what works for you, and then scale your processes to your entire organization.</p>
<h2 id="we-have-free-time-how-can-the-mtc-help-you">We have free time, how can the MTC help you?</h2>
<blockquote>
<p>“We didn’t know the MTC did that” –a common reaction from our customers</p>
</blockquote>
<p>We, of course, focus heavily on our tech solutions at Microsoft, but we also strive to be the Trusted Advisor for our customers. <img align="right" width="250" src="https://davewentzel.com/images/mtc1.jpg" />MTC architects have actually done this stuff in the real world and we know the patterns and anti-patterns. A lot of what we do at the MTC is driving culture change. Data and analytics have made revolutionary advances over just the last few years; we can help you leverage these processes to become more <em>data-driven</em>.</p>
<p>SUCCESS for the MTC is solving challenging problems for respected companies and their talented staff. Our typical engagements are 1-3 days. We want to show value quickly. Does that sound like a good investment of your time? The Digital Transformation is here, and we know how to help. How can we help you with your data science initiatives?</p>
<h1 id="top-10-data-governance-anti-patterns-for-analytics">Top 10 Data Governance Anti-Patterns for Analytics</h1>
<p>More companies are adopting data lakes and self-service analytics. The problem is that data governance has not kept up with the new technologies and the needs of modern analytics. In this article I’ll show you some ways to modernize your governance strategies to achieve your business objectives.</p>
<p>At the Microsoft Technology Center (MTC) we talk to a lot of data leaders who are struggling to apply their traditional data governance approaches in the Age of Advanced Analytics. Formerly tried-and-true governance approaches are not working anymore and they can’t exactly put their finger on the reasons. Analytics has changed and data governance has not kept pace. Microsoft and the MTC have patterns and processes that can help you modernize your governance to achieve your Digital Transformation goals.</p>
<h2 id="what-is-data-governance">What is Data Governance?</h2>
<p>There are entire books on <em>data governance</em> that you should refer to if you need a refresher. Data governance is the overall management of the availability, usability, integrity, and security of an enterprise’s data assets. Most companies will have a data governance body that sets the rules, defines the procedures, and ensures the various data teams execute on the plan.</p>
<h2 id="the-big-problem-with-data-governance">The Big Problem with Data Governance</h2>
<p>Data has changed radically the past few years and governance best practices have not kept up. Years ago we only had operational systems where the notions of governance were easy and not too controversial. For example: few users had the ability to query the data directly for fear of bringing the system down, therefore the default posture was to DENY access. When we built the first generation of analytical systems (namely, data warehouses) the governance was only slightly more complex. Users now wanted the ability to query the data themselves, which usually involved exporting the data to Excel because we still feared that the user might issue a rogue query and bring down the analytics system. Every company governed these access requests differently based on its culture … some <em>trusted</em> their users and allowed access, others did not.</p>
<p>Today, data sources have exploded and business users are demanding <em>Self-Service Analytics</em>. They want to be able to look at data on their terms and do <em>Exploratory Data Analytics</em> without the need for IT to broker access. Data lakes and lakehouses have proliferated, which further encourages users to be curious about their data assets. With a data lake the users merely wire up their own compute instance (which might be a data warehouse, Spark, Power BI, or Excel) to the data asset and they can do whatever they want. Dashboarding and visual query tools like Power BI are easy to use and more business users are comfortable writing a modicum of SQL, the <em>lingua franca</em> of <em>data</em>.</p>
<p><img align="center" width="600" src="https://davewentzel.com/images/lake.png" /></p>
<p>Data governance teams have not kept pace. They are not accustomed to granting access to data for experimentation. They want to know what the data will be used for. Their default posture is to limit access to data. Certainly it makes sense to prevent business users from having direct query access to operational systems that likely contain PHI and PII, but why keep a business user away from valuable data that can be monetized?</p>
<blockquote>
<p>Too many business people equate “data governance” with being told NO. If that’s your company, you need to start changing that culture NOW. Companies that CANNOT leverage their data assets to make data-driven decisions will soon find themselves marginalized as their competition embraces the promise of Digital Transformation. Data governance should enable innovation, not be its roadblock.</p>
</blockquote>
<h2 id="data-governance-anti-patterns">Data Governance Anti-Patterns</h2>
<p>Here are some common Data Governance Anti-Patterns we see at the MTC, and our recommendations for solutions.</p>
<h3 id="anti-pattern-01--data-governance-as-an-it-function">Anti-Pattern 01: Data governance as an IT function</h3>
<p>Is your data governance team composed solely (or even mostly) of IT personnel? If so, I’ll wager your data governance is not aligned with your business goals. A company’s data is not OWNED by IT; it is owned by the business. IT is merely the steward. Business units should have representation on all governance committees. Data governance is multi-disciplinary and should be an enabler of business value.</p>
<blockquote>
<p>Data governance is not a function of IT. It should be driven by the business in support of company objectives.</p>
</blockquote>
<h3 id="anti-pattern-02--relying-on-a-tools-based-governance-approach">Anti-Pattern 02: Relying on a Tools-based Governance Approach</h3>
<p><img align="right" width="150" src="https://davewentzel.com/images/purview.png" />Too many times I hear “we want to use Azure Purview to help us with tagging, classification, access, and lineage of our data to help us with data governance project”. Azure Purview is great for that. The problem is you told me a TECHNOLOGY problem. When I probe deeper around what the company’s strategic goals are for its data assets I don’t get clear answers. And I never hear: “Well, we talked to our business users and what they would really like to see is…”.</p>
<p>The data governance goals should drive the governance tool choices, the tool should not drive the governance goals.</p>
<blockquote>
<p>Data governance is a journey, not a destination</p>
</blockquote>
<h3 id="anti-pattern-03--putting-piiphisensitive-data-in-the-lake-therefore-necessitating-governance">Anti-Pattern 03: Putting PII/PHI/sensitive data in the lake (therefore necessitating governance)</h3>
<p>Data lakes are the best and most-often used tools for analytics. But I see technologists copying EVERY piece of operational data to the lake. Why? I can see no good reason to put credit card numbers or SSNs in a data lake. They serve no analytical purpose. The more copies you make of sensitive data, the more your risk of a breach increases.</p>
<p>I’ve heard data leaders say they need some sensitive data in the lake so the lake can be “the single source of truth” and support operational reporting that requires this data. If that’s the case then you have 2 choices:</p>
<ul>
<li>Make a copy of the sensitive data into the lake and risk a breach. Then you’ll need to determine how to mask the data and how to ensure every analytics tool is honoring the masking. This isn’t easy. It takes a lot of work and introduces too much risk, especially if a business goal is Self-Service Analytics, which implicitly means a more “open” approach to data.</li>
<li>Don’t copy the sensitive data, and find another way to support those reporting needs that require sensitive data. This is a much better approach (see the ingestion sketch after this list).
<ul>
<li>Use a <em>data mesh</em> approach where my reporting tool can call an API to retrieve sensitive data for reports that require it.</li>
<li>Have a separate data structure with separate reports for sensitive data.</li>
</ul>
</li>
</ul>
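<p>One way to act on “don’t copy the sensitive data”: strip or pseudonymize sensitive columns during ingestion so raw PII never lands in the lake at all. Here is a minimal pandas sketch with illustrative column names; a keyed hash (HMAC) preserves joinability without exposing the underlying value.</p>

```python
# Minimal sketch: scrub sensitive columns during ingestion so raw PII never
# lands in the data lake. Column names are illustrative; the hash key should
# come from a proper secrets store, never from source code.
import hashlib
import hmac
import os

import pandas as pd

HASH_KEY = os.environ.get("PII_HASH_KEY", "dev-only-key").encode()

DROP_COLUMNS = ["credit_card_number"]        # no analytical value: drop outright
PSEUDONYMIZE_COLUMNS = ["ssn", "email"]      # keep joinability, hide the raw value

def pseudonymize(value: str) -> str:
    """Keyed hash: stable for joins, not reversible without the key."""
    return hmac.new(HASH_KEY, value.encode(), hashlib.sha256).hexdigest()

def scrub_for_lake(df: pd.DataFrame) -> pd.DataFrame:
    out = df.drop(columns=[c for c in DROP_COLUMNS if c in df.columns])
    for col in PSEUDONYMIZE_COLUMNS:
        if col in out.columns:
            out[col] = out[col].astype(str).map(pseudonymize)
    return out

# Example: what the curated zone would actually receive.
raw = pd.DataFrame([{"customer_id": 1, "ssn": "123-45-6789",
                     "email": "annie@example.com",
                     "credit_card_number": "4111111111111111",
                     "monthly_spend": 42.0}])
print(scrub_for_lake(raw))
```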
<p><img align="right" width="300" src="https://davewentzel.com/images/dg03.png" /></p>
<h3 id="anti-pattern-04--using-the-same-governance-rules-for-all-lake-zones-and-analytics-datasets">Anti-Pattern 04: Using the same governance rules for all lake zones and analytics datasets</h3>
<p>Data lakes (or whatever you use for your analytics sandbox) have “zones”. Everyone calls the zones by different names – “landing”, “raw”, “curated”, “bronze”, “silver”, “gold”. There are 2 key reasons why we have zones. Based on the zone name we should implicitly know:</p>
<ul>
<li>… the quality of the data. “Gold” data should be highly certified, guaranteed-accurate data, whereas “raw” might be a bit dirty, duplicated, or, well, <em>raw</em>. Quality is one component of governance, therefore, we should expect that some zones will have dirty data. That’s ok.</li>
<li>… who should be using it. Gold is likely meant for all self-service users. Therefore it should be modelled and governed accordingly. “Raw” should only be used by data scientists or those personas that understand that this data should not be used to derive quarterly sales data that we give to the SEC.</li>
</ul>
<p>Different areas of the data lake should have different governance approaches.</p>
<blockquote>
<p>Self-service analytics REQUIRES a more “open” approach to data access. There is no other way.</p>
</blockquote>
<h3 id="anti-pattern-05--insisting-on-early-governance">Anti-Pattern 05: Insisting on “Early” Governance</h3>
<p>Sometimes when I help customers with a difficult analytics problem I’ll have someone interrupt me and want to discuss the data governance implications of the outputs of our work. Said differently, they want to control the data we are trying to produce BEFORE WE’VE EVEN PRODUCED IT.</p>
<p>This is the wrong approach.</p>
<p>The better approach is to allow the analytics team to find “the nuggets of gold” first, and solve the business problems. Then, as part of a review process, let’s have a thoughtful conversation about how we should govern our new insights.</p>
<ul>
<li>Is this regulated or sensitive data?</li>
<li>Is this something that we want to provide to our self-service users? Should it be queryable in the gold zone? If so, should this data be on a dashboard? Is it discoverable in the data catalog?</li>
<li>If it is in the catalog do we allow anyone to see the data or do we want to have users make a request so we can track <em>why</em> and <em>how</em> they are using the data? This will help us further understand the governance needs without making knee-jerk decisions at the start of a project before we understand the <em>knowledge</em> our users are creating.</li>
</ul>
<p>I call this “late governance”. We are deferring all governance decisions until we are sure we have generated a valuable business insight. I’ll say it again, data governance should not stifle innovation.</p>
<blockquote>
<p>“Early” data governance is diametrically opposed to self-service analytics, which is a business goal for every customer I talk to.</p>
</blockquote>
<p>Similarly, most analytics projects I work on will require the team to ingest new datasets. Proponents of “early governance” will insist that this new data be controlled, before we’ve even experimented to see if it provides <em>lift</em>. This stifles the experimentation process.</p>
<h3 id="anti-pattern-06--following-a-one-size-fits-all-personas-policy">Anti-Pattern 06: Following a “One-Size-Fits-All-Personas” Policy</h3>
<p><img align="right" width="200" src="https://davewentzel.com/images/frustration.jpg" />Some analytics personas will want access to the most raw, dirty data, so they can spot trends and anomalies. Think data scientists. They will want access to the “raw” or “bronze” area of the data lake. But this data would be overwhelming and misinterpreted by other personas (like business users). Let these other folks see that the data is available (via the catalog), but make them request access and explain why you are asking them to request access to it (because you want to understand <em>why</em> they want access to raw data). Other personas need “self-service” analytics but don’t really have SQL skills. These users should have access to summarized and “certified” data that they can use in Power BI or Excel. These personas would have access to the “gold” and “platinum” zones which would map closely to the semantic tier and data warehouse facts and dims. “Business analyst” personas that understand SQL might need access to more granular, yet “clean”, data. This maps closely to the “silver” or “curated” zones for most data lakes.</p>
<p>This structure allows far more flexibility. We can give everyone access to what we think they need based on their persona and they always have visibility into ALL data and can request access if needed.</p>
<blockquote>
<p>Never force your data scientists to use data warehouse data (facts and dims). It almost never works for their needs. They’ll get frustrated and you will stifle their innovation. Data lakes were originally built to store data that these personas needed to do their jobs. Since then everyone else has seen the efficacy in using the lake for analytics where the data is easier to manipulate.</p>
</blockquote>
<h3 id="anti-pattern-07a--denying-write-access-to-users">Anti-Pattern 07a: Denying write access to users</h3>
<p>One of the goals of a data lake is to be “an analytics sandbox”. This means users will need to make copies of data for experimentation…they’ll need to semantically-enrich that data (usually via SQL) and they’ll need to save copies of it somewhere where they can reference it and continue to build upon it as they search for business value. To do that they need to have write access to the sandbox area of the lake. The sandbox is very much like a temp table in SQL Server.</p>
<blockquote>
<p>Human users, whether business analysts or data engineers, should never have write access to any area of the lake except the sandbox. The remainder of the lake should only be writable by the scheduler/job user. Not even the Ops Team should be allowed to write to the lake. This is a core lake governance concept.</p>
</blockquote>
<p>In the past the DBA <em>never</em> gave write access to the warehouse to common users. This made analytics very difficult since most analysts are not able to do all of the data enrichment they need in a single SQL statement. This is why it was always easier for a user to export the data to Excel and do the analytics there.</p>
<h3 id="anti-pattern-07b--allowing-shared-sandboxes">Anti-Pattern 07b: Allowing Shared Sandboxes</h3>
<p>While we want users to be able to have a writable sandbox, we do NOT want them to have a <em>shareable</em> sandbox. Why?</p>
<p>Imagine your analyst, Annie, found a valuable business insight and has it persisted in her sandbox. She tells the Operations team she would like to move it to the data warehouse so it can eventually be added to a Power BI dashboard. The governance team finds out and wants to hold various meetings to understand the data better. The data warehouse team says they’ll need 3 months to integrate the data into the fact table. The dashboarding team wants to meet with Annie to understand…ugh…isn’t this exhausting? Annie found something valuable and she’s being punished. What does she do? She shares the data in her sandbox with her team…and likely later she shares it with everyone. Now the data is “published” in various Power BI dashboards and has zero governance. That’s not good.</p>
<blockquote>
<p>Make people follow the governance process by not allowing ad hoc sharing. If users complain about <em>stifled innovation</em> then that means your governance process needs an overhaul. FIX IT!</p>
</blockquote>
<p>You’ve probably heard the old aphorism: <strong>Without governance the data lake soon turns into a data swamp</strong>. Not true. If you follow all of the above advice there is NO WAY you’ll ever have a swamp. But make sure your processes aren’t so onerous that you are a drag on innovation.</p>
<blockquote>
<p>How can you spot an organization that struggles with data governance? They have obviously inefficient business processes.</p>
</blockquote>
<h3 id="anti-pattern-08--not-managing-by-exception">Anti-Pattern 08: Not Managing By Exception</h3>
<p>Business users will find all kinds of valid reasons to do things that violate governance rules. It’s OK to make exceptions to your governance plan.</p>
<p>Here’s an example I’ve seen many times: the governance team mandates that only Spark, Synapse, and other “approved” tools can be used to query the data lake. But eventually a department will purchase a query tool they want to use <em>with THEIR data</em>. Too many organizations will disallow this. Why? This really doesn’t make any sense. The compute engine (i.e., the “tool”) is not important; the data is.</p>
<blockquote>
<p>With a data lake, the model should be “Bring Your Own Compute” (BYOC). Use the tool you want to use.</p>
</blockquote>
<p>This is just one example. Governance teams need to be aware of when exceptions to policies need to be made.</p>
<blockquote>
<p>In so many companies I work with the users equate “data governance team” with <strong>NO</strong>. Don’t let this be your company.</p>
</blockquote>
<h3 id="anti-pattern-09--putting-the-permissioning-at-the-compute-layer">Anti-Pattern 09: Putting the Permissioning at the Compute Layer</h3>
<p>Always do the ACL’ing in the data lake. This allows users to BYOC (see above). The tool no longer matters. The user simply logs in and the tool passes the credential to the lake to determine access.</p>
<blockquote>
<p>Never apply the permissions at the compute tier, do all permissioning strictly at the data persistence tier…the lake.</p>
</blockquote>
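<p>As an illustration of what “permissioning at the lake” can look like on Azure, here is a rough sketch that grants a security group read access to the gold zone using POSIX ACLs on ADLS Gen2. It assumes the <code class="highlighter-rouge">azure-storage-file-datalake</code> and <code class="highlighter-rouge">azure-identity</code> packages; the account, filesystem, path, and group object ID are placeholders, and you should confirm the call signatures against the current SDK docs.</p>

```python
# Rough sketch: apply permissions at the storage (lake) layer with POSIX ACLs
# on ADLS Gen2, so any compute engine the user brings inherits the same rules.
# Account, filesystem, path, and the group object ID below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

ACCOUNT_URL = "https://mydatalake.dfs.core.windows.net"            # placeholder account
SELF_SERVICE_GROUP_OID = "00000000-0000-0000-0000-000000000000"    # placeholder AAD group

service = DataLakeServiceClient(account_url=ACCOUNT_URL,
                                credential=DefaultAzureCredential())
filesystem = service.get_file_system_client("lake")
gold = filesystem.get_directory_client("gold/sales")

# Grant the self-service analytics group read+execute on the gold zone,
# recursively, and add a default entry so newly written files inherit it.
acl = (f"group:{SELF_SERVICE_GROUP_OID}:r-x,"
       f"default:group:{SELF_SERVICE_GROUP_OID}:r-x")
gold.update_access_control_recursive(acl=acl)
```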
<h3 id="anti-pattern-10--mandating-all-data-be-in-the-official-corporate-data-lake">Anti-Pattern 10: Mandating All Data Be in the Official Corporate Data Lake</h3>
<p>This never works. Why? For the same reasons departmental data marts sprung up 20 years ago in spite of the corporate data warehouse: The business can’t wait for IT, so it builds its own solutions. We would call these “skunkworks” projects or “Shadow IT”. These projects would solve the immediate business problem, but at the long-term expense of data governance and building more and more data siloes.</p>
<p>Every customer I talk to has multiple data lakes (“ponds?”). Most are structured around business units. My advice is not to stifle this innovation simply to conform to a corporate governance standard. Instead, assist these business units with their governance efforts by providing the tools and templates you are using for the corporate lake. Help them, don’t hinder them.</p>
<blockquote>
<p>Data governance is not a “project”. Data is constantly changing, and so is the data management field. Data governance should be viewed as an on-going corporate “program”.</p>
</blockquote>
<h2 id="how-the-mtc-can-help">How the MTC can help</h2>
<p>Is your data governance team embracing modern analytics notions like:</p>
<ul>
<li>self-service analytics</li>
<li>exploratory data analytics (EDA)</li>
<li>data monetization (using your data assets to drive business goals in non-traditional ways)</li>
<li>prescriptive analytics (using data to augment decision making that formerly was done using gut insights. Example: What should be our next marketing campaign?)</li>
</ul>
<p>Data governance is a bigger problem today than ever. Companies are using data for more novel use cases and regulators are taking notice. But your governance strategy still has to foster innovation. Your company’s view of its data estate and risk tolerance has likely evolved in the last few years. <em>Self-Service Analytics</em> is impossible to achieve with outmoded governance anti-patterns that I’ve outlined above. This is a delicate balancing act that the MTC understands very well. If you want to achieve <em>Digital Transformation</em> then you must realize these are issues of <em>culture</em>.</p>
<p>The Microsoft Technology Center is a service that helps customers on their Digital Transformation journey. We know that <img align="right" width="250" src="https://davewentzel.com/images/mtc1.jpg" />successful data governance efforts are less about the technology and more about modern processes…and people. Data governance is changing to support dual mandates of heightened regulatory burdens and self-service initiatives. At the MTC, we’ve been doing this for years. We are thought leaders, conference speakers, and former consultants and executives. We’ve learned the patterns that will help you transform your governance programs. And with the Azure cloud and our governance technologies like Azure Purview, we can execute in hours-to-days instead of months.</p>
<p>Does this sound compelling? SUCCESS for the MTC is solving challenging problems for respected companies and their talented staff. Does that sound like folks you can trust with your data? The Digital Transformation is here, and we know how to help. Would you like to engage?</p>
<h1 id="the-dashboard-is-dead-probably">The Dashboard is Dead, Probably?</h1>
<p>Some data vendors are claiming <em>the dashboard is dead</em>. Doubtful. But some of the underlying premises are worthy of discussion. You just might be doing it wrong.</p>
<p>There’s a movement by a few data analytics vendors (<a href="https://go.thoughtspot.com/e-book-dashboards-are-dead.html">here’s one</a>) that says, “<em>dashboards are dead</em>.” Most of this is slick marketing with a dose of clickbait (much like the title of this article). Visualizations are a great way to convey information and that likely won’t EVER change. But the promise of dashboards democratizing data for front-line workers and providing self-service analytics is still undelivered. And, as an employee of Microsoft, I fully understand that last sentence might be viewed as heretical. But…</p>
<p>Dashboards alone can’t solve key business problems. At the Microsoft Technology Center (MTC) we have a different approach. Power BI and Tableau won’t be going <a href="https://en.wiktionary.org/wiki/go_the_way_of_the_dodo">the way of the dodo</a> anytime soon, but there are patterns that your analytics teams can leverage to gain <em>information edge</em> today. I do agree that we need to think differently about the role of the dashboard in the analytics landscape.</p>
<h2 id="the-big-problems-with-dashboards-an-allegory">The Big Problems with Dashboards (…an Allegory)</h2>
<p>Tell me if you’ve experienced this story before: <img align="right" width="200" src="https://davewentzel.com/images/dead04.png" /></p>
<p><em>CIO Sherrie</em>: “Andy, I want you to build some dashboards that visually depict our company’s sales KPIs.”</p>
<p><em>Andy the Analytics Director</em>: “Sounds good Sherrie, I’ll put my best people on it right away.”</p>
<p><em>Sherrie</em>: “I’m so looking forward to what your team creates. We need to create actionable insights so our sales team can generate more sales.”</p>
<p><em>One month (or more) later</em></p>
<p><em>BI Developer Bev</em>: “Sherrie, I want to review the dashboards I’ve created for you.”</p>
<p><em>CIO Sherrie</em>: “Great. I’m so glad this is <em>finally</em> ready. Ah, I see you’ve incorporated the corporate colors into your visuals. Nice touch.”</p>
<p><em>Bev</em>: “Let’s start by showing our sales visuals. First, we have a breakdown of sales by region. Note I’ve color-coded the sales by state. Pennsylvania has the highest sales volume, so it’s green.”</p>
<p><em>Sherrie</em>: “Yeah, well, that’s because most of our stores and salespeople are in PA. Nothing too insightful there.”</p>
<p><em>Bev</em>: “Um, well, ok. Here we have a breakdown of sales by month. Note here the sales go up substantially in the Spring and trough in the winter.”</p>
<p><em>Sherrie</em>: “Well, we sell consumer-grade pool supplies. No big surprise that sales increase in the Spring when everyone is anxious to get into their pools and stop buying when cold weather arrives.”</p>
<p><em>Andy</em>: <em>whispering</em>…“Bev, I think you are losing Sherrie, why don’t you click into the sub-report and we’ll explain all of the data.”</p>
<p><em>Bev clicks and opens this dashboard</em>…</p>
<p><img align="center" width="500" src="https://davewentzel.com/images/dead02.jpg" /></p>
<p><em>Sherrie</em>: “Yikes, there is a lot of information displayed there. I really have no idea what this is trying to tell me. What conclusions should I make from this data?”</p>
<p><em>Bev</em>: “Um..uh…well…”</p>
<p><em>Andy</em>: <em>interrupting</em> “Sherrie, I’m so sorry, it seems like we didn’t deliver the dashboards you were hoping for. Could you help us understand exactly what you would like to see?”</p>
<p><em>Sherrie</em>: <em>getting perturbed</em> “I already told you, I want to see dashboards that create actionable insights that will drive sales. These dashboards are telling me what I already know. The dashboards need to guide our sales folks on <em>what to do next</em>.”</p>
<p><em>Bev</em>: <em>whispering to Andy</em> “Andy, I’m not in sales, I don’t know how to create <em>actionable insights</em>.”</p>
<p><em>Sherrie</em>: “…and another thing… I don’t just want to see the sales metrics, I want to know <em>why</em> the numbers are what they are. Look at the example on your screen…for the electronics category…<em>why</em> is the actual lagging the target mix so much? What can we do to fix that? This is what we need to know.”</p>
<p><img align="center" width="400" src="https://davewentzel.com/images/dead03.png" /></p>
<p><em>Andy</em>: “We’ll get right on this. We are so sorry for wasting your time.”</p>
<blockquote>
<p>Truth be told, Sherrie nailed it: this dashboard didn’t tell her anything she didn’t already know. How can we, as data analytics professionals, solve this problem?</p>
</blockquote>
<h2 id="a-quick-history-of-business-intelligence">A Quick History of Business Intelligence</h2>
<p><img align="right" width="300" src="https://davewentzel.com/images/dead05.png" />In the beginning, (think 1970’s-ish) businesses used data mostly for tactical purposes. We created “green bar reports” which consisted of printing the raw data from the database into an output (namely, paper) that was easy to handle and share in an age before everyone had a PC. The data was aggregated by a couple of levels, at most. Data was not time-critical.</p>
<blockquote>
<p>These <em>green bar reports</em> were usually shared and passed around the division, annotated heavily with pen and pencil, and usually so worn from use that the ink would fade. These reports, with their annotations, drove more intelligence than most modern dashboards.</p>
</blockquote>
<p>With the advent of what we now call “pixel perfect reports” in the 1990’s (basically, still green bar reports, but now a little more dynamic and easier to share), businesses could use reports for very specific purposes. “Here’s the report that proves to our auditors that we are in compliance.” Reports were used to manage risk and show facts (like sales) that could be used as the basis for more strategic business decisions. Data still was not time-critical. This data was usually sourced from a data warehouse that was batch loaded nightly.</p>
<p>Visualizations (line graphs and pie charts) were always available in spreadsheet and stats tools but it took some time before the average Citizen Analyst could use them intuitively over larger datasets. Today we can build a visualization over terabytes of data almost instantaneously. <em>But is this providing business value if it is still telling me the same thing those green bar reports did in 1977: that the bulk of our sales comes from Pennsylvania?</em> Have we advanced our business? The only benefits I see are slick new visuals and the ability to dynamically slice-and-dice data. We still aren’t answering Sherrie’s question: “What do we do to improve sales?”</p>
<p>None of this, by the way, is “business intelligence”…this is really just operational reporting with a spiffy front-end. Where is the actual “intelligence”? I did “Business intelligence” as an intern in the 1990s…we applied statistics and regression to create demand curves to determine product sales forecasts and help drive the product development teams. THAT was business intelligence! It answered Sherrie’s question…and that was 1995.</p>
<blockquote>
<p>I posit that many companies still aren’t doing “business intelligence” as I just defined it. Are you? Yet I always see lots of dashboards.</p>
</blockquote>
<p>BI’s meaning has radically changed over the years. Back in the 1990’s (maybe earlier) BI took the form of “What If” analytics, applying basic statistics to your data, and regression curves. We’ve dumbed-down BI over the last 30+ years to be <em>operational reporting</em>. We can do better.</p>
<p>At their inception (mid-2000’s), dashboards provided metrics and KPIs to knowledge workers without requiring coding or query skills. The analytics are done via simple drag-and-drop paradigms. But pie charts and histograms still don’t equate to cost savings or revenue generation. Business leaders, like Sherrie, still expect the dashboards to provide actionable insights.</p>
<h2 id="analytics-maturity-models">Analytics Maturity Models</h2>
<p>You’ve probably seen a variant of this graphic:</p>
<p><img align="center" width="500" src="https://davewentzel.com/images/dead07.png" /></p>
<p>This maturity model is my term but I’m sure Gartner and Forrester have something similar. Most analytics leaders can tell you what the chart means but they can’t give a definitive roadmap on how to move from the lower left to the upper right. Here’s what the different levels of the journey mean:</p>
<ul>
<li>Level 0 <code class="highlighter-rouge">Descriptive</code>: this is rear-view mirror reporting. It is <em>WHAT</em> has happened. Its data source is historical data warehouse-style operational data. This is what green bar and pixel-perfect reports and most dashboards currently display.</li>
</ul>
<blockquote>
<p>“In the business world, the rearview mirror is always clearer than the windshield.”
–Warren Buffett</p>
</blockquote>
<ul>
<li>Level 1 <code class="highlighter-rouge">Diagnostic</code>: this is sometimes lumped together with <code class="highlighter-rouge">descriptive</code>. This analytics style tells you <em>WHY</em> something happened. Most reports and dashboarding tools today handle this level too. When you can drill-down, drill-across, or drill-through the data to determine why last month’s sales dropped, you are doing diagnostic analytics. This is also <code class="highlighter-rouge">root cause analysis</code> and sometimes <code class="highlighter-rouge">data mining</code>.</li>
</ul>
<p>In Levels 0 and 1 there is mathematically always a correct answer to the business problem. For instance, if the goal is to report <code class="highlighter-rouge">Quarter Over Quarter</code> sales then a business person will define the math needed for that calculation given the company’s data assets. Sometimes the math is even defined by industry standards like GAAP. We can prove and test whether our formula works.</p>
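<p>To make the “always a correct answer” point concrete, here is a minimal sketch of a Quarter-over-Quarter calculation in pandas. The table and column names (<code class="highlighter-rouge">order_date</code>, <code class="highlighter-rouge">amount</code>) are hypothetical placeholders, not from any real system:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pandas as pd

# Hypothetical transactions; in practice this comes from your sales database or warehouse.
sales = pd.DataFrame({
    "order_date": pd.to_datetime(["2021-01-15", "2021-02-20", "2021-04-10", "2021-05-05", "2021-07-22"]),
    "amount": [1200.0, 800.0, 1500.0, 900.0, 2000.0],
})

# Roll transactions up to calendar quarters, then compare each quarter to the prior one.
quarterly = sales.set_index("order_date")["amount"].resample("Q").sum()
qoq_growth = quarterly.pct_change()   # e.g. 0.25 means +25% versus the prior quarter

print(quarterly)
print(qoq_growth)
</code></pre></div></div>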
<ul>
<li>
<p>Level 2 <code class="highlighter-rouge">Predictive</code>: most dashboarding tools support this but it requires a little effort on your part and most companies still are not leveraging it. If I have historical (Level 0) sales data then I should be able to use simple linear regression to forecast sales into the future (see the short forecast sketch after this list). This is also known as <code class="highlighter-rouge">machine learning</code>. In theory, ML algorithms are not really black boxes and we can get inside of them and tweak the model weights and get a totally reproducible and explainable model. Said differently, we <em>should</em> be able to get one right answer given a set of inputs, but that rarely happens in the real world. Furthermore, if you ask 10 sales executives in your company how they calculate their forecast you’ll likely get 11 answers. The point is: <code class="highlighter-rouge">predictive</code> is also opinionated, or often driven by gut instinct. Most companies are trying to move towards data-driven insights, where data is helping to guide our decision making, which is what Sherrie wants. That leads to:</p>
</li>
<li>
<p>Level 3 <code class="highlighter-rouge">Prescriptive</code>: few dashboarding tools support this and when they do it is your responsibility to spend the time to program it. Think of <code class="highlighter-rouge">prescriptive</code> as answering the question: <em>Given all of these data inputs, what should I do next?</em>. In the industry this is known as <code class="highlighter-rouge">Next Best Action</code> (sometimes abbreviated NBA or NBX). The <em>Action</em> depends on your use case and industry. If I’m in the insurance industry and I have a valuable customer I would like to up-sell/cross-sell to, the NBX is called <em>Next Best Recommendation</em>. There are many inputs into NBX and it requires knowledgeable business people to sit with analysts and programmers and think through the inputs to each use case.</p>
</li>
</ul>
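<p>Here is the Level 2 sketch promised above: a bare-bones sales forecast using simple linear regression. The monthly figures are invented, and a real forecast would account for seasonality and uncertainty, but it shows how little is needed to move from <em>descriptive</em> to <em>predictive</em>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

# Twelve months of (made-up) historical sales, oldest first.
monthly_sales = np.array([110, 115, 120, 128, 125, 133, 140, 138, 145, 150, 155, 160], dtype=float)
months = np.arange(len(monthly_sales))

# Fit a straight line (ordinary least squares) through the history...
slope, intercept = np.polyfit(months, monthly_sales, deg=1)

# ...and project the next three months forward.
future_months = np.arange(len(monthly_sales), len(monthly_sales) + 3)
forecast = slope * future_months + intercept
print(forecast)   # rough point estimates for months 13, 14, and 15
</code></pre></div></div>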
<p>Level 3 never has one <em>right</em> answer. It is totally up to how you interpret the data and which data you plug into your formulas. These formulas will need to change over time as you gain experience and measure your model’s outcomes and elicit feedback from users. Maybe even do some A/B Testing. There is rarely guidance on these formulas from industry standards bodies and even within a company different teams may want to weight the inputs differently to suit their needs. Many times prescriptive analytics’ outputs can’t be displayed as a spiffy data visualization on a Power BI report. Sometimes the NBX is a simple binary decision: <code class="highlighter-rouge">Should I cross-sell my new product to this customer, or not?</code> The answer may be quite nuanced.</p>
<p>That’s the key to prescriptive analytics…using data to tell a story and be more <em>data driven</em> and less <em>gut driven</em>. Now, how do you display that decision on a dashboard? Does the dashboard provide <em>any</em> value in this case? Or is the real value in having discussions around what data inputs are needed, how to structure the formula, and ultimately what the outcome is (and can we measure it and learn from it)?</p>
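<p>To show how opinionated those prescriptive formulas are, here is a toy next-best-action scorer. The candidate actions, features, and weights are entirely hypothetical; the point is that the weights are a business judgment that different teams will legitimately want to set differently (and revisit as they measure outcomes):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical inputs for one insurance customer (all values are made up).
customer = {"tenure_years": 8, "claims_last_year": 0, "products_owned": 2, "churn_risk": 0.15}

# Candidate actions with hand-tuned weights -- these WILL change as you learn and A/B test.
scores = {
    "cross_sell_umbrella_policy": 0.6 * customer["products_owned"] + 0.2 * customer["tenure_years"],
    "offer_retention_discount":   5.0 * customer["churn_risk"] + 0.1 * customer["tenure_years"],
    "do_nothing":                 1.0,
}

next_best_action = max(scores, key=scores.get)
print(next_best_action, round(scores[next_best_action], 2))
</code></pre></div></div>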
<h2 id="another-example-of-prescriptive-analytics">Another example of Prescriptive Analytics</h2>
<p>Imagine you are a marketing department spending $10M on various Facebook marketing campaigns. Your CMO asks you: “What should our <em>next</em> marketing campaign be…Instagram or Twitter?” That’s NBX. That’s “what should we do next?”. Now think to yourself…What would that dashboard look like? I have no clue what that would look like. Maybe something like this?:</p>
<p><img align="center" width="500" src="https://davewentzel.com/images/dead08.png" /></p>
<p>Joking aside, what your CMO really wants is an ROI analysis: “Which campaign will provide the best positive ROI?” Again, this is not something that really requires a dashboard. Right? What the CMO and marketing team will want to do is turn the knobs on the ROI calculation and see what happens (<em>What If</em> Analysis). Then they’ll want to debate and have a conversation on the data inputs and have your analytics team adjust the formulas on-the-fly. There will definitely be visualizations needed, but probably these are throw-away and would never be added to a production dashboard.</p>
<h2 id="problems-and-solutions-with-dashboards-and-dashboarding-tools">Problems (and solutions) with Dashboards and Dashboarding Tools</h2>
<p>My little allegory above points out a lot of the problems with the current state of dashboarding:</p>
<ul>
<li><strong>information overload</strong>: We have a <img align="right" width="400" src="https://davewentzel.com/images/brain.jpg" />predilection to build dashboards that are too <em>busy</em>. It’s just bad design. The cognitive load is too high. Listen, I can’t design a dashboard. Everyone hates my dashboards, and for good reasons. I am far too left-brained. But I do know <em>information overload</em> when I see it. My advice if you design dashboards: read everything by <a href="https://www.amazon.com/Edward-R-Tufte/e/B000APET3Y?ref=sr_ntt_srch_lnk_4&qid=1624986891&sr=8-4">Edward Tufte</a>; he knows how to visually display data. And of course, listen to user feedback.</li>
</ul>
<blockquote>
<p>Dashboard information density is inversely proportional to value-generated. A little white space on a page is a good thing. Don’t fear the white space.</p>
</blockquote>
<ul>
<li><strong>we don’t design for the persona</strong>: Dashboard authors rarely try to understand the “personas” that the dashboards should be designed for. Is this dashboard meant for <a href="https://www.mdrginc.com/sub-text/system-1-and-system-2-market-research-decision-making/#:~:text=System%201%20is%20the%20brain's,day%2Dto%2Dday%20decisions.&text=System%202%20is%20a%20slow,of%20thinking%20where%20reason%20dominates.">System 1 or System 2 thinking</a>? System 1 thinking is reflexive and requires little cognitive load to understand. System 2 is slow, analytical thinking. If you are designing a dashboard for a customer service rep persona to quickly see the “state of the customer” when the customer calls the call center (aka Customer 360)…you want a System 1 dashboard. It will have lots of traffic lights and gauges that will quickly tell the rep exactly how to handle the customer, with almost no thought or effort. But, if I’m building a customer dashboard for a salesperson persona, I want it to be far more analytical. I want the dashboard to tell my salesperson a story about the customer’s history, make some predictions about future sales, and even provide a <em>next best action</em>. This is far more analytical.</li>
</ul>
<blockquote>
<p>Don’t overwhelm your user with lots of data, yet little information.</p>
</blockquote>
<ul>
<li><strong>dashboards rarely tell a story</strong>. Assuming a System 2 dashboard, data should always tell a story. What is the information you want to convey to your users? Take your users on a journey. Remember, I’m right-brain <em>challenged</em>, but here’s how I like to start a data journey (please don’t judge me based on the visuals, look instead at the picture I’m trying to paint):</li>
</ul>
<p><img align="center" width="700" src="https://davewentzel.com/images/dead06.png" /></p>
<p>I list the purpose of each section of the dashboard starting with the metric or <a href="https://en.wikipedia.org/wiki/OKR">OKRs</a> for each department/persona. Each subsequent drill-down visualization should display some data attribute that is important to achieving that metric. Along the way I want users to be able to explore the data and ask their own questions. This is difficult for any dashboarding tool. Query interfaces in most of these tools are cumbersome and feel like a bolt-on extension. There are better ways to enable <em>exploratory data analytics</em> (EDA) than a dashboarding tool.</p>
<p>Here I’ve drilled into the marketing OKR and see this:</p>
<p><img align="center" width="500" src="https://davewentzel.com/images/dead09.png" /></p>
<p>Again, forgive me for the terrible visuals but I’m trying to paint a picture (note that in some visuals we are “in the black” and others we are “in the red”). Looking at the lower right I see the distribution of revenue by customer segment (by age demographic). Now, these “the dashboard is dead” vendors insist that dashboarding tools can’t provide any additional insights into this metric. I disagree. What I’ve been showing you is a sample Power BI dashboard I’ve created. Note that I can choose “Analyze” on this bar chart and Power BI will offer to tell me what factors affect the distribution of Revenue by Customer Segment the most:</p>
<p><img align="center" width="500" src="https://davewentzel.com/images/dead10.png" /></p>
<p>Power BI will tell me about the proportions of revenue for each input. In this case I can see how media placements are affecting the demographics:</p>
<p><img align="center" width="500" src="https://davewentzel.com/images/dead11.png" /></p>
<p>This might be meaningful for our marketing team. It definitely gives them some ideas they can research on their own (self-service analytics) before they ask the analytics team for additional help. In my experience they’ll need help because…</p>
<ul>
<li><strong>predefined dashboards are constraining</strong>: Visual-based data exploration is a great <em>start</em> to data analytics but you soon get boxed-in. Whenever I’m looking at a new dataset for a customer I start by doing some basic data profiling. I look at the data in a histogram and “see” if there are any outliers. If the data is time-series I look at the data visually to see if there is seasonality in the data. But even with advanced dashboard training the average knowledge worker can’t ask their own questions of the data <em>unless the underlying data can be queried directly</em>. Dashboard tools expect the data to be modeled a specific way that may prevent Citizen Analysts from asking their own questions or trying to determine which insights are the most important or actionable. The data is only as good as the dashboard’s context, which leads to time-consuming, error-prone data exploration and, ultimately, bad conclusions, assumptions, and decisions. (BTW, anecdotally, this is the <strong>most cited reason</strong> I hear from executives as to why they fear Self-Service Analytics).</li>
</ul>
<blockquote>
<p>Dashboards have failed to deliver on the promise of a data-driven culture.
–Quote from one of the “Dashboards are dead” vendors. (I don’t believe this).</p>
</blockquote>
<p>Providing a dashboard is not the same as allowing a knowledge worker to have true Self-Service analytics.</p>
<p>But…modern dashboards give the user lots of opportunities to explore the data and answer questions that these “dashboards are dead” vendors conveniently overlook. Imagine you saw this visual and had no other context:</p>
<p><img align="center" width="500" src="https://davewentzel.com/images/dead12.png" /></p>
<p>What would be the first question you’d ask? Would it be “What caused that increase in revenue for July, August, and September?” Well, Power BI (and most other modern dashboarding tools) can do this:</p>
<p><img align="center" width="500" src="https://davewentzel.com/images/dead13.png" /></p>
<p>Power BI tells me that the 207% increase in revenue from July to August is due to changes in 3 products we sell.</p>
<p><img align="center" width="500" src="https://davewentzel.com/images/dead14.png" /></p>
<p>That’s really helpful.</p>
<h2 id="other-dashboarding-features-these--the-dashboard-is-dead-vendors-conveniently-omit">Other dashboarding features these <em>the Dashboard is Dead</em> vendors conveniently omit</h2>
<p>There are a lot of other features that modern dashboarding tools have that allow for some basic <em>exploratory data analytics</em>:</p>
<ul>
<li><strong>NLP (Natural Language Processing)</strong>. This allows you to ask questions of your data without understanding SQL. Here’s an example:</li>
</ul>
<p><img align="center" width="500" src="https://davewentzel.com/images/dead15.png" /></p>
<p>In this case I type a natural language question and Power BI presents me with some ideas on what I might want…as well as one possible answer.</p>
<ul>
<li>
<p><strong>Chatbots</strong>. These are meant to guide the user/analyst through a series of questions about the data. This is getting closer to <em>guided analytics</em>.</p>
</li>
<li>
<p><strong>Wizard-like interfaces</strong>. If you are like me, these are sooo helpful. If I drag a data element onto my dashboard Power BI will give me a wizard-like interface that will help me determine what visual might look best.</p>
</li>
</ul>
<h2 id="where-these-vendors-get-it-right">Where these vendors get it right</h2>
<p>These vendors do have a point. In 2021 dashboarding alone will not provide the insights that enable true Prescriptive Analytics. But the dashboard isn’t <em>dead</em>. It’s not even <em>dying</em>. Visualizations are the best way to convey numerical information. What the analytics industry needs to do is understand there are multiple <em>personas</em> that need to do true <em>Exploratory Data Analytics</em> and need something more advanced than a dashboarding tool. Executives have created <em>self-service analytics</em> projects and their ROI isn’t great. We can do better, without throwing the baby out with the bathwater.</p>
<h2 id="how-can-the-mtc-help">How can the MTC help?</h2>
<p><img align="right" width="250" src="https://davewentzel.com/images/mtc1.jpg" />Are you convinced your dashboards are giving your organization the insights it needs? Are you getting the most value from your data assets? Most of our customers at the Microsoft Technology Center believe they can do better. And we can help.</p>
<p>The MTC mandate is to be the Trusted Advisor for our customers. We do that by showing how data can add business value. Sometimes that’s a dashboard, other times that’s understanding how the modern analytics technologies can be leveraged by business analysts to unlock value. Every day MTC data scientists like me work with customers to show them how we think through answering difficult Prescriptive Analytics problems with data. The fact is, anyone with a modicum of SQL knowledge can do amazing things with data.</p>
<p>We provide hackathons and Rapid Prototypes where our customers’ analysts learn how to <em>Think Like a Data Scientist</em> and monetize their data. Our data architects know that data projects are risky with a high fail rate. We structure 2-3 day workshops with business goals YOU define. We want you to walk away thinking “we learned a lot and WE CAN DO THIS”. The tech is actually really easy; where companies struggle is in understanding the processes and patterns that really work.</p>
<p>If that sounds compelling, contact me on LinkedIn and we can work with your team to show you how to build compelling dashboards and answer difficult Prescriptive Analytics problems.</p>
<hr />
<p>Are you convinced your data or cloud project will be a success?</p>
<p>Most companies aren’t. I have lots of experience with these projects. I speak at conferences, host hackathon events, and am a prolific open source contributor. I love helping companies with Data problems. If that sounds like someone you can trust, contact me.</p>
<p>Thanks for reading. If you found this interesting please <a href="https://davewentzel.com/blog/feed/">subscribe to my blog</a>.</p>
<h3 class="t60" id="related-posts">Related Posts</h3>
<ul class="side-nav">
<li><a href="https://davewentzel.com/content/DataLiteracySeries/"><strong>Data Literacy Workshops</strong></a></li>
<li><a href="https://davewentzel.com/content/SoftwareDecisionCalc/"><strong>Software Implementation Decision Calculus</strong></a></li>
<li><a href="https://davewentzel.com/content/DSaaS/"><strong>MTC Data Science-as-a-Service</strong></a></li>
<li><a href="https://davewentzel.com/content/Governance/"><strong>Top 10 Data Governance Anti-Patterns for Analytics</strong></a></li>
<li><a href="https://davewentzel.com/content/DeadDashboard/"><strong>The Dashboard is Dead, Probably?</strong></a></li>
</ul>
2021-08-01T00:00:00-05:00https://davewentzel.com/content/CLV/Data-Driven Customer Lifetime Value2021-07-21T00:00:00-05:00Dave Wentzeldave#davewentzel.comhttp://davewentzel.comCustomer Lifetime Value is one of those key metrics every business needs to know. But traditionally that's been difficult to do.<p>Business is changing and the customer is the focal point now more than ever. Customers understand they have access to greater choices than they did 20 years ago. If they don’t like some aspect of your offering they will take their business elsewhere. So now businesses are fighting for a greater share of the TAM (Total Addressable Market), trying to grow the customer base. Smart companies also have an eye toward customer retention. In the past, decisions around customer acquisition and retention were done by intuition. Forward-thinking executives are using better data and analytics to guide these decisions. To do that, good metrics are needed and one of those is <em>Customer Lifetime Value</em> (usually abbreviated CLV, or sometimes LTV). In this article I’ll show you how the Microsoft Technology Center (MTC) decision architects help customers with thorny analytics problems like CLV. I’ll show you different approaches and definitions of CLV so you can confidently start a customer-driven analytics project.</p>
<h2 id="the-business-problem">The Business Problem</h2>
<p>Customer Lifetime Value is a simple metric based on math.</p>
<p>Conceptually, CLV has been around for years but most companies didn’t have all of the necessary input data to create a trustworthy CLV number. That’s changing, but I still find most of my customers don’t have all of the data they would <em>like</em> (especially around customer behavioral data) to build a more accurate CLV metric. Don’t let that frighten you. It’s ok to publish a CLV number that is a work-in-progress.</p>
<blockquote>
<p>Don’t let the “perfect” be the enemy of the “good enough”.</p>
</blockquote>
<p>CLV works in conjunction with other “customer value” metrics like CAC (customer acquisition cost). So what’s CAC? If you offer introductory promotions or leverage loss leaders (described below) to attract new customers, you need to understand how much that will affect profits <em>over time</em>. How much money can you lose, and for how long, based on how much revenue you think the customer will generate over the customer lifecycle? Some examples:</p>
<ul>
<li>introductory low/no fee credit card offers. Finance companies will take an initial loss because they know this is profitable over the lifetime of the customer.</li>
<li>Printer manufacturers sell printers at a loss because they know more profits can be made in ink cartridges over the printer’s lifetime than can be made on the printer itself.</li>
</ul>
<p>CLV can be used to determine what a reasonable CAC should be. An aggressive marketing campaign that substantially raises CAC might not be financially sustainable if the CLV doesn’t also increase. Again, in the past, this balancing act was done by gut instincts. Even with an imperfect model we can now look at CAC and CLV in a more quantifiable manner.</p>
<p>If your product is a “one time purchase” item (think long-lived capital purchases like washing machines) then the CLV may be that one-time purchase. But, if your product is a subscription or is repeatedly purchased (think razor blades) then each subsequent purchase event adds to the revenue over the customer’s lifetime, which means that an additional CAC investment might be warranted.</p>
<blockquote>
<p>The above is a gross over-simplification. Some long-lived product-based companies, like those in the automotive industry, take different approaches to CLV and CAC. Luxury automobile makers value CLV much higher than economy car makers. They understand they are selling an aspirational experience and the satisfied customer will be a repeat buyer, possibly over DECADES. The TAM for luxury autos is smaller than economy cars and they can’t afford the churn. Economy car makers have a higher TAM and aren’t as concerned with CLV and brand loyalty (because there usually isn’t any). How you are treated when you walk into a dealership is indicative of the value they place on repeat business.</p>
</blockquote>
<p>Each of your customers may have a different CLV (and CAC). Sometimes companies try to generate a unique CLV for each and every customer. That can certainly work, but it’s usually better to create <em>customer segments</em> and apply a CLV calculation to each segment. Each segment may have different loyalties and profit profiles. Customer segmentation is beyond the scope of this article but let’s just say that something called “unsupervised machine learning” has been proven invaluable in creating very accurate segments given a series of data inputs.</p>
<p><img align="center" width="500" src="https://davewentzel.com/images/clv9.jpg" /></p>
<p>Retaining customers has a cost too. Customer churn occurs when customers take their business elsewhere. The naïve reaction is to retain all existing customers as long as the retention costs are less than the CLV. Well, maybe. In the past a low churn rate was the goal. Today, that is changing and businesses are considering intangibles like on-going support for deprecated products and the expectation of on-going discounts in their decisions to “fire their customer”. Likewise, some customer segments will always be “one-time buyers”. If these customers generate razor-thin margins, wouldn’t it be better to allow them to churn if we can replace them with customers with higher CLVs?</p>
<h2 id="the-clv-calculation">The CLV Calculation</h2>
<p>Here’s the thing, there is no standard, generally-recognized calculation for CLV. It will be different for every industry and every business, based on the weightings that are important to them. Similarly, the calculation can change over time as you learn new things about your business model and customer mix.</p>
<blockquote>
<p>To be fully transparent, most CRMs will provide you with “their” CLV calculation. Be skeptical. Their formula is usually opaque. They market it as “value-added AI” as an up-sell. CLV shouldn’t be a black box AI model. You need visibility to tweak it. Next, a good CLV will take a lot of (if available) behavioral data into consideration. Your CRM won’t have access to that data. Remember that the more data YOU can gather about your customers, the better your CLV metric and the better your <em>understanding</em> of your customers.</p>
</blockquote>
<p>It’s always best to start out with a simple CLV calculation using the “historical sales” approach (using our existing data as our guide). No need to boil the ocean. It starts with revenue estimation. Here’s the process:</p>
<ul>
<li>Step 1: What is the average revenue per purchase? This is fairly easy to do using your existing sales data. This can get tricky, quickly. How much historical data should you look at? If you have purchase data for the last 30 years, should you really be using it? Are the sales patterns from even 5 years ago relevant today?</li>
<li>Step 2: What is the frequency of purchases? Every industry and company is different, but think about the <em>purchase cycle</em> of your customer. Let’s say you sell printers and ink. A good purchase cycle <em>might</em> be 3 years, assuming a new printer is purchased every 3 years. Within that 3 years a customer may make 3-5 ink purchases per year. If you are a convenience store the purchase cycle could be one week with purchase frequency of 3-5 days/week (if your customer is a daily commuter that needs a morning coffee fix). If you follow a subscription-based business model this is much easier to calculate since the recurring revenues are easier to predict. (This is why every business is trying to determine how to stop selling capital goods and move to a recurring revenue model).</li>
<li>Step 3: Avg Revenue per Purchase x Purchase Frequency: this gives you the revenue over the purchase cycle period.</li>
</ul>
<p>All of the above numbers we can gather from our existing sales databases and apply some simple math. Nothing controversial.</p>
<p>Now it starts to get a little qualitative.</p>
<ul>
<li>Step 4: Calculate “Customer Lifetime”. How long does your average customer continue to do business with you? If the bulk of your customers are “one-time purchasers” then the answer is simple. If your customer stays with you for 10 years, on average, then that is much different. The “lifetime” can become contentious but with some creative thinking we can generally agree on a good lifetime metric. The lifetime metric can change over time with changes in your business model and industry. Think of the cable TV industry. 20 years ago no one was predicting “cord cutting” and the industry bet heavily on “bundling” your TV with your phone and internet. Today, phone service is something many households are foregoing. A landline customer’s lifetime was historically calculated in DECADES…today it’s substantially shorter…and getting shorter every day.</li>
<li>Step 5: Purchase Cycle Revenue X Lifetime: this gives you the basic CLV</li>
</ul>
<p><img align="center" width="500" src="https://davewentzel.com/images/clv0.png" /></p>
<p>Let’s look at an example: Let’s say you run a convenience store and we want to determine the CLV for our morning commuters. Feel free to adjust my assumptions:<br />
<img align="center" width="500" src="https://davewentzel.com/images/clv2.png" /></p>
<p>That’s the basics.</p>
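<p>Here is the same five-step process as a short Python sketch. The inputs are placeholders (not the numbers from the convenience store example above), and at this point we are really computing a revenue-based CLV:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Step 1: average revenue per purchase (from your sales data) -- placeholder value
avg_revenue_per_purchase = 8.50        # e.g. a coffee plus a snack

# Step 2: purchase frequency over one purchase cycle -- placeholder values
purchases_per_week = 4
weeks_per_cycle = 52                   # treat one year as the purchase cycle

# Step 3: revenue over one purchase cycle
cycle_revenue = avg_revenue_per_purchase * purchases_per_week * weeks_per_cycle

# Step 4: customer lifetime, in purchase cycles (years here) -- placeholder
lifetime_years = 5

# Step 5: basic, revenue-based CLV
basic_clv = cycle_revenue * lifetime_years
print(f"Basic CLV: ${basic_clv:,.2f}")   # $8,840.00 with these assumptions
</code></pre></div></div>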
<p>Now we need to get more advanced in our calculations.</p>
<p>First, <strong>segment your customers</strong>. Instead of averaging all of your customers together to determine revenue per purchase, try segmenting your customers. I’ll cover customer segmentation in another article but let’s just say that segmenting your customers is often a qualitative endeavor. Each “cohort” is designed based on what you think is important. Does gender affect purchasing decisions? If so, that would be a cohort. Or you could segment by geography, avg revenue size, transaction date…basically, whatever you think makes up a good cohort. This is where we can begin to really customize CLV for our purposes.</p>
<p>Second, <strong>determine the profit margin</strong>. Should you be basing CLV on <strong>revenue</strong> (sales income BEFORE expenses…aka the top line) or profit (income after expenses…aka the bottom line)? Some companies want to see CLV based on revenue, some on profits. There are pros and cons to each. Revenue-based CLV isn’t affected by changes in costs. If you decide you want a profit-based CLV then just determine your profit margin and multiply the existing CLV by it.</p>
<blockquote>
<p>In theory, without factoring in the profit margin we have really just calculated CLR (customer lifetime <em>revenue</em>). The “value” in “CLV” really should be factoring in costs at this point, but everyone has a different opinion on this.</p>
</blockquote>
<p>How do you calculate the profit margin?</p>
<p><img align="center" width="500" src="https://davewentzel.com/images/clv3.png" /></p>
<p>Remember, the profit margin may be different for each product you offer…you should start to see how the CLV calculation is getting more complex.</p>
<p>But, to keep it simple, here is our updated CLV formula:</p>
<p><img align="center" width="500" src="https://davewentzel.com/images/clv1.png" /></p>
<p>Next, <strong>churn rate</strong> is a factor. If 20% of your customers “churn” during a given purchase cycle, well…we need to know that. You might be able to determine the churn rate by looking at your existing sales data if it is available for longer time periods.</p>
<p>The new formula:</p>
<p><img align="center" width="500" src="https://davewentzel.com/images/clv4.png" /></p>
<p>That’s the basic CLV calculation using historical data. There are a few things we can do to make the calculation a bit more advanced:</p>
<ul>
<li><strong>determine the discount rate.</strong> This is wonky but in economics there is a time preference of money…money “in pocket” today is worth more than that same amount of money in the future. The future has unknowns (including inflation) and tomorrow’s revenue streams cannot be guaranteed. The discount rate is a simple percentage that has a much larger impact on CLV if the lifetime is measured in decades rather than years. In finance terms, the discount rate gives us the NPV of future cash flows.</li>
<li><strong>use probabilistic modeling</strong>. Using probabilities we can apply better estimates to each transaction’s monetary value and transaction count. We can build RFM models to help (again, too much for this article). This is fairly complex and something you can research on your own…just note that a good data scientist will know how to do this and there are python packages that can help to simplify the math. In data science lingo we are further “featurizing” the data…meaning we are transforming the data from raw numbers to something that is more valuable.</li>
</ul>
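<p>Pulling the refinements together, here is one way (of many) to fold profit margin, churn, and a discount rate into the calculation. Treating <code class="highlighter-rouge">1 / churn rate</code> as the expected lifetime and discounting each year’s profit is a common simplification, not an industry standard, so adjust it to your business:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Placeholder inputs -- replace with your own data.
cycle_revenue = 1768.00      # revenue per customer per purchase cycle (one year here)
profit_margin = 0.30         # 30% of revenue survives as profit
annual_churn_rate = 0.20     # 20% of this segment churns each year
discount_rate = 0.08         # time preference of money (for NPV)

expected_lifetime_years = round(1 / annual_churn_rate)   # roughly 5 years at 20% churn

# Discount each year's expected profit back to today's dollars.
annual_profit = cycle_revenue * profit_margin
discounted_clv = sum(
    annual_profit / (1 + discount_rate) ** year
    for year in range(1, expected_lifetime_years + 1)
)
print(f"Discounted, profit-based CLV: ${discounted_clv:,.2f}")
</code></pre></div></div>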
<h2 id="machine-learning-approaches">Machine Learning Approaches</h2>
<p>A good data scientist can look at all of the input data mentioned above and build a model that determines CLV by fitting regression curves to the historical data.</p>
<p>But we can really do that ourselves in Excel without ML. Where ML shines is in adding data points that would be difficult to model in a simple equation, letting the algorithm determine the weightings for each of these inputs based on historical data (a minimal sketch follows the list below). Examples:</p>
<ul>
<li>behavioral data: Companies use sentiment analysis against social media to determine if a customer is a churn risk. What else can you think of that can be used to model your customer’s behaviors?</li>
<li>external data: Any dataset that helps you understand your customers better can be used as a feature in an ML model. Credit scores, credit card transaction histories, product return histories, call center metrics…use your imagination</li>
<li>browsing history/cookies: Yep, we can determine which customers “look like” our most profitable customer segments and use ML to determine <em>who</em> we should focus on.</li>
<li>inventory value. In every industry, inventory carry has a cost. The less inventory we need to warehouse, the better. Inventory fluctuates over time and that will affect cost, which will affect profit, which will impact CLV.</li>
<li>anything else that affects the quality of the input variables. What else can you think of that would be important to you?</li>
</ul>
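<p>As a rough illustration of letting the algorithm learn those weightings, here is a minimal scikit-learn sketch that fits a regression over a handful of invented features (the feature names, values, and target CLVs are purely illustrative):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical per-customer features: [tenure_years, sentiment_score, credit_score, returns_last_year]
X = np.array([
    [1, 0.2, 640, 3],
    [4, 0.7, 720, 1],
    [7, 0.9, 780, 0],
    [2, 0.1, 600, 5],
    [6, 0.6, 700, 1],
])
# Observed lifetime value for those same customers (also invented).
y = np.array([900.0, 3200.0, 5400.0, 500.0, 4100.0])

model = LinearRegression().fit(X, y)
print(model.coef_)                          # the learned weighting for each feature
print(model.predict([[3, 0.5, 690, 2]]))    # estimated CLV for a new customer
</code></pre></div></div>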
<h2 id="why-is-clv-so-important">Why is CLV so important?</h2>
<p>Here are some reasons why this calculation is so important:</p>
<ul>
<li>We need to focus our customer retention and acquisition efforts on our most profitable customers. CLV on customer segments is how we do this quantitatively vs instinctively.</li>
<li>Businesses usually assume that customer loyalty and retention are important. After calculating CLV you might find you are incentivizing your worst customers to be loyal when you should be focusing your energy on attracting new customers.</li>
<li>Who do we market to and how can we structure those campaigns? Just because we’ve used CLV to determine our most profitable customers doesn’t mean it is cost-effective to market to them.</li>
<li>CLV can help determine marketing campaign ROI. If we know the marketing costs over the purchase lifecycle and we can determine the number of new customers acquired over that same time period, we should be able to determine the economic success of the marketing campaign. Here’s a formula:</li>
</ul>
<p><img align="center" width="500" src="https://davewentzel.com/images/clv5.png" /></p>
<p>It’s easier to describe with an example:</p>
<p><img align="center" width="500" src="https://davewentzel.com/images/clv6.png" /></p>
<p>In this example the ROI is $11.50. Said differently, <strong>for every $1 spent on this campaign we generated $11.50 in profits over the lifetime of this customer.</strong></p>
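<p>Here is that arithmetic as a few lines of Python. The inputs are placeholders I picked so the result lands on the $11.50 figure above; plug in your own campaign numbers:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Placeholder campaign numbers.
campaign_cost = 50_000.00        # total marketing spend over the purchase lifecycle
new_customers_acquired = 250     # customers attributable to this campaign
clv_per_customer = 2_500.00      # profit-based CLV for the acquired segment

lifetime_profit = new_customers_acquired * clv_per_customer
roi = (lifetime_profit - campaign_cost) / campaign_cost
print(f"Every $1 of campaign spend returns ${roi:,.2f} in lifetime profit")   # $11.50
</code></pre></div></div>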
<ul>
<li>We can use similar math to determine if incentives, one-time-offers, introductory rates, and other tricks will actually have a positive ROI</li>
<li>CLV can help justify higher CAC</li>
<li>CLV can guide product development efforts. Which existing products lead to higher CLV and how can we leverage that to create more compelling offerings?</li>
<li>We can make sales team compensation decisions based on CLV. I’ll let you use your imagination on this one.</li>
<li>Should we leverage “loss leader” strategies? Loss leading is a strategy where a business sells at a loss to attract new customers in new market segments.</li>
<li>Improve customer loyalty. You can use CLV to empower your customer service reps to offer refunds and credits for high-value customers. Ultimately, this drives loyalty.</li>
</ul>
<h2 id="how-can-the-mtc-help">How can the MTC help?</h2>
<p>Everyone talks about “Digital Transformation”. But what is that exactly? At the Microsoft Technology Center (MTC) we believe Digital Transformation is about monetizing your data…and that simply means we are using data in new ways to make money. Customer Lifetime Value is nothing new, but historically it’s been difficult to quantify that number. The data wasn’t always available, or it wasn’t available in a timely manner. That’s no longer the case in 2021. We can do amazing things with data and compress time-to-value. What took weeks or months to do in the past can now be done in hours if we utilize tech smartly.</p>
<p>Are you convinced you are leveraging CLV to drive your revenue goals? Can you measure your CAC and marketing campaign ROI? We can help you.</p>
<p>The MTC mandate is to be the Trusted Advisor for our customers. We do that by showing how data can add business value. The tech is easy, what’s hard is understanding the processes that work. We can help your people build CLV prototypes in just a few days. We’ll work with you to think through the inputs of a CLV calculation that will add value to your business in your industry, and then help you build it. In 2021 we should be taking more “small bets” and calculated risks with our data projects versus creating months and years-long capital projects with high fail rates. Building CLV models, quickly, by leveraging the Azure Cloud, is perfect to determine if <em>it really is different this time.</em></p>
<p>Does that sound like a good investment of your time? Contact me or your account team today.</p>
<hr />
<p>Are you convinced your data or cloud project will be a success?</p>
<p>Most companies aren’t. I have lots of experience with these projects. I speak at conferences, host hackathon events, and am a prolific open source contributor. I love helping companies with Data problems. If that sounds like someone you can trust, contact me.</p>
<p>Thanks for reading. If you found this interesting please <a href="https://davewentzel.com/blog/feed/">subscribe to my blog</a>.</p>
<h3 class="t60" id="related-posts">Related Posts</h3>
<ul class="side-nav">
<li><a href="https://davewentzel.com/content/DataLiteracySeries/"><strong>Data Literacy Workshops</strong></a></li>
<li><a href="https://davewentzel.com/content/SoftwareDecisionCalc/"><strong>Software Implementation Decision Calculus</strong></a></li>
<li><a href="https://davewentzel.com/content/DSaaS/"><strong>MTC Data Science-as-a-Service</strong></a></li>
<li><a href="https://davewentzel.com/content/Governance/"><strong>Top 10 Data Governance Anti-Patterns for Analytics</strong></a></li>
<li><a href="https://davewentzel.com/content/DeadDashboard/"><strong>The Dashboard is Dead, Probably?</strong></a></li>
</ul>
2021-07-21T00:00:00-05:00https://davewentzel.com/content/OutsourceAnalytics/Do This Before You Outsource Your Next Analytics Project2021-07-20T00:00:00-05:00Dave Wentzeldave#davewentzel.comhttp://davewentzel.comAre you convinced that your outsourced analytics project will succeed? Every day at the Microsoft Technology Center I talk to customers that are underwhelmed with their consultant partners. I wrote an article where I show you THE ONE THING that you should do to guarantee success in your next outsourced analytics project. To my consultancy friends: read on to see what you should be doing to bring your practices into the Prescriptive Analytics Age.<p>Were you satisfied with your last outsourced data and analytics project? Did it provide the value you were hoping? When I talk to executives the answer is a resounding NO. But when I probe and ask them why, I don’t get definitive answers. They can’t put their finger on exactly why they are underwhelmed. Well, I know the big reason and I’m gonna share it with you now.</p>
<p>NEVER sign a fixed scope/fixed fee data analytics project SOW. Always demand a time and materials engagement contract, sometimes known as a “<a href="https://en.wikipedia.org/wiki/Staff_augmentation">staff aug</a>” contract.</p>
<p>Why?</p>
<p>Well, let’s walk through a typical project lifecycle.</p>
<p>You have a thorny analytics project…let’s use a canonical example…You are an IT director and your CMO comes to you and wants to know if your company should invest in an Instagram marketing campaign given that you already spend $10 million annually on existing Facebook marketing campaigns. You know that you are going to need a lot of supporting data that isn’t in your analytics sandbox already. But you aren’t really a marketing guru and your staff is swamped. You decide the best approach is to outsource this project. You contact a few different data consultancies and arrange for Discovery calls where you describe the problem in as much detail as you could get from your CMO.</p>
<p>What happens next? Each consultancy writes a brief proposal outlining what they heard to be the crux of the project. Each will propose a different cost for the project. Some will say 3 months, some will say 6 months…some will say $250K, others will say $500K. Each will definitely propose a bunch of dashboards that they’ll need to create, each backed by a data warehouse with fact and dimension tables.</p>
<p>Your gut tells you this project should be about a month but you received no proposals for that short of a duration. You also don’t think you need <em>another</em> data warehouse or another bunch of dashboards. You just need the data, the calculation, and the answer. What can you do though? You sign with the cheapest consultancy with the shortest duration. You try to haggle the duration (you aren’t really too concerned with the price…but you need to get the answer quickly).</p>
<blockquote>
<p>Remember: Price is only one aspect of cost.</p>
</blockquote>
<p>The project begins and the consultants are wonderful!! They seem to be asking all of the right questions and your marketing analysts are thrilled with the progress being made. Around week 3 or 4 things start to change. The consultants are diligently working by themselves and you are receiving daily status updates at your standups…but the collaboration seems to have tailed off. Fewer questions are being answered and you are seeing tangible work product updates less frequently. You think that by now there would be MORE collaboration, not less. You raise your concerns to the engagement lead who assures you they are diligently building ETL pipelines, star schemas, and dashboards. By the last few weeks of the project the consultants are making all kinds of excuses as to why they haven’t provided an answer to the original question (“should we spend money on an Instagram campaign”). The trend seems to be they are spending time on dashboards and “facts and dims” but you still do not understand what value those items will bring. At the end of the project it seems like the marketing team is underwhelmed by the results. You feel the project was a success (after all, the terms of the SOW were met), you did get <em>an</em> answer, but something tells you this could’ve been done differently.</p>
<h2 id="heres-how-i-would-do-this-project">Here’s how I would do this project</h2>
<p>On the Discovery call I would listen intently and ask questions. By the end I would realize that this is not a typical “descriptive” analytics project; this is a “prescriptive” analytics project. I would tell you this and explain to you how successful prescriptive analytics projects work.</p>
<blockquote>
<p>In the past most analytics projects were “descriptive”. This means “rearview mirror”, historical reporting. These are the types of projects that benefit most from dashboards and data warehouses. Today, most analytics projects are “predictive” meaning we are leveraging historical data and ML to forecast events in the future. But in reality, most business leaders are looking for “prescriptive analytics”. Think of this as questions like “what do I do next”. These answers are often qualitative in nature but we use data to augment our gut instincts. An example: “Should I create an Instagram campaign even though I have a Facebook campaign?” Said differently, “what do I do next?” In 2021 this is what business leaders really want: using data to provide insights and guide strategy.</p>
</blockquote>
<p><img align="center" width="500" src="https://davewentzel.com/images/analytics71.png" /></p>
<p>I can’t provide you with a fixed scope, fixed fee project. I mean, I could, but it would be a complete guess. And, I would need to pad my estimates to ensure I can achieve the scope and get paid. And trust me, the scope will be written such that I’ll always be able to deliver and get paid. But there are so many unknowns right now (for instance, your marketing team can’t tell me exactly what the formula should be) that I can’t give a thoughtful estimate. It feels like I should be able to do this in 2 weeks but I can’t possibly estimate for such a short project and still make a profit. This is why the minimum estimate you received from the other bidders is 3 months.</p>
<blockquote>
<p>“Analytics work expands so as to fill the time available for its completion”. This is my version of Parkinson’s Law. This means that if you give me 3 months to do the project it will take 3 months. If you only give me 2 weeks I’ll likely still get you the same business outcome.</p>
</blockquote>
<p>I can tell you through experience that if I provide you with any estimate, whether it’s a week or a year, my team will end up spending a lot of time doing tasks that truly aren’t adding value or getting you closer to answering the question. We’ll spend a lot of time creating CI/CD processes, building ETL automation, and creating unit tests…we’ll only dedicate the final 5% of time to analyzing the data. Invariably my team would say, “if we only had more time, we ran out of budget.” I hated doing that, which is why I won’t do it anymore.</p>
<blockquote>
<p>Hofstadter’s law: “It always takes longer than you expect, even when you take into account Hofstadter’s Law.”</p>
</blockquote>
<p>But ETL, star schemas, unit tests, and automation…none of those things get you closer to an answer to your simple question. In fact, for this project, I’m not sure what data I would put into a data warehouse or what the dashboard would even look like. Maybe something like this?</p>
<p><img align="center" width="500" src="https://davewentzel.com/images/analytics72.jpg" /></p>
<p>You want a simple YES/NO answer…but really you want to know what the ROI would be for an Instagram campaign given existing campaigns. ROI is a simple calculation. Do you really need the numerics in a fact table? Do you even need dashboards? Probably not. What you really need is a simple visualization where we can turn the knobs on the input variables and see the effect on ROI. I can do this with python and Jupyter notebooks in a couple of minutes.</p>
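<p>To make that concrete, here is roughly what that throwaway notebook cell might look like, using <code class="highlighter-rouge">ipywidgets</code> so the marketing team can turn the knobs themselves. Every variable name and default value below is an assumption for illustration, not your real campaign data:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Run inside a Jupyter notebook with ipywidgets installed.
from ipywidgets import interact

def campaign_roi(budget=100_000, cost_per_click=0.75, conversion_rate=0.02, revenue_per_customer=300):
    """Estimate campaign ROI from a handful of adjustable assumptions."""
    clicks = budget / cost_per_click
    new_customers = clicks * conversion_rate
    revenue = new_customers * revenue_per_customer
    roi = (revenue - budget) / budget
    print(f"New customers: {new_customers:,.0f}   ROI: {roi:.2f}x")

# Sliders for every input -- these are the "knobs" the marketing team wants to turn.
interact(campaign_roi,
         budget=(10_000, 500_000, 10_000),
         cost_per_click=(0.10, 5.00, 0.05),
         conversion_rate=(0.005, 0.10, 0.005),
         revenue_per_customer=(50, 1_000, 50))
</code></pre></div></div>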
<p>You ask me: “I’ve heard nothing like this from our other proposals. I’m interested in hiring you. So, I know you won’t give me an estimate…but I really REALLY need one.”</p>
<p>I can’t give you that. There are so many input variables that we <em>could</em> use. I need to sit with your marketing analysts and maybe even your CMO, and together we need to look at those input variables until we collectively come up with the right set of inputs that everyone can agree on. In my past projects this can sometimes take a while and get contentious. Everyone will want to add their opinions and inputs into the formulas. We won’t just talk about these formulas; we will actually look at them using real data. We’ll tune the knobs together. We’ll have sessions where we learn about the data together.</p>
<p>Sometimes leaders will want to ingest more and more data, trying to add lift and get a better outcome…but that isn’t always needed. More data doesn’t always equal a better answer. So I’ll help your marketers understand where the cutoff is.</p>
<blockquote>
<p>“Don’t let the perfect be the enemy of the good enough.” Getting an answer faster <em>might</em> be better than continuing to add more features.</p>
</blockquote>
<p>Your marketing team also says they want a simple YES or NO answer. That’s likely not what they want. They’ll likely want a range of numbers where the Instagram campaign is showing positive ROI. But again, this is all very subjective. How will your marketing team know they didn’t miss an important variable in the formula? They will likely want to try a small A/B test to see if Instagram campaigns are bringing in new customers and the effect on CAC (customer acquisition costs). I can help them with DoE (Design of Experiments) and measurements. In my experience they’ll likely learn that the answer is gray. They may find that, <em>in general</em>, an Instagram campaign doesn’t provide a positive ROI given existing campaigns, but they may determine there is a segment of customers where it WILL provide lift. We will learn that together.</p>
<p><img align="right" width="250" src="https://davewentzel.com/images/analytics75.jpg" />But, the real issue is we will NEVER know the formulas or data inputs upfront. This can’t be learned by a static requirements document. We will only learn these as we experiment with the data and talk through ideas together, while turning the knobs on the formulas. We need to do some deductive reasoning and hypothesis-building together. These working sessions may be multiple hours for multiple days. I won’t monopolize the marketing team’s schedule, but they MUST realize that if they want to see success that they must take a vested interest in this project. This isn’t a typical IT project. Once I have enough information from your team then I will work independently until I’m ready to ask more questions or present my rapid prototype.</p>
<blockquote>
<p>What I just described is truly what <em>data science</em> really is. This is how successful data science, data, and analytics projects are structured. These projects are NOT structured like typical IT projects and “data science” is NOT always about building predictive ML models.</p>
</blockquote>
<p>You: “Wow, this is different from how we run IT projects. We always start with well documented requirements documents. Without those, how do you get started?”</p>
<p><img align="left" width="400" src="https://davewentzel.com/images/dt.jpg" />What we’ll do is start the project with a 1 or 2 day <a href="https://davewentzel.com/content/DesignThinking/">Design Thinking session</a>. These are wildly fun and are geared that way to ensure attendees recognize this is a “safe space” and they can express their ideas without judgment. We definitely need the marketing team but I suggest you send some IT folks so they can listen in and help us understand our data needs. I structure these sessions so that we all agree on the problem as well as various ways to solve the problem. By the end we sometimes even build a rapid prototype so attendees can visualize what we are doing. For a project like this what we are really doing is getting an initial estimate of the data we want to use, whether we have that data already in our analytics sandbox (data lake), and what everyone initially <em>thinks</em> the ROI formula should be.</p>
<p>You ask: “So, you can give me an estimate after that Design Thinking session?”</p>
<p>Maybe. We’ll definitely know a lot more about the project and any estimates I would give would be at least a little more accurate and thoughtful. But there will still be a lot of variability.</p>
<p>Here’s what I recommend: I’ll give you a thoughtful estimate after that DT session and if you decide to proceed I’ll begin work. I’ll allow you to back out of the contract if you think my approach won’t work. Then, at the end of every 2 weeks we’ll have a retrospective. I’ll show you and the marketing team what I’ve done and what I think still needs to be done…and, here’s the important part…I’ll let you know if I want to continue working with you. And, please, you tell me if you want me to continue working for you. I may determine that I don’t like working for you and your team, I don’t like your expectations, or maybe I don’t think I can ever provide value to you and answer your business question. Maybe we can’t get the data we need. I became a consultant because I don’t want to punch a clock anymore. Conversely, you may decide I’m a horrible data scientist or you may think I’m milking this cash cow engagement. Honestly, the biggest reason my clients end their relationship with me is I solved their business problem. The second biggest reason is we both agreed that maybe I can’t solve this business problem. If so, you can end the relationship. This incents me to show value quickly. This is why I structure these engagements as “open-ended staff aug” and not fixed fee/fixed scope.</p>
<p>You: “That seems like a lot of risk for you.”</p>
<p>It is. At any point you could end the engagement and now I have to find another client. So, I’m going to charge you a 25% premium over what those fixed fee, fixed scope consultancies will charge you. Listen, I’m not concerned about finding more clients or having my utilization rate above 80%. I know I’m a good analyst and my customers are always happy. I just want to consult for companies I admire, working with people I respect, doing interesting things with data that I love. Good work will always find me, regardless of economic conditions. Does that sound like someone you can trust?</p>
<blockquote>
<p>Dave’s 4th Law of Consulting: To be able to say YES to yourself and your consulting business, you have to be able to say NO to any of your clients.</p>
</blockquote>
<p>There’s another reason consultancies avoid “staff aug” engagements. <strong>Their employees hate them.</strong> A really good data scientist working on a long-term contract has concerns about her future. Will she be pigeon-holed into working for the same company for years just because it is a profitable, lucrative client? Will she be stuck working for a manager she despises? Will her skills atrophy? Furthermore, isn’t <em>staff aug</em> for menial IT tasks, not for important data science work? We need our consultants to understand that at any point they can say NO to continuing a contract, regardless of how profitable and lucrative it is. We want our consultants to know they are in control. We value their skills and time and we understand their concerns.</p>
<p>I’m going to be honest. I left a lucrative role as the Chief Technology Officer for a consultancy because I was pigeon-holed working part-time on an engagement I hated. It was an extremely profitable engagement and I was offered bonuses and additional perks. I still quit.</p>
<p>You: “I understand, when can you start?”</p>
<blockquote>
<p>Dave’s 5th Law of Consulting: How can you tell an experienced consultant from a new consultant? The new consultant says, “I need more clients”. The experienced consultant says, “I need more time.”</p>
</blockquote>
<h2 id="lessons-learned">Lessons Learned</h2>
<p>The above allegory is how all prescriptive analytics projects should be run, regardless of whether you outsource. These same Lean Principles can be used to run your internal analytics projects too. Prescriptive analytics uses potentially many data sources to drive strategic decision-making while answering broad questions like “what do we do next?”. The answers are not always quantifiably provable. There are often qualitative features that need to be taken into consideration. Data scientists have tricks to turn these subjective attributes into something measurable that can be plugged into a formula. Prescriptive analytics is much harder than predictive analytics (building ML algorithms to forecast the future). Prescriptive analytics is where data scientists shine. We have the know-how to ask the right questions, perform deductive reasoning, design experiments, and present data in a compelling, “guided analytics” and storytelling approach such that executives can make data-driven decisions.</p>
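<p>To make that concrete, here is a minimal sketch of what “turning the knobs on a formula” might look like in Python. The attribute names, weights, and ratings are entirely hypothetical; in a real engagement we would derive them together in the working sessions described above.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code>
# Hypothetical sketch: fold subjective ratings into a measurable campaign ROI score.
# All weights, scales, and inputs are invented for illustration.

def scale_rating(rating, low=1, high=5):
    """Min-max scale a subjective 1-5 rating into the 0-1 range."""
    return (rating - low) / (high - low)

def campaign_roi_score(incremental_revenue, campaign_cost,
                       brand_fit_rating, exec_confidence_rating,
                       qualitative_weight=0.2):
    """Blend a hard ROI number with the scaled qualitative 'knobs'."""
    hard_roi = (incremental_revenue - campaign_cost) / campaign_cost
    soft_score = (scale_rating(brand_fit_rating) +
                  scale_rating(exec_confidence_rating)) / 2
    return (1 - qualitative_weight) * hard_roi + qualitative_weight * soft_score

# Marketing guesses the campaign returns $130k on $100k of spend,
# rates brand fit 4/5 and executive confidence 3/5.
print(round(campaign_roi_score(130_000, 100_000, 4, 3), 3))
</code></pre></div>
<p>The point isn’t the math; it’s that the qualitative “knobs” are explicit, so the marketing team can see exactly how their judgment feeds the score.</p>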
<p><img align="right" width="300" src="https://davewentzel.com/images/analytics74.png" />We aren’t building an e-commerce website or building clearly defined tabular reports (descriptive analytics). These are examples of IT projects that have a fixed scope with little risk and known outcomes. These projects can be run as standard scrum/agile/Kanban projects. Prescriptive analytics (and data science) projects don’t always have a known end-state. We have to run a lot of experiments to determine what to do next. <strong>These are risky projects and need to be run to control risk</strong>. This means they need to follow a more fail-fast, iterative, Lean paradigm.</p>
<p>Consultancies don’t like to run projects this way. It’s difficult to forecast headcount and revenue. They will instead propose a fixed scope padded with fluff and weasel words so they can always claim success at the end of the engagement. You may be satisfied, but you will usually be underwhelmed by the output. Instead, find a consultancy that will run a true iterative, Lean analytics project. Then pay them a little more for the risk they are taking. Design short “experimentation” sprints and listen carefully at each retrospective. Evaluate what they are saying:</p>
<ul>
<li>Do they appear to be on the right track to solving the problem?</li>
<li>Do they appear to be spinning their wheels?</li>
<li>Are they focused on things that are not adding to the value-stream such as building data warehouses and dashboards?</li>
<li>Do you even need data warehouses and dashboards?</li>
<li>Are they building DevOps processes when this is really a one-off “what do I do next” analytics project with potentially throw-away data?</li>
</ul>
<p>If you are going to outsource, consider outsourcing the work that is easy to scope:</p>
<ul>
<li>Data acquisition/ingestion. This is sometimes called “ELT”. If you know the data sources you’ll need, this can be easy to outsource.</li>
<li>Dashboards. Once you have the answers, if you really do want visualizations against the data then hire someone skilled to do this.</li>
<li>Data warehouse/ETL development: Again, once you’ve answered your business problem and have the raw data, ask yourself whether you need that data in a formal data warehouse. If you do, then outsource the creation of those facts and dims to someone else.</li>
</ul>
<blockquote>
<p>Never outsource your core competency. Only you know how to build the formulas that will answer the prescriptive analytics questions you have, like “should I invest in another marketing campaign?” You can’t outsource that knowledge to a data consultancy.</p>
</blockquote>
<blockquote>
<p>Know-How pays much less than Know-When, Know-What, or Know-Why</p>
</blockquote>
<h2 id="how-we-can-help-you-at-the-mtc">How we can help YOU at the MTC</h2>
<p><img align="right" width="250" src="https://davewentzel.com/images/mtc1.jpg" />I no longer do independent consulting as a data scientist. I work for the Microsoft Technology Center as a Decision Architect, consulting with our customers solving difficult problems with data in short rapid prototype sprints (usually 3 days). I hate to admit this but a common statement I hear from my customers is: “I wish we would’ve talked to you BEFORE we outsourced this analytics project. We are learning so much and we can see that our current vendor partner is not focused on the right things.”</p>
<p>Listen, data projects are risky and difficult. Prescriptive analytics (using data to answer the question: <em>what do we do next?</em>) is nothing new, but savvy business people are demanding it more than traditional dashboards and reports. Not every data professional is ready to build solutions where the scope and outcome are not clearly defined up-front. Analytics in 2021 requires thinking differently, asking questions, experimenting, and failing fast.</p>
<p>The MTC is staffed with data professionals that understand this. We are viewed as Trusted Advisors by our customers. We are usually booked out for 45-60 days because we are in demand. We are also honest: we will tell you if (or when) a problem doesn’t seem solvable. It happens. We can also work with your data consultants if they need assistance with certain aspects of the project…especially Design Thinking. We’ve helped to rescue at-risk and failing analytics projects simply by asking different questions. We are Thought Leaders and present at industry conferences. We live and breathe this stuff.</p>
<p>Does that sound like someone your team can trust?</p>
<p>Contact me on LinkedIn, or your Microsoft account team, and we can start our journey together.</p>
<hr />
<h1 id="altdata-ideas-you-can-leverage-today">altdata Ideas You Can Leverage Today</h1>
<p><em>2021-06-23 | Dave Wentzel | <a href="https://davewentzel.com/content/AltData3/">https://davewentzel.com/content/AltData3/</a></em></p>
<p>Here are some actionable ideas to get you thinking about how to leverage altdata.</p>
<blockquote>
<p>This is Part 3 of a series of articles on leveraging alternative datasets to provide lift. <a href="/content/AltData/">Part 1</a> is an overview of altdata and <a href="/content/AltData2/">Part 2</a> is how data sharing is displacing ETL processes and shrinking time-to-value in data projects.</p>
</blockquote>
<p>Data-driven companies are leveraging alternative datasets (I call this <code class="highlighter-rouge">altdata</code>) to provide “lift” to their analytics processes. In this article I’ll give you some altdata ideas that you may want to integrate into your value-stream. Every day at the Microsoft Technology Center (MTC) we help our customers ingest altdata-sets and determine quickly if they provide value. If you are a Microsoft customer we can help you too; contact me on LinkedIn. These ideas are in no particular order.</p>
<h2 id="altdata-ideas-for-any-industry">altdata Ideas for Any Industry</h2>
<ul>
<li>social media sentiment analysis
<ul>
<li>most social media outlets allow you to programmatically query real-time feeds. We can do simple things like sentiment analysis against a hashtag or we can look at general trends that may be of interest</li>
</ul>
</li>
<li>scrapes from websites
<ul>
<li>websites contain lots of interesting information in an unstructured format. We can scrape that data and do Natural Language Processing (NLP) against it. For instance, we can look for terms or key phrases.</li>
<li>another common use case is doing a web search for an entity or person and then performing sentiment analysis against the first page of search results. This can tell us a lot of interesting things about the entity/person. Is the entity in the news? If so, for what? (See the sketch after the satellite imagery below.)</li>
</ul>
</li>
<li>satellite data
<ul>
<li>this data can give you a lot of valuable information about your customer. Let’s say you sell pool supplies in your local community. You could do a mass mailing to the region, or you could target specific houses where you could identify pools from the satellite imagery. I actually did this work for a customer once. From a “ten thousand foot level” I could quickly determine what localities to target:</li>
</ul>
</li>
</ul>
<p><img align="center" width="500" src="https://davewentzel.com/images/alt31.png" /></p>
<p>If I zoom in to the street level you can see that I used our <a href="https://azure.microsoft.com/en-us/services/cognitive-services/">Azure Cognitive Services</a> and AI to identify the pools and correlate them with <a href="https://www.bing.com/maps">Bing Maps</a> to get their addresses. With the addresses I could quickly do a lookup to my sales database to determine if the address was already a customer (the yellow square) or not a customer (the red square). Note: it’s not perfect, it clearly missed a pool…but this is something I created in just a few days.</p>
<p><img align="center" width="500" src="https://davewentzel.com/images/alt32.png" /></p>
<ul>
<li>hobbyist drone data (or any other image data)
<ul>
<li>similar to the satellite data, there are lots of use cases where a birds-eye view of a space can provide you with value</li>
</ul>
</li>
</ul>
<blockquote>
<p>There are lots of stories about hedge funds using drone data to determine <a href="https://www.cnn.com/2019/07/10/investing/hedge-fund-drones-alternative-data/index.html">retail store traffic</a> or <a href="https://www.impactlab.com/2017/10/18/how-satellites-drones-and-planes-are-making-hedge-funds-money/">the number of container ships in Chinese ports</a>. This can provide broad strokes on the health of a region or the economy in general, or we can laser focus to a particular address.</p>
</blockquote>
<ul>
<li>images
<ul>
<li>image data can come from anywhere: web searches, smart phones.</li>
<li>many times the images have embedded metadata in <a href="https://developer.here.com/blog/getting-started-with-geocoding-exif-image-metadata-in-python3">Exif format</a> (a 20-year-old standard) that will tell you interesting things about the image: the geotag, any comments, the make/model of the phone. All of this information <em>might</em> be valuable to your use case (see the sketch after this list).</li>
</ul>
</li>
<li>weather data
<ul>
<li>There are so many ways weather may provide lift. A quick internet search will give you lots of ideas.</li>
</ul>
</li>
<li>real estate data. What could you do with real estate data? There are so many use cases.
<ul>
<li>The <a href="http://www.mls.com/">MLS</a> (Multiple Listing System) has datasets available for residential real estate. They hold a monopoly on their data and charge accordingly. But if your company sold backyard pool supplies, wouldn’t it be interesting to find all of the local homes for sale with in-ground pools?</li>
<li>There is no equivalent MLS for commercial real estate. This would be an excellent business model for a startup to capitalize on. In the interim, you could create proxies for commercial real estate activity by looking for other altdata-sets.</li>
<li>home-sharing firms like Airbnb and Vrbo have datasets available for purchase. These may help you determine regional trends and economic activity.</li>
</ul>
</li>
</ul>
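<p>As promised above, here is a minimal sketch of pulling Exif metadata (including a GPS geotag) out of an image with Python. It assumes the Pillow package is installed; exact Exif handling varies a bit across Pillow versions, and the file path is a placeholder.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code>
# Hedged sketch: extract Exif metadata, including a geotag, from a photo.
# Assumes the Pillow package is installed; Exif handling varies by Pillow
# version, and "photo.jpg" is a placeholder path.
from PIL import Image
from PIL.ExifTags import TAGS

img = Image.open("photo.jpg")
exif = img.getexif()

for tag_id, value in exif.items():
    tag_name = TAGS.get(tag_id, tag_id)
    print(tag_name, value)

# GPS data lives in its own IFD (tag 0x8825) on recent Pillow versions.
gps = exif.get_ifd(0x8825)
if gps:
    print("geotag:", dict(gps))
</code></pre></div>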
<h2 id="the-app-ecosystem">The App Ecosystem</h2>
<p>Think of all the apps you run on your smartphone. On the surface it would seem the business model for many of them is ad-driven. They monetize ad impressions and carefully optimize the CTR (clickthrough rate). But that’s only part of the story. Many are selling altdata to businesses to semantically enrich their existing data.</p>
<blockquote>
<p>Thought experiment: out of the apps you use, which ones might be tracking users in a way that would provide lift to your business?</p>
</blockquote>
<p>Here’s one example: Uber. With the user’s permission, Uber (likely Lyft too, but I’m not sure) can sell location data to food and retail industry players. Other companies can leverage this data to provide discounts and promotions personalized to the specific customer.</p>
<h2 id="companies-that-have-already-monetized-their-data">Companies that have already monetized their data</h2>
<blockquote>
<p>What is Digital Transformation? My definition is simple. A digitally-transformed company has learned how to monetize its data. That could mean leveraging its data to control costs or increase revenue, but at the extreme it means the company is selling its valuable data assets to others.</p>
</blockquote>
<p>Thought experiment: If you had access to any one single company’s data assets, which one would it be and why? Now, go research if that company has monetized its data.</p>
<p>Here are some companies that have:</p>
<ul>
<li>SmartTV manufacturers have been capturing the IoT-style data from each TV. Every time you change the channel, turn the device ON/OFF, change the volume, etc, the manufacturer knows it. <strong>These manufacturers make more money selling the data YOU generate for them than they make selling the actual hardware.</strong> This is a major reason why prices are falling precipitously. The manufacturers NEED you to upgrade to even smarter units with more built-in apps. <strong>Did you know that apps like Roku, Netflix, and Amazon Prime pay the manufacturers to have their apps installed in the factory?</strong> The data is so valuable. How can you leverage this altdata?</li>
<li><a href="http://techcrunch.com/2012/08/18/payment-data-is-more-valuable-than-payment-fees/">Payment card processors</a> make more money monetizing their datasets than they do on the transaction fees. I would think this would bring transaction fees down in the US, which have the highest fees in the world, but it hasn’t.</li>
<li><a href="https://www.forbes.com/sites/douglaslaney/2020/07/22/your-companys-data-may-be-worth-more-than-your-company/?sh=45025a7a634c">American Airlines’ data</a> is now valued higher than the airline itself! This astounds me. With all of their capital assets (which are tangible assets on their balance sheet) the intangible assets are worth more.</li>
</ul>
<blockquote>
<p>There is a general trend in the worldwide economy where intangible assets are a higher percentage of the balance sheet than ever. The biggest factor is likely monetized data, but someone should do some research to confirm this. What is really interesting is that tangible assets depreciate over time. Intangible assets, like data, don’t. How can you leverage this asset class?</p>
</blockquote>
<h2 id="financial-services-use-cases">Financial Services Use Cases</h2>
<p>The Financial Services industry loves seeking “alpha” (the industry’s equivalent term for “lift”) in altdata sources. Some interesting altdata ideas for the finance industry:</p>
<p><img align="right" width="300" src="https://davewentzel.com/images/altdata31.jpg" /></p>
<ul>
<li>financial reports and SEC filings
<ul>
<li>these documents are available for free from many government websites. The easiest to use is <a href="https://www.sec.gov/edgar/searchedgar/companysearch.html">EDGAR</a>. You can find various filings in PDF format. The data can be scraped and added to a data lake where we can do interesting analytics like <code class="highlighter-rouge">Named Entity Recognition</code> or simple sentiment analysis. We can also look for specific phrases and terms (see the sketch at the end of this section).</li>
</ul>
</li>
<li>private company data. <a href="https://www.dnb.com">Dun & Bradstreet</a> is the de facto standard on private company data and commercial credit.</li>
<li>carbon footprint measurement. ESG investing is hot right now and every company is trying to be perceived as environmentally friendly. Even BP changed their logo to appear more “green”. How can we measure the carbon footprint of a company given the available altdata in the marketplace?</li>
</ul>
<p><img align="center" width="500" src="https://davewentzel.com/images/alt33.png" /></p>
<ul>
<li>LinkedIn provides lots of datasets. The investment industry leverages this data for simple use cases like monitoring employee counts and openings. How could you leverage LinkedIn data to semantically-enrich your data?</li>
</ul>
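<p>Here is the sketch referenced in the SEC-filings bullet above: simple Named Entity Recognition and phrase counting over a filing you have already downloaded as plain text. It assumes spaCy and its small English model are installed; the file name and phrase list are placeholders.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code>
# Hedged sketch: Named Entity Recognition and phrase searching over an SEC
# filing you have already downloaded as plain text. Assumes spaCy and the
# en_core_web_sm model are installed; "filing.txt" is a placeholder.
import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")

with open("filing.txt", encoding="utf-8") as f:
    text = f.read()

doc = nlp(text)

# Count the organizations and people mentioned in the filing.
entities = Counter((ent.text, ent.label_) for ent in doc.ents
                   if ent.label_ in {"ORG", "PERSON"})
print(entities.most_common(10))

# Simple phrase search: how often does management hedge its language?
for phrase in ("material weakness", "going concern", "supply chain"):
    print(phrase, text.lower().count(phrase))
</code></pre></div>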
<h2 id="risk-analytics">Risk Analytics</h2>
<p>Customers always ask me for interesting ideas for altdata in their industry. Fact is, I don’t know your industry as well as you do. Any ideas I may have, you’ve probably researched. My response is to think of use cases that your competition might not also be researching. A big area to focus on is Risk Management. Finding altdata that can mitigate risk should provide lift. Every industry has different risk management profiles, but let’s look at an example to get you thinking creatively.</p>
<p><a href="https://www.cmtelematics.com/">Cambridge Mobile Telematics</a> recently acquired TrueMotion. Both companies provide vehicle telematics data to auto insurers to reduce risk. Well, why couldn’t you leverage similar data? Traditional auto insurance risk rating factors such as age, gender, credit score, zip code data, moving violations, and type of vehicle are less predictive of accident risk than actually looking at driver behavior…via OBDII on-board vehicle telematics. Those traditional risk rating factors are just proxies for likely driver behavior. Younger drivers tend to be more risky, as are middle aged men driving red sports cars. Or, that’s the theory.</p>
<blockquote>
<p>I will NEVER install a telematics device in my vehicle that will send data to my insurer. I can assure you that my risk profile using the traditional rating factors is much, Much, MUCH better than my actual driving behaviors. (I probably shouldn’t admit that).</p>
</blockquote>
<p>CMT will likely create additional datasets to monetize for industries beyond auto insurance. You might be able to glean valuable insights about your customer if you knew their driving habits. How can knowing my customers’ risky behaviors provide me with competitive advantage? The bulk of CMT’s employees are data professionals; I’m sure they are dreaming up new data monetization avenues.</p>
<p>You can acquire telematic driving altdata from lots of vendors.</p>
<ul>
<li>mobile-phone apps ask for your location data (and are likely tracking it)</li>
<li>insurance company-provided dongles</li>
<li>aftermarket blackboxes and in-car video</li>
</ul>
<blockquote>
<p>Thought experiment: Who better to provide auto insurance than the auto manufacturers that have access to all of your vehicle telematics, service history, credit, etc? <a href="https://plants.gm.com/media/us/en/gm/home.detail.html/content/Pages/news/us/en/2020/nov/1118-onstar.html">General Motors has announced</a> they are planning to offer their own auto insurance that they will bundle with OnStar. Brill-yunt! They are monetizing their data. That is Digital Transformation!</p>
</blockquote>
<p>Banking and insurance are highly-regulated industries that tend to change slowly, and usually only out of necessity. This has allowed innovators, from micro-lenders to payment processors, to leverage data and invest heavily in digital services. One of the enablers of this trend is better risk management from altdata.</p>
<p>These companies are leveraging altdata like:</p>
<ul>
<li>prescription-drug histories</li>
<li>EHR/EMR records</li>
<li>DMV records</li>
<li>property records</li>
<li>life insurance clearinghouse data from people’s previous applications.</li>
</ul>
<p>Yep, all of this data, in some de-identified fashion, is available to many industries. Surprising, isn’t it?</p>
<h2 id="consumption-data-analytics">Consumption Data Analytics</h2>
<p>Consumption data is its own category of altdata. Right now it is used mostly in financial services, but its potential elsewhere is enormous. Quite simply, <em>consumption data</em> is business transaction-related information that can augment your predictive analytics.</p>
<p><code class="highlighter-rouge">Consumption Data Analytics</code> is the aggregation of online and offline (brick-and-mortar) consumer purchase activity, merged with consumer behavioral datasets, geolocation data (where was your smartphone when you made that online purchase), and other point-of-sale vendor data (also available for a fee).</p>
<p>Where can you get offline purchase activity? Well, the credit card companies (among many others) provide various levels of aggregated datasets for sale. This includes offline purchase activity.</p>
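<p>Here is a hedged sketch of the core mechanic: joining online purchases to the nearest-in-time geolocation ping with pandas. The data, column names, and time tolerance are invented for illustration.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code>
# Hedged sketch of "consumption data analytics": attach the nearest-in-time
# geolocation ping to each online purchase. The data and tolerances are invented.
import pandas as pd

purchases = pd.DataFrame({
    "ts": pd.to_datetime(["2021-05-01 10:02", "2021-05-01 18:40"]),
    "customer_id": [7, 7],
    "amount": [54.10, 12.99],
})

locations = pd.DataFrame({
    "ts": pd.to_datetime(["2021-05-01 09:58", "2021-05-01 18:25"]),
    "customer_id": [7, 7],
    "zip_code": ["19103", "19406"],
})

# merge_asof requires both frames to be sorted by the time key.
enriched = pd.merge_asof(
    purchases.sort_values("ts"),
    locations.sort_values("ts"),
    on="ts",
    by="customer_id",
    direction="nearest",
    tolerance=pd.Timedelta("30min"),
)
print(enriched)
</code></pre></div>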
<p>Consumption Data Analytics in 2021 focuses on consumer consumption. I expect that to slowly shift to B2B consumption behaviors. An example: right now we have a global computer chip shortage. There are theories as to why that is, but if I am an automobile manufacturer that relies on certain chips for my vehicles, I want to know if my chip supplier is itself experiencing supply chain issues so I can plan accordingly.</p>
<h2 id="data-exhaust">Data Exhaust</h2>
<p><code class="highlighter-rouge">Data exhaust</code> is the trail of data that remains after a business activity has occurred on a computing system. Data exhaust provides valuable insights. Some examples:</p>
<p><img align="right" width="200" src="https://davewentzel.com/images/altdata39.jpg" />* web server logs: this can tell you how long a consumer browsed your site before making a purchase, how long an item remained in their shopping cart before it was abandoned, etc.</p>
<ul>
<li>cookie data: both 1st and 3rd party cookie data will provide valuable information about your customers. Did you know that by default your browser throws off so much metadata about you that the average marketer can likely identify you with no additional data? This is called <a href="https://www.avast.com/c-what-is-browser-fingerprinting">fingerprinting</a>.</li>
</ul>
<p>Data exhaust is a great way to understand the behaviors of your customers…and your potential customers.</p>
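<p>Here is a minimal sketch of mining that exhaust: from a handful of made-up web server hits, compute how long each visitor browsed before reaching checkout. The IPs, timestamps, and paths are placeholders.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code>
# Hedged sketch of mining "data exhaust" from web server logs: how long did
# each visitor browse before reaching checkout? The log entries are invented.
from datetime import datetime

raw_hits = [
    ("203.0.113.9", "2021-05-01 10:00:12", "/home"),
    ("203.0.113.9", "2021-05-01 10:04:55", "/product/pool-filter"),
    ("203.0.113.9", "2021-05-01 10:09:30", "/checkout"),
    ("198.51.100.4", "2021-05-01 11:15:00", "/home"),
]

first_seen, checkout_at = {}, {}
for ip, ts, path in raw_hits:
    when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    first_seen.setdefault(ip, when)
    if path == "/checkout":
        checkout_at[ip] = when

for ip, when in checkout_at.items():
    print(ip, "browsed for", when - first_seen[ip], "before checkout")
</code></pre></div>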
<blockquote>
<p>Treat your software like IoT data. It is throwing off a lot of interesting browsing events for your users. If you can ingest that data and react to it in real-time you should be able to provide a better experience for your users.</p>
</blockquote>
<h2 id="consumer-profile-data">Consumer-profile data</h2>
<p>If you are a B2C company where your customer is a consumer then you need to know as much about them as possible.</p>
<ul>
<li>credit card transaction data: Who knows more about consumers than credit card companies? Card issuers provide altdata-sets of transaction history that are valuable for determining wants, desires, and trends. The data is always anonymized but you can still gain valuable insights depending on how you slice-and-dice the data.</li>
<li>credit reporting agencies: The Big Three credit reporting agencies will sell you data and services to help you target consumer demographics for your marketing campaigns based on interesting metrics like purchase data. <a href="https://www.experian.com/">Experian</a> will actually provide you with software that performs the consumer targeting, but I’d rather have access to the raw data so I can make my own unique matching algorithms.</li>
<li>data aggregators: <a href="https://www.acxiom.com/">Acxiom</a> is an example of a 3rd party data provider that will license data to you about consumers from various other 3rd party data sources. Then they validate the data and help you enrich your existing consumer profile data.</li>
</ul>
<p>The grocery industry has mastered consumer-profile data and it might be worthwhile to research how they do customer analytics. Grocers and CPG suppliers have been sharing data for years to learn about shopper habits and their shopping journey. Stores are analyzing broad buying trends to prevent shortages like we saw with toilet paper and Lysol during the early days of the pandemic. The CPG companies can leverage the POS data from the grocers to generate better consumer engagement and product offers and determine brand loyalty (which also suffered during the early pandemic).</p>
<h2 id="economy-and-economic-data">Economy and Economic Data</h2>
<p>Economic data that broadly shows the state of the economy and your industry is very valuable. Imagine you are a homebuilder…could you get a competitive advantage by knowing that lumber prices are forecast to rise substantially over the next few years because an invasive bug species is decimating Douglas Fir trees in the Pacific Northwest?</p>
<p>Jobs reports and inflation data are commonly used in many industries. If you are a QSR (Quick Serve Restaurant) it’s valuable to understand the <a href="https://en.wikipedia.org/wiki/Prevailing_wage">prevailing wage</a> in your area. How will this affect your margins?</p>
<h2 id="advertising-data">Advertising Data</h2>
<p><a href="https://www.nielsen.com/us/en/">Nielsen</a> is a century-old research firm that measures TV viewership, among MANY other things. They are a monopoly for this data and they provide different datasets for lots of different use cases. Recently they created a new dataset that allows them to make comparisons of how many people are streaming entertainment vs watching traditional broadcast channels. This could be beneficial to your next marketing campaign.</p>
<p>Advertisers have been using altdata for years (sometimes called <code class="highlighter-rouge">incidental data</code>); they just struggle to integrate it into their value-stream. The integration is usually done on a one-off basis, often in Excel. We can do better.</p>
<h3 id="unstructured-data">Unstructured Data</h3>
<p>All data has structure (otherwise it would be worthless), but <code class="highlighter-rouge">unstructured data</code> has come to mean data like images, PDFs, and video from which you can extract value creatively. I mentioned above that many images have metadata that you can extract.</p>
<p>Every organization has a wealth of data that doesn’t sit in a traditional database. This means it’s difficult to do analytics on it. I call this <code class="highlighter-rouge">latent data</code>. It has value, but it’s difficult to extract. If you can find this latent data in your organization you can leverage it with your structured data. Examples:</p>
<ul>
<li>PDFs, Word docs, Excel spreadsheets, business forms, etc. <a href="https://azure.microsoft.com/en-us/services/cognitive-services/">Azure’s Cognitive Services</a> can help you extract data from file-based data sources.</li>
<li>handwritten notes, operator logs, user journals, etc. Handwriting recognition is a solved problem.</li>
</ul>
<p>At the MTC we work with a lot of manufacturing companies. Each one has stressed that they have what I call a <code class="highlighter-rouge">shifting demographics</code> problem. They have older workers nearing retirement and the younger generations are not interested in doing those dirty, manual labor jobs anymore. Recently, companies have been deploying IoT solutions to understand how they can automate some of these processes. Another approach is to look at all of the handwritten operator logs that these workers have maintained for decades and may not be digitized even today. <a href="https://azure.microsoft.com/en-us/services/cognitive-services/">Azure’s Cognitive Services</a> can OCR even the worst handwriting, allowing you to use NLP to find the patterns in the notes.</p>
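<p>As a rough sketch only, here is what calling the Computer Vision Read API over REST might look like from Python. The endpoint path, API version, and response handling are assumptions on my part; verify them against the current Azure Cognitive Services documentation before relying on this.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code>
# Rough sketch only: OCR a scanned operator log with the Computer Vision Read API.
# The endpoint path, API version, and response shape are assumptions; check the
# current Azure Cognitive Services documentation before relying on this.
import time
import requests

ENDPOINT = "https://YOUR-RESOURCE.cognitiveservices.azure.com"   # placeholder
KEY = "YOUR-KEY"                                                  # placeholder

def read_handwriting(image_url):
    headers = {"Ocp-Apim-Subscription-Key": KEY}
    submit = requests.post(
        f"{ENDPOINT}/vision/v3.2/read/analyze",   # assumed API version
        headers=headers,
        json={"url": image_url},
        timeout=30,
    )
    submit.raise_for_status()
    poll_url = submit.headers["Operation-Location"]

    while True:
        result = requests.get(poll_url, headers=headers, timeout=30).json()
        if result.get("status") in ("succeeded", "failed"):
            return result
        time.sleep(1)

print(read_handwriting("https://example.com/operator-log-page1.jpg"))
</code></pre></div>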
<ul>
<li>web scrapes. Companies scrape webpage data for lots of reasons, but essentially they want to find valuable data that is locked up in the HTML. Examples:
<ul>
<li>scraping pricing information from your competitors’ website. There are companies that scrape industry-specific websites (they probably scrape your website already) in order to resell the data back to you! Why? They provide analytics that compare your data to your competition and provide valuable intel. Sometimes it’s as simple as telling you what a given price for a particular product should be during a certain time of day. This is called <code class="highlighter-rouge">competitive analysis</code>. These altdata-sets will show industry trends, growth rates, and demographics.</li>
</ul>
</li>
</ul>
<h2 id="the-hottest-altdata-trend-today">The Hottest altdata Trend Today</h2>
<p>Don’t value judge me. I think we are living in the most contentious, politically-charged environment ever. Probably everyone throughout history has said that.</p>
<p><img align="right" width="200" src="https://davewentzel.com/images/altdata38.jpg" />Now, imagine you are targeting me as a potential high value customer lead. My CLV (customer lifetime value) is 2x your average customer lead. You’ve collected all of the common demographics about me using altdata and existing transactional data. Would you agree that you might want to tailor your advertising to me if you knew what my political views were? Well, you can’t know who I voted for in the last Presidential election (supposedly we have a secret ballot), but in most areas you CAN determine my party registration. And voter registration lists are free in most areas. There are aggregator firms that will sell you this data.</p>
<p>Voter registration data, I believe, will be the hottest altdata-set in the near future.</p>
<h2 id="become-data-driven-at-the-mtc">Become Data-Driven at the MTC</h2>
<p>Are you convinced that your company is ready to leverage some of these altdata ideas?</p>
<p>I am a Microsoft Technology Center (MTC) Architect focused on data solutions. The MTCs are a service Microsoft provides to our customers. We strive to be the Trusted Advisors for our customers. Others have Know-How, we have Know-What. We want to understand your business problems and ideas for altdata analytics. Then, we’ll help you ingest and enrich the data using our cloud solutions. Technology alone cannot solve these problems without smart people and processes that work. We offer services ranging from human-centered Design Thinking Workshops – where we help you determine which use cases are the best for altdata – to hackathons where we quickly ingest some altdata, do the semantic enrichment with you, and quickly determine if the altdata provides lift.</p>
<p>Listen, we aren’t experts in your business, but we are great enablers. Within a few days we can build a rapid prototype and show you the Art of the Possible. We’ll show you what it takes to start a data sharing initiative and we’ll help you solve data problems in days that would’ve taken months in the past.</p>
<p>Does that sound compelling? Contact me on LinkedIn and we’ll get you started on your journey.</p>
<hr />
<h1 id="data-sharing-as-a-replacement-for-etl">Data Sharing as a Replacement for ETL</h1>
<p><em>2021-06-15 | Dave Wentzel | <a href="https://davewentzel.com/content/AltData2/">https://davewentzel.com/content/AltData2/</a></em></p>
<p>ELT/ETL is dead. Think “data sharing”.</p>
<blockquote>
<p>This is Part 2 of a series of articles on leveraging alternative datasets to provide lift. <a href="/content/AltData/">Part 1</a> is an overview of altdata and <a href="/content/AltData3/">Part 3</a> has examples of altdata sets to pique your creativity.</p>
</blockquote>
<p>Altdata is any non-traditional dataset from a third party that can semantically enrich your existing data to provide competitive differentiation. When we have conversations with Chief Data Officers at the Microsoft Technology Center (MTC) we find that they desperately want to leverage altdata, but they can’t. The consensus is that data ingestion takes too long and the data will ossify too quickly.</p>
<p>In my <a href="/content/AltData/">previous altdata article</a> I showed some methods to shorten the <em>time-to-analytics</em>. Using vendor-provided APIs is one method, but APIs honestly aren’t a great solution when you are dealing with larger datasets that are being updated in real-time. Best practices and patterns like <code class="highlighter-rouge">ELT vs ETL</code> help a bit, but here’s the best solution: <strong>Stop doing ETL</strong>. Just don’t do it. That includes calling APIs and any kind of data ingestion. Just don’t do it.</p>
<p>Anecdotally, ETL, in any of the forms that copy data around, is one of the top reasons why data projects fail. It takes too long to ingest data to a local compute environment before an analyst can evaluate the data to see if, in fact, it does provide competitive differentiation. If we can quickly determine that a given dataset is not adding value we can fail-fast without writing a single line of ETL code.</p>
<p><img align="right" width="230" src="https://davewentzel.com/images/cloud.jpg" />The data community is moving in the direction of “data sharing”. In its simplest form, data sharing is where a 3rd party gives you access to their data lake (or database, or warehouse, etc). The data already has a well-known schema and I simply need to copy it locally where I can begin doing Exploratory Data Analytics (EDA). In more advanced forms of data sharing the 3rd party will allow me to attach their storage directly to my cloud compute thereby skipping the “data copying”. I’m doing EDA in minutes.</p>
<blockquote>
<p>The cloud enables robust data sharing. For a data professional the primary difference between <em>doing data</em> in your data center vs <em>doing data</em> in the cloud is the fact that in the cloud I can separate storage from compute. For example, in a traditional RDBMS the query engine (the compute) is tightly coupled with, and optimized for, the storage engine. In the cloud I can take any compute engine I want and attach my storage (for example, an Azure storage account) to it. This allows a data scientist to use python as the compute engine, a business analyst can use Power BI, and a data engineer can use Spark…all on the same data in the data lake. This is the heart of data sharing.</p>
</blockquote>
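<p>Here is a hedged sketch of what “attach their storage to my compute” looks like in practice: querying a partner’s shared Parquet data in place with pandas. It assumes the <code class="highlighter-rouge">adlfs</code>/<code class="highlighter-rouge">fsspec</code> packages are installed; the account, container, path, and SAS token are placeholders.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code>
# Hedged sketch of "attach their storage to my compute": query a partner's shared
# Parquet data in place, without an ETL copy. Assumes pandas plus the adlfs/fsspec
# packages; the account, container, path, and SAS token are placeholders.
import pandas as pd

df = pd.read_parquet(
    "abfs://shared-container@partneraccount.dfs.core.windows.net/sales/2021/",
    storage_options={"sas_token": "PLACEHOLDER-SAS-TOKEN"},
)

# Exploratory data analysis starts immediately: no local copy, no ETL code.
print(df.head())
print(df.describe(include="all"))
</code></pre></div>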
<h2 id="why-is-etl-such-a-risky-activity-for-most-data-projects">Why is ETL such a risky activity for most data projects?</h2>
<p>ETL is the process of copying data from one place to another, likely doing some semantic-enrichment in the process. Here are some of the issues with ETL:</p>
<ul>
<li>There aren’t enough ETL developers in most shops. These are specialized skills in high demand and the backlogs are long for most shops. If we can avoid ETL, and hence avoid engaging these precious skillsets, we should be able to eliminate some risk. We do this by querying data where it lives.</li>
<li>Copying data involves multiple data interchange formats and some of them, like csv files, are not well-defined and cause grief and data quality problems. <a href="https://stackoverflow.com/questions/10286204/what-is-the-right-json-date-format">Even JSON, IMO, is a horrible data interchange format</a>…it’s bloated and doesn’t handle dates well, but it’s the backbone of almost all REST-based APIs.</li>
<li>We need to have a destination for the data copy. This becomes contentious in many shops. A typical question: “Where in the data warehouse are we going to stick this new altdata?” Fact is, we haven’t yet done the analysis to see if this data is valuable, we need to put it in temporary storage until we can enrich it and determine its value. We do this using data lakes where we can join multiple datasets together to find the nuggets of gold. If you don’t have a data lake (or don’t have access to a data sandbox) this becomes difficult. Instead, we can query the data where it lives.</li>
<li>ETL coding takes time. With altdata the goal is to do analytics really quickly. In many cases we want to leverage external datasets <em>as the business transaction is occurring</em> to add value. We can’t do that if the ETL process runs as an overnight batch job.</li>
<li>In most organizations, adding data to a data warehouse (via ETL) requires a security review and data governance activities. But again, we haven’t determined if the new data is valuable. It would be better to profile the data where it lives and defer these decisions.</li>
</ul>
<h2 id="internal-and-external-data-sharing">Internal and External Data Sharing</h2>
<p>There are 2 types of data sharing: internal and external.</p>
<p>External is the simplest: I connect to a dataset outside of my organization. I might need to purchase a subscription to this data or it might be an open dataset (like <a href="https://azure.microsoft.com/en-us/services/open-datasets/">Azure’s Open Datasets</a>).</p>
<p>Internal data sharing is where different lines-of-business provide access to their data marts or data lakes. Yes, we tend to see many of our customers with multiple data lakes. That’s OK in the Age of Data Sharing. If these data sources become data silos then we can’t get the most economic value out of our data.</p>
<h2 id="the-biggest-obstacle-to-data-sharing-culture">The Biggest Obstacle to Data Sharing: Culture</h2>
<p>The biggest obstacle to data sharing, anecdotally, is <em>culture</em>. When we talk to companies at the MTC we often hear that they aren’t yet ready to ingest external data sources, usually due to the reasons listed above. When we dig a little deeper we find a more pervasive problem. We often hear about <em>information hoarding</em> where some business units are afraid to provide other departments with access to data because it may result in the <em>loss of their influence and power</em>.</p>
<p>Wow!</p>
<p><img align="right" width="330" src="https://davewentzel.com/images/silo.jpg" />These companies tend to have many department-level data lakes/marts/warehouses with duplicated data, data with different levels of aggregation, governance, and quality, and no standard data interchange formats. In economics, scarcity leads to demand. But data doesn’t have to be scarce. It’s pretty easy to copy it around. But data <em>is rivalrous</em>, that is, whoever owns the data controls the power. The result: data silos and information hoarding.</p>
<p>This mindset is difficult to overcome, but it can be done. Start with a few use cases where we can quickly prove that data sharing is akin to <em>a rising tide that lifts all boats</em>. One way is to share data talent among teams. We see that some teams have great data scientists and analysts but lack ETL developers, while others have the reverse. By sharing resources we gain cross-training and intra-departmental trust.</p>
<p>Here’s an example: let’s say low-level customer sales attributes sit in a data mart owned by the sales department. Clearly finance and marketing will have access to aggregated sales data, but if the sales department owns the data silo with the raw, valuable data, other teams can’t benefit. With a zero-dollar marginal cost, those low-level sales metrics can be used by marketing to improve <code class="highlighter-rouge">CAC</code> (customer acquisition cost) and reduce customer churn. The R&D team can use that same data to determine which new features to prioritize. This creates a virtuous cycle: as more departments see the value of data sharing, more data will be shared. Using data sharing paradigms (connecting remote datastores to your compute) the marginal cost of data reuse is nearly zero and you can quickly measure the value-add.</p>
<blockquote>
<p>We are living in a “sharing economy”. Some business leaders think that protecting their data is a source of power. It probably is. But more power can be gained by sharing it. How can we do this?</p>
</blockquote>
<p>Here are some methods I’ve seen that will allow you to implement data sharing quickly.</p>
<h2 id="the-data-marketplace--data-should-be-a-shopping-experience">The Data Marketplace: Data should be a shopping experience</h2>
<p>This is the model I like best. A data marketplace is a lot like an e-commerce site. I can shop around to find the data I need. This enables self-service analytics. In some cases the 3rd party will allow “data virtualization” where you can attach directly to the data and query it without bringing it into your data lake (in-place access). If the data is deemed valuable after it is analyzed then we can determine if we want to copy the data into our local data stores.</p>
<p>The problem is: there are no really good, <em>comprehensive</em> data marketplaces for 3rd party data right now. The major cloud vendors have cataloged public datasets already. For paid and subscription altdata, you still have to know where to go to find what you need. It won’t be long before we have cloud-based data marketplaces that will facilitate data interchange.</p>
<p><img align="right" width="500" src="https://davewentzel.com/images/alt06.png" />Internally, you can create your own data marketplace to combat information hoarding. The simplest way to do this is with a good data catalog. If you are a Microsoft shop our <a href="https://azure.microsoft.com/en-us/services/purview/#features">Azure Purview</a> is a great option. It will allow you to tag data sources, search them, create sample data, create documentation, show lineage, and list contact information.</p>
<h2 id="the-data-lake-sharing-model">The Data Lake Sharing Model</h2>
<p>Essentially you distribute keys or tokens to users that would like to access your data (lake). Those users can then connect to your data by mounting it to their compute engine. By far this is the most common model I’ve seen for external data sharing. This is a common pattern in industries where data is commonly shared between business partners.</p>
<p>The only downside to this model is most cloud providers will charge data egress fees to the owner of the storage. This means that the data producer will be charged based on how much data is extracted by the consumer. This could be expensive but can be creatively handled with things like chargeback models and throttling.</p>
<p><img align="right" width="400" src="https://davewentzel.com/images/alt05.png" />Azure Data Share is another Microsoft offering that provides an additional wrapper around your data and will allow you to share additional data assets like data warehouse data and SQL Server data without having to understand the minutiae of SAS tokens.</p>
<h2 id="the-data-as-a-service-model">The Data-as-a-Service Model</h2>
<p>The DaaS model is very close to the Data Marketplace and Data Lake Sharing models. It goes a step further toward true “Self-Service Analytics”. The data storage implementation is totally abstracted away from the analyst, who is given a query interface directly over all of the data sources and simply writes queries; the data access hassles disappear.</p>
<h2 id="the-enterprise-service-bus-model">The Enterprise Service Bus Model</h2>
<p>In this model the data producer allows you to subscribe to their “events”. It is your responsibility to ingest those events (probably in a data lake) and perform any analytics. This is definitely not as easy to do as standard data sharing, but this method has been around for years. An example: you would like to ingest real-time telemetry about your fleet vehicles from Ford and GM. After you request this data you will be allowed to subscribe to the real-time data hub where you can choose how often you want to pull updates. This is a common enterprise integration pattern but there is no single standard so you can expect to have your IT staff spend some time just ingesting the data to get it ready for your analysts. But, you <em>will</em> have access to real-time data.</p>
<h2 id="an-example-of-data-sharing--weather-data">An Example of Data Sharing: Weather data</h2>
<p>NOAA provides hourly worldwide weather history data for free via Microsoft Azure Synapse Dataset Gallery. The Integrated Surface Dataset (ISD) is composed of worldwide surface weather observations from over 35,000 worldwide stations. Parameters included are: air quality, atmospheric pressure, atmospheric temperature/dew point, atmospheric winds, clouds, precipitation, ocean waves, tides and more. ISD refers to the data contained within the digital database as well as the format in which the hourly, synoptic (3-hourly), and daily weather observations are stored.</p>
<p>Let’s say you quickly want to see if weather data can provide supply chain efficiencies.</p>
<p>The first thing I need to do is EDA. I need to familiarize myself with what is available in the dataset. I don’t want to copy the data locally and I want to see the most up-to-date data. In about 5 minutes I wrote this query:</p>
<p><img align="center" src="https://davewentzel.com/images/alt21.png" /></p>
<p>This gets into the weeds, but notice I am querying the ISDWeather dataset directly and never copied it locally. I added a WHERE filter to show just the most recent data. I can quickly see that I must do some research to determine what <code class="highlighter-rouge">usaf</code> and <code class="highlighter-rouge">wban</code> are used for. I also note that I’m missing <code class="highlighter-rouge">temperature</code> in my sample data. I may need to determine if that will be a problem. I do have the lat/long coordinates so I should be able to use that, as well as the datetime, to marry this data with my supply chain data.</p>
<p>And notice this query returned data in 4 seconds! That’s excellent <em>time-to-analytics</em>!!! This is so much better than doing ETL.</p>
<p>I need to understand this dataset better. But I was able to do all of this in just a few minutes.</p>
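<p>The screenshot above is the query I actually ran. For readers who prefer Python, here is a comparable (not identical) sketch using the <code class="highlighter-rouge">azureml-opendatasets</code> package, assuming it is installed, which pulls a recent slice of the same ISD weather data without any ETL.</p>
<div class="language-python highlighter-rouge"><pre class="highlight"><code>
# Hedged sketch: the same kind of exploration against the NOAA ISD open dataset,
# using the azureml-opendatasets package (assumed installed). This is not the
# exact query shown in the screenshot above.
from datetime import datetime, timedelta
from azureml.opendatasets import NoaaIsdWeather

end = datetime.utcnow()
start = end - timedelta(days=2)

isd = NoaaIsdWeather(start_date=start, end_date=end)
df = isd.to_pandas_dataframe()

# Quick EDA: what columns are available, and what does the most recent data look like?
print(list(df.columns))
print(df.head())
</code></pre></div>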
<blockquote>
<p>This is the promise of data sharing: Faster time-to-value.</p>
</blockquote>
<h2 id="the-mtc-can-help">The MTC can help</h2>
<p>The Microsoft Technology Center is a service that helps Microsoft customers on their Digital Transformation journey. We know that <img align="right" width="250" src="https://davewentzel.com/images/mtc1.jpg" />successful data projects are less about the technology and more about the process and people. Data sharing is a great way to avoid ETL and put data in the hands of your analysts quickly. At the MTC, we’ve been doing <em>data</em> for years. We are thought leaders, conference speakers, and former consultants and executives. We’ve learned the patterns that will help you execute successful data sharing projects. And with the cloud we can execute in hours-to-days instead of months.</p>
<p>Does this sound compelling? SUCCESS for the MTC is solving challenging problems for respected companies and their talented staff. Does that sound like folks you can trust on your next project? The Digital Transformation is here, and we know how to help. Would you like to engage?</p>
<blockquote>
<p>In my next article I’ll give you some creative ideas for altdata that you can leverage today to provide competitive advantage.</p>
</blockquote>
<hr />
<h1 id="gaining-information-edge-with-altdata">Gaining Information Edge with AltData</h1>
<p><em>2021-06-14 | Dave Wentzel | <a href="https://davewentzel.com/content/AltData/">https://davewentzel.com/content/AltData/</a></em></p>
<p>We are living in a data sharing environment. Here is how you can leverage altdata in your data analytics.</p>
<hr />
<p>“Lift” is something every data scientist and business person strives for. Getting <em>better data</em> is one approach to adding value. Once you’ve added all of your internal, proprietary data to your analytics and you’ve transformed and massaged it all you can, you need to start looking for valuable datasets in non-traditional places. This is called <code class="highlighter-rouge">altdata</code> or <em>alternative data</em>. I am a Microsoft Technology Center data evangelist. Everyday I help forward-thinking companies leverage altdata to gain a profitable edge. Let me show you how I do it and give you some ideas of altdata that you can leverage in your business. We’ll look at multiple verticals and lines of business and … I’ll even let you in on what I think is the <em>hottest</em> alt-dataset secret that you’ll want to leverage.</p>
<h2 id="quick-history">Quick History</h2>
<p><img align="right" width="500" src="https://davewentzel.com/images/alt01.png" /> 3rd party datasets have been around since the advent of data analytics. Within the last 10 years ingesting these datasets has been a lot easier and that’s led to an explosion of business people wanting to leverage this data. This has led to more companies selling their proprietary datasets (called data monetization). This virtuous cycle has attained critical mass recently.</p>
<p>Probably the first wave of this most recent boom was ingesting Twitter data. Twitter provided a free API that was easy enough for anyone to do sentiment and trending analysis to help with things like marketing campaigns and product development.</p>
<p><img align="left" width="150" src="https://davewentzel.com/images/alt02.png" />The next dataset everyone wanted to explore was weather data. “How can I use weather to provide lift to my value stream?” I did a lot of small projects with customers who experimented with weather to predict sales or deal with supply chain issues. In almost every case <strong>weather provided zero lift for the initial intended use case</strong>. But here’s the thing, with modern data ingestion processes and data lake paradigms we could experiment with weather data in a matter of weeks without a big capital software project investment. We could fail-fast, cheaply.</p>
<p>But these projects were not failures. For many of my customers this was their first foray into using altdata and they quickly found out how easy it was to do. For years business analysts (BAs) were clamoring for IT to ingest 3rd party datasets but the effort was always huge, and so was <img align="right" width="150" src="https://davewentzel.com/images/alt02.jpg" />the backlog. With data lakes, ELT (vs ETL) paradigms, and EDA (exploratory data analytics), we could show that the turnaround for basic altdata ingestion to the lake was hours-to-days instead of weeks. Time-to-analytics was radically compressed. Suddenly BAs and data scientists were dreaming up all kinds of experiments with altdata. Even though the first use case for weather data generally failed, suddenly everyone was thinking up other use cases for weather and we already had the data flowing into the lake, in real-time, and it wasn’t difficult to experiment on subsequent use cases. This is anecdotal, but about <strong>50% of subsequent use cases for weather data actually showed lift</strong>. This was encouraging.</p>
<blockquote>
<p>This is the pattern I see: the first foray into altdata isn’t a success, but it lays the foundation for future successes.</p>
</blockquote>
<p>And, once you have a good pattern to follow to ingest raw altdata into your data lake, you’ll find that the marginal cost to ingest the <em>next</em> dataset is reduced. And if any of the altdata experiments does provide lift to one use case it will likely provide lift to other use cases in other departments, at a near-zero marginal cost. But, there must be an organizational culture of <em>data sharing</em>, which I’ll talk about in Part 2 of this article.</p>
<h2 id="how-do-you-leverage-altdata"><em>How</em> do you leverage altData</h2>
<p>Leverage altdata by conducting experiments. You have to ingest the data quickly, preferably in real-time, and get it into the hands of the data scientists and analysts. You need a platform that can help you move quickly. The cloud, like <a href="portal.microsoft.com">Microsoft Azure</a>, can help you do small experiments without needing a capital budget. The cloud is a utility…if your experiment doesn’t work, just turn it off. <img align="right" width="200" src="https://davewentzel.com/images/alt03.png" />Here’s a quick list of other tools you should be using or researching:</p>
<ul>
<li>data lakes: these aren’t just an archive of CSV files. A proper data lake is structured to give different users access to the data without the data going through a formal data warehouse modeling and ETL exercise (which takes MONTHS). A good data lake allows data scientists to analyze the data with their preferred tools (Python, Spark) while business analysts use their tools of choice (SQL or Power BI, for instance).</li>
<li>ELT tools with a connector ecosystem to many, many 3rd party datasets: <a href="https://azure.microsoft.com/en-us/services/data-factory/">Azure Data Factory</a> is an ELT tool that allows an analyst with little formal data engineering training to ingest data into a data lake quickly. <strong>Never do ETL; always do ELT. Ingest the data with full fidelity to the data source. Don’t try to massage the data and clean it. Raw, dirty data has a lot of embedded <code class="highlighter-rouge">signal</code> in the <code class="highlighter-rouge">noise</code></strong>. (A minimal ingestion sketch follows this list.)</li>
<li>APIs: most altdata sources still require you to pull data from an API. REST APIs and <code class="highlighter-rouge">odata</code> are great for many things, but copying large amounts of data is not their strong suit. Still, we have to use what we are given. Ensure you have personnel who understand REST API patterns.</li>
<li>Real-time, streaming data: think of any data you purchase or ingest as having an expiration date, so ingest it as quickly as possible. All data, even if it arrives as a batch CSV on the 1st of every month, should be treated and handled as though it is constantly being generated, like IoT data. <strong>If you understand IoT data ingestion patterns and treat all data like IoT data, then all ingestion activities can follow the same pattern. This simplifies your architecture. KISS.</strong></li>
<li>data sharing concepts: This is where the altdata community is quickly headed. My next article will be devoted entirely to this concept.</li>
</ul>
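<p>To make the ELT pattern above concrete, here is a minimal ingestion sketch. It assumes a hypothetical altdata REST endpoint and an Azure Data Lake Storage account; the endpoint, account name, and folder layout are illustrative, not any specific vendor’s API. The point is that the raw response lands in the lake untouched; transformation happens later, at query time.</p>

<pre><code class="language-python"># Minimal ELT ingestion sketch. The endpoint, storage account, and paths are hypothetical.
import datetime
import requests
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

ALTDATA_URL = "https://api.example-weather-vendor.com/v1/observations"  # hypothetical vendor endpoint
ACCOUNT_URL = "https://mydatalake.dfs.core.windows.net"                 # hypothetical storage account

def ingest_raw_page(run_date):
    # 1. Extract: pull one page of raw JSON from the vendor API.
    response = requests.get(ALTDATA_URL, params={"date": run_date.isoformat()}, timeout=30)
    response.raise_for_status()

    # 2. Load: land the payload byte-for-byte in the raw zone of the lake.
    #    No cleaning, no schema enforcement. Full fidelity to the source.
    service = DataLakeServiceClient(account_url=ACCOUNT_URL, credential=DefaultAzureCredential())
    raw_zone = service.get_file_system_client("raw")
    path = f"altdata/weather/{run_date:%Y/%m/%d}/observations.json"
    raw_zone.get_file_client(path).upload_data(response.content, overwrite=True)
    return path

if __name__ == "__main__":
    print("Landed raw file at:", ingest_raw_page(datetime.date.today()))
</code></pre>

<p>The same skeleton works whether the source is a nightly batch feed or a streaming endpoint polled every minute, which is what makes the “treat all data like IoT data” advice practical: the ingestion step never changes, only the schedule does.</p>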
<p>As long as you understand the basic concepts above, you should be able to leverage altdata profitably.</p>
<blockquote>
<p>Don’t get hung up on the technologies; it’s much more important to understand the <em>patterns</em> and <em>processes</em>.</p>
</blockquote>
<h2 id="isnt-everyone-doing-this-now--how-do-i-get-information-edge">Isn’t everyone doing this now? How do I get <em>information edge</em>?</h2>
<p>Certainly more and more companies are doing altdata experiments. But just ingesting data isn’t enough. Anyone can buy a dataset and make predictions, but if everyone is buying the same dataset, where’s the competitive advantage? <img align="right" width="400" src="https://davewentzel.com/images/alt04.png" />You need to spend time working with the data. This is often called <a href="https://en.wikipedia.org/wiki/Exploratory_data_analysis">Exploratory Data Analysis</a> (EDA): looking for patterns in the data and applying statistical methods and visualizations to understand it better.</p>
<p>Data scientists call this <code class="highlighter-rouge">feature engineering</code>, and it is what gives you <em>information edge</em>. Data ingestion doesn’t magically provide lift; you need to find the valuable nuggets of gold.</p>
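<p>As a rough illustration of what that feature-engineering step can look like, the sketch below joins a hypothetical raw weather extract onto daily sales and derives a couple of simple engineered features. The file names and column names are assumptions; a real EDA pass would involve far more visualization and iteration.</p>

<pre><code class="language-python"># Tiny EDA / feature-engineering sketch. File and column names are hypothetical.
import pandas as pd

# Daily sales from the e-commerce system and raw weather altdata from the lake.
sales = pd.read_parquet("lake/curated/daily_sales.parquet")            # date, store_id, revenue
weather = pd.read_json("lake/raw/altdata/weather/observations.json")   # date, store_id, precip_mm, temp_c

df = sales.merge(weather, on=["date", "store_id"], how="left")
df = df.sort_values(["store_id", "date"])

# Engineered features: the raw measurements rarely help on their own; derived signals sometimes do.
df["rainy_day"] = (df["precip_mm"] > 5).astype(int)
df["temp_swing"] = df.groupby("store_id")["temp_c"].diff().abs()

# Quick look at whether the engineered features relate to revenue at all.
print(df.groupby("rainy_day")["revenue"].mean())
print(df[["revenue", "temp_swing"]].corr())
</code></pre>

<p>None of this guarantees lift, but it is exactly the kind of cheap, fast experiment the lake makes possible: if the correlation turns out to be noise, delete the notebook and move on.</p>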
<h2 id="what-does-a-good-altdata-set-look-like">What does a good altdata-set look like?</h2>
<ul>
<li>Focus on datasets you can get exclusive access to. That means your competition isn’t experimenting with them too. Remember, raw altdata rarely provides competitive advantage on its own; it is your analysts who will find the nuggets of gold after experimenting with it.</li>
<li>Can you quickly use the altdata to semantically-enrich your existing data? If you can extract more value from the data than your competition, that’s a great altdata-set.</li>
<li>How quickly can you leverage the data before it expires? If you are buying real-time datasets you better be able to leverage them quickly in the value stream.</li>
<li>Can you get access to the altdata’s SMEs? Access to SMEs with deep knowledge of the data and the domain is invaluable. Always ask your data vendor for some of their SMEs’ time, and be prepared to ask them good questions.</li>
</ul>
<blockquote>
<p>I once consulted for a FinTech company that was blindly purchasing credit agency data on its customers. We received almost 2000 data points for each customer in a nightly feed. The dataset’s documentation was hundreds of pages, and I still couldn’t figure out where to find the value. So I got an hour of their SME’s time and asked a very simple question: <em>If you were me and were trying to provide lift to our company with this data, where would you start?</em> What followed was a two-hour conversation that we thankfully recorded and replayed MANY times.</p>
</blockquote>
<h2 id="where-should-i-inject-altdata-into-my-value-stream">Where should I inject altdata into my value stream?</h2>
<p>Every business is different, but every business has what we call <code class="highlighter-rouge">transactional data</code>. Think of the <code class="highlighter-rouge">transaction</code> as what generates revenue. Assume you sell goods on an e-commerce website. Your transactional data is the record of the sales data when your customer submits an order. Generally the transaction in an e-commerce system starts when the user first lands on your homepage, navigates through some products, does a search, adds a product to the shopping cart, and checks out.</p>
<p>There is a LOT of data you likely haven’t captured about this transaction. Some examples:</p>
<ul>
<li>Who was the <code class="highlighter-rouge">referrer</code> that sent the customer to your website? A Bing search? A Google ad? If we knew this, we could tailor our marketing efforts.</li>
<li>Can we attribute the sale to one of our existing marketing campaigns? This will tell us where to focus future campaigns.</li>
<li>What do we know about this customer? Their interests, social media profiles, etc. This will give us a more holistic view of our customer.</li>
<li>Can we follow up and get the customer’s sentiment on their experience with us? This will tell us what we should be doing in the future.</li>
</ul>
<p>This is all peripheral data to the transaction event. We can’t capture this data easily in our e-commerce system, but we can find all of this information in altdata from other sources. This <code class="highlighter-rouge">peripheral data</code> likely adds the most lift.</p>
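<p>Here is a hedged sketch of what that enrichment looks like in practice: a left join from the revenue-generating transaction out to the peripheral data. The table names, keys, and columns are assumptions for illustration; the pattern, not the schema, is the point.</p>

<pre><code class="language-python"># Enriching transactions with peripheral altdata. Table, key, and column names are hypothetical.
import pandas as pd

orders    = pd.read_parquet("lake/curated/orders.parquet")               # order_id, session_id, customer_id, amount
referrers = pd.read_parquet("lake/raw/clickstream/referrers.parquet")    # session_id, referrer, campaign_id
social    = pd.read_parquet("lake/raw/altdata/social_profiles.parquet")  # customer_id, interests

enriched = (
    orders
    .merge(referrers, on="session_id", how="left")    # where did the customer come from?
    .merge(social, on="customer_id", how="left")      # what else do we know about them?
)

# Attribute revenue to the campaigns that actually brought buyers in.
print(enriched.groupby("campaign_id")["amount"].sum().sort_values(ascending=False))
</code></pre>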
<blockquote>
<p>Remember, it’s not the data itself that is valuable. It’s the patterns and insights gleaned from the data that add the value.</p>
</blockquote>
<h2 id="become-data-driven-at-the-mtc">Become Data-Driven at the MTC</h2>
<p>Are you convinced that your company is ready to leverage altdata?</p>
<p>Most companies are taking steps to become more data-driven but still aren’t leveraging external datasets. Anecdotally, when I ask why, the overwhelming responses are:</p>
<ul>
<li>“we don’t have the resources that can ingest the datasets our analysts want”</li>
<li>“we don’t have analysts and processes that can leverage this data. There is no playbook on how to gain actionable insights from these data sources.”</li>
</ul>
<p><img align="right" width="250" src="https://davewentzel.com/images/mtc1.jpg" />I am a Microsoft Technology Center (MTC) Architect focused on data solutions. The MTCs are a service Microsoft offers to its customers and we strive to be their Trusted Advisors. We offer services where we can show you how to leverage altdata quickly with a fail-fast mentality. With data lakes, ELT paradigms, and Exploratory Data Analytics we can show your team the simple patterns that have been proven to work.</p>
<p>Come to us with a problem you think might be a good use case for altdata. Within a few days we can build a rapid prototype and show you the Art of the Possible. We’ll show you what it takes to start a successful data initiative and we’ll help you solve problems in days that would’ve taken months just a few years ago.</p>
<p>Does that sound compelling? Contact me on LinkedIn and we’ll get you started on your altdata journey.</p>
<p>In Part 2 of this series we’ll look at how Data Sharing is revolutionizing altdata initiatives at many companies. In Part 3 we’ll look at some concrete use cases for altdata in different verticals.</p>
<hr />
<p>Are you convinced your data or cloud project will be a success?</p>
<p>Most companies aren’t. I have lots of experience with these projects. I speak at conferences, host hackathon events, and am a prolific open source contributor. I love helping companies with Data problems. If that sounds like someone you can trust, contact me.</p>
<p>Thanks for reading. If you found this interesting please <a href="https://davewentzel.com/blog/feed/">subscribe to my blog</a>.</p>
<h3 class="t60" id="related-posts">Related Posts</h3>
<ul class="side-nav">
<li><a href="https://davewentzel.com/content/DataLiteracySeries/"><strong>Data Literacy Workshops</strong></a></li>
<li><a href="https://davewentzel.com/content/SoftwareDecisionCalc/"><strong>Software Implementation Decision Calculus</strong></a></li>
<li><a href="https://davewentzel.com/content/DSaaS/"><strong>MTC Data Science-as-a-Service</strong></a></li>
<li><a href="https://davewentzel.com/content/Governance/"><strong>Top 10 Data Governance Anti-Patterns for Analytics</strong></a></li>
<li><a href="https://davewentzel.com/content/DeadDashboard/"><strong>The Dashboard is Dead, Probably?</strong></a></li>
</ul>