altdata Ideas You Can Leverage Today

Here are some actionable ideas to get you thinking about how to leverage altdata.

This is Part 3 of a series of articles on leveraging alternative datasets to provide lift. Part 1 is an overview of altdata and Part 2 is how data sharing is displacing ETL processes and shrinking time-to-value in data projects.

Data-driven companies are leveraging alternative datasets (I call it altdata) to provide “lift” to their analytics processes. In this article I’ll give you some altdata ideas that you may want to integrate into your value-stream. Every day at the Microsoft Technology Center (MTC) we help our customers ingest altdata-sets and quickly determine whether they provide value. If you are a Microsoft customer we can help you too; contact me on LinkedIn. These ideas are in no particular order.

altdata Ideas for Any Industry

  • social media sentiment analysis
    • most social media outlets allow you to programmatically query real-time feeds. We can do simple things like sentiment analysis against a hashtag or we can look at general trends that may be of interest
  • scrapes from websites
    • websites contain lots of interesting information in an unstructured format. We can scrape that data and do Natural Language Processing (NLP) against it. For instance, we can look for terms or key phrases.
    • another common use case is doing a web search for an entity or person and then performing sentiment analysis against the first page of search results. This can tell us a lot of interesting things about the entity/person. Is the entity in the news? If so, for what?
  • satellite data
    • this data can give you a lot of valuable information about your customer. Let’s say you sell pool supplies in your local community. You could do a mass mailing to the region, or you could target specific houses where you could identify pools from the satellite imagery. I actually did this work for a customer once. From a “ten thousand foot level” I could quickly determine what localities to target:

If I zoom in to the street level you can see that I used our Azure Cognitive Services and AI to identify the pools and correlate them with Bing Maps to get their addresses. With the addresses I could quickly do a lookup to my sales database to determine if the address was already a customer (the yellow square) or not a customer (the red square). Note: it’s not perfect, it clearly missed a pool…but this is something I created in just a few days.
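
The customer-or-prospect lookup described above can be sketched in a few lines. This is a minimal illustration, not the code I used for that customer: the function names, the sample addresses, and the tiny abbreviation table are all hypothetical, and it assumes the detected pool addresses and the sales database addresses are already available as plain strings.

```python
import re

# Illustrative abbreviation table (an assumption, far from exhaustive).
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "dr": "drive"}


def normalize_address(address):
    """Lowercase, drop punctuation, and expand abbreviations so that
    near-identical spellings of one address compare equal."""
    cleaned = re.sub(r"[^\w\s]", "", address.lower())
    words = [ABBREVIATIONS.get(word, word) for word in cleaned.split()]
    return " ".join(words)


def classify_pool_addresses(detected, customers):
    """Label each detected pool address as an existing customer or a prospect."""
    known = {normalize_address(a) for a in customers}
    return {
        a: ("customer" if normalize_address(a) in known else "prospect")
        for a in detected
    }


# Hypothetical data: addresses returned by the imagery step vs. the sales database.
detected_pools = ["12 Oak St.", "48 Maple Avenue", "7 Birch Rd"]
sales_db_addresses = ["12 oak street", "7 BIRCH ROAD"]

labels = classify_pool_addresses(detected_pools, sales_db_addresses)
for address, label in labels.items():
    print(f"{address}: {label}")
```

Real address matching is messier than this (unit numbers, misspellings, geocoding differences), so treat the normalization step as a placeholder for whatever matching service you already trust.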

  • hobbyist drone data (or any other image data)
    • similar to the satellite data, there are lots of use cases where a birds-eye view of a space can provide you with value

There are lots of stories about hedge funds using drone data to determine retail store traffic or the number of container ships in Chinese ports. This can provide broad strokes on the health of a region or the economy in general, or we can laser focus to a particular address.

  • images
    • image data can come from anywhere: web searches, smart phones.
    • many times the images have embedded metadata in Exif format (a standard that dates back to the mid-1990s) that will tell you interesting things about the image: the geotag, any comments, the make/model of the phone. All of this information might be valuable to your use case.
  • weather data
    • There are so many ways weather may provide lift. A quick internet search will give you lots of ideas.
  • real estate data. What could you do with real estate data? There are so many use cases.
    • The MLS (Multiple Listing Service) has datasets available for residential real estate. They hold a monopoly on their data and charge accordingly. But if your company sold backyard pool supplies, wouldn’t it be interesting to find all of the local homes for sale with in-ground pools?
    • There is no equivalent MLS for commercial real estate. This would be an excellent business model for a startup to capitalize on. In the interim, you could create proxies for commercial real estate activity by looking for other altdata-sets.
    • home-sharing firms like airbnb and vrbo have datasets available for purchase. This may help determine regional trends and economic activity.
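
To make the sentiment analysis ideas from the list above concrete (a hashtag feed, or the first page of search results), here is a deliberately simple word-list scorer. It is a sketch under loud assumptions: the word lists and sample posts are made up, and a real project would more likely call a trained model or a cloud text analytics service than count words like this.

```python
import re

# Tiny hand-made lexicons (assumptions for illustration only).
POSITIVE = {"love", "great", "good", "happy", "excellent"}
NEGATIVE = {"hate", "bad", "terrible", "awful", "broken"}


def sentiment_score(text):
    """Return a score in [-1, 1]: +1 all positive words, -1 all negative,
    0 when no lexicon words appear."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    total = positives + negatives
    return 0.0 if total == 0 else (positives - negatives) / total


# Hypothetical posts pulled from a hashtag feed.
posts = [
    "Love the new pool heater, great service! #poolseason",
    "Terrible delivery, the pump arrived broken #poolseason",
    "Opening the pool this weekend #poolseason",
]

scores = [sentiment_score(post) for post in posts]
average = sum(scores) / len(scores)
print(scores, average)
```

The same function works on scraped search-result snippets: score each snippet, then look at the average and the outliers.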

The App Ecosystem

Think of all the apps you run on your smartphone. On the surface it would seem the business model for many of them is ad-driven. They monetize the ad impressions and are carefully targeting the CTR (clickthrough rate). But that’s only part of the story. Many are selling altdata to businesses to semantically enrich their existing data.

Thought experiment: out of the apps you use, which ones might be tracking users in a way that would provide lift to your business?

Here’s one example: Uber. With the user’s permission, Uber (likely Lyft too, but I’m not sure) can sell location data to food and retail industry players. Other companies can leverage this data to provide discounts and promotions personalized to the specific customer.

Companies that have already monetized their data

What is Digital Transformation? My definition is simple. A digitally-transformed company has learned how to monetize its data. That could mean leveraging its data to control costs or increase revenue, but at the extreme it means the company is selling its valuable data assets to others.

Thought experiment: If you had access to any one single company’s data assets, which one would it be and why? Now, go research if that company has monetized its data.

Here are some companies that have:

  • SmartTV manufacturers have been capturing the IoT-style data from each TV. Every time you change the channel, turn the device ON/OFF, change the volume, etc, the manufacturer knows it. These manufacturers make more money selling the data YOU generate for them than they make selling the actual hardware. This is a major reason why prices are falling precipitously. The manufacturers NEED you to upgrade to even smarter units with more built-in apps. Did you know that apps like Roku, Netflix, and Amazon Prime pay the manufacturers to have their apps installed in the factory? The data is so valuable. How can you leverage this altdata?
  • Payment card processors make more money monetizing their datasets than they do on the transaction fees. I would think this would bring transaction fees down in the US, which have the highest fees in the world, but it hasn’t.
  • American Airlines’ data is now valued higher than the airline itself! This astounds me. With all of their capital assets (which are tangible assets on their balance sheet) the intangible assets are worth more.

There is a general trend in the worldwide economy where intangible assets are a higher percentage of the balance sheet than ever. The biggest factor is likely monetized data, but someone should do some research to confirm this. What is really interesting is that tangible assets depreciate over time. Intangible assets, like data, don’t. How can you leverage this asset class?

Financial Services Use Cases

The Financial Services industry loves seeking “alpha” (the industry’s equivalent term for “lift”) in altdata sources. Some interesting altdata ideas for the finance industry:

  • financial reports and SEC filings
    • these documents are freely available from many government websites. The easiest to use is EDGAR, the SEC’s filing database. You can find various filings in PDF format. The data can be scraped and added to a data lake where we can do interesting analytics like Named Entity Recognition or simple sentiment analysis. We can also look for specific phrases and terms.
  • private company data. Dun & Bradstreet is the de facto standard on private company data and commercial credit.
  • carbon footprint measurement. ESG investing is hot right now and every company is trying to create the perception that it is environmentally friendly. Even BP changed their logo to appear more “green”. How can we measure the carbon footprint of a company given the available altdata in the marketplace?

  • LinkedIn provides lots of datasets. The investment industry leverages this data for simple use cases like monitoring employee counts and openings. How could you leverage LinkedIn data to semantically-enrich your data?
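
Looking for specific phrases and terms in filings, as mentioned in the first bullet above, is easy to prototype once the text is out of the PDF. The sketch below assumes the filing text has already been extracted to a string; the sample text, the watchlist, and the function name are hypothetical.

```python
import re
from collections import Counter


def count_phrases(text, phrases):
    """Count case-insensitive, whole-word occurrences of each phrase.
    Whitespace is collapsed first so phrases split across lines still match."""
    normalized = re.sub(r"\s+", " ", text.lower())
    counts = Counter()
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        counts[phrase] = len(re.findall(pattern, normalized))
    return counts


# Hypothetical excerpt standing in for text extracted from a filing.
filing_text = """
The Company faces supply chain disruption in several regions.
Management identified a material weakness in internal controls.
Supply chain
disruption may continue into next year.
"""

watchlist = ["supply chain disruption", "material weakness", "going concern"]
hits = count_phrases(filing_text, watchlist)
for phrase in watchlist:
    print(f"{phrase}: {hits[phrase]}")
```

Tracking how these counts change from one filing period to the next is usually more informative than any single count.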

Risk Analytics

Customers always ask me for interesting ideas for altdata in their industry. Fact is, I don’t know your industry as well as you do. Any ideas I may have, you’ve probably researched. My response is to think of use cases that your competition might not also be researching. A big area to focus on is Risk Management. Finding altdata that can mitigate risk should provide lift. Every industry has different risk management profiles, but let’s look at an example to get you thinking creatively.

Cambridge Mobile Telematics recently acquired TrueMotion. Both companies provide vehicle telematics data to auto insurers to reduce risk. Well, why couldn’t you leverage similar data? Traditional auto insurance risk rating factors such as age, gender, credit score, zip code data, moving violations, and type of vehicle are less predictive of accident risk than actually looking at driver behavior…via OBDII on-board vehicle telematics. Those traditional risk rating factors are just proxies for likely driver behavior. Younger drivers tend to be more risky, as are middle aged men driving red sports cars. Or, that’s the theory.
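
To show what “looking at driver behavior” can mean in practice, here is a minimal sketch that counts harsh braking events in a trip. Everything specific here is an assumption for illustration: the one-sample-per-second speed feed in km/h, the 10 km/h-per-second threshold, and the function name. It is not how CMT or any insurer actually scores drivers.

```python
def count_harsh_braking(speeds_kmh, threshold_kmh_per_s=10.0):
    """Count harsh braking events in a list of speeds sampled once per second.

    A run of consecutive seconds above the deceleration threshold counts
    as a single event, so one long hard stop is not counted several times.
    """
    events = 0
    in_event = False
    for previous, current in zip(speeds_kmh, speeds_kmh[1:]):
        deceleration = previous - current  # km/h lost during this second
        if deceleration >= threshold_kmh_per_s:
            if not in_event:
                events += 1
            in_event = True
        else:
            in_event = False
    return events


# Hypothetical trip: two hard stops separated by normal driving.
trip = [50, 52, 51, 38, 25, 24, 30, 45, 44, 30, 29]
print(count_harsh_braking(trip))
```

A behavioral feature like this, normalized by distance driven, is the kind of signal that can sit next to the traditional rating factors in a model.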

I will NEVER install a telematics device in my vehicle that will send data to my insurer. I can assure you that my risk profile using the traditional rating factors is much, Much, MUCH better than my actual driving behaviors. (I probably shouldn’t admit that).

CMT will likely create additional datasets to monetize for industries beyond auto insurance. You might be able to glean valuable insights about your customer if you knew their driving habits. How can knowing my customers’ risky behaviors provide me with competitive advantage? The bulk of CMT’s employees are data professionals, and I’m sure they are dreaming up new data monetization avenues.

You can acquire telematic driving altdata from lots of vendors.

  • mobile-phone apps ask for your location data (and are likely tracking it)
  • insurance company-provided dongles
  • aftermarket blackboxes and in-car video

Thought experiment: Who better to provide auto insurance than the auto manufacturers that have access to all of your vehicle telematics, service history, credit, etc? General Motors has announced they are planning to offer their own auto insurance that they will bundle with OnStar. Brill-yunt! They are monetizing their data. That is Digital Transformation!

Banking and insurance are highly-regulated industries and tend to be slow-to-change based on necessity. This has allowed innovators from micro-lenders to payment processors to leverage data and invest heavily in digital services. One of the enablers of this trend is better risk management from altdata.

These companies are leveraging altdata like:

  • prescription-drug histories
  • EHR/EMR records
  • DMV records
  • property records
  • life insurance clearinghouse data from people’s previous applications.

Yep, all of this data, in some de-identified fashion, is available to many industries. Surprising, isn’t it?

Consumption Data Analytics

Consumption data is its own category of altdata. Right now this is huge in financial services but its potential is enormous. Quite simply, consumption data is business transaction-related information that can augment your predictive analytics.

Consumption Data Analytics is the aggregating of online and offline (brick-and-mortar) consumer purchase activity, merged with consumer behavioral datasets, geolocation data (where was your smart phone when you made that online purchase), and other point-of-sale vendor data (also available for a fee).

Where can you get offline purchase activity? Well, the credit card companies (among many others) provide various levels of aggregated datasets for sale. This includes offline purchase activity.

Consumption Data Analytics in 2021 focuses on consumer consumption. I expect that to slowly shift to B2B consumption behaviors. An example: right now we have a global computer chip shortage. There are theories as to why that is, but if I am an automobile manufacturer that relies on certain chips for my vehicles, I want to know if my chip supplier is itself experiencing supply chain issues so I can plan accordingly.

Data Exhaust

Data exhaust is the trail of data that remains after a business activity has occurred on a computing system. Data exhaust provides valuable insights. Some examples:

  • web server logs: this can tell you how long a consumer browsed your site before making a purchase, how long an item remained in their shopping cart before it was abandoned, etc.

  • cookie data: both 1st and 3rd party cookie data will provide valuable information about your customers. Did you know that by default your browser throws off so much metadata about you that the average marketer can likely identify you with no additional data? This is called fingerprinting.
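
The web server log example above (how long did a consumer browse before purchasing?) can be prototyped with nothing but the standard library. This sketch assumes access logs in the Common Log Format, uses the client IP as a crude stand-in for a visitor, and treats a hypothetical `/checkout/complete` path as the purchase event. Your site’s paths and session identifiers will differ.

```python
import re
from datetime import datetime

# Matches the start of a Common Log Format line: ip, timestamp, method, path.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)'
)


def seconds_to_purchase(log_lines, purchase_path="/checkout/complete"):
    """Return seconds between each visitor's first request and first purchase."""
    first_seen = {}
    durations = {}
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip = match["ip"]
        timestamp = datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z")
        first_seen.setdefault(ip, timestamp)
        if match["path"] == purchase_path and ip not in durations:
            durations[ip] = (timestamp - first_seen[ip]).total_seconds()
    return durations


# Hypothetical log lines.
sample_log = [
    '203.0.113.5 - - [10/Oct/2021:13:55:00 +0000] "GET /products/pool-filter HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2021:13:56:10 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.5 - - [10/Oct/2021:13:58:30 +0000] "POST /cart/add HTTP/1.1" 302 0',
    '203.0.113.5 - - [10/Oct/2021:14:01:00 +0000] "POST /checkout/complete HTTP/1.1" 200 128',
    "malformed line",
]

durations = seconds_to_purchase(sample_log)
print(durations)
```

The same loop can be adapted for cart abandonment: record the time of each add-to-cart request and report the visitors who never reach the purchase path.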

Data exhaust is a great way to understand the behaviors of your customers…and your potential customers.

Treat your software like IoT data. It is throwing off a lot of interesting browsing events for your users. If you can ingest that data and react to it in real-time you should be able to provide a better experience for your users.

Consumer-profile data

If you are a B2C company where your customer is a consumer then you need to know as much about them as possible.

  • credit card transaction data: Who knows more about consumers than credit card companies? Card issuers provide altdata-sets of transaction history that are valuable for determining wants, desires, and trends. The data is always anonymized but you can still gain valuable insights depending on how you slice-and-dice the data.
  • credit reporting agencies: The Big Three credit reporting agencies will sell you data and services to help you target consumer demographics for your marketing campaigns based on interesting metrics like purchase data. Experian will actually provide you with software that performs the consumer targeting, but I’d rather have access to the raw data so I can make my own unique matching algorithms.
  • data aggregators: Acxiom is an example of a 3rd party data provider that will license data to you about consumers from various other 3rd party data sources. Then they validate the data and help you enrich your existing consumer profile data.

The grocery industry has mastered consumer-profile data and it might be worthwhile to research how they do customer analytics. Grocers and CPG suppliers have been sharing data for years to learn about shopper habits and their shopping journey. Stores are analyzing broad buying trends to prevent shortages like we saw with toilet paper and Lysol during the early days of the pandemic. The CPG companies can leverage the POS data from the grocers to generate better consumer engagement and product offers and determine brand loyalty (which also suffered during the early pandemic).

Economy and Economic Data

Economic data that broadly shows the state of the economy and your industry is very valuable. Imagine you are a homebuilder…could you get a competitive advantage by knowing that lumber prices are forecast to rise substantially over the next few years because an invasive bug species is decimating Douglas Fir trees in the Pacific Northwest?

Jobs reports and inflation data are commonly used in many industries. If you are a QSR (Quick Serve Restaurant) it’s valuable to understand the prevailing wage in your area. How will this affect your margins?

Advertising Data

Nielsen is a century-old research firm that measures TV viewership, among MANY other things. They are a monopoly for this data and they provide different datasets for lots of different use cases. Recently they created a new dataset that allows them to make comparisons of how many people are streaming entertainment vs watching traditional broadcast channels. This could be beneficial to your next marketing campaign.

Advertisers have been using altdata (sometimes called incidental data) for years; they just struggle to integrate it into their value-stream. Usually the integration is a one-time exercise, often in Excel. We can do better.

Unstructured Data

All data has structure, otherwise it’s worthless, but unstructured data has come to mean data like images, pdfs, and video where you can extract value creatively. I mentioned above that many images have metadata that you can extract.

Every organization has a wealth of data that doesn’t sit in a traditional database. This means it’s difficult to do analytics on it. I call this latent data. It has value, but it’s difficult to extract. If you can find this latent data in your organization you can leverage it with your structured data. Examples:

  • PDFs, Word docs, Excel spreadsheets, business forms, etc. Azure’s Cognitive Services can help you extract data from file-based data sources.
  • handwritten notes, operator logs, user journals, etc. Handwriting recognition is a solved problem.

At the MTC we work with a lot of manufacturing companies. Each one has stressed that they have what I call a shifting demographics problem. They have older workers nearing retirement and the younger generations are not interested in doing those dirty, manual labor jobs anymore. Recently, companies have been deploying IoT solutions to understand how they can automate some of these processes. Another approach is to look at all of the handwritten operator logs that these workers have maintained for decades and may not be digitized even today. Azure’s Cognitive Services can OCR even the worst handwriting, allowing you to use NLP to find the patterns in the notes.

  • web scrapes. Companies scrape webpage data for lots of reasons, but essentially they want to find valuable data that is locked up in the html. Examples:
    • scraping pricing information from your competitors’ website. There are companies that scrape industry-specific websites (they probably scrape your website already) in order to resell the data back to you! Why? They provide analytics that compare your data to your competition and provide valuable intel. Sometimes it’s as simple as telling you what a given price for a particular product should be during a certain time of day. This is called competitive analysis. These altdata-sets will show industry trends, growth rates, and demographics.
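
Here is a minimal sketch of pulling pricing data out of HTML, the web scrape use case above. The markup is invented: it assumes each product name sits in an element with class `product-name`, followed by an element with class `price`. Real sites will use different structures, so the class names and the `PriceParser` helper are purely illustrative.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect {product name: price} pairs from the hypothetical markup."""

    def __init__(self):
        super().__init__()
        self._field = None         # which field the next text node belongs to
        self._current_name = None  # most recent product name seen
        self.prices = {}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class", "")
        if css_class in ("product-name", "price"):
            self._field = css_class

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "product-name":
            self._current_name = text
        elif self._field == "price" and self._current_name:
            # Strip the currency symbol and thousands separators.
            self.prices[self._current_name] = float(text.lstrip("$").replace(",", ""))
        self._field = None

    def handle_endtag(self, tag):
        self._field = None  # an empty element must not capture later text


# Hypothetical competitor page.
competitor_page = """
<div class="product"><h2 class="product-name">Chlorine Tablets 25lb</h2><span class="price">$89.99</span></div>
<div class="product"><h2 class="product-name">Pool Skimmer</h2><span class="price">$1,249.50</span></div>
"""

parser = PriceParser()
parser.feed(competitor_page)
print(parser.prices)
```

Run on a schedule and stored with a timestamp, output like this becomes the raw material for the competitive pricing analysis described above.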

The Hottest altdata Trend Today

Don’t value judge me. I think we are living in the most contentious, politically-charged environment ever. Probably everyone throughout history has said that.

Now, imagine you are targeting me as a potential high value customer lead. My CLV (customer lifetime value) is 2x your average customer lead. You’ve collected all of the common demographics about me using altdata and existing transactional data. Would you agree that you might want to tailor your advertising to me if you knew what my political views were? Well, you can’t know who I voted for in the last Presidential election (supposedly we have a secret ballot), but in most areas you CAN determine my party registration. And voter registration lists are free in most areas. There are aggregator firms that will sell you this data.

Voter registration data, I believe, will be the hottest altdata-set in the near future.

Become Data-Driven at the MTC

Are you convinced that your company is ready to leverage some of these altdata ideas?

I am a Microsoft Technology Center (MTC) Architect focused on data solutions. The MTCs are a service Microsoft provides to our customers. We strive to be the Trusted Advisors for our customers. Others have Know-How, we have Know-What. We want to understand your business problems and ideas for altdata analytics. Then, we’ll help you ingest and enrich the data using our cloud solutions. Technology alone cannot solve these problems without smart people and processes that work. We offer services ranging from human-centered Design Thinking Workshops – where we help you determine which use cases are the best for altdata – to hackathons where we quickly ingest some altdata, do the semantic enrichment with you, and quickly determine if the altdata provides lift.

Listen, we aren’t experts in your business, but we are great enablers. Within a few days we can build a rapid prototype and show you the Art of the Possible. We’ll show you what it takes to start a data sharing initiative and we’ll help you solve data problems in days that would’ve taken months in the past.

Does that sound compelling? Contact me on LinkedIn and we’ll get you started on your journey.

Are you convinced your data or cloud project will be a success?

Most companies aren’t. I have lots of experience with these projects. I speak at conferences, host hackathon events, and am a prolific open source contributor. I love helping companies with Data problems. If that sounds like someone you can trust, contact me.

Thanks for reading. If you found this interesting please subscribe to my blog.
