Moving Past Regression and Into the Future of Complex Data


One of the biggest differences between millennials and prior generations is our comfort with and exposure to high technology from a very early age. When other generations reminisce about their car phones, brick phones, or 14.4k modems on desktop behemoths, I cannot empathize. I grew up in an era, and in a tech-heavy household, where technology had already become consumer-oriented. Even 15 years ago, we could choose between functions, colors, speed, memory, screen resolution, etc. Manufacturers were already personalizing products to fit your needs, and the idea of individual personalization has only accelerated in the subsequent years.

The concept of personalization seems ubiquitous now, except in the world of insurance underwriting. 

The methodologies we are using to underwrite insurance today are much like the technology products of decades past. One size fits all. In this article, we’ll elaborate on why today’s regression-based data modeling techniques lead to overgeneralizations and inaccurate claims cost predictions, the business impact of these overgeneralizations, and how using alternative algorithms can result in more accurate predictive modeling.

Today’s data models for insurance are broken.

While insurance data modeling and risk prediction have vastly improved over the years with advancements in machine learning and big data, the methodologies by which most ML platforms score risk and generate data for today’s insurance markets are still flawed.

The modeling methodology, known as regression, has historically been the leading option for predictive data science because it is readily operationalized. But its limited ability to accurately predict risk for unique subpopulations is opening the door for more valuable predictive modeling tools and techniques.

Let me elaborate.

Within a regression-based machine learning algorithm, outcomes are scored and predicted from linear relationships in the data set, that is, patterns that hold true across the entire population. For example, let’s examine age and health within insurance claims. It is broadly accurate to say that higher age is predictive of higher medical claims, which squares with conventional thinking and quantification. The flaw in this logic, and likewise the flaw in regression-based modeling, is that what is broadly accurate for many will be completely inaccurate for others.

Consider alcohol spend and consumption. In some segments of the population, increased spending on alcohol correlates with an increased risk of alcoholism, and therefore, higher claims costs. Common sense, right? But not everyone who buys alcohol exhibits alcohol abuse. Consider wine collectors, liquor connoisseurs, and the average consumer, who may spend large amounts but do not imbibe in excess. Applying a regression-based understanding to this segment of the population would predict a higher rate of disease and medical claims, when in reality, the opposite is more often true.
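The washout is easy to see in a toy model. Here is a minimal sketch, using two invented segments (the numbers are illustrative only, not real claims data): a single ordinary least-squares line fit over a population where spend tracks claims for one segment but not the other.

```python
# Illustrative sketch (made-up numbers, not real claims data): one pooled
# ordinary least-squares line fit over two very different subpopulations.

# (annual alcohol spend, annual claims cost)
at_risk    = [(100, 200), (200, 400), (300, 600),
              (400, 800), (500, 1000), (600, 1200)]  # spend tracks claims
collectors = [(400, 150), (500, 150), (600, 150)]    # high spend, low claims

data = at_risk + collectors
n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Closed-form OLS fit on the pooled population
slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
         / sum((x - mean_x) ** 2 for x, _ in data))
intercept = mean_y - slope * mean_x

# The pooled line predicts ~671 in claims for a collector spending 600,
# more than four times that segment's actual cost of 150.
predicted_collector = intercept + slope * 600
```

The pooled slope is positive, so the model dutifully charges collectors for risk they do not carry, and, by the same averaging, understates the claims of the genuinely at-risk segment.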

In a world where we track and measure a profusion of individual choices and data points, shouldn’t our underwriting capabilities be able to capitalize on them? By nature, we are accustomed to searching for the “silver bullet” to solve a challenge, but the world of data is rarely that simple. The price of admission for more complex data is a new way of modeling.

The future lies in a complex data network. 

An alternative to regression modeling, and what we use at Verikai in our Capture platform, is the combination of complex data interactions and unique ensemble modeling. Instead of considering only a limited number of variables and their global impact within a single expression, the system can analyze every available data point and the trillions of unique interactions among them against the outcomes we actually care about (continuous, binary, financial, medical, etc.). As a result, subsets of the data (or population) that share similar characteristics can be defined, then classified even further according to their differences. This makes it possible to isolate smaller and smaller segments of the population, each with distinct expressions of risk and opportunity, that can be modeled and quantified more precisely than ever before.
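To make the segmentation idea concrete, here is a minimal recursive-partitioning sketch. It is not Verikai’s actual algorithm, and the features (`alcohol_spend` and a hypothetical `collector_signal` flag) are invented for illustration; it simply shows how splitting on interacting features isolates segments that a single pooled line would blur together.

```python
# Minimal recursive-partitioning sketch: isolate subpopulations by
# splitting on feature interactions instead of fitting one global line.
# Feature names and numbers are invented for illustration only.

def variance(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def best_split(rows, target):
    """Return the (score, feature, threshold, left, right) split that
    most reduces the target's weighted variance, or None."""
    best = None
    for feat in rows[0]:
        if feat == target:
            continue
        for thresh in sorted({r[feat] for r in rows}):
            left = [r for r in rows if r[feat] <= thresh]
            right = [r for r in rows if r[feat] > thresh]
            if not left or not right:
                continue
            score = (len(left) * variance([r[target] for r in left])
                     + len(right) * variance([r[target] for r in right]))
            if best is None or score < best[0]:
                best = (score, feat, thresh, left, right)
    return best

def build_tree(rows, target, depth):
    ys = [r[target] for r in rows]
    split = best_split(rows, target) if depth > 0 and variance(ys) > 0 else None
    if split is None:
        return sum(ys) / len(ys)              # leaf: per-segment prediction
    _, feat, thresh, left, right = split
    return (feat, thresh,
            build_tree(left, target, depth - 1),
            build_tree(right, target, depth - 1))

def predict(tree, row):
    while isinstance(tree, tuple):
        feat, thresh, low, high = tree
        tree = low if row[feat] <= thresh else high
    return tree

# Two subpopulations with similar spend but opposite claims profiles.
rows = [
    {"alcohol_spend": 100,  "collector_signal": 0, "claims": 200},
    {"alcohol_spend": 200,  "collector_signal": 0, "claims": 250},
    {"alcohol_spend": 900,  "collector_signal": 0, "claims": 2000},
    {"alcohol_spend": 1000, "collector_signal": 0, "claims": 2200},
    {"alcohol_spend": 900,  "collector_signal": 1, "claims": 150},
    {"alcohol_spend": 1000, "collector_signal": 1, "claims": 180},
]
tree = build_tree(rows, "claims", depth=2)

at_risk_pred   = predict(tree, {"alcohol_spend": 950, "collector_signal": 0})
collector_pred = predict(tree, {"alcohol_spend": 950, "collector_signal": 1})
```

The depth-2 tree first separates the collector segment, then splits on spend within each branch, so a high-spend collector is priced near its segment’s low actual cost while the high-spend at-risk profile stays high, exactly the distinction one pooled line cannot draw.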

This concept, a “binary ensemble”, allows us to produce a predictive profile down to the smallest, statistically valid subpopulation possible, as opposed to broad swaths of the general population, where highly predictive variables and their interactions would be washed out.  If modeling is not your native or adopted tongue, the implication is simple: the modeling techniques we are employing at Verikai lead to deeper and more predictive insights on smaller segments of people. We are leading the charge in personalized underwriting.

Compared to models that use linear regression, this unique process yields a group manual based on hundreds of personalized subsegments of the population, whereas historically, the most granular models were limited to generalized demographic information such as age, gender, and location.  To borrow our earlier metaphor, we’re moving the industry from car phones to iPhones.   

Combine this methodology with a robust, trillion-point database (behavior, health, finance, purchasing), and a platform that organically tests and applies up to 150 different ML processes to produce the most accurate results, and you can see why Capture leads the way in increasing the efficacy of insurance models today.

Why insurance pros should care.

So what does all of this mean for our underwriter, actuary, and broker partners? We’ve all felt the challenges of underwriting a group with limited or inaccurate data, but what hasn’t been as openly discussed is that data modeling itself can fail to produce the right outcomes, even in best-case scenarios where your data is complete and accurate.

Capture gives you access to cutting-edge technologies that enable you to predict and stratify risk with a level of precision and efficiency that has never before been available. And as you know, more precise risk analysis means more competitive pricing, lower losses, and business growth. Plus, it allows the entire business channel, including brokers, to provide the employer with the confidence and evidence they need to select the best products that control costs, while meeting the needs of their employees.

That’s a win for carriers, a win for underwriters, and a win for brokers. 

The future of data technology is evolving and changing as we speak. We can’t wait to see what comes next. Will you join us for the ride?
