Capture everything, there is value even in the lies!

In conversations with insurers globally, we at Celent are hearing of a new approach to analytics. It is perhaps not called Big Data, but it is a different approach, one that seeks to leverage data far more quickly and to be more tolerant of errors in the data. There is a move towards understanding that all data is useful, though only in baby steps so far. Still, I often hear about truth, fact and consistent data in discussions.

When thinking about data, this idea of the truth has always bothered me: the idea that system data represents the facts, or the unassailable truth. One of the key activities in establishing classic analytics processes is deciding which data is the truth, and there are always arguments about which data is accurate and can be trusted. In this process inaccurate data, or data that doesn't contribute to this truth, is ignored or removed and so lost. This leads to a negotiation, and the end result is often called the single version of the truth, i.e. the output of a report that all stakeholders agree to. The strange thing about this process is that it acknowledges that there are multiple viewpoints, but seeks a single truth regardless.

Relational database design and modern user interfaces push us towards this line of thinking: there is only one field to fill in, one answer to each question after all. I suggest that there is value in capturing the half-truths and the outright lies, and technology now lets us analyse these semantically.

It’s easy to come up with examples from the insurance industry where we regularly accept that the data is likely flawed. For instance, the original quote data says the vehicle is a standard build, but the claims adjuster spots the alloy wheels and rear parking sensors. In many motor claims, the insured makes a statement that there was a crash and the other driver was in error. The other driver makes a similar statement, saying that there was a crash and the insured was at fault.
Most modern systems capture all of this data, the different views over time and the different views from different stakeholders, but most systems and processes still assume that at a given point there is one set of valid data, and one driver at fault in the last example. Now that customers are posting to social media, insurers face more questions. What if what an insured stated at time of purchase is contradicted in their Facebook profile? Was that tweet accurate or just posturing on the part of the customer? How should the insurer, or rather the automated systems analysing this data, treat these contrary positions?

There is factual data that is true: the fact that the witness statements were made, the date and time when they were captured, who made them, and which case they concern. What of the pertinent data though, the data the humans actually use in determining the case or what should be done next, the data that allows us to reason about the case and to make a judgement? This information is typically stored in free text formats, requiring humans to interpret the data and do what humans do well: establish hypotheses and test them, ultimately selecting the one they feel fits best and recording that result as fact. Again, it’s a fact that Bob the claims handler felt on 1/1/12 that the insured wasn’t at fault, but is that what is recorded? Or is it the assertion that the insured wasn’t at fault, recorded as the truth and not as a hypothesis, with an audit trail showing who updated the system?

If there is one thing that the Big Data movement and the exploits of Google, Amazon, etc. have taught us, it is that All Data is Useful. Capture everything! Why, you ask? One example: there exist algorithms and systems that allow analysis of competing hypotheses, capturing how credible or likely an assertion is based on the believability of the source of the underlying data.
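To make the idea of weighing competing hypotheses by source believability a little more concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular product or published algorithm. The `Assertion` record, the `credibility` scores and the `rank_hypotheses` function are all hypothetical names and values invented for this example, and the scoring rule (a noisy-OR that treats sources as independent) is just one simple choice among many.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Assertion:
    """A statement someone made, stored as a claim rather than as fact."""
    case_id: str
    hypothesis: str      # what the statement asserts
    source: str          # who made the statement
    credibility: float   # hypothetical 0..1 score for how believable the source is
    recorded_on: date    # when the statement was captured


def rank_hypotheses(assertions):
    """Return (hypothesis, relative_support) pairs, most plausible first.

    Support from several sources is combined with a noisy-OR, which assumes
    the sources are independent; the scores are then normalised to sum to 1.
    """
    doubt = defaultdict(lambda: 1.0)
    for a in assertions:
        # Each supporting source reduces the remaining doubt in its hypothesis.
        doubt[a.hypothesis] *= 1.0 - a.credibility
    support = {h: 1.0 - d for h, d in doubt.items()}
    total = sum(support.values())
    if total == 0:
        return [(h, 0.0) for h in support]
    ranked = sorted(support.items(), key=lambda kv: kv[1], reverse=True)
    return [(h, s / total) for h, s in ranked]


# Illustrative data only: two drivers blame each other, a witness backs one.
statements = [
    Assertion("C-1", "third party at fault", "insured", 0.6, date(2012, 1, 1)),
    Assertion("C-1", "insured at fault", "third party", 0.5, date(2012, 1, 1)),
    Assertion("C-1", "third party at fault", "witness", 0.8, date(2012, 1, 2)),
]

for hypothesis, share in rank_hypotheses(statements):
    print(f"{hypothesis}: {share:.2f}")
```

The point of the sketch is the shape of the data rather than the arithmetic: every statement is kept, along with who made it and when, and the system offers a ranked set of options instead of overwriting one view with another.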
What if your system could highlight how plausible the insured’s data or statement is, or a witness’s testimony, or the data from a third party, based on the information at hand? What if your core system presented options rather than an answer derived by assuming everything in the system is correct? Truth, then, is not something best derived from raw data after the fact, but rather something that requires consideration as the data is being collected. Data, knowledge and information collected in the right way will allow future systems to help insurer staff reason about the data and be more effective.

The insurance industry is, however, sitting on a gold mine of raw data, and as it starts to mine its data and leverage it for new insights, I suggest insurers will seek new models to better understand the knowledge therein. Those insurers that emerge as leaders will capture all the data that they can, will understand that some of that data is contradictory, and will model it in such a way that software can support decisions about the data rather than leave the grey areas to the human operators. Capture everything, there is value even in the lies!

What’s your view: is there a single version of the truth in insurance? How are you dealing with contrary data? Have you already solved this?

For those clients interested in Big Data and Semantic Technologies these reports may be of interest:

Author

Craig Beattie

Research & Advisory