The Opportunity
- An employer with tens of thousands of employees maintains a hotline that employees can use to anonymously report violations of the law or of company policy. Reports may cover issues such as discrimination or harassment, workplace violence, violations of environmental, health, or safety laws, theft or improper use of company property, drug or alcohol use in the workplace, and fraud, among others.
- The company’s legal department wanted to know whether data from this system could be correlated with its litigation history to understand and project litigation trends. In particular, it wanted to identify patterns in hotline cases that could signal future lawsuits or complaints filed with government agencies.
Our Approach
- The client provided detailed information about past legal cases. The hotline data, however, was unstructured text, so the challenge was linking the two sources.
- Using text mining tools, we organized the text of the hotline reports into topics, which were then clustered into groupings such as “sexual harassment”, “smoking policy violation”, “inappropriate comments”, and “violation of safety rules”.
- We combined the themes uncovered by text mining with an overall time series and modeling analysis of cases to produce a risk analysis for each theme.
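The topic-grouping step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used in the engagement (the case study does not name its text mining tools): it swaps in TF-IDF vectors and spherical k-means clustering, folding the separate topic and clustering stages into one, and the sample reports, stop-word list, and function names are all hypothetical. Only the Python standard library is used.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, just large enough for the sample reports below.
STOP_WORDS = {"a", "an", "and", "because", "by", "from", "no", "on", "the",
              "toward", "was", "without"}

def tokenize(text):
    """Lowercase a report and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def normalize(vec):
    """Scale a sparse vector (term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

def dot(a, b):
    """Dot product of sparse vectors; equals cosine similarity for unit vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def tfidf_vectors(docs):
    """One unit-length TF-IDF vector per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + df_t)) + 1 for t, df_t in df.items()}
    return [normalize({t: c * idf[t] for t, c in Counter(toks).items()})
            for toks in tokenized]

def kmeans(vectors, k, max_iter=20):
    """Spherical k-means: assign by cosine similarity, centroid = normalized sum."""
    # Deterministic farthest-point start: the first report, then repeatedly the
    # report least similar to every centroid chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        idx = min(range(len(vectors)),
                  key=lambda i: max(dot(vectors[i], c) for c in centroids))
        centroids.append(vectors[idx])
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            total = defaultdict(float)
            for v, lab in zip(vectors, labels):
                if lab == j:
                    for t, w in v.items():
                        total[t] += w
            if total:  # leave an empty cluster's centroid unchanged
                centroids[j] = normalize(total)
    return labels, centroids

def top_terms(centroid, n=3):
    """Highest-weighted terms in a centroid, used to name the theme."""
    ranked = sorted(centroid.items(), key=lambda kv: kv[1], reverse=True)
    return [t for t, _ in ranked[:n]]

# Hypothetical hotline reports (not client data).
REPORTS = [
    "Supervisor made inappropriate comments and harassment toward coworker",
    "Ongoing harassment and inappropriate comments from manager",
    "Coworker reported sexual harassment by supervisor",
    "Forklift operated without safety guard, safety rules ignored",
    "Safety rules violated, no protective equipment on the line",
    "Employee injured because safety guard was removed from machine",
]

if __name__ == "__main__":
    labels, centroids = kmeans(tfidf_vectors(REPORTS), k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
    for j, centroid in enumerate(centroids):
        print(j, top_terms(centroid))
```

The highest-weighted terms in each centroid (here “harassment” and “safety”) are where analysts would attach readable labels such as “sexual harassment” or “violation of safety rules”. The number of clusters `k` is an assumption; in practice it would be tuned by judging how interpretable the resulting groups are.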
The Impact
The findings and recommendations included:
- We quantified the degree to which individual factors, such as “retaliation” or “injury – medical leave denied”, increased or decreased the risk of litigation.
- We showed that the current portfolio of cases carried more risk than the portfolio of a year earlier, with current cases more likely to end in litigation.
- We provided a deeper look at the topics and themes to help case experts decide where to focus their efforts to reduce litigation risk.
- The company had previously classified cases manually, but its practices for capturing structured data had changed over time; we demonstrated that text mining the raw report text was more reliable than the manual classifications.
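One simple way to express how much a theme raises or lowers litigation risk is an odds ratio: the odds of litigation for cases tagged with the theme, divided by the odds for cases without it. The case study does not say which model produced its figures, so this is only a sketch under that assumption; the `Case` record, the `odds_ratio` helper, and the counts below are hypothetical and are not client results.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Case:
    """A hypothetical case record: themes tagged by text mining, plus the outcome."""
    themes: frozenset
    litigated: bool

def odds_ratio(cases, theme):
    """Odds of litigation for cases tagged with `theme`, divided by the odds for
    cases without it. Above 1 the theme is associated with higher litigation
    risk; below 1, with lower risk."""
    with_lit = sum(1 for c in cases if theme in c.themes and c.litigated)
    with_not = sum(1 for c in cases if theme in c.themes and not c.litigated)
    without_lit = sum(1 for c in cases if theme not in c.themes and c.litigated)
    without_not = sum(1 for c in cases if theme not in c.themes and not c.litigated)
    cells = [with_lit, with_not, without_lit, without_not]
    if 0 in cells:
        # Haldane-Anscombe correction: add 0.5 to every cell to avoid dividing by zero.
        cells = [x + 0.5 for x in cells]
    with_lit, with_not, without_lit, without_not = cells
    return (with_lit * without_not) / (with_not * without_lit)

# Hypothetical counts for illustration only (not client results).
CASES = (
    [Case(frozenset({"retaliation"}), True)] * 6
    + [Case(frozenset({"retaliation"}), False)] * 4
    + [Case(frozenset({"smoking policy violation"}), True)] * 1
    + [Case(frozenset({"smoking policy violation"}), False)] * 9
)

if __name__ == "__main__":
    for theme in ("retaliation", "smoking policy violation"):
        print(theme, round(odds_ratio(CASES, theme), 3))
    # retaliation 13.5
    # smoking policy violation 0.074
```

Because one case can carry several themes, a per-theme odds ratio can overstate or understate a theme's effect. Fitting a logistic regression on all theme indicators at once, and reading the exponentiated coefficients as adjusted odds ratios, is the usual next step.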