- A transportation company maintains a database of accidents. Along with data about each incident, textual narratives and descriptions of the accidents are included.
- The company wanted to incorporate this “unstructured data” into some of its quantitative safety analyses.
- We used text mining techniques to uncover the salient topics buried in the narratives, such as excessive speed or wheel failures. Text mining techniques are adept at taking many alternative words, abbreviations, or misspellings and distilling them down to topics.
- Text mining turns the source textual data (sometimes called “unstructured data”) into numeric data that we can use in other quantitative analyses. We used the transformed text data in regression analysis to assess whether the text improves predictive modeling.
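To make the text-to-numbers step concrete, here is a minimal sketch in Python. It is not the text mining model used in this project: it swaps in a simple hand-built keyword lexicon to show how variant words, abbreviations, and misspellings can collapse into one topic, and how each narrative becomes a row of numeric indicators. The topic names, the variant terms, and the sample narratives are all hypothetical.

```python
import re

# Hypothetical topic lexicon (an assumption for illustration): each topic
# lists variant phrasings, abbreviations, and misspellings that count as it.
TOPIC_TERMS = {
    "excessive_speed": {"speeding", "overspeed", "excessive speed", "too fast", "spd"},
    "wheel_failure": {"wheel failure", "broken wheel", "cracked wheel", "whl"},
}

def topic_features(narrative):
    """Return a dict of 0/1 indicators, one per topic, for one narrative."""
    # Lowercase, replace punctuation with spaces, and collapse whitespace.
    text = re.sub(r"[^a-z0-9 ]+", " ", narrative.lower())
    # Pad with spaces so terms only match on whole-word boundaries.
    text = " " + re.sub(r"\s+", " ", text).strip() + " "
    return {
        topic: int(any(f" {term} " in text for term in terms))
        for topic, terms in TOPIC_TERMS.items()
    }

# Made-up narratives, not taken from the company's database.
narratives = [
    "Vehicle was going too fast into the curve; left the road.",
    "Inspection found cracked wheel on unit 12.",
    "Operator reported overspeed and whl damage.",
    "Minor collision in yard, no cause noted.",
]
feature_rows = [topic_features(n) for n in narratives]
```

Each entry of `feature_rows` is a numeric record that can sit alongside the structured accident fields as extra columns in a regression. A real topic model would learn the groupings from the narratives rather than rely on a fixed list.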
The analysis provided four findings and recommendations:
- The text mining revealed which topics were important and most prominent across all of the reported accidents.
- The outputs from the text mining model were used in a model to estimate the cost of each accident. Inclusion of these outputs more than doubled the accuracy of the model.
- Interestingly, a time series analysis found no changes in the underlying trends in the factors uncovered by the text mining model.
- As part of a development roadmap, we identified additional attributes of each accident that should be required fields in future accident reports.
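The comparison behind the cost-model finding can be sketched as fitting the same regression with and without a text-derived feature and comparing the fit. The example below is a rough illustration under stated assumptions: the data are synthetic, the variable names (`vehicle_age`, `wheel_failure`, `cost`) are hypothetical, and in-sample R² stands in for whatever accuracy measure the project actually used.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(rows, y):
    """Fit ordinary least squares with an intercept; return in-sample R^2."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Synthetic data for illustration only (not from the accident database).
vehicle_age = [2, 5, 3, 8, 6, 4]               # structured field
wheel_failure = [0, 1, 0, 1, 0, 1]             # text-derived topic indicator
cost = [10.0, 42.0, 14.0, 55.0, 22.0, 40.0]    # accident cost, arbitrary units

r2_baseline = r_squared([[a] for a in vehicle_age], cost)
r2_extended = r_squared(list(zip(vehicle_age, wheel_failure)), cost)
print(f"baseline R^2: {r2_baseline:.3f}, with topic feature: {r2_extended:.3f}")
```

In-sample R² can never fall when a predictor is added, so a fair assessment of the improvement would compare the two models on held-out accidents rather than on the data used for fitting.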