We established a safety data repository drawing on a client data lake. The data covered a variety of potentially related factors, which allowed us to go back and reconstruct the facts and context of each incident in which an employee was involved in a derailment or rule violation.
A statistical risk model produced a risk score for each employee: the likelihood of that employee being involved in a derailment or rule violation incident in the next 30 days.
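To illustrate what a 30-day risk score of this kind can look like, here is a minimal sketch that assumes a logistic model. The source does not state which model family was used, and the feature names, intercept, and weights below are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical coefficients for illustration only; not taken from the actual model.
INTERCEPT = -4.0
WEIGHTS = {
    "discipline_events_12m": 0.6,   # discipline events in the past 12 months
    "failed_rules_exams_12m": 0.4,  # failed rules exams in the past 12 months
    "absences_90d": 0.15,           # absences in the past 90 days
}


def risk_score(features):
    """Return an estimated probability of an incident in the next 30 days."""
    # Linear predictor (log-odds); missing features are treated as zero.
    z = INTERCEPT + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    # The logistic function maps log-odds to a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))


def factor_contributions(features):
    """Return each factor's additive contribution to the log-odds."""
    return {name: WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS}
```

In a model of this form, the per-factor contributions to the log-odds are one simple way to quantify which risk factors drive an individual employee's score.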
Factors included discipline events, work schedule, rules exams, attendance and absenteeism, drug and alcohol tests, furlough, and human resources information. We also performed text mining on incident reports to augment this structured data with themes.
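As a rough illustration of how free-text incident reports can be turned into theme indicators that sit alongside the structured factors, the sketch below uses plain keyword matching. This is a stand-in: the source does not say which text-mining technique was applied, and the theme names and keyword lists here are hypothetical.

```python
import re

# Hypothetical theme lexicon; the themes and terms are illustrative only.
THEME_KEYWORDS = {
    "switch_alignment": {"switch", "misaligned"},
    "speed": {"speed", "overspeed", "speeding"},
    "communication": {"radio", "briefing", "miscommunication"},
}


def tag_themes(report_text):
    """Return the sorted list of themes whose keywords appear in the report."""
    # Lowercase the text and split it into alphabetic tokens.
    tokens = set(re.findall(r"[a-z]+", report_text.lower()))
    # A theme is assigned when at least one of its keywords is present.
    return sorted(theme for theme, words in THEME_KEYWORDS.items() if tokens & words)


themes = tag_themes("Crew ran through a misaligned switch after a radio miscommunication.")
```

Each theme can then be encoded as a 0/1 column per incident and joined to the employee-level data.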
The employee-level scores and the quantified risk factors were built into an interactive Power BI application.