Predicting and Understanding Adverse Rail Network Events

The Opportunity

Railroads can be severely impacted by interruptions to the flow of trains on their network. This case study represents one of several projects we have done with our rail clients.

Interruptions have many causes, but some of the more common ones are stalls, stops, separations, and derailments. Each of these categories covers many types. For example, a locomotive may stall due to mechanical issues.

These interruptions can be costly to network performance. The ripple effect of a single stall can take many hours or days to work itself out of the system. A key performance metric for the industry is “system velocity” – the average speed at which freight moves across the network.

Our Approach

We compiled a dataset of over three million train trips. It included information about the route, the consist of the train, the number and positioning of locomotives, horsepower, the number of loaded and empty cars, throttle position, and so on. To that we appended detailed information about stall incidents. Our approach had two objectives:

Attribution: understanding the various causal factors and their relative importance.
Prediction: identifying future trips with a high likelihood of a stall.

The former focused on overall planning, whereas the latter focused on daily operations. We used machine learning to address both objectives. “Explainable AI” was important, as we needed to understand the drivers, not just make predictions.

The Impact

The attribution part of the machine learning model provided a rank-ordered relative importance score for the factors. The top factors included route, train category, tonnage, length, throttle position, and various aspects of the locomotive.

The model also quantified the impact within these factors, such as which routes most increased the probability of a stop and which locomotive models were most responsible. Though the client had hunches about problematic locomotives, the model was able to separate out all other contributing factors and isolate the locomotive models that mattered most.
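
As an illustration of how a rank-ordered importance score can be produced, here is a minimal sketch of permutation importance, one common explainability technique. It is not necessarily the method used in this project. The function names, the toy tonnage threshold, and the toy data are all hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows where the model's prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, feature_names,
                           n_repeats=5, seed=0):
    """Rank features by the mean accuracy drop when each one is shuffled.

    rows is a list of tuples, one value per feature. A larger drop means
    the model leans more heavily on that feature.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = {}
    for j, name in enumerate(feature_names):
        drops = []
        for _ in range(n_repeats):
            # Shuffle one column to break its link with the outcome.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances[name] = sum(drops) / n_repeats
    # Highest importance first.
    return sorted(importances.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (hypothetical): the stand-in model looks only at tonnage.
toy_features = ("tonnage", "empty_cars")
toy_rows = [(8000 + 500 * i, i % 3) for i in range(20)]


def toy_predict(row):
    """Stand-in model: flags a trip when tonnage exceeds an assumed limit."""
    return 1 if row[0] > 12000 else 0


toy_labels = [toy_predict(r) for r in toy_rows]

for name, score in permutation_importance(toy_predict, toy_rows,
                                          toy_labels, toy_features):
    print(name, round(score, 3))
```

In this toy setup, shuffling the feature the model ignores leaves accuracy unchanged, so that feature scores zero and ranks last.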

In a holdout analysis, the prediction outputs showed that among the 5% of trips the model ranked as most likely to have a stop, we identified stops 3.2 times more often than random guessing would.
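
The holdout result reads as a lift statistic: the stop rate among the highest-scored trips divided by the stop rate across all trips. The sketch below shows one way to compute such a figure, assuming each holdout trip has a model score and a 0/1 outcome. The function name and the sample data are hypothetical and are not the project's data.

```python
def top_fraction_lift(scores, outcomes, fraction=0.05):
    """Event rate in the top-scored fraction divided by the overall rate.

    scores   -- model scores, higher meaning more likely to have a stop
    outcomes -- 1 if the trip had a stop, 0 otherwise
    fraction -- share of trips to treat as the "top" group
    """
    if not scores or len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must be non-empty and equal length")
    base_rate = sum(outcomes) / len(outcomes)
    if base_rate == 0:
        raise ValueError("lift is undefined when no events occurred")
    n_top = max(1, round(len(scores) * fraction))
    # Rank trips from highest to lowest score and keep the top group.
    ranked = sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    top_rate = sum(outcome for _, outcome in ranked[:n_top]) / n_top
    return top_rate / base_rate


# Hypothetical holdout: 100 trips, 10 with a stop, 2 of them in the top 5.
sample_scores = list(range(100))
stop_indices = {99, 98, 50, 40, 30, 20, 10, 5, 3, 1}
sample_outcomes = [1 if i in stop_indices else 0 for i in range(100)]

print(top_fraction_lift(sample_scores, sample_outcomes))
```

A lift of 1.0 means the ranking does no better than picking trips at random. Values above 1.0 mean the top-scored group is richer in stops than the network as a whole.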