At the heart of the Protenus Compliance Analytics platform lies the Artificial Intelligence (AI) engine that monitors millions of accesses to patient data every day and determines how likely each one is to warrant further investigation. This AI engine, or the “classifier” as it’s known in data science, is a relatively small component compared to all the other processes our clients’ data go through every day to produce cases, which are narratives explaining why certain interactions have been flagged as suspicious. Just a fraction of a day’s run, less than 2%, is spent on classification; the rest goes to pre-processing, data aggregation, feature calculation, and alert and case generation.
The classifier, however, is the end result of a rigorous and ever-repeating cycle of research and development that involves not only data scientists and data engineers (the usual suspects) but also members of our DevOps and customer success teams. This post describes how our team takes a classifier from research all the way to production (and back).
The classifier needs a set of examples of suspicious and appropriate behavior so it can learn from them and accurately categorize the new data we process every day. We have two sources for this training data:
- A set of clients who give us permission to use their case resolutions for research purposes.
- Our experts, who draw on years of experience with healthcare workflows to assess the suspiciousness of accesses that did not result in a case.
This second source of labeled data is crucial to the performance of the classifier: it broadens the diversity of access types we learn from and keeps us from relying solely on resolutions of cases that we have sent to our clients. These two types of resolutions, along with a set of characteristics of those accesses that our research has found to carry strong signal about suspiciousness, which we call “classifier features,” constitute the dataset we use to train the classifier.
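The assembly of that training set can be sketched as joining the two label sources to the classifier features. This is a minimal illustration using pandas; the column names, feature names, and IDs are all invented for the example and do not reflect the actual Protenus schema.

```python
import pandas as pd

# Labels from clients who allow research use of their case resolutions.
client_labels = pd.DataFrame({
    "access_id": [101, 102],
    "label": [1, 0],                 # 1 = suspicious, 0 = appropriate
    "source": "client_resolution",   # scalar broadcasts to every row
})

# Labels from in-house expert review of accesses that produced no case.
expert_labels = pd.DataFrame({
    "access_id": [201, 202],
    "label": [0, 1],
    "source": "expert_review",
})

# Classifier features for each access (feature names are hypothetical).
features = pd.DataFrame({
    "access_id": [101, 102, 201, 202],
    "after_hours": [1, 0, 0, 1],
    "dept_mismatch": [1, 0, 1, 1],
})

# Combine both label sources, then attach features to each labeled access.
labels = pd.concat([client_labels, expert_labels], ignore_index=True)
training_set = labels.merge(features, on="access_id", how="inner")
print(len(training_set))  # 4 labeled examples, each with its features
```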
Classifier training and performance evaluation
Protenus data scientists research the best-performing classifier using Python in a Jupyter environment, part of the standard data science toolkit. Our team employs a rigorous methodology built on data science best practices, such as hyperparameter tuning and stratified k-fold cross-validation. The team evaluates numerous classifier versions using standardized plots such as precision-recall curves and score density distributions before picking the classifier that delivers the best performance overall and in each case category.
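A minimal sketch of that methodology with scikit-learn follows. The data here is synthetic and the model, grid, and scoring choices are illustrative assumptions, not the actual Protenus setup; stratified folds preserve the suspicious/appropriate ratio in every split, which matters when labels are imbalanced, and average precision summarizes the precision-recall trade-off the team inspects.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in for labeled access data.
X, y = make_classification(n_samples=500, n_features=8,
                           weights=[0.9, 0.1], random_state=0)

# Stratified k-fold keeps the class ratio identical in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# A tiny hyperparameter grid; a real search space would be much larger.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    scoring="average_precision",  # aligns with precision-recall evaluation
    cv=cv,
)
search.fit(X, y)
print(search.best_params_)
```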
Release testing and preparation
A crucial part of the pre-release evaluation is to test the classifier in a production-like environment where we can monitor how the classifier would have behaved had we released it. Fortunately, Protenus’ DevOps team has set up a development environment that closely mimics our production environment, complete with production data copied over from a few clients who have given us permission to do so. We run a day's analytics in the development environment to make sure that our pipeline runs with the new classifier without any issues. We also check that the distribution of scores across alert categories and across all accesses reflects what we have observed in our research. These sanity checks ensure that we don’t have any surprises once the classifier is released.
The actual release of the classifier is the most straightforward part of this whole process, thanks to the design of the Alert Generator and the analytics pipeline. We simply change a small configuration parameter to point our analytics to the new classifier, drop the classifier file into a specified production S3 bucket, and — voilà! — a new classifier is now in production.
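The two-step release (drop the artifact, repoint the config) can be sketched as follows. To keep the example self-contained it uses a local directory as a stand-in for the S3 bucket, and the file names, config format, and versioning scheme are all invented for illustration:

```python
import json
import pickle
import tempfile
from pathlib import Path

# Local stand-in for the production S3 bucket (names are hypothetical).
bucket = Path(tempfile.mkdtemp())

def release_classifier(model, version, config_path):
    """Serialize the classifier into the 'bucket', then point the
    analytics config at the new artifact (sketch of the two-step release)."""
    model_path = bucket / f"classifier-{version}.pkl"
    model_path.write_bytes(pickle.dumps(model))
    config_path.write_text(json.dumps({"classifier_path": str(model_path)}))
    return model_path

config = bucket / "analytics-config.json"
path = release_classifier({"weights": [0.1, 0.9]}, "v42", config)
print(json.loads(config.read_text())["classifier_path"] == str(path))
```

Because the pipeline only reads the config pointer, rolling back is the same operation with the previous version string.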
Post-release performance monitoring
Our job is not done once the classifier is released. We use a standardized set of graphs and metrics to keep a close eye on alert volumes, case resolutions, and false-positive rates across all clients and all alert categories, making sure the classifier is performing as expected. Protenus’ data science team also collaborates with the customer success team in this process: Customer Success Managers systematically collect feedback from clients about alert quality and bring it to our team’s attention.
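A per-category false-positive rate, for instance, falls out of a one-line groupby over case resolutions. The category names, column names, and data below are invented for illustration:

```python
import pandas as pd

# Hypothetical case resolutions: each alerted access with its category
# and whether the client resolved it as a true violation.
resolutions = pd.DataFrame({
    "category": ["family", "family", "coworker", "coworker", "vip"],
    "violation": [True, False, False, False, True],
})

# False-positive rate per alert category: the share of alerts the
# client resolved as non-violations.
fp_rate = (~resolutions["violation"]).groupby(resolutions["category"]).mean()
print(fp_rate.to_dict())  # {'coworker': 1.0, 'family': 0.5, 'vip': 0.0}
```

Tracked over time and per client, a jump in any category's rate is an early signal that the new classifier needs attention.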
Furthermore, the data science team and the customer success team hold biweekly meetings to discuss customer feedback, false-positive patterns (what did we get wrong?), and false-negative patterns (what did we miss?), and to chart actionable next steps. Even though I present performance monitoring as one step in the release cycle, this process never actually stops: we are always watching how we are doing and always thinking about the feedback from our clients.
New feedback from our clients and new case resolution data arriving every day prompt a new research-release-monitor cycle, which, like performance monitoring, never really stops. We are currently working toward a roughly monthly classifier release cadence, which means the classifier will always reflect recent case resolution data and feedback from our clients will be incorporated into our analytics regularly.
I hope this article gives you an overview of the Protenus classifier research and release process. Stay tuned for future blog posts, where we will dive deeper into each of these components.