Detecting the cadence of your clients' data
I've seen a similar problem at work, and we tried multiple solutions, but only one freed us from scanning hundreds of table rows and graphs to spot interruptions in data delivery. The most recent solution has had some success by calculating a score from the predicted cadence and the amount of time it's been since the last data delivery.
Previous solutions showed a graph so the user could see at a glance whether a data delivery was late, but this was error-prone. Other attempts classified clients as daily/weekly/monthly, because those were the terms we used, thinking that tracking daily uploads would look something like this:
Weekly: 1,0,0,0,0,0,0, 1,0,0,0,0,0,0, 1,0,0,0,0,0,0, 1,0,0,0,0,0,0, 1,0,0,0,0,0,0, 1,0,0,0,0,0,0, 1,0,0,0,0,0,0, 1,0,0,0,0,0,0
Monthly: 1,0,0,0,0,0,0, 0,0,0,0,0,0,0, 0,0,0,0,0,0,0, 0,0,0,0,0,0,0, 1,0,0,0,0,0,0, 0,0,0,0,0,0,0, 0,0,0,0,0,0,0, 0,0,0,0,0,0,0
But the data was never this clean. Days would be skipped: 1,1,1,0,1,0,0,1,1,0,1,1,0,0,1,1,1,0,1,0,0,1,1,1,0,1,0,0,1,1,1,0,1,0,0,1,1,1,1,1,0,0,0,0,1,1,1,0,0,1,1,0,0,1,0,0,1,0,1,1,1 or deliveries would be delayed: 1,1,1,1,0,1,0,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,0,1,0,1,1,1,1,1,0,0,1,1,1,1,1,0,0,1,1,1,1,0,1,0,1,1,1,1,0,0,1,1,1,1,0,0 or both: 1,1,0,1,0,1,0,1,0,1,0,0,1,1,1,0,1,1,1,0,0,1,0,1,1,0,1,0,1,1,0,0,1,0,0,1,1,0,0,1,0,0,1,0,1,1,0,1,0,1,0,0,0,0,0,1,1,1,1,0,0
And few of our clients were sending data this often. Most sent data weekly, but not necessarily on the same day of the week. 1,0,0,0,0,0,0, 0,0,1,0,0,0,0, 1,0,0,0,0,0,0, 0,0,0,0,0,0,1, 1,0,0,1,0,0,0, 0,0,1,0,0,0,0, 0,0,0,1,0,0,0, 0,1,0,0,0,0,0
Eventually we realized that each client had its own cadence and its own level of consistency. We needed a way to identify a client's cadence and consistency, then use those numbers to determine whether the current gap was normal or an anomaly.
We had the data: we knew on which days each client sent us files. From that we could compute the average and standard deviation of how long a client would go without sending us data.
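As a sketch of that calculation (this is my illustration, not our actual report code, and the function names are mine), the gaps and their statistics can be derived from a daily 0/1 series like the ones above:

```python
from statistics import mean, pstdev

def gap_lengths(deliveries):
    """Days elapsed between consecutive deliveries in a daily 0/1 series."""
    days_with_data = [i for i, sent in enumerate(deliveries) if sent]
    return [b - a for a, b in zip(days_with_data, days_with_data[1:])]

def gap_stats(deliveries):
    """Average and standard deviation of the gaps between deliveries."""
    gaps = gap_lengths(deliveries)
    return mean(gaps), pstdev(gaps)

# A mostly-weekly client, like the example above: deliveries on
# days 0, 9, 14, and 27 give gaps of 9, 5, and 13 days.
weekly = [1,0,0,0,0,0,0, 0,0,1,0,0,0,0, 1,0,0,0,0,0,0, 0,0,0,0,0,0,1]
avg, sd = gap_stats(weekly)
```

`pstdev` treats the observed gaps as the whole population; for clients with only a handful of deliveries, `stdev` (the sample version) would be the more cautious choice.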
With this method we can calculate how many standard deviations above the mean gap the current gap sits on any particular day. Charting this score over time for the weekly-ish client above (most recent data points on the right), whenever the line rose above 1.0 the current gap was larger than roughly 84% of the previous gaps, assuming the gaps are roughly normally distributed. In our usage we contact the client when the score hits 2, which means the gap is larger than about 97.7% of the previous gaps. Sorting the list of clients by this score gives us an idea of which clients need to be contacted first.
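A minimal sketch of the scoring and sorting step, assuming roughly normal gaps (the function name and the client data here are hypothetical, for illustration only):

```python
from statistics import mean, pstdev

def gap_score(gaps, days_since_last):
    """Standard deviations the current gap sits above the mean historical gap."""
    avg, sd = mean(gaps), pstdev(gaps)
    if sd == 0:  # perfectly regular client: any longer-than-usual gap stands out
        return float('inf') if days_since_last > avg else 0.0
    return (days_since_last - avg) / sd

# Hypothetical clients: (historical gaps in days, days since last delivery).
clients = {
    'acme':   ([7, 7, 8, 6], 21),   # very regular, now three weeks silent
    'globex': ([7, 9, 5, 13], 8),   # irregular, current gap is unremarkable
}

# Most overdue clients first -- this is the contact list.
ranked = sorted(clients, key=lambda c: gap_score(*clients[c]), reverse=True)
```

The score is just a z-score of the current gap against the client's own history, so a consistent client trips the threshold quickly while an erratic one gets more slack.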
After three months the report has been reasonably successful. Our customer service representatives are rarely blindsided by clients calling to ask why new data isn't appearing in their accounts. Since we now often call the client first, we're finding that roughly 50% of the time the client already knows their data transfer is down, and the other 50% of the time they hadn't realized they had stopped sending us data.
Of course, this only adds one dimension. There are requests to add data volume to the score. Customers whose cadence stretches across a period of a year are difficult to predict, and we also have to be careful with new clients for whom we have no historical data. Still, this report gives us confidence that our existing client base is covered, so we can spend more time bringing new clients into our system.
Feel free to comment on the post, but keep it clean and on topic.
Software development for Linux/Unix since 1995. I've done everything from Perl, C/C++, Java, Flash, PHP, and Ruby to, currently, node.js. Always interested in pushing technology one step further than expected.