The judgment machine: How algorithmic software won me over

The New Normal

July 28, 2016, 8:30 am CDT
By D. Casey Flaherty

You’d think that lawyers would be attracted to written rules.

Last post, I drew the link between our mental shortcuts and the operation of computer algorithms. In many instances, algorithms are simply the formalization, codification, and accumulation of rules that human beings have derived through experience and our natural facility for pattern recognition.

The accumulation part is key. While machines traditionally struggle with nuance and subtlety—i.e., the rules we can’t formalize—they long ago surpassed us for focused computational power. They can consistently apply large collections of rules to large data sets.

In discussing analytics, I revealed my all-too-human shortcomings in not being able to see patterns in billing data because I was too close to it. Reviewing invoices line-by-line, matter-by-matter, month-by-month made it impossible to discern what was happening across matters and months. I could not see the forest for the trees. But it turned out I was also bad at seeing the trees because of the finite number of heuristics (i.e., mental shortcuts) I had the time and attention to apply.

One of the resource-saving shortcuts I used when reviewing invoices was to focus on where the money was going. Size as a signal. I would direct my finite time and attention to concentrating on bigger blocks of time. To the extent I was alerted to smaller blocks of time, it occurred when a new timekeeper entered the picture. Novelty as a signal. Personal experience has made me sensitive to rotating casts of junior people who need to be brought up to speed on a matter.

These shortcuts are defensible in the abstract. But I’ve recently learned how myopic they made me. I was testing new software that instantly runs hundreds of algorithms against every line item across every available invoice. Because it does not have time or attention constraints, the machine is able to apply almost 600 human-created rules to each narrative entry. To test the application under real-world conditions, we downloaded more than $10 million in publicly available billing data from a bankruptcy that did not have a fee examiner. The data was fresh: the machine had never seen it before, and the vendor had no opportunity to make adjustments to ensure the software looked good.

The machine won. Or, rather, the machine won me over. Instead of drawing conclusions, the machine highlighted problematic entries in red and, when I hovered over them, revealed the reason for the flag—e.g., vague entry, block billing, skills mismatch, repeat entry. The calls were still mine to make. I accepted about half of the suggestions. On the other half, I overruled the machine.
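To make the flag-and-review idea concrete, here is a minimal sketch of a rule-based flagger. It is not the vendor's software: the `Entry` type, the `VAGUE_PHRASES` set, the semicolon test for block billing, and the function names are all assumptions made up for illustration, and real rules would be far more numerous and refined. The point is only that each rule attaches a reason to an entry and leaves the decision to the reviewer.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    timekeeper: str
    hours: float
    narrative: str


# Hypothetical phrases treated as too vague to evaluate (illustrative only).
VAGUE_PHRASES = {"review file", "attention to matter", "work on case"}


def flag_entry(entry, seen):
    """Return the list of reasons this entry deserves a human look."""
    flags = []
    text = entry.narrative.strip().lower()

    # Vague entry: a stock phrase, or fewer than three words of description.
    if text in VAGUE_PHRASES or len(text.split()) < 3:
        flags.append("vague entry")

    # Block billing: crude assumption that two or more semicolons mean
    # several distinct tasks were lumped into one time entry.
    if text.count(";") >= 2:
        flags.append("block billing")

    # Repeat entry: same timekeeper, same narrative as an earlier entry.
    key = (entry.timekeeper, text)
    if key in seen:
        flags.append("repeat entry")
    seen.add(key)

    return flags


def review(entries):
    """Pair every entry with its flags; an empty list means nothing fired."""
    seen = set()
    return [(entry, flag_entry(entry, seen)) for entry in entries]
```

Nothing here accepts or rejects a charge. An entry with a non-empty flag list is simply surfaced, with its reasons, for someone to rule on.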

It could be argued that half of my time was wasted. Except the machine only flagged about 20 percent of the total entries. So instead of looking at every entry, I only looked at one-fifth of them. Granted, I spent more time per entry than I would have averaged over the entire data set. But that is because every flagged entry was worth reviewing in detail and in context. Overall, I probably cut my review time by half.

In half the time, I also probably quadrupled my output. If I am being honest, I would have been lucky to identify a quarter of the problematic entries without algorithmic assistance. This is, in part, because the machine was more consistent in applying the same rules over and over. Reviewing 15,000 time entries numbs the brain. But it is also because the machine’s algorithms, which are constantly being refined, looked at items that probably would not have occurred to me but did occur to the lawyers who created the rules. They had spent more time than is healthy thinking about how legal invoices are constructed.

For example, the machine flagged many 0.1-hour entries. Taken one at a time, I would have been inclined to ignore those because of my size heuristic. But the flags were presented both individually and in aggregate. One of the aggregate views revealed that an individual timekeeper had 808 separate entries of 0.1 hour. Over 165 days, the timekeeper averaged almost five 0.1-hour entries per day. Indeed, 0.1-hour entries accounted for more than 80 percent of the timekeeper’s entries. Overall, the timekeeper had only six time entries of more than a single hour and no entries of more than two hours.
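The aggregate view is simple to sketch. The function below is an illustration under assumptions, not the tool's actual logic: the name `small_entry_profile`, the `(timekeeper, hours)` input shape, and the 0.1-hour default threshold are all hypothetical. It counts, per timekeeper, how many entries sit at or below the threshold and what share of that person's entries they represent.

```python
from collections import defaultdict


def small_entry_profile(entries, threshold=0.1):
    """Summarize minimum-increment billing per timekeeper.

    entries: iterable of (timekeeper, hours) pairs.
    Returns {timekeeper: (small_count, total_count, small_share)}.
    """
    counts = defaultdict(lambda: [0, 0])  # [small entries, all entries]
    for timekeeper, hours in entries:
        counts[timekeeper][1] += 1
        # Small tolerance so 0.1 stored as a float still counts as 0.1.
        if hours <= threshold + 1e-9:
            counts[timekeeper][0] += 1
    return {
        tk: (small, total, small / total)
        for tk, (small, total) in counts.items()
    }
```

Applied to the timekeeper above, the arithmetic is 808 entries over 165 days, or roughly 4.9 entries of 0.1 hour per day. No single entry would trip a size heuristic, but the per-person share makes the pattern hard to miss.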

From a distance, it sure looks like the timekeeper was a passive bystander who gamed the system by technically playing within the rules: recording 0.1 hour every time he or she spent 20 seconds glancing at something that came across email. At the very least, a conversation was warranted. It was a conversation I might not have initiated if I had been reviewing these invoices in a traditional manner.

Another example is a lawyer who really did put in work on the case. A lot of work. The timekeeper’s median (1.0-hour) and average (1.7-hour) entries were well within a reasonable range, so there was no size trigger. But there were 996 entries spread out over just 219 days. The timekeeper actually billed 1,713 hours to a client in a seven-month period, an average of 7.8 billable hours per day. At a billable rate of $825 per hour, the total exceeded $1.4 million.

For comparison’s sake, when the billable hour first came into vogue, the suggestion was that lawyers strive to hit a quota of 1,300 hours per year. This lawyer, who only had six days with zero billing to the client, was on pace to bill a single file 2,779 hours and $2.3 million in one year. A client who was paying attention would have, at the very least, questioned whether some of that work could have been delegated to lower-cost resources. But that alert client would have been applying something other than my novelty heuristic.
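The pace figures can be checked with a few lines of arithmetic. One caveat: the article does not spell out how the annual projection was computed. The sketch below assumes, as an inference only, that the pace was measured over the 219 billed days plus the six zero-billing days (225 days in all) and scaled to a 365-day year. Under that assumption the numbers line up with those quoted above. The function and constant names are hypothetical.

```python
def annualized_pace(hours, days_elapsed, rate, days_per_year=365):
    """Project hours and fees to a full year at the observed daily pace."""
    annual_hours = hours / days_elapsed * days_per_year
    return annual_hours, annual_hours * rate


HOURS_BILLED = 1713   # from the article
BILLED_DAYS = 219     # days with at least one entry, from the article
ZERO_DAYS = 6         # days with no billing to the client, from the article
RATE = 825            # dollars per hour, from the article

# Figures stated directly in the article.
hours_per_billed_day = HOURS_BILLED / BILLED_DAYS   # about 7.8
fees_to_date = HOURS_BILLED * RATE                  # 1,413,225 dollars

# Assumption (not stated in the article): elapsed span = billed + zero days.
span_days = BILLED_DAYS + ZERO_DAYS                 # 225
annual_hours, annual_fees = annualized_pace(HOURS_BILLED, span_days, RATE)

print(f"{hours_per_billed_day:.1f} hours per billed day")
print(f"${fees_to_date:,} billed to date")
print(f"{annual_hours:,.0f} hours and ${annual_fees:,.0f} at an annual pace")
```

Set against the 1,300-hour annual quota mentioned above, the projected pace on this one file is more than double.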

The machine also confounded my expectations because these examples are the inverse of the nitpicking I expected. The conversations would have been broad and strategic instead of, as I anticipated, arguing over individual 0.1-hours or whether a 3.2-hour should have been a 2.3-hour. The machine would have provoked conversations. But, of course, the machine could not have conducted those conversations. Just as the judgment calls were still mine to make, the findings would have been mine to communicate and discuss.

On one hand, reducing invoice review time while increasing review quality is an attractive value proposition. On the other hand, it is a bit boring and far from transformative. That’s kind of the point. It’s a tool. The machine serves as an instrument to make work less painful and more productive. Then again, because it is scalable and tractable, the approach has far more potential than saving clients a few bucks off their invoices. More on that in another post.

D. Casey Flaherty is a consultant at Procertas. Casey is an attorney who worked as both outside and inside counsel. He also serves on the advisory board of Nextlaw Labs. He is the primary author of Unless You Ask: A Guide for Law Departments to Get More from External Relationships, written and published in partnership with the ACC Legal Operations Section. Connect with Casey on Twitter and LinkedIn. Or email [email protected].