John Tredennick is the founder and chief executive officer of Catalyst. A nationally known trial lawyer and longtime litigation...
Sharon D. Nelson is president of the digital forensics, information technology, and information security firm Sensei Enterprises. In addition...
John W. Simek is vice president of the digital forensics and security firm Sensei Enterprises. He is a nationally...
Technology Assisted Review (TAR), also known as Computer Assisted Review, Predictive Coding, Computer Assisted Coding, and Predictive Ranking, has been around for 50 years, but is now becoming incredibly useful in the legal field. This technology can speed up cases of all kinds and greatly reduce discovery costs for clients. But how do lawyers learn about TAR? After all, we’re not dummies.
In this episode of Digital Detectives, Sharon Nelson and John Simek interview John Tredennick, the CEO of Catalyst Repository Systems, about his new book “TAR for Smart People,” what exactly TAR includes, and specific ways it has helped companies reduce discovery costs. Tredennick begins by explaining the three elements of TAR: humans teach the computer algorithm, the algorithm orders the documents for review by estimated relevance, and the lawyers decide what to do when the algorithm presents no more relevant documents. In other words, the computer algorithm continues to learn which documents are relevant to the case based on the reviewers’ decisions, and puts potentially important ones on the top of the pile, as it were. Tune in to hear Tredennick describe how this works using a Pandora metaphor, explain each project’s process, and discuss the increased effectiveness of what he termed TAR 2.0.
John Tredennick is CEO of Catalyst Repository Systems, which offers the world’s fastest and most powerful document repositories for large-scale discovery and regulatory compliance. Before founding Catalyst, he spent over twenty years as a nationally-known trial lawyer and litigation partner at a major national firm. He is the author or editor of five legal technology books including his latest, “TAR for Smart People,” which he co-authored with Bob Ambrogi.
Advertiser: Welcome to Digital Detectives, reports from the battlefront. We’ll discuss computer forensics, electronic discovery and information security issues and what’s really happening in the trenches. Not theory, but practical information that you could use in your law practice. Right here on the Legal Talk Network.
Sharon D. Nelson: Welcome to the 63rd edition of Digital Detectives, we’re glad to have you with us. I’m Sharon Nelson, president of Sensei Enterprises.
John W. Simek: And I’m John Simek, vice president of Sensei Enterprises. Today on Digital Detectives, our topic is Technology Assisted Review for Smart People. We’re delighted to welcome as today’s guest, John Tredennick, the CEO of Catalyst Repository Systems, which offers the world’s fastest and most powerful document repositories for large-scale discovery and regulatory compliance. Before founding Catalyst, John spent over twenty years as a nationally-known trial lawyer and litigation partner at a major national firm. He is the author or editor of five legal technology books including his latest, TAR for Smart People. Thanks for joining us today, John.
John Tredennick: Well, thank you both, Sharon and John. I’m looking forward to it. It’s a great topic and it’ll be a lot of fun and a get together of old friends.
Sharon D. Nelson: It is a reunion of sorts.
John W. Simek: Long time, isn’t it? Not old?
Sharon D. Nelson: Old is a dirty word, my great aunt always tells me that. John, the first thing I’d like to do is to thank you for sending us a copy of your book, TAR for Smart People, which I’ve had a chance to look at and I just think it’s phenomenal. So I hope that you’ll answer two questions for me because I’d like other people to have the chance to get this book. So please let them know where they can get a copy of the book and how, and where you got the title for the book itself.
John Tredennick: Thank you for asking, it’s kind of a cute title. There are a lot of books out, For Dummies, the whole Dummies series, and I was surprised when I saw a couple of books out by other good companies called Predictive Coding for Dummies. And I thought, gosh, I’ve been in this industry for 30 years, I’m embarrassed to say, or more, and I’ve met very few dummies, rather a lot of smart people who wanted to learn something about technology assisted review. So I told people in the book that I was writing for them, the smart people who want to learn about TAR. And where am I to get this? We did this as a public service. It’s available for free, you can download it off our website or if you write to me, I’ll send you a print copy of the book. But you can get it and I hope you enjoy it.
Sharon D. Nelson: I know they will, and many of those who are listening are also fans of Bob Ambrogi, and he’s a co-author of the book so that’s another good reason to pick it up.
John Tredennick: Well, I’m a fan of Bob Ambrogi. He’s our communications director and we’ve worked together many years. He’s part of the book and he helped tremendously in putting it together.
John W. Simek: That’s why it’s in the title, because he’s one of those smart people.
John Tredennick: He is indeed, and so you are too.
John W. Simek: John, we’ve heard a lot of different terms revolving around all of this stuff. Can you tell our listeners what is technology assisted review and how is it different from predictive coding or computer assisted coding?
John Tredennick: Well, there are a lot of different names for this process. Computer assisted review is what it was called in really the first cases. Predictive coding is a term some used, intelligent review. We always called it predictive ranking because the system always ranks documents. But the industry settled or the Sedona Group has settled on technology assisted review or TAR, and some people get confused and think, “Well, guys, I use technology to research case law and I use it to write my briefs, so everything’s technology assisted.” And that’s true enough, but many in the industry have settled on technology assisted review really to describe that predictive coding process, computer assisted coding.
Sharon D. Nelson: So we know that we have a lot of smart people that are going to be listening to this, but even the smart people, many of them don’t know a whole lot about technology assisted review. So can you explain what it consists of?
John Tredennick: Sure. As much as some people try to make this process complicated, it’s very simple. Complicated behind the scenes, perhaps, but so is running your car these days, but the process itself is simple. There are really three things involved, and the way it works is stuff we know well. The three things involved are these: point number one, TAR, predictive coding, predictive ranking, is simply a process through which humans work with a computer to teach it to identify relevant documents. Just that, to identify relevance. And by relevance, I’ll take my trial lawyer hat off for a moment. I don’t mean relevant as might be defined for admissibility at trial. I mean something more broad. Relevant to whatever your inquiry is. As I’m sure we’re going to discuss, you use this for a lot of things. Relevant is a digital statement, a yes or no statement. If it is relevant or responsive to my inquiry, that’s a thumbs up and if not, it’s a thumbs down. So it’s a process by which humans work with a computer. And secondly, and most importantly, it is a process by which the computer orders the documents for your review by relevance and does so to make the review far more efficient. And the third element to technology assisted review – and this frankly is optional, but it’s the one most people focus on. It is the notion that if we teach the computer about relevance and it gets smart about it, and if it orders the documents placing the relevant ones at the front, then we have the option of stopping our review before we’ve gone through 100% of the documents. We stop the review after we have found a sufficiently high percentage of relevant documents that it is reasonable to stop. And the whole key to this savings ties into these three essential features of technology assisted review. If you can order the documents by relevance, you’re looking at the most important ones first and you’re looking at similar kinds of documents, similar themes.
So your review is more effective than a linear review, which is bouncing around from a newsletter to a memo on new HR policies to the smoking gun to three emails about lunch, et cetera. If the smoking guns are grouped, you’re more efficient on that end. And then if you choose to review everything, you can move more quickly through the irrelevant stuff. But ultimately, the big thing is the courts allow and recognize that it’s reasonable to stop that review after you’ve identified a reasonable percentage of the relevant documents, such as 80%. So those are the three elements of it, and yet the process is so simple, we use it every day.
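To make the ordering point concrete, here is a minimal Python sketch, not anything described in the episode: it counts how many documents a reviewer must look at to reach a target share of the relevant ones (80%, the figure mentioned above) when the same pile is reviewed in its original order versus with the relevant documents moved to the front. The function name and the 20-document pile are invented for illustration, and a perfect ranking is an idealization that no real system achieves.

```python
import math

def docs_to_reach_recall(review_order, target_recall):
    """Return how many documents must be reviewed, in the given order,
    before the target share of all relevant documents has been seen.

    review_order is a list of booleans: True where the document is relevant.
    """
    total_relevant = sum(review_order)
    needed = math.ceil(target_recall * total_relevant)
    if needed == 0:
        return 0
    found = 0
    for position, is_relevant in enumerate(review_order, start=1):
        found += is_relevant
        if found >= needed:
            return position
    return len(review_order)

# Hypothetical pile of 20 documents, 5 of them relevant, in arrival order.
linear = [False, False, True, False, False, False, True, False, True, False,
          False, True, False, False, False, False, True, False, False, False]

# Idealized ranking: every relevant document sorted to the front.
ranked = sorted(linear, reverse=True)

print(docs_to_reach_recall(linear, 0.8))  # documents reviewed in linear order
print(docs_to_reach_recall(ranked, 0.8))  # documents reviewed in ranked order
```

With this toy pile, the linear order needs 12 documents to reach 80% of the relevant ones, while the idealized ranking needs 4.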
John W. Simek: So John, tell us a little about why TAR is so important.
John Tredennick: You bet. So let me give you an example. One of our clients, a bigger client to be sure, but this applies to anybody. This client was a bank and they were responding to a demand and they had collected 2.1 million documents that they had pulled and tried to remove the junk from and done every trick in the book, but they were down to 2.1 million documents. A quick sample of the 2.1 million documents suggested that maybe one out of a hundred of these documents was actually relevant, meaning to be produced and brought to the case, whatever. Yet they were faced with costs of what could be 6, 7 or 8 million dollars of review if you used $2 or $3 a document of human review. Well, the client was not happy about those kinds of costs merely for production, and no client is. So we suggested, let’s try a different approach than the traditional linear review, the traditional start from the beginning, go to the end and when you’re done, you’re done. We said let’s use this new TAR process, a protocol of TAR we call continuous active learning. In that world, as the review progresses and the team marks documents relevant or not relevant – and as I mentioned, responsive or not responsive, whatever you want to call it – the computer algorithm watches and learns from your behavior. It gets smarter about what terms are most commonly found in the documents you’re marking relevant and what terms are most commonly found in the documents marked irrelevant. And as it’s getting smarter, it is reordering the deck on a continual basis to present the reviewers with the documents it thinks are most likely relevant, and the reviewers give instant feedback because they are tagging and the algorithm is learning. A lot of people try to make this complicated, but the fact is every one of our listeners who uses something like Pandora internet radio is doing a process very much like TAR. What do you mean by that? Well, Pandora internet radio has one goal in mind.
They have tens of millions of songs, and their only goal is to play music you like. How do they know what you like? Well, they could start randomly and start with a polka or symphony, what have you. But instead, they say, “Why don’t you start me out by telling me the artist you like.” So I like Jimmy Buffett, for example, and I might say that. Well, Pandora will oblige, typically, by playing a Jimmy Buffett song; maybe A Pirate Looks at 40, one of my favorites. So I like that and I’m happy. And the next song comes up, and this time it’s by somebody different. The algorithm says, well if he likes Jimmy Buffett, he might like Zac Brown. And those of you who know Zac Brown know he’s a lot like Jimmy Buffett, I think he’s the Jimmy Buffett incarnate and I like Zac Brown. So I now give Pandora some feedback and click on the green thumbs up button and say yes, I like that, play me more Zac Brown on my station. Well, the next song comes up and this time, perhaps it plays a Toby Keith song. And some of you will know that Toby did some duets with Jimmy Buffett and there’s some similarities and differences. But this one doesn’t work for me. To me it’s too much country. Forgive me, country music listeners, this is a hypothetical, but I give it a thumbs down, I hit the red button. Now what I’ve done is I’ve educated the algorithm about what I like and don’t like. And with Pandora, as just about everybody these days knows, it continues to cycle it. It might throw in another Buffett song and Zac Brown and it might try a few others. But very quickly, it gets amazingly smart about the kind of music I want to listen to and I can create as many stations as I want. We’ve done the same thing for TAR. So imagine this team with 2.1 million documents, 1 in 100 are relevant, and that means they’d have to click through 100 each time just to find one. But instead, we give it some documents to start, just like seeding it with Jimmy Buffett.
And it could be as many documents as you want and you could find them through interviews, searches, you name it. But I feed it in 1 document, 100 documents, 50,000. In some cases, we’ve done it where, let’s say, there’s been an earlier review and they’re already tagged. So we feed those in and I’ve immediately told the system something about what I’m looking for. Now the review team gets a new batch and, to make it simple, Predict, our Insight product, feeds them 100 of the most relevant documents based on what we’ve learned so far. Well, early on, maybe only five are marked relevant because the algorithm’s learning. But in this case, the team began to see very quickly that 25 and 30 and even up to 35 of the hundred were tagged relevant. And what that means is the review is progressing 25x, 30x, 35x faster. And in a continuous learning process, there comes a point where you stop seeing relevant documents, and this means that the system’s running out. It’s pushed so many to the front that you’ve succeeded, and there are very few left. In our case, when they stopped, they did what you can always do, which is do a sample of what’s left behind to try to estimate how many relevant documents you’ve missed that you haven’t found. So they were being conservative and they surveyed over 6,000 documents and found something like two relevant. And what that meant statistically – and I know this is sort of a course on stats but it’s this simple – it meant statistically that they had found 98% of the relevant documents out of the 2.1 million. And yet, they had stopped the review at 6.4% of the way along. In other words, reviewing like 130,000 documents rather than 2.1 million. So long way to answer why this is important, it’s important for two big reasons. It saves you a lot of money on review. In that case, 93 or 94% didn’t have to be reviewed and it brings the good stuff to the front, which means you learn about your case quicker.
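The sampling arithmetic in that answer can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the calculation Catalyst actually ran: the exact sample size and counts were not given in the episode, so the sketch uses the rounded figures as spoken (2.1 million documents, about 1 in 100 relevant, review stopped at 6.4%, a sample of 6,000 with 2 relevant), and the function name is invented. It projects the sample's hit rate onto the unreviewed pile and compares the estimated misses with the estimated total number of relevant documents.

```python
def estimate_recall(total_docs, reviewed_docs, prevalence,
                    sample_size, relevant_in_sample):
    """Estimate missed relevant documents and recall from a random sample
    of the unreviewed pile (all inputs here are rounded, assumed figures)."""
    unreviewed = total_docs - reviewed_docs
    # Share of the sampled unreviewed documents that turned out relevant.
    miss_rate = relevant_in_sample / sample_size
    estimated_missed = miss_rate * unreviewed
    # Total relevant documents implied by the initial prevalence sample.
    estimated_total_relevant = prevalence * total_docs
    recall = 1 - estimated_missed / estimated_total_relevant
    return estimated_missed, recall

total = 2_100_000
reviewed = round(total * 0.064)  # review stopped 6.4% of the way along
missed, recall = estimate_recall(total, reviewed, 0.01, 6_000, 2)

print(f"reviewed {reviewed:,} documents")
print(f"estimated relevant documents left behind: {missed:.0f}")
print(f"estimated recall: {recall:.1%}")
```

With these rounded inputs the estimate comes out at roughly 97%, in the same neighborhood as the 98% quoted; the difference is what one would expect from using approximate figures. A point estimate like this also carries sampling uncertainty that a real validation would report.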
Sharon D. Nelson: Well, John, that really was a very comprehensive answer. And let me ask you something because I know from reading your book that this is not brand new technology, but I would guess that most people think it is. So can you explain why it’s not?
John Tredennick: It’s funny, we did some digging and found a master’s thesis from a computer science department focusing on what they call relevance feedback in automated document retrieval systems. And what was fun about it was two things. One is it was dated in 1969, which is now – I’m embarrassed to say it – but 45 or 46 years ago when they were talking about this process that is the same as technology assisted review. The academics call it relevance feedback, a process by which humans interact with a computer to give it feedback about relevance, that’s still what they call it today. And they’ve been researching it for longer than that, but that’s almost 50 years. And the other fun one for me is to think back to what the automated document retrieval system was like in 1969, because it certainly wasn’t anything like what we’re using today. But it’s been going and now these kinds of predictive technologies are used in not just Pandora but weather forecasting and image recognition and we can be sure that three letter agencies are using this technology every day, all day. I don’t know how they get the data that they look at but they have a way.
Sharon D. Nelson: Trust me, you don’t want to know.
John Tredennick: That’s right.
John W. Simek: It’s called warrantless gathering. Well, before we move on to the next segment, let’s take a quick commercial break.
Advertiser: In recent years, the legal sector has come under increasing pressure to improve efficiency in client services. Cloudmask enables law firms and solo attorneys to leverage free and low-cost Software as a Service, such as Google Apps and Office 365 to improve efficiency and client service, while reducing cost and strengthening compliance with data privacy laws and ensuring that legal, ethical duties are met. Cloudmask is even certified by 26 governments around the world. Sign up now for your 60 day free account at Cloudmask.com
Looking for a process server you can trust? ServeNow.com is a nationwide network of local, pre-screened process servers. ServeNow works with the most professional process servers in the country. Connect your firm with process servers who embrace technology, have experience with high volume servers and understand the litigation process and rules of properly effectuating service. Find a prescreened process server today. Visit ServeNow.com.
Sharon D. Nelson: Welcome back to Digital Detectives on the Legal Talk Network. Today our topic is Technology Assisted Review for Smart People. Our guest, John Tredennick is the CEO of Catalyst Repository Systems, which offers the world’s fastest and most powerful document repositories for large-scale discovery and regulatory compliance. John, what is the actual process when you’re involved in a TAR project?
John Tredennick: Well, as much as there is deep, algorithmic technology going on, the process itself is simple. There are just a few steps, at least in the world we live in. The first is you have to collect your documents, and of course you always have to do that. And they can be paper, they can be digital, email, you name it. But you collect them together and you load them into the system and the system does what I call shred the document. And by shredding, I don’t mean physically ripping them up, but it is extracting text and analyzing that text so it can understand the frequency with which words appear in certain documents and commonality of words across documents and the like. So step one is you’ve got to get the documents and you’ve got to get them into the system. After that, it’s just like Pandora, again. You start by giving the system an idea of what you’re looking for. And we’re not in a Siri world yet, where we can say, “Siri, I’ve got a breach of contract case and this looks bad. Please get me the relevant documents.” But I do the same thing by feeding in, and it could be a part of the complaint, it could be a part of the request for production, or it could be 50 or, as I said earlier, a thousand documents that are relevant so it could know to go look for them. After that, again, it’s just like Pandora. The system analyzes what I’ve submitted, it ranks the documents. It then presents to me, batch by batch, documents that it thinks are most likely relevant. I look at them, I mark them, I do my review job, and I continue until the system runs out of relevance or I’m satisfied. I’ve met my objective, you can stop any time that you want or you can review to the very end if you want. But that’s the process. That’s all there is to it other than a final test at the end, if you need it, to confirm to the court or otherwise that we’ve reached our mark and we stopped at a fair point. And that’s done quite simply by a random sample of whatever it is you don’t review.
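The steps just described (shred the text, seed the system, rank, review a batch, feed the tags back, repeat until a batch comes up empty) can be sketched as a toy Python loop. This is an illustration only and makes no claim about how Catalyst's product works: the smoothed log-odds term weighting, the stop-after-one-empty-batch rule, the ten sample documents, and every function name are assumptions chosen to keep the example small.

```python
import math
import re
from collections import Counter

def shred(text):
    """'Shred' a document: extract its lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(tagged):
    """Learn a weight per term from (tokens, is_relevant) pairs.

    Assumed scoring: smoothed log-odds of a term appearing in relevant
    versus non-relevant documents.
    """
    rel, irr = Counter(), Counter()
    n_rel = n_irr = 0
    for tokens, is_relevant in tagged:
        if is_relevant:
            rel.update(set(tokens))
            n_rel += 1
        else:
            irr.update(set(tokens))
            n_irr += 1
    weights = {}
    for term in set(rel) | set(irr):
        p_rel = (rel[term] + 1) / (n_rel + 2)
        p_irr = (irr[term] + 1) / (n_irr + 2)
        weights[term] = math.log(p_rel / p_irr)
    return weights

def score(tokens, weights):
    """Sum the learned weights of the distinct terms in a document."""
    return sum(weights.get(term, 0.0) for term in set(tokens))

def continuous_review(docs, seed_ids, reviewer, batch_size=2):
    """Rank, review a batch, retrain on every tag, and repeat.

    Stops when a batch contains no relevant documents (a simplistic
    stopping rule assumed for this sketch) or when nothing is left.
    """
    shredded = {doc_id: shred(text) for doc_id, text in docs.items()}
    tagged = {doc_id: reviewer(doc_id) for doc_id in seed_ids}
    while len(tagged) < len(docs):
        weights = train([(shredded[i], tagged[i]) for i in tagged])
        unreviewed = [i for i in docs if i not in tagged]
        unreviewed.sort(key=lambda i: score(shredded[i], weights), reverse=True)
        results = {i: reviewer(i) for i in unreviewed[:batch_size]}
        tagged.update(results)
        if not any(results.values()):
            break
    return tagged

# Hypothetical collection: four documents about a contract dispute, six not.
docs = {
    1: "breach of contract notice to supplier",
    2: "lunch on friday anyone",
    3: "supplier contract breach damages estimate",
    4: "new hr policy newsletter",
    5: "contract termination after supplier breach",
    6: "friday lunch menu newsletter",
    7: "hr policy update for friday",
    8: "damages estimate for contract breach",
    9: "parking garage closed monday",
    10: "holiday party rsvp reminder",
}
truth = {1, 3, 5, 8}  # stands in for the human reviewers' judgment

tagged = continuous_review(docs, seed_ids=[1], reviewer=lambda i: i in truth)
found = {i for i, is_relevant in tagged.items() if is_relevant}
print(f"reviewed {len(tagged)} of {len(docs)} documents, found {sorted(found)}")
```

In this toy run, one seed document is enough for the loop to surface the other three relevant documents, and the review stops with three documents never looked at. A real project would follow the stop with the random sample of the unreviewed set mentioned above.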
John W. Simek: Well, John, I guess all things with technology have these upgrades and different versions or whatever. So can you tell our listeners what TAR 2.0 is and how that’s different from the first version, TAR 1?
John Tredennick: I sure can. I actually gave it the name TAR 2.0 because I was trying to describe a newer class of TAR algorithms that work very differently from the first generation. And while there are many differences, the simplest is this: The TAR 1 products, and they are most of the ones that we know about, were all built around a one time training metaphor. If I go back to Pandora, imagine that Pandora said, “Okay, give me a list of your artists and then I’ll play you music. But after you click the thumbs up or down, we’re done. And after that I’m going to do my best job to play music that you like.” We went into the marketplace and said why would you stop after one time training, often done by the senior lawyer subject matter expert? So we pioneered what we call continuous learning. And later some of the scholars called it continuous active learning, and just like Pandora, we said why wouldn’t you keep learning? Your review team looks at 10,000 documents and tags them. Why wouldn’t you feed that information back to the algorithm? Well, it turned out, and we showed in our research, that what you think would happen happened. The algorithm got smarter and it found the relevant documents quicker and you reviewed far less. Independent researchers have since come out and validated this in tests that had nothing to do with us, no connection, but they showed conclusively that you review far fewer documents and you get the relevant ones far quicker; dramatically quicker with continuous learning over a TAR 1 one time training process.
Sharon D. Nelson: I know that there’s a lot of ways that people are using TAR, but can you describe some of them for the audience?
John Tredennick: I can, and the question ties nicely into the distinction between TAR 2 and TAR 1. In the TAR 1 world, you had to bring in a senior lawyer who had to look at many thousands of documents; 2 thousand, 3 thousand, 4 thousand, to get the training right, and it had limitations because if you found new documents, you couldn’t just add them in. You had to do a new training and go over and over. In the TAR 1 world, we were all pretty much convinced that while this process was revolutionary and brought about dramatic savings that it only applied to the biggest of cases. In a TAR 2.0 continuous training, you don’t need the senior expert. Review teams can jump in with one training document or as many as you’ve got. And what it means is you can use this not only in big cases but in small cases. And in a system like ours, you can have as many different issues going as you want. So you can use it for early case assessment, very quickly figuring out what’s important. And you can use it, of course, for outbound productions, but also inbound productions. You can use it for QC, for privilege review, for preparing for the deposition of a key witness. So the fun of a TAR 2.0 is that it eliminated just about all the limitations and it opened up this process to where I predict, with an integrated system, people are going to use it every time. If you have ten thousand documents to review, and I could show you or convince you that with a TAR 2.0 system, you could be done after reviewing 3,000 documents and still have found 90% of the relevant ones. Everybody understands, $2 a doc or $3 or whatever, the savings. Even on a small case like that, 7,000 times 2 is $14,000. So I believe you’re going to find this used very pervasively in all kinds of cases.
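The small-case arithmetic above is easy to check in Python. The sketch below only restates the figures from the answer (10,000 documents, review stopped at 3,000, $2 or $3 per document); the function name is made up, and it ignores the cost of the TAR service itself, which the episode does not quantify.

```python
def review_savings(total_docs, docs_reviewed, cost_per_doc):
    """Cost avoided by not reviewing the remaining documents."""
    return (total_docs - docs_reviewed) * cost_per_doc

# Figures as stated in the episode: stop after 3,000 of 10,000 documents.
print(review_savings(10_000, 3_000, 2))  # at $2 per document
print(review_savings(10_000, 3_000, 3))  # at $3 per document
```

At $2 a document the avoided cost is $14,000, matching the figure quoted; at $3 it would be $21,000.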
Sharon D. Nelson: Well, John, I think everybody would be asking what’s the cost to enter. It sounds like a great club, but what’s the cost of entry?
John Tredennick: Well, we’re in a competitive market and the prices fluctuate and as we get the technology nailed down, it drops. It’s going close to zero. What I can tell you for certain is that it’s a fraction, a small fraction, of the review savings in every case. One of the things we did recently to try to help people get over the hump of the newness or the concern or what they’re paying the fees for is we offer an unconditional money back guarantee on every TAR project. You can use it, and at least for 90 days so there’s some boundaries, but if you’re not satisfied that you saved multiples of the cost, just say so. We’ll turn it off and you get your money back.
Sharon D. Nelson: That’s a pretty good deal.
John Tredennick: No questions asked, no conditions.
John W. Simek: John, I think you know I do digital forensics and expert witnessing and testifying and all of that and in court so I’m very concerned about a lot of the tools that we use to generate and gather that evidence or whatever. But what are the courts saying about TAR usage?
John Tredennick: Well, that’s a hot question and has been since 2012 when that first case, Judge Peck and Da Silva Moore, came out and he expressly endorsed the use of what he called computer assisted review at the time, but TAR review. We’ve had a dozen cases since then, and while I say the courts approved the process in every case, they approved the process either agreeing to a stipulation, working through it, going forward. We don’t have a case on record that said, “You may not use TAR,” or, “TAR’s not reliable.” I will say that the most recent decision of which I’m aware, ironically, is also Magistrate Judge Peck’s, out of the Southern District, and he was presiding over a TAR 1 case where they were fighting about the TAR 1 protocols. If you have the senior expert reviewing, do I get to look over his or her shoulder and see if I agree with how he tags? Because in a world of one time training, every tag matters and maybe influences ten thousand documents. Judge Peck pointed out something we’ve been saying, that in a CAL world, every document is a training document and my thesis is what I produced to you. So what he pointed out in the Rio Tinto case is that if they had just used CAL, it would have gone a lot easier for them.
Sharon D. Nelson: Well, John, we sure want to thank you for joining us today. You make a compelling case for TAR 2.0, which I’m not surprised about. The book really is fantastic. It’s backed by science and you can read the science, but it makes every attempt to not go too far into the weeds and to make it understandable and it is a book for smart people. This is not something that the average person would probably pick up. If you’re interested in TAR, I think you’re going to find this a very compelling read. So thank you again for being our guest today, John.
John Tredennick: Thank you Sharon, thank you John. I’d just like to say that it’s a book for people who like their coffee black.
Sharon D. Nelson: Perfect description!
John W. Simek: Well, that does it for this edition of Digital Detectives; and remember, you can subscribe to all of the editions of this podcast at LegalTalkNetwork.com, or in iTunes. If you enjoyed this podcast, please review us on iTunes.
Sharon D. Nelson: And you could find out more about Sensei’s digital forensics, technology and security services at www.senseient.com. We’ll see you next time on Digital Detectives.
Advertiser: Thanks for listening to Digital Detectives on the Legal Talk Network. Check out some of our other podcasts on LegalTalkNetwork.com and in iTunes.
[End of Transcript]
Sharon D. Nelson and John W. Simek invite experts to discuss computer forensics as well as information security issues.