The Case for Predictive Coding
On February 24, 2012, the Southern District of New York issued an opinion that is reverberating throughout the world of eDiscovery. Magistrate Judge Andrew Peck took a deep dive into the use of predictive coding in large-scale document review. Predictive coding is a technology-assisted classification process in which a human reviewer codes a small fraction of the documents as responsive or privileged. Using the results of that human review, the computer then codes the remaining documents in the collection for responsiveness, promising a substantial reduction in the time spent reviewing entire document collections.
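To make the train-then-classify loop concrete, here is a toy sketch. It assumes a multinomial naive Bayes classifier with crude whitespace tokenization, a technique chosen purely for illustration; commercial predictive coding tools use their own, often proprietary, methods. The function names `train` and `code_document` and the sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch: multinomial naive Bayes stands in for whatever
# (often proprietary) classifier a real predictive coding tool uses.

def tokenize(text):
    # Crude assumption: lowercase and split on whitespace.
    return text.lower().split()

def train(seed_set):
    """Learn word statistics from human-coded (text, label) seed documents."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    doc_counts = Counter()              # label -> number of seed documents
    for text, label in seed_set:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, doc_counts, vocab

def code_document(model, text):
    """Assign the label with the highest log-posterior (Laplace smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Human reviewers code a small seed set...
seed_set = [
    ("merger pricing discussion with client", "responsive"),
    ("pricing terms for the merger", "responsive"),
    ("lunch order for friday", "nonresponsive"),
    ("office holiday party friday", "nonresponsive"),
]
model = train(seed_set)

# ...and the model codes the rest of the collection.
collection = ["draft merger pricing memo", "friday lunch plans"]
print([code_document(model, doc) for doc in collection])
# ['responsive', 'nonresponsive']
```

A real review would rely on a far larger seed set and would validate the computer's coding by sampling its output.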
In Da Silva Moore v. Publicis Groupe & MSL Group, 11 Civ. 1279 (ALC) (AJP) (S.D.N.Y. Feb. 24, 2012), the parties face the daunting task of reviewing more than 3 million emails. Although both sides agreed that predictive coding might be useful for the review, they could not agree on the reliability of the proposed protocol.
The protocol proposed by the defendants called for reviewing a sample of 2,399 documents to identify relevant documents for the "seed set" that will be used to train the predictive coding software. The sample size corresponds to a 95 percent confidence level, and the documents that defendants' counsel codes as relevant will form the seed set. Defendants agreed to provide all of those documents, except privileged ones, to the plaintiffs so that plaintiffs could review the relevance coding. Finally, plaintiffs would provide their own keywords for the review and coding of an additional 4,000 documents.
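The arithmetic below suggests how a sample size like 2,399 can arise. It is a minimal sketch of the textbook formula for estimating a proportion by simple random sampling, assuming a 95 percent confidence level (z = 1.96), a worst-case prevalence of p = 0.5, a margin of error of plus or minus 2 percent, and a collection of 3 million documents. The margin of error and the exact collection size are assumptions, not figures stated here, and `sample_size` is a hypothetical helper rather than anything from the parties' protocol.

```python
def sample_size(z, margin, p=0.5, population=None):
    """Sample size for estimating a proportion by simple random sampling.

    n0 = z^2 * p * (1 - p) / margin^2; if a finite population N is given,
    apply the correction n = n0 / (1 + (n0 - 1) / N).
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is None:
        return n0
    return n0 / (1 + (n0 - 1) / population)

# Assumptions: 95% confidence (z = 1.96), +/-2% margin, worst-case p = 0.5,
# and a hypothetical collection of exactly 3 million documents.
print(f"{sample_size(1.96, 0.02):.1f}")                        # 2401.0
print(f"{sample_size(1.96, 0.02, population=3_000_000):.1f}")  # 2399.1
```

Rounding the finite-population result up, the usual conservative convention, gives 2,400 documents; rounding to the nearest whole document gives 2,399.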
Over plaintiffs' objection that the proposed protocol was unreliable, defendants successfully argued that the savings in time and cost were too great to ignore, given that computer-assisted review is superior to the available alternatives. That the magistrate judge sided with the defendants is not surprising in light of his recent presentations on predictive coding.*
While some see this as a monumental step toward computer-assisted review, others are more cautious. In an article on its website, the law firm Reed Smith observed, "Judge Peck's opinion provides a useful precedent supporting attempts to use this potentially cost-saving technology, but also identifies some of the associated risks and complexities. Careful planning, cooperation between counsel, and high-quality review of sufficient sample of seed documents can be critical components of success. Companies facing large-scale document reviews should consider the potential use of computer-assisted review technology to improve speed and reduce costs. eDiscovery counsel who already has experience using such technology can provide valuable guidance in this regard."
Even Magistrate Judge Peck concedes that much work remains before predictive coding takes hold. He wrote, "It is unlikely that courts will be able to determine or approve a party's proposal as to when review and production can stop until the computer-assisted software has been trained and the results are quality-control verified. Only at that point can the parties and the Court see where there is a clear drop off from highly relevant to marginally relevant to not likely to be relevant documents." He also noted that predictive coding is not suitable for every case, and that clients and their counsel must decide on a case-by-case basis whether it is appropriate for the matter at hand.
While the opinion is a definite step toward predictive coding, it is only the first of many steps before wholesale acceptance. In fact, the protocol has not yet been put to the test in Da Silva Moore; the magistrate judge has simply allowed the defendants to proceed with testing it. The court did, however, highlight four lessons from the case. First, the party proposing a protocol should quality-check it before approaching the court. Second, staging discovery by examining more accessible and more relevant sources first is a way to control discovery costs and to win the court's favor. Third, counsel should become familiar with their client's documents and custodians in order to build a more defensible approach. Finally, the court suggested that it is always helpful to involve electronic discovery service providers in court proceedings to discuss ESI issues.
*Magistrate Judge Peck recently wrote an article for Law Technology News and spoke at Legal Tech 2012, extolling the virtues of predictive coding and forecasting that new case law on predictive coding was right around the corner.
David S. Weber is General Counsel for Digital Discovery www.digitaldiscoveryesi.com and serves as a computer forensics consultant and eDiscovery expert to corporations and law firms.