
How Google Scholar Undercuts Jurisdictions Going Digital While It Could as Easily Support Them

Monday, January 26th, 2015

Google Scholar’s case law collection has been an enormous boon to this country’s lawyers and all others puzzling over U.S. law.  Not only does it provide free and direct access to a professional quality case database, but it enables legal commentary linked to governing precedent to reside outside a paywall.  Ironically, this breakthrough electronic research tool remains largely reliant on print source material.  For many jurisdictions that is a direct consequence of the courts themselves being stuck in obsolete publication practices.  But Scholar’s reliance on print holds even for states in which there is a more authoritative digital alternative.  In the case of several state courts that have recently shifted to official online publication, Scholar persists in loading digitized versions of their decisions drawn from the pages of the Thomson Reuters National Reporter System (NRS).  For at least one, Illinois, this is done without preserving the official citation information required in all submissions to that state’s courts.

Exhibit No. 1: Google Scholar’s Treatment of Illinois Decisions

In July 2011, less than two years after Google Scholar unveiled its case law database, Illinois began publishing the official versions of its appellate decisions online.  Print publication of the Illinois Official Reports ceased.  As a consequence the final and official version of the Illinois Supreme Court’s decision in Lake County Grading Co. v. Village of Antioch, 2014 IL 115805 (and of all other binding decisions rendered by Illinois appellate courts since the switch) is available for anyone, including Google, to download from a public site.  The text’s official status is indicated, and all that one needs to cite that decision to an Illinois court, in whole or in part, is contained in the electronic document.  One could hope, one might even expect, that Google Scholar would embrace and leverage this judicial reform.  The change was, after all, prompted by many of the same goals that lie behind the Google initiative.  Yet Scholar continues to digitize the print NRS version of this and other post-2011 Illinois decisions.  Worse, while doing so it drops the medium-neutral citations by which Illinois courts identify those decisions and which they require anyone invoking the decisions to employ (“2014 IL 115805” in the case of Lake County Grading).  Google’s practice appears to be to harvest Illinois decisions when first released in slip opinion form, to ignore the subsequent “official” electronic version, and ultimately to replace the slip opinion with a digitized copy of the NRS text.  This final case report displays the volume number and page at which the decision is located within the NRS North Eastern Reporter as well as its internal pagination and paragraph numbers.  But critically it omits the official medium-neutral case cite.  For an example take a look at People v. Colyar, 2013 IL 111835.  It can’t be said that Scholar completely ignores the new non-print Illinois citations, for it uses them to index decisions.  As a result Colyar’s citation (“2013 IL 111835”) entered as a search will retrieve the case.  The official cite also appears in Colyar’s listing when the decision is retrieved by a typical word search.  The problem is that it remains absent from the opinion text when displayed on the screen, downloaded, or printed out.

Exhibit No. 2: New Mexico

New Mexico furnishes a second example of Google’s unfortunate print bias.  Like Illinois, New Mexico ceased publishing official print reports in 2011.  Since then the official version of any precedential New Mexico decision has been contained in an electronic file retrievable without charge from the New Mexico Compilation Commission site.  Zhao v. Montoya, 2014-NMSC-025, is one such case.  Ignoring the change, Google Scholar has continued to draw its final text of the state’s appellate decisions from the NRS Pacific Reporter.  However, probably because New Mexico began attaching neutral citations to decisions long before the Scholar case database was conceived or designed, Google’s print-based acquisition process has, from the start, extracted those official citations from the NRS reports and included them within each case.  On the other hand, since Google Scholar relies on the Pacific Reporter for that information, decisions appear without their official citation until they have been published by Thomson Reuters and digitized by Google from that print source.  Compare the official version of Wilkeson v. State Farm Mut. Auto. Ins. Co., 2014-NMCA-077, with that provided by Google Scholar.

Exhibit No. 3: Oklahoma

Scholar’s treatment of Oklahoma decisions demonstrates that this need not be so.  The Oklahoma judiciary declared its online publication of appellate decisions official as of the beginning of 2014.  As with the others, this reform did not alter Google Scholar’s reliance on the NRS as the ultimate source of Oklahoma’s case law.  Scholar continues to download Oklahoma decisions from the public site at the time of initial release, ignore the subsequent electronic versions designated as “official”, and replace the original files with digital copies of the texts once they appear in the Pacific Reporter.  There is one important difference.  Each decision’s medium-neutral citation (e.g., “2013 OK CIV APP 105”) is displayed at the top from the beginning.

Exhibit No. 4: Arkansas

Official Arkansas case reports have been electronic since 2009.  That same year the Arkansas Supreme Court erased the distinction between published and unpublished decisions.  All decisions of the Arkansas Supreme Court and Court of Appeals now carry precedential weight.  Faced with the resulting surge in the volume of citable Arkansas decisions, Thomson Reuters refused to publish them all.  Without guidance from the Arkansas courts, the company’s editors now select only a small percentage for print publication (fewer than 17% of the 2013 Court of Appeals decisions).  Those that appear in S.W.3d are digitized by Google Scholar (complete with internal pagination) from that source and substituted for the prior court-distributed version.  While Google’s digitization process retains the public domain case designations applied by the deciding court (e.g., “2013 Ark. App. 738”), it strips out another crucial citation element.  Although the NRS version displays the page breaks that appear in the official electronic case report, Scholar leaves them out.  For that reason its versions of Arkansas decisions, both those drawn from the official site and those based on the regional reporter, cannot be used to prepare pinpoint citations in the format called for by that state’s appellate rules.

Exhibit No. 5: Ohio

When the Ohio Supreme Court implemented a non-print citation system in 2002 it too removed the prior distinction between “published” and “unpublished” decisions.  Ten years later it abandoned print publication of all decisions from the Ohio Court of Appeals.  Since July 1, 2012 the official version of any decision of that court has been the authenticated electronic copy released by the Reporter of Decisions.  During 2013 the court’s twelve districts issued over 5,200 such precedential opinions.  Only 360 or so were selected by the NRS editors for publication in the North Eastern Reporter.  As with Arkansas, Google Scholar loads the entire set of Court of Appeals decisions, later adding volume and page number cites to the indexing data for those decisions that appear in the regional reporter.  It does not, however, display the NRS reporter citation as part of the opinion.  As is true of the official cites in Illinois, these appear only as part of the listing of results delivered in response to a search.  Thus while a search on “992 N.E.2d 453” will retrieve State v. Venes, 2013 Ohio 1891 (Ct. App. 8th Dist.), that NRS citation does not appear within the opinion, nor does Scholar show the NRS pagination.
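The medium-neutral citations quoted in the five exhibits above share a loose shape: a four-digit year, a court designator, and a sequence number.  The sketch below is only an illustration of that shape, not a description of how Google or any court processes citations.  The names `NeutralCite` and `parse_neutral_cite` are hypothetical, and the pattern is an assumption tuned to the examples in this post; it does not attempt to cover every form a jurisdiction may use.

```python
import re
from typing import NamedTuple, Optional


class NeutralCite(NamedTuple):
    year: int
    court: str
    number: str  # kept as a string so leading zeros ("025") survive


# Assumed shape: year, court designator, sequence number, separated by
# spaces or hyphens. Tuned only to the examples quoted in this post.
_PATTERN = re.compile(r"^(\d{4})[ -]([A-Za-z][A-Za-z. ]*?)[ -](\d+)$")


def parse_neutral_cite(cite: str) -> Optional[NeutralCite]:
    """Split a medium-neutral citation into its parts, or return None."""
    match = _PATTERN.match(cite.strip())
    if match is None:
        return None
    return NeutralCite(int(match.group(1)), match.group(2), match.group(3))


if __name__ == "__main__":
    for cite in ("2014 IL 115805", "2014-NMSC-025", "2013 OK CIV APP 105",
                 "2013 Ark. App. 738", "2013 Ohio 1891", "992 N.E.2d 453"):
        print(cite, "->", parse_neutral_cite(cite))
```

A print reporter cite such as “992 N.E.2d 453” begins with a volume number rather than a year, so it falls outside the pattern and the function returns `None` for it.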

Google Scholar’s Treatment of the Official Print Reports of California, Massachusetts, and New York Demonstrates that It Can Do Better

The Ohio example reveals that Google’s reliance on the Thomson Reuters reports does not reflect its approach to all U.S. jurisdictions, cost-effective though that might be.  After all, economy and efficiency might well argue for acquiring all case data from that single source.  Ohio does not stand alone.  In the case of several states that still publish their own law reports in print (or contract for their publication) Google digitizes those reports rather than their NRS counterparts.

California, Massachusetts, and New York are among those “official report” states.  Importantly, these three employ distinct formats for internal citations.  To illustrate, as published in New York’s official reports, the New York Court of Appeals decision in De La Cruz v. Caddell Dry Dock & Repair Co., 21 N.Y.3d 530 (2013), cites a prior decision of the court as follows: “Brukhman v Giuliani (94 NY2d 387 [2000])”.  In the Thomson Reuters editions the citation to Brukhman v. Giuliani becomes: “Brukhman v. Giuliani, 94 N.Y.2d 387, 705 N.Y.S.2d 558, 727 N.E.2d 116 (2000)”.  As detailed in a prior post, such citation format differences make it easy to detect whether the decision texts for the jurisdiction have been drawn from its official reports or from the proprietary NRS.
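The citation analysis just described can be approximated mechanically.  What follows is a minimal sketch, assuming only the two New York formats quoted above; the function name `guess_source` and both patterns are hypothetical, and a real check would need to handle far more reporters and edge cases.

```python
import re

# Official New York style as quoted above: "94 NY2d 387 [2000]"
# (no periods in the reporter abbreviation, year in square brackets).
OFFICIAL_NY = re.compile(r"\b\d+ NY\dd \d+ \[\d{4}\]")

# Thomson Reuters style as quoted above: "94 N.Y.2d 387"
# (periods in the reporter abbreviation).
NRS_NY = re.compile(r"\b\d+ N\.Y\.\dd \d+")


def guess_source(opinion_text: str) -> str:
    """Guess which edition a text came from by counting citation styles."""
    official = len(OFFICIAL_NY.findall(opinion_text))
    nrs = len(NRS_NY.findall(opinion_text))
    if official > nrs:
        return "official"
    if nrs > official:
        return "nrs"
    return "unknown"


if __name__ == "__main__":
    print(guess_source("Brukhman v Giuliani (94 NY2d 387 [2000])"))
    print(guess_source("Brukhman v. Giuliani, 94 N.Y.2d 387, "
                       "705 N.Y.S.2d 558, 727 N.E.2d 116 (2000)"))
```

Counting matches rather than stopping at the first one is a deliberate choice here: a single quoted citation in the other style should not flip the result for a long opinion.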

Applied to Google Scholar this analysis establishes that it currently draws New York case data from the official reports.  Have a look at its version of De La Cruz.  Although the volume and page numbers at which that decision appears in the North Eastern Reporter and New York Supplement have been added so that users can extract a parallel cite, the format of the citations contained within Scholar’s version of the De La Cruz decision, as well as the page breaks shown within the text, reveal the version to be a digital copy of the official report.  Similar citation analysis reveals that Google Scholar also relies on California and Massachusetts official reports for decisions from those states.  In other words, Google’s data acquisition process does not rest exclusively or consistently on the Thomson Reuters reports.

Drawing on the official reports of California, New York, and Massachusetts necessitates digitizing print.  But with Illinois and the other states that have moved to official electronic distribution this is unnecessary.  For those states, using the official version of decisions would avoid that costly process and require only two or three steps:

  1. Loading opinions as first released, including all citation data embedded in them (case cites, paragraph numbers, or when necessary, as with Arkansas, internal pagination). Google currently accomplishes this with Oklahoma and Ohio, but fails to do so for Arkansas, Illinois, or New Mexico.
  2. Second, if decisions are initially released in a preliminary or slip form, substituting their final, official versions, once available, again, retaining all citation data. Patently, Google follows this pattern in New York, California, and Massachusetts where that final, official version is brought out in print.
  3. Finally, adding a parallel National Reporter System volume and page number cite to the official medium neutral citation once it becomes available. Google’s process for decisions from New Mexico and Oklahoma, not to speak of the print publication states, New York, California, and Massachusetts, demonstrates that its data systems are capable of this step.
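The steps above can be sketched as three small operations on a case record.  This is a hypothetical illustration under stated assumptions, not a description of Google’s actual systems: the `CaseRecord` type, its fields, and the three function names are all invented for the example, and the placeholder strings stand in for real opinion text.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class CaseRecord:
    """Hypothetical record; every field name here is an assumption."""
    neutral_cite: str                      # official medium-neutral citation
    text: str                              # opinion text with embedded citation data
    is_official: bool = False              # True once the final version is loaded
    parallel_cites: Tuple[str, ...] = ()   # NRS volume/page cites added later


def load_as_released(neutral_cite: str, slip_text: str) -> CaseRecord:
    # Step 1: load the opinion as first released, keeping its citation data.
    return CaseRecord(neutral_cite=neutral_cite, text=slip_text)


def substitute_official(record: CaseRecord, official_text: str) -> CaseRecord:
    # Step 2: swap in the final, official text; the neutral cite is retained.
    return replace(record, text=official_text, is_official=True)


def add_parallel_cite(record: CaseRecord, nrs_cite: str) -> CaseRecord:
    # Step 3: append the NRS cite alongside the official cite, never replacing it.
    return replace(record, parallel_cites=record.parallel_cites + (nrs_cite,))


if __name__ == "__main__":
    record = load_as_released("2013 Ohio 1891", "<slip opinion text>")
    record = substitute_official(record, "<official text>")
    record = add_parallel_cite(record, "992 N.E.2d 453")
    print(record)
```

The point of the design is that the official citation is set once, at load time, and no later step can drop it; the print reporter cite is purely additive.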

One can hope for the day when all U.S. courts publish their official reports electronically, allowing the full range of legal research services to redistribute final, official, citable copies, adding diverse levels and types of editorial enhancement, including their own citation schemes.  Jurisdictions weighing a shift toward that future ought to be encouraged.  More respectful recognition of the measures taken by states that have already gone digital is an essential first step.  Google Scholar, the dominant free source of U.S. case law, ought to lead the way.