Digitized versions of classic books increase to millions
Last year I wrote about why old books, long-playing records and other non-digital media are gaining value, and why some unlikely specimens are so perennially popular. Another non-digital classic has just been taken out of the dusty treasure chest: the reclusive Harper Lee has agreed to an e-book version of her 1960 best-seller, To Kill a Mockingbird. She announced on her 88th birthday (she was born on April 28, 1926) that the novel would finally be released as an e-book and downloadable audiobook. The book became available on 8 July 2014, published by Cornerstone in the UK. The audiobook is a downloadable edition of the existing CD narrated by Sissy Spacek. In a rare public statement released through her publisher, HarperCollins, Lee said:
“I’m still old-fashioned. I love dusty old books and libraries. I am amazed and humbled that Mockingbird has survived this long. This is Mockingbird for a new generation.”
The announcement came almost a year after she sued her former literary agent, Samuel Pinkus, to regain the rights to her novel. Lee claimed she had been duped into signing over the copyright. Lee’s attorney, Gloria Phares, said at the time that the case had been resolved in September 2013 to the author’s satisfaction, with “her copyright secured to her”. Authors like JK Rowling and Ray Bradbury had changed their minds over the past few years about having their books digitized. Lee’s novel had ranked with JD Salinger’s Catcher in the Rye as one of the top books unavailable in e-book format. To Kill a Mockingbird has sold more than 30 million copies worldwide, and that total is climbing by more than 1 million copies a year, according to HarperCollins.
HarperCollins is also releasing an enhanced e-book, on sale on 21 October 2014, that will feature additional material (not yet specified at this stage).
Other works still unavailable as e-books include The Autobiography of Malcolm X and Gabriel García Márquez’s One Hundred Years of Solitude (though the latter is available as an audiobook on iTunes). As long as books are available only as hard copies, they will remain inaccessible to entire generations of readers, and first editions and books with small print runs will become ever rarer and more expensive.
The global store of e-books available for free or to buy online is growing, though it remains a mere fraction of all the printed books in the world. According to Leonid Taycher, a Google software engineer who works on the Google Books project, 129,864,880 books had been published in print worldwide as of 2010. So where are these e-books to be found?
About 20 percent of the world’s books are in the public domain, and about 10 to 15 percent are in print. The remaining books, the vast majority of all titles, are still under copyright but out of print. To come up with these numbers, which are reasonable approximations, Google started by compiling book information from multiple cataloging systems, such as the International Standard Book Number (ISBN). The problem is that ISBNs have only been assigned to books since the 1960s, and tend to be used only in Western countries. And books existed in printed form long before the invention of the printing press as we know it in the West.
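As a rough illustration, the shares quoted above can be turned into approximate counts using the 129,864,880 estimate. This is only a minimal sketch: it assumes that all the percentages apply to that same total, which the figures do not state outright, and the function and variable names are made up for the example.

```python
# Rough counts implied by the quoted shares.
# Assumption: every percentage is a share of the same 2010 total.
TOTAL_BOOKS = 129_864_880  # Google's 2010 estimate of printed books


def share(total, percent):
    """Return `percent` of `total` as a whole number of books."""
    return total * percent // 100


public_domain = share(TOTAL_BOOKS, 20)
in_print_low = share(TOTAL_BOOKS, 10)
in_print_high = share(TOTAL_BOOKS, 15)

# Whatever is neither public domain nor in print is treated as
# "in copyright but out of print".
out_of_print_low = TOTAL_BOOKS - public_domain - in_print_high
out_of_print_high = TOTAL_BOOKS - public_domain - in_print_low

print(f"Public domain:          {public_domain:,}")
print(f"In print:               {in_print_low:,} to {in_print_high:,}")
print(f"Copyright, out of print: {out_of_print_low:,} to {out_of_print_high:,}")
```

On these assumptions, roughly 26 million titles are in the public domain, 13 to 19 million are in print, and somewhere between 84 and 91 million are in copyright but out of print.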
The world’s first known movable type printing technology was invented and developed in China by the Han Chinese printer Bi Sheng between the years 1041 and 1048. In Korea, the movable metal type printing technique was invented in the early 13th century during the Goryeo Dynasty. The Goryeo Dynasty saw the printing of Jikji – a Korean Buddhist document – in 1377, about 72 years before the invention of an improved movable type mechanical printing technology in Europe, credited to the German printer Johannes Gutenberg in approximately 1450.
The Royal Library of Alexandria
Printed books were a quantum leap from the highly flammable papyrus scrolls used before. Consider the Royal (or Ancient) Library of Alexandria, in Alexandria, Egypt, one of the largest and most significant libraries of the ancient world. The library is famous for having been destroyed in a fire, resulting in the loss of many scrolls and “books” (collections of scrolls), and has become a symbol of the destruction of cultural knowledge. There may have been several fires, and the dates differ, but accounts of the partial or complete destruction of the Library of Alexandria include a fire set by Julius Caesar in 48 BC, an attack by Aurelian in the A.D. 270s, the decree of Coptic Pope Theophilus in A.D. 391, and the decree of the second caliph Omar ibn Al-khattāb in A.D. 640. It is now impossible to determine the size of the library’s collection of papyrus scrolls in any era, though King Ptolemy II Philadelphus (309–246 BC) is said to have set 500,000 scrolls as an objective for the library, and Mark Antony supposedly gave Cleopatra over 200,000 scrolls for the library as a wedding gift, taken from the great Library of Pergamum. Five hundred thousand scrolls that constituted much of what had been written in the ancient world? It is a mere drop in the pail compared with the books available in online libraries today.
Sources of digital books, by the numbers
For instance, the Internet Archive offers over 6,000,000 fully accessible public domain eBooks. This includes a special modern collection of over 500,000 eBooks for users with print disabilities, and a very interesting curated, modern collection for the world at large.
Open Library, an initiative of the Internet Archive, offers over 1,000,000 free e-books of classic literature. Open Library’s aim is “One web page for every book ever published.” To date, it has gathered over 20 million records from a variety of large catalogs as well as single contributions, with more on the way. Like Wikipedia, Open Library relies on the public to help by uploading digital books and editing information on books.
OCLC’s WorldCat is another service that allows users to search libraries around the world for specific books. A major difference between OCLC and Open Library is that OCLC is building a catalog to share among libraries, while Open Library is building a catalog to share freely and openly with the public, in the hope that this will get more people involved in using libraries and, in the long run, generate new data that will be useful to the library community. Open Library links to the WorldCat catalog for any edition that has either an ISBN or an OCLC identifier. Looking for a book in your university library in Perth while you’re sitting in Vancouver? No problem – go to WorldCat. (I found it fascinating to find books I wrote under my maiden name, Marthe le Roux, on the database.)
Project Gutenberg offers over 45,000 free, high-quality e-books, all previously published by bona fide publishers and then digitized and proofread with the help of volunteers. In other words, it offers a selection of books. Open Library’s goal, on the other hand, is to list every book, whether in print or out of print, available at a bookstore or a library, scanned or typed in as text. Open Library provides access to all of Project Gutenberg’s books but has hundreds of thousands of others as well.
All the pages in the world
Compare these numbers with the volume of material that Google has added to its index so that it can be searched digitally. Google’s search function relies on the millions of individuals posting links on websites to help determine which other sites offer content of value. Google uses more than 200 signals and a variety of techniques, including its patented PageRank™ algorithm, which analyzes which sites have been marked as the best sources of information by other pages across the web. The first Google index in 1998 already had 26 million pages, and by 2000 the index had reached the one billion mark. By 2013, about 48 billion web pages had been indexed by Google. Google’s index is well over 100,000,000 gigabytes, and the company has spent over one million computing hours to build it. Now that Google has indexed more of the HTML pages on the Internet than any other search service, it has turned to adding the ability for users to search news archives, patents, academic journals, billions of images and millions of books: “…Our researchers continue looking into ways to bring all the world’s information to people seeking answers.”
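The idea behind PageRank mentioned above, that a page matters more when other pages link to it, can be illustrated with a toy example. What follows is a minimal sketch of the textbook power-iteration version of the algorithm, not Google’s production system. The four-page link graph and the page names are invented for illustration, and the damping factor of 0.85 is simply the conventional textbook value.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict mapping each page to a score; scores sum to about 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to.
                portion = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += portion
            else:
                # A page with no outgoing links spreads its score evenly.
                portion = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += portion
        rank = new_rank
    return rank


# Hypothetical mini-web: three pages link to "library".
links = {
    "blog": ["library"],
    "news": ["library", "blog"],
    "forum": ["library"],
    "library": ["news"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:8s} {score:.3f}")
```

In this invented graph, "library" ends up with the highest score because three of the other pages link to it, which is the "marked as the best sources of information by other pages" effect in miniature.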
As of April 2013, Google’s database encompassed more than 30 million scanned books. Many of the books are scanned using the Elphel 323 camera at a rate of 1,000 pages per hour. Google Books provides digital copies for which viewers can see the basic details and a page preview, along with where the book can be bought or borrowed. Google obtains digital copies of books through its Publishers’ and Authors’ Google Books Partner Program and the Google Books Library Project. The latter is an enhanced card catalog (or listing) of the books from more than forty of the leading libraries and academic institutions around the world, made possible through the scanning of the books, many of which are out of print. These include:
- Austrian National Library
- Bavarian State Library
- Columbia University
- Harvard University
- Cornell University Library
- Ghent University Library
- Keio University Library
- Lyon University Library
- University of California
- National Library of Catalonia
- The New York Public Library
- Oxford University
- Princeton University
- Stanford University
- University Complutense of Madrid
- University Library of Lausanne
- University of Virginia
- University of Texas at Austin
- University of Wisconsin-Madison
- University of Michigan
Copyright and e-books
One of the most important, if not the most important, United States copyright cases decided in 2013 was The Authors Guild, Inc. v Google Inc. (2013 WL 6017130 (S.D.N.Y. Nov. 14, 2013)). The Authors Guild appealed the case to the Second Circuit Court of Appeals, and the outcome is still pending. The case raises issues of such significance to copyright holders and online service providers that it may well end up as a landmark precedent. The judge ruled that the Google Books project did not infringe copyright because it was covered by fair use, while noting that there is no question that, in the absence of fair use, Google would have been liable for copyright infringement. Google digitally reproduces millions of copyrighted books, and it makes digital copies available for its Library Project partners to download for any uses that do not violate copyright laws. This, along with displaying snippets from the books to the public, was all done without license or permission from the copyright owners. Google’s only defence was fair use. It was a victory for Google and for the millions of users who want access to digital copies of books, but a setback for the Authors Guild, which is concerned with the protection of authors’ original works.
The battle over copyright of digital books is far from over. Yet I agree with Google that Internet access to the world’s books is critical to educating and liberating people, and that digital book repositories are an important part of finding “ways to bring all the world’s information to people seeking answers”. Now we can all toddle off to the iTunes store, or Amazon or Kindle, and get our own copies of To Kill a Mockingbird.