A Description of the Project

The last one hundred years of scholarship in the history of the English book trade have been dominated by catalogues: of books, of watermarks, of printers' ornaments and title-page borders. At the same time considerable effort has gone into the transcription and publication of primary documents such as those found in company archives and government repositories. Little research has been carried out, however, in support of the tools of quantitative analysis developed by historians and social scientists. The paucity of quantitative research is due primarily to the lack of hard data upon which to work. As a result, while we know for the most part what books were published and something about the official lives of the men and women who worked in the printing houses and bookshops, we know very little about the measurable physical, economic and material circumstances of the trade itself.

The Early English Booktrade Database (EEBD) will be the first networked electronic resource devoted to the organization and dissemination of physical and descriptive bibliographical statistics. The EEBD's goal is to collect and describe material evidence related to English printing and publishing 1475-1640 (also known as the STC period, after the Pollard and Redgrave Short-Title Catalogue of Books Printed in England, Scotland & Ireland 1475-1640). The assembled data will for the first time enable large-scale quantitative analyses of historical, industrial, sociological and literary aspects of early modern print culture. At the heart of the EEBD is a set of digital files constructed in XML and accompanied by a suite of analytical and data-representation tools. It is also designed to be used in conjunction with the electronic English Short-Title Catalogue (ESTC) and British Book Trade Index (BBTI). Using the methods of quantitative history, often called cliometrics, scholars will be able to explore the nuances of the English book trade at a level never before possible. For example, a book historian will be able to chart in detail the disappearance of black letter printing during the reign of Elizabeth, while a divinity scholar might investigate the flourishing trade in printed sermons and its impact on popular religious beliefs.

Project Details

At its heart, the EEBD consists of two distinct classes of information: new material gathered from the close physical examination of every title printed during the STC period; and relevant data from existing resources such as the STC and ESTC that have been revised and recompiled to be used analytically in correlation with the freshly gathered evidence. New material includes:

  1. Edition Sheet. One of the main stumbling blocks to a deeper understanding of the book trade is the lack of data detailing the productive capacity of printing houses. Most studies to date have relied upon lists of titles, an approach that doesn't distinguish between a 1500-page folio history and a single-sheet broadside ballad. The EEBD will employ a unit of measure called the edition sheet: the number of sheets in an exemplar volume, used as a measure of the relative amount of work required to produce the complete run of that volume. By compiling edition sheet totals, the EEBD will provide a more accurate assessment of the amount of work involved in machining every book in the STC period.
  2. Composition. Measuring the linear amount of type used in a particular volume provides an estimate of the amount of composition and proofreading work involved in producing it. In conjunction with edition sheets, this information allows us to create much more sophisticated evaluations of the productive capacity of individual printing houses as well as the entire trade.
  3. Typography. This category includes both the face and body of the types used to produce a particular volume. Such data provides long-term insights into the changing cultural fashions of the book trade, in particular the complex relationship between the subject of a book, the chosen format (folio, quarto, octavo, etc.), the size and quality of paper used, the type face and body in which it was set, and the business practices of the printer who impressed it.
  4. Paper. This fundamental building block of the book has resisted detailed analysis due to the varied and shifting forms it takes as well as the sheer volume of paper consumed by the trade. However, research has established a strong correlation between watermarks used by the paper producer (when present) and the size and class of the sheets. The EEBD will identify the dominant watermark type and corresponding paper group used to print each title employing a classification system based upon work by leading bibliographical scholars.
  5. Subject. On-line catalogues such as WorldCat and the ESTC employ a library-based set of subject headings designed to aid users in locating books of interest. The EEBD will instead classify volumes according to a strict taxonomy of subject classes derived from existing historical studies and designed to support a variety of analyses, providing scholars with new ways to evaluate the business practices of printers and booksellers. One might ask, for example, which printing houses specialized in literary as opposed to sacred texts, or which booksellers funded the publication of popular romances and travel narratives.
  6. Main Text Layout. The project will classify the gross design features of the main body of the text, i.e. simple header and text, text with marginal notes, ruled compartments, etc. Like typography, this data provides long-term insights into the changing cultural fashions of the book trade, such as the gradual addition of the scholarly apparatus to literary works.
  7. Paratext. Just as books are not simply containers for texts, neither is the paratextual framework present in most books from this period merely padding. Expanding on Franklin Williams's 1962 "Index of Dedications and Commendatory Verses," the EEBD will classify the variety of dedications, prefaces, introductions and errata and will create a table of personal names.
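
The contrast between counting titles and counting edition sheets (item 1 above) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the sheets in an exemplar are derived from its leaf count and format; the record fields, printer names, and figures are hypothetical and are not drawn from the EEBD.

```python
from collections import defaultdict
from fractions import Fraction

# Leaves produced by folding one printed sheet, by format.
LEAVES_PER_SHEET = {
    "broadside": 1,
    "folio": 2,
    "quarto": 4,
    "octavo": 8,
}


def edition_sheets(leaves: int, book_format: str) -> Fraction:
    """Sheets in one exemplar copy: leaf count divided by leaves per sheet."""
    return Fraction(leaves, LEAVES_PER_SHEET[book_format])


def totals_by_printer(records):
    """Sum edition sheets per printer instead of counting titles."""
    totals = defaultdict(Fraction)
    for record in records:
        totals[record["printer"]] += edition_sheets(
            record["leaves"], record["format"]
        )
    return dict(totals)


# Hypothetical records, invented for illustration only.
SAMPLE_RECORDS = [
    {"printer": "Printer A", "format": "folio", "leaves": 750},    # a 1500-page folio
    {"printer": "Printer B", "format": "broadside", "leaves": 1},  # a single-sheet ballad
    {"printer": "Printer B", "format": "quarto", "leaves": 48},
]

if __name__ == "__main__":
    for printer, sheets in sorted(totals_by_printer(SAMPLE_RECORDS).items()):
        print(printer, float(sheets))
```

In this invented sample, Printer B has twice as many titles as Printer A but only a small fraction of the edition sheets, which is the distinction a list of titles cannot capture.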

Existing data to be revised and compiled includes:

  1. Edition-Issue-Variant. The building block for virtually all studies of book production and circulation is the edition, i.e. a volume produced from newly set type. The first editors of the STC often assigned multiple record numbers to issues and minor variants of a single edition, a practice that has continued in subsequent incarnations of the resource. Tests run on sample pages indicate that the blurring of differences among these different classes of publication has bloated the second edition of the STC (and consequently the ESTC) by roughly 2,000-3,000 records. The EEBD will link multiple issues and variants of a single edition with a unique ID reference to enable edition-level analyses.
  2. Multiple Editions. Equally important to an understanding of the English book trade is the ability to identify frequently printed works. In order to support large-scale market analyses as well as enable users to interpolate gaps in the historical record, the EEBD will link multiple editions of the same title with a unique ID reference.
  3. Shared Printing and Publishing. The ESTC follows the practice of the STC by presenting title and imprint information in a modified version of its original spelling. In order to make such data useable in digital search and display routines, the EEBD will create an authority table identifying and listing the printers and booksellers involved in the trade with a standardized spelling of names and places. Each individual's record will have linked cross-references to the ESTC items in which he or she had a hand as well as a numeric estimate of the proportional responsibility when sharing work. Additionally, the EEBD will cross-link its data with the biographical information held in the BBTI at the University of Birmingham.
  4. Format and Collation. The STC and ESTC include collation formulae (i.e. a precise, condensed description of how a book is physically assembled) for some of the titles, but the majority of the entries have only format designations. The EEBD will add structural collation formulae to all records.
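
The edition-level linking and proportional responsibility described in items 1 and 3 above might work roughly as sketched here. This is an illustration under stated assumptions rather than the EEBD's design: the identifiers, field names, shares, and sheet counts are all hypothetical.

```python
from collections import defaultdict

# Hypothetical catalogue records: several record numbers may describe
# issues or variants of the same edition.
RECORDS = [
    {"record_id": "R1", "edition_id": "E1", "kind": "edition"},
    {"record_id": "R2", "edition_id": "E1", "kind": "issue"},
    {"record_id": "R3", "edition_id": "E1", "kind": "variant"},
    {"record_id": "R4", "edition_id": "E2", "kind": "edition"},
]

# Hypothetical edition sheet totals and proportional responsibility per edition.
EDITION_SHEETS = {"E1": 40, "E2": 12}
SHARES = {
    "E1": {"Printer A": 0.75, "Printer B": 0.25},  # shared printing
    "E2": {"Printer B": 1.0},
}


def group_by_edition(records):
    """Collect every record number under the edition it belongs to."""
    editions = defaultdict(list)
    for record in records:
        editions[record["edition_id"]].append(record["record_id"])
    return dict(editions)


def sheets_by_printer(edition_sheets, shares):
    """Credit each printer with a share of each edition, counted once per edition."""
    totals = defaultdict(float)
    for edition_id, sheets in edition_sheets.items():
        for printer, share in shares[edition_id].items():
            totals[printer] += sheets * share
    return dict(totals)


if __name__ == "__main__":
    print(group_by_edition(RECORDS))
    print(sheets_by_printer(EDITION_SHEETS, SHARES))
```

Grouping first and allocating second means that the four record numbers count as two editions, and that a shared edition is divided among its printers instead of being credited in full to each.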

These disparate classes of data will form the core records of the EEBD. However, this ground-breaking compilation of evidence surrounding the early book trade requires an equally dynamic technical base. To support scholars from widely divergent disciplines, we will design the database structure, tool collection, and distribution network underlying the EEBD with an eye toward maximum flexibility and utility.

Perhaps the four most important challenges facing a digital resource designed to be used collaboratively are power, portability, transparency, and preservation. A research collection of the scope and sophistication envisioned by the EEBD will require a powerful query-and-analysis engine supporting it. However, as no software package can offer every approach to analysis, it must be possible to extract all data, partially or fully, from their native environment and import them into a new one. Furthermore, such extraction must take place simply and without any loss of information such as special characters or complex relationships among records. Finally, the data and accompanying structures must be based on a universally recognized industry standard that will survive changes in fashion and technology. The IATH research and development team, with support from the information technology specialists at team members' institutions, will assess the nature of the data to be collected and select the most appropriate technologies for representing and exploiting it in machine-readable form. Preliminary analysis of the data suggests that the optimum maintenance environment will be object-relational database technology and XML for communication between the database and other applications, for example statistical analysis software, and between the database and end-users. XML appears to be the most effective technology for integrating and communicating related data from EEBD, ESTC, and BBTI.
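
As a rough illustration of the portability requirement, the sketch below reads a small XML extract and flattens it to CSV for use in another package, keeping both a special character (the long s) and the link between records and their edition. The element and attribute names are hypothetical, since no EEBD schema has yet been defined, and the sample titles are invented.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical extract: two records linked to the same edition.
SAMPLE_XML = """<records>
  <record id="R1" edition="E1">
    <title>A ſermon of repentance</title>
    <format>quarto</format>
  </record>
  <record id="R2" edition="E1">
    <title>A ſermon of repentance</title>
    <format>quarto</format>
  </record>
</records>"""

FIELDS = ["id", "edition", "title", "format"]


def xml_to_rows(xml_text):
    """Flatten each <record> element into a dictionary of strings."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.findall("record"):
        rows.append({
            "id": record.get("id"),
            "edition": record.get("edition"),  # preserves the edition link
            "title": record.findtext("title"),
            "format": record.findtext("format"),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(xml_to_rows(SAMPLE_XML)))
```

Reading the CSV text back and comparing it with the parsed rows is a simple way to check that nothing was lost in the transfer.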

As far as analytical instruments are concerned, many powerful packages such as SPSS and SAS are currently available for use on common computer platforms such as Windows, Macintosh and UNIX; indeed, the EEBD relies on XML technologies in part to support the seamless transfer of data extracts to these popular statistical packages. Nonetheless, many scholars won't require the sophisticated routines offered by SPSS and SAS (nor would they have the time to learn how these packages function). A great deal can be learned from simple descriptive analyses that display types of central tendencies (e.g. averages) or measures of dispersion (e.g. distribution histograms), especially when applied to the EEBD's rich evidence base. In order to encourage the widest use of the EEBD data, we include as a long-term goal the creation of a suite of tools specifically designed to exploit the physical and descriptive information it contains. Accompanying these tools will be a set of display strategies based upon Edward Tufte's principles of data compression and density. With such an integrated set of query, analysis, and display strategies, researchers might map the shifting kin and financial alliances of booksellers across a diachronic representation of London.
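
The kind of simple descriptive analysis mentioned above needs very little machinery. The following sketch uses only the Python standard library to compute an average, a median, and a binned distribution; the sheet counts are invented for illustration, and the function names are hypothetical and not part of any planned EEBD tool.

```python
import statistics
from collections import Counter

# Hypothetical edition sheet counts for eight titles.
SHEETS_PER_TITLE = [1, 2, 4, 12, 12, 40, 76, 375]


def summarize(values):
    """Central tendency and dispersion for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def histogram(values, bin_width):
    """Count values per bin, keyed by the lower bound of each bin."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(summarize(SHEETS_PER_TITLE))
    for start, count in histogram(SHEETS_PER_TITLE, 50).items():
        print(f"{start:>4}-{start + 49:<4} {'#' * count}")
```

In this sample one very large volume pulls the mean far above the median, which suggests why a distribution display can be more informative than a single average.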

Finally, the utility of the EEBD will increase exponentially when linked with the ESTC and BBTI. First conceived nearly thirty years ago, the ESTC is an enumerative listing of English printing 1470-1800, containing quasi-regularized title-page information and an annotated census of known copies for each of its records. The BBTI offers biographical and trade details for every person known to have worked in the British book trade up to 1851. These two resources complement perfectly the deep statistical data embedded within the EEBD, forming a tripartite anthology of facts, figures, names, dates, and places describing the volatile world of early printing. In order to exploit fully the potential of this strategic arrangement, the EEBD will build a triangulating mechanism that will allow scholars to use one or more of the resources in whatever combination they choose. For example, the potential researchers mentioned in the previous paragraph might add biographical data from the BBTI to enrich and extend their data map, then use the ESTC census to locate presentation copies that have authorial inscriptions indicating further relationships between author, printer and bookseller.

Because the ESTC currently licenses its data to the Research Libraries Group (RLG), which in turn charges a fee for its use, the relationship among the three databases will be asymmetrical for the time being. Users with access to the ESTC will have complete access to both the EEBD and BBTI; users with access to the EEBD and BBTI will have full access to both of those resources but only partial access to the ESTC. While only subscribers to the ESTC will have full use of all three datasets, we will make the analysis and display tools we develop freely available to all.