The HeidelGram Corpus
The HeidelGram corpus is currently being compiled at the chair of diachronic linguistics at the University of Cologne. Upon completion, the corpus will be available for download here.
Corpus Design
The HeidelGram corpus includes the full text of all major publications of the genre, i.e. a representative subset of the British grammars published from the 16th to the 19th century. This subset was established on the basis of bibliographic lists of publications (Finegan 1998, Görlach 1998, Leitner 1986, 1991, Linn 2006, Michael 1987, Tieken-Boon van Ostade 2008), numbers of editions, information in book catalogues and advertisements, and contemporaries’ comments on grammars in use, as found, for instance, in literary genres.
The texts were selected on the basis of their number of published editions, their distribution, their common use, their prevalence in contemporary books and reports and in school and college curricula, and their prominence in the secondary literature on grammar writing. Variety in function, audience, and text/grammar type (e.g. teaching grammars and reference grammars) was considered as an additional selection criterion. Grammars of the scholarly tradition are intentionally overrepresented numerically at the expense of a balanced corpus, on the assumption that developments in grammar writing become manifest first and foremost in reference grammars.
The corpus texts will be made available as plain text as well as TEI-adjacent XML. A custom XML annotation scheme was chosen because it provides more flexibility and simplicity while remaining compatible with TEI software tools and platforms. The annotation process so far has shown that the wide variability of the texts across time makes flexibility in annotation an absolute necessity, while the reduced number of elements and attributes makes the scheme more accessible and user-friendly for the annotators.
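As a rough illustration of how the two formats relate, the following Python sketch derives a plain-text version from a small XML fragment and counts the elements used. The element and attribute names (text, section, p, term, example, type) and the sample sentence are invented for this example and are not the HeidelGram annotation scheme, which is not documented here; only the standard-library xml.etree.ElementTree module is assumed.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment: all element and attribute names, and the sample
# sentence itself, are invented for illustration only.
SAMPLE_XML = """<text>
  <section type="definition">
    <p>A <term>noun</term> is the name of a thing.</p>
    <example>London, book.</example>
  </section>
</text>"""


def to_plain_text(xml_string: str) -> str:
    """Drop all markup and return the running text with normalized spacing."""
    root = ET.fromstring(xml_string)
    # itertext() yields the text content of every element in document order.
    raw = "".join(root.itertext())
    # Collapse line breaks and indentation into single spaces.
    return " ".join(raw.split())


def count_elements(xml_string: str) -> Counter:
    """Count how often each element name occurs in the document."""
    root = ET.fromstring(xml_string)
    return Counter(element.tag for element in root.iter())


if __name__ == "__main__":
    print(to_plain_text(SAMPLE_XML))
    print(count_elements(SAMPLE_XML))
```

A small element inventory of this kind keeps such processing simple: the plain-text version follows from removing the markup, and an element count gives annotators a quick overview of how a text has been tagged.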