Orthographic Variation in the Samaritan Pentateuch: A Statistical Analysis

Since 2018, a highly satisfactory scholarly edition of the Samaritan Pentateuch (SP) has been in preparation by Prof. Dr. Stefan Schorch, University of Halle, and is being published by De Gruyter. Thanks to his digitization of Ms. Dublin Chester Beatty Library 751 (1225 AD), the oldest complete SP manuscript, our Copenhagen research group has been able to publish a linguistically annotated dataset of this manuscript at https://github.com/DT-UCPH/sp. Little research has been done on SP orthography, and most of what is known is stated in general terms: SP has fuller orthography than the Masoretic Text (MT), that is, SP uses vowel letters for historically long vowels much more often. This gap in research is probably due in part to the flaws of older SP editions. Our research group is working on a large-scale project that analyzes ancient Hebrew orthography on a broader evidence base than earlier studies, which were mostly based on MT. We therefore take into account the spelling of MT, the Dead Sea Scrolls (DSS), and SP, and we aim to determine more precisely which factors influence the use of vowel letters, such as diachrony, ‘Qumran Scribal Practice’, and the presence of affixes. Our orthography project is also a case study in digital humanities (DH): we explore how machine learning and advanced Bayesian statistics can be applied to learn new things about ancient languages. The paper will introduce the SP, present our findings on SP spelling, and describe the DH methods that we use.