DESCRIPTION OF THE RESEARCH PROJECT(S)
In this project we look at the way certain Tibetan and Newar varieties express the perspective of the speaker in the sentence. In Lhasa Tibetan, for example, the auxiliary verb 'yin' can be used in sentences where the speaker is the subject (nga em-chi yin '*I'm* a doctor'), if the speaker wants to identify their personal relation or possession ('di nga'i bu-mo yin 'This is *my* daughter') or if the speaker chooses to emphasise who performed an action ('di khyed-rang-gi gsol-ja yin 'This is your tea [that *I* have made for you]'). Other Tibetan varieties that are found in remote regions also exhibit egophoric markers, but to a lesser extent and not always in the same contexts. Similar observations have been made for varieties of Newar. Finally, in older stages of both Tibetan and Newar varieties, this egophoric marking cannot yet be found. The central question that this project aims to answer is how and why specific grammatical markers indicating the speaker's involvement emerge over time in slightly different ways, even in closely related languages. We compare the historical development of related languages in remote mountainous regions with that of languages in areas of intense contact, e.g. the Kathmandu Valley.
PRINCIPAL RELEVANT PUBLICATIONS
Faggionato, Christian, Nathan Hill & Marieke Meelen (2022). NLP Pipeline for Annotating (Endangered) Tibetan and Newar Varieties in Proceedings of The Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia within the 13th Language Resources and Evaluation Conference, 1-6, https://aclanthology.org/2022.eurali-1.1/
O'Neill, Alexander (2022). OCR model for Pracalit for Sanskrit and Newar MSS 16th to 19th C., Ground Truth (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6967421
O’Neill, A. J., & Hill, N. (2022). Text Recognition for Nepalese Manuscripts in Pracalit Script. Journal of Open Humanities Data, 8: 26, pp. 1–6. DOI: https://doi.org/10.5334/johd.90
WHY/HOW DOES YOUR PROJECT ADVANCE OUR KNOWLEDGE ON BILINGUALISM/MULTILINGUALISM/CONTACT?
In this project we investigate the impact of multilingualism and contact on language change, specifically on grammaticalisation and pragmaticalisation of egophoric and evidential markers.
WHAT IS THE SOCIAL IMPACT OF YOUR PROJECT?
Emergence of Egophoricity
Speakers of the Newar varieties will be able to access the invaluable new data on their languages. This spoken and written record of their languages will help them obtain the rights granted to other minority groups, since the Nepali government requires written or transcribed evidence to grant this status. It will furthermore contribute to their efforts to develop essential teaching materials that will support the survival of these endangered languages.
The linguistic annotation of written materials in present-day Tibetan varieties will in the same way encourage the local speaker communities in Jiri and South Mustang in particular to create educational resources.
All our corpus annotation tools, training data (the manually corrected deeply annotated corpora) and ground truth (for historical Newar transcriptions) will be made available open access as well. These will form invaluable resources for education and translation professionals, since they can be used to further train and optimise automatic NLP tools, thus creating opportunities to digitise and annotate further Tibetan and Newar materials in future. These tools and data sets are an essential starting point for innovative developments in Natural Language Processing.
Through a website and occasional public lectures and workshops in London and Cambridge (e.g. at the Festival of Ideas), we’ll further disseminate our findings that are relevant for Tibetan and Newar language and cultural associations.
LOCATION AND/OR IMPACT OF YOUR PROJECT