Forschungscolloquium Migration und Minderheiten
Europa-Universität Viadrina
Caucasian Urum Language contact and speaker variation
Stavros Skopeteas University of Bielefeld
Frankfurt (Oder), 25.06.2011
1. Preliminaries
1.1 Caucasian Urum
- spoken in the district of Trialeti, Georgia.
- language contact: Anatolian Turkish (basic substrate), exchange with Russian
and Georgian (possibly also Armenian).
- Population: 30 811 people according to the 1979 Population Census of the
Georgian SSR, estimated to 1500 people in 2006 (Wheatley 2006).
- According to the tradition of the community: the Urum people were originally
situated in Eastern Turkey (Kars). Their ancestors moved to the Caucausus at
the beginning of the 19th century.
- The Caucasian Urum language should not be confused with the Urum
language spoken in Ukraine (also known as Greek-Tatar) or with the Urum
people in Turkey.
Caucasian Urum
2
Fig. 1. Tshalka
1.2 Urum documentation project
University of Athens
Athanasios Markopoulos, Eleni Sella-Mazi
University of Bielefeld
Stavros Skopeteas
University of Bremen
Elisabeth Verhoeven
Tbilisi
Violeta Moisidi
Funded by the Latsis foundation (January 2010 – February 2011)
1.3 Objectives
Sections
(a) thematic LEXICON: translation of 1419 concepts (belonging to 24 different
semantic fields) (4 native speakers)
(b) SENTENCE sample: representative sentences for the examination of
different grammatical categories (4 native speakers);
S. Skopeteas
3
(c) TEXT collection containing semi-naturalistic narratives (80 short narratives,
16 native speakers).
(d) documentation of the COMMUNITY: sociolinguistic questionnaires about
the use of language and other languages by the individuals (30 native
speakers).
Method
- This decision follows from the assumption that linguistic properties vary in at
least two dimensions: (a) the variation between speakers (pervasive in an
endangered language); (b) the variation between linguistic objects.
- Repeated observations in language documentation.
Fig. 2. Urum in the Web (www.urum.lili.uni-bielefeld.de)
2. Words
Research questions
• What are the sources of the Urum vocabulary?
• Which (phonological, morphological, semantic) deviations from the
Eastern varieties of Turkish may be observed in Urum?
• What is the influence of the contact languages (Georgian, Russian, Pontic
Greek) to the vocabulary? How is this influence manifested in particular
semantic fields?
Caucasian Urum
4
Method
- version of the World Loanword Database (WOLD), inventory of lexical
concepts (see Haspelmath & Tadmor 2009).
Table 1. Basic vocabulary in semantic fields
semantic field illustrative examples n
sense perception smell, bitter, hear, etc. 47
spatial relations remain, pick up, in front of, left, etc. 71
body head, eye, bone, cheek, etc. 138
kinship mother, father, sister, younger sister, etc. 82
motion fall, throw, swim, carry on the back, etc. 76
physical world land, soil, mud, mountain, etc. 71
emotions and values heavy, happy, cry, proud, etc. 54
quantity fifteen, count, few, empty, etc. 39
time slow, sometime, soon, year, etc. 56
actions and technology cut, pull, build, hammer, etc. 64
cognition study, teach, pupil, doubt, etc. 51
speech and language tell, speech, paper, pen, etc. 42
animals cow, sheep, goat, chicken, etc. 104
possession give, find, pay, price, etc. 47
warfare and hunting army, soldier, victory, defeat, etc. 35
social and political relations queen, Russian, servant, command, etc. 56
food and drink oven, bowl, soup, bean, etc. 109
agriculture shovel, flower, tree, orange, etc. 68
law accuse, guilty, prison, thief, etc. 20
house door, window, chimney, bed, etc. 39
clothing glove, leather, skirt, shoe, etc. 52
religion and belief bishop, hymn, marriage, Muslim, etc. 33
modern world bomb, plastic, workshop, film, etc. 51
miscellaneous same, nothing, without, that, etc. 14
TOTAL 1419
S. Skopeteas
5
The native speaker was then presented a sentential example in the contact language
(Russian) that contains the target word. The sentential examples are developed with
the following rules:
A. Non-relational entities are encoded in the contact language as subjects.
Target concept: “goat”
Sentential example: “The goat is clever.”
B. Relational entities are encoded in the contact language as subjects possessed by a
third person.
Target concept: “nose”
Sentential example: “Her nose is beautiful.”
C. Properties and events are encoded in the contact language as predicates.
Target concept: “big”
Sentential example: “The cow is big.”
Fig. 3. Entry in Lexicon
Caucasian Urum
6
Illustrative Results.
Fig. 4. Likelihood of borrowing per semantic field:
Urum likelihood calculated in a total of 5273 translations (4 speakers)
48 languages from the same words in the WOLD sample
0
0,2
0,4
0,6
0,8
1
sens
e pe
rcep
tion
spat
ial r
elat
ions
body
kins
hip
mot
ion
phys
ical
wor
ldem
otio
nsqu
antit
ytim
eac
tions
and
tech
nolo
gyco
gnitio
nsp
eech
anim
als
poss
essi
onw
arfa
re a
nd h
untin
gso
cial
and
pol
itical
rela
tions
food
and
drin
kag
ricul
ture law
hous
ecl
othi
ngre
ligio
n an
d be
lief
mod
ern
wor
ld
Urum48 languages
Comments
- the proportion of borrowings in Urum, i.e., 23,7% (aggregated per field), is
smaller than the corresponding proportion of the same words in the 48-
languages sample, i.e., 28,6% (WOLD).
- some outliners (e.g., kinship, time, warfare and hunting)
- general pattern of proportions across semantic fields is similar to the cross-
linguistic pattern (Pearson r = .84).
S. Skopeteas
7
Fig. 5. Origin of borrowed words per semantic field
(preliminary decoding)
0%
25%
50%
75%
100%
sens
e pe
rcep
tion
spat
ial r
elat
ions
body
kins
hip
mot
ion
phys
ical
wor
ld
emot
ions
quan
tity
time
actio
ns a
nd te
chno
logy
cogn
ition
spee
ch
anim
als
poss
essi
on
war
fare
and
hun
ting
soci
al a
nd p
oliti
cal r
elat
ions
food
and
drin
k
agric
ultu
re law
hous
e
clot
hing
relig
ion
and
belie
f
mod
ern
wor
ld
UrumRussianGeorgianGreek
Comments
- majority of borrowings from Russian: 1037 translations (i.e., 24.1%).
- less borrowings from Georgian (in particular semantic fields, e.g., food and
drinking): 77 translations (i.e., 1.8%);
- very few borrowings from Greek (in highly culture-specific fields, e.g.,
religion): 10 translations (i.e., .2%).
The subset of “Urum” words contains diverse groups:
(a) Turkish words (1935 out of 2254 decoded tokens, i.e., 85,8%)
(b) Old Turkish words (73 out of 2254 decoded tokens, i.e., 3,2%)
(c) not yet identified (246 out of 2254 decoded tokens, i.e., 10,9%)
Based on the cross-linguistic data of the WOLD database, we obtain an index of
borrowability for each concept (n of languages in which this concept is encoded
through a borrowing/n total). E.g.,
Caucasian Urum
8
(1) a. brother 0,06
b. fish 0,15
c. mouse 0,18
d. potato 0,42
e. trousers 0,56
f. car 0,79
The index of borrowability gives us a tool for the estimation of the developments in
the conservative/innovative parts of the lexicon.
Fig. 6. Likelihood of borrowing and origin of Urum words
(2606 translated words; words occuring in both languages excluded)
0
25
50
75
100
0 - ,2] ,2 - ,4] ,4 - ,6] ,6 - ,8] ,8 - 1]
likelihood of borrowing in 48 languages
% o
f bor
row
ed w
ords
out
of n
wor
ds
RussianTurkish
3. Sentences
Research questions
• In which clausal environment do Urum speakers select a particular
inflectional category?
• What are the basic syntactic properties of Urum syntax?
• What are the similarities and differences between the Urum clause
structure and the clause structure in the other languages at issue (Turkish,
Georgian, Russian, Pontic Greek)?
S. Skopeteas
9
Method
- Sentence list by Suarez (820 sentences).
- 4 native speakers
Fig. 7. Entry in Sentences
4. Texts
Research questions
• How do speakers select words and syntactic structures in naturalistic data
(narratives)?
• What can we learn about the frequencies of particular linguistic properties
in discourse?
• Is there variation between speakers?
Method
Cheese story
Instruction: Please tell me how you make cheese in Tshalka. (Do not worry if there are
some details that you do not know just tell me everything you consider necessary.)
Path description
Instruction: Please describe the path to go from Beshtasheni to Hadik to me. Please
give exact descriptions, so that we can recognize the path that we have to follow (by
telling me about all the important places on the way to Hadik, e.g., characteristic
houses, trees, crossroads, etc.).
Caucasian Urum
10
The story of the ancestors
Instruction: Please tell me the story of how the Urum people came to the Caucasus. It is
not a problem if you are not sure about the historical details. Just tell me the story of
your ancestors as far as you know it and include all the details you consider necessary.
Modern life story
Instruction: Please tell me about the changes in the situation of the Urum people in the
last twenty years. The best way to do this is to start by telling me about your and/or
your families experiences at the time Georgia became independent. Try to remember
those events and tell me about the course of events until today. Please take your time in
doing this and give me all the details you consider important, because I am interested in
everything that is important for you.
Peer story
Instruction: You are going to see a film twice. Please notice what happens in the film
and tell me the story. Try to remember as many details as you can.
Data
- The 5 texts have been recorded with 16 native speakers
- Total: 80 parallel narratives
Fig. 8. Entry in Text
Illustrative generalizations
The plural morpheme is a modifier that is used if it is relevant and not obvious in the
context. Beginning from nouns, when a quantifier/numeral encodes plurality, the
specified noun is most frequently not morphologically marked for plural, see, e.g., (2)
(observed in 14 tokens out of total 16 quantified noun phrases, i.e., 87.5%). This
phenomenon is known for a large array of languages with concatenative morphology
(see Corbett 2004: 211).
S. Skopeteas
11
(2) a. ušax-lar
child-PL
‘children’
b. uč ušax
three child
‘three children’ (speaker 21)
Plural subjects (either with a quantifier/numeral or with a plural suffix) can be cross-
referenced by the plural suffix on the verb, e.g., (3a). However, we observe that the
plural marker is omitted elsewhere in the corpus, e.g., (3b). The crucial question for
the grammatical description is whether the variation observed in these examples is
random (e.g., resulting from the varying choice of the individual speakers) or it is
determined by some grammatically relevant factor.
(3) a. uč ušax-ta gäl-dɴ-lar.
three child-and come-PAST-3.PL
‘and three children came’ (speaker 21)
b. uč ušax-ta gäl-di.
three child-and come-PAST
‘and three children came’ (speaker 28)
Motivated by the established typologies of number distinctions, we hypothesize that
the likelihood of plural marking on the verb depends on the mental representation of
the referents, such that highly individuated referents are more likely to be cross-
referenced by a plural affix on the verb. This hypothesis predicts an animate-
inanimate asymmetry in the marking of plurality (see Smith-Stark 1974; Lucy 1992;
Corbett 2004: 70).
The empirical investigation of our corpus confirms our expectations: 39 out of 63
animate plural subjects (61,9%) are cross-referenced by a plural affix on the verb,
while this is the case only for 3 out of 16 (18,8%) inanimate subjects. Moreover, our
corpus allows us to examine whether this empirical difference is independent from the
Caucasian Urum
12
speaker variation. We found tokens of both conditions at issue (animate plural
subjects, inanimate plural subjects) in twelve speakers and were able to run a paired-
samples t-test, which revealed that the observed difference is beyond the chance level
(t11 = 3,7, p < .003).
Fig. 9. Percentages of plural agreement for animates and inanimates
(Y-bars indicate standard error of the mean values)
0
25
50
75
100
animates inanimates
% o
f plu
ral a
gree
men
t
However, plotting the production of plural agreement per speaker reveals that our data
are not canonically distributed. The distribution is bimodal, i.e., the sample is
distributed around two central values: a group of speakers is distributed around the
33,3-50% segment of plural agreement proportions and a second group of speakers is
distributed around the 83,3-100% segment.
S. Skopeteas
13
Fig. 10. Proportions of plural agreement and n of speakers
0
1
2
3
4
5
6
0-16,6% 16,6-33,3% 33,3-50% 50-66,6% 66,6-83,3% 83,3-100%
% of plural agreement
n of
spea
kers
Fig. 10 implies that our sample contains two groups of individuals speaking two
different grammars: the former grammar has optional plural marking while the latter
has near obligatory plural marking. This means that Fig. 9 involves a confounding
between two types of information: the difference depending on the type of entity
(animate vs. inanimate) and the difference between groups. I.e., the question ‘is there
a difference between the production of plural agreement between animates and
inanimates’ has to be answered for each subgroup of speakers separately, as in Fig.
11. Indeed this view on the data reveals a quite different empirical situation. There is a
group A of speakers that produce plural optionally, and a group B for which plural
agreement is near obligatory. The crucial issue is that the Group A speakers did never
produced plural agreement with inanimate entities, i.e., the gradient difference
between animates and inanimates in our data (see Fig. 9) does not result from the
lower likelihood of plural with inanimates, but from the lower proportion of speakers
of Group B in our speaker sample (if we had only speakers of the B-group in our
sample the gradience would not be visible). Group A speakers display a categorical
pattern that is not visible in the average result. They optionally produce plural
agreement with animates and never produce plural agreement with inanimates.
Caucasian Urum
14
Fig. 11. Percentages of plural agreement per speaker group
(Y-bars indicate standard error of the mean values)
(a) Speakers’ Group A
0
25
50
75
100
animates inanimates
% o
f plu
ral a
gree
men
t
(b) Speakers’ Group B
0
25
50
75
100
animates inanimates%
of p
lura
l agr
eem
ent
5. Community
(by Eleni Sella)
Research questions
• What are the sociolinguistic properties of the Urum community?
• Which second and third languages do Urum people speak?
• In which communicative situations do Urum people use their language?
Method
In order to answer these questions, we developed a detailed sociolinguistic
questionnaire. This questionnaire contains:
• Biographical details;
• Language competence;
• Use of the language in several fields of communication;
• Attitude towards the language;
• Self estimation of the fluency in Urum.
30 native speakers were interviewed with this questionnaire.
S. Skopeteas
15
Data
The questionnaire was designed as a set of multiple-choice questions allowing for
selection of more than one option. The interviews were conducted in Urum. E.g.,
You are using Urum:
Говорите на Урум
a. with the parents
С родителями,
b. with the grandparents
С дедушкой, бабушкой
c. with your children
С вашими детьми
d. with the neighbours
С соседями
e. at work
На работе
f. with your friends
С вашими друзьями
g. in other occasions. Where?
В другом месте. Где?
Fig. 12. Primary language in social interactions
(percentages of 30 native speakers’ estimations)
0%
25%
50%
75%
100%
grandparents parents siblings spouse children friends colleagues
ArmenianGreekGeorgianRussianUrum
Caucasian Urum
16
Comments
- The frequency of language use decreases across generations:
grandparents > parents > siblings/spouse > children.
- The frequency of language use decreases with social distance:
relatives > friends > colleagues.
References
Bowern, Claire 2008, Linguistic Fieldwork: A practical guide. New York: Palgrave
Macmillan.
Corbett, G. 2004, Number, Cambridge: Cambridge University Press.
Gippert, Jost, Nikolaus P. Himmelmann, and Ulrike Mosel (eds.) 2006, Essentials of
language documentation. Berlin: Mouton De Gruyter.
Haspelmath, Martin and Uri Tadmor (eds.) 2009, Loanwords in the World's
languages: A Comparative Handbook. Berlin: Mouton De Gruyter.
Haviland, John 2006, Documenting lexical knowledge. In Gippert et al. (eds.), 129-
162.
Levinson, Stephen C. 2003, Space in Language and Cognition. Explorations in
Cognitive Diversity. Cambridge: Cambridge University Press.
Lucy, J. 1992, Grammatical categories and cognition, Cambridge: Cambridge
University Press.
Mosel, Ulrike 2006, Fieldwork and community language work. In Gippert et al.
(eds.), 67-85.
Roudik, Peter L. 2009, Culture and Customs of the Caucasus. Westport: Greenwood.
Smith-Stark, T. 1974, The plurality split, Chicago Linguistic Society 10, 657-671.
Snider, K. and J. Roberts 2006, SIL Comparative African Wordlist (SILCAWL).
Dallas: SIL International.
Swadesh, Morris 1952, Lexicostatistic dating of prehistoric ethnic contacts.
Proceedings of the American Philosophical Society 96,152-63
Wheatley, J. 2006, Defusing conflict in Tsalka district of Georgia: migration,
international intervention and the role of the state. European Centre for Minority
Issues, Working Paper 36.
Top Related