Post on 28-Jul-2015
Cross-language Wikipedia Editing of Okinawa, Japan
Scott A. Hale, Oxford Internet Institute
http://www.scotthale.net/pubs/?chi2015
@computermacgyve
20 April 2015
Scott A. Hale Cross-language Wikipedia Editing of Okinawa, Japan
How can user-generated content platforms be multilingual without fragmenting users too thinly across languages?
Large differences in the information available in different languages (Hecht & Gergle, 2010; Hong, Convertino, & Chi, 2011)
Language is a large barrier to the spread of local information (Sen et al., 2015)
Do multilingual users bridge language divides? (Hale, 2014b)
15% of Wikipedia users edit multiple language editions
Multilingual users are more active than monolingual users but mainly more active in their first/primary language
Unclear how much they transfer information between languages (nearly half edit different articles in different languages)
Research questions
1 What articles do multilinguals edit in their non-primary languages?
2 What types of edits do multilingual users make in their non-primary languages?
3 How valuable are the contributions by multilingual users in their non-primary languages?
Data
Full edit histories of Okinawa-related Wikipedia articles
Japanese and English editions
Articles on the same concept connected via inter-language links (WikiData)
Users connected across languages via global accounts (CentralAuth database)
Bots & malicious users removed based on user page content, user groups, or being banned within one year from data collection
Okinawa
Figure: Map of Okinawa in relation to Japan, China, South Korea, and Taiwan.
Independent kingdom until the late 1800s
Administered by the US 1945–1972
Large number of US military, contractors, and dependents living on the islands today
Article landscape
Sample          en-only    ja-only    Both
Geotag               52        185      152
Category            156      2,819      707
Article link      3,411      9,984    5,567
Table: The number of unique concepts in each sample. The majority of concepts have an article either only in the English edition or only in the Japanese edition (en-only or ja-only), while a smaller number of concepts have articles in both the English and Japanese editions (Both).
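The three columns above could be derived from inter-language link data roughly as follows. This is a minimal sketch, assuming each concept is represented as a mapping from an edition code to an article title; the `classify_concept` and `count_landscape` helpers and the toy input are hypothetical and are not taken from the study.

```python
from collections import Counter


def classify_concept(titles_by_edition):
    """Return 'en-only', 'ja-only', or 'both' for one concept.

    titles_by_edition maps an edition code ('en', 'ja') to an article
    title; an edition without an article is simply absent.
    """
    has_en = "en" in titles_by_edition
    has_ja = "ja" in titles_by_edition
    if has_en and has_ja:
        return "both"
    if has_en:
        return "en-only"
    if has_ja:
        return "ja-only"
    raise ValueError("concept has no article in either edition")


def count_landscape(concepts):
    """Count how many concepts fall into each column of the table."""
    return Counter(classify_concept(c) for c in concepts)


# Illustrative placeholder input only (not study data).
concepts = [
    {"en": "Article A", "ja": "Article A (ja)"},
    {"en": "Article B"},
    {"ja": "Article C (ja)"},
    {"ja": "Article D (ja)"},
]
landscape = count_landscape(concepts)
```

Applied to each sample (geotag, category, article link) separately, a count like this would yield one row of the table.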
Data overview
                     Total users            Total articles edited
                     Count         %        Count           %
English edition
  Anonymous          192,839    73.4%       216,840     46.15%
  Local account       15,008     5.7%        58,689     12.49%
  Pri. English        50,038    19.0%       179,951     38.30%
  Pri. Japanese          466     0.2%         1,488      0.32%
  Pri. Other           4,341     1.7%        12,911      2.75%
  Totals             262,692   100.0%       469,879     100.0%
Japanese edition
  Anonymous          372,852    88.4%       717,608     62.74%
  Local account        9,945     2.4%       109,765      9.60%
  Pri. English           558     0.1%         5,531      0.48%
  Pri. Japanese       37,191     8.8%       301,980     26.40%
  Pri. Other           1,174     0.3%         8,954      0.78%
  Totals             421,720   100.0%     1,143,838     100.0%
Research questions
1 What articles do multilinguals edit in their non-primary languages?
2 What types of edits do multilingual users make in their non-primary languages?
3 How valuable are the contributions by multilingual users in their non-primary languages?
Multilinguals edit articles with versions in both languages
Figure: Share of edited articles existing in one edition only versus in both editions, by user group. English users editing the Japanese edition are far less likely than other users to edit articles that only appear in Japanese. Similarly, Japanese users editing the English edition are far less likely than other users to edit articles that only appear in English.
Edit popular articles
                            # of Japanese users editing English    # of English users editing Japanese
                            Estimate (Standard error)              Estimate (Standard error)
Exists in both languages    0.641∗∗∗ (0.024)                       3.285∗∗∗ (0.034)
Total number of editors     0.001∗∗∗ (0.0001)                      0.003∗∗∗ (0.0001)
PageRank                    0.014∗∗∗ (0.0005)                      0.245∗∗∗ (0.006)
Number of images            0.003∗∗∗ (0.001)                       0.054∗∗∗ (0.002)
Number of external links    0.001∗∗∗ (0.0003)                      −0.0003 (0.0004)
Constant                    0.008 (0.015)                          0.029 (0.019)

Observations                5,441                                  14,825
Adjusted R2                 0.348                                  0.572
Residual Std. Error         0.849 (df = 5435)                      1.828 (df = 14819)

∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
Table: Linear regression results fitting the number of primary Japanese users editing each English article and the number of primary English users editing each Japanese article.
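A regression of this general form can be sketched with ordinary least squares. The code below is a minimal illustration, assuming one row of predictors per article and solving the normal equations directly; the `ols` helper and the synthetic data are hypothetical, and standard errors and significance levels are not computed. It is not the analysis code behind the table.

```python
def ols(rows, targets):
    """Fit y = b0 + b1*x1 + ... + bk*xk by solving the normal equations.

    rows is a list of predictor tuples (one per article); targets is the
    matching list of outcomes. Returns [b0, b1, ..., bk].
    """
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    # Build X'X and X'y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Synthetic toy data (not study data): each row is
# (exists_in_both, total_editors) and the outcome is built from known
# coefficients so the fit can be checked.
rows = [(1, 10), (0, 5), (1, 20), (0, 8), (1, 3)]
targets = [0.5 + 2.0 * both + 0.1 * editors for both, editors in rows]
coefficients = ols(rows, targets)
```

With real data, each row would carry the five predictors listed in the table, and the outcome would be the number of cross-language editors per article.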
Research questions
1 What articles do multilinguals edit in their non-primary languages?
2 What types of edits do multilingual users make in their non-primary languages?
    1 Edit size
    2 Content changes
3 How valuable are the contributions by multilingual users in their non-primary languages?
Edit size
Measuring size
More than byte difference
Use sub-score from WikiTrust algorithms (Adler, Chatterjee, et al., 2008; Adler & Alfaro, 2007; Adler, Alfaro, Pye, & Raman, 2008)
Mecab to determine Japanese word boundaries
1 point for each word added or deleted
0.5 point for each edited word
0 < x < 1 point for moving a word a fraction x of the normalized page length
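The scoring rules above can be approximated in a few lines. This is a simplified sketch, not the WikiTrust implementation: it assumes the text has already been split into words (by whitespace for English, or by a tokenizer such as Mecab for Japanese), uses Python's `difflib` to align the two revisions, and leaves out the fractional score for moved words. The `edit_size` name and the handling of unequal-length replacements are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def edit_size(old_words, new_words):
    """Score an edit from two word lists (word moves are not scored here).

    1 point per added or deleted word, 0.5 point per edited word.
    In a replaced block of unequal length, the paired words count as
    edited and the surplus words count as added or deleted.
    """
    score = 0.0
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        removed = i2 - i1
        added = j2 - j1
        if tag == "insert":
            score += added
        elif tag == "delete":
            score += removed
        elif tag == "replace":
            edited = min(removed, added)
            score += 0.5 * edited + (max(removed, added) - edited)
    return score


# One word edited (b -> x) and one word appended (e).
example_score = edit_size(["a", "b", "c", "d"], ["a", "x", "c", "d", "e"])
```

Unlike a raw byte difference, a score like this does not cancel out when an edit adds and removes similar amounts of text.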
Edit size
Figure: Density plots for non-anonymous users editing articles in the Japanese (left) and English (right) editions grouped by their primary language editions. Vertical lines indicate distribution means.
Content changes: Exploratory qualitative coding of edits
Up to 5 randomly chosen edits from 70 randomly chosen multilingual users (35 primary editors of English and 35 primary editors of Japanese)
Emergent coding of edits into 6 non-exclusive categories
Addition: Adding new text or references to an existing article or creating a new article
Maintenance: Adding, removing, or adjusting templates, categories, links in a “See Also” section, or whitespace changes that did not alter text
Deletion/reversion: Reverting an edit or deleting text from an article
Image-related: Adding, altering, or removing an image
Interlanguage-links: Altering interlanguage links
Change: Edits that changed existing text, such as correcting spelling errors or updating facts that had changed, like the latest winner of an annual sports tournament
Content changes
Edit category          Pri. lang.      Non-pri. lang.    p-val†
Addition                97    31%       47    26%        0.25
Maintenance            103    33%       44    24%        0.04
Deletion/Reversion      37    12%       11     6%        0.03
Image-related           27     9%       32    18%        0.01
Interlanguage links      8     3%       32    18%        0.00
Change                  65    21%       34    19%        0.62
Total edits‡           315             181
Table: Exploratory, qualitative coding of edits in users’ primary languages (pri. lang.) and non-primary languages (non-pri. lang.).
†p-values are for two-tailed t-tests on difference of percentage means.
‡Some edits are assigned to multiple categories and, therefore, the column sums are greater than the total number of edits reported.
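A comparison of this kind can be sketched from the counts in the table. The table reports two-tailed t-tests; the sketch below substitutes a pooled two-proportion z-test, which is a closely related but different test, so its p-values should be read as approximate and need not match the reported ones exactly. The function name is hypothetical.

```python
import math


def two_proportion_z_test(hits_a, total_a, hits_b, total_b):
    """Two-sided pooled z-test for a difference between two proportions.

    Returns (z, p_value). This stands in for the t-test named in the
    table footnote and is an approximation of it, not a reproduction.
    """
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    pooled = (hits_a + hits_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts taken from the table: 315 primary-language edits and
# 181 non-primary-language edits.
z_links, p_links = two_proportion_z_test(8, 315, 32, 181)     # Interlanguage links
z_change, p_change = two_proportion_z_test(65, 315, 34, 181)  # Change
```

Because an edit can fall into several categories, each category is tested separately against the total number of edits in each column.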
Research questions
1 What articles do multilinguals edit in their non-primary languages?
2 What types of edits do multilingual users make in their non-primary languages?
3 How valuable are the contributions by multilingual users in their non-primary languages?
Edit value
Measuring edit value
Many ways users contribute value
A simple, quantitative measure is how much of an edit is retained by subsequent editors
Computed using WikiTrust edit survival scores for next six edits (Adler, Chatterjee, et al., 2008; Adler & Alfaro, 2007; Adler, Alfaro, et al., 2008)
No significant difference
Text from edits made by non-primary editors survived at a similar rate to the text from edits made by users who primarily edited each edition.
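The idea of edit survival can be illustrated with a much simpler proxy than the WikiTrust score. The sketch below assumes the words added by an edit and the word lists of later revisions are already available, and it averages the fraction of added words still present in each of the next six revisions. It ignores word position and any copies of a word that existed before the edit, so it is an illustration of the concept only; the function names are hypothetical.

```python
from collections import Counter


def surviving_fraction(added_words, later_revision_words):
    """Fraction of the added words still present in one later revision."""
    added = Counter(added_words)
    later = Counter(later_revision_words)
    kept = sum(min(count, later[word]) for word, count in added.items())
    return kept / sum(added.values())


def edit_survival(added_words, later_revisions, window=6):
    """Average surviving fraction over the next `window` revisions.

    Returns None when the edit added no words or has no later revisions
    to be judged against.
    """
    judged = later_revisions[:window]
    if not added_words or not judged:
        return None
    fractions = [surviving_fraction(added_words, rev) for rev in judged]
    return sum(fractions) / len(fractions)


# Toy example: both added words survive the first later revision,
# one survives the second, and none survive the third.
survival = edit_survival(
    ["x", "y"],
    [["a", "x", "y"], ["a", "x"], ["a"]],
)
```

Comparing the distribution of such scores between primary and non-primary editors of an edition is one way to ask whether their text is retained at similar rates.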
Research questions
1 What articles do multilinguals edit in their non-primary languages?
    Articles that also exist in their primary language
    Articles that have more edits / editors overall
    Articles with more images
2 What types of edits do multilingual users make in their non-primary languages?
    Smaller-sized edits
    More image-related and interlanguage link-related edits
    Fewer deletion/reversion and maintenance edits
    Similar amounts of additions and change edits
    Unique contributions related to language (e.g., 15% of Japanese edits in English added/corrected Japanese characters/romanizations)
3 How valuable are the contributions by multilingual users in their non-primary languages?
    No significant differences.
Implications and future directions
Future
Additional languages, larger-scale classification of edit types
Okinawa is probably a “hard case”
    Very different languages, writing systems
    Japanese users have consistently been observed to engage less with other-language content (Hale, 2014a, 2014b)
Implications
Importance of holistic, cross-language measurement of reputation for awarding badges, etc.
Images/multimedia as good cross-language starter tasks
Discovery of related other-language content is a barrier?
    No multilingual search or recommendation on Wikipedia
    Multilingual users are good candidates for recommendation given they overlap to some extent with “power users” (Huang, Suh, Hill, & Hsieh, 2015)
    Implications for the translation environment being developed by the Wikimedia Foundation
I would like to thank Eric T. Meyer, Taha Yasseri, and Alolita Sharma as well as the anonymous CHI reviewers who provided helpful comments on previous versions of this article.
Adler, B. T., & Alfaro, L. de. (2007). A content-driven reputation system for the Wikipedia. In Proceedings of the 16th international conference on world wide web (pp. 261–270). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1242572.1242608
Adler, B. T., Alfaro, L. de, Pye, I., & Raman, V. (2008). Measuring author contributions to the Wikipedia. In Proceedings of the 4th international symposium on wikis (pp. 15:1–15:10). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1822258.1822279
Adler, B. T., Chatterjee, K., Alfaro, L. de, Faella, M., Pye, I., & Raman, V. (2008). Assigning trust to Wikipedia content. In Proceedings of the 4th international symposium on wikis (pp. 26:1–26:12). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1822258.1822293
Hale, S. A. (2014a). Global connectivity and multilinguals in the Twitter network. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 833–842). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/2556288.2557203
Hale, S. A. (2014b). Multilinguals and Wikipedia editing. In Proceedings of the 6th annual ACM web science conference. New York, NY, USA: ACM. Available from http://arxiv.org/abs/1312.0976
Hecht, B., & Gergle, D. (2010). The Tower of Babel meets Web 2.0: User-generated content and its applications in a multilingual context. In Proceedings of the 28th international conference on human factors in computing systems (pp. 291–300). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1753326.1753370
Hong, L., Convertino, G., & Chi, E. (2011). Language matters in Twitter: A large scale study. In International AAAI conference on weblogs and social media (pp. 518–521). Available from http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/view/2856
Huang, S.-W., Suh, M., Hill, B. M., & Hsieh, G. (2015). How activists are both born and made: An analysis of users on change.org. In Proceedings of the 29th international conference on human factors in computing systems.