Digital Humanities 101 - 2013/2014 - Course 9
Digital Humanities Laboratory
Frederic Kaplan
Semester 1 : Content of each course
• (1) 19.09 Introduction to the course / Live Tweeting and Collective note-taking
• (2) 25.09 Introduction to Digital Humanities / WordPress / First assignment
• (3) 2.10 Introduction to the Venice Time Machine project / Zotero
•9.10 No course
• (4) 16.10 Digitization techniques / Deadline first assignment
• (5) 23.10 Datafication / Presentation of projects
• (6) 30.10 Semantic modelling / RDF / Deadline peer-reviewing of first assignment
Semester 1 : Content of each course
• (7) 6.11 Pattern recognition / OCR / Semantic disambiguation
• (8) 13.11 Historical Geographic Information Systems, Procedural modeling / CityEngine / Deadline Project selection
• (9) 20.11 Crowdsourcing / Gamification / Wikipedia
• (10) 27.11 Cultural heritage interfaces and visualisation / Museographic experiences
•4.12 Group work on the projects
•11.12 Oral exam / Presentation of projects / Deadline Project blog
•18.12 Oral exam / Presentation of projects
Today’s course
•Objective of the course : answering two questions. Why do projects rely on crowdsourcing ? Why do people participate in crowdsourced projects ?
•Why do projects rely on crowdsourcing ?
•Case study : Transcribing handwritten texts using Mechanical Turk
•Case study : Crowdfunding a scientific project
•Why do people participate in crowdsourced projects ?
•Case study : Climbing the Wikipedia pyramid
Crowdsourcing
From Wikipedia
•“Crowdsourcing is the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers”
From Wikipedia
•“The term was coined in 2006 by Jeff Howe in a Wired article, The Rise of Crowdsourcing.” http://www.wired.com/wired/archive/14.06/crowds.html?pg=1&topic=crowds
Why do projects rely on crowdsourcing ? Why do people participate in crowdsourced projects ?
Why do projects rely on crowdsourcing ?
•Because it is free or cheap (cf. Amazon’s Mechanical Turk)
•Because it allows better engagement of users (or learners, in the case of peer grading)
•Because it makes it possible to harness the wisdom of crowds
• cf. Claire Ross, Social media for digital humanities and community engagement, in Warwick, Terras, Nyhan, Digital Humanities in Practice, Facet Publishing, 2012.
The wisdom of the crowds
•Surowiecki’s four criteria (2004)
•Diversity : Each participant has a different background and perspective
•Independence : Each participant makes their own decision
•Decentralization : Decisions are local, with no central planner
•Aggregation : A way to turn individual judgements into a collective decision
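The aggregation criterion can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the crowd, its size and its noise level are invented for illustration, each guess is drawn independently (the independence criterion), and the aggregation rule is a plain mean for numerical estimates or a majority vote for categorical answers. With the mean, the error of the collective estimate can never exceed the average individual error, because the individual errors partly cancel out. Correlated errors, which violate independence, would not cancel in the same way.

```python
import random
import statistics
from collections import Counter

def simulate_crowd(true_value: float, n: int, noise: float, seed: int = 0) -> list[float]:
    """Hypothetical crowd: each participant guesses independently, with their own error."""
    rng = random.Random(seed)
    return [true_value + rng.gauss(0, noise) for _ in range(n)]

def aggregate_mean(guesses: list[float]) -> float:
    """Aggregation for numerical judgements: the mean of all guesses."""
    return statistics.fmean(guesses)

def aggregate_vote(labels: list[str]) -> str:
    """Aggregation for categorical judgements: the most frequent answer wins."""
    return Counter(labels).most_common(1)[0][0]

truth = 100.0
guesses = simulate_crowd(truth, n=500, noise=20.0, seed=42)
collective_error = abs(aggregate_mean(guesses) - truth)
average_individual_error = statistics.fmean(abs(g - truth) for g in guesses)
print(f"collective error: {collective_error:.2f}")
print(f"average individual error: {average_individual_error:.2f}")
print(aggregate_vote(["gondola", "boat", "gondola"]))
```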
A case study : crowdsourced transcription
UCL Transcribe Bentham
•60 000 manuscripts of Jeremy Bentham (1748-1832)
•20 000 already transcribed using the traditional approach, 40 000 to go
•TEI encoding, using MediaWiki
•5 000 manuscripts transcribed (June 2013)
•33 000 volunteers, but only a very limited number of highly productive and dedicated users
•Crowdsifting instead of crowdsourcing
reCAPTCHA : A free anti-bot service
•From http://www.google.com/recaptcha/learnmore
•200+ million CAPTCHAs are solved by humans around the world every day.
•About 10 s of human time per CAPTCHA
•More than 150 000 hours of work each day
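The daily total follows from simple arithmetic on the two figures quoted above. Multiplying them gives well over 150 000 hours, so that figure is best read as a conservative lower bound.

```python
# Figures quoted on the slide (reCAPTCHA "learn more" page).
captchas_per_day = 200_000_000  # "200+ million CAPTCHAs ... every day"
seconds_per_captcha = 10        # about 10 s of human time per CAPTCHA

total_hours = captchas_per_day * seconds_per_captcha / 3600
print(f"{total_hours:,.0f} hours of human work per day")  # 555,556 hours of human work per day
```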
reCAPTCHA : A free anti-bot service
• reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher.
•But if a computer can’t read such a CAPTCHA, how does the system know the correct answer to the puzzle ?
reCAPTCHA : A free anti-bot service
•Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known.
• If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
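The control-word scheme can be sketched as follows. This is a simplified illustration, not reCAPTCHA's actual implementation. The class `WordVerifier`, the identifiers and the thresholds `MIN_VOTES` and `MIN_AGREEMENT` are hypothetical. An answer for the unknown word is kept only when the user solved the known control word. The reading is accepted once enough such trusted answers agree.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical thresholds: the slides do not give reCAPTCHA's real parameters.
MIN_VOTES = 3        # trusted answers required before deciding on an unknown word
MIN_AGREEMENT = 0.7  # share of those answers the leading reading must reach

class WordVerifier:
    """Simplified sketch of the control-word / unknown-word scheme."""

    def __init__(self, known_answers: dict[str, str]):
        self.known = known_answers         # control image id -> known text
        self.votes = defaultdict(Counter)  # unknown image id -> readings and their counts

    def submit(self, control_id: str, control_answer: str,
               unknown_id: str, unknown_answer: str) -> bool:
        """Keep the answer for the unknown word only if the control word was solved."""
        if control_answer.strip().lower() != self.known[control_id].lower():
            return False  # control failed: the answer for the unknown word is discarded
        self.votes[unknown_id][unknown_answer.strip().lower()] += 1
        return True

    def accepted_reading(self, unknown_id: str) -> Optional[str]:
        """Return the consensus reading once enough trusted users agree, else None."""
        counts = self.votes.get(unknown_id, Counter())
        total = sum(counts.values())
        if total < MIN_VOTES:
            return None
        reading, n = counts.most_common(1)[0]
        return reading if n / total >= MIN_AGREEMENT else None

verifier = WordVerifier({"ctrl-1": "house"})
for answer in ["venezia", "Venezia", "venezia "]:
    verifier.submit("ctrl-1", "house", "word-42", answer)
print(verifier.accepted_reading("word-42"))  # venezia
```

Asking several users about each new word is the "higher confidence" step from the slide. A single trusted answer is not yet enough.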
Can we use Mechanical Turk to do this ?
•Who knows where the name Mechanical Turk comes from ?
•Mechanical Turk lets requesters have humans perform Human Intelligence Tasks (HITs)
•When designing a HIT, a requester can choose among many templates, including writing, survey, translation and categorization templates.
•500 000 workers from over 190 countries in January 2011.
•Payments are made through Amazon Payments. Requesters pay Amazon a commission of 10 % of the price of successfully completed HITs.
•The average wage is about one dollar an hour (each task averaging a few cents). Some have criticized Mechanical Turk as a digital sweatshop. We will discuss this further at the end of the lecture.
•Problem for us : you need an American address to use Mechanical Turk.
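The payment figures above can be turned into a quick budget and wage estimate. This is a minimal sketch under assumptions. The batch is hypothetical: 1 000 word images, each transcribed by 3 workers for 2 cents. The 10 % commission follows the slide, and rounding the fee down to whole cents is this sketch's own choice. Amazon's fee schedule has changed since 2013, so real costs may differ. Amounts are kept in integer cents to avoid floating-point rounding.

```python
def batch_cost_cents(n_items: int, workers_per_item: int, reward_cents: int,
                     commission_pct: int = 10) -> tuple[int, int, int]:
    """Return (paid to workers, Amazon commission, total) in cents."""
    worker_pay = n_items * workers_per_item * reward_cents
    commission = worker_pay * commission_pct // 100  # assumption: fee rounded down
    return worker_pay, commission, worker_pay + commission

def hourly_wage_cents(reward_cents: int, seconds_per_task: float) -> float:
    """Effective hourly wage of a worker doing tasks back to back."""
    return reward_cents * 3600 / seconds_per_task

pay, fee, total = batch_cost_cents(n_items=1000, workers_per_item=3, reward_cents=2)
print(f"workers: ${pay / 100:.2f}, Amazon: ${fee / 100:.2f}, total: ${total / 100:.2f}")
# A 2-cent task taking 72 s yields the "about one dollar an hour" quoted above.
print(f"wage: ${hourly_wage_cents(2, 72) / 100:.2f}/hour")
```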
CrowdFlower : a meta-engine for crowdsourcing
•CrowdFlower acts as a meta-engine, i.e. an interface to several crowdsourcing services.
•CrowdFlower has over 50 labor channel partners, among them Amazon Mechanical Turk
•It has completed 1 billion tasks (small units of work) since it began operating, and currently performs 5 man-years of work daily (Source : Wikipedia 19/11/2013)
•So let’s try it.
Combining crowdsourcing and grammatical rules
•Raw crowdsourced word transcriptions are likely to contain many errors
•But we also have a good grammatical model of this Venetian dialect (thanks to the work of Lorenzo Tomasin) and a large number of Venetian transcriptions.
•Many errors could be corrected automatically using these resources.
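The idea of correcting crowd errors with existing linguistic knowledge can be sketched with a deliberately simple stand-in. This sketch does not use a grammatical model. Instead, each noisy reading is snapped to the closest form in a lexicon of already-attested words, using edit distance, and the corrected readings from several workers are combined by majority vote. The lexicon, the `max_dist` threshold and the function names are hypothetical and for illustration only. A real system would use the grammatical model and transcription corpus mentioned above.

```python
from collections import Counter

def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def correct(word: str, lexicon: set[str], max_dist: int = 2) -> str:
    """Snap a noisy reading to the nearest attested form, if one is close enough."""
    best = min(lexicon, key=lambda w: (levenshtein(word, w), w))
    return best if levenshtein(word, best) <= max_dist else word

def combine(transcriptions: list[str], lexicon: set[str]) -> str:
    """Correct each worker's reading, then keep the most frequent corrected form."""
    corrected = [correct(t.lower(), lexicon) for t in transcriptions]
    return Counter(corrected).most_common(1)[0][0]

# Hypothetical toy lexicon standing in for a corpus of Venetian transcriptions.
lexicon = {"fondamenta", "campo", "canal", "sestier"}
print(combine(["fondamcnta", "fondamenta", "fonclamenta"], lexicon))  # fondamenta
```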
Survey : Do you want to use crowdsourcing in your next semester’s project ? Should the DHLAB sponsor this ? Answer on Framapad
What about crowdfunding your research project ?
Crowdfunding in general
•Kickstarter : 5.2 million people have pledged $882 million, funding 52 000 projects.
•Kiva : 600 000+ lenders have channelled almost $275 million to entrepreneurs in the developing world.
•Obama’s 2008 election campaign : $780 million, much of it from small online donations.
Example of a scientific Kickstarter project : http://www.kickstarter.com/projects/1616707907/virtual-prehistoric-worlds
Crowdfunding sites
• Indiegogo : http://www.indiegogo.com/
•France : http://www.ulule.com/
•Switzerland : http://wemakeit.ch/
After the break, we will talk about Wikipedia and gamification. In the meantime, you can try Wikirace : http://wikirace.christopherdebeer.com/
Why do people participate in crowdsourcing projects ?
OpenStreetMap
Haiti’s OSM before and after the earthquake (800+ changes)
Much more precise than Google Maps
Because the data is open, new layers can be added
OSM mappers seem intrinsically motivated to build content together
Is it the same for Wikipedia users ?
Wikipedia demonstration
Is Wikipedia a good resource ?
•Some academics argue that Wikipedia is not appropriate for scholarly settings because it is collectively built by amateurs.
•Achterman, D. (2005) Surviving Wikipedia : improving student search habits through information literacy and teacher collaboration, Knowledge Quest, 33 (5), 38-40
•Black, E. (2007) Wikipedia and Academic Peer Review : Wikipedia as a recognized medium for scholarly publications ? Online Information Review, 32 (1), 73-88
Wikipedia is in perpetual beta, constantly getting better
•Wikipedia is updated and improved at a much faster rhythm than scholarly edited encyclopedias.
• It improves all the time.
•Several recent studies have shown that Wikipedia can equal or outperform traditionally edited encyclopedias in terms of accuracy.
•Giles, J. (2005), Internet encyclopaedias go head to head, Nature, 438, 900-901
Wikipedia creates a diplomatic zone
•Wikipedia manages to create a diplomatic zone, where conflicts between different perspectives can be resolved in search of a common, neutral consensus. This is a definite advantage over static (online or printed) encyclopedias.
•For diplomacy in general, see Bruno Latour, Enquête sur les modes d’existence : une anthropologie des Modernes, La Découverte, 2012.
•Bryant, S. et al. (2005) Becoming Wikipedian, in GROUP 05, 1-10, ACM Press
Wikipedia is perceived as a common good
• It is backed up by many users all over the world
•Therefore, it is one of the few digital resources that are bound to last.
Why does Wikipedia work ?
Hypothesis : Wikipedia is a game
Foursquare is a game and a mapping service
• In recent years, we have seen several examples of successful creation of collective knowledge bases using addictive games.
•This is a particular case of gamification.
Twitter is a game
•One could argue that services for sharing/constructing collective knowledge online are also games (even if they are not presented as such).
•The success of Twitter is linked to its smooth onboarding process.
•We discussed this case in course 1.
Quora is a game
Quora’s strategy
•Quora must attract qualified contributors to write high-quality answers to questions.
•Can you think of a strategy to reach this goal ?
To reach this goal, Quora chose a very clear strategy : personalize the answers, anonymize the questions
Quora’s strategy
•Questions are not owned by the person who asks them.
•They are immediately treated as common goods that can be updated and modified by anyone.
•By contrast, the interface strongly associates the user with his answers.
•The systematic juxtaposition of the user’s identity (including picture, name and short bio) and his answers establishes an equivalence between the value of a user and the value of his answers.
Quora’s strategy
• In addition, Quora introduces an explicit ranking system : the best-rated answers are shown first.
•Each question is thus a competition between Quora’s users.
•The one who provides the best answer wins the game.
•As with Twitter, the user learns Quora’s implicit rules as he plays, discovering what he must do to play well in this particular kind of game.
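The ranking rule and the link between a user and his answers can be sketched as a small data model. This is an illustration under an assumption: an answer's rating is taken as its net votes (up minus down), because the slides do not describe Quora's actual ranking formula. The names `Answer`, `rank` and `reputation` are hypothetical. Summing a user's answer scores is one simple way to express the "value of a user equals the value of his answers" equivalence.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Answer:
    author: str  # shown next to the answer: the user's identity is tied to it
    text: str
    upvotes: int
    downvotes: int

    @property
    def score(self) -> int:
        # Assumption: rating = net votes; Quora's real formula is not given.
        return self.upvotes - self.downvotes

def rank(answers: list[Answer]) -> list[Answer]:
    """Best-rated answers first; sorted() is stable, so ties keep their original order."""
    return sorted(answers, key=lambda a: a.score, reverse=True)

def reputation(answers: list[Answer]) -> dict[str, int]:
    """A user's value as the sum of the scores of his answers."""
    totals: dict[str, int] = defaultdict(int)
    for a in answers:
        totals[a.author] += a.score
    return dict(totals)

answers = [Answer("ann", "first answer", 5, 1),
           Answer("bob", "second answer", 10, 0),
           Answer("cy", "third answer", 4, 0)]
print([a.author for a in rank(answers)])  # ['bob', 'ann', 'cy']
```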
What kind of game is Wikipedia ?
Wikipedia is an MMORPG (Massively Multiplayer Online Role-Playing Game)
Wikipedia
•Onboarding : there is no need to register to start contributing, but registering is necessary to climb the tiers.
•Registering is like reaching level 1
•By registering, the user gets a few new powers. He can have his own user page. He can vote.
•These are first steps that motivate him to progressively discover and climb the levels of the big pyramid associated with each language version of Wikipedia.
Wikipedia
•How can one climb the tiers ? What privileges do the more powerful users have ? The new contributor does not know yet.
• If he persists, he will discover that he can take on different jobs in the Wikipedia world.
•Administrators, Bureaucrats, Stewards, Mediators, Judges, Bot creators, Importers, Oversighters, IP Checkers.
Administrators
•Administrators are responsible for cleaning up particular pages, checking copyright issues and repairing acts of vandalism.
•All these tasks can be done by a normal user, but an administrator has access to special powers :
• delete irrelevant pages
• protect some pages against changes
• block certain users
• rename pages
• mask the history of particular pages.
Administrators
•How does one become an administrator ? He needs to be elected.
•The following criteria are recommended :
• a very good understanding of the wiki syntax, rules and global functioning of the local version of Wikipedia
• participation in maintenance work
• around 3000 contributions
• at least one year of significant activity
•The election is set on a given day and the candidate must obtain a clear majority (this notion is not precisely defined in the French version of Wikipedia)
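The measurable recommendations and the election rule can be sketched as two checks. This is an illustration under stated assumptions. The thresholds come from the slide, but the sketch applies them strictly, whereas they are only recommendations. "One year of significant activity" is simplified to time since first activity. The two-thirds threshold for a "clear majority" is an assumption chosen for illustration, since the slide notes the notion is not precisely defined. `Candidate`, `meets_recommendations` and `elected` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Values from the slide's recommendations, applied here as strict thresholds.
RECOMMENDED_CONTRIBUTIONS = 3000             # "around 3000 contributions"
RECOMMENDED_SENIORITY = timedelta(days=365)  # "at least one year of significant activity"
# Assumption: "clear majority" is not precisely defined; two thirds is illustrative only.
CLEAR_MAJORITY = 2 / 3

@dataclass
class Candidate:
    name: str
    contributions: int
    active_since: date
    did_maintenance: bool

def meets_recommendations(c: Candidate, today: date) -> bool:
    """Checks the measurable criteria; knowledge of wiki syntax and rules is not modelled."""
    return (c.contributions >= RECOMMENDED_CONTRIBUTIONS
            and today - c.active_since >= RECOMMENDED_SENIORITY
            and c.did_maintenance)

def elected(votes_for: int, votes_against: int) -> bool:
    """True if the share of supporting votes reaches the assumed 'clear majority'."""
    total = votes_for + votes_against
    return total > 0 and votes_for / total >= CLEAR_MAJORITY

alice = Candidate("Alice", contributions=3500, active_since=date(2012, 1, 1), did_maintenance=True)
print(meets_recommendations(alice, today=date(2013, 11, 20)), elected(70, 30))  # True True
```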
IP Checker
•An IP Checker has access to the check-user function, which reveals the connection between a user’s IP address and his account. To become an IP Checker, one must be approved by the arbitration committee.
•Only 5 people have this privilege on the French version of Wikipedia.
Oversighter
•An Oversighter can mask a username from all public records
•mask a comment
•mask a version of a page
• suppress a page and mask it even from administrators
• see the oversighters’ special records
Bot creators
•Among the 30 most active editors on Wikipedia, 2/3 are bots
•Bots perform repetitive tasks and can act on Wikipedia pages like a real Wikipedia user (generating, editing or deleting an article, translating part of an article, resolving homonymy issues, reverting acts of vandalism)
•Only a bureaucrat or a steward can allow someone to be a bot creator.
Bureaucrats
•Bureaucrats manage the status of other users (administrators, bots,
bureaucrats).
•Only 8 people have this privilege on the French version of Wikipedia.
Stewards are super-bureaucrats
•Stewards are appointed by the international committee. They can manage the status of all the other contributors.
•There are only 3 stewards on the French version of Wikipedia.
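The rules on who can grant which status form a small access-control table. This sketch encodes them as a dictionary from each status to the roles allowed to grant it. The role labels are hypothetical. The rules paraphrase the slides for the French Wikipedia in 2013 and are not taken from MediaWiki's actual configuration. Statuses the slides say nothing about, such as mediator, are simply absent.

```python
# Hypothetical role labels; the grant rules paraphrase the slides (French Wikipedia, 2013).
GRANTORS: dict[str, set[str]] = {
    "administrator": {"bureaucrat"},          # bureaucrats manage administrators' status
    "bot": {"bureaucrat"},                    # a bureaucrat (or a steward) allows a bot creator
    "bureaucrat": {"bureaucrat"},             # bureaucrats manage other bureaucrats' status
    "ip_checker": {"arbitration_committee"},  # IP checkers are approved by the ArbCom
    "steward": {"international_committee"},   # stewards are appointed at international level
}

# Stewards can manage the status of all the other contributors.
for role, grantors in GRANTORS.items():
    if role != "steward":
        grantors.add("steward")

def can_grant(actor_roles: set[str], role: str) -> bool:
    """True if someone holding any of actor_roles may give `role` to another user."""
    return not actor_roles.isdisjoint(GRANTORS.get(role, set()))

def grantors_of(role: str) -> list[str]:
    """Sorted list of who can grant a role (empty if the slides say nothing about it)."""
    return sorted(GRANTORS.get(role, set()))

print(grantors_of("bot"))  # ['bureaucrat', 'steward']
```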
Mediators
•They can intervene in disputes but cannot vote on or recommend punitive action.
Judges
•They can impose punitive action
•The ArbCom (Arbitration Committee) of the English version of Wikipedia has
only 15 members.
Wikipedia also has its foundational stories
•The Essjay controversy : Essjay was an eminent member of the Wikicratia, holding at once the functions of administrator, bureaucrat, judge and mediator. He was caught lying in the biography on his Wikipedia user page and was banned.
World of Warcraft is so boring compared to Wikipedia
World of Warcraft is so boring compared to Wikipedia
•Ordinary clerks by day, Wikipedians by night.
•On Wikipedia, with time and perseverance, each player can lead a double life, masked behind his pseudonym. He can earn new powers as hard-won as those of a great wizard in heroic-fantasy role-playing games.
•When I wrote a first blog post on this issue, a French Wikipedia Bureaucrat pointed me to a relatively well-hidden page describing Wikipedia as an MMORPG : http://fr.wikipedia.org/wiki/Wikipedia:MMORPG
Open debate : are crowdsourcing and gamification ethical ?