Wikipedia as controlled vocabulary
Transcript of Wikipedia as controlled vocabulary
![Page 1: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/1.jpg)
Chris Sizemore, Silver Oliver, BBC
Wikipedia as controlled vocabulary
![Page 2: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/2.jpg)
I’m about ‘Victorians’
![Page 3: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/3.jpg)
BBC Topic Page: I’m about ‘Victorians’
BBC silo #1, BBC silo #2, BBC silo #3
Outside the BBC
![Page 4: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/4.jpg)
BBC Topic Page: I’m about ‘Victorians’
viktorianisch, V잊도 r 이안, Ελληνικά
Outside the BBC: NY Times, flickr, wikipedia
BBC silo #1, BBC silo #2, BBC silo #3
![Page 5: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/5.jpg)
![Page 6: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/6.jpg)
![Page 7: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/7.jpg)
![Page 8: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/8.jpg)
![Page 9: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/9.jpg)
An index language exists primarily to:
• Allow an indexer to represent the subject matter of documents in a consistent way
• Bring the vocabulary used by the searcher into coincidence with the vocabulary used by the indexer
• Provide means whereby a searcher can modulate the search strategy to attain comprehensive or selective results as user needs dictate
F.W. Lancaster, Vocabulary control for information retrieval
![Page 10: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/10.jpg)
Could Wikipedia be used as a universal
language for identifying subjects?
![Page 11: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/11.jpg)
Story of Wikipedia-as-CV
![Page 12: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/12.jpg)
Story of Wikipedia-as-CV: personal origins
![Page 13: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/13.jpg)
![Page 14: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/14.jpg)
Story of Wikipedia-as-CV: personal origins
We needed a system to categorise movie & TV
reviews
![Page 15: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/15.jpg)
Story of Wikipedia-as-CV: personal origins
So of course we built a categorisation system from scratch -- including its own
controlled vocab
![Page 16: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/16.jpg)
Story of Wikipedia-as-CV: personal origins
And when people saw the system, they always said: “Hey, that reminds me of
Internet Movie Database…”
![Page 17: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/17.jpg)
![Page 18: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/18.jpg)
Story of Wikipedia-as-CV: personal origins
It struck me that the way Internet Movie Database is set up isn’t dissimilar to the structure of a
thesaurus or a very flat taxonomy…
![Page 19: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/19.jpg)
Story of Wikipedia-as-CV: personal origins
But it’s one where the emphasis is on “related to”, not broader/narrower,
synonym, antonym, etc.
![Page 20: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/20.jpg)
![Page 21: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/21.jpg)
Story of Wikipedia-as-CV: personal origins
From then, I couldn’t help but be drawn to websites where the structure
is clearly: “a single primary Concept per page --
and pages for related Concepts link to each other”
![Page 22: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/22.jpg)
Story of Wikipedia-as-CV: personal origins
Could those “one Concept per page” webpages be used as “terms” as in a
controlled vocabulary?
![Page 23: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/23.jpg)
Are some websites actually “indexing
languages” in disguise?
![Page 24: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/24.jpg)
conText -- a Wikipedia-as-CV auto-categoriser prototype
![Page 25: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/25.jpg)
![Page 26: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/26.jpg)
conText -- a Wikipedia-as-CV auto-categoriser prototype:
http://sells.welcomebackstage.com:5000/item/submit
![Page 27: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/27.jpg)
![Page 28: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/28.jpg)
![Page 29: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/29.jpg)
Demo of conText -- a Wikipedia-as-CV auto-categoriser
prototype:
Take text from audience!
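The talk does not say how conText works internally. Purely as an illustration of the general idea of auto-categorising against Wikipedia-as-CV, here is a minimal sketch that matches known article titles in free text and returns the corresponding Wikipedia URLs. The tiny `VOCABULARY` table, the `tag_text` function, and the simple phrase-matching strategy are all assumptions made for this example, not a description of the prototype.

```python
import re

# Hypothetical, hand-picked vocabulary: surface form -> Wikipedia article title.
VOCABULARY = {
    "science fiction": "Science_fiction",
    "time travel": "Time_travel",
    "bbc": "BBC",
    "tardis": "Tardis",
}

WIKI_PREFIX = "http://en.wikipedia.org/wiki/"


def tag_text(text):
    """Return Wikipedia URLs for vocabulary terms found in text, in order of first appearance."""
    # Longer terms first so multi-word labels win over any shorter overlapping label.
    terms = sorted(VOCABULARY, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    tags = []
    for match in pattern.finditer(text):
        url = WIKI_PREFIX + VOCABULARY[match.group(1).lower()]
        if url not in tags:  # report each subject once
            tags.append(url)
    return tags


print(tag_text("The BBC series mixes science fiction with time travel in the Tardis."))
```

A real categoriser would need disambiguation (many labels map to more than one article), which is why the slides describe the tagging as semi-automatic rather than fully automatic.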
![Page 30: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/30.jpg)
![Page 31: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/31.jpg)
Wikipedia is already being used across the Web as a form of
subject identification & disambiguation, in a grassroots
way:
in the form of hyperlinks embedded by authors in blog
posts, news articles, music reviews, etc everywhere!
![Page 32: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/32.jpg)
http://en.wikipedia.org/wiki/British
http://en.wikipedia.org/wiki/Science_fiction
http://en.wikipedia.org/wiki/BBC
http://en.wikipedia.org/wiki/Time_travel
http://en.wikipedia.org/wiki/Dr_who
http://en.wikipedia.org/wiki/Tardis
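Links like the ones listed above can be harvested mechanically. The following is a minimal sketch, not part of the original talk: it collects English Wikipedia links from a page's HTML and treats each link target as a subject identifier. The class name `WikipediaLinkCollector`, the function `subjects_of`, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote


class WikipediaLinkCollector(HTMLParser):
    """Collects the article titles of links that point at English Wikipedia."""

    def __init__(self):
        super().__init__()
        self.subjects = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parsed = urlparse(href)
        if parsed.netloc == "en.wikipedia.org" and parsed.path.startswith("/wiki/"):
            title = unquote(parsed.path[len("/wiki/"):])
            if title and title not in self.subjects:
                self.subjects.append(title)


def subjects_of(html):
    """Return the Wikipedia article titles linked from the given HTML, in order."""
    collector = WikipediaLinkCollector()
    collector.feed(html)
    return collector.subjects


sample_html = """
<p>A <a href="http://en.wikipedia.org/wiki/British">British</a>
<a href="http://en.wikipedia.org/wiki/Science_fiction">science fiction</a>
series from the <a href="http://en.wikipedia.org/wiki/BBC">BBC</a>,
featuring the <a href="http://en.wikipedia.org/wiki/Tardis">Tardis</a>.
See also <a href="http://example.com/fan-site">this fan site</a>.</p>
"""

print(subjects_of(sample_html))
```

The non-Wikipedia link is ignored, so what remains is a list of subject identifiers the author attached to the page simply by linking.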
![Page 33: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/33.jpg)
![Page 34: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/34.jpg)
These days, by convention, when you link to Wikipedia from your webpage, more than saying “go and have a look at this other
page”, you are more likely giving a definition to a concept referred to in your content…
Also used in this way for specific domains are Internet Movie Database (for films & TV
programmes), MySpace (for bands), Amazon (for books), etc
![Page 35: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/35.jpg)
For general knowledge, though,
Wikipedia is becoming the Web’s de facto
controlled vocabulary
![Page 36: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/36.jpg)
http://en.wikipedia.org/wiki/Heerlen
http://en.wikipedia.org/wiki/Beethoven
http://en.wikipedia.org/wiki/Amsterdam
http://en.wikipedia.org/wiki/Van_Gogh_Museum
![Page 37: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/37.jpg)
An index language exists primarily to:
• Allow an indexer to represent the subject matter of documents in a consistent way
• Bring the vocabulary used by the searcher into coincidence with the vocabulary used by the indexer
• Provide means whereby a searcher can modulate the search strategy to attain comprehensive or selective results as user needs dictate
F.W. Lancaster, Vocabulary control for information retrieval
![Page 38: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/38.jpg)
![Page 39: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/39.jpg)
![Page 40: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/40.jpg)
Wikipedia pages provide the best scope notes in the world
Wikipedia-as-CV benefits from being developed through a social process, maintained and kept current by the Wikipedia community
Each concept represents a consensus view and its meaning can be understood simply by reading the associated Wikipedia page
![Page 41: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/41.jpg)
Wikipedia pages provide the best scope
notes in the world
For each Concept, the document edit history, the discussion around the concept's definition, & the debate are all important here…
![Page 42: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/42.jpg)
![Page 43: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/43.jpg)
An index language exists primarily to:
• Allow an indexer to represent the subject matter of documents in a consistent way
• Bring the vocabulary used by the searcher into coincidence with the vocabulary used by the indexer
• Provide means whereby a searcher can modulate the search strategy to attain comprehensive or selective results as user needs dictate
F.W. Lancaster, Vocabulary control for information retrieval
![Page 44: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/44.jpg)
![Page 45: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/45.jpg)
So, we can tag pretty accurately semi-automatically with globally
unique subject identifiers using this approach…
So what?
Un-silo your content repository quickly and cheaply, by connecting
it to the Web via Wikipedia
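One way to picture the un-siloing claim: once items in separate repositories carry the same Wikipedia URLs as tags, the URL itself becomes the join key. A minimal sketch under that assumption follows; the silo contents, the item identifiers, and the `link_by_subject` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical items from two separate repositories, each tagged with Wikipedia URLs.
silo_1 = {
    "programme/123": {"http://en.wikipedia.org/wiki/Beethoven"},
    "programme/456": {
        "http://en.wikipedia.org/wiki/Amsterdam",
        "http://en.wikipedia.org/wiki/Van_Gogh_Museum",
    },
}
silo_2 = {
    "news/abc": {"http://en.wikipedia.org/wiki/Amsterdam"},
    "news/def": {"http://en.wikipedia.org/wiki/Heerlen"},
}


def link_by_subject(*silos):
    """Group item identifiers from all silos under the Wikipedia URL they share."""
    index = defaultdict(set)
    for silo in silos:
        for item_id, subjects in silo.items():
            for subject in subjects:
                index[subject].add(item_id)
    # Keep only subjects that connect more than one item.
    return {s: sorted(items) for s, items in index.items() if len(items) > 1}


print(link_by_subject(silo_1, silo_2))
```

Neither repository needs to know about the other's schema; agreeing on the identifier is enough to connect them.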
![Page 46: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/46.jpg)
![Page 47: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/47.jpg)
![Page 48: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/48.jpg)
![Page 49: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/49.jpg)
![Page 50: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/50.jpg)
![Page 51: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/51.jpg)
![Page 52: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/52.jpg)
![Page 53: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/53.jpg)
Now playing vs. the Web
Why not bring BBC Archive materials into this service via Wikipedia-as-CV tagging and a linked data bridge between Wikipedia & MusicBrainz?
![Page 54: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/54.jpg)
![Page 55: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/55.jpg)
![Page 56: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/56.jpg)
By using Wikipedia-as-CV, you can get your
repository onto this diagram quickly,
for free
![Page 57: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/57.jpg)
![Page 58: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/58.jpg)
An index language exists primarily to:
• Allow an indexer to represent the subject matter of documents in a consistent way
• Bring the vocabulary used by the searcher into coincidence with the vocabulary used by the indexer
• Provide means whereby a searcher can modulate the search strategy to attain comprehensive or selective results as user needs dictate
F.W. Lancaster, Vocabulary control for information retrieval
![Page 59: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/59.jpg)
![Page 60: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/60.jpg)
![Page 61: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/61.jpg)
![Page 62: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/62.jpg)
![Page 63: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/63.jpg)
A Web-scale, globally accessible index language accidentally exists:
• It encourages multiple indexers across the Web to represent the subject matter of any content in a consistent way
• It brings the vocabulary used by info seekers into coincidence with the vocabulary used by indexers -- the searchers ARE indexers, and vice versa
• It provides means whereby a searcher can modulate a search and/or browse strategy to attain comprehensive or selective results as user needs dictate
• It adds Web-scale navigation & cross-reference possibilities
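The point about modulating a search can be made concrete with a small sketch. Assuming each concept page lists the related concepts it links to, a searcher can stay selective by matching only the exact concept, or go comprehensive by also including concepts a set number of links away. The `RELATED` graph, the `TAGGED` items, and both function names are invented for illustration.

```python
# Hypothetical "related to" links between concept pages.
RELATED = {
    "Doctor_Who": {"Tardis", "Time_travel"},
    "Tardis": {"Doctor_Who"},
    "Time_travel": {"Science_fiction"},
    "Science_fiction": set(),
}

# Hypothetical content items tagged with concepts.
TAGGED = {
    "review-1": {"Doctor_Who"},
    "review-2": {"Tardis"},
    "review-3": {"Science_fiction"},
}


def expand(concept, hops):
    """Return the concept plus everything reachable within `hops` related-to links."""
    seen = {concept}
    frontier = {concept}
    for _ in range(hops):
        # Follow one more layer of links, skipping concepts already visited.
        frontier = {n for c in frontier for n in RELATED.get(c, set())} - seen
        seen |= frontier
    return seen


def search(concept, hops=0):
    """hops=0 is selective; larger values trade precision for recall."""
    wanted = expand(concept, hops)
    return sorted(item for item, tags in TAGGED.items() if tags & wanted)


print(search("Doctor_Who"))          # selective: exact concept only
print(search("Doctor_Who", hops=2))  # comprehensive: include related concepts
```

Because the emphasis in this kind of vocabulary is on “related to” links rather than a strict hierarchy, widening by link distance stands in here for the broader/narrower expansion a traditional thesaurus would offer.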
![Page 64: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/64.jpg)
Chris Sizemore, Silver Oliver, BBC
Wikipedia as controlled vocabulary
Wikipedia is a controlled vocabulary
![Page 65: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/65.jpg)
![Page 66: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/66.jpg)
![Page 67: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/67.jpg)
Much thanks!
Questions, comments, & constructive criticism?
![Page 68: Wikipedia as controlled vocabulary](https://reader035.fdocument.pub/reader035/viewer/2022062405/555136d9b4c905325d8b5388/html5/thumbnails/68.jpg)
http://flickr.com/photos/deniscollette/1817034358/