519502.prezentacija_webmining
-
Upload
markus-schatten -
Category
Documents
-
view
219 -
download
0
Transcript of 519502.prezentacija_webmining
-
8/2/2019 519502.prezentacija_webmining
1/107
Mining Social and Semantic Network Data on the Web
Markus Schatten, PhD
University of ZagrebFaculty of Organization and Informatics
May 4, 2011
-
8/2/2019 519502.prezentacija_webmining
2/107
Introduction
Web 2.0, Semantic Web, Web 3.0
Network science
Data collection
Semistructured data
XPath
Regular expressionsAjax
APIs
Data analysis
Network matricesFolksonomy model
Logic programming
Case study - CROSBI
2 of 54
-
8/2/2019 519502.prezentacija_webmining
3/107
Outline
Introduction
Web 2.0, Semantic Web, Web 3.0
Network science
Data collection
Data analysis
Case study - CROSBI
3 of 54
-
8/2/2019 519502.prezentacija_webmining
4/107
Questions
Why mining social and semantic network data?
How to collect these data? What are the common obstacles and how to solve them?
How to analyze these data?
4 of 54
-
8/2/2019 519502.prezentacija_webmining
5/107
Outline
Introduction
Web 2.0, Semantic Web, Web 3.0
Network science
Data collection
Data analysis
Case study - CROSBI
5 of 54
-
8/2/2019 519502.prezentacija_webmining
6/107
Web 2.0, Semantic Web, Web 3.0
Why should one bother with these buzzwords?
6 of 54
-
8/2/2019 519502.prezentacija_webmining
7/107
Why mining (social) web data?
Millions of people,
7 of 54
-
8/2/2019 519502.prezentacija_webmining
8/107
Why mining (social) web data?
Millions of people, on thousands of sites,
7 of 54
-
8/2/2019 519502.prezentacija_webmining
9/107
Why mining (social) web data?
Millions of people, on thousands of sites, across hundreds ofdiverse cultures,
7 of 54
-
8/2/2019 519502.prezentacija_webmining
10/107
Why mining (social) web data?
Millions of people, on thousands of sites, across hundreds ofdiverse cultures, leave trails
7 of 54
-
8/2/2019 519502.prezentacija_webmining
11/107
Why mining (social) web data?
Millions of people, on thousands of sites, across hundreds ofdiverse cultures, leave trails about various aspects of their lives...
7 of 54
-
8/2/2019 519502.prezentacija_webmining
12/107
Why mining (social) web data?
Millions of people, on thousands of sites, across hundreds ofdiverse cultures, leave trails about various aspects of their lives...
Social systems are systems of communication
(Niklas Luhman)
7 of 54
-
8/2/2019 519502.prezentacija_webmining
13/107
Why mining (social) web data?
Millions of people, on thousands of sites, across hundreds ofdiverse cultures, leave trails about various aspects of their lives...
Social systems are systems of communication
(Niklas Luhman)
By analyzing the trails (the results of social systems structural
coupling) we can analyze the social system itself:
7 of 54
-
8/2/2019 519502.prezentacija_webmining
14/107
Why mining (social) web data?
Millions of people, on thousands of sites, across hundreds ofdiverse cultures, leave trails about various aspects of their lives...
Social systems are systems of communication
(Niklas Luhman)
By analyzing the trails (the results of social systems structural
coupling) we can analyze the social system itself:politics, culture,
media, business, technology, science ...
7 of 54
-
8/2/2019 519502.prezentacija_webmining
15/107
Outline
Introduction
Web 2.0, Semantic Web, Web 3.0
Network science
Data collection
Data analysis
Case study - CROSBI
8 of 54
-
8/2/2019 519502.prezentacija_webmining
16/107
The new science of networks
source: The New York Times (April 3rd, 1933., p. 17).
source: An Attraction Network in a Fourth Grade Class (Moreno, Who shall survive?, 1934).
9 of 54
-
8/2/2019 519502.prezentacija_webmining
17/107
Political and financial networks
Mark Lombardi (1980s & 1990s)
10 of 54
-
8/2/2019 519502.prezentacija_webmining
18/107
Terrorist networks
11 of 54
-
8/2/2019 519502.prezentacija_webmining
19/107
Board of directors membership networks
source: http://theyrule.net
12 of 54
http://theyrule.net/http://theyrule.net/ -
8/2/2019 519502.prezentacija_webmining
20/107
On-line social networks
13 of 54
-
8/2/2019 519502.prezentacija_webmining
21/107
Internet
source: Bill Cheswick http://www.cheswick.com/ches/map/gallery/index.html
14 of 54
http://www.cheswick.com/ches/map/gallery/index.htmlhttp://www.cheswick.com/ches/map/gallery/index.html -
8/2/2019 519502.prezentacija_webmining
22/107
Air traffic networks
source: Northwest Airlines WorldTraveler Magazine
15 of 54
-
8/2/2019 519502.prezentacija_webmining
23/107
Railway networks
source: TRTA, March 2003 - Tokyo rail map
16 of 54
-
8/2/2019 519502.prezentacija_webmining
24/107
Semantic networks
source: http://wordnet.princeton.edu/man/wnlicens.7WN
17 of 54
http://wordnet.princeton.edu/man/wnlicens.7WNhttp://wordnet.princeton.edu/man/wnlicens.7WN -
8/2/2019 519502.prezentacija_webmining
25/107
Gene networks
source: http://www.zaik.uni-koeln.de/bioinformatik/regulatorynets.html.en
18 of 54
http://www.zaik.uni-koeln.de/bioinformatik/regulatorynets.html.enhttp://www.zaik.uni-koeln.de/bioinformatik/regulatorynets.html.en -
8/2/2019 519502.prezentacija_webmining
26/107
Food chain networks
source: http://marinebio.org/Oceans/Biotic-Structure.asp19 of 54
http://marinebio.org/Oceans/Biotic-Structure.asphttp://marinebio.org/Oceans/Biotic-Structure.asp -
8/2/2019 519502.prezentacija_webmining
27/107
On-line social and semantic networks
Where can we find networks? social networking sites
20 of 54
-
8/2/2019 519502.prezentacija_webmining
28/107
On-line social and semantic networks
Where can we find networks? social networking sites
forums, buletinboards, discussion groups, mailing lists
20 of 54
-
8/2/2019 519502.prezentacija_webmining
29/107
On-line social and semantic networks
Where can we find networks? social networking sites
forums, buletinboards, discussion groups, mailing lists
blogs, microblogs, vlogs
20 of 54
-
8/2/2019 519502.prezentacija_webmining
30/107
On-line social and semantic networks
Where can we find networks? social networking sites
forums, buletinboards, discussion groups, mailing lists
blogs, microblogs, vlogs
wikis, semantic wikis
20 of 54
-
8/2/2019 519502.prezentacija_webmining
31/107
On-line social and semantic networks
Where can we find networks? social networking sites
forums, buletinboards, discussion groups, mailing lists
blogs, microblogs, vlogs
wikis, semantic wikis social tagging
20 of 54
-
8/2/2019 519502.prezentacija_webmining
32/107
On-line social and semantic networks
Where can we find networks? social networking sites
forums, buletinboards, discussion groups, mailing lists
blogs, microblogs, vlogs
wikis, semantic wikis social tagging
RSS feeds, RDF repositories, metadata collections
20 of 54
-
8/2/2019 519502.prezentacija_webmining
33/107
On-line social and semantic networks
Where can we find networks? social networking sites
forums, buletinboards, discussion groups, mailing lists
blogs, microblogs, vlogs
wikis, semantic wikis social tagging
RSS feeds, RDF repositories, metadata collections...
20 of 54
-
8/2/2019 519502.prezentacija_webmining
34/107
On-line social and semantic networks
Where can we find networks? social networking sites
forums, buletinboards, discussion groups, mailing lists
blogs, microblogs, vlogs
wikis, semantic wikis social tagging
RSS feeds, RDF repositories, metadata collections...
A social network can be found on any application where users
communicate.
20 of 54
-
8/2/2019 519502.prezentacija_webmining
35/107
On-line social and semantic networks
Where can we find networks? social networking sites
forums, buletinboards, discussion groups, mailing lists
blogs, microblogs, vlogs
wikis, semantic wikis social tagging
RSS feeds, RDF repositories, metadata collections...
A social network can be found on any application where users
communicate.A semantic network can be found in any specification where
concepts are meaningfully connected.
20 of 54
-
8/2/2019 519502.prezentacija_webmining
36/107
Outline
Introduction
Web 2.0, Semantic Web, Web 3.0
Network science
Data collectionSemistructured dataXPathRegular expressionsAjaxAPIs
Data analysis
Case study - CROSBI21 of 54
-
8/2/2019 519502.prezentacija_webmining
37/107
Common obstacles in data collection
Semi-structured data (HTML,
XHTML, XML, RDF, OWL ...)
22 of 54
-
8/2/2019 519502.prezentacija_webmining
38/107
Common obstacles in data collection
Semi-structured data (HTML,
XHTML, XML, RDF, OWL ...)
The need for pattern matching
22 of 54
-
8/2/2019 519502.prezentacija_webmining
39/107
Common obstacles in data collection
Semi-structured data (HTML,
XHTML, XML, RDF, OWL ...)
The need for pattern matching
Client-side content generation
22 of 54
-
8/2/2019 519502.prezentacija_webmining
40/107
Semistructured data and XPath
query language for semistructured data
part of the W3C standard
uses path expressions to navigate through documents
has a standard library with over 100 useful functions
Example:
/html/body/div[@id=background]/div/div[@id=mainBar]/div/p[last()-1]
23 of 54
-
8/2/2019 519502.prezentacija_webmining
41/107
XPather (Firefox add-on)
24 of 54
-
8/2/2019 519502.prezentacija_webmining
42/107
Regular expressions
Useful for extracting various formatted data and searching for
keywordsExample (date matching)
27-9-2001 [0-9]{1,2}[0-9]{1,2}[1-2][0-9]{3}
25 of 54
-
8/2/2019 519502.prezentacija_webmining
43/107
Client side generated content
Some sites use Ajax to generate data on page load or user
interaction (e.g. mouse click, mouse over etc.)
26 of 54
-
8/2/2019 519502.prezentacija_webmining
44/107
Client side generated content
Some sites use Ajax to generate data on page load or user
interaction (e.g. mouse click, mouse over etc.) In combination with obfuscated JavaScript code this is hard to
scrape!
Current solutions:
Browser scripting (can be slow and cumbersome) - e.g. withselenium
26 of 54
-
8/2/2019 519502.prezentacija_webmining
45/107
Client side generated content
Some sites use Ajax to generate data on page load or user
interaction (e.g. mouse click, mouse over etc.) In combination with obfuscated JavaScript code this is hard to
scrape!
Current solutions:
Browser scripting (can be slow and cumbersome) - e.g. withselenium
Headless browser - e.g. mechanize, spidermonkey, HtmlUnit
26 of 54
-
8/2/2019 519502.prezentacija_webmining
46/107
Client side generated content
Some sites use Ajax to generate data on page load or user
interaction (e.g. mouse click, mouse over etc.) In combination with obfuscated JavaScript code this is hard to
scrape!
Current solutions:
Browser scripting (can be slow and cumbersome) - e.g. withselenium Headless browser - e.g. mechanize, spidermonkey, HtmlUnit
26 of 54
-
8/2/2019 519502.prezentacija_webmining
47/107
Open APIs
Most popular social web sites have an open API that allow you
to connect your application to their site (Facebook, Twitter,
Wikipedia, ...)
27 of 54
O API
-
8/2/2019 519502.prezentacija_webmining
48/107
Open APIs
Most popular social web sites have an open API that allow you
to connect your application to their site (Facebook, Twitter,
Wikipedia, ...)
Still such APIs often have drawbacks for network datacollection:
27 of 54
O API
-
8/2/2019 519502.prezentacija_webmining
49/107
Open APIs
Most popular social web sites have an open API that allow you
to connect your application to their site (Facebook, Twitter,
Wikipedia, ...)
Still such APIs often have drawbacks for network datacollection: Limited access rights (e.g. Facebook)
27 of 54
O API
-
8/2/2019 519502.prezentacija_webmining
50/107
Open APIs
Most popular social web sites have an open API that allow you
to connect your application to their site (Facebook, Twitter,
Wikipedia, ...)
Still such APIs often have drawbacks for network datacollection: Limited access rights (e.g. Facebook) Limited queries per time interval (e.g. Twitter) etc.
27 of 54
O tli
-
8/2/2019 519502.prezentacija_webmining
51/107
Outline
Introduction
Web 2.0, Semantic Web, Web 3.0
Network science
Data collection
Data analysisNetwork matrices
Folksonomy modelLogic programming
Case study - CROSBI
28 of 54
-
8/2/2019 519502.prezentacija_webmining
52/107
Networks are most often analyzed using graph theory:
DefinitionA graphG is the pair(N, E) wherebyN represents the set of verticles ornodes, andE N N the set of edges connecting pairs fromN.
29 of 54
-
8/2/2019 519502.prezentacija_webmining
53/107
Networks are most often analyzed using graph theory:
DefinitionA graphG is the pair(N, E) wherebyN represents the set of verticles ornodes, andE N N the set of edges connecting pairs fromN.
DefinitionA directed graph or digraphG is the pair(N, A), wherebyN represents
the set of nodes, andA N N the set of ordered pairs of elementsfromN that represent the set of graph arcs.
29 of 54
-
8/2/2019 519502.prezentacija_webmining
54/107
Networks are most often analyzed using graph theory:
DefinitionA graphG is the pair(N, E) wherebyN represents the set of verticles ornodes, andE N N the set of edges connecting pairs fromN.
DefinitionA directed graph or digraphG is the pair(N, A), wherebyN represents
the set of nodes, andA N N the set of ordered pairs of elementsfromN that represent the set of graph arcs.
DefinitionA valued or weighted graph GV is the triple(N, A, V) wherebyNrepresents the set of nodes or verticles, A N N the set of pairs of
elements fromN that represent the set of graph arcs, and V : N R afunction that attaches values or weights to nodes.
29 of 54
-
8/2/2019 519502.prezentacija_webmining
55/107
DefinitionA bipartite graphG = (N, E) is a graph for which set of nodes thereexists an partitionN = {X, Y}, such that every arc has one end in X, aand the other in Y .
30 of 54
-
8/2/2019 519502.prezentacija_webmining
56/107
DefinitionA bipartite graphG = (N, E) is a graph for which set of nodes thereexists an partitionN = {X, Y}, such that every arc has one end in X, aand the other in Y .
DefinitionA multigraphG = (N, E) is a graph in whichE is a multiset, e.g. therecan exist multiple edges between two nodes.
30 of 54
Matrix representation
-
8/2/2019 519502.prezentacija_webmining
57/107
Matrix representation
DefinitionLetG be a graph defined with the set of nodes {n1, n2, ..., nm} and edges{e1, e2, ..., el}. For every i,j (1 i m and1 j m) we define
aij =
1, if there is an edge between nodes ni and nj
0, otherwise
Matrix A = [aij] is then the adjacency matrix of graphG. The matrix isymmetric since if there is an edge between nodes ni and nj then clearly
there is also an edge between nj and ni. Thus A = [aij] = [aji].
31 of 54
Simple graph
-
8/2/2019 519502.prezentacija_webmining
58/107
Simple graph
32 of 54
Digraph
-
8/2/2019 519502.prezentacija_webmining
59/107
Digraph
33 of 54
Digraph
-
8/2/2019 519502.prezentacija_webmining
60/107
Digraph
34 of 54
Weighted graph
-
8/2/2019 519502.prezentacija_webmining
61/107
Weighted graph
35 of 54
Bipartite graph
-
8/2/2019 519502.prezentacija_webmining
62/107
Bipartite graph
36 of 54
Multigraph
-
8/2/2019 519502.prezentacija_webmining
63/107
Multigraph
37 of 54
Matrix algebra and interpretation
-
8/2/2019 519502.prezentacija_webmining
64/107
g p
Matrices can be transposed, multiplied ...
38 of 54
Matrix algebra and interpretation
-
8/2/2019 519502.prezentacija_webmining
65/107
g p
Matrices can be transposed, multiplied ...
Examples:P - process flow matrix (p p, p - number of processes)
38 of 54
Matrix algebra and interpretation
-
8/2/2019 519502.prezentacija_webmining
66/107
g p
Matrices can be transposed, multiplied ...
Examples:P - process flow matrix (p p, p - number of processes)W - co-worker matrix (w w, w - number of workers)
38 of 54
Matrix algebra and interpretation
-
8/2/2019 519502.prezentacija_webmining
67/107
g p
Matrices can be transposed, multiplied ...
Examples:P - process flow matrix (p p, p - number of processes)W - co-worker matrix (w w, w - number of workers)O - process responsibility matrix (p z)
38 of 54
Matrix algebra and interpretation
-
8/2/2019 519502.prezentacija_webmining
68/107
Matrices can be transposed, multiplied ...
Examples:P - process flow matrix (p p, p - number of processes)W - co-worker matrix (w w, w - number of workers)O - process responsibility matrix (p z)WW - co-workers of co-workers
38 of 54
Matrix algebra and interpretation
-
8/2/2019 519502.prezentacija_webmining
69/107
Matrices can be transposed, multiplied ...
Examples:P - process flow matrix (p p, p - number of processes)W - co-worker matrix (w w, w - number of workers)O - process responsibility matrix (p z)WW - co-workers of co-workers
PT - inversed process flow
38 of 54
Matrix algebra and interpretation
-
8/2/2019 519502.prezentacija_webmining
70/107
Matrices can be transposed, multiplied ...
Examples:P - process flow matrix (p p, p - number of processes)W - co-worker matrix (w w, w - number of workers)O - process responsibility matrix (p z)WW - co-workers of co-workers
PT - inversed process flow
OTO - matrix of workers who are responsible for the same
processes
38 of 54
Matrix algebra and interpretation
-
8/2/2019 519502.prezentacija_webmining
71/107
Matrices can be transposed, multiplied ...
Examples:P - process flow matrix (p p, p - number of processes)W - co-worker matrix (w w, w - number of workers)O - process responsibility matrix (p z)WW - co-workers of co-workers
PT - inversed process flow
OTO - matrix of workers who are responsible for the same
processes
PO - matrix of workers who are responsible for the forthcoming
processes
38 of 54
Matrix algebra and interpretation
-
8/2/2019 519502.prezentacija_webmining
72/107
Matrices can be transposed, multiplied ...
Examples:P - process flow matrix (p p, p - number of processes)W - co-worker matrix (w w, w - number of workers)O - process responsibility matrix (p z)WW - co-workers of co-workers
PT - inversed process flow
OTO - matrix of workers who are responsible for the same
processes
PO - matrix of workers who are responsible for the forthcoming
processesPTO - matrix of workers who are responsible for the previous
processes
38 of 54
Matrix algebra and interpretation
-
8/2/2019 519502.prezentacija_webmining
73/107
Matrices can be transposed, multiplied ...
Examples:P - process flow matrix (p p, p - number of processes)W - co-worker matrix (w w, w - number of workers)O - process responsibility matrix (p z)WW - co-workers of co-workers
PT - inversed process flow
OTO - matrix of workers who are responsible for the same
processes
PO - matrix of workers who are responsible for the forthcoming
processesPTO - matrix of workers who are responsible for the previous
processes
...
38 of 54
ACI model (Mika 2007)
-
8/2/2019 519502.prezentacija_webmining
74/107
A - actors
39 of 54
ACI model (Mika 2007)
-
8/2/2019 519502.prezentacija_webmining
75/107
A - actors C - concepts
39 of 54
ACI model (Mika 2007)
-
8/2/2019 519502.prezentacija_webmining
76/107
A - actors C - concepts
I - instances
39 of 54
ACI model (Mika 2007)
-
8/2/2019 519502.prezentacija_webmining
77/107
A - actors C - concepts
I - instances
T A C I - folksonomy
39 of 54
ACI model (Mika 2007)
-
8/2/2019 519502.prezentacija_webmining
78/107
A - actors C - concepts
I - instances
T A C I - folksonomyA tripartite hypergraph which can be reduced into three bipartite
graphs:
39 of 54
ACI model (Mika 2007)
-
8/2/2019 519502.prezentacija_webmining
79/107
A - actors C - concepts
I - instances
T A C I - folksonomyA tripartite hypergraph which can be reduced into three bipartite
graphs:
AC - network of actors and concepts
39 of 54
ACI model (Mika 2007)
-
8/2/2019 519502.prezentacija_webmining
80/107
A - actors C - concepts
I - instances
T A C I - folksonomyA tripartite hypergraph which can be reduced into three bipartite
graphs:
AC - network of actors and concepts
AI - network of actors and instances
39 of 54
ACI model (Mika 2007)
-
8/2/2019 519502.prezentacija_webmining
81/107
A - actors C - concepts
I - instances
T A C I - folksonomyA tripartite hypergraph which can be reduced into three bipartite
graphs:
AC - network of actors and concepts
AI - network of actors and instances CI - network of concepts and instances
39 of 54
Delicious - analysis
-
8/2/2019 519502.prezentacija_webmining
82/107
40 of 54
Problems with matrix representation
-
8/2/2019 519502.prezentacija_webmining
83/107
41 of 54
Problems with matrix representation
-
8/2/2019 519502.prezentacija_webmining
84/107
Only the 1-s contain information
41 of 54
Sparse matrix
-
8/2/2019 519502.prezentacija_webmining
85/107
42 of 54
Problems with sparse matrix
-
8/2/2019 519502.prezentacija_webmining
86/107
If A is connected to B, then B is connected to A (undirected
graphs)
43 of 54
Problems with sparse matrix
-
8/2/2019 519502.prezentacija_webmining
87/107
If A is connected to B, then B is connected to A (undirected
graphs)
How to find friends of friends?
43 of 54
Problems with sparse matrix
-
8/2/2019 519502.prezentacija_webmining
88/107
If A is connected to B, then B is connected to A (undirected
graphs)
How to find friends of friends? Is there a path from node A to node B? Which path is that?
43 of 54
Problems with sparse matrix
-
8/2/2019 519502.prezentacija_webmining
89/107
If A is connected to B, then B is connected to A (undirected
graphs)
How to find friends of friends? Is there a path from node A to node B? Which path is that?
How to compute the previously outlined graphs?
43 of 54
Problems with sparse matrix
-
8/2/2019 519502.prezentacija_webmining
90/107
If A is connected to B, then B is connected to A (undirected
graphs)
How to find friends of friends? Is there a path from node A to node B? Which path is that?
How to compute the previously outlined graphs?
43 of 54
Logic programming as a solution
-
8/2/2019 519502.prezentacija_webmining
91/107
Deductive languages like Prolog, FLORA-2
44 of 54
Logic programming as a solution
-
8/2/2019 519502.prezentacija_webmining
92/107
Deductive languages like Prolog, FLORA-2 Advantages:
Declarative problem solving (on a higher level) Very appropriate for expressing mathematical structures
44 of 54
Logic programming as a solution
-
8/2/2019 519502.prezentacija_webmining
93/107
Deductive languages like Prolog, FLORA-2 Advantages:
Declarative problem solving (on a higher level) Very appropriate for expressing mathematical structures
Disadvantages: Combinatoric explosion (use heuristics and predicate tabling)
44 of 54
Examples
-
8/2/2019 519502.prezentacija_webmining
94/107
If A is connected to B, then B is connected to A
link( A, B ) :- link( B, A ).
45 of 54
Examples
-
8/2/2019 519502.prezentacija_webmining
95/107
Friend of a friend
ff( A, B ) :- link( A, C ), link( C, B ).
46 of 54
Examples
-
8/2/2019 519502.prezentacija_webmining
96/107
Is there a path from node A to node B?
path( A, B ) :- link( A, B ).path( A, B ) :- link( A, C ), path( C, B ).
47 of 54
Examples
-
8/2/2019 519502.prezentacija_webmining
97/107
Which path is that?
whichpath( X, Y, [ X, Y ] ) :- link( X, Y ).
whichpath( X, Y, [ X, Z | T ] ) :-
link( X, Z ),
whichpath( Z, Y, [ Z | T ] ).
48 of 54
Outline
-
8/2/2019 519502.prezentacija_webmining
98/107
Introduction
Web 2.0, Semantic Web, Web 3.0
Network science
Data collection
Data analysis
Case study - CROSBI
49 of 54
Project outline
-
8/2/2019 519502.prezentacija_webmining
99/107
Croatian Scientific Bibliography (http://bib.irb.hr)
50 of 54
Project outline
http://bib.irb.hr/http://bib.irb.hr/ -
8/2/2019 519502.prezentacija_webmining
100/107
Croatian Scientific Bibliography (http://bib.irb.hr) Scientists enter data about their research activities by them
selves which makes it a social application
50 of 54
Project outline
http://bib.irb.hr/http://bib.irb.hr/ -
8/2/2019 519502.prezentacija_webmining
101/107
Croatian Scientific Bibliography (http://bib.irb.hr) Scientists enter data about their research activities by them
selves which makes it a social application
Research hypotheses:
50 of 54
Project outline
http://bib.irb.hr/http://bib.irb.hr/ -
8/2/2019 519502.prezentacija_webmining
102/107
Croatian Scientific Bibliography (http://bib.irb.hr) Scientists enter data about their research activities by them
selves which makes it a social application
Research hypotheses:
1. analyze two social systems (the Croatian scientific community and
the Croatian public) and see how the most important researchconcepts (as viewed by the scientific community) are interpretedby the public (does the public understand most important conceptsfrom Croatian science in the last decade?).
50 of 54
Project outline
http://bib.irb.hr/http://bib.irb.hr/ -
8/2/2019 519502.prezentacija_webmining
103/107
Croatian Scientific Bibliography (http://bib.irb.hr) Scientists enter data about their research activities by them
selves which makes it a social application
Research hypotheses:
1. analyze two social systems (the Croatian scientific community and
the Croatian public) and see how the most important researchconcepts (as viewed by the scientific community) are interpretedby the public (does the public understand most important conceptsfrom Croatian science in the last decade?).
2. analyze how various scientific fields are conceptually and socially
interrelated (do scientists do interdisciplinary research together ifthey have conceptually connected fields?)
50 of 54
Methodology
http://bib.irb.hr/http://bib.irb.hr/ -
8/2/2019 519502.prezentacija_webmining
104/107
Social systems are represented by their trails: CROSBI as the
Croatian scientific community, Croatian Wikipedia as the
Croatian public
One system can interpret concepts from another if the concept
exists within its conceptual network (e.g. keyword or wiki page)
51 of 54
Data collection
-
8/2/2019 519502.prezentacija_webmining
105/107
CROSBI data has been collected using Scrapy in November
2010 (a total of 285,234 entries has been collected)
Wikipedia data has been collected using the Wikipedia API +Scrapy for parsing
52 of 54
Results
-
8/2/2019 519502.prezentacija_webmining
106/107
53 of 54
Bibliography
-
8/2/2019 519502.prezentacija_webmining
107/107
1. Adamic, L.: Why networks are interesting to study, University of Michigan, School of Information,https://open.umich.edu/education/si/si508/fall2008
2. Barratm M., Barthlemy, M., Vespignani, A.: Dynamical Processes on Complex Networks, Cambridge UniversityPress, 2008.
3. Carley, K.M.: Dynamic Network Analysis, Summary of the NRC workshop on social network modeling andanalysis, Committee on Human Factors, National Research Council, 133145, Eds: Breiger, R., Carley, K.M., andPattison, P., 2003.
4. Divjak, B., Lovrencic, A.: Diskretna matematika s teorijom grafova, TIVA & FOI, 2005
5. Krackhardt, David, and Carley, Kathleen M.: A PCANS Model of Structure in Organization, Proceedings of the1998 International Symposium on Command and Control Research and Technology, Evidence Based Research,Vienna, VA, June 1998
6. Mika, Peter: Ontologies are us: A unified model of social networks and semantics, Web Semantics: Science,Services and Agents on the World Wide Web 5(1), volume 5, 515, March 2007
7. Newman, M., Barabsi, A.-L., Watts, D. j.: The Structure and Dynamics of Networks, Princeton University Press,2006.
8. Various web sites (images, graphs)
54 of 54
https://open.umich.edu/education/si/si508/fall2008https://open.umich.edu/education/si/si508/fall2008