Disambiguating identity web references using Web 2.0 data and semantics

Computing and Communications

Text available via DOI: https://doi.org/10.1016/j.websem.2010.04.005 (final published version)

Keywords

Semantic Web, Web 2.0, Identity disambiguation, Machine learning, Graphs

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Disambiguating identity web references using Web 2.0 data and semantics. / Rowe, Matthew; Ciravegna, Fabio.
In: Journal of Web Semantics, Vol. 8, No. 2, 07.2010, p. 125-142.

Harvard

Rowe, M & Ciravegna, F 2010, 'Disambiguating identity web references using Web 2.0 data and semantics', Journal of Web Semantics, vol. 8, no. 2, pp. 125-142. https://doi.org/10.1016/j.websem.2010.04.005

APA

Rowe, M., & Ciravegna, F. (2010). Disambiguating identity web references using Web 2.0 data and semantics. Journal of Web Semantics, 8(2), 125-142. https://doi.org/10.1016/j.websem.2010.04.005

Vancouver

Rowe M, Ciravegna F. Disambiguating identity web references using Web 2.0 data and semantics. Journal of Web Semantics. 2010 Jul;8(2):125-142. doi: 10.1016/j.websem.2010.04.005

Author

Rowe, Matthew ; Ciravegna, Fabio. / Disambiguating identity web references using Web 2.0 data and semantics. In: Journal of Web Semantics. 2010 ; Vol. 8, No. 2. pp. 125-142.

Bibtex

@article{2f280c9b8b5646f4b1f24f64b7005094,

title = "Disambiguating identity web references using Web 2.0 data and semantics",

abstract = "As web users disseminate more of their personal information on the web, the possibility of these users becoming victims of lateral surveillance and identity theft increases. Therefore, web resources containing this personal information, which we refer to as identity web references, must be found and disambiguated to produce a unary set of web resources which refer to a given person. Such is the scale of the web that forcing web users to monitor their identity web references is not feasible; therefore, automated approaches are required. However, automated approaches require background knowledge about the person whose identity web references are to be disambiguated. Within this paper we present a detailed approach to monitor the web presence of a given individual by obtaining background knowledge from Web 2.0 platforms to support automated disambiguation processes. We present a methodology for generating this background knowledge by exporting data from multiple Web 2.0 platforms as RDF data models and combining these models together for use as seed data. We present two disambiguation techniques: the first using a semi-supervised machine learning technique known as Self-training and the second using a graph-based technique known as Random Walks. We explain how the semantics of data supports the intrinsic functionalities of these techniques. We compare the performance of our presented disambiguation techniques against several baseline measures including human processing of the same data. We achieve an average precision level of 0.935 for Self-training and an average f-measure level of 0.705 for Random Walks, in both cases outperforming several baseline measures.",

keywords = "Semantic Web, Web 2.0, Identity disambiguation, Machine learning, Graphs",

author = "Matthew Rowe and Fabio Ciravegna",

year = "2010",

month = jul,

doi = "10.1016/j.websem.2010.04.005",

language = "English",

volume = "8",

pages = "125--142",

journal = "Journal of Web Semantics",

issn = "1570-8268",

publisher = "Elsevier",

number = "2",

}

RIS

TY - JOUR

T1 - Disambiguating identity web references using Web 2.0 data and semantics

AU - Rowe, Matthew

AU - Ciravegna, Fabio

PY - 2010/7

Y1 - 2010/7

N2 - As web users disseminate more of their personal information on the web, the possibility of these users becoming victims of lateral surveillance and identity theft increases. Therefore, web resources containing this personal information, which we refer to as identity web references, must be found and disambiguated to produce a unary set of web resources which refer to a given person. Such is the scale of the web that forcing web users to monitor their identity web references is not feasible; therefore, automated approaches are required. However, automated approaches require background knowledge about the person whose identity web references are to be disambiguated. Within this paper we present a detailed approach to monitor the web presence of a given individual by obtaining background knowledge from Web 2.0 platforms to support automated disambiguation processes. We present a methodology for generating this background knowledge by exporting data from multiple Web 2.0 platforms as RDF data models and combining these models together for use as seed data. We present two disambiguation techniques: the first using a semi-supervised machine learning technique known as Self-training and the second using a graph-based technique known as Random Walks. We explain how the semantics of data supports the intrinsic functionalities of these techniques. We compare the performance of our presented disambiguation techniques against several baseline measures including human processing of the same data. We achieve an average precision level of 0.935 for Self-training and an average f-measure level of 0.705 for Random Walks, in both cases outperforming several baseline measures.

AB - As web users disseminate more of their personal information on the web, the possibility of these users becoming victims of lateral surveillance and identity theft increases. Therefore, web resources containing this personal information, which we refer to as identity web references, must be found and disambiguated to produce a unary set of web resources which refer to a given person. Such is the scale of the web that forcing web users to monitor their identity web references is not feasible; therefore, automated approaches are required. However, automated approaches require background knowledge about the person whose identity web references are to be disambiguated. Within this paper we present a detailed approach to monitor the web presence of a given individual by obtaining background knowledge from Web 2.0 platforms to support automated disambiguation processes. We present a methodology for generating this background knowledge by exporting data from multiple Web 2.0 platforms as RDF data models and combining these models together for use as seed data. We present two disambiguation techniques: the first using a semi-supervised machine learning technique known as Self-training and the second using a graph-based technique known as Random Walks. We explain how the semantics of data supports the intrinsic functionalities of these techniques. We compare the performance of our presented disambiguation techniques against several baseline measures including human processing of the same data. We achieve an average precision level of 0.935 for Self-training and an average f-measure level of 0.705 for Random Walks, in both cases outperforming several baseline measures.

KW - Semantic Web

KW - Web 2.0

KW - Identity disambiguation

KW - Machine learning

KW - Graphs

UR - http://www.scopus.com/inward/record.url?scp=77955228348&partnerID=8YFLogxK

U2 - 10.1016/j.websem.2010.04.005

DO - 10.1016/j.websem.2010.04.005

M3 - Journal article

VL - 8

SP - 125

EP - 142

JO - Journal of Web Semantics

JF - Journal of Web Semantics

SN - 1570-8268

IS - 2

ER -
