HOPS: Probabilistic Subtree Mining for Small and Large Graphs.

LANCASTER UNIVERSITY LEIPZIG

Text available via DOI:

https://doi.org/10.1145/3394486.3403180
Final published version

Research output: Contribution to conference - Without ISBN/ISSN › Conference paper › peer-review

Published

Standard

HOPS: Probabilistic Subtree Mining for Small and Large Graphs. / Welke, Pascal; Seiffarth, Florian; Kamp, Michael et al.
2020. 1275-1284 Paper presented at KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, LA, California, United States.

Research output: Contribution to conference - Without ISBN/ISSN › Conference paper › peer-review

Harvard

Welke, P, Seiffarth, F, Kamp, M & Wrobel, S 2020, 'HOPS: Probabilistic Subtree Mining for Small and Large Graphs.', Paper presented at KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, LA, United States, 6/07/20 - 10/07/20 pp. 1275-1284. https://doi.org/10.1145/3394486.3403180

APA

Welke, P., Seiffarth, F., Kamp, M., & Wrobel, S. (2020). HOPS: Probabilistic Subtree Mining for Small and Large Graphs.. 1275-1284. Paper presented at KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, LA, California, United States. https://doi.org/10.1145/3394486.3403180

Vancouver

Welke P, Seiffarth F, Kamp M, Wrobel S. HOPS: Probabilistic Subtree Mining for Small and Large Graphs.. 2020. Paper presented at KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, LA, California, United States. doi: 10.1145/3394486.3403180

Author

Welke, Pascal ; Seiffarth, Florian ; Kamp, Michael et al. / HOPS: Probabilistic Subtree Mining for Small and Large Graphs. Paper presented at KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, LA, California, United States.

Bibtex

@conference{1348e9c4528f42e0b64b5db37a13dfd6,

title = "HOPS: Probabilistic Subtree Mining for Small and Large Graphs.",

abstract = "Frequent subgraph mining, i.e., the identification of relevant patterns in graph databases, is a well-known data mining problem with high practical relevance, since next to summarizing the data, the resulting patterns can also be used to define powerful domain-specific similarity functions for prediction. In recent years, significant progress has been made towards subgraph mining algorithms that scale to complex graphs by focusing on tree patterns and probabilistically allowing a small amount of incompleteness in the result. Nonetheless, the complexity of the pattern matching component used for deciding subtree isomorphism on arbitrary graphs has significantly limited the scalability of existing approaches. In this paper, we adapt sampling techniques from mathematical combinatorics to the problem of probabilistic subtree mining in arbitrary databases of many small to medium-size graphs or a single large graph. By restricting on tree patterns, we provide an algorithm that approximately counts or decides subtree isomorphism for arbitrary transaction graphs in sub-linear time with one-sided error. Our empirical evaluation on a range of benchmark graph datasets shows that the novel algorithm substantially outperforms state-of-the-art approaches both in the task of approximate counting of embeddings in single large graphs and in probabilistic frequent subtree mining in large databases of small to medium sized graphs.",

author = "Pascal Welke and Florian Seiffarth and Michael Kamp and Stefan Wrobel",

note = "DBLP's bibliographic metadata records provided through http://dblp.org/search/publ/api are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.; KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining ; Conference date: 06-07-2020 Through 10-07-2020",

year = "2020",

month = aug,

day = "20",

doi = "10.1145/3394486.3403180",

language = "Undefined/Unknown",

pages = "1275--1284",

}

RIS

TY - CONF

T1 - HOPS: Probabilistic Subtree Mining for Small and Large Graphs.

AU - Welke, Pascal

AU - Seiffarth, Florian

AU - Kamp, Michael

AU - Wrobel, Stefan

N1 - DBLP's bibliographic metadata records provided through http://dblp.org/search/publ/api are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.

PY - 2020/8/20

Y1 - 2020/8/20

N2 - Frequent subgraph mining, i.e., the identification of relevant patterns in graph databases, is a well-known data mining problem with high practical relevance, since next to summarizing the data, the resulting patterns can also be used to define powerful domain-specific similarity functions for prediction. In recent years, significant progress has been made towards subgraph mining algorithms that scale to complex graphs by focusing on tree patterns and probabilistically allowing a small amount of incompleteness in the result. Nonetheless, the complexity of the pattern matching component used for deciding subtree isomorphism on arbitrary graphs has significantly limited the scalability of existing approaches. In this paper, we adapt sampling techniques from mathematical combinatorics to the problem of probabilistic subtree mining in arbitrary databases of many small to medium-size graphs or a single large graph. By restricting on tree patterns, we provide an algorithm that approximately counts or decides subtree isomorphism for arbitrary transaction graphs in sub-linear time with one-sided error. Our empirical evaluation on a range of benchmark graph datasets shows that the novel algorithm substantially outperforms state-of-the-art approaches both in the task of approximate counting of embeddings in single large graphs and in probabilistic frequent subtree mining in large databases of small to medium sized graphs.

AB - Frequent subgraph mining, i.e., the identification of relevant patterns in graph databases, is a well-known data mining problem with high practical relevance, since next to summarizing the data, the resulting patterns can also be used to define powerful domain-specific similarity functions for prediction. In recent years, significant progress has been made towards subgraph mining algorithms that scale to complex graphs by focusing on tree patterns and probabilistically allowing a small amount of incompleteness in the result. Nonetheless, the complexity of the pattern matching component used for deciding subtree isomorphism on arbitrary graphs has significantly limited the scalability of existing approaches. In this paper, we adapt sampling techniques from mathematical combinatorics to the problem of probabilistic subtree mining in arbitrary databases of many small to medium-size graphs or a single large graph. By restricting on tree patterns, we provide an algorithm that approximately counts or decides subtree isomorphism for arbitrary transaction graphs in sub-linear time with one-sided error. Our empirical evaluation on a range of benchmark graph datasets shows that the novel algorithm substantially outperforms state-of-the-art approaches both in the task of approximate counting of embeddings in single large graphs and in probabilistic frequent subtree mining in large databases of small to medium sized graphs.

U2 - 10.1145/3394486.3403180

DO - 10.1145/3394486.3403180

M3 - Conference paper

SP - 1275

EP - 1284

T2 - KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Y2 - 6 July 2020 through 10 July 2020

ER -

Research

Links

Text available via DOI: