Probabilistic and exact frequent subtree mining in graphs beyond forests

LANCASTER UNIVERSITY LEIPZIG

Text available via DOI:

https://doi.org/10.1007/S10994-019-05779-1
Final published version

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Probabilistic and exact frequent subtree mining in graphs beyond forests. / Welke, Pascal; Horváth, Tamás; Wrobel, Stefan.
In: Machine Learning, Vol. 108, No. 7, 15.07.2019, p. 1137-1164.

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Harvard

Welke, P, Horváth, T & Wrobel, S 2019, 'Probabilistic and exact frequent subtree mining in graphs beyond forests', Machine Learning, vol. 108, no. 7, pp. 1137-1164. https://doi.org/10.1007/S10994-019-05779-1

APA

Welke, P., Horváth, T., & Wrobel, S. (2019). Probabilistic and exact frequent subtree mining in graphs beyond forests. Machine Learning, 108(7), 1137-1164. https://doi.org/10.1007/S10994-019-05779-1

Vancouver

Welke P, Horváth T, Wrobel S. Probabilistic and exact frequent subtree mining in graphs beyond forests. Machine Learning. 2019 Jul 15;108(7):1137-1164. Epub 2019 Feb 15. doi: 10.1007/S10994-019-05779-1

Author

Welke, Pascal ; Horváth, Tamás ; Wrobel, Stefan. / Probabilistic and exact frequent subtree mining in graphs beyond forests. In: Machine Learning. 2019 ; Vol. 108, No. 7. pp. 1137-1164.

Bibtex

@article{85f138b6e1974dfc995e111468f8f227,

title = "Probabilistic and exact frequent subtree mining in graphs beyond forests",

abstract = "Motivated by the impressive predictive power of simple patterns, we consider the problem of mining frequent subtrees in arbitrary graphs. Although the restriction of the pattern language to trees does not resolve the computational complexity of frequent subgraph mining, in a recent work we have shown that it gives rise to an algorithm generating probabilistic frequent subtrees, a random subset of all frequent subtrees, from arbitrary graphs with polynomial delay. It is based on replacing each transaction graph in the input database with a forest formed by a random subset of its spanning trees. This simple technique turned out to be quite powerful on molecule classification tasks. It has, however, the drawback that the number of sampled spanning trees must be bounded by a polynomial of the size of the transaction graphs, resulting in less impressive recall even for slightly more complex structures beyond molecular graphs. To overcome this limitation, in this work we propose an algorithm mining probabilistic frequent subtrees also with polynomial delay, but by replacing each graph with a forest formed by an exponentially large implicit subset of its spanning trees. We demonstrate the superiority of our algorithm over the simple one on threshold graphs used e.g. in spectral clustering. In addition, providing sufficient conditions for the completeness and efficiency of our algorithm, we obtain a positive complexity result on exact frequent subtree mining for a novel, practically and theoretically relevant graph class that is orthogonal to all graph classes defined by some constant bound on monotone graph properties.",

author = "Pascal Welke and Tam{\'a}s Horv{\'a}th and Stefan Wrobel",

year = "2019",

month = jul,

day = "15",

doi = "10.1007/S10994-019-05779-1",

language = "English",

volume = "108",

pages = "1137--1164",

journal = "Machine Learning",

issn = "0885-6125",

publisher = "Springer Netherlands",

number = "7",

}

RIS

TY - JOUR

T1 - Probabilistic and exact frequent subtree mining in graphs beyond forests

AU - Welke, Pascal

AU - Horváth, Tamás

AU - Wrobel, Stefan

PY - 2019/7/15

Y1 - 2019/7/15

N2 - Motivated by the impressive predictive power of simple patterns, we consider the problem of mining frequent subtrees in arbitrary graphs. Although the restriction of the pattern language to trees does not resolve the computational complexity of frequent subgraph mining, in a recent work we have shown that it gives rise to an algorithm generating probabilistic frequent subtrees, a random subset of all frequent subtrees, from arbitrary graphs with polynomial delay. It is based on replacing each transaction graph in the input database with a forest formed by a random subset of its spanning trees. This simple technique turned out to be quite powerful on molecule classification tasks. It has, however, the drawback that the number of sampled spanning trees must be bounded by a polynomial of the size of the transaction graphs, resulting in less impressive recall even for slightly more complex structures beyond molecular graphs. To overcome this limitation, in this work we propose an algorithm mining probabilistic frequent subtrees also with polynomial delay, but by replacing each graph with a forest formed by an exponentially large implicit subset of its spanning trees. We demonstrate the superiority of our algorithm over the simple one on threshold graphs used e.g. in spectral clustering. In addition, providing sufficient conditions for the completeness and efficiency of our algorithm, we obtain a positive complexity result on exact frequent subtree mining for a novel, practically and theoretically relevant graph class that is orthogonal to all graph classes defined by some constant bound on monotone graph properties.

AB - Motivated by the impressive predictive power of simple patterns, we consider the problem of mining frequent subtrees in arbitrary graphs. Although the restriction of the pattern language to trees does not resolve the computational complexity of frequent subgraph mining, in a recent work we have shown that it gives rise to an algorithm generating probabilistic frequent subtrees, a random subset of all frequent subtrees, from arbitrary graphs with polynomial delay. It is based on replacing each transaction graph in the input database with a forest formed by a random subset of its spanning trees. This simple technique turned out to be quite powerful on molecule classification tasks. It has, however, the drawback that the number of sampled spanning trees must be bounded by a polynomial of the size of the transaction graphs, resulting in less impressive recall even for slightly more complex structures beyond molecular graphs. To overcome this limitation, in this work we propose an algorithm mining probabilistic frequent subtrees also with polynomial delay, but by replacing each graph with a forest formed by an exponentially large implicit subset of its spanning trees. We demonstrate the superiority of our algorithm over the simple one on threshold graphs used e.g. in spectral clustering. In addition, providing sufficient conditions for the completeness and efficiency of our algorithm, we obtain a positive complexity result on exact frequent subtree mining for a novel, practically and theoretically relevant graph class that is orthogonal to all graph classes defined by some constant bound on monotone graph properties.

U2 - 10.1007/S10994-019-05779-1

DO - 10.1007/S10994-019-05779-1

M3 - Journal article

VL - 108

SP - 1137

EP - 1164

JO - Machine Learning

JF - Machine Learning

SN - 0885-6125

IS - 7

ER -

Research

Links

Text available via DOI: