Electronic data

  • icnc2019_akm_accepted_version

    Accepted author manuscript, 485 KB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Links

Text available via DOI: https://doi.org/10.1109/ICCNC.2019.8685364

View graph of relations

Clustering the Unknown - The Youtube Case

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Clustering the Unknown - The Youtube Case. / Dvir, Amit; Marnerides, Angelos; Dubin, Ran et al.

IEEE International Conference on Computing, Networking and Communications (IEEE ICNC 2019). IEEE, 2019. p. 402-407.

Harvard

Dvir, A, Marnerides, A, Dubin, R & Golan, N 2019, Clustering the Unknown - The Youtube Case. in IEEE International Conference on Computing, Networking and Communications (IEEE ICNC 2019). IEEE, pp. 402-407, IEEE ICNC : Computing, Networking and Communications (ICNC), 2019, International Conference on, Honolulu, Hawaii, United States, 18/02/19. https://doi.org/10.1109/ICCNC.2019.8685364

APA

Dvir, A., Marnerides, A., Dubin, R., & Golan, N. (2019). Clustering the Unknown - The Youtube Case. In IEEE International Conference on Computing, Networking and Communications (IEEE ICNC 2019) (pp. 402-407). IEEE. https://doi.org/10.1109/ICCNC.2019.8685364

Vancouver

Dvir A, Marnerides A, Dubin R, Golan N. Clustering the Unknown - The Youtube Case. In IEEE International Conference on Computing, Networking and Communications (IEEE ICNC 2019). IEEE. 2019. p. 402-407. doi: 10.1109/ICCNC.2019.8685364

Author

Dvir, Amit ; Marnerides, Angelos ; Dubin, Ran et al. / Clustering the Unknown - The Youtube Case. IEEE International Conference on Computing, Networking and Communications (IEEE ICNC 2019). IEEE, 2019. pp. 402-407

Bibtex

@inproceedings{cec98119884145a7b0e5af5cfc1cf75d,
title = "Clustering the Unknown - The {Youtube} Case",
abstract = "Recent stringent end-user security and privacy requirements caused the dramatic rise of encrypted video streams in which YouTube encrypted traffic is one of the most prevalent. Regardless of their encrypted nature, metadata derived from such traffic flows can be utilized to identify the title of a video, thus enabling the classification of video streams into a single video title using a given video title set. Nonetheless, scenarios where no video title set is present and a supervised approach is not feasible, are both frequent and challenging. In this paper we go beyond previous studies and demonstrate the feasibility of clustering unknown video streams into subgroups although no information is available about the title name. We address this problem by exploring Natural Language Processing (NLP) formulations and Word2vec techniques to compose a novel statistical feature in order to further cluster unknown video streams. Through our experimental results over real datasets we demonstrate that our methodology is capable to cluster 72 video titles out of 100 video titles from a dataset of 10,000 video streams. Thus, we argue that the proposed methodology could sufficiently contribute to the newly rising and demanding domain of encrypted Internet traffic classification.",
author = "Amit Dvir and Angelos Marnerides and Ran Dubin and Nehor Golan",
year = "2019",
month = feb,
day = "18",
doi = "10.1109/ICCNC.2019.8685364",
language = "English",
isbn = "9781538692240",
pages = "402--407",
booktitle = "IEEE International Conference on Computing, Networking and Communications (IEEE ICNC 2019)",
publisher = "IEEE",
note = "IEEE ICNC : Computing, Networking and Communications (ICNC), 2019, International Conference on ; Conference date: 18-02-2019 Through 21-02-2019",

}

RIS

TY - GEN

T1 - Clustering the Unknown - The Youtube Case

AU - Dvir, Amit

AU - Marnerides, Angelos

AU - Dubin, Ran

AU - Golan, Nehor

PY - 2019/2/18

Y1 - 2019/2/18

N2 - Recent stringent end-user security and privacy requirements caused the dramatic rise of encrypted video streams in which YouTube encrypted traffic is one of the most prevalent. Regardless of their encrypted nature, metadata derived from such traffic flows can be utilized to identify the title of a video, thus enabling the classification of video streams into a single video title using a given video title set. Nonetheless, scenarios where no video title set is present and a supervised approach is not feasible, are both frequent and challenging. In this paper we go beyond previous studies and demonstrate the feasibility of clustering unknown video streams into subgroups although no information is available about the title name. We address this problem by exploring Natural Language Processing (NLP) formulations and Word2vec techniques to compose a novel statistical feature in order to further cluster unknown video streams. Through our experimental results over real datasets we demonstrate that our methodology is capable to cluster 72 video titles out of 100 video titles from a dataset of 10,000 video streams. Thus, we argue that the proposed methodology could sufficiently contribute to the newly rising and demanding domain of encrypted Internet traffic classification.

AB - Recent stringent end-user security and privacy requirements caused the dramatic rise of encrypted video streams in which YouTube encrypted traffic is one of the most prevalent. Regardless of their encrypted nature, metadata derived from such traffic flows can be utilized to identify the title of a video, thus enabling the classification of video streams into a single video title using a given video title set. Nonetheless, scenarios where no video title set is present and a supervised approach is not feasible, are both frequent and challenging. In this paper we go beyond previous studies and demonstrate the feasibility of clustering unknown video streams into subgroups although no information is available about the title name. We address this problem by exploring Natural Language Processing (NLP) formulations and Word2vec techniques to compose a novel statistical feature in order to further cluster unknown video streams. Through our experimental results over real datasets we demonstrate that our methodology is capable to cluster 72 video titles out of 100 video titles from a dataset of 10,000 video streams. Thus, we argue that the proposed methodology could sufficiently contribute to the newly rising and demanding domain of encrypted Internet traffic classification.

U2 - 10.1109/ICCNC.2019.8685364

DO - 10.1109/ICCNC.2019.8685364

M3 - Conference contribution/Paper

SN - 9781538692240

SP - 402

EP - 407

BT - IEEE International Conference on Computing, Networking and Communications (IEEE ICNC 2019)

PB - IEEE

T2 - IEEE ICNC : Computing, Networking and Communications (ICNC), 2019, International Conference on

Y2 - 18 February 2019 through 21 February 2019

ER -
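The abstract describes clustering unknown encrypted YouTube streams without a labelled title set, by borrowing NLP ideas (Word2vec) to build a statistical feature. This record does not give the paper's actual feature or clustering algorithm, so the Python below is only a minimal sketch of the general idea under stated assumptions: each stream is assumed to be a list of burst sizes in bytes, bursts are quantized into "word" tokens, and Word2vec embeddings are replaced by plain unigram/bigram counts compared with cosine similarity. The names `tokenize`, `vectorize`, `cluster`, `bin_size` and `threshold` are hypothetical and do not come from the paper.

```python
"""Hypothetical sketch: cluster encrypted video streams by treating
quantized burst sizes as words. NOT the authors' method: Word2vec is
replaced here by simple token-count vectors and cosine similarity."""
from collections import Counter
import math


def tokenize(bursts, bin_size=100_000):
    # Assumption: bursts are sizes in bytes; each is mapped to a bin "word".
    return [f"b{size // bin_size}" for size in bursts]


def ngrams(tokens, n=2):
    # Join consecutive tokens so that the order of bursts carries weight.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vectorize(bursts):
    # Bag of unigrams plus bigrams, standing in for a learned embedding.
    tokens = tokenize(bursts)
    return Counter(tokens + ngrams(tokens))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(streams, threshold=0.6):
    # Greedy single pass: each stream joins the first cluster whose first
    # member is at least `threshold` similar; otherwise it opens a new one.
    vectors = [vectorize(s) for s in streams]
    clusters = []  # each cluster is a list of stream indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy input (invented for illustration, not data from the paper).
example = [
    [1_200_000, 800_000, 950_000, 1_100_000],  # title A, capture 1
    [1_250_000, 820_000, 930_000, 1_150_000],  # title A, capture 2
    [300_000, 2_400_000, 150_000, 2_600_000],  # title B
]
print(cluster(example))  # [[0, 1], [2]]
```

A threshold rule is used here only because it does not require the number of clusters in advance, which fits the "unknown titles" setting; the embedding and clustering algorithm used in the paper itself may differ.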