Clustering the Unknown - The Youtube Case - Research Portal

Computing and Communications

Associated organisational units

Electronic data

icnc2019_akm_accepted_version
Accepted author manuscript, 485 KB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Text available via DOI:

https://doi.org/10.1109/ICCNC.2019.8685364
Final published version


Clustering the Unknown - The Youtube Case

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Amit Dvir
Angelos Marnerides
Ran Dubin
Nehor Golan


Publication date	18/02/2019
Host publication	IEEE International Conference on Computing, Networking and Communications (IEEE ICNC 2019)
Publisher	IEEE
Pages	402-407
Number of pages	6
ISBN (electronic)	9781538692233, 9781538692226
ISBN (print)	9781538692240
Original language	English
Event	IEEE ICNC : Computing, Networking and Communications (ICNC), 2019, International Conference on - Honolulu, United States. Duration: 18/02/2019 → 21/02/2019

Conference

Conference	IEEE ICNC : Computing, Networking and Communications (ICNC), 2019, International Conference on
Country/Territory	United States
City	Honolulu
Period	18/02/19 → 21/02/19

Abstract

Recent stringent end-user security and privacy requirements have caused a dramatic rise in encrypted video streams, among which YouTube encrypted traffic is one of the most prevalent. Despite the encryption, metadata derived from such traffic flows can be used to identify the title of a video, thus enabling the classification of video streams into a single video title from a given video title set. Nonetheless, scenarios where no video title set is available and a supervised approach is not feasible are both frequent and challenging. In this paper we go beyond previous studies and demonstrate the feasibility of clustering unknown video streams into subgroups even though no information is available about the title. We address this problem by exploring Natural Language Processing (NLP) formulations and Word2vec techniques to compose a novel statistical feature in order to cluster unknown video streams. Through experimental results over real datasets, we demonstrate that our methodology is capable of clustering 72 of 100 video titles from a dataset of 10,000 video streams. Thus, we argue that the proposed methodology could contribute meaningfully to the emerging and demanding domain of encrypted Internet traffic classification.
