Text-driven video acceleration - Research Portal

Computing and Communications

Associated organisational unit

Artificial Intelligence

Electronic data

TextDrivenVideoAcceleration_TPAMI2022
Rights statement: ©2022 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Accepted author manuscript, 13 MB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Text available via DOI:

https://doi.org/10.1109/TPAMI.2022.3157198
Final published version
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Keywords

Cross-modal data, Fast-forward, Instructional video, Reinforcement learning, Semantics, Social networking (online), Task analysis, Training, Tutorials, Untrimmed videos, Visualization


Text-driven video acceleration: A weakly-supervised reinforcement learning method

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Text-driven video acceleration: A weakly-supervised reinforcement learning method. / Ramos, W.L.D.S.; Silva, M.M.D.; Araujo, E. et al.
In: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 45, No. 2, 28.02.2023, p. 2492-2504.


Harvard

Ramos, WLDS, Silva, MMD, Araujo, E, Moura, V, Martins de Oliveira, KC, Soriano Marcolino, L & Nascimento, E 2023, 'Text-driven video acceleration: A weakly-supervised reinforcement learning method', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 2, pp. 2492-2504. https://doi.org/10.1109/TPAMI.2022.3157198

APA

Ramos, W. L. D. S., Silva, M. M. D., Araujo, E., Moura, V., Martins de Oliveira, K. C., Soriano Marcolino, L., & Nascimento, E. (2023). Text-driven video acceleration: A weakly-supervised reinforcement learning method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(2), 2492-2504. https://doi.org/10.1109/TPAMI.2022.3157198

Vancouver

Ramos WLDS, Silva MMD, Araujo E, Moura V, Martins de Oliveira KC, Soriano Marcolino L et al. Text-driven video acceleration: A weakly-supervised reinforcement learning method. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023 Feb 28;45(2):2492-2504. Epub 2022 Mar 7. doi: 10.1109/TPAMI.2022.3157198

Author

Ramos, W.L.D.S. ; Silva, M.M.D. ; Araujo, E. et al. / Text-driven video acceleration : A weakly-supervised reinforcement learning method. In: IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023 ; Vol. 45, No. 2. pp. 2492-2504.

Bibtex

@article{dc0bc4d8b2b648d4a84ee1474441fef9,

title = "Text-driven video acceleration: A weakly-supervised reinforcement learning method",

abstract = "The growth of videos in our digital age and the users' limited time raise the demand for processing untrimmed videos to produce shorter versions conveying the same information. Despite the remarkable progress that summarization methods have made, most of them can only select a few frames or skims, creating visual gaps and breaking the video context. This paper presents a novel weakly-supervised methodology based on a reinforcement learning formulation to accelerate instructional videos using text. A novel joint reward function guides our agent to select which frames to remove and reduce the input video to a target length without creating gaps in the final video. We also propose the Extended Visually-guided Document Attention Network (VDAN+), which can generate a highly discriminative embedding space to represent both textual and visual data. Our experiments show that our method achieves the best performance in Precision, Recall, and F1 Score against the baselines while effectively controlling the video's output length.",

keywords = "Cross-modal data, Fast-forward, Instructional video, Reinforcement learning, Semantics, Social networking (online), Task analysis, Training, Tutorials, Untrimmed videos, Visualization",

author = "W.L.D.S. Ramos and M.M.D. Silva and E. Araujo and V. Moura and {Martins de Oliveira}, K.C. and {Soriano Marcolino}, L. and E. Nascimento",

note = "{\textcopyright}2022 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. ",

year = "2023",

month = feb,

day = "28",

doi = "10.1109/TPAMI.2022.3157198",

language = "English",

volume = "45",

pages = "2492--2504",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

number = "2",

}

RIS

TY - JOUR

T1 - Text-driven video acceleration

T2 - A weakly-supervised reinforcement learning method

AU - Ramos, W.L.D.S.

AU - Silva, M.M.D.

AU - Araujo, E.

AU - Moura, V.

AU - Martins de Oliveira, K.C.

AU - Soriano Marcolino, L.

AU - Nascimento, E.

N1 - ©2022 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

PY - 2023/2/28

Y1 - 2023/2/28

N2 - The growth of videos in our digital age and the users' limited time raise the demand for processing untrimmed videos to produce shorter versions conveying the same information. Despite the remarkable progress that summarization methods have made, most of them can only select a few frames or skims, creating visual gaps and breaking the video context. This paper presents a novel weakly-supervised methodology based on a reinforcement learning formulation to accelerate instructional videos using text. A novel joint reward function guides our agent to select which frames to remove and reduce the input video to a target length without creating gaps in the final video. We also propose the Extended Visually-guided Document Attention Network (VDAN+), which can generate a highly discriminative embedding space to represent both textual and visual data. Our experiments show that our method achieves the best performance in Precision, Recall, and F1 Score against the baselines while effectively controlling the video's output length.

AB - The growth of videos in our digital age and the users' limited time raise the demand for processing untrimmed videos to produce shorter versions conveying the same information. Despite the remarkable progress that summarization methods have made, most of them can only select a few frames or skims, creating visual gaps and breaking the video context. This paper presents a novel weakly-supervised methodology based on a reinforcement learning formulation to accelerate instructional videos using text. A novel joint reward function guides our agent to select which frames to remove and reduce the input video to a target length without creating gaps in the final video. We also propose the Extended Visually-guided Document Attention Network (VDAN+), which can generate a highly discriminative embedding space to represent both textual and visual data. Our experiments show that our method achieves the best performance in Precision, Recall, and F1 Score against the baselines while effectively controlling the video's output length.

KW - Cross-modal data

KW - Fast-forward

KW - Instructional video

KW - Reinforcement learning

KW - Semantics

KW - Social networking (online)

KW - Task analysis

KW - Training

KW - Tutorials

KW - Untrimmed videos

KW - Visualization

U2 - 10.1109/TPAMI.2022.3157198

DO - 10.1109/TPAMI.2022.3157198

M3 - Journal article

VL - 45

SP - 2492

EP - 2504

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

SN - 0162-8828

IS - 2

ER -
