Temporal-difference Learning with Sampling Baseline for Image Captioning

Computing and Communications

Associated organisational unit

Data Science

Electronic data

2018-4
Rights statement: Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Accepted author manuscript, 1.04 MB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Keywords

Image captioning, Reinforcement learning, LSTM

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Temporal-difference Learning with Sampling Baseline for Image Captioning. / Chen, Hui; Ding, Guiguang; Zhao, Sicheng et al.
32nd AAAI Conference on Artificial Intelligence 2018. Palo Alto: AAAI, 2018. p. 6706-6713.

Harvard

Chen, H, Ding, G, Zhao, S & Han, J 2018, Temporal-difference Learning with Sampling Baseline for Image Captioning. in 32nd AAAI Conference on Artificial Intelligence 2018. AAAI, Palo Alto, pp. 6706-6713. <https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16452>

APA

Chen, H., Ding, G., Zhao, S., & Han, J. (2018). Temporal-difference Learning with Sampling Baseline for Image Captioning. In 32nd AAAI Conference on Artificial Intelligence 2018 (pp. 6706-6713). AAAI. https://aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16452

Vancouver

Chen H, Ding G, Zhao S, Han J. Temporal-difference Learning with Sampling Baseline for Image Captioning. In 32nd AAAI Conference on Artificial Intelligence 2018. Palo Alto: AAAI. 2018. p. 6706-6713

Author

Chen, Hui ; Ding, Guiguang ; Zhao, Sicheng et al. / Temporal-difference Learning with Sampling Baseline for Image Captioning. 32nd AAAI Conference on Artificial Intelligence 2018. Palo Alto : AAAI, 2018. pp. 6706-6713

Bibtex

@inproceedings{0ffca80904ee45e1b2cfabf3afa24441,

title = "Temporal-difference Learning with Sampling Baseline for Image Captioning",

abstract = "The existing methods for image captioning usually train the language model under the cross entropy loss, which results in the exposure bias and inconsistency of evaluation metric. Recent research has shown these two issues can be well addressed by policy gradient method in reinforcement learning domain attributable to its unique capability of directly optimizing the discrete and non-differentiable evaluation metric. In this paper, we utilize reinforcement learning method to train the image captioning model. Specifically, we train our image captioning model to maximize the overall reward of the sentences by adopting the temporal-difference (TD) learning method, which takes the correlation between temporally successive actions into account. In this way, we assign different values to different words in one sampled sentence by a discounted coefficient when back-propagating the gradient with the REINFORCE algorithm, enabling the correlation between actions to be learned. Besides, instead of estimating a {"}baseline{"} to normalize the rewards with another network, we utilize the reward of another Monte-Carlo sample as the {"}baseline{"} to avoid high variance. We show that our proposed method can improve the quality of generated captions and outperforms the state-of-the-art methods on the benchmark dataset MS COCO in terms of seven evaluation metrics.",

keywords = "Image captioning, Reinforcement learning, LSTM",

author = "Hui Chen and Guiguang Ding and Sicheng Zhao and Jungong Han",

year = "2018",

month = feb,

day = "1",

language = "English",

isbn = "9781577358008",

pages = "6706--6713",

booktitle = "32nd AAAI Conference on Artificial Intelligence 2018",

publisher = "AAAI",

}

RIS

TY - GEN

T1 - Temporal-difference Learning with Sampling Baseline for Image Captioning

AU - Chen, Hui

AU - Ding, Guiguang

AU - Zhao, Sicheng

AU - Han, Jungong

PY - 2018/2/1

Y1 - 2018/2/1

N2 - The existing methods for image captioning usually train the language model under the cross entropy loss, which results in the exposure bias and inconsistency of evaluation metric. Recent research has shown these two issues can be well addressed by policy gradient method in reinforcement learning domain attributable to its unique capability of directly optimizing the discrete and non-differentiable evaluation metric. In this paper, we utilize reinforcement learning method to train the image captioning model. Specifically, we train our image captioning model to maximize the overall reward of the sentences by adopting the temporal-difference (TD) learning method, which takes the correlation between temporally successive actions into account. In this way, we assign different values to different words in one sampled sentence by a discounted coefficient when back-propagating the gradient with the REINFORCE algorithm, enabling the correlation between actions to be learned. Besides, instead of estimating a "baseline" to normalize the rewards with another network, we utilize the reward of another Monte-Carlo sample as the "baseline" to avoid high variance. We show that our proposed method can improve the quality of generated captions and outperforms the state-of-the-art methods on the benchmark dataset MS COCO in terms of seven evaluation metrics.

KW - Image captioning

KW - Reinforcement learning

KW - LSTM

M3 - Conference contribution/Paper

SN - 9781577358008

SP - 6706

EP - 6713

BT - 32nd AAAI Conference on Artificial Intelligence 2018

PB - AAAI

CY - Palo Alto

ER -
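
Illustrative sketch

The abstract describes two ideas: weighting the words of one sampled caption by a discounted coefficient in the REINFORCE update, and using the reward of a second Monte-Carlo sample as the baseline. The snippet below is a minimal sketch of how these could fit together, not the authors' implementation. The function names, the default `gamma`, and the choice to discount backwards from the end of the sentence (the last word gets the full advantage) are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import List


def discounted_word_weights(
    reward: float, baseline_reward: float, length: int, gamma: float = 0.9
) -> List[float]:
    """Per-word weights for a REINFORCE-style update (hypothetical helper).

    reward:          sentence-level score of the sampled caption (e.g. a metric score)
    baseline_reward: score of a second, independently sampled caption
    """
    # Baseline from another Monte-Carlo sample instead of a learned value network.
    advantage = reward - baseline_reward
    # Assumption: the reward arrives at the end of the sentence, so the word at
    # position t is weighted by gamma ** (number of steps remaining after it).
    return [advantage * gamma ** (length - 1 - t) for t in range(length)]


def surrogate_loss(
    log_probs: List[float], reward: float, baseline_reward: float, gamma: float = 0.9
) -> float:
    """Negative weighted log-likelihood; its gradient is the policy-gradient estimate."""
    weights = discounted_word_weights(reward, baseline_reward, len(log_probs), gamma)
    return -sum(w * lp for w, lp in zip(weights, log_probs))


# Usage: a three-word sampled caption with per-word log-probabilities.
weights = discounted_word_weights(reward=1.0, baseline_reward=0.5, length=3, gamma=0.5)
loss = surrogate_loss([-1.0, -2.0, -3.0], reward=1.0, baseline_reward=0.5, gamma=0.5)
```

When the sampled caption scores higher than the baseline sample, the weights are positive and minimising the loss raises the probability of its words; when it scores lower, the weights are negative and those words are pushed down.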
