ContextVP: Fully Context-Aware Video Prediction

Wonmin Byeon, Qin Wang, Rupesh Kumar Srivastava, Petros Koumoutsakos | ECCV 2018 (oral)

[arxiv]  [paper]  [slides]  [video]

  • ECCV oral presentation, Sep. 2018. [slides] [talk]

  • Some parts of this paper were presented at the CVPR’18 Workshop on Brave New Ideas for Video Understanding (oral). [program] [paper]

  • Experimental results on the Caltech Pedestrian and UCF-101 datasets are available for download. [zip]

Abstract

Video prediction models based on convolutional networks, recurrent networks, and their combinations often result in blurry predictions. We identify an important contributing factor for imprecise predictions that has not been studied adequately in the literature: blind spots, i.e., lack of access to all relevant past information for accurately predicting the future. To address this issue, we introduce a fully context-aware architecture that captures the entire available past context for each pixel using Parallel Multi-Dimensional LSTM units and aggregates it using blending units. Our model outperforms a strong baseline network of 20 recurrent convolutional layers and yields state-of-the-art performance for next step prediction on three challenging real-world video datasets: Human 3.6M, Caltech Pedestrian, and UCF-101. Moreover, it does so with fewer parameters than several recently proposed models, and does not rely on deep convolutional networks, multi-scale architectures, separation of background and foreground modeling, motion flow learning, or adversarial training. These results highlight that full awareness of past context is of crucial importance for video prediction.
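
The sketch below is a minimal PyTorch illustration of the two ideas named in the abstract: directional recurrences that give every pixel access to context from one scan direction, and a blending unit that aggregates the directional outputs. It is not the authors' implementation: the paper's PMD-LSTM units also recurse along the time axis, and its blending units use a more elaborate weighting than the 1x1 convolution here. All class and parameter names are illustrative.

import torch
import torch.nn as nn

class DirectionalRNN(nn.Module):
    # Simplified stand-in for one PMD-LSTM direction: an LSTM scanned
    # along a single spatial axis of a feature map. (The paper's units
    # additionally recurse over time.)
    def __init__(self, channels, hidden, axis, reverse=False):
        super().__init__()
        self.axis, self.reverse = axis, reverse
        self.rnn = nn.LSTM(channels, hidden, batch_first=True)

    def forward(self, x):               # x: (B, C, H, W)
        B, C, H, W = x.shape
        if self.axis == 'w':            # rows become sequences
            seq = x.permute(0, 2, 3, 1).reshape(B * H, W, C)
        else:                           # columns become sequences
            seq = x.permute(0, 3, 2, 1).reshape(B * W, H, C)
        if self.reverse:                # scan the opposite direction
            seq = seq.flip(1)
        out, _ = self.rnn(seq)
        if self.reverse:
            out = out.flip(1)
        if self.axis == 'w':
            return out.reshape(B, H, W, -1).permute(0, 3, 1, 2)
        return out.reshape(B, W, H, -1).permute(0, 3, 2, 1)

class BlendingUnit(nn.Module):
    # Learned per-pixel aggregation of the directional context maps
    # (reduced here to a 1x1 convolution over the concatenated outputs).
    def __init__(self, hidden, n_dirs=4):
        super().__init__()
        self.mix = nn.Conv2d(n_dirs * hidden, hidden, kernel_size=1)

    def forward(self, dir_outs):        # list of (B, hidden, H, W)
        return self.mix(torch.cat(dir_outs, dim=1))

# Usage: four scan directions (left/right, up/down), then blending.
dirs = nn.ModuleList([
    DirectionalRNN(3, 16, 'w'), DirectionalRNN(3, 16, 'w', reverse=True),
    DirectionalRNN(3, 16, 'h'), DirectionalRNN(3, 16, 'h', reverse=True),
])
blend = BlendingUnit(16)
x = torch.randn(2, 3, 32, 32)
ctx = blend([d(x) for d in dirs])       # (2, 16, 32, 32)

After blending, every output pixel has received information from the whole input frame, which is the "no blind spots" property the paper argues is crucial for sharp predictions.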

Multi-Frame Prediction Results

Citation

@inproceedings{byeon2018contextvp,
  title     = {ContextVP: Fully Context-Aware Video Prediction},
  author    = {Byeon, Wonmin and Wang, Qin and Srivastava, Rupesh Kumar and Koumoutsakos, Petros},
  booktitle = {Proceedings of the European Conference on Computer Vision ({ECCV})},
  year      = {2018}
}