Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification [article]

Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, Kevin Murphy
2018 arXiv pre-print
We seek a balance between speed and accuracy by building an effective and efficient video classification system through systematic exploration of critical network design choices.  ...  Rather surprisingly, best result (in both speed and accuracy) is achieved when replacing the 3D convolutions at the bottom of the network, suggesting that temporal representation learning on high-level  ...  We plot the speed vs accuracy of these models in Figure 4. We see that separable top-heavy models offer the best speed-accuracy trade-off.  ...
arXiv:1712.04851v2 fatcat:wsnqz7q5cvf7pm63mhh7vntgeq

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification [chapter]

Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, Kevin Murphy
2018 Lecture Notes in Computer Science  
We seek a balance between speed and accuracy by building an effective and efficient video classification system through systematic exploration of critical network design choices.  ...  Rather surprisingly, best result (in both speed and accuracy) is achieved when replacing the 3D convolutions at the bottom of the network, suggesting that temporal representation learning on high-level  ...  We plot the speed vs accuracy of these models in Figure 4. We see that separable top-heavy models offer the best speed-accuracy trade-off.  ...
doi:10.1007/978-3-030-01267-0_19 fatcat:h5h4fzn3kzcepg3yggu5yt6g2a
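
The two entries above credit separable, top-heavy models with the best speed-accuracy trade-off. In that design a k×k×k 3D convolution is factorized into a 1×k×k spatial convolution followed by a k×1×1 temporal convolution. The Python sketch below is a cost count only, not code from the paper: the helper names are hypothetical, and it assumes no bias, stride 1, "same" padding, and that the spatial convolution already outputs the final channel count.

```python
"""Weight and MAC counts: full 3D convolution vs. spatial/temporal separable 3D convolution.

Hypothetical helpers for illustration; biases, strides, and padding costs are ignored.
"""


def full_conv3d_weights(c_in: int, c_out: int, k: int) -> int:
    # One k x k x k kernel per (input channel, output channel) pair.
    return c_in * c_out * k ** 3


def separable_conv3d_weights(c_in: int, c_out: int, k: int) -> int:
    # Assumption: 1 x k x k spatial conv maps c_in -> c_out,
    # then a k x 1 x 1 temporal conv maps c_out -> c_out.
    spatial = c_in * c_out * k ** 2
    temporal = c_out * c_out * k
    return spatial + temporal


def macs(weights: int, t: int, h: int, w: int) -> int:
    # With stride 1 and "same" padding, every weight is used once per output position.
    return weights * t * h * w


if __name__ == "__main__":
    c, k = 64, 3
    t, h, w = 8, 28, 28  # assumed feature-map size (frames, height, width)
    full = full_conv3d_weights(c, c, k)
    sep = separable_conv3d_weights(c, c, k)
    print(f"full 3D:   {full:,} weights, {macs(full, t, h, w):,} MACs")
    print(f"separable: {sep:,} weights, {macs(sep, t, h, w):,} MACs")
    print(f"reduction: {full / sep:.2f}x")
```

With 64 channels and k = 3, the separable form needs 49,152 weights against 110,592 for the full kernel, a 2.25× reduction (27 versus 12 weights per channel pair); the MAC ratio is the same because both layers run at every output position. This covers only the speed side; the accuracy side of the trade-off is what the paper measures empirically.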

STC: Spatio-Temporal Contrastive Learning for Video Instance Segmentation [article]

Zhengkai Jiang, Zhangxuan Gu, Jinlong Peng, Hang Zhou, Liang Liu, Yabiao Wang, Ying Tai, Chengjie Wang, Liqing Zhang
2022 arXiv pre-print
Video Instance Segmentation (VIS) is a task that simultaneously requires classification, segmentation, and instance association in a video.  ...  To improve instance association accuracy, a novel bi-directional spatio-temporal contrastive learning strategy for tracking embedding across frames is proposed.  ...  Speed-Accuracy trade-off curve on the YouTube-VIS-2019 validation set.  ... 
arXiv:2202.03747v2 fatcat:zltmpnatfrf5hp55dff2csahlm

Affective State-Based Framework for e-Learning Systems [chapter]

Juan Antonio Rodríguez, Joaquim Comas, Xavier Binefa
2021 Frontiers in Artificial Intelligence and Applications  
From these preliminary results, we observe abrupt changes in the LS of the audience when there are abrupt changes in the narrative of the video, indicating that well-structured and bounded information  ...  Virtual learning and education have become crucial during the COVID-19 pandemic, which has forced a rethink by teachers and educators into designing online content and the indirect interaction with students  ...  By fine-tuning the frame rate, we expect to get a result indicating the optimum value, a trade-off between speed and accuracy.  ... 
doi:10.3233/faia210155 fatcat:6pj5pbdkcbhihpoh5hs5uhajcu

ActBERT: Learning Global-Local Video-Text Representations [article]

Linchao Zhu, Yi Yang
2020 arXiv pre-print
In this paper, we introduce ActBERT for self-supervised learning of joint video-text representations from unlabeled data.  ...  ActBERT significantly outperforms the state-of-the-arts, demonstrating its superiority in video-text representation learning.  ...  Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification.  ...  Rui. Jointly modeling embedding and translation to bridge  ...
arXiv:2011.07231v1 fatcat:xh6lvxh4cfhylffq6ewlynftlq

Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation [article]

Yujia Zhang, Lai-Man Po, Xuyuan Xu, Mengyang Liu, Yexin Wang, Weifeng Ou, Yuzhi Zhao, Wing-Yin Yu
2021 arXiv pre-print
However, these approaches learn representation by discriminating sampled instances via feature similarity in the latent space while ignoring the intermediate state of the learned representations, which  ...  It stems from the observation that humans are capable of discriminating the overlap rates of videos in space and time.  ...  trade-offs in video classification. In Proc. ECCV, 305–321. Pan, T.; Song, Y.; Yang, T.; Jiang, W.; and Liu, W. 2021.  ... 
arXiv:2112.08913v2 fatcat:c45xi6s74baajhi7tap2gckb7e

Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search [article]

Yifan Jiang, Xinyu Gong, Junru Wu, Humphrey Shi, Zhicheng Yan, Zhangyang Wang
2021 arXiv pre-print
Evaluations on Kinetics and Something-Something-V2 benchmarks confirm our AutoX3D models outperform existing ones in accuracy up to 1.3% under similar FLOPs, and reduce the computational cost up to x1.74  ...  Efficient video architecture is the key to deploying video recognition systems on devices with limited computing resources.  ...  Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification.  ...  residuals and linear bottlenecks. In Proceedings of the  ...
arXiv:2112.04710v1 fatcat:p3dxpblcx5f5xlohmas5odwu7e

AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification [article]

Xiaofang Wang, Xuehan Xiong, Maxim Neumann, AJ Piergiovanni, Michael S. Ryoo, Anelia Angelova, Kris M. Kitani, Wei Hua
2020 arXiv pre-print
., I3D or S3D, and improve video classification accuracy by more than 2% on both Kinetics-600 and MiT datasets.  ...  We propose a novel search space for spatiotemporal attention cells, which allows the search algorithm to flexibly explore various design choices in the cell.  ...  We thank Guanhang Wu and Yinxiao Li for insightful discussions and the larger Google Cloud Video AI team for the support.  ... 
arXiv:2007.12034v2 fatcat:n2vfegsvyfhelhgfhod3ejtfom

Universal-to-Specific Framework for Complex Action Recognition [article]

Peisen Zhao, Lingxi Xie, Ya Zhang, Qi Tian
2020 arXiv pre-print
Video-based action recognition has recently attracted much attention in the field of computer vision.  ...  The universal network first learns universal feature representations.  ...  The methods in [25] and [11] involve several 3D structures and make a trade-off between accuracy and efficiency. Thus, we build our network based on that in [11] with its "top-heavy" idea.  ... 
arXiv:2007.06149v1 fatcat:c2eyj7ony5hw7iljwfwccfarpa

Frozen CLIP Models are Efficient Video Learners [article]

Ziyi Lin, Shijie Geng, Renrui Zhang, Peng Gao, Gerard de Melo, Xiaogang Wang, Jifeng Dai, Yu Qiao, Hongsheng Li
2022 arXiv pre-print
In this paper, we present Efficient Video Learning (EVL) -- an efficient framework for directly training high-quality video recognition models with frozen CLIP features.  ...  Specifically, we employ a lightweight Transformer decoder and learn a query token to dynamically collect frame-level spatial features from the CLIP image encoder.  ...  Uniformer [28] is a custom fused CNN-Transformer architecture achieving good speed-accuracy trade-off. Yan et al.  ... 
arXiv:2208.03550v1 fatcat:iltgedlcovdwtfh46n6g3f7rbq

Learning Neural Textual Representations for Citation Recommendation

Binh Thanh Kieu, Inigo Jauregi Unanue, Son Bao Pham, Hieu Xuan Phan, Massimo Piccardi
2021 2020 25th International Conference on Pattern Recognition (ICPR)  
Reduction for Data Visualization and Linear Classification, and the Trade-off between Robustness and Classification Accuracy DAY 4 -Jan 15, 2021 Ferrari, Claudio; Berretti, Stefano; Del Bimbo, Alberto  ...  Veysel; Shir, Ofer M.; Baeck, Thomas 2891 Improving Model Accuracy for Imbalanced Image Classification Tasks by Adding a Final Batch Normalization Layer: An Empirical Study Bayesian Active Learning for  ... 
doi:10.1109/icpr48806.2021.9412725 fatcat:3vge2tpd2zf7jcv5btcixnaikm

2021 Index IEEE Transactions on Neural Networks and Learning Systems Vol. 32

2021 IEEE Transactions on Neural Networks and Learning Systems  
...  that appeared in this periodical during 2021, and items from previous years that were commented upon or corrected in 2021.  ...  Note that the item title is found only under the primary entry in the Author Index.  ...  Zhang, H., Off-Policy Learning.  ...
doi:10.1109/tnnls.2021.3134132 fatcat:2e7comcq2fhrziselptjubwjme

An Examination on Autoencoder Designs for Anomaly Detection in Video Surveillance

Ernesto Cruz-Esquivel, Zobeida J. Guzman-Zavaleta
2022 IEEE Access  
Murphy, “Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification,”  ...  Puebla, in 2018, where he is currently pursuing the  ...  INDEX TERMS Anomaly detection, spatiotemporal features, video surveillance.  ...
doi:10.1109/access.2022.3142247 fatcat:3d3pnbul55fdffuxfjraxxuqia

STSM: Spatio-Temporal Shift Module for Efficient Action Recognition [article]

Zhaoqilin Yang, Gaoyun An
2021 arXiv pre-print
In particular, when the network is 2D CNNs, our STSM module allows the network to learn efficient Spatio-temporal features.  ...  The modeling, computational cost, and accuracy of traditional Spatio-temporal networks are the three most concentrated research topics in video action recognition.  ...  Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification.  ... 
arXiv:2112.02523v1 fatcat:usmfupklojhh5abxwhiwwz6ba4

Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation [article]

William McNally, Kanav Vats, Alexander Wong, John McPhee
2022 arXiv pre-print
The accuracy-speed trade-off is especially favourable in the practical setting when not using test-time augmentation. Source code: https://github.com/wmcnally/kapao.  ...  In experiments, we observe that KAPAO is faster and more accurate than previous methods, which suffer greatly from heatmap post-processing.  ...  This work was supported in part by Compute Canada, the Canada Research Chairs Program, the Natural Sciences and Engineering Research Council of Canada, a Microsoft Azure Grant, and an NVIDIA Hardware Grant  ... 
arXiv:2111.08557v4 fatcat:kpqm6bxf7ndrtbq4husfjmib24
Showing results 1 to 15 of 223.