
MMSys'22 Grand Challenge on AI-based Video Production for Soccer [article]

Cise Midoglu, Steven A. Hicks, Vajira Thambawita, Tomas Kupka, Pål Halvorsen
2022 arXiv   pre-print
In addition, event detection should be thoroughly enhanced by annotation and classification, proper clipping, generating short descriptions, selecting appropriate thumbnails for highlight clips, and finally  ...  In particular, we focus on the enhancement operations that take place after an event has been detected, namely event clipping (Task 1), thumbnail selection (Task 2), and game summarization (Task 3).  ...  We also want to acknowledge Norsk Toppfotball (NTF), the Norwegian association for elite soccer, for making videos and metadata available for the challenge.  ... 
arXiv:2202.01031v1 fatcat:hewdb5t5tfggnp5uewhlvcztge

Comixify: Transform video into a comics [article]

Maciej Pęśko, Adam Svystun, Paweł Andruszkiewicz, Przemysław Rokita, Tomasz Trzciński
2018 arXiv   pre-print
In the first stage, we propose a state-of-the-art keyframes extraction algorithm that selects a subset of frames from the video to provide the most comprehensive video context and we filter those frames  ...  In this paper, we propose a solution to transform a video into a comics. We approach this task using a neural style algorithm based on Generative Adversarial Networks (GANs).  ...  In this stage we propose to use a state-of-the-art keyframe extraction algorithm based on reinforcement learning, which we further extend by combining temporal segmentation method with the image aesthetic  ... 
arXiv:1812.03473v1 fatcat:whsi2kzia5dhri4kg66nrfszki

Generative Target Update for Adaptive Siamese Tracking [article]

Madhu Kiran, Le Thanh Nguyen-Meidine, Rajat Sahay, Rafael Menelau Oliveira E Cruz, Louis-Antoine Blais-Morin, Eric Granger
2022 arXiv   pre-print
In particular, our approach relies on an auto-encoder trained through adversarial learning to detect changes in a target object's appearance and predict a future target template, using a set of target  ...  Results indicate that our proposed approach can outperform state-of-art trackers, and its overall robustness allows tracking for a longer time before failure.  ...  via reinforcement learning Duman and Erdem [2019].  ... 
arXiv:2202.09938v1 fatcat:f2lr6pr5sfhnbch7i4qjgpivie

Video Summarization through Reinforcement Learning with a 3D Spatio-Temporal U-Net [article]

Tianrui Liu, Qingjie Meng, Jun-Jie Huang, Athanasios Vlontzos, Daniel Rueckert, Bernhard Kainz
2021 arXiv   pre-print
A 3D spatio-temporal U-Net is used to efficiently encode spatio-temporal information of the input videos for downstream reinforcement learning (RL).  ...  An RL agent learns from spatio-temporal latent scores and predicts actions for keeping or rejecting a video frame in a video summary.  ...  ACKNOWLEDGMENT We thank the volunteers and sonographers from routine fetal screening at St. Thomas' Hospital London. This work was supported by the Wellcome Trust IEH Award  ... 
arXiv:2106.10528v1 fatcat:6q3wnioqtrdktkfl5w5n3pjthi

Community-Empowered Air Quality Monitoring System [article]

Yen-Chia Hsu, Paul Dille, Jennifer Cross, Beatrice Dias, Randy Sargent, Illah Nourbakhsh
2018 arXiv   pre-print
The residents lacked the technological fluency to gather and curate diverse scientific data to advocate for regulatory change.  ...  Developing information technology to democratize scientific knowledge and support citizen empowerment is a challenging task.  ...  ACKNOWLEDGMENTS The Heinz Endowments, Allegheny County Clean Air Now, and all other participants. The authors thank Yen-Chi Chen for the advice in statistical analysis.  ... 
arXiv:1804.03293v1 fatcat:lqukrqk2eraydmmfnozoxos6xq

Summarizing Videos with Attention [article]

Jiri Fajtl, Hajar Sadeghi Sokeh, Vasileios Argyriou, Dorothy Monekosso, Paolo Remagnino
2019 arXiv   pre-print
In this work we propose a novel method for supervised, keyshots based video summarization by applying a conceptually simple and computationally efficient soft, self-attention mechanism.  ...  To that end we propose a simple, self-attention based network for video summarization which performs the entire sequence to sequence transformation in a single feed forward pass and single backward pass  ...  [23] propose an adversarial network to summarize the video by minimizing the distance between the video and its summary.  ... 
arXiv:1812.01969v2 fatcat:5kzfj6ne2fa3tdywbs776dblf4

Visual Summarization of Scholarly Videos Using Word Embeddings and Keyphrase Extraction [chapter]

Hang Zhou, Christian Otto, Ralph Ewerth
2019 Lecture Notes in Computer Science  
For this purpose, we exploit video annotations that are automatically generated by speech recognition and video OCR (optical character recognition).  ...  Besides the quality of the learning resource's content, it is essential to discover the most relevant and suitable video in order to support the learning process most effectively.  ...  As stated in Section 2, this technique models topics and phrases in a single graph, and their mutual reinforcement together with a specific mechanism to select the most important keyphrases are used to generate  ... 
doi:10.1007/978-3-030-30760-8_28 fatcat:hr4cw4osprfzhar3tgznhblxqu

Artificial Intelligence in the Creative Industries: A Review [article]

Nantheera Anantrasirichai, David Bull
2021 arXiv   pre-print
(RNNs) and Deep Reinforcement Learning (DRL).  ...  A brief background of AI, and specifically Machine Learning (ML) algorithms, is provided including Convolutional Neural Network (CNNs), Generative Adversarial Networks (GANs), Recurrent Neural Networks  ...  A category sentence generative adversarial network has also been proposed that combines GAN, RNN and reinforcement learning to enlarge training datasets, which improves performance for sentiment classification  ... 
arXiv:2007.12391v5 fatcat:mn2xqeylyrbabbu5zwln3admtm

Artificial intelligence in the creative industries: a review

Nantheera Anantrasirichai, David Bull
2021 Artificial Intelligence Review  
(RNNs) and deep Reinforcement Learning (DRL).  ...  A brief background of AI, and specifically machine learning (ML) algorithms, is provided including convolutional neural networks (CNNs), generative adversarial networks (GANs), recurrent neural networks  ...  A category sentence generative adversarial network has also been proposed that combines GAN, RNN and reinforcement learning to enlarge training datasets, which improves performance for sentiment classification  ... 
doi:10.1007/s10462-021-10039-7 fatcat:tcctdi7vprfx7mlujvqmpiy3ru

Characterizing Abhorrent, Misinformative, and Mistargeted Content on YouTube [article]

Kostantinos Papadamou
2021 arXiv   pre-print
., flat earth) than for emerging ones (like COVID-19) and that these recommendations are more common on the search results page than on a user's homepage or the video recommendations section.  ...  YouTube has revolutionized the way people discover and consume video.  ...  For the detection of disturbing videos, we built a deep learning classifier that analyzes various metadata of a given video, as well as its thumbnail.  ... 
arXiv:2105.09819v1 fatcat:k2mdclt4orgunnlwb6hnnrka4i

Task-driven Semantic Coding via Reinforcement Learning [article]

Xin Li, Jun Shi, Zhibo Chen
2021 arXiv   pre-print
learning (RL).  ...  To solve this challenge, we design semantic maps for different tasks to extract the pixelwise semantic fidelity for videos/images.  ...  Task-driven Semantic Coding via Reinforcement Learning Xin Li, Jun Shi and Zhibo Chen, Senior Member, IEEE, Abstract-Task-driven semantic video/image coding has drawn considerable attention with the development  ... 
arXiv:2106.03511v1 fatcat:nmdyqgerufc3jj6dj2zfdo3hp4

MERLOT: Multimodal Neural Script Knowledge Models [article]

Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, Yejin Choi
2021 arXiv   pre-print
By pretraining with a mix of both frame-level (spatial) and video-level (temporal) objectives, our model not only learns to match images to temporally corresponding words, but also to contextualize what  ...  Ablation analyses demonstrate the complementary importance of: 1) training on videos versus static images; 2) scaling the magnitude and diversity of the pretraining video corpus; and 3) using diverse objectives  ...  ’s thumbnail selection algorithm selects high quality, clear frames. https://ai.googleblog.com/2015/10/improving-youtube-video-thumbnails-with.html 7 http://www.speech.cs.cmu.edu/cgi-bin/cmudict  ... 
arXiv:2106.02636v3 fatcat:mrj2t3yuanbdzhsujshtky4enq

A Black-Box Attack Model for Visually-Aware Recommender Systems [article]

Rami Cohen, Oren Sar Shalom, Dietmar Jannach, Amihood Amir
2020 arXiv   pre-print
Due to the advances in deep learning, visually-aware recommender systems (RS) have recently attracted increased research interest.  ...  Such systems combine collaborative signals with images, usually represented as feature vectors outputted by pre-trained image models.  ...  We train a model for each combination of dataset and underlying RS, yielding 4 models in total.  ... 
arXiv:2011.02701v1 fatcat:nccqgmz5tzhojpk2kar4w4fogq

Exposure: A White-Box Photo Post-Processing Framework [article]

Yuanming Hu, Hao He, Chenxi Xu, Baoyuan Wang, Stephen Lin
2018 arXiv   pre-print
To apply the filters in a proper sequence and with suitable parameters, we employ a deep reinforcement learning approach that learns to make decisions on what action to take next, given the current state  ...  As it is difficult for users to acquire paired images that reflect their retouching preferences, we present in this paper a deep learning approach that is instead trained on unpaired data, namely a set  ...  How to determine the sequence and parameters of these filters for a given input image is learned with a deep reinforcement learning (RL) approach guided by a generative adversarial network (GAN) that models  ... 
arXiv:1709.09602v2 fatcat:2kx5a34cz5ahbdfbcqdb2o4lwy

Exposure

Yuanming Hu, Hao He, Chenxi Xu, Baoyuan Wang, Stephen Lin
2018 ACM Transactions on Graphics  
How to determine the sequence and parameters of these filters for a given input image is learned with a deep reinforcement learning (RL) approach guided by a generative adversarial network (GAN) that models  ...  In addition, multiple operations are learned separately, while in our work operations are optimized elegantly as a whole, guided by the RL and GAN architecture. Reinforcement Learning.  ... 
doi:10.1145/3181974 fatcat:s3pf7fdi25hqncvga2gjbfjdzu