A task-based evaluation of combined set and network visualization
2016
Information Sciences
We conducted a comparative crowdsourced user study to evaluate all five techniques based on tasks that require information from both the network and the sets. ...
This paper addresses the problem of how best to visualize network data grouped into overlapping sets. We address it by evaluating various existing techniques alongside a new technique. ...
and EulerView respectively; and Mohamad Tayssir Alkowatly for help with setting up the framework that enabled us to run EulerView. ...
doi:10.1016/j.ins.2016.05.045
fatcat:vlwrjnvzxfaizgzyur2wn3xqbi
The effectiveness of visual aids in teaching foreign languages
2022
Zamonaviy lingvistik tadqiqotlar: xorijiy tajribalar, istiqbolli izlanishlar va tillarni o'qitishning innovatsion usullari (Modern Linguistic Research: Foreign Experience, Promising Investigations, and Innovative Methods of Language Teaching)
The article analyzes the effectiveness of visual aids in the process of teaching foreign languages and suggests ways to use them in the classroom. ...
and use of computer-based testing, diagnostic, monitoring and evaluation systems. ...
These systems are a set of software and hardware tools and equipment that allow you to combine various types of information (text, hand-drawn graphics, slides, music, moving images, sound, video) and ...
doi:10.47689/linguistic-research-vol-iss1-pp233-235
fatcat:cc4fzp3wabflbcwn3qgiedczr4
A Cross-Modal Image Fusion Method Guided by Human Visual Characteristics
[article]
2020
arXiv
pre-print
research on image fusion theory based on the characteristics of human visual perception remains limited. ...
Inspired by the characteristics of human visual perception, we propose a robust multi-task auxiliary learning optimization image fusion theory. ...
found a complete and effective evaluation of image quality. ...
arXiv:1912.08577v4
fatcat:lajicugsc5cllpjzihqag6tcda
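The entry above trains an image-fusion network jointly with auxiliary perception tasks. As a rough illustration of multi-task auxiliary learning in general, not the authors' actual loss design, the objective can be a weighted sum of the main fusion loss and the auxiliary losses; the function below is a minimal sketch with hypothetical weights.

    def multitask_fusion_loss(fusion_loss, aux_losses, aux_weights):
        # fusion_loss: scalar loss of the main image-fusion task
        # aux_losses: scalar losses of the auxiliary perception tasks
        # aux_weights: hypothetical weighting coefficients, one per task
        total = fusion_loss
        for w, loss in zip(aux_weights, aux_losses):
            total = total + w * loss  # auxiliary supervision steers the fusion network
        return total

The same function works on plain floats or on autograd scalars, so it can sit directly inside a training loop.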
Cross-Modal Image Fusion Theory Guided by Subjective Visual Attention
[article]
2019
arXiv
pre-print
Secondly, based on the human visual attention mechanism, we introduce a human visual attention network guided by subjective tasks on top of the multi-task auxiliary learning network. ...
The human visual perception system has very strong robustness and contextual awareness in a variety of image processing tasks. ...
In view of this problem, although there are some image quality evaluation methods based on generative networks [27] and deep learning [28], they are still quite ...
arXiv:1912.10718v1
fatcat:4fgrayc4wjdolcey4vspg3miuy
Survey of Recent Advances in Visual Question Answering
[article]
2017
arXiv
pre-print
Visual Question Answering (VQA) presents a unique challenge as it requires the ability to understand and encode the multi-modal inputs - in terms of image processing and natural language processing. ...
This paper presents a survey of different approaches proposed to solve the problem of Visual Question Answering. We also describe the current state-of-the-art model in a later part of the paper. ...
These networks are then combined with other modules and jointly trained to predict the answers. Based on the different tasks, five types of modules are described. ...
arXiv:1709.08203v1
fatcat:sqmhzvf2o5frjjj3gdi7nq74du
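The approaches this survey covers share a common skeleton: encode the image with a CNN, encode the question with a recurrent network, fuse the two representations, and classify over a fixed answer vocabulary. A minimal PyTorch sketch of that skeleton follows; the dimensions, the element-wise fusion, and the answer-vocabulary size are illustrative assumptions, not a model from the survey.

    import torch
    import torch.nn as nn

    class SimpleVQA(nn.Module):
        def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=1024,
                     img_feat_dim=2048, num_answers=1000):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.img_proj = nn.Linear(img_feat_dim, hidden_dim)
            self.classifier = nn.Linear(hidden_dim, num_answers)

        def forward(self, img_feats, question_tokens):
            # img_feats: (B, img_feat_dim) pooled CNN features, precomputed
            _, (h, _) = self.lstm(self.embed(question_tokens))
            q = h[-1]                          # (B, hidden_dim) question encoding
            v = torch.relu(self.img_proj(img_feats))
            joint = q * v                      # element-wise fusion of the modalities
            return self.classifier(joint)      # logits over candidate answers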
Zhejiang University at ImageCLEF 2019 Visual Question Answering in the Medical Domain
2019
Conference and Labs of the Evaluation Forum
We propose a novel convolutional neural network (CNN) based on the VGG16 network and a Global Average Pooling strategy to extract visual features. ...
This paper describes the submission of Zhejiang University for the Visual Question Answering task in the medical domain (VQA-Med) of ImageCLEF 2019 [2]. ...
Feature fusion with a co-attention mechanism: the strategy for combining visual and semantic features plays an important role in improving the performance of the VQA task. ...
dblp:conf/clef/YanLXXG19
fatcat:74xc5442yrg2xkdkyux2gdebeq
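The visual branch this entry describes, a VGG16 backbone followed by Global Average Pooling, can be sketched in a few lines of PyTorch. The string weight name assumes torchvision 0.13 or later, and the 512-dimensional output is a property of VGG16's last convolutional block; the co-attention fusion part of the submission is not reproduced here.

    import torch
    import torch.nn as nn
    from torchvision import models

    vgg = models.vgg16(weights="IMAGENET1K_V1")   # pretrained backbone (assumed weights)
    backbone = vgg.features                       # convolutional layers only, no FC head
    gap = nn.AdaptiveAvgPool2d(1)                 # Global Average Pooling

    def extract_visual_features(images):
        # images: (B, 3, H, W) normalized tensors; returns (B, 512) feature vectors
        with torch.no_grad():
            fmap = backbone(images)               # (B, 512, H', W') activation map
            return gap(fmap).flatten(1)           # pooled to one vector per image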
Visual Encodings for Networks with Multiple Edge Types
2020
Proceedings of the International Conference on Advanced Visual Interfaces
e) combines position and colour, f) uses size and g) combines size and colour to create a glyph. ...
The best encodings were integrated into a visual analytics tool for inferring dynamic Bayesian networks and evaluated by computational biologists for additional evidence. ...
Data Set Used in the Evaluation: To create a realistic collection of networks, we ran BANJO on the songbird data [36] to create 17 DBNs. ...
doi:10.1145/3399715.3399827
dblp:conf/avi/VogogiasABK20
fatcat:k4iguksr6rfuxnjym2i7vfq2au
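The entry above compares encodings for networks with multiple edge types; variant g) builds a glyph from size and colour. A hedged matplotlib sketch of that idea: colour encodes the edge type and line width encodes a per-edge weight. The type names, palette, and weight scale are illustrative, not the paper's design.

    import matplotlib.pyplot as plt

    PALETTE = {"activation": "tab:blue", "inhibition": "tab:red"}  # assumed edge types

    def draw_edge(ax, p, q, edge_type, weight):
        # p, q: (x, y) node positions; weight assumed in [0, 1]
        ax.plot([p[0], q[0]], [p[1], q[1]],
                color=PALETTE[edge_type],      # colour channel of the glyph
                linewidth=1 + 4 * weight)      # size channel of the glyph

    fig, ax = plt.subplots()
    draw_edge(ax, (0, 0), (1, 1), "activation", 0.8)
    draw_edge(ax, (0, 1), (1, 0), "inhibition", 0.3)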
V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices
2020
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference
We evaluate a range of deep learning architectures, and find that existing models, including those popular for vision-and-language tasks, are unable to solve seemingly-simple instances. ...
Such tasks include, for example, image captioning, visual question answering, and visual navigation. Their evaluation is however hindered by task-specific confounding factors and dataset biases. ...
Related Work: Evaluation of Abstract Visual Reasoning. Evaluating reasoning has a long history in the field of AI, but is typically based on pre-defined or easily identifiable symbols. ...
doi:10.1609/aaai.v34i07.6885
fatcat:v63l3mjlcbewhpbrr3yabmupwe
Flood Detection via Twitter Streams using Textual and Visual Features
[article]
2020
arXiv
pre-print
For text-based flood event detection, we use a transformer network (i.e., a pretrained Italian BERT model), achieving an F1-score of 0.853. ...
In total, we proposed four different solutions, including a multi-modal solution combining textual and visual information for the mandatory run, and three single-modal image- and text-based solutions as ...
Multimodal Model (Run 1): The multimodal network consists of a text network and an image network combined to form a shared representation before the classification layer. ...
arXiv:2011.14944v1
fatcat:czr7jr66sfe3vkjortx74guavy
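One plausible reading of the multimodal run described above: a text branch and an image branch whose embeddings are concatenated into a shared representation before a binary flood / no-flood classifier. The encoders, dimensions, and dropout rate below are stand-ins, not the authors' exact networks; the 768-dimensional text input matches a typical BERT [CLS] vector.

    import torch
    import torch.nn as nn

    class MultimodalFloodClassifier(nn.Module):
        def __init__(self, text_dim=768, img_dim=2048, hidden=512):
            super().__init__()
            self.fuse = nn.Sequential(
                nn.Linear(text_dim + img_dim, hidden),  # shared representation
                nn.ReLU(),
                nn.Dropout(0.3),
            )
            self.classify = nn.Linear(hidden, 2)        # flood vs. no flood

        def forward(self, text_emb, img_emb):
            # text_emb: (B, 768); img_emb: (B, 2048) precomputed image features
            shared = self.fuse(torch.cat([text_emb, img_emb], dim=1))
            return self.classify(shared)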
VIREO @ TRECVID 2017: Video-to-Text, Ad-hoc Video Search, and Video hyperlinking
2017
TREC Video Retrieval Evaluation
In this study, we investigate whether combining the concept-based system, the captioning system and the text-based search system helps to improve search performance. ...
One is our concept-based, zero-example video search system, which has proven useful in previous years [4]; one is a video captioning system trained individually for the VTT task; the other is a ...
Acknowledgment The work described in this paper was supported by two grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (CityU 11250716, 11210514). ...
dblp:conf/trecvid/NguyenLCL00N17
fatcat:jjfp3n7qunfg5mvnrmdmf4m5dq
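The VIREO entry combines a concept-based search system, a captioning system and a text-based search system. A minimal late-fusion sketch of that combination: normalise each system's scores per query and sum them with fixed weights. The weighting scheme and score normalisation are assumptions for illustration, not the team's actual fusion.

    def fuse_search_scores(score_lists, weights):
        # score_lists: one {video_id: score} dict per retrieval system
        fused = {}
        for scores, w in zip(score_lists, weights):
            top = max(scores.values(), default=1.0) or 1.0  # per-system normalisation
            for vid, s in scores.items():
                fused[vid] = fused.get(vid, 0.0) + w * s / top
        return sorted(fused, key=fused.get, reverse=True)   # fused ranking of video ids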
UNITOR @ DANKMEMES: Combining Convolutional Models and Transformer-based architectures for accurate MEME management
[chapter]
2020
EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020
UNITOR implements a neural model which combines a Deep Convolutional Neural Network to encode the visual information of input images and a Transformer-based architecture to encode the meaning of the attached ...
UNITOR ranked first in all subtasks, clearly confirming the robustness of the investigated neural architectures and suggesting the beneficial impact of the proposed combination strategy. ...
Transformer-based Architectures for Text Classification: A MEME is a combination of visual information and the overlaid caption. ...
doi:10.4000/books.aaccademia.7420
fatcat:qvcld66xafd3de7ky4s7jlqxle
Multimodal Feature Fusion for Video Advertisements Tagging Via Stacking Ensemble
[article]
2021
arXiv
pre-print
This framework introduces a stacking-based ensembling approach to reduce the influence of varying levels of noise and of conflicts between different modalities. ...
Specifically, we propose a novel multi-modal feature fusion framework with the goal of combining complementary information from multiple modalities. ...
We consider different perspectives (visual, text, sound, and other features) to predict the tags of posts, and train a DNN-based classification model to obtain the tags. ...
arXiv:2108.00679v1
fatcat:o24bdkie65gd5jkdbimglj2wwe
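A hedged sketch of stacking-based ensembling as the abstract describes it: per-modality base classifiers produce class probabilities, and a meta-learner trained on those probabilities arbitrates between noisy or conflicting modalities. The base models (anything with predict_proba), fold count, and meta-learner below are illustrative choices.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    def stack(base_models, modality_features, labels):
        # Out-of-fold probabilities avoid leaking training labels to the meta-model.
        meta_inputs = [
            cross_val_predict(m, X, labels, cv=5, method="predict_proba")
            for m, X in zip(base_models, modality_features)
        ]
        meta_model = LogisticRegression(max_iter=1000)
        meta_model.fit(np.hstack(meta_inputs), labels)  # meta-learner over all modalities
        return meta_model

At inference time each base model would also be refit on the full training set so its probabilities can feed the returned meta-model.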
Adopting Semantic Information of Grayscale Radiographs for Image Classification and Retrieval
2018
Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies
All images are visually represented using Convolutional Neural Networks (CNN), and the Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN) Show-and-Tell model is adopted for keyword generation ...
For the image classification tasks, Random Forest models trained with Bag-of-Keypoints visual representations were adopted. ...
Keyword Generation: For keyword generation, a combination of encoding and decoding using Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) based ...
doi:10.5220/0006732301790187
dblp:conf/biostec/PelkaNF18
fatcat:aoadhd2inbc6tkx5ahua4uuuz4
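The Show-and-Tell-style keyword generation this entry outlines pairs a CNN encoder with an LSTM decoder: the radiograph representation conditions the recurrent state, which then emits a keyword sequence. The layer sizes, vocabulary size, and greedy decoding loop below are assumptions for illustration, not the paper's configuration.

    import torch
    import torch.nn as nn

    class KeywordDecoder(nn.Module):
        def __init__(self, vocab_size=5000, img_dim=512, hidden=512):
            super().__init__()
            self.init_h = nn.Linear(img_dim, hidden)   # image features condition the LSTM
            self.embed = nn.Embedding(vocab_size, hidden)
            self.lstm = nn.LSTMCell(hidden, hidden)
            self.out = nn.Linear(hidden, vocab_size)

        def forward(self, img_feat, max_len=10, start_token=1):
            h = torch.tanh(self.init_h(img_feat))
            c = torch.zeros_like(h)
            token = torch.full((img_feat.size(0),), start_token,
                               dtype=torch.long, device=img_feat.device)
            keywords = []
            for _ in range(max_len):                   # greedy decoding step by step
                h, c = self.lstm(self.embed(token), (h, c))
                token = self.out(h).argmax(dim=1)
                keywords.append(token)
            return torch.stack(keywords, dim=1)        # (B, max_len) keyword ids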
Off-the-Shelf Deep Features for Saliency Detection
2021
SN Computer Science
We propose several saliency features computed from different networks and different levels, with the aim of identifying the optimal network and layer for the task of saliency detection. ...
In this paper, we develop a saliency approach based on the computation of deep whitened features combined with shape features from object proposals. ...
Related Work
Visual Saliency: Itti [21] proposed a saliency method based on combining low-level features such as color, orientation and intensity. ...
doi:10.1007/s42979-021-00499-7
fatcat:wgu6awptnfdo5kqrnwh7x5fxny
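The saliency entry above combines "deep whitened features" with shape features. A hedged sketch of the whitening step only: decorrelate per-location CNN activations with PCA whitening, then score each location by how far its whitened vector sits from the origin, a common proxy for feature rarity. This is one plausible reading, not the paper's exact pipeline.

    import numpy as np
    from sklearn.decomposition import PCA

    def whiten_feature_map(fmap):
        # fmap: (C, H, W) activations from one CNN layer
        C, H, W = fmap.shape
        X = fmap.reshape(C, H * W).T                 # one C-dim vector per location
        white = PCA(whiten=True).fit_transform(X)    # decorrelated, unit-variance features
        # Distinctive (potentially salient) locations end up far from the origin.
        saliency = np.linalg.norm(white, axis=1).reshape(H, W)
        return saliency / (saliency.max() + 1e-8)    # normalised saliency map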
Deep Learning Frameworks Applied For Audio-Visual Scene Classification
[article]
2021
arXiv
pre-print
The highest classification accuracy of 93.9%, obtained from an ensemble of audio-based and visual-based frameworks, shows an improvement of 16.5% compared with the DCASE baseline. ...
In this paper, we present deep learning frameworks for audio-visual scene classification (SC) and indicate how individual visual and audio features as well as their combination affect SC performance. ...
(SC), indicate the individual roles of visual and audio features as well as their combination within the SC task. 2) We then propose an ensemble of audio-based and visual-based frameworks, which helps to achieve ...
arXiv:2106.06840v1
fatcat:ihkzmj4wcncuhgg3jyeymxomhq
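A minimal late-fusion ensemble in the spirit of the system described above: average the class-probability outputs of an audio-based and a visual-based classifier and take the argmax. The equal default weighting and array shapes are assumptions, not the paper's tuned configuration.

    import numpy as np

    def ensemble_scene_predictions(audio_probs, visual_probs, w_audio=0.5):
        # audio_probs, visual_probs: (N, num_classes) softmax outputs per clip
        fused = w_audio * audio_probs + (1.0 - w_audio) * visual_probs
        return fused.argmax(axis=1)                  # predicted scene label per clip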