SubICap: Towards Subword-informed Image Captioning
[article]
2020
arXiv
pre-print
Existing Image Captioning (IC) systems model words as atomic units in captions and are unable to exploit the structural information in the words. This makes representation of rare words very difficult and out-of-vocabulary words impossible. Moreover, to avoid computational complexity, existing IC models operate over a modest sized vocabulary of frequent words, such that the identity of rare words is lost. In this work we address this common limitation of IC systems in dealing with rare words in
arXiv:2012.13122v1
fatcat:zn4plh6q5bevzbncrunxiiw5ta
... the corpora. We decompose words into smaller constituent units 'subwords' and represent captions as a sequence of subwords instead of words. This helps represent all words in the corpora using a significantly lower subword vocabulary, leading to better parameter learning. Using subword language modeling, our captioning system improves various metric scores, with a training vocabulary size approximately 90% less than the baseline and various state-of-the-art word-level models. Our quantitative and qualitative results and analysis signify the efficacy of our proposed approach.
Towards Visual Affordance Learning: A Benchmark for Affordance Segmentation and Recognition
[article]
2022
arXiv
pre-print
The physical and textural attributes of objects have been widely studied for recognition, detection and segmentation tasks in computer vision. A number of datasets, such as large scale ImageNet, have been proposed for feature learning using data hungry deep neural networks and for hand-crafted feature extraction. To intelligently interact with objects, robots and intelligent machines need the ability to infer beyond the traditional physical/textural attributes, and understand/learn visual cues,
arXiv:2203.14092v1
fatcat:jhpsjl6z4nd4nk43m5ivonuqwy
... called visual affordances, for affordance recognition, detection and segmentation. To date there is no publicly available large dataset for visual affordance understanding and learning. In this paper, we introduce a large scale multi-view RGBD visual affordance learning dataset, a benchmark of 47210 RGBD images from 37 object categories, annotated with 15 visual affordance categories and 35 cluttered/complex scenes with different objects and multiple affordances. To the best of our knowledge, this is the first ever and the largest multi-view RGBD visual affordance learning dataset. We benchmark the proposed dataset for affordance recognition and segmentation. To achieve this we propose an Affordance Recognition Network, a.k.a. ARNet. In addition, four state-of-the-art deep learning networks are evaluated for the affordance segmentation task. Our experimental results showcase the challenging nature of the dataset and present definite prospects for new and robust affordance learning algorithms. The dataset is available at: https://sites.google.com/view/afaqshah/dataset.
Deep Bayesian Image Set Classification: A Defence Approach against Adversarial Attacks
[article]
2021
arXiv
pre-print
Syed Afaq Ali Shah received the PhD degree in computer vision and machine learning from The University of Western Australia (UWA), Crawley, WA, Australia. ...
Shah are with the Discipline of Information Technology, Murdoch University, Perth, Australia. • M. ...
arXiv:2108.10217v1
fatcat:s6hmgbf2sffj3c3rldm6dbefou
Automatic Number Plate Recognition: A Detailed Survey of Relevant Algorithms
2021
Sensors
Technologies and services towards smart-vehicles and Intelligent-Transportation-Systems (ITS) continue to revolutionize many aspects of human life. This paper presents a detailed survey of current techniques and advancements in Automatic-Number-Plate-Recognition (ANPR) systems, with a comprehensive performance comparison of various real-time tested and simulated algorithms, including those involving computer vision (CV). ANPR technology has the ability to detect and recognize vehicles by
doi:10.3390/s21093028
pmid:33925845
fatcat:rljgab5qlne4vi3njjyylvhxxi
... number-plates using recognition techniques. Even with the best algorithms, a successful ANPR system deployment may require additional hardware to maximize its accuracy. The number plate condition, non-standardized formats, complex scenes, camera quality, camera mount position, tolerance to distortion, motion-blur, contrast problems, reflections, processing and memory limitations, environmental conditions, indoor/outdoor or day/night shots, software-tools or other hardware-based constraints may undermine its performance. This inconsistency, challenging environments and other complexities make ANPR an interesting field for researchers. The Internet-of-Things is beginning to shape the future of many industries and is paving new ways for ITS. ANPR can be well utilized by integrating with RFID-systems, GPS, Android platforms and other similar technologies. Deep-Learning techniques are widely utilized in the CV field for better detection rates. This research aims to advance the state-of-knowledge in ITS (ANPR) built on CV algorithms, by citing relevant prior work, analyzing and presenting a survey of extraction, segmentation and recognition techniques whilst providing guidelines on future trends in this area.
WEmbSim: A Simple yet Effective Metric for Image Captioning
[article]
2020
arXiv
pre-print
The area of automatic image caption evaluation is still undergoing intensive research to address the needs of generating captions which can meet adequacy and fluency requirements. Based on our past attempts at developing highly sophisticated learning-based metrics, we have discovered that a simple cosine similarity measure using the Mean of Word Embeddings (MOWE) of captions can actually achieve a surprisingly high performance on unsupervised caption evaluation. This inspires our proposed work
arXiv:2012.13137v1
fatcat:gw25ymwapnhm5p5dhrl5x76roi
... on an effective metric WEmbSim, which beats complex measures such as SPICE, CIDEr and WMD at system-level correlation with human judgments. Moreover, it also achieves the best accuracy at matching human consensus scores for caption pairs, against commonly used unsupervised methods. Therefore, we believe that WEmbSim sets a new baseline for any complex metric to be justified.
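The abstract above describes scoring a caption by the cosine similarity between the Mean of Word Embeddings (MOWE) of a candidate and a reference caption. The following is a minimal sketch of that idea only, not the authors' implementation: the toy three-dimensional vectors, the whitespace tokenisation, the handling of unknown words, and the function names (`mean_embedding`, `mowe_similarity`) are all assumptions made for illustration.

```python
import math

# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
TOY_EMBEDDINGS = {
    "a": [0.1, 0.1, 0.1],
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.0],
    "runs": [0.0, 0.9, 0.1],
    "car": [0.0, 0.1, 0.9],
    "parked": [0.1, 0.0, 0.8],
}


def mean_embedding(tokens, embeddings):
    """Average the vectors of all tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known word: the mean is undefined
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mowe_similarity(candidate, reference, embeddings):
    """Cosine similarity between the mean word embeddings of two captions."""
    cand = mean_embedding(candidate.lower().split(), embeddings)
    ref = mean_embedding(reference.lower().split(), embeddings)
    if cand is None or ref is None:
        return 0.0  # assumption: score 0 when a caption has no known words
    return cosine_similarity(cand, ref)


close = mowe_similarity("a dog runs", "a puppy runs", TOY_EMBEDDINGS)
far = mowe_similarity("a dog runs", "a car parked", TOY_EMBEDDINGS)
print(close > far)
```

In practice the toy dictionary would be replaced by pretrained word vectors; which embeddings and preprocessing the published metric uses is not stated in the abstract.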
Efficient Image Set Classification using Linear Regression based Image Reconstruction
[article]
2017
arXiv
pre-print
We propose a novel image set classification technique using linear regression models. Downsampled gallery image sets are interpreted as subspaces of a high dimensional space to avoid the computationally expensive training step. We estimate regression models for each test image using the class specific gallery subspaces. Images of the test set are then reconstructed using the regression models. Based on the minimum reconstruction error between the reconstructed and the original images, a
arXiv:1701.02485v1
fatcat:ub2ur5klfbhp7o5chui3drugrq
... voting strategy is used to classify the test set. We performed extensive evaluation on the benchmark UCSD/Honda, CMU Mobo and YouTube Celebrity datasets for face classification, and ETH-80 dataset for object classification. The results demonstrate that by using only a small amount of training data, our technique achieved competitive classification accuracy and superior computational speed compared with the state-of-the-art methods.
CommuNety: A Deep Learning System for the Prediction of Cohesive Social Communities
[article]
2020
arXiv
pre-print
Fig. 8. Size Distribution of Communities
Fig. 9. Average Network Density
Afaq Shah received the Ph.D. degree in computer vision and machine learning from The University of Western Australia ( ...
Shah is with the Department of Information Technology, Mathematics and Statistics, Murdoch University, Australia, e-mail: afaq.shah@murdoch.edu.au. W. ...
arXiv:2007.14741v1
fatcat:ifpsao223zcthchbnqj7duplma
NNEval: Neural Network Based Evaluation Metric for Image Captioning
[chapter]
2018
Lecture Notes in Computer Science
The automatic evaluation of image descriptions is an intricate task, and it is highly important in the development and fine-grained analysis of captioning systems. Existing metrics to automatically evaluate image captioning systems fail to achieve a satisfactory level of correlation with human judgements at the sentence level. Moreover, these metrics, unlike humans, tend to focus on specific aspects of quality, such as the n-gram overlap or the semantic meaning. In this paper, we present the
doi:10.1007/978-3-030-01237-3_3
fatcat:kokhsilt3vcqndc5vo2zqugqgu
... first learning-based metric to evaluate image captions. Our proposed framework enables us to incorporate both lexical and semantic information into a single learned metric. This results in an evaluator that takes into account various linguistic features to assess the caption quality. The experiments we performed to assess the proposed metric show improvements upon the state of the art in terms of correlation with human judgements and demonstrate its superior robustness to distractions.
Application of MXenes in Perovskite Solar Cells: A Short Review
2021
Nanomaterials
Application of MXene materials in perovskite solar cells (PSCs) has attracted considerable attention owing to their supreme electrical conductivity, excellent carrier mobility, adjustable surface functional groups, excellent transparency and superior mechanical properties. This article reviews the progress made so far in using Ti3C2Tx MXene materials in the building blocks of perovskite solar cells such as electrodes, hole transport layer (HTL), electron transport layer (ETL) and perovskite
doi:10.3390/nano11082151
pmid:34443979
pmcid:PMC8401012
fatcat:mbiqgc6oqnbbnc3khedjka7cfm
... active layer. Moreover, we provide an outlook on the exciting opportunities this recently developed field offers, and the challenges faced in effectively incorporating MXene materials in the building blocks of PSCs for better operational stability and enhanced performance.
Machine Learning Approaches for Prediction of Facial Rejuvenation using Real and Synthetic Data
2019
IEEE Access
This paper proposes novel machine learning approaches to predict the outcome of facial rejuvenation prior to a cosmetic procedure. This is achieved by estimating the required amount of dermal filler volume that needs to be applied on the face by learning the underlying structural mapping from the pretreatment and posttreatment 3D face images. We develop and train our proposed deep neural network, called Rejuv3DNet, designed specifically to predict the dermal filler volume. We also propose the
doi:10.1109/access.2019.2899379
fatcat:uiszlvtfmrccphmvfrklesplza
... kernel regression (KR)-based model to validate and improve our volume estimation results using regression. Our other contributions include the development of the first 3D face cosmetic dataset, which consists of real-world pretreatment and posttreatment 3D face images and a novel technique for the generation of synthetic cosmetic treatment 3D face images. Our experimental results show that the proposed Rejuv3DNet and the KR model achieve 62.5% and 66.67%, respectively, on real-world data, while these techniques achieve a prediction accuracy of 75.2% and 89.5%, and 77.2% and 90.1% on our two different synthetic datasets. Our proposed techniques have been found to be computationally efficient, achieving near real-time prediction performance. The reported accuracies are our preliminary results for proof of concept, which can be improved with more data. The proposed approach has the potential for further investigation in the cosmetic surgery domain. Index Terms: Deep learning, deep neural network, facial analysis, regression.
Scene Graph Generation: A Comprehensive Survey
[article]
2022
arXiv
pre-print
Deep learning techniques have led to remarkable breakthroughs in the field of generic object detection and have spawned a lot of scene-understanding tasks in recent years. Scene graph has been the focus of research because of its powerful semantic representation and applications to scene understanding. Scene Graph Generation (SGG) refers to the task of automatically mapping an image into a semantic structural scene graph, which requires the correct labeling of detected objects and their
arXiv:2201.00443v2
fatcat:s4w7sdf6dndzneujly54srh5c4
... relationships. Although this is a challenging task, the community has proposed a lot of SGG approaches and achieved good results. In this paper, we provide a comprehensive survey of recent achievements in this field brought about by deep learning techniques. We review 138 representative works that cover different input modalities, and systematically summarize existing methods of image-based SGG from the perspective of feature extraction and fusion. We attempt to connect and systematize the existing visual relationship detection methods, and to summarize and interpret the mechanisms and the strategies of SGG in a comprehensive way. Finally, we finish this survey with deep discussions about current existing problems and future research directions. This survey will help readers to develop a better understanding of the current research status and ideas.
Learning-based Composite Metrics for Improved Caption Evaluation
2018
Proceedings of ACL 2018, Student Research Workshop
The evaluation of image caption quality is a challenging task, which requires the assessment of two main aspects in a caption: adequacy and fluency. These quality aspects can be judged using a combination of several linguistic features. However, most of the current image captioning metrics focus only on specific linguistic facets, such as the lexical or semantic, and fail to meet a satisfactory level of correlation with human judgements at the sentence-level. We propose a learning-based
doi:10.18653/v1/p18-3003
dblp:conf/acl/SharifWBS18
fatcat:pn2r4slifzev5ejax66pk5kq5i
... framework to incorporate the scores of a set of lexical and semantic metrics as features, to capture the adequacy and fluency of captions at different linguistic levels. Our experimental results demonstrate that composite metrics draw upon the strengths of standalone measures to yield improved correlation and accuracy.
A novel 3D vorticity based approach for automatic registration of low resolution range images
2015
Pattern Recognition
This paper tackles the problem of feature matching and range image registration. Our approach is based on a novel set of discriminating three-dimensional (3D) local features, named 3D-Vor (Vorticity). In contrast to conventional local feature representation techniques, which use the vector field (i.e. surface normals) to just construct their local reference frames, the proposed feature representation exploits the vorticity of the vector field computed at each point of the local surface to
doi:10.1016/j.patcog.2015.03.014
fatcat:m7ndkqzrsrgcdoslq2fj2pessy
... e the distinctive characteristics at each point of the underlying 3D surface. The 3D-Vor descriptors of two range images are then matched using a fully automatic feature matching algorithm which identifies correspondences between the two range images. Correspondences are verified in a local validation step of the proposed algorithm and used for the pairwise registration of the range images. Quantitative results on low resolution Kinect 3D data (Washington RGB-D dataset) show that our proposed automatic registration algorithm is accurate and computationally efficient. The performance evaluation of the proposed descriptor was also carried out on the challenging low resolution Washington RGB-D (Kinect) object dataset, for the tasks of automatic range image registration. Reported experimental results show that the proposed local surface descriptor is robust to resolution, noise and more accurate than state-of-the-art techniques. It achieves 90% registration accuracy compared to 50%, 69.2% and 52% for spin image, 3D SURF and SISI/LD-SIFT descriptors, respectively.
WEmbSim: A Simple yet Effective Metric for Image Captioning
2020
2020 Digital Image Computing: Techniques and Applications (DICTA)
The area of automatic image caption evaluation is still undergoing intensive research to address the needs of generating captions which can meet adequacy and fluency requirements. Based on our past attempts at developing highly sophisticated learning-based metrics, we have discovered that a simple cosine similarity measure using the Mean of Word Embeddings (MOWE) of captions can actually achieve a surprisingly high performance on unsupervised caption evaluation. This inspires our proposed work
doi:10.1109/dicta51227.2020.9363392
fatcat:bdbpf3xwhbcq3df7yvjdkmfp2e
... on an effective metric WEmbSim, which beats complex measures such as SPICE, CIDEr and WMD at system-level correlation with human judgments. Moreover, it also achieves the best accuracy at matching human consensus scores for caption pairs, against commonly used unsupervised methods. Therefore, we believe that WEmbSim sets a new baseline for any complex metric to be justified.
Effect of Different Levels of Zinc and Compost on Yield and Yield Components of Wheat
2022
Agronomy
Management of organic matter and micronutrients is very important for the sustainable improvement of soil health. Poor soil organic matter usually results in lower availability of zinc (Zn) micronutrients in plants. Such deficiency in Zn causes a significant decrease in the growth and yield of crops. The need at the current time is to balance the application of organic amendments with Zn micronutrients to achieve optimum crop yields. Thus, the current study was conducted to investigate wheat,
doi:10.3390/agronomy12071562
fatcat:grlwutcfkzfgzmpw7ipygtkkcu
... ing compost as organic matter and Zn as a micronutrient. There were three levels of compost (i.e., control (0C), 5 t/ha (5C) and 10 t/ha (10C)) and four levels of Zn (control (0Zn), 2.5 kg Zn/ha (2.5Zn), 5.0 kg Zn/ha (5.0Zn) and 10.0 kg Zn/ha (10.0Zn)) applied with three replicates. The addition of 10C under 10.0Zn produced significantly better results for the maximum enhancement in plant height (8.08%), tillers/m2 (21.61%), spikes/m2 (22.33%) and spike length (40.50%) compared to 0C. Significant enhancements in 1000-grain weight, biological yield and grain yield also validated the effectiveness of 10C under 10.0Zn compared to 0C. In conclusion, application of 10C with 10.0Zn showed the potential to improve wheat growth and yield attributes. The addition of 10C with 10.0Zn also regulated soil mineral N, total soil N and extractable soil P. Further investigation is recommended with different soil textures to verify 10C with 10.0Zn as the best amendment for the enhancement of wheat yield in poor organic matter and Zn-deficient soils.