Deep Metadata Fusion for Traffic Light to Lane Assignment
IEEE Robotics and Automation Letters
We present a deep metadata fusion approach that connects image data and heterogeneous metadata inside a Convolutional Neural Network (CNN). This approach enables us to assign all relevant traffic lights to their associated lanes. To achieve this, a common CNN topology is trained on down-sampled and transformed input images to predict an indication vector. The indication vector contains the column positions of all relevant traffic lights that are associated with lanes. In parallel, we fuse prepared and adaptively weighted Metadata Feature Maps (MFM) with the convolutional feature map input of a selected convolutional layer. The results are compared to rule-based, only-metadata, and only-vision approaches. In addition, human performance on the traffic light to ego-vehicle lane assignment was measured in a subjective test. The proposed approach outperforms all other approaches. It achieves about 93.0% average precision on a real-world dataset and 87.1% average precision on a more complex dataset. In particular, on a real-world dataset the new approach reaches significantly higher results than human performance, with 93.7% to 91.0% average accuracy.
Index Terms: Intelligent transportation systems, computer vision for transportation, deep learning in robotics and automation.
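To make the fusion step concrete, the following minimal sketch shows one plausible way to combine weighted MFMs with the feature map input of a convolutional layer. It is an illustration under stated assumptions, not the implementation evaluated here: the function name `fuse_metadata`, the use of one scalar weight per MFM, and fusion by channel-wise concatenation are all hypothetical choices. In a trained network the weights would be learned; here they are passed in as fixed values.

```python
import numpy as np


def fuse_metadata(feature_maps, metadata_maps, weights):
    """Append weighted metadata feature maps to CNN feature maps.

    Assumed layout (hypothetical, channels first):
      feature_maps:  (C, H, W) input of the selected convolutional layer
      metadata_maps: (M, H, W) prepared MFMs at the same spatial size
      weights:       (M,) one scalar weight per MFM
    Returns an array of shape (C + M, H, W).
    """
    feature_maps = np.asarray(feature_maps, dtype=np.float32)
    metadata_maps = np.asarray(metadata_maps, dtype=np.float32)
    weights = np.asarray(weights, dtype=np.float32)

    if feature_maps.ndim != 3 or metadata_maps.ndim != 3:
        raise ValueError("expected arrays of shape (channels, H, W)")
    if feature_maps.shape[1:] != metadata_maps.shape[1:]:
        raise ValueError("MFMs must match the spatial size of the feature maps")
    if weights.shape != (metadata_maps.shape[0],):
        raise ValueError("expected exactly one weight per MFM")

    # Scale each MFM by its weight, broadcasting over height and width.
    weighted = weights[:, None, None] * metadata_maps

    # Assumption: fusion by concatenation along the channel axis.
    return np.concatenate([feature_maps, weighted], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    conv_input = rng.standard_normal((8, 4, 16))  # 8 channels, 4 x 16 grid
    mfms = np.ones((2, 4, 16))                    # 2 metadata feature maps
    fused = fuse_metadata(conv_input, mfms, [0.5, 2.0])
    print(fused.shape)
```

Concatenation leaves the image features untouched and lets the following convolution learn how to mix them with the metadata. An element-wise sum or product would be an equally valid variant of this sketch.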