Abstract: Automatic speech recognition (ASR) has been significantly improved in the past years. However, most robust ASR systems are based on air-conducted (AC) speech, and their performances in low ...
Abstract: Cross-modal image-text retrieval is an important area of Vision-and-Language task that models the similarity of image-text pairs by embedding features into a shared space for alignment. To ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results
Feedback