The PlantIF framework consists of three parts: image and text feature extractors, semantic space encoders, and a multimodal feature fusion module. The image and text feature extractors are used to present visual and ...
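To make the data flow between these parts concrete, the following is a minimal sketch of how they might be wired together. Only the ordering (extractors, then semantic space encoders, then fusion) comes from the description above. The class name `PlantIFSketch`, the use of one encoder per modality, and every function body (`toy_image_extractor`, `toy_text_extractor`, `l2_normalize`, `concat_fusion`) are hypothetical placeholders chosen for illustration, not the components used in PlantIF.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class PlantIFSketch:
    """Hypothetical container that only fixes the order of the stages."""
    image_extractor: Callable[[Sequence[float]], Vector]
    text_extractor: Callable[[str], Vector]
    image_encoder: Callable[[Vector], Vector]  # assumed: one encoder per modality
    text_encoder: Callable[[Vector], Vector]
    fusion: Callable[[Vector, Vector], Vector]

    def forward(self, image: Sequence[float], text: str) -> Vector:
        # Stage 1: modality-specific feature extraction.
        visual = self.image_extractor(image)
        textual = self.text_extractor(text)
        # Stage 2: map each feature vector into a semantic space.
        visual = self.image_encoder(visual)
        textual = self.text_encoder(textual)
        # Stage 3: fuse the two modalities into one representation.
        return self.fusion(visual, textual)


# --- Placeholder components (assumptions, not the paper's models) ---

def toy_image_extractor(pixels: Sequence[float]) -> Vector:
    # Stand-in for a visual backbone: mean and max intensity.
    return [sum(pixels) / len(pixels), max(pixels)]


def toy_text_extractor(text: str) -> Vector:
    # Stand-in for a text backbone: word count and character count.
    return [float(len(text.split())), float(len(text))]


def l2_normalize(vec: Vector) -> Vector:
    # Stand-in for a semantic space encoder: scale to unit length.
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def concat_fusion(visual: Vector, textual: Vector) -> Vector:
    # Stand-in for the fusion module: plain concatenation.
    return visual + textual


def build_toy_model() -> PlantIFSketch:
    return PlantIFSketch(
        image_extractor=toy_image_extractor,
        text_extractor=toy_text_extractor,
        image_encoder=l2_normalize,
        text_encoder=l2_normalize,
        fusion=concat_fusion,
    )


if __name__ == "__main__":
    model = build_toy_model()
    fused = model.forward([0.0, 1.0], "leaf spot")
    print(len(fused), fused)
```

Keeping each stage as a swappable callable means a real backbone, encoder, or fusion module could replace the corresponding placeholder without changing `forward`.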