Abstract—A large image dataset plays a crucial role in building an automatic visual recognition system. However, collecting and labeling data are tedious, laborious, and time-consuming tasks. In some cases, it is a chicken-and-egg problem: application data can only be obtained after the system has been deployed. In this study, we are interested in building automatic plant identification systems from images. Because plant distribution across the world is not uniform and may change in response to the availability of resources, the species present differ from one area to another. That is why some species are very abundant in one region and absent from others. Even though the distribution of plant species is diverse, plant species across the planet share common features: they all have organ types such as leaves, flowers, etc. Building on this observation, in this paper we propose a new approach for building an image-based plant identification system without an available image database, based on the combination of deep learning, transfer learning, and crowdsourcing. The proposed approach consists of four main steps: plant organ detection, plant image collection, data validation, and plant identification. Plant organ detection aims to learn the characteristics of each organ type from available plant image datasets, while the purpose of the data collection step is to crawl images from crowdsourced sources. Then, the plant organ detector is used in the data validation step to remove unwanted/invalid images while keeping the valid ones. Finally, a plant identification method is developed and evaluated on the new image dataset. We illustrate and demonstrate the use of the proposed approach by building a Vietnamese medicinal plant retrieval system.
Index Terms—Organ detection, plant identification, deep learning, convolutional neural network.
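The data validation step in the abstract can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the organ detector outputs one probability per organ type (e.g., leaf, flower) for each crawled image, and that an image is kept only when its highest organ score reaches a threshold. The `CrawledImage` record, the `validate` function, and the 0.5 threshold are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CrawledImage:
    """A crawled image with hypothetical organ-detector output."""
    path: str
    species: str                    # species name used as the crawl query
    organ_scores: Dict[str, float]  # assumed: organ type -> probability


def validate(
    images: List[CrawledImage], threshold: float = 0.5
) -> Tuple[List[CrawledImage], List[CrawledImage]]:
    """Split crawled images into (valid, invalid).

    An image counts as valid when the detector is confident that it
    shows at least one plant organ; the threshold is an assumption.
    """
    valid: List[CrawledImage] = []
    invalid: List[CrawledImage] = []
    for img in images:
        # Images with no scores at all are treated as containing no organ.
        best = max(img.organ_scores.values(), default=0.0)
        (valid if best >= threshold else invalid).append(img)
    return valid, invalid


if __name__ == "__main__":
    crawled = [
        CrawledImage("a.jpg", "species_1", {"leaf": 0.9, "flower": 0.1}),
        CrawledImage("b.jpg", "species_1", {"leaf": 0.3, "flower": 0.2}),
    ]
    kept, dropped = validate(crawled)
    print([i.path for i in kept], [i.path for i in dropped])
```

Under this sketch, an image in which no organ is confidently detected (for example, a crawled photo of a product label or a landscape) lands in the invalid list and could be discarded or set aside for manual checking before the identification model is trained.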
Thi Thanh-Nhan Nguyen is with International Research Institute MICA, HUST-CNRS/UMI-2594-GRENOBLE INP, Hanoi, Vietnam and University of Information and Communication Technology, Thainguyen University, Thainguyen, Vietnam (e-mail: nttnhan@ictu.edu.vn).
Thi-Lan Le and Hai Vu are with International Research Institute MICA, HUST-CNRS/UMI-2594-GRENOBLE INP, Hanoi, Vietnam (e-mail: thi-lan.le@mica.edu.vn, hai.vu@mica.edu.vn).
Van-Sam Hoang is with Vietnam Forestry University, Hanoi, Vietnam (e-mail: hoangsam@vfu.edu.vn).
Cite: Thi Thanh-Nhan Nguyen, Thi-Lan Le, Hai Vu, and Van-Sam Hoang, "Towards an Automatic Plant Identification System without Dedicated Dataset," International Journal of Machine Learning and Computing, vol. 9, no. 1, pp. 26-34, 2019.