ICDAR 2025 Workshop on
Document Analysis of Low-resource Languages
Motivation
Low-resource document analysis matters for several reasons, spanning cultural preservation, data scarcity, machine translation, linguistic research, and accessibility. First, low-resource languages often embody unique cultural and historical contexts. Document analysis supports the digitization and preservation of these linguistic materials, providing crucial resources for understanding human history and cultural evolution. For instance, many endangered languages have large collections of scanned documents that can be analyzed to build linguistic and cultural repositories. Second, low-resource languages typically lack large-scale annotated datasets, which makes it hard to train machine learning models. Document analysis techniques such as Optical Character Recognition (OCR) and document layout analysis extract and structure data from existing documents, helping to mitigate data scarcity. Third, document analysis can strengthen machine translation: monolingual data extracted through OCR can be used to improve translation for low-resource languages, which is particularly important for languages with limited parallel corpora. Fourth, document analysis supports linguistic research by enabling the study of language variation and historical documentation, shedding light on the evolution and distinctive features of these languages. Finally, document analysis makes low-resource language documents more accessible and usable. For example, advances in OCR systems for non-Latin scripts let researchers extract text from scanned documents more efficiently, enabling applications such as content summarization and information retrieval. In summary, low-resource document analysis is both a vital tool for cultural preservation and a key driver of language technology development and academic research.
Tentative Schedule
All times in Beijing Time (UTC+08:00)
Time | Events
09:00 - 09:10 | Opening Remarks
09:10 - 10:00 | Invited Talk: To Be Announced
10:00 - 10:20 | Coffee break
10:20 - 12:00 | Presentation
12:00 - 13:00 | Discussion & Conclusion
12:00 - 14:00 | Poster
Call for Papers
Acceptable submission topics may include but are not limited to:
- Document image processing
- Optical Character Recognition (OCR)
- Logical layout analysis
- Handwriting recognition
- Natural language processing for document understanding
- Medical document analysis
- Document entity recognition
- Document entity relation extraction
- Pretrained models for document analysis
- Language models for document information extraction
- Gold-standard benchmarks and datasets for low-resource languages
- Document analysis systems for low-resource languages
Submission
This workshop invites original contributions in both theoretical and applied research. All submissions must follow the formatting guidelines specified on the official ICDAR 2025 website. Papers are limited to 15 pages (excluding references) and must comply with the double-blind review requirements:
- Remove all author identifiers (names, affiliations, etc.) from the manuscript
- Cite your own previous work in the third person to avoid disclosing your identity
- Omit the acknowledgments section from the initial submission
Submissions are accepted through the workshop's EasyChair submission portal. At least one author of each accepted paper must register for the workshop to present the work. Detailed submission procedures are available on the ICDAR 2025 guidelines portal.
Contact
Important Dates
- Submission Deadline: July 4, 2025 (no further extension)
- Decisions Announced: July 25, 2025
- Camera Ready Deadline: July 31, 2025
- September 21, 2025
Publication
Accepted papers will be published in the ICDAR 2025 workshop proceedings.
Workshop Chairs
- Yong, Tso, Xizang University, China
- Brian Kenji Iwana, Kyushu University, Japan
- Yu, Yongbin, University of Electronic Science and Technology of China, China
Program Committee Members
- Nyima, Trashi, Xizang University, China
- Brian Kenji Iwana, Kyushu University, Japan
- Harold MOUCHERE, Nantes University, France
- Cheng, Jian, University of Electronic Science and Technology of China, China
- Anna Zhu, Wuhan University of Technology, China
- Yu, Yongbin, University of Electronic Science and Technology of China, China
- Yong, Tso, Xizang University, China
- Rinchen, Dongrub, Xizang University, China
Short CV of the Workshop Chairs
Prof. Yong Tso. Tso Yong is a professor at Xizang University, China, and a senior member of the Chinese Association of Artificial Intelligence. Her main research area is Artificial Intelligence (Few Shot Learning), with a focus on the intelligent analysis of ancient Tibetan literature, including the digitization of ancient books, knowledge extraction, and language modeling. She has served as Principal Investigator (PI) for research projects including National Natural Science Foundation of China (NSFC) grants and a sub-project under the National Key R&D Program of China (NKPs), and has won two first prizes and two second prizes in science and technology of the Xizang Autonomous Region. She also served as the chief engineer for the National Key Research and Development project "Integration and Application Demonstration of Digital Technology for Tibetan Ancient Books and Documents". She has been a national visiting scholar at the University of Bergen in Norway, the University of Virginia in the United States, and the University of British Columbia in Canada.
Prof. Brian Kenji Iwana. Brian Kenji Iwana is an Associate Professor at the Department of Advanced Information Science in the Graduate School of Information Science and Electrical Engineering, Kyushu University. He received a B.S. in Computer Engineering from the University of California, Irvine, USA. After completing his Bachelor's degree, he worked as a software developer at the National Aeronautics and Space Administration (NASA) in Mountain View, California. He then returned to academia and received a Ph.D. from the Graduate School of Information Science and Electrical Engineering, Kyushu University. He also graduated from the Graduate Education and Research Training Program in Decision Science for a Sustainable Society, Kyushu University. Since then, he has worked as a Post Doc, Assistant Professor, and then Associate Professor at the Graduate School of Information Science and Electrical Engineering, Kyushu University. He is also affiliated with the International Undergraduate Program In English (IUPE), Kyushu University, and the Graduate Program of Interdisciplinary Policy Analysis and Design (GIPAD), Kyushu University. He is an Associate Editor for the journal Springer Nature Computer Science and has served on many international conference program committees, such as ICDAR, ICFHR, AAAI, ICPR, and DAS. His research interests include time series recognition, dynamic programming, artificial neural networks, document recognition, and natural language processing (NLP).
Prof. Yongbin Yu. Yongbin Yu is an Associate Professor at the School of Information and Software Engineering, University of Electronic Science and Technology of China (UESTC). He visited the University of Michigan at Ann Arbor, Ann Arbor, MI, USA, in 2013-2014, and the University of California at Santa Barbara, Santa Barbara, CA, USA, in 2016-2017. He worked as the Guest Deputy Director with the Department of Big Data Industry, Sichuan Provincial Economic and Information Commission, in 2018-2020. His research focuses on natural language processing, memristor-based neural networks, swarm intelligence, and big data. His work on "Research and Application of Key Technologies in Tibetan Natural Language Processing" won the First Prize of the Science and Technology Award of Tibet Autonomous Region in 2018. He has organized international academic events as Publicity Co-Chair and Technical Program Committee (TPC) member for conferences including ICCCAS.
Invited Speakers
To Be Announced
Title:
Abstract: