Computer Science, 1987-2025
Permanent URI for this collection: https://theses-dissertations.princeton.edu/handle/88435/dsp01mp48sc83w
Browsing Computer Science, 1987-2025 by Author "Chan, Emilio"
Handwritten Chinese Error Correction for Learners of Chinese as a Foreign Language
(2025-04-10) Chan, Emilio; Fong, Ruth Catherine
Handwritten Chinese character error correction (HCCEC) is the task in which a machine-learning model takes an image of a handwritten Chinese character, determines whether it is written incorrectly, and, if it is, outputs the character the writer intended to write. HCCEC has gained attention in recent years, but so far no work has assessed or built models targeted at learners of Chinese as a foreign language (CFL learners). CFL learners stand to gain a great deal from HCCEC: an effective model could serve as an educational tool to help them learn and practice handwriting Chinese characters. As part of this work, a dataset of handwritten Chinese characters produced by CFL learners was created, containing both correctly and incorrectly written characters. Next, an existing HCCEC model, the Tree-structure Analysis Network (TAN), was trained on a large dataset of characters written by middle school students in China and then evaluated on the test sets of the CFL learner dataset (Li et al. 2023, Li et al. 2023). Finally, TAN was fine-tuned on the training and validation sets of the CFL learner dataset and re-evaluated on the test sets. Although performance on key evaluation metrics does not reach that reported by previous work on different datasets, this work shows that fine-tuning HCCEC models on data produced by CFL learners can improve all key metrics when the model is evaluated on characters written by CFL learners (Hu et al. 2023, Li et al. 2023). It is my hope that this work will be the first of many exploring the potential of HCCEC applied to characters written by learners of Chinese as a foreign language.