Tailored Fine-tuning for Comma Insertion in Czech

Authors

  • Jakub Machura
  • Hana Žižková
  • Patrik Stano
  • Tereza Vrabcová
  • Dana Hlaváčková
  • Ondřej Trnovec

DOI

https://doi.org/10.2478/jazcas-2025-0024

Keywords

comma, Czech, fine-tuning, large language model (LLM)

Abstract

Transfer learning techniques, particularly pre-trained Transformer models trained on vast amounts of text in a given language, can be tailored to specific grammar correction tasks such as automatic punctuation correction. The Czech pre-trained RoBERTa model demonstrates outstanding performance on this task (Machura et al. 2022); however, previous attempts to improve the model have led to a slight degradation (Machura et al. 2023). In this paper, we present a more targeted fine-tuning of this model that addresses linguistic phenomena the base model overlooked. Additionally, we compare it with other models trained on a more diverse dataset that extends beyond web texts.
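
As a rough illustration of the approach the abstract describes, the sketch below frames comma insertion as token classification over a pre-trained Czech RoBERTa using the Hugging Face transformers library. The checkpoint name (the public ufal/robeczech-base), the binary label scheme, and the subword alignment strategy are all assumptions for illustration, not the authors' published pipeline.

```python
# A minimal sketch (assumptions, not the paper's actual setup): comma insertion
# framed as token classification with a pre-trained Czech RoBERTa.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "ufal/robeczech-base"  # public Czech RoBERTa; assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=2  # label 0 = no comma after word, 1 = comma after word
)

def words_and_labels(text: str):
    """Strip commas from a sentence; label each word with whether a comma
    immediately followed it in the original text."""
    words, labels = [], []
    for raw in text.split():
        labels.append(1 if raw.endswith(",") else 0)
        words.append(raw.rstrip(","))
    return words, labels

words, labels = words_and_labels("Myslím, že to funguje.")
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

# Align word-level labels to subword tokens: label only the first subword of
# each word; mask the rest (and special tokens) with -100 so the loss skips them.
token_labels, prev = [], None
for word_id in enc.word_ids():
    if word_id is None or word_id == prev:
        token_labels.append(-100)
    else:
        token_labels.append(labels[word_id])
    prev = word_id

loss = model(**enc, labels=torch.tensor([token_labels])).loss
loss.backward()  # an optimizer step would follow in an actual fine-tuning loop
```

At inference time, a prediction of label 1 on a word's first subword would indicate that a comma should be inserted after that word.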

Published

2025-03-31

How to Cite

Machura, J., Žižková, H., Stano, P., Vrabcová, T., Hlaváčková, D., & Trnovec, O. (2025). Tailored fine-tuning for comma insertion in Czech. Jazykovedný časopis [Journal of Linguistics], 76(1), 268-278. https://doi.org/10.2478/jazcas-2025-0024