Bulletin of "Turan" University

Economic aspects of error identification in semi-structured publications in the state language

https://doi.org/10.46914/1562-2959-2024-1-3-128-138

Abstract

The rapid growth of information on the Internet and social networks has made research in computational linguistics highly relevant. The volume of natural-language text that people and machines create needs to be processed, analyzed and verified. Information retrieval systems, dialog systems, and machine translation tools are used for this purpose. Automatic text processing systems cover a wide range of tasks. Detecting errors in texts and identifying and correcting incorrect words is one of the most important tasks of natural language processing (NLP). The article provides an overview of semi-structured data and of methods and technologies for identifying incorrect words in natural languages. The aim of the research is to develop an effective approach for detecting and correcting errors in Kazakh-language texts, especially where resources are limited and the data are unstructured. The research combines machine learning techniques with an economic analysis of the costs of developing and implementing such solutions. The proposed approach facilitates the automation of text verification, which can significantly reduce the cost of manual data processing and improve the quality of information in various spheres, including business and public administration.

About the Authors

L. M. Baitenova
Turan University
Kazakhstan

Doctor of Economic Sciences, professor.

Almaty



D. R. Rakhimova
Turan University; Al Farabi Kazakh National University
Kazakhstan

PhD, associate professor.

Almaty



A. T. Turarbek
Turan University; Al Farabi Kazakh National University
Kazakhstan

PhD, associate professor.

Almaty



E. Adali
Istanbul Technical University
Turkey

PhD, professor.

Istanbul



For citations:


Baitenova L.M., Rakhimova D.R., Turarbek A.T., Adali E. Economic aspects of error identification in semi-structured publications in the state language. Bulletin of "Turan" University. 2024;(3):128-138. (In Kazakh) https://doi.org/10.46914/1562-2959-2024-1-3-128-138


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1562-2959 (Print)
ISSN 2959-1236 (Online)