Effective BERT and GPT Integration for Ontology Development

Authors

  • Aleksandr M. Katyshev Volgograd State Technical University
  • Anton V. Anikin Volgograd State Technical University

DOI:

https://doi.org/10.52575/2687-0932-2026-53-1-144-152

Keywords:

Ontological Knowledge Base, Knowledge Graph, Ontology Learning, BERT, GPT

Abstract

This paper addresses the challenge of automated ontology construction, particularly for morphologically rich languages like Russian, where existing tools such as Text2Onto and FRED show significant limitations. We introduce a novel hybrid methodology that synergistically integrates two powerful transformer-based models to build comprehensive ontological knowledge bases from Russian text corpora. The primary objective is to overcome the trade-off between precision and recall inherent in single-model approaches. Our proposed framework operates as a two-stage process. Initially, a Russian-adapted Bidirectional Encoder Representations from Transformers (BERT) model is employed for high-precision extraction of explicit knowledge. Leveraging its deep contextual understanding, BERT performs named entity recognition to identify candidate concepts and extracts a foundational set of semantic relationships through a sentence-pair classification approach. Subsequently, a fine-tuned Generative Pre-trained Transformer (GPT) model is utilized for knowledge enrichment and recall enhancement. GPT generates plausible hypotheses about unstated or implicit relationships between concepts, refines and verifies relations found by BERT, and resolves logical conflicts, thereby filling knowledge gaps. An empirical evaluation was conducted on a corpus of educational texts on web development to validate the efficacy of the method. The combined BERT+GPT approach demonstrated superior performance, achieving an F1-measure of 0.82, which significantly surpasses standalone BERT (0.80), FRED (0.62), and Text2Onto (0.52). This improvement is primarily attributed to a substantial increase in recall (0.81) while maintaining high precision (0.82). The practical application and utility of the generated ontologies are discussed in the context of their integration with knowledge management platforms like Stardog, enabling advanced semantic search, data enrichment, and logical inference capabilities.
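The two-stage pipeline described in the abstract can be sketched in miniature as follows. Here `bert_extract` and `gpt_enrich` are hypothetical stand-ins for the fine-tuned BERT and GPT models, and the sample triples come from the web-development domain of the evaluation corpus; none of this reproduces the authors' actual implementation, only the precision-first/recall-second merge strategy the abstract outlines.

```python
# Illustrative sketch of the two-stage BERT+GPT merge (hypothetical stand-ins,
# not the authors' code). Triples are (subject, relation, object).

def bert_extract(text):
    """Stage 1 (hypothetical): high-precision explicit triples,
    as BERT NER + sentence-pair relation classification would yield."""
    return {("HTML", "is_a", "MarkupLanguage"),
            ("CSS", "styles", "HTML")}

def gpt_enrich(text, known):
    """Stage 2 (hypothetical): plausible implicit triples proposed by GPT
    to raise recall; duplicates of already-known facts are filtered out."""
    candidates = {("JavaScript", "manipulates", "HTML"),
                  ("CSS", "styles", "HTML")}   # GPT may restate known facts
    return candidates - known                  # keep only new hypotheses

def build_ontology(text):
    explicit = bert_extract(text)              # precision-oriented core
    implicit = gpt_enrich(text, explicit)      # recall-oriented additions
    return explicit | implicit                 # merged triple set

triples = build_ontology("...web development corpus...")
```

In this sketch the explicit core is never overwritten by generated hypotheses, which mirrors the abstract's claim that recall rises while precision is maintained.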


Author Biographies

Aleksandr M. Katyshev, Volgograd State Technical University

Lecturer of the Department of Software for Automated Systems, Volgograd, Russia

Anton V. Anikin, Volgograd State Technical University

Candidate of Technical Sciences, Associate Professor of the Department of Software for Automated Systems, Volgograd, Russia
E-mail: anton@anikin.name

References

Al-Aswadi F.N., Chan H.Y., Gan K.H. 2020. Automatic ontology construction from text: a review from shallow to deep learning trend. Artificial Intelligence Review, 53: 3901–3928. DOI: 10.1007/s10462-019-09782-9.

Anikin A., Kultsova M., Zhukova I., Sadovnikova N., Litovkin D. 2014. Knowledge based models and software tools for learning management in open learning network. In: Communications in Computer and Information Science. Vol. 466. Springer, 156-171. DOI: 10.1007/978-3-319-11854-3_15.

Bhatt A., Vaghela N., Dudhia K. 2024. Generating knowledge graphs from large language models: A comparative study of GPT-4, LLAMA 2, and BERT. arXiv preprint arXiv:2401.07412.

Biemann C. 2005. Ontology Learning from Text: A Survey of Methods. LDV-Forum, 20(2): 75-93.

Bosselut A., Rashkin H., Sap M., Malaviya C., Celikyilmaz A., Choi Y. 2019. Comet: Commonsense transformers for automatic knowledge graph construction. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy, Association for Computational Linguistics, 1530-1540. DOI: 10.18653/v1/P19-1146.

Brown T., Mann B., Ryder N., et al. 2020. Language models are few-shot learners. In: Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 1877-1901.

Cimiano P., Völker J. 2005. A framework for ontology learning and data-driven change discovery. In: Natural Language Processing and Information Systems. Alicante, Spain, Springer, 227-238. DOI: 10.1007/11428817_22.

Devlin J., Chang M.-W., Lee K., Toutanova K. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics. Minneapolis, Minnesota, Association for Computational Linguistics, 4171-4186. DOI: 10.18653/v1/N19-1423.

Fortuna B., Grobelnik M., Mladenic D. 2007. OntoGen: Semi-automatic Ontology Editor. In: Knowledge Discovery in Databases: PKDD 2007. Warsaw, Poland, Springer, 65-76. DOI: 10.1007/978-3-540-74976-9_9.

Gangemi A., Presutti V., Reforgiato Recupero D., et al. 2017. Semantic web machine reading with FRED. Semantic Web, 8(6): 873-893. DOI: 10.3233/SW-160240.

Haque F., Xu D., Niu X. 2025. A Comprehensive Survey on Bias and Fairness in Large Language Models. In: Trends and Applications in Knowledge Discovery and Data Mining. Springer, -101. DOI: 10.1007/978-981-96-8197-6_7.

Hogan A., Blomqvist E., Cochez M., et al. 2021. Knowledge graphs. ACM Computing Surveys (CSUR), 54(4): 1-37. DOI: 10.1145/3447790.

Karpukhin V., Baranchukov A., Burtsev M., Tsetlin Y., Gusev G. 2021. RuGPT-3: Large-scale Russian language models with few-shot learning capabilities. arXiv preprint arXiv:2109.04351.

Katyshev A., Anikin A., Denisov M., Petrova T. 2021. Intelligent Approaches for the Automated Domain Ontology Extraction. In: Advanced Network Technologies and Intelligent Computing. Springer, 81-91. DOI: 10.1007/978-981-96-8197-6_7.

Kuratov Y., Arkhipov M. 2019. Adaptation of deep bidirectional multilingual transformers for Russian language. arXiv preprint arXiv:1905.07213.

Pan S., Luo L., Wang Y., et al. 2023. Unifying Large Language Models and Knowledge Graphs: A Roadmap. arXiv preprint arXiv:2306.08302.

Petroni F., Rocktäschel T., Lewis P., et al. 2019. Language models as knowledge bases? In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China, Association for Computational Linguistics, 2763-2773. DOI: 10.18653/v1/D19-1282.

Ye H., Zhang N., Deng S., et al. 2022. Ontology-enhanced Prompt-tuning for Few-shot Learning. arXiv preprint arXiv:2201.11332.

Zhao B., Ji C., Zhang Y., et al. 2023. Large language models are complex table parsers. arXiv preprint arXiv:2312.11521.

Zheng J., Xiang Z., Stoeckert Jr C.J., He Y. 2014. OntoDog: a web-based ontology community view generation tool. Bioinformatics, 30(9): 1340-1342. DOI: 10.1093/bioinformatics/btt761.

Zheng L., Guha N., Anderson B.R., Henderson P., Ho D.E. 2021. When does pre-training help? assessing self-supervised learning for law and the casehold dataset of 53,000+ legal holdings. In: Proceedings of the 18th International Conference on Artificial Intelligence and Law (ICAIL 2021). São Paulo, Brazil, ACM, 159-168. DOI: 10.1145/3462757.3462772.


Published

2026-03-30

How to Cite

Katyshev, A. M., & Anikin, A. V. (2026). Effective BERT and GPT Integration for Ontology Development. Economics. Information Technologies, 53(1), 144-152. https://doi.org/10.52575/2687-0932-2026-53-1-144-152

Issue

Section

SYSTEM ANALYSIS AND PROCESSING OF KNOWLEDGE