Application of Large Language Models and the RAG in Intelligent Educational Ecosystems

Denis M. Obolensky; Victoria I. Shevchenko

doi:10.52575/2687-0932-2024-51-3-699-709

Authors

Denis M. Obolensky, New Technologies LLC
Victoria I. Shevchenko, Sevastopol State University

Keywords:

RAG, LLM, intelligent educational ecosystem, large language models, Python, LangChain

Abstract

The article examines the use of Retrieval-Augmented Generation (RAG) and large language models (LLMs) in intelligent educational ecosystems. The authors show that large language models can improve the representation of educational resources, job vacancies, and user preferences in recommender systems. RAG is considered as a way to supplement a large language model's knowledge with new data without additional training. An example implementation in an intelligent educational ecosystem uses the LangChain library, the GigaChat large language model, and the Qdrant vector database, populated with descriptions of job vacancies and educational resources, to generate a user-friendly description of the labor market in response to the user's request.
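
As a rough illustration of the retrieve-then-generate idea summarized in the abstract, the sketch below ranks a few documents against a query and assembles the retrieved text into a prompt. It is not the authors' implementation: the corpus, the bag-of-words `embed` function, and the prompt template are hypothetical stand-ins for the Qdrant collection, a real embedding model, and the GigaChat call made through LangChain.

```python
import math
import re
from collections import Counter

# Hypothetical corpus standing in for vacancy and course descriptions.
DOCUMENTS = [
    "Vacancy: Python backend developer, requires Django and PostgreSQL",
    "Course: Introduction to machine learning with Python and scikit-learn",
    "Vacancy: Data analyst, requires SQL, Excel and data visualization",
]


def embed(text):
    """Toy bag-of-words vector (assumption); a real system would use an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Place the retrieved documents into the prompt that would be sent to the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("Which vacancies require SQL?", DOCUMENTS, k=1))
```

The model itself is never retrained here: new vacancies or courses only need to be added to the document store, which is the property of RAG the abstract highlights.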

Author Biographies

Denis M. Obolensky, New Technologies LLC

Senior software developer, New Technologies LLC, Sevastopol, Russia

E-mail: denismaster@outlook.com

Victoria I. Shevchenko, Sevastopol State University

Candidate of Technical Sciences, Associate Professor, Head of the Basic Department "Corporate Information Systems", Sevastopol State University, Sevastopol, Russia

E-mail: VIShevchenko@sevsu.ru
