Automatic Detection of Anger and Aggression in Speech Signals
DOI: https://doi.org/10.52575/2712-746X-2023-50-4-944-954

Keywords: speech data, speech databases, classification, classification methods, low-level descriptors, anger recognition, aggression recognition

Abstract
The article addresses the detection of anger and aggression in a speech signal. The fundamental differences between anger and aggression are considered, and a review is given of solutions for recognizing destructive behavior in the form of anger and aggression from a speech signal, as presented in recent publications. The main classification methods used for recognizing emotions from speech are examined. Information support in the form of Russian-language and non-Russian-language speech databases used to train emotion recognition models is analyzed, and the main problems of using such speech databases are formulated. The choice of speech signal parameters used to classify emotions in general, and destructive behavior in particular, is considered. Anger recognition is implemented on the Russian-language Dusha database using two approaches and three classification methods.
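As an illustration of the feature-extraction-plus-classification pipeline the abstract describes, the sketch below extracts low-level descriptors (eGeMAPS functionals via the openSMILE Python wrapper) and trains a classifier to separate angry from neutral utterances. The file list, labels, and the choice of an SVM are assumptions for illustration, not the authors' exact setup or results on Dusha.

```python
# A minimal sketch of anger detection from speech, assuming a set of
# labeled WAV files. The eGeMAPS feature set and the SVM classifier are
# illustrative choices; the paper's exact features and models may differ.
import opensmile
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical corpus: pairs of (path to WAV file, label), e.g. drawn
# from a Dusha-like dataset with anger = 1, neutral = 0.
files = [("clips/anger_001.wav", 1), ("clips/neutral_001.wav", 0)]

# Extract utterance-level functionals of low-level descriptors with openSMILE.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
X = np.vstack([smile.process_file(path).to_numpy() for path, _ in files])
y = np.array([label for _, label in files])

# Train and evaluate a simple anger-vs-neutral classifier.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```

The same feature matrix could feed any of the classification methods the article compares; swapping the SVC for another scikit-learn estimator changes one line.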
Copyright © 2023 Economics. Information Technologies. This work is licensed under a Creative Commons Attribution 4.0 International License.