AKI1
Детаљи сесије / Session details
09.06.2026. 09:00–11:00
Председавајући / Chair: Miomir Mijić, Dejan Ćirić
Институција / Institution: Univerzitet u Beogradu - Elektrotehnički fakultet, Beograd, Srbija | Univerzitet u Nišu - Elektronski fakultet, Niš, Srbija
- AKI1.1 Advancing Assistive Speech Technologies for Inclusion Without Linguistic Barriers
Кључне речи / Keywords: speech technologies, assistive technologies, text-to-speech (TTS) systems, screen readers, augmentative and alternative communication (AAC)
Апстракт / Abstract
Advancements in assistive speech technologies have
revolutionized communication for people with disabilities
worldwide, advancing their inclusion in society. Their
application for less-resourced languages remains a
challenge due to the limited amount of data available for
training speech synthesis models. This makes it difficult
to ensure adequate representation and inclusion for
speakers of these languages. The problem is more pronounced
for children who require age-appropriate text-to-speech
(TTS) voices to serve as their personal identity in
augmentative and alternative communication (AAC) devices.
To bridge this gap, the Speech Group at FEEIT, UKIM is
spearheading the development of speech synthesis voices.
Key achievements include the creation of "Suze," a free,
high-quality TTS voice targeting on-device off-line use,
specifically for the Cboard AAC app, but also for screen
readers for mobile and PC. Expanding on this success, the
team recently developed child voices for Macedonian and
other European languages for AAC within the VoiceKids
project. Through ongoing research and long-standing
partnerships with assistive technology organizations,
we drive regional innovation, ensuring that linguistic
barriers do not hinder accessibility or social inclusion in
the digital age.
- AKI1.2 Direct Localization of Acoustic Impulse Sources in Outdoor Environments: Experimental Validation
Кључне речи / Keywords: Direct localization, TDOA method, microphone arrays, ambiguity function, association problem
Апстракт / Abstract
Results of the experimental validation of a method for
direct localization of acoustic impulse sources in an outdoor
environment, previously proposed by the authors, are presented.
Based on the ambiguity function formulated for collocated
and distributed antenna arrays, the ambiguity function of the
microphone array was formulated as a tool for characterizing
the non-uniform geometry of the microphone array in terms
of ambiguous grating lobes and side lobe levels.
- AKI1.3 Autoregressive Modeling of Drone Noise Signals
Кључне речи / Keywords: autoregressive modeling, drone noise, noise synthesis, stochastic signal modeling, spectral analysis, time-frequency analysis
Апстракт / Abstract
Drone noise is a growing acoustic issue as unmanned aerial
vehicles become more common in urban and suburban areas,
which motivates methods for its analysis, modeling, and synthesis.
This paper presents a parametric approach to modeling and
synthesizing drone noise signals using autoregressive (AR)
techniques. It examines two methods: short-time AR modeling
and subband AR modeling through filter-bank decomposition.
The short-time method captures temporal changes by adapting
parameters for each frame, while the subband method models
frequency-specific structures by applying separate AR
models to each spectral band. Both approaches produce
synthetic signals by exciting the estimated models with
white Gaussian noise. The results demonstrate that these
techniques effectively replicate the broadband
characteristics and overall spectral envelope of drone
noise. The subband method more accurately captures
frequency-localized details, whereas the short-time method
efficiently models temporal variations. Although fine spectral
features are slightly smoothed, the signals synthesized by both
methods closely resemble the originals. Overall, AR-based
modeling offers a computationally efficient and physically
meaningful means for drone noise analysis and synthesis,
with practical applications in acoustic simulation and
signal processing.
- AKI1.4 A CNN-Based Bimodal Speech Recognition Framework with MFCC and TEMFCC Features
Кључне речи / Keywords: Speech recognition, Whisper, Convolutional neural networks, Mel scale, Teager energy operator, Cepstral mean subtraction
Апстракт / Abstract
This paper presents the results of bimodal speech
recognition (normal and whispered speech) under specific
conditions. The front end of the Automatic Speech
Recognition (ASR) system is based on both Mel Frequency
Cepstral Coefficients (MFCC) and Teager Energy Mel
Frequency Cepstral Coefficients (TEMFCC). The back end
employs Convolutional Neural Networks (CNNs) for
classification. Speech samples are taken from the Whi-Spe
database, and Cepstral Mean Subtraction (CMS) is applied as
a standard normalization technique. The results are
presented through tables and histograms, enabling a
comparison of recognition performance between MFCC and
TEMFCC features under both matched and mismatched
conditions.
- AKI1.5 Measurement of Sound Absorption Coefficient under Anechoic Conditions by Transfer Function Method
Кључне речи / Keywords: Sound absorption coefficient, Anechoic measurements, Acoustic material characterization
Апстракт / Abstract
The sound absorption coefficient is a fundamental parameter
for characterizing acoustic materials, widely used in
building acoustics, noise control engineering, and material
design. Its accurate determination, however, remains a
challenging task due to the strong dependence on
measurement conditions, sound field characteristics, and
experimental assumptions. Traditional methods, such as
reverberation room and impedance tube techniques, provide
standardized but often limited information, particularly
with respect to angle-dependent behavior and intrinsic
material properties.
This paper provides an overview of existing approaches for
measuring the sound absorption coefficient, with particular
emphasis on free-field and anechoic conditions. Key
challenges, including low-frequency limitations, finite
sample effects, and sensitivity to measurement geometry,
are discussed. A measurement methodology based on a
semi-arc configuration is then introduced. The proposed
system employs a swept-sine excitation and a transfer
function approach using three microphones to estimate the
reflection properties of the material. Two realizations of
the semi-arc measurement setup are presented, enabling
flexible control of incidence angles and improved
characterization capabilities. The paper aims to contribute
to the ongoing development of reliable and versatile
measurement techniques for sound absorption under
controlled acoustic conditions.
- AKI1.6 Bee Sound Acquisition in a Natural Environment
Кључне речи / Keywords: Bee sound acquisition, MEMS microphones, Raspberry Pi, Signal processing, Acoustic feature extraction, Mel spectrogram
Апстракт / Abstract
The preservation of bee populations is essential for
maintaining global ecological balance and agricultural
productivity. Given that traditional hive monitoring
techniques are invasive and labor-intensive, acoustic
monitoring has emerged as a superior non-invasive
alternative. This paper presents the development and field
implementation of an autonomous system for bee sound
acquisition and advanced signal characterization. The
hardware architecture is based on a Raspberry Pi 4 platform
integrated with high-precision MEMS microphones,
specifically designed for continuous operation in natural,
outdoor environments. The study details a robust digital
signal processing pipeline, including multi-stage
filtering, signal stabilization, and the extraction of
critical acoustic features such as Zero-Crossing Rate
(ZCR), Root Mean Square (RMS) energy, and Mel-Frequency
Cepstral Coefficients (MFCC). By utilizing Mel spectrograms
and spectral analysis, the system provides a detailed
time-frequency representation of the colony's acoustic
signatures. The results demonstrate the effectiveness of
the proposed acquisition protocol in isolating biological
signals from environmental noise, establishing a reliable
foundation for long-term acoustic surveillance and
biodiversity monitoring in apiculture.
- AKI1.7 Design and Development of a Serbian Voice-Controlled Calculator
Кључне речи / Keywords: Voice calculator, Arithmetical operation, Mel scale, cepstral coefficients, speech recognition, Neural networks
Апстракт / Abstract
This paper presents the design and development of a
Serbian voice-controlled calculator. The system is
implemented through several stages, including parsing the
input sentence to identify operands and operators, speech
recognition using neural networks, conversion of recognized
speech into numerical values, and synthesis of the final
spoken result. The developed models cover the digits (zero
to nine), three basic arithmetic operations (addition,
subtraction, multiplication), and one relation (equal) in
Serbian. All speech signals are represented as feature
vectors composed of 12 Mel-frequency cepstral coefficients
(MFCCs). The neural network architecture consists of two
layers. Each word is represented by 20 frames. The
performance of the system is evaluated in terms of word
recognition rate, with results presented in tables and
histograms.
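
The parsing and number-conversion stages described in the AKI1.7 abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The Serbian word lists, the function name `evaluate`, and the restriction to single-digit operands in the fixed pattern "digit, operator, digit, jednako" are all assumptions made for illustration, since the abstract names the word classes but not the exact vocabulary or grammar. Speech recognition and synthesis are omitted; the input is taken to be the already recognized word sequence.

```python
# Hypothetical vocabulary (assumption): the abstract lists digits zero to nine,
# three operations and one relation, but not the exact Serbian words used.
DIGITS = {
    "nula": 0, "jedan": 1, "dva": 2, "tri": 3, "četiri": 4,
    "pet": 5, "šest": 6, "sedam": 7, "osam": 8, "devet": 9,
}
OPERATORS = {
    "plus": lambda a, b: a + b,   # addition
    "minus": lambda a, b: a - b,  # subtraction
    "puta": lambda a, b: a * b,   # multiplication
}
EQUALS = "jednako"  # the single relation word


def evaluate(words):
    """Evaluate a recognized sequence of the assumed form:
    <digit> <operator> <digit> jednako
    and return the numerical result."""
    if len(words) != 4 or words[3] != EQUALS:
        raise ValueError("expected: <digit> <operator> <digit> jednako")
    left, op, right = words[0], words[1], words[2]
    if left not in DIGITS or right not in DIGITS:
        raise ValueError("operands must be single digit words")
    if op not in OPERATORS:
        raise ValueError(f"unknown operator: {op}")
    # Convert the digit words to numbers and apply the operation.
    return OPERATORS[op](DIGITS[left], DIGITS[right])
```

For example, `evaluate("tri plus četiri jednako".split())` returns 7. A result such as 81 or -3 lies outside the zero-to-nine vocabulary, so a spoken answer would need a larger set of number words than the recognizer itself.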
