VII 1
Детаљи сесије / Session details
08.06.2026. 09:00–11:00
Председавајући / Chair: Miljan Vučetić, Nemanja Ilić
Институција / Institution: Vlatacom Institute of High Technology, Belgrade, Serbia
- VII1.1 An Analysis of the Performance of Models for Direct Serbian Speech-to-English Translation
Кључне речи / Keywords: automatic speech translation, automatic speech recognition, machine translation, whisper, seamlessm4t
Апстракт / Abstract
Automatic speech translation has become an important area
of research with the development of deep learning and large
multi-lingual models. In this study, we investigate
Serbian-to-English speech translation using Whisper and
SeamlessM4T models. We evaluate performance in three
scenarios: zero-shot speech translation; a cascaded
approach in which ASR models are fine-tuned on Serbian using
the Južne Vesti and ParlaSpeech corpora and translation is
performed with both proprietary (GPT-4o) and open-source
models (Salamandra-7B-Instruct and EuroLLM-1.7B); and
end-to-end adaptation, in which models are fine-tuned
directly on a parallel Serbian–English corpus for
speech-to-text translation.
Evaluation on the Južne Vesti test set using BLEU, METEOR,
and WER metrics shows that direct fine-tuning improves
translation performance compared to zero-shot and cascaded
approaches, while open-source translation models could
provide a viable alternative when proprietary systems are
unavailable. These results highlight the importance of
task-specific adaptation and parallel speech–translation
data in improving translation quality for low-resource
languages like Serbian.
- VII1.2 To slide or to snip? LLM fine-tuning for sentiment analysis of long Serbian movie reviews
Кључне речи / Keywords: Sentiment analysis, Serbian NLP, BERTić, Document-level classification, SerbMR
Апстракт / Abstract
We establish the first fine-tuned transformer results on
the SerbMR dataset, evaluating strategies for long-form
sentiment analysis in Serbian, a morphologically rich
language. Using BERTić, a widely used large language model
for Serbian and closely related languages, we compare two
strategies for handling the limited context window:
truncation (tail and head) and sliding-window-based
aggregation (mean pooling, max pooling, majority voting,
and a rule-based heuristic). Evaluation is performed across
binary (2-class) and ternary (3-class) classification
schemes. The fine-tuned BERTić achieves 93.76% accuracy on
the binary task and 74.55% on the ternary task, an
improvement of approximately 9 and 12.5 percentage points
respectively over the best reported results of linear
classifiers. The sliding window with mean pooling proves to
be the strongest overall strategy, most notably on the
harder ternary task, but its 45% runtime overhead offers
only marginal gains over head truncation in the binary
setting. The consistent advantage of head over tail
truncation across all conditions suggests that evaluative
language in Serbian movie reviews concentrates towards the
end of the text.
- VII1.3 Fuse-T: Gated Residual Late Fusion of Text Semantics and Thread Topology for Unseen-Event Rumour Classification in Conversational Reply Graphs
Кључне речи / Keywords: rumour detection, early classification, multi-modal fusion, graph neural networks, GraphSAGE, RoBERTa, PHEME
Апстракт / Abstract
Rumours on social media spread rapidly during breaking
events, while early evidence is often sparse, noisy, and
highly event-specific. Text-only classifiers can overfit to
keywords and writing style tied to particular events, while
structure-only propagation models struggle when discussion
trees are small. This paper presents Fuse-T, a gated
residual late-fusion architecture for binary rumour
classification on conversational reply graphs. Fuse-T uses a
pre-trained language model as a stable semantic backbone and
injects graph propagation cues as a controlled additive
residual. A learned element-wise gate modulates the injected
topology signal and is initialized near zero to avoid
covariate shift on the text classifier. We evaluate Fuse-T
on leave-one-event-out (LOEO) generalization across seven
events from the PHEME collection. Across events, Fuse-T
improves average Macro-F1 from 62.13% (text-only RoBERTa)
and 62.42% (text-attributed GNN baseline) to 65.68%. In an
early-detection setting using only the first 10 minutes of
replies, Fuse-T retains robust performance (about 63%
Macro-F1), while structure-only models degrade
substantially.
- VII1.4 Evaluation of VGG16-Based Transfer Learning Strategies for Pollen Classification from Reconstructed Digital Holograms
Кључне речи / Keywords: Machine learning, transfer learning, image classification, digital holography, pollen grains
Апстракт / Abstract
Reliable pollen classification is an essential step in
estimating airborne pollen concentration, a key indicator
for environmental monitoring and allergy forecasting. This
study addresses the problem of classifying six pollen taxa
from reconstructed digital holographic images, as a step
toward potential improvement of automated concentration
assessment systems. Four training strategies were
evaluated. Three VGG models were trained end-to-end: one
with randomly initialized weights, another initialized with
frozen pretrained ImageNet weights, and a third fine-tuned
entirely by allowing all layers to update during training.
In the fourth strategy, a fine-tuned VGG was employed as a
feature extractor and combined with a support vector
machine (SVM) classifier. Despite strong
morphological similarities among certain pollen taxa, all
models showed promising results, with fine-tuned strategies
achieving the best performances, demonstrating the
capability of deep learning for accurate pollen
classification applicable to real-time monitoring systems.
The experiments confirmed a high degree of robustness and
generalization, paving the way for the development of
enhanced methodologies that can further improve the
reliability of the classification and expand to a larger
number of pollen taxa.
- VII1.5 Automated Chemical Vulnerability Assessment of Canvas Paintings from XRF Spectral Imaging Using Deep Learning and Foundation Models
Кључне речи / Keywords: XRF, Spectral Imaging, Deep Learning
Апстракт / Abstract
This paper presents the design of an automated pipeline for
chemical vulnerability assessment of canvas paintings from
XRF spectral imaging. The design is based on the recognition
of the materials used (from XRF data) and on the
physico-chemical prediction of time- and condition-dependent
degradation of these materials. A pipeline was developed
that processes XRF spectra and, for each element, calculates
the signal intensity by integrating within a window around
the corresponding peak, with the linearly interpolated
background below the peak subtracted. Two-dimensional maps
of the elements
(calcium, titanium, iron, copper, and lead - single emission line)
were created using the resulting peak intensities. To
evaluate the reproducibility of the inputs, Pearson
correlation coefficients were calculated for two independent
scans on detector 10264. The coefficients for the relevant
elements (Ca, Ti, Fe, Cu, Pb) achieve r > 0.98, confirming
excellent quantitative reproducibility and that the
instrumental setup remains stable over time. The system
chains self-supervised denoising, physics-based element
extraction, NMF decomposition, a literature-grounded CVI,
and SAM-based region segmentation, requiring no
expert-labeled training data. Thirteen segments were
automatically identified by SAM, and six per-region mean CVI
values for the highest-risk segments were labeled. The whole
procedure typically requires extensive expert analysis,
whereas with this pipeline the process takes under 70
seconds (on an Intel Core i5-12450H with 16GB RAM).
- VII1.6 Robustness of Graph Neural Networks under Structural and Feature Corruptions
Кључне речи / Keywords: GNN, robustness, graph corruptions, inference-time evaluation, node classification, homophily
Апстракт / Abstract
GNNs are successfully used in various tasks and domains
that involve graph data. However, their
robustness under realistic, non-adversarial corruptions is
underexplored. In this paper, we provide a systematic
evaluation of GNNs under structural and feature corruptions
at different levels of severity. To provide a more
comprehensive evaluation, four models are used: GCN, GAT,
GraphSAGE, and GIN, and four datasets: Cora, CiteSeer,
PubMed, and OGBN-ArXiv. To ensure our findings are
statistically meaningful, a rigorous statistical evaluation
is conducted with multiple seeds, including Wilcoxon
signed-rank tests for pairwise comparison,
Benjamini-Hochberg false discovery rate correction, and
effect sizes.
Results indicate that feature corruptions are more damaging
to the model performance than structural ones under
inference-time corruptions. Robustness of GNNs is driven
more by corruption type and severity than by the model
architecture. Also, homophily and calibration have
statistically significant but limited correlation with
predictive performance. Therefore, to properly evaluate
GNNs' reliability, clean-test accuracy is insufficient;
rather, the evaluation of model performance under diverse
corruption settings using a statistically rigorous
procedure is necessary.
- VII1.7 Flow Matching Policy for Behavioral Cloning
Кључне речи / Keywords: behavioral cloning, flow matching policy, Gaussian policy, diffusion policy
Апстракт / Abstract
Behavioral cloning (BC) is a foundational imitation
learning paradigm, but many standard continuous-control BC
baselines rely on unimodal Gaussian policies or other
relatively low-expressivity action parameterizations.
Consequently, they struggle to capture the complex,
multi-modal strategies present in diverse offline datasets,
such as those containing human, medium-quality, or mixed
trajectories, leading to a significant performance gap. To
address this limitation, we introduce the Flow Matching
Policy (FMP), a highly expressive representation for
continuous control BC. Our approach models the conditional
action distribution as a continuous-time normalizing flow,
learning an observation-conditioned velocity field to
transport a simple base noise distribution into the
empirical action distribution. Evaluations against strong
Gaussian and diffusion policy baselines across standard
continuous control benchmarks demonstrate that the FMP
consistently achieves competitive or superior performance.
These results suggest that continuous-time flow models are
a promising alternative for capturing highly complex and
varied behaviors from noisy data.
- VII1.8 Metaheuristic Optimization of Boosting and Hybrid Machine Learning Models for IoT Intrusion Detection: A Review
Кључне речи / Keywords: IoT, cyber security, intrusion detection, AI, metaheuristics
Апстракт / Abstract
The rapid expansion of Internet of Things (IoT)
infrastructures has significantly increased the attack
surface of modern digital ecosystems. Due to constrained
computational resources and heterogeneous device
architectures, traditional intrusion detection mechanisms
are often inadequate for IoT environments. Machine learning
techniques have emerged as efficient solutions for
intelligent intrusion detection systems, while metaheuristic
optimization algorithms have demonstrated substantial
improvements in feature selection and hyperparameter tuning
processes.
This paper provides a comprehensive review of
metaheuristic-optimized machine learning approaches for IoT
intrusion detection. Existing methods are systematically
categorized based on optimization strategy, learning
architecture, and application domain. Particular emphasis is
placed on boosting-based models, deep learning frameworks,
and hybrid multi-level optimization strategies. Furthermore,
commonly used datasets, evaluation methodologies, and
performance trends are analyzed to identify current research
directions and limitations. The study highlights emerging
trends, including domain-specific IoT applications and
hybrid optimization frameworks, while identifying open
challenges related to benchmarking, deployment efficiency,
and generalization across heterogeneous IoT scenarios.
- VII1.9 A comparative study of KAN and Neural ODE models for LR-DDoS attack detection in IoT networks
Кључне речи / Keywords: Low-rate DDoS, neural networks, KAN, Neural ODE, IoT, datasets
Апстракт / Abstract
In recent years, cybersecurity has become a critical aspect
of modern network systems, as Internet of Things (IoT)
networks continue to grow rapidly across various
application domains. While these advancements bring
significant benefits in terms of connectivity and
functionality, they also introduce increased security and
privacy risks. One of the most challenging threats is the
Low-Rate DDoS (LR-DDoS) attack, which can cause significant
damage to target systems using minimal resources and
traffic, typically representing only a small portion of
total network activity, making detection highly difficult.
In this paper, we propose and evaluate two models, a
modified Kolmogorov–Arnold Network (KAN) and a Neural
Ordinary Differential Equation (Neural ODE), under the same
experimental conditions. Experiments are conducted on the
CICIoT2023 dataset using a 5-fold cross-validation strategy
and standard evaluation metrics including Accuracy,
Precision, Recall, and F1-score. The results show that both
models achieve high and stable performance, with accuracy
above 96% and strong generalization capabilities. Overall,
the study demonstrates the effectiveness of both approaches
while highlighting their different trade-offs in detection
performance and computational efficiency.
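
Editorial illustration (not part of the abstract): the evaluation protocol named above, 5-fold cross-validation scored with Accuracy, Precision, Recall, and F1-score, can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the function names `k_fold_indices` and `binary_metrics` and the toy labels are hypothetical, the folds are contiguous rather than shuffled or stratified, and the task is treated as binary (attack = 1, benign = 0). The authors' actual splitting and scoring code is not described in the abstract.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; return (train, test) pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # The first (n % k) folds get one extra sample so every index is used once.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        splits.append((train, test))
        start = stop
    return splits


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive class)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy usage: hypothetical labels and predictions, not data from the paper.
labels = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
predictions = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
for fold, (train_idx, test_idx) in enumerate(k_fold_indices(len(labels), k=5)):
    # A real experiment would fit a model on train_idx here before scoring.
    fold_scores = binary_metrics(
        [labels[i] for i in test_idx], [predictions[i] for i in test_idx]
    )
    print(fold, fold_scores)
```

Because LR-DDoS traffic is described as a small portion of total network activity, a stratified split that preserves the class ratio in every fold would likely be preferable in practice to the contiguous split used in this sketch.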
