Publications | Yiran Pang

2026

Decoupling Shared and Personalized Knowledge: A Dual-Branch Federated Learning Framework for Multi-Domain with Non-IID Data

Yiran Pang, Zhen Ni, and Xiangnan Zhong

In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-26), 2026


Federated learning (FL) enables collaborative model training without centralizing data. In multi-domain scenarios with non-independent and identically distributed (non-IID) data, prediction performance is often hindered by catastrophic forgetting of specialized local knowledge and negative transfer from conflicting client updates. To address these challenges, we propose a personalized FL framework with a dual-branch structure (pFedDB) and a two-phase training protocol. The dual-branch architecture separates the model into a shared branch for cross-client aggregation and a private branch that remains on each local client. The private branch is never overwritten by server updates, which prevents the catastrophic forgetting of domain-specific knowledge. This structure also reduces communication overhead per round, as only the shared branch is transmitted. To mitigate negative transfer, our two-phase protocol first establishes a personalized knowledge anchor by training a single-branch expert model on each client’s local data. In the second phase, the locally trained model is cloned to initialize the private and shared branches. Only the shared branch is aggregated in federated training. This process enables the shared branch to learn a general representation that complements the established local expertise. This design consistently improves the performance of every client over its single-domain baseline, overcoming the challenge of negative transfer among clients. Experiments on our new Chest-X-Ray-4 suite and three public benchmarks show that the proposed pFedDB method achieves a 30% saving in communication overhead per round and competitive or better accuracy than recent FL methods.
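
The dual-branch aggregation described in this abstract can be illustrated with a minimal sketch. It assumes that model weights are plain lists of floats keyed by layer name and that the server takes an unweighted mean of the shared branches; neither detail is stated in the abstract, and the function names are hypothetical.

```python
from copy import deepcopy


def init_dual_branch(expert_weights):
    """Start of phase two: clone the locally trained expert into both branches."""
    return {
        "shared": deepcopy(expert_weights),
        "private": deepcopy(expert_weights),
    }


def aggregate_shared(clients):
    """Average only the shared branch across clients.

    Assumption: an unweighted mean; the paper may weight clients differently.
    """
    n = len(clients)
    first = clients[0]["shared"]
    return {
        name: [
            sum(c["shared"][name][i] for c in clients) / n
            for i in range(len(values))
        ]
        for name, values in first.items()
    }


def broadcast_shared(clients, global_shared):
    """Overwrite each client's shared branch; the private branch is untouched."""
    for c in clients:
        c["shared"] = deepcopy(global_shared)
```

Only the `shared` entry would travel between client and server in this sketch, which is where the per-round communication saving comes from.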

2025

Is OpenVLA Truly Robust? A Systematic Evaluation of Positional Robustness

Yiran Pang, Yiheng Zhao, Zhuopu Zhou, Tingkai Hu, and Ranxin Hou

In Proceedings of the International Joint Conference on Natural Language Processing and Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP–AACL), 2025


Personalized Observation Normalization for Federated Reinforcement Learning in Simulation Environments with Heterogeneity

Yiran Pang, Zhen Ni, and Xiangnan Zhong

In 2025 International Joint Conference on Neural Networks (IJCNN), 2025


Federated reinforcement learning (FedRL) enables multiple agents to collaboratively train a global policy without sharing raw data, making it ideal for privacy-sensitive applications. However, FedRL faces challenges in heterogeneous environments where differing state-transition dynamics lead to non-identical input distributions and imbalanced parameter updates during aggregation. Therefore, this paper develops a personalized observation normalization (PON) method, allowing each agent to locally normalize raw state inputs using a continuously updated running mean and variance. This design ensures consistent scaling of local features, so that no agent’s features overshadow another’s during aggregation. Furthermore, we demonstrate that sharing normalization parameters across agents is ineffective due to the diverse local input distributions, which highlights the necessity of personalized statistics. Experiments on heterogeneous MuJoCo tasks show that the proposed PON accelerates training and achieves superior performance compared to baseline methods.
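
A running mean and variance of the kind mentioned in this abstract can be maintained incrementally. The sketch below uses Welford's online update, which is one common choice and an assumption here; the abstract does not say which update rule the paper uses. The class name `RunningNormalizer` and the `eps` parameter are hypothetical.

```python
import math


class RunningNormalizer:
    """Per-agent observation normalizer; its statistics stay on the agent."""

    def __init__(self, dim, eps=1e-8):
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations
        self.eps = eps

    def update(self, obs):
        """Fold one raw observation into the running statistics (Welford)."""
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs):
        """Scale an observation with this agent's own mean and variance."""
        out = []
        for i, x in enumerate(obs):
            var = self.m2[i] / self.count if self.count > 0 else 1.0
            out.append((x - self.mean[i]) / math.sqrt(var + self.eps))
        return out
```

In a federated loop built on this sketch, each agent would keep its own `RunningNormalizer` and exclude it from aggregation, in line with the abstract's finding that shared normalization parameters are ineffective.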

A fast federated reinforcement learning approach with phased weight-adjustment technique

Yiran Pang, Zhen Ni, and Xiangnan Zhong

Neurocomputing, 2025


Federated reinforcement learning (FRL) enables multiple agents to learn collaboratively without directly sharing their local data. This method addresses data privacy concerns in distributed systems. However, FRL faces challenges such as high communication costs, since it requires extensive interactions to achieve satisfactory performance. Therefore, this paper develops a fast FRL method with a dynamic aggregation coefficient to reduce the communication load during the learning process. Unlike traditional FRL techniques, which rely on static averaging, our approach begins by setting the initial aggregation coefficient to the logarithm of the number of participating agents. This elevated coefficient can enhance the early integration of updates from distributed agents and facilitate a rapid initial learning phase. As communications progress, the aggregation coefficient linearly decreases, transitioning to an average aggregation by the end of the specified interval. This gradual reduction aligns individual learning updates more closely over time, shifting towards a unified global learning model. Furthermore, we implement a value-clipping strategy to constrain global updates within a predefined safe range, thus safeguarding against potential overflow issues. The aggregation coefficient stabilizes after the initial aggressive integration phase to ensure training stability. The boundedness analysis of the model aggregation confirms that, despite the high initial coefficient, the parameters of the global model remain within manageable limits on the FRL server. This strategy is applicable to both tabular and deep learning methods. We validate the designed algorithm on navigation and control tasks, including heterogeneous environments where distinct state transitions and dynamics are designed for each agent. The experimental results demonstrate that our proposed approach achieves faster convergence across various environments.
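
The phased coefficient schedule in this abstract can be sketched as follows. Several details are assumptions made only for illustration: the coefficient is taken to multiply the mean of the agents' updates (so a value of 1 corresponds to plain averaging), the logarithm is taken to be natural, and clipping is applied symmetrically around zero. The function names and the `decay_rounds` and `clip` parameters are hypothetical.

```python
import math


def aggregation_coefficient(round_idx, num_agents, decay_rounds):
    """Linearly move from log(num_agents) to 1.0 over decay_rounds rounds.

    Assumption: 1.0 means plain averaging. For fewer than three agents,
    log(num_agents) is below 1, so the schedule would rise instead of fall.
    """
    start = math.log(num_agents)
    end = 1.0
    if round_idx >= decay_rounds:
        return end
    frac = round_idx / decay_rounds
    return start + (end - start) * frac


def aggregate(updates, coef, clip):
    """Scale the mean update by coef, then clip each entry to [-clip, clip]."""
    n = len(updates)
    dim = len(updates[0])
    result = []
    for i in range(dim):
        mean_i = sum(u[i] for u in updates) / n
        result.append(max(-clip, min(clip, coef * mean_i)))
    return result
```

With ten agents, this schedule starts near 2.3 and reaches 1.0 at round `decay_rounds`, after which the aggregation is an ordinary average.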

Integration of a new layer normalization process into federated reinforcement learning for environments with heterogeneous attribute spaces

Yiran Pang, Zhen Ni, and Xiangnan Zhong

In Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications VII, SPIE, 2025


Reinforcement learning across multiple environments requires coordination across different domains, each with distinct conditions and resource constraints. The system must integrate information from various environments while ensuring data privacy and security. Federated reinforcement learning (FedRL) provides a practical distributed framework by enabling models to be trained without sharing raw data. This approach not only protects data privacy but also reduces communication overhead. However, applying FedRL still faces challenges. The heterogeneity among multiple environments often leads to data shifts, resulting in decreased performance after parameter aggregation. To address this instability, we incorporate layer normalization into the FedRL framework. Each agent computes normalization statistics based on intermediate features within its neural network, and the server periodically aggregates the learnable affine transformation parameters from all agents. Agents then apply the global parameters to ensure consistent feature scaling and shifting. This approach mitigates the instability caused by distribution shifts in heterogeneous real-world environments. To evaluate our method, we design experiments that simulate real-world challenges by introducing random and designated color schemes in CarRacing to create heterogeneous settings. The results show that incorporating layer normalization into the FedRL framework accelerates training convergence and yields higher cumulative rewards in heterogeneous environments.
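
The split between local statistics and aggregated affine parameters described in this abstract can be sketched in a few lines. The sketch assumes a single feature vector per call and an unweighted mean on the server; the function names and the `eps` default are hypothetical.

```python
import math


def layer_norm(features, gamma, beta, eps=1e-5):
    """Normalize one feature vector with statistics computed locally."""
    n = len(features)
    mean = sum(features) / n
    var = sum((x - mean) ** 2 for x in features) / n
    return [
        g * (x - mean) / math.sqrt(var + eps) + b
        for x, g, b in zip(features, gamma, beta)
    ]


def average_affine(agent_params):
    """Server side: average the (gamma, beta) pairs sent by the agents.

    Assumption: an unweighted mean over agents.
    """
    n = len(agent_params)
    dim = len(agent_params[0][0])
    gamma = [sum(p[0][i] for p in agent_params) / n for i in range(dim)]
    beta = [sum(p[1][i] for p in agent_params) / n for i in range(dim)]
    return gamma, beta
```

Each agent would call `layer_norm` on its own intermediate features and periodically replace its `gamma` and `beta` with the pair returned by `average_affine`.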

2024

Federated Learning for Crowd Counting in Smart Surveillance Systems

Yiran Pang, Zhen Ni, and Xiangnan Zhong

IEEE Internet of Things Journal, Feb 2024


Crowd counting in smart surveillance systems plays a crucial role in Internet of Things (IoT) and smart cities, and can affect various aspects, such as public safety, crowd management, and urban planning. Using surveillance data to centrally train a crowd counting model raises significant privacy concerns. Traditional methods try to alleviate the concern by reducing the focus on individuals, but the concern has not been fully resolved. In this work, we develop a horizontal federated learning (HFL) framework to train crowd counting models while preserving privacy. This framework enables the smart surveillance system to learn from model aggregation without accessing the private data stored on local devices. Therefore, it eliminates the need for video data transmission, reduces communication costs, and avoids raw data leakage. Due to the lack of federated learning (FL) crowd counting data sets, we design four non-independent and identically distributed (non-IID) partitioning strategies, including feature-skew, quantity-skew, scene-skew, and time-skew, to simulate real-world FL scenarios. In addition, we present an efficient fully convolutional network (e-FCN) for each client to demonstrate the practical applicability of the proposed framework. The e-FCN adopts an encoder-decoder architecture with fewer parameters, making it communication-friendly and easier to train. This design can achieve competitive performance compared to more complex models for surveillance crowd counting in the literature. Finally, we evaluate the proposed HFL framework with e-FCN under our skew strategies on multiple real-world data sets, including crowd surveillance, ShanghaiTech PartB, WorldExpo’10, FDST, CityUHK-X, UCSD, and MALL. Extensive experiments allow us to present our developed Federated Crowd Counting benchmark as a reference for future research and provide guidance for FL algorithm selection in smart surveillance system deployment.

A Perspective-Embedded Scale-Selection Network for Crowd Counting in Public Transportation

Jun Yi, Yiran Pang, Wei Zhou, Meng Zhao, and Fujian Zheng

IEEE Transactions on Intelligent Transportation Systems, May 2024


Crowd counting in congested urban transport systems is a highly challenging task for computer vision and deep learning due to several factors such as mutual occlusion, perspective change, and large-scale variations. In this paper, a novel perspective-embedded scale-selection multi-column network named PESSNet is proposed for crowd counting and high-quality density map generation. The proposed method aligns the branches to various scales by leveraging different receptive fields, and utilizes perspective parameters to adjust the sensitivity of each branch to different perspective areas in the scene. Specifically, the PESSNet consists of four key components: 1) a feature pyramid network (FPN) fuses multi-stage features extracted from the backbone network; 2) a scale-selection dilated layer (SSDL) extracts features by using different dilated convolution kernels for each stage; 3) a perspective-embedded fusion layer (PEFL) encodes the spatial perspective relationships across all feature levels into the network and provides a more effective fine-grained weight map; and 4) a density maps generator (DMG) employs a deconvolution layer as a decoder to generate high-quality density maps. The above strategies maximize the ability of the multi-column network to extract the features of instances with various scales. Extensive experiments on seven crowd counting benchmark datasets, JHU-CROWD, ShanghaiTech, UCF-QNRF, ShanghaiTechRGBD, WorldEXPO’10, TRANCOS, and NWPU-Crowd, indicate that PESSNet achieves reliable recognition performance and high robustness across different crowd counting scenarios.

MSDCNN: A multiscale dilated convolution neural network for fine-grained 3D shape classification

Wei Zhou, Fujian Zheng, Yiheng Zhao, Yiran Pang, and Jun Yi

Neural Networks, May 2024


Multi-view deep neural networks have shown excellent performance on 3D shape classification tasks. However, global features aggregated from multi-view data often lack content information and spatial relationships, which makes it difficult to identify the small variance among subcategories in the same category. To solve this problem, in this paper, a novel multiscale dilated convolution neural network termed MSDCNN is proposed for multi-view fine-grained 3D shape classification. Firstly, a sequence of views is rendered from 12 viewpoints around the input 3D shape by the sequential view capturing module. Then, the first 22 convolution layers of ResNeXt50 are employed to extract the semantic features of each view, and a global mixed feature map is obtained through the element-wise maximum operation of the 12 output feature maps. Furthermore, an attention dilated module (ADM), which combines four concatenated attention dilated blocks (ADBs), is designed to extract larger receptive field features from the global mixed feature map to enhance context information among the views. Specifically, each ADB consists of an attention mechanism module and a dilated convolution with different dilation rates. In addition, a prediction module with label smoothing, which contains a 3 × 3 convolution and adaptive average pooling, is proposed to classify features. The performance of our method is validated experimentally on the ModelNet10, ModelNet40 and FG3D datasets. Experimental results demonstrate the effectiveness and superiority of the proposed MSDCNN framework for 3D shape fine-grained classification.

LWUAVDet: A Lightweight UAV Object Detection Network on Edge Devices

Xuanlin Min, Wei Zhou, Rui Hu, Yinyue Wu, Yiran Pang, and Jun Yi

IEEE Internet of Things Journal, Jul 2024


Real-time object detection on unmanned aerial vehicles (UAVs) poses a challenging issue due to the limited computing resources of edge devices. To address this problem, we propose a novel lightweight object detection network named LWUAVDet for real-time UAV applications. The detector comprises three core components: E-FPN, PixED Head, and Aux Head. First, we develop an extended and refined topology in the Neck layer, called E-FPN, to enhance the multiscale representation of each stage and alleviate the aliasing effect caused by the repetitive feature fusion of the Neck. Second, we propose a pixel encoder and decoder for dimension exchange between space and channel to achieve flexible and effective feature extraction in the Head layer, named PixED Head. Furthermore, Aux Head for the auxiliary task merely using the Head layer is presented for online distillation to enhance feature representation. Specifically, in Aux Head, we introduce the weighted sum of Focal Loss and complete intersection over union loss for the cost matrix of the sample assigner to alleviate category imbalance and aspect ratio imbalance of the UAV data. The performance of our LWUAVDet is validated experimentally on the NVIDIA Jetson Xavier NX and Jetson Nano GPU devices. Extensive experiments demonstrate that the LWUAVDet models achieve a better tradeoff between accuracy and latency on VisDrone, UAVDT, and VOC2012 data sets compared to state-of-the-art lightweight models.

Adaptable and Reliable Text Classification using Large Language Models

Zhiqiang Wang, Yiran Pang, Yanbin Lin, and Xingquan Zhu

In 2024 IEEE International Conference on Data Mining Workshops (ICDMW), Dec 2024


Text classification is fundamental in Natural Language Processing (NLP), and the advent of Large Language Models (LLMs) has revolutionized the field. This paper introduces an adaptable and reliable text classification paradigm, which leverages LLMs as the core component to address text classification tasks. Our system simplifies the traditional text classification workflows, reducing the need for extensive preprocessing and domain-specific expertise to deliver adaptable and reliable text classification results. We evaluated the performance of several LLMs, machine learning algorithms, and neural network-based architectures on four diverse datasets. Results demonstrate that certain LLMs surpass traditional methods in sentiment analysis, spam SMS detection, and multi-label classification. Furthermore, it is shown that the system’s performance can be further enhanced through few-shot or fine-tuning strategies, making the fine-tuned model the top performer across all datasets. Source code and datasets are available in this GitHub repository: https://github.com/yeyimilk/llm-zero-shot-classifiers.

2023

YOLOTrashCan: A Deep Learning Marine Debris Detection Network

Wei Zhou, Fujian Zheng, Gang Yin, Yiran Pang, and Jun Yi

IEEE Transactions on Instrumentation and Measurement, 2023


Monitoring marine debris has long been a challenging issue owing to the complex and changeable underwater environment. To detect marine debris quickly and accurately, in this article, a novel object detection network termed YOLOTrashCan is proposed for detecting underwater marine debris. The YOLOTrashCan model consists of feature enhancement and feature fusion. In the feature enhancement part, the ECA_DO-Conv_CSPDarknet53 backbone, which combines the efficient channel attention (ECA) module and depthwise over-parameterized convolution (DO-Conv), is proposed to extract the deep semantic features of marine debris. In the feature fusion part, the DPMs_PixelShuffle_PANET module is presented to improve the detection ability for marine debris, where dilated parallel modules (DPMs) with multiscale dilation rates are designed as enhanced feature modules for marine debris objects of different scales. Notably, the size of the network is only 214 MB using the DPMs’ method. Extensive experiments and thorough analysis are conducted on the TrashCan 1.0 dataset. Experimental results show that the proposed algorithm not only improves the detection accuracy of underwater marine debris but also reduces the size of the network model.

A Multi-Scale Spatio-Temporal Network for Violence Behavior Detection

Wei Zhou, Xuanlin Min, Yiheng Zhao, Yiran Pang, and Jun Yi

IEEE Transactions on Biometrics, Behavior, and Identity Science, Apr 2023


Violence behavior detection plays an important role in computer vision and is widely used in unmanned security monitoring systems, Internet video filtration, etc. However, automatically detecting violence behavior from surveillance cameras has long been a challenging issue due to real-time and detection accuracy requirements. In this brief, a novel multi-scale spatio-temporal network termed MSTN is proposed to detect violence behavior from video streams. To begin with, the spatio-temporal feature extraction module (STM) is developed to extract the key features between foreground and background of the original video. Then, temporal pooling and cross channel pooling are designed to obtain short frame rate and long frame rate from STM, respectively. Furthermore, a short-time building (STB) branch and a long-time building (LTB) branch are presented to extract the violence features from different spatio-temporal scales, where the STB module is used to capture the spatial features and the LTB module is used to extract useful temporal features for video recognition. Finally, a Trans module is presented to fuse the features of STB and LTB through a lateral connection operation, where the LTB feature is compressed into STB to improve the accuracy. Experimental results show the effectiveness and superiority of the proposed method in computational efficiency and detection accuracy.

Counting manatee aggregations using deep neural networks and Anisotropic Gaussian Kernel

Zhiqiang Wang, Yiran Pang, Cihan Ulus, and Xingquan Zhu

Scientific Reports, Apr 2023


Manatees are aquatic mammals with voracious appetites. They rely on sea grass as their main food source and often spend up to eight hours a day grazing. They move slowly and frequently stay in groups (i.e., aggregations) in shallow water to search for food, making them vulnerable to environmental change and other risks. Accurately counting manatee aggregations within a region is not only biologically meaningful for observing their habits, but also crucial for designing safety rules for boaters, divers, etc., as well as for scheduling nursing, intervention, and other plans. In this paper, we propose a deep learning based crowd counting approach to automatically count the number of manatees within a region, using low-quality images as input. Because manatees have a unique shape and often stay in groups in shallow water, water surface reflection, occlusion, camouflage, etc. make it difficult to count them accurately. To address these challenges, we propose to use an Anisotropic Gaussian Kernel (AGK), with tunable rotation and variances, to ensure that density functions can maximally capture the shapes of individual manatees in different aggregations. After that, we apply the AGK to different types of deep neural networks primarily designed for crowd counting, including VGG, SANet, Congested Scene Recognition network (CSRNet), MARUNet, etc., to learn manatee densities and calculate the number of manatees in the scene. Using generic low-quality images extracted from surveillance videos, our experimental results and comparisons show that AGK-based manatee counting achieves the minimum Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). The proposed method works particularly well for counting manatee aggregations in environments with complex backgrounds.
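
An anisotropic Gaussian with tunable rotation and variances can be written down directly. The sketch below builds a density map for a single annotated animal and normalizes it to sum to one, so that summing the maps of all animals would give the count. The normalization step, the grid layout, and the function and parameter names are assumptions made for illustration and are not taken from the paper.

```python
import math


def agk_density(height, width, cx, cy, sigma_u, sigma_v, theta):
    """Density map for one object centered at (cx, cy).

    sigma_u and sigma_v are the standard deviations along the kernel's
    own axes; theta rotates those axes relative to the image axes.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    grid = [[0.0] * width for _ in range(height)]
    total = 0.0
    for y in range(height):
        for x in range(width):
            dx, dy = x - cx, y - cy
            # Rotate the offset into the kernel's coordinate frame.
            u = dx * cos_t + dy * sin_t
            v = -dx * sin_t + dy * cos_t
            val = math.exp(-0.5 * ((u / sigma_u) ** 2 + (v / sigma_v) ** 2))
            grid[y][x] = val
            total += val
    # Normalize so the map integrates to one object.
    return [[val / total for val in row] for row in grid]
```

With `sigma_u` larger than `sigma_v` and `theta` set to the body orientation, the kernel stretches along the animal's long axis instead of forming the circular blob of an isotropic Gaussian.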

Large language models are zero-shot text classifiers

Zhiqiang Wang, Yiran Pang, and Yanbin Lin

arXiv preprint arXiv:2312.01044, Apr 2023


Pretrained large language models (LLMs) have become extensively used across various sub-disciplines of natural language processing (NLP). In NLP, text classification problems have garnered considerable focus, but still face limitations related to expensive computational cost, time consumption, and robust performance on unseen classes. With the proposal of chain of thought prompting (CoT), LLMs can be implemented using zero-shot learning (ZSL) with step-by-step reasoning prompts, instead of conventional question and answer formats. Zero-shot LLMs can alleviate these limitations in text classification problems by directly utilizing pretrained models to predict both seen and unseen classes. Our research primarily validates the capability of GPT models in text classification. We focus on effectively applying prompt strategies to various text classification scenarios. In addition, we compare the performance of zero-shot LLMs with other state-of-the-art text classification methods, including traditional machine learning methods, deep learning methods, and ZSL methods. Experimental results demonstrate that LLMs are effective zero-shot text classifiers on three of the four datasets analyzed. This proficiency is especially advantageous for small businesses or teams that may not have extensive knowledge of text classification.

2022

Crowd Density Analysis Method in Scenic Spot Based on Multi-Source Feature Fusion

Yiran Pang

Chongqing University of Science and Technology, Apr 2022


2021

An Improved MVCNN for 3D Shape Recognition

Yan Wang, Wanxia Zhong, Hang Su, Fujiang Zheng, Yiran Pang, Hongchuan Wen, and Kun Cai

In 2021 IEEE International Conference on Emergency Science and Information Technology (ICESIT), Nov 2021


The multi-view convolutional neural network architecture represented by MVCNN has achieved great success in 3D shape recognition. Taking the MVCNN architecture as the research object, this paper proposes a novel 3D shape recognition convolutional neural network, Attention-MVCNN, that integrates a channel attention mechanism, a residual structure, and the Mish activation function. The channel attention mechanism is used to build the feature extraction network of Attention-MVCNN, which can reduce the feature redundancy caused by traditional convolution. The residual structure can reduce the network over-fitting problem and obtain better gradient information, thereby improving the performance of Attention-MVCNN. We replace the activation function in the Attention-MVCNN network with Mish, a self-regularized non-monotonic neural activation function. The smooth activation function allows better information to penetrate the neural network, resulting in better accuracy and generalization. Experiments show that the improved Attention-MVCNN attains competitive results on the ModelNet40 dataset.
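
For reference, Mish is defined as x · tanh(softplus(x)), with softplus(x) = ln(1 + e^x). The scalar sketch below uses a numerically stable form of softplus; it illustrates the activation only and is not code from the paper.

```python
import math


def softplus(x):
    """Stable ln(1 + exp(x)): avoids overflow for large positive x."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: smooth, non-monotonic, unbounded above."""
    return x * math.tanh(softplus(x))
```

The non-monotonic behavior shows up for negative inputs: the output dips below zero and then rises back toward zero as x becomes more negative.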