subscribe to arXiv mailings

Dark Experience for Incremental Keyword Spotting

Abstract: Spoken keyword spotting (KWS) is crucial for identifying keywords within audio inputs and is widely used in applications like Apple Siri and Google Home, particularly on edge devices. Current deep learning-based KWS systems, which are typically trained on a limited set of keywords, can suffer from performance degradation when encountering new domains, a challenge often addressed through few-shot f… ▽ More Spoken keyword spotting (KWS) is crucial for identifying keywords within audio inputs and is widely used in applications like Apple Siri and Google Home, particularly on edge devices. Current deep learning-based KWS systems, which are typically trained on a limited set of keywords, can suffer from performance degradation when encountering new domains, a challenge often addressed through few-shot fine-tuning. However, this adaptation frequently leads to catastrophic forgetting, where the model's performance on original data deteriorates. Progressive continual learning (CL) strategies have been proposed to overcome this, but they face limitations such as the need for task-ID information and increased storage, making them less practical for lightweight devices. To address these challenges, we introduce Dark Experience for Keyword Spotting (DE-KWS), a novel CL approach that leverages dark knowledge to distill past experiences throughout the training process. DE-KWS combines rehearsal and distillation, using both ground truth labels and logits stored in a memory buffer to maintain model performance across tasks. Evaluations on the Google Speech Command dataset show that DE-KWS outperforms existing CL baselines in average accuracy without increasing model size, offering an effective solution for resource-constrained edge devices. The scripts are available on GitHub for the future research. △ Less

Submitted 12 September, 2024; v1 submitted 12 September, 2024; originally announced September 2024.

Comments: submitted ICASSP 2025

arXiv:2409.00369

An Empirical Study on Information Extraction using Large Language Models

Authors: Ridong Han, Chaohao Yang, Tao Peng, Prayag Tiwari, Xiang Wan, Lu Liu, Benyou Wang

Abstract: Human-like large language models (LLMs), especially the most powerful and popular ones in OpenAI's GPT family, have proven to be very helpful for many natural language processing (NLP) related tasks. Therefore, various attempts have been made to apply LLMs to information extraction (IE), which is a fundamental NLP task that involves extracting information from unstructured plain text. To demonstra… ▽ More Human-like large language models (LLMs), especially the most powerful and popular ones in OpenAI's GPT family, have proven to be very helpful for many natural language processing (NLP) related tasks. Therefore, various attempts have been made to apply LLMs to information extraction (IE), which is a fundamental NLP task that involves extracting information from unstructured plain text. To demonstrate the latest representative progress in LLMs' information extraction ability, we assess the information extraction ability of GPT-4 (the latest version of GPT at the time of writing this paper) from four perspectives: Performance, Evaluation Criteria, Robustness, and Error Types. Our results suggest a visible performance gap between GPT-4 and state-of-the-art (SOTA) IE methods. To alleviate this problem, considering the LLMs' human-like characteristics, we propose and analyze the effects of a series of simple prompt-based methods, which can be generalized to other LLMs and NLP tasks. Rich experiments show our methods' effectiveness and some of their remaining issues in improving GPT-4's information extraction ability. △ Less

Submitted 9 September, 2024; v1 submitted 31 August, 2024; originally announced September 2024.

Comments: This submission was intended instead as the replacement of arXiv:2305.14450 , where it now appears as arXiv:2305.14450v2

arXiv:2408.15032 [pdf, other]

Mamba2MIL: State Space Duality Based Multiple Instance Learning for Computational Pathology

Authors: Yuqi Zhang, Xiaoqian Zhang, Jiakai Wang, Yuancheng Yang, Taiying Peng, Chao Tong

Abstract: Computational pathology (CPath) has significantly advanced the clinical practice of pathology. Despite the progress made, Multiple Instance Learning (MIL), a promising paradigm within CPath, continues to face challenges, particularly related to incomplete information utilization. Existing frameworks, such as those based on Convolutional Neural Networks (CNNs), attention, and selective scan space s… ▽ More Computational pathology (CPath) has significantly advanced the clinical practice of pathology. Despite the progress made, Multiple Instance Learning (MIL), a promising paradigm within CPath, continues to face challenges, particularly related to incomplete information utilization. Existing frameworks, such as those based on Convolutional Neural Networks (CNNs), attention, and selective scan space state sequential model (SSM), lack sufficient flexibility and scalability in fusing diverse features, and cannot effectively fuse diverse features. Additionally, current approaches do not adequately exploit order-related and order-independent features, resulting in suboptimal utilization of sequence information. To address these limitations, we propose a novel MIL framework called Mamba2MIL. Our framework utilizes the state space duality model (SSD) to model long sequences of patches of whole slide images (WSIs), which, combined with weighted feature selection, supports the fusion processing of more branching features and can be extended according to specific application needs. Moreover, we introduce a sequence transformation method tailored to varying WSI sizes, which enhances sequence-independent features while preserving local sequence information, thereby improving sequence information utilization. Extensive experiments demonstrate that Mamba2MIL surpasses state-of-the-art MIL methods. We conducted extensive experiments across multiple datasets, achieving improvements in nearly all performance metrics. Specifically, on the NSCLC dataset, Mamba2MIL achieves a binary tumor classification AUC of 0.9533 and an accuracy of 0.8794. On the BRACS dataset, it achieves a multiclass classification AUC of 0.7986 and an accuracy of 0.4981. The code is available at https://github.com/YuqiZhang-Buaa/Mamba2MIL. △ Less

Submitted 27 August, 2024; originally announced August 2024.

arXiv:2408.08121 [pdf]

Optimizing Highway Ramp Merge Safety and Efficiency via Spatio-Temporal Cooperative Control and Vehicle-Road Coordination

Authors: Ting Peng, Xiaoxue Xu, Yuan Li, Jie Wu, Tao Li, Xiang Dong, Yincai Cai, Peng Wu

Abstract: In view of existing automatic driving, it is difficult to accurately and timely obtain the status and driving intention of other vehicles. The safety risk and urgency of autonomous vehicles in the absence of collision are evaluated. To ensure safety and improve road efficiency, a method of pre-compiling the spatio-temporal trajectory of vehicles is established to eliminate conflicts between vehicl… ▽ More In view of existing automatic driving, it is difficult to accurately and timely obtain the status and driving intention of other vehicles. The safety risk and urgency of autonomous vehicles in the absence of collision are evaluated. To ensure safety and improve road efficiency, a method of pre-compiling the spatio-temporal trajectory of vehicles is established to eliminate conflicts between vehicles in advance. The calculation method of the safe distance under spatio-temporal conditions is studied, considering vehicle speed differences, vehicle positioning errors, and clock errors. By combining collision acceleration and urgent acceleration, an evaluation model for vehicle conflict risk is constructed. Mainline vehicles that may have conflicts with on-ramp vehicles are identified, and the target gap for on-ramp vehicles is determined. Finally, a cooperative control method is established based on the selected target gap, preparing the vehicle travel path in advance. Taking highway ramp merge as an example, the mainline priority spatio-temporal cooperative control method is proposed and verified through simulation. Using SUMO and Python co-simulation, mainline traffic volumes of 800 veh*h-1*lane-1 △ Less

Submitted 15 August, 2024; originally announced August 2024.

arXiv:2408.05151 [pdf]

Meta-Learning Guided Label Noise Distillation for Robust Signal Modulation Classification

Authors: Xiaoyang Hao, Zhixi Feng, Tongqing Peng, Shuyuan Yang

Abstract: Automatic modulation classification (AMC) is an effective way to deal with physical layer threats of the internet of things (IoT). However, there is often label mislabeling in practice, which significantly impacts the performance and robustness of deep neural networks (DNNs). In this paper, we propose a meta-learning guided label noise distillation method for robust AMC. Specifically, a teacher-st… ▽ More Automatic modulation classification (AMC) is an effective way to deal with physical layer threats of the internet of things (IoT). However, there is often label mislabeling in practice, which significantly impacts the performance and robustness of deep neural networks (DNNs). In this paper, we propose a meta-learning guided label noise distillation method for robust AMC. Specifically, a teacher-student heterogeneous network (TSHN) framework is proposed to distill and reuse label noise. Based on the idea that labels are representations, the teacher network with trusted meta-learning divides and conquers untrusted label samples and then guides the student network to learn better by reassessing and correcting labels. Furthermore, we propose a multi-view signal (MVS) method to further improve the performance of hard-to-classify categories with few-shot trusted label samples. Extensive experimental results show that our methods can significantly improve the performance and robustness of signal AMC in various and complex label noise scenarios, which is crucial for securing IoT applications. △ Less

Submitted 9 August, 2024; originally announced August 2024.

Comments: 8 pages, 7 figures

ACM Class: I.2; C.2

arXiv:2408.00799 [pdf, other]

Deep Uncertainty-Based Explore for Index Construction and Retrieval in Recommendation System

Authors: Xin Jiang, Kaiqiang Wang, Yinlong Wang, Fengchang Lv, Taiyang Peng, Shuai Yang, Xianteng Wu, Pengye Zhang, Shuo Yuan, Yifan Zeng

Abstract: In recommendation systems, the relevance and novelty of the final results are selected through a cascade system of Matching -> Ranking -> Strategy. The matching model serves as the starting point of the pipeline and determines the upper bound of the subsequent stages. Balancing the relevance and novelty of matching results is a crucial step in the design and optimization of recommendation systems,… ▽ More In recommendation systems, the relevance and novelty of the final results are selected through a cascade system of Matching -> Ranking -> Strategy. The matching model serves as the starting point of the pipeline and determines the upper bound of the subsequent stages. Balancing the relevance and novelty of matching results is a crucial step in the design and optimization of recommendation systems, contributing significantly to improving recommendation quality. However, the typical matching algorithms have not simultaneously addressed the relevance and novelty perfectly. One main reason is that deep matching algorithms exhibit significant uncertainty when estimating items in the long tail (e.g., due to insufficient training samples) items.The uncertainty not only affects the training of the models but also influences the confidence in the index construction and beam search retrieval process of these models. This paper proposes the UICR (Uncertainty-based explore for Index Construction and Retrieval) algorithm, which introduces the concept of uncertainty modeling in the matching stage and achieves multi-task modeling of model uncertainty and index uncertainty. The final matching results are obtained by combining the relevance score and uncertainty score infered by the model. Experimental results demonstrate that the UICR improves novelty without sacrificing relevance on realworld industrial productive environments and multiple open-source datasets. Remarkably, online A/B test results of display advertising in Shopee demonstrates the effectiveness of the proposed algorithm. △ Less

Submitted 5 August, 2024; v1 submitted 21 July, 2024; originally announced August 2024.

Comments: accepted by cikm2024

arXiv:2407.19867 [pdf]

Design and Testing for Steel Support Axial Force Servo System

Authors: Sana Ullah, Yonghong Zhou, Maokai Lai, Xiang Dong, Tao Li, Xiaoxue Xu, Yuan Li, Ting Peng

Abstract: Foundation excavations are deepening, expanding, and approaching structures. Steel supports measure and manage axial force. The study regulates steel support structure power during deep excavation using a novel axial force management system for safety, efficiency, and structural integrity. Closed-loop control changes actuator output to maintain axial force based on force. In deep excavation, the s… ▽ More Foundation excavations are deepening, expanding, and approaching structures. Steel supports measure and manage axial force. The study regulates steel support structure power during deep excavation using a novel axial force management system for safety, efficiency, and structural integrity. Closed-loop control changes actuator output to maintain axial force based on force. In deep excavation, the servo system regulates unstable soil, side pressure, and structural demands. Modern engineering and tech are used. Temperature changes automatically adjust the jack to maintain axial force. Includes hydraulic jacks, triple-acting cylinders, temperature, and deformation sensors, and automatic control. Foundation pit excavation is dynamic, yet structure tension is constant. There is no scientific way to regulate axial force foundation pit excavation. The revolutionary Servo system adjusts temperature, compression, and axial force to deform pits. System control requires foundation pit direction detection and modification. This engineering method has performed effectively for deep foundation pit excavation at railway crossings and other infrastructure projects. The surrounding protective structure may reduce the steel support's axial stress, making deep foundation excavation safe and efficient. Keywords: Servo systems, Steel strut support design, Deformation control, Monitoring and control, Deep excavation projects. △ Less

Submitted 29 July, 2024; originally announced July 2024.

Comments: 6 pages,7 figures, 1 table, 2 graph, conference paper

arXiv:2407.12511 [pdf, other]

Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations

Authors: Tomáš Chobola, Yu Liu, Hanyi Zhang, Julia A. Schnabel, Tingying Peng

Abstract: Current deep learning-based low-light image enhancement methods often struggle with high-resolution images, and fail to meet the practical demands of visual perception across diverse and unseen scenarios. In this paper, we introduce a novel approach termed CoLIE, which redefines the enhancement process through mapping the 2D coordinates of an underexposed image to its illumination component, condi… ▽ More Current deep learning-based low-light image enhancement methods often struggle with high-resolution images, and fail to meet the practical demands of visual perception across diverse and unseen scenarios. In this paper, we introduce a novel approach termed CoLIE, which redefines the enhancement process through mapping the 2D coordinates of an underexposed image to its illumination component, conditioned on local context. We propose a reconstruction of enhanced-light images within the HSV space utilizing an implicit neural function combined with an embedded guided filter, thereby significantly reducing computational overhead. Moreover, we introduce a single image-based training loss function to enhance the model's adaptability to various scenes, further enhancing its practical applicability. Through rigorous evaluations, we analyze the properties of our proposed framework, demonstrating its superiority in both image quality and scene adaptability. Furthermore, our evaluation extends to applications in downstream tasks within low-light scenarios, underscoring the practical utility of CoLIE. The source code is available at https://github.com/ctom2/colie. △ Less

Submitted 17 July, 2024; originally announced July 2024.

Comments: Accepted to ECCV24

arXiv:2407.09328 [pdf]

> 2π Phase Modulation using Exciton-Polaritons in a Two-Dimensional Superlattice

Authors: Jason Lynch, Pawan Kumar, Chen Chen, Nicholas Trainor, Shalini Kumari, Tzu-Yu Peng, Cindy Yueli Chen, Yu-Jung Lu, Joan Redwing, Deep Jariwala

Abstract: Active metamaterials promise to enable arbitrary, temporal control over the propagation of wavefronts of light for applications such as beam steering, optical communication modulators, and holograms. This has been done in the past using patterned silicon photonics to locally control the phase of light such that the metasurface acts as a large number of wavelets. Although phase modulation only requ… ▽ More Active metamaterials promise to enable arbitrary, temporal control over the propagation of wavefronts of light for applications such as beam steering, optical communication modulators, and holograms. This has been done in the past using patterned silicon photonics to locally control the phase of light such that the metasurface acts as a large number of wavelets. Although phase modulation only requires refractive index modulation when the interaction length is on the order of the wavelength, this is not enough to significantly modulate the phase of light in flatland. Instead, phase modulation is achieved using a resonant mode such as a plasmon or high-Q cavity mode that enable light to accumulate a large amount of phase over a short distance and coupling it to an active material that modulates the light-matter interactions. Here, we report that electrostatic doping can modulate the light-matter interaction strength of a two-dimensional WS2 based multi quantum well (MQW) structure going from strongly-coupled, phase-accumulating exciton-polaritons to weakly-coupled exciton-trion-polaritons. As a result of this transition, 2.02π radians of phase modulation is observed using spectroscopic ellipsometry. This result demonstrates the potential of the MQW structure as a compact, lightweight electro-optical modulators for LiDAR and optical communications in the red region of visible spectrum. △ Less

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.06612 [pdf]

AI-based Automatic Segmentation of Prostate on Multi-modality Images: A Review

Authors: Rui Jin, Derun Li, Dehui Xiang, Lei Zhang, Hailing Zhou, Fei Shi, Weifang Zhu, Jing Cai, Tao Peng, Xinjian Chen

Abstract: Prostate cancer represents a major threat to health. Early detection is vital in reducing the mortality rate among prostate cancer patients. One approach involves using multi-modality (CT, MRI, US, etc.) computer-aided diagnosis (CAD) systems for the prostate region. However, prostate segmentation is challenging due to imperfections in the images and the prostate's complex tissue structure. The ad… ▽ More Prostate cancer represents a major threat to health. Early detection is vital in reducing the mortality rate among prostate cancer patients. One approach involves using multi-modality (CT, MRI, US, etc.) computer-aided diagnosis (CAD) systems for the prostate region. However, prostate segmentation is challenging due to imperfections in the images and the prostate's complex tissue structure. The advent of precision medicine and a significant increase in clinical capacity have spurred the need for various data-driven tasks in the field of medical imaging. Recently, numerous machine learning and data mining tools have been integrated into various medical areas, including image segmentation. This article proposes a new classification method that differentiates supervision types, either in number or kind, during the training phase. Subsequently, we conducted a survey on artificial intelligence (AI)-based automatic prostate segmentation methods, examining the advantages and limitations of each. Additionally, we introduce variants of evaluation metrics for the verification and performance assessment of the segmentation method and summarize the current challenges. Finally, future research directions and development trends are discussed, reflecting the outcomes of our literature survey, suggesting high-precision detection and treatment of prostate cancer as a promising avenue. △ Less

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.03671 [pdf]

Spatio-temporal cooperative control Method of Highway Ramp Merge Based on Vehicle-road Coordination

Authors: Xiaoxue Xu, Maokai Lai, Haitao Zhang, Xiang Dong, Tao Li, Jie Wu, Yuan Li, Ting Peng

Abstract: The merging area of highway ramps faces multiple challenges, including traffic congestion, collision risks, speed mismatches, driver behavior uncertainties, limited visibility, and bottleneck effects. However, autonomous vehicles engaging in depth coordination between vehicle and road in merging zones, by pre-planning and uploading travel trajectories, can significantly enhance the safety and effi… ▽ More The merging area of highway ramps faces multiple challenges, including traffic congestion, collision risks, speed mismatches, driver behavior uncertainties, limited visibility, and bottleneck effects. However, autonomous vehicles engaging in depth coordination between vehicle and road in merging zones, by pre-planning and uploading travel trajectories, can significantly enhance the safety and efficiency of merging zones.In this paper,we mainly introduce mainline priority cooperation method to achieve the time and space cooperative control of highway merge.Vehicle-mounted intelligent units share real-time vehicle status and driving intentions with Road Section Management Units, which pre-plan the spatiotemporal trajectories of vehicle travel. After receiving these trajectories, Vehicle Intelligent Units strictly adhere to them. Through this deep collaboration between vehicles and roads, conflicts in time and space during vehicle travel are eliminated in advance. △ Less

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.02282 [pdf, other]

Learning Bipedal Walking on a Quadruped Robot via Adversarial Motion Priors

Authors: Tianhu Peng, Lingfan Bao, Joseph Humphreys, Andromachi Maria Delfaki, Dimitrios Kanoulas, Chengxu Zhou

Abstract: Previous studies have successfully demonstrated agile and robust locomotion in challenging terrains for quadrupedal robots. However, the bipedal locomotion mode for quadruped robots remains unverified. This paper explores the adaptation of a learning framework originally designed for quadrupedal robots to operate blind locomotion in biped mode. We leverage a framework that incorporates Adversarial… ▽ More Previous studies have successfully demonstrated agile and robust locomotion in challenging terrains for quadrupedal robots. However, the bipedal locomotion mode for quadruped robots remains unverified. This paper explores the adaptation of a learning framework originally designed for quadrupedal robots to operate blind locomotion in biped mode. We leverage a framework that incorporates Adversarial Motion Priors with a teacher-student policy to enable imitation of a reference trajectory and navigation on tough terrain. Our work involves transferring and evaluating a similar learning framework on a quadruped robot in biped mode, aiming to achieve stable walking on both flat and complicated terrains. Our simulation results demonstrate that the trained policy enables the quadruped robot to navigate both flat and challenging terrains, including stairs and uneven surfaces. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: 7 pages,5 figures

arXiv:2406.13285 [pdf, ps, other]

The extremal problem for weighted combined energy and $ρ-$Nitsche type inequality

Authors: Ting Peng, Chaochuan Wang, Xiaogao Feng

Abstract: Let $A_1$ and $A_2$ be two circular annuli and let $ρ$ be a radial metric defined in the annuli $A_2$. We study the existence and uniqueness of the extremal problem for weighted combined energy between $A_1$ and $A_2$, and obtain that the extremal mapping is a certain radial mapping. In fact, this extremal mapping generalizes the $ρ-$harmonic mapping and satisfies equation (2.7) obtained by mean o… ▽ More Let $A_1$ and $A_2$ be two circular annuli and let $ρ$ be a radial metric defined in the annuli $A_2$. We study the existence and uniqueness of the extremal problem for weighted combined energy between $A_1$ and $A_2$, and obtain that the extremal mapping is a certain radial mapping. In fact, this extremal mapping generalizes the $ρ-$harmonic mapping and satisfies equation (2.7) obtained by mean of variation for weighted combined energy. Meanwhile, we get a $ρ-$Nitsche type inequality. This extends the results of Kalaj (J. Differential Equations, 268(2020)) and YTF (Arch. Math., 122(2024)), where they considered the case $ρ=1$ and $ρ=\frac{1}{|h|^{2}}$, respectively. Moreover, in the course of proving the extremal problem for weighted combined energy we also investigate the extremal problem for the weighted combined distortion (see Theorem 4.1). This extends the result obtained by Kalaj (J. London Math. Soc., 93(2016)). △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: 14 pages

MSC Class: 30C70

arXiv:2406.01939 [pdf, other]

Speeding up Policy Simulation in Supply Chain RL

Authors: Vivek Farias, Joren Gijsbrechts, Aryan Khojandi, Tianyi Peng, Andrew Zheng

Abstract: Simulating a single trajectory of a dynamical system under some state-dependent policy is a core bottleneck in policy optimization algorithms. The many inherently serial policy evaluations that must be performed in a single simulation constitute the bulk of this bottleneck. To wit, in applying policy optimization to supply chain optimization (SCO) problems, simulating a single month of a supply ch… ▽ More Simulating a single trajectory of a dynamical system under some state-dependent policy is a core bottleneck in policy optimization algorithms. The many inherently serial policy evaluations that must be performed in a single simulation constitute the bulk of this bottleneck. To wit, in applying policy optimization to supply chain optimization (SCO) problems, simulating a single month of a supply chain can take several hours. We present an iterative algorithm for policy simulation, which we dub Picard Iteration. This scheme carefully assigns policy evaluation tasks to independent processes. Within an iteration, a single process evaluates the policy only on its assigned tasks while assuming a certain 'cached' evaluation for other tasks; the cache is updated at the end of the iteration. Implemented on GPUs, this scheme admits batched evaluation of the policy on a single trajectory. We prove that the structure afforded by many SCO problems allows convergence in a small number of iterations, independent of the horizon. We demonstrate practical speedups of 400x on large-scale SCO problems even with a single GPU, and also demonstrate practical efficacy in other RL environments. △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.08621 [pdf, other]

RMT-BVQA: Recurrent Memory Transformer-based Blind Video Quality Assessment for Enhanced Video Content

Authors: Tianhao Peng, Chen Feng, Duolikun Danier, Fan Zhang, Benoit Vallade, Alex Mackin, David Bull

Abstract: With recent advances in deep learning, numerous algorithms have been developed to enhance video quality, reduce visual artifacts, and improve perceptual quality. However, little research has been reported on the quality assessment of enhanced content - the evaluation of enhancement methods is often based on quality metrics that were designed for compression applications. In this paper, we propose… ▽ More With recent advances in deep learning, numerous algorithms have been developed to enhance video quality, reduce visual artifacts, and improve perceptual quality. However, little research has been reported on the quality assessment of enhanced content - the evaluation of enhancement methods is often based on quality metrics that were designed for compression applications. In this paper, we propose a novel blind deep video quality assessment (VQA) method specifically for enhanced video content. It employs a new Recurrent Memory Transformer (RMT) based network architecture to obtain video quality representations, which is optimized through a novel content-quality-aware contrastive learning strategy based on a new database containing 13K training patches with enhanced content. The extracted quality representations are then combined through linear regression to generate video-level quality indices. The proposed method, RMT-BVQA, has been evaluated on the VDPVE (VQA Dataset for Perceptual Video Enhancement) database through a five-fold cross validation. The results show its superior correlation performance when compared to ten existing no-reference quality metrics. △ Less

Submitted 5 September, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

Comments: This paper has been accepted by the ECCV 2024 AIM Advances in Image Manipulation workshop

arXiv:2405.01872 [pdf, other]

Defect Image Sample Generation With Diffusion Prior for Steel Surface Defect Recognition

Authors: Yichun Tai, Kun Yang, Tao Peng, Zhenzhen Huang, Zhijiang Zhang

Abstract: The task of steel surface defect recognition is an industrial problem with great industry values. The data insufficiency is the major challenge in training a robust defect recognition network. Existing methods have investigated to enlarge the dataset by generating samples with generative models. However, their generation quality is still limited by the insufficiency of defect image samples. To thi… ▽ More The task of steel surface defect recognition is an industrial problem with great industry values. The data insufficiency is the major challenge in training a robust defect recognition network. Existing methods have investigated to enlarge the dataset by generating samples with generative models. However, their generation quality is still limited by the insufficiency of defect image samples. To this end, we propose Stable Surface Defect Generation (StableSDG), which transfers the vast generation distribution embedded in Stable Diffusion model for steel surface defect image generation. To tackle with the distinctive distribution gap between steel surface images and generated images of the diffusion model, we propose two processes. First, we align the distribution by adapting parameters of the diffusion model, adopted both in the token embedding space and network parameter space. Besides, in the generation process, we propose image-oriented generation rather than from pure Gaussian noises. We conduct extensive experiments on steel surface defect dataset, demonstrating state-of-the-art performance on generating high-quality samples and training recognition models, and both designed processes are significant for the performance. △ Less

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.19368 [pdf, other]

Exploring Multi-Lingual Bias of Large Code Models in Code Generation

Authors: Chaozheng Wang, Zongjie Li, Cuiyun Gao, Wenxuan Wang, Ting Peng, Hailiang Huang, Yuetang Deng, Shuai Wang, Michael R. Lyu

Abstract: Code generation aims to synthesize code and fulfill functional requirements based on natural language (NL) specifications, which can greatly improve development efficiency. In the era of large language models (LLMs), large code models (LCMs) have been recently proposed to generate source code. LCMs can generate highly feasible solutions for programming problems described in natural language. Despi… ▽ More Code generation aims to synthesize code and fulfill functional requirements based on natural language (NL) specifications, which can greatly improve development efficiency. In the era of large language models (LLMs), large code models (LCMs) have been recently proposed to generate source code. LCMs can generate highly feasible solutions for programming problems described in natural language. Despite the effectiveness, we observe a noticeable multilingual bias in the generation performance of LCMs. Specifically, LCMs demonstrate proficiency in generating solutions when provided with instructions in English, yet may falter when faced with semantically equivalent instructions in other NLs such as Chinese. Moreover, the ability of LCMs to generate code exhibits variety across different programming languages (PLs), such as Python and C++. The observed phenomenon indicates the presence of multi-lingual bias within the generative capabilities of LCMs, which has remained unexplored. In this paper, we aim to investigate the multi-lingual bias that exists in current LCMs. First, we initiate our investigation by constructing the first multi-lingual evaluation benchmark X-HumanEval-X, enabling us to systematically evaluate the extent of multi-lingual bias that exists in current LCMs. In our large-scale experiments on nine popular LCMs, we observe a pronounced multi-lingual bias of LCMs in code generation, including multi-NL and multi-PL bias. Specifically, when using Chinese instructions, the code generation capabilities of LCMs decrease by at least 13% in terms of the Pass@1 metric. Furthermore, LCMs perform variously across different programming languages, e.g., the performance gap between Python and C++ reaches as high as 20.9%. ... △ Less

Submitted 30 April, 2024; originally announced April 2024.

Comments: 12 pages

arXiv:2404.17070 [pdf, other]

Deep Reinforcement Learning for Bipedal Locomotion: A Brief Survey

Authors: Lingfan Bao, Joseph Humphreys, Tianhu Peng, Chengxu Zhou

Abstract: Bipedal robots are garnering increasing global attention due to their potential applications and advancements in artificial intelligence, particularly in Deep Reinforcement Learning (DRL). While DRL has driven significant progress in bipedal locomotion, developing a comprehensive and unified framework capable of adeptly performing a wide range of tasks remains a challenge. This survey systematical… ▽ More Bipedal robots are garnering increasing global attention due to their potential applications and advancements in artificial intelligence, particularly in Deep Reinforcement Learning (DRL). While DRL has driven significant progress in bipedal locomotion, developing a comprehensive and unified framework capable of adeptly performing a wide range of tasks remains a challenge. This survey systematically categorizes, compares, and summarizes existing DRL frameworks for bipedal locomotion, organizing them into end-to-end and hierarchical control schemes. End-to-end frameworks are assessed based on their learning approaches, whereas hierarchical frameworks are dissected into layers that utilize either learning-based methods or traditional model-based approaches. This survey provides a detailed analysis of the composition, capabilities, strengths, and limitations of each framework type. Furthermore, we identify critical research gaps and propose future directions aimed at achieving a more integrated and efficient framework for bipedal locomotion, with potential broad applications in everyday life. △ Less

Submitted 25 April, 2024; originally announced April 2024.

Comments: 14 pages, 4 figures

arXiv:2404.05022 [pdf, other]

DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology

Authors: Valentin Koch, Sophia J. Wagner, Salome Kazeminia, Ece Sancar, Matthias Hehr, Julia Schnabel, Tingying Peng, Carsten Marr

Abstract: In hematology, computational models offer significant potential to improve diagnostic accuracy, streamline workflows, and reduce the tedious work of analyzing single cells in peripheral blood or bone marrow smears. However, clinical adoption of computational models has been hampered by the lack of generalization due to large batch effects, small dataset sizes, and poor performance in transfer lear… ▽ More In hematology, computational models offer significant potential to improve diagnostic accuracy, streamline workflows, and reduce the tedious work of analyzing single cells in peripheral blood or bone marrow smears. However, clinical adoption of computational models has been hampered by the lack of generalization due to large batch effects, small dataset sizes, and poor performance in transfer learning from natural images. To address these challenges, we introduce DinoBloom, the first foundation model for single cell images in hematology, utilizing a tailored DINOv2 pipeline. Our model is built upon an extensive collection of 13 diverse, publicly available datasets of peripheral blood and bone marrow smears, the most substantial open-source cohort in hematology so far, comprising over 380,000 white blood cell images. To assess its generalization capability, we evaluate it on an external dataset with a challenging domain shift. We show that our model outperforms existing medical and non-medical vision models in (i) linear probing and k-nearest neighbor evaluations for cell-type classification on blood and bone marrow smears and (ii) weakly supervised multiple instance learning for acute myeloid leukemia subtyping by a large margin. A family of four DinoBloom models (small, base, large, and giant) can be adapted for a wide range of downstream applications, be a strong baseline for classification problems, and facilitate the assessment of batch effects in new datasets. All models are available at github.com/marrlab/DinoBloom. △ Less

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2403.19763 [pdf, other]

Creating Aesthetic Sonifications on the Web with SIREN

Authors: Tristan Peng, Hongchan Choi, Jonathan Berger

Abstract: SIREN is a flexible, extensible, and customizable web-based general-purpose interface for auditory data display (sonification). Designed as a digital audio workstation for sonification, synthesizers written in JavaScript using the Web Audio API facilitate intuitive mapping of data to auditory parameters for a wide range of purposes. This paper explores the breadth of sound synthesis techniques s… ▽ More SIREN is a flexible, extensible, and customizable web-based general-purpose interface for auditory data display (sonification). Designed as a digital audio workstation for sonification, synthesizers written in JavaScript using the Web Audio API facilitate intuitive mapping of data to auditory parameters for a wide range of purposes. This paper explores the breadth of sound synthesis techniques supported by SIREN, and details the structure and definition of a SIREN synthesizer module. The paper proposes further development that will increase SIREN's utility. △ Less

Submitted 28 March, 2024; originally announced March 2024.

Comments: 7 pages, 1 figure, 5 listings, submitted to the Web Audio Conference 2024

arXiv:2403.07720 [pdf, other]

Multi-modal Auto-regressive Modeling via Visual Words

Authors: Tianshuo Peng, Zuchao Li, Lefei Zhang, Hai Zhao, Ping Wang, Bo Du

Abstract: Large Language Models (LLMs), benefiting from the auto-regressive modelling approach performed on massive unannotated texts corpora, demonstrates powerful perceptual and reasoning capabilities. However, as for extending auto-regressive modelling to multi-modal scenarios to build Large Multi-modal Models (LMMs), there lies a great difficulty that the image information is processed in the LMM as con… ▽ More Large Language Models (LLMs), benefiting from the auto-regressive modelling approach performed on massive unannotated texts corpora, demonstrates powerful perceptual and reasoning capabilities. However, as for extending auto-regressive modelling to multi-modal scenarios to build Large Multi-modal Models (LMMs), there lies a great difficulty that the image information is processed in the LMM as continuous visual embeddings, which cannot obtain discrete supervised labels for classification. In this paper, we successfully perform multi-modal auto-regressive modeling with a unified objective for the first time. Specifically, we propose the concept of visual words, which maps the visual features to probability distributions over LLM's vocabulary, providing supervision information for visual modelling. We further explore the distribution of visual features in the semantic space within LMM and the possibility of using text embeddings to represent visual information. Experimental results and ablation studies on 5 VQA tasks and 4 benchmark toolkits validate the powerful performance of our proposed approach. △ Less

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.04338 [pdf, other]

doi 10.1103/PhysRevB.109.195301

Structural disorder-induced topological phase transitions in quasicrystals

Authors: Tan Peng, Yong-Chen Xiong, Chun-Bo Hua, Zheng-Rong Liu, Xiaolu Zhu, Wei Cao, Fang Lv, Yue Hou, Bin Zhou, Ziyu Wang, Rui Xiong

Abstract: Recently, the structural disorder-induced topological phase transitions in periodic systems have attracted much attention. However, in aperiodic systems such as quasicrystalline systems, the interplay between structural disorder and band topology is still unclear. In this work, we investigate the effects of structural disorder on a quantum spin Hall insulator phase and a higher-order topological p… ▽ More Recently, the structural disorder-induced topological phase transitions in periodic systems have attracted much attention. However, in aperiodic systems such as quasicrystalline systems, the interplay between structural disorder and band topology is still unclear. In this work, we investigate the effects of structural disorder on a quantum spin Hall insulator phase and a higher-order topological phase in a two-dimensional Amman-Beenker tiling quasicrystalline lattice, respectively. We demonstrate that the structural disorder can induce a topological phase transition from a quasicrystalline normal insulator phase to an amorphous quantum spin Hall insulator phase, which is confirmed by bulk gap closing and reopening, robust edge states, quantized spin Bott index and conductance. Furthermore, the structural disorder-induced higher-order topological phase transition from a quasicrystalline normal insulator phase to an amorphous higher-order topological phase characterized by quantized quadrupole moment and topological corner states is also found. More strikingly, the disorder-induced higher-order topological insulator with eight corner states represents a distinctive topological state that eludes realization in conventional crystalline systems. Our work extends the study of the interplay between disorder effects and topologies to quasicrystalline and amorphous systems. △ Less

Submitted 7 March, 2024; originally announced March 2024.

Comments: 9 pages,7 figures. arXiv admin note: text overlap with arXiv:2108.04971

Journal ref: Phys. Rev. B 109, 195301 (2024)

arXiv:2403.03705 [pdf, other]

doi 10.1103/PhysRevD.109.094048

How higher charmonia shape the puzzling data of the $e^+e^-\to ηJ/ψ$ cross section

Authors: Tian-Cai Peng, Zi-Yue Bai, Jun-Zhang Wang, Xiang Liu

Abstract: Recently, the BESIII collaboration performed a precise measurement of the $e^+e^-\to ηJ/ψ$ cross section. It is puzzling that the resonance parameters of the reported $Y(4230)$ show a substantial divergence from the previously measured results in both the open-charmed and hidden-charmed decay channels, and the line shape asymmetry of the data approaching 4.2 GeV also suggests that it might be diff… ▽ More Recently, the BESIII collaboration performed a precise measurement of the $e^+e^-\to ηJ/ψ$ cross section. It is puzzling that the resonance parameters of the reported $Y(4230)$ show a substantial divergence from the previously measured results in both the open-charmed and hidden-charmed decay channels, and the line shape asymmetry of the data approaching 4.2 GeV also suggests that it might be difficult to characterize the details of the structure around 4.2 GeV by a single resonance. This has motivated our great curiosity about how the charmonium states are distributed in the measured energy range and how they shape the puzzling data of the $e^+e^-\to ηJ/ψ$ cross section. In this work, we use five theoretically constructed charmonia in the range of $4.0\rm{-}4.5$ $\text{GeV}$, i.e., $ψ(4040)$, $ψ(4160)$, $ψ(4220)$, $ψ(4380)$, and $ψ(4415)$, to apply a combined fit to the data, in which their calculated decay ratios into $ηJ/ψ$ via hadronic loop mechanism are taken as input. The fit results can reproduce the measured cross section data well, especially for the subtle line shape around 4.2 GeV, showing that the structure around 4.2 GeV is possible from the contribution of both $ψ(4160)$ and $ψ(4220)$. △ Less

Submitted 30 May, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

Comments: 11 pages, 4 figures and 4 tables. Accepted by Phys. Rev. D. More discussions were added

Journal ref: Phys. Rev. D 109, 094048 (2024)

arXiv:2402.18189 [pdf, other]

VulMCI : Code Splicing-based Pixel-row Oversampling for More Continuous Vulnerability Image Generation

Authors: Tao Peng, Ling Gui, Yi Sun

Abstract: In recent years, the rapid development of deep learning technology has brought new prospects to the field of vulnerability detection. Many vulnerability detection methods involve converting source code into images for detection, yet they often overlook the quality of the generated images. Due to the fact that vulnerability images lack clear and continuous contours, unlike images used in object det… ▽ More In recent years, the rapid development of deep learning technology has brought new prospects to the field of vulnerability detection. Many vulnerability detection methods involve converting source code into images for detection, yet they often overlook the quality of the generated images. Due to the fact that vulnerability images lack clear and continuous contours, unlike images used in object detection, Convolutional Neural Networks (CNNs) tend to lose semantic information during the convolution and pooling processes. Therefore, this paper proposes a pixel row oversampling method based on code line concatenation to generate more continuous code features, addressing the issue of discontinuity in code image coloration.Building upon these contributions, we propose the vulnerability detection system VulMCI and conduct tests on the SARD and NVD datasets. Experimental results demonstrate that VulMCI outperforms seven state-of-the-art vulnerability detectors (namely Checkmarx, FlawFinder, RATS, VulDeePecker, SySeVR, VulCNN, and Devign). Compared to other image-based methods, VulMCI shows improvements in various metrics, including a 2.877\% increase in True Positive Rate (TPR), a 5.446\% increase in True Negative Rate (TNR), and a 5.91\% increase in Accuracy (ACC). On the NVD real-world dataset, VulMCI achieves an average accuracy of 5.162\%, confirming its value in practical vulnerability detection applications. △ Less

Submitted 16 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.12620 [pdf, other]

Are Large Language Models (LLMs) Good Social Predictors?

Authors: Kaiqi Yang, Hang Li, Hongzhi Wen, Tai-Quan Peng, Jiliang Tang, Hui Liu

Abstract: The prediction has served as a crucial scientific method in modern social studies. With the recent advancement of Large Language Models (LLMs), efforts have been made to leverage LLMs to predict the human features in social life, such as presidential voting. These works suggest that LLMs are capable of generating human-like responses. However, we find that the promising performance achieved by pre… ▽ More The prediction has served as a crucial scientific method in modern social studies. With the recent advancement of Large Language Models (LLMs), efforts have been made to leverage LLMs to predict the human features in social life, such as presidential voting. These works suggest that LLMs are capable of generating human-like responses. However, we find that the promising performance achieved by previous studies is because of the existence of input shortcut features to the response. In fact, by removing these shortcuts, the performance is reduced dramatically. To further revisit the ability of LLMs, we introduce a novel social prediction task, Soc-PRF Prediction, which utilizes general features as input and simulates real-world social study settings. With the comprehensive investigations on various LLMs, we reveal that LLMs cannot work as expected on social prediction when given general input features without shortcuts. We further investigate possible reasons for this phenomenon that suggest potential ways to enhance LLMs for social prediction. △ Less

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2401.09948 [pdf, ps, other]

The extremal problem for weighted combined energy and the generalization of Nitsche inequality

Authors: Xiaogao Feng, Ruyue Tang, Ting Peng

Abstract: We consider the existence and uniqueness of a minimizer of the extremal problem for weighted combined energy between two concentric annuli and obtain that the extremal mapping is a certain radial mapping. Meanwhile, this in turn implies a Nitsche type phenomenon and we get a $\frac{1}{|w|^λ}-$Nitsche type inequality ($λ\neq1$). As an application, on the basis of the relationship between weighted c… ▽ More We consider the existence and uniqueness of a minimizer of the extremal problem for weighted combined energy between two concentric annuli and obtain that the extremal mapping is a certain radial mapping. Meanwhile, this in turn implies a Nitsche type phenomenon and we get a $\frac{1}{|w|^λ}-$Nitsche type inequality ($λ\neq1$). As an application, on the basis of the relationship between weighted combined energy and weighted combined distortion, we also investigate the extremal problem for weighted combined distortion on annuli. This extends the result obtained by Kalaj in \cite{Ka1}. △ Less

Submitted 18 January, 2024; originally announced January 2024.

Comments: 15 pages

MSC Class: 30C62

arXiv:2401.08868 [pdf, other]

B-Cos Aligned Transformers Learn Human-Interpretable Features

Authors: Manuel Tran, Amal Lahiani, Yashin Dicente Cid, Melanie Boxberg, Peter Lienemann, Christian Matek, Sophia J. Wagner, Fabian J. Theis, Eldad Klaiman, Tingying Peng

Abstract: Vision Transformers (ViTs) and Swin Transformers (Swin) are currently state-of-the-art in computational pathology. However, domain experts are still reluctant to use these models due to their lack of interpretability. This is not surprising, as critical decisions need to be transparent and understandable. The most common approach to understanding transformers is to visualize their attention. Howev… ▽ More Vision Transformers (ViTs) and Swin Transformers (Swin) are currently state-of-the-art in computational pathology. However, domain experts are still reluctant to use these models due to their lack of interpretability. This is not surprising, as critical decisions need to be transparent and understandable. The most common approach to understanding transformers is to visualize their attention. However, attention maps of ViTs are often fragmented, leading to unsatisfactory explanations. Here, we introduce a novel architecture called the B-cos Vision Transformer (BvT) that is designed to be more interpretable. It replaces all linear transformations with the B-cos transform to promote weight-input alignment. In a blinded study, medical experts clearly ranked BvTs above ViTs, suggesting that our network is better at capturing biomedically relevant structures. This is also true for the B-cos Swin Transformer (Bwin). Compared to the Swin Transformer, it even improves the F1-score by up to 4.7% on two public datasets. △ Less

Submitted 18 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: Accepted at MICCAI 2023 (oral). Camera-ready available at https://doi.org/10.1007/978-3-031-43993-3_50

arXiv:2401.04720 [pdf, other]

Low-resource finetuning of foundation models beats state-of-the-art in histopathology

Authors: Benedikt Roth, Valentin Koch, Sophia J. Wagner, Julia A. Schnabel, Carsten Marr, Tingying Peng

Abstract: To handle the large scale of whole slide images in computational pathology, most approaches first tessellate the images into smaller patches, extract features from these patches, and finally aggregate the feature vectors with weakly-supervised learning. The performance of this workflow strongly depends on the quality of the extracted features. Recently, foundation models in computer vision showed… ▽ More To handle the large scale of whole slide images in computational pathology, most approaches first tessellate the images into smaller patches, extract features from these patches, and finally aggregate the feature vectors with weakly-supervised learning. The performance of this workflow strongly depends on the quality of the extracted features. Recently, foundation models in computer vision showed that leveraging huge amounts of data through supervised or self-supervised learning improves feature quality and generalizability for a variety of tasks. In this study, we benchmark the most popular vision foundation models as feature extractors for histopathology data. We evaluate the models in two settings: slide-level classification and patch-level classification. We show that foundation models are a strong baseline. Our experiments demonstrate that by finetuning a foundation model on a single GPU for only two hours or three days depending on the dataset, we can match or outperform state-of-the-art feature extractors for computational pathology. These findings imply that even with little resources one can finetune a feature extractor tailored towards a specific downstream task and dataset. This is a considerable shift from the current state, where only few institutions with large amounts of resources and datasets are able to train a feature extractor. We publish all code used for training and evaluation as well as the finetuned models. △ Less

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.02669 [pdf, other]

Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache

Authors: Bin Lin, Chen Zhang, Tao Peng, Hanyu Zhao, Wencong Xiao, Minmin Sun, Anmin Liu, Zhipeng Zhang, Lanbo Li, Xiafei Qiu, Shen Li, Zhigang Ji, Tao Xie, Yong Li, Wei Lin

Abstract: Large Language Models (LLMs) demonstrate substantial potential across a diverse array of domains via request serving. However, as trends continue to push for expanding context sizes, the autoregressive nature of LLMs results in highly dynamic behavior of the attention layers, showcasing significant differences in computational characteristics and memory requirements from the non-attention layers.… ▽ More Large Language Models (LLMs) demonstrate substantial potential across a diverse array of domains via request serving. However, as trends continue to push for expanding context sizes, the autoregressive nature of LLMs results in highly dynamic behavior of the attention layers, showcasing significant differences in computational characteristics and memory requirements from the non-attention layers. This presents substantial challenges for resource management and performance optimization in service systems. Existing static model parallelism and resource allocation strategies fall short when dealing with this dynamicity. To address the issue, we propose Infinite-LLM, a novel LLM serving system designed to effectively handle dynamic context lengths. Infinite-LLM disaggregates attention layers from an LLM's inference process, facilitating flexible and independent resource scheduling that optimizes computational performance and enhances memory utilization jointly. By leveraging a pooled GPU memory strategy across a cluster, Infinite-LLM not only significantly boosts system throughput but also supports extensive context lengths. Evaluated on a dataset with context lengths ranging from a few to 2000K tokens across a cluster with 32 A100 GPUs, Infinite-LLM demonstrates throughput improvement of 1.35-3.4x compared to state-of-the-art methods, enabling efficient and elastic LLM deployment. △ Less

Submitted 4 July, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

arXiv:2312.12917 [pdf, other]

Sign Language Production with Latent Motion Transformer

Authors: Pan Xie, Taiyi Peng, Yao Du, Qipeng Zhang

Abstract: Sign Language Production (SLP) is the tough task of turning sign language into sign videos. The main goal of SLP is to create these videos using a sign gloss. In this research, we've developed a new method to make high-quality sign videos without using human poses as a middle step. Our model works in two main parts: first, it learns from a generator and the video's hidden features, and next, it us… ▽ More Sign Language Production (SLP) is the tough task of turning sign language into sign videos. The main goal of SLP is to create these videos using a sign gloss. In this research, we've developed a new method to make high-quality sign videos without using human poses as a middle step. Our model works in two main parts: first, it learns from a generator and the video's hidden features, and next, it uses another model to understand the order of these hidden features. To make this method even better for sign videos, we make several significant improvements. (i) In the first stage, we take an improved 3D VQ-GAN to learn downsampled latent representations. (ii) In the second stage, we introduce sequence-to-sequence attention to better leverage conditional information. (iii) The separated two-stage training discards the realistic visual semantic of the latent codes in the second stage. To endow the latent sequences semantic information, we extend the token-level autoregressive latent codes learning with perceptual loss and reconstruction loss for the prior model with visual perception. Compared with previous state-of-the-art approaches, our model performs consistently better on two word-level sign language datasets, i.e., WLASL and NMFs-CSL. △ Less

Submitted 20 December, 2023; originally announced December 2023.

Comments: Accepted by WACV2024

arXiv:2312.09708 [pdf, other]

GraphRARE: Reinforcement Learning Enhanced Graph Neural Network with Relative Entropy

Authors: Tianhao Peng, Wenjun Wu, Haitao Yuan, Zhifeng Bao, Zhao Pengrui, Xin Yu, Xuetao Lin, Yu Liang, Yanjun Pu

Abstract: Graph neural networks (GNNs) have shown advantages in graph-based analysis tasks. However, most existing methods have the homogeneity assumption and show poor performance on heterophilic graphs, where the linked nodes have dissimilar features and different class labels, and the semantically related nodes might be multi-hop away. To address this limitation, this paper presents GraphRARE, a general… ▽ More Graph neural networks (GNNs) have shown advantages in graph-based analysis tasks. However, most existing methods have the homogeneity assumption and show poor performance on heterophilic graphs, where the linked nodes have dissimilar features and different class labels, and the semantically related nodes might be multi-hop away. To address this limitation, this paper presents GraphRARE, a general framework built upon node relative entropy and deep reinforcement learning, to strengthen the expressive capability of GNNs. An innovative node relative entropy, which considers node features and structural similarity, is used to measure mutual information between node pairs. In addition, to avoid the sub-optimal solutions caused by mixing useful information and noises of remote nodes, a deep reinforcement learning-based algorithm is developed to optimize the graph topology. This algorithm selects informative nodes and discards noisy nodes based on the defined node relative entropy. Extensive experiments are conducted on seven real-world datasets. The experimental results demonstrate the superiority of GraphRARE in node classification and its capability to optimize the original graph topology. △ Less

Submitted 13 April, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: 14 pages, 7 figures

arXiv:2312.08084 [pdf, other]

A Novel Energy based Model Mechanism for Multi-modal Aspect-Based Sentiment Analysis

Authors: Tianshuo Peng, Zuchao Li, Ping Wang, Lefei Zhang, Hai Zhao

Abstract: Multi-modal aspect-based sentiment analysis (MABSA) has recently attracted increasing attention. The span-based extraction methods, such as FSUIE, demonstrate strong performance in sentiment analysis due to their joint modeling of input sequences and target labels. However, previous methods still have certain limitations: (i) They ignore the difference in the focus of visual information between di… ▽ More Multi-modal aspect-based sentiment analysis (MABSA) has recently attracted increasing attention. The span-based extraction methods, such as FSUIE, demonstrate strong performance in sentiment analysis due to their joint modeling of input sequences and target labels. However, previous methods still have certain limitations: (i) They ignore the difference in the focus of visual information between different analysis targets (aspect or sentiment). (ii) Combining features from uni-modal encoders directly may not be sufficient to eliminate the modal gap and can cause difficulties in capturing the image-text pairwise relevance. (iii) Existing span-based methods for MABSA ignore the pairwise relevance of target span boundaries. To tackle these limitations, we propose a novel framework called DQPSA for multi-modal sentiment analysis. Specifically, our model contains a Prompt as Dual Query (PDQ) module that uses the prompt as both a visual query and a language query to extract prompt-aware visual information and strengthen the pairwise relevance between visual information and the analysis target. Additionally, we introduce an Energy-based Pairwise Expert (EPE) module that models the boundaries pairing of the analysis target from the perspective of an Energy-based Model. This expert predicts aspect or sentiment span based on pairwise stability. Experiments on three widely used benchmarks demonstrate that DQPSA outperforms previous approaches and achieves a new state-of-the-art performance. △ Less

Submitted 15 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

Comments: AAAI2024

arXiv:2312.02605 [pdf, other]

doi 10.1109/PCS60826.2024.10566283

Accelerating Learnt Video Codecs with Gradient Decay and Layer-wise Distillation

Authors: Tianhao Peng, Ge Gao, Heming Sun, Fan Zhang, David Bull

Abstract: In recent years, end-to-end learnt video codecs have demonstrated their potential to compete with conventional coding algorithms in term of compression efficiency. However, most learning-based video compression models are associated with high computational complexity and latency, in particular at the decoder side, which limits their deployment in practical applications. In this paper, we present a… ▽ More In recent years, end-to-end learnt video codecs have demonstrated their potential to compete with conventional coding algorithms in term of compression efficiency. However, most learning-based video compression models are associated with high computational complexity and latency, in particular at the decoder side, which limits their deployment in practical applications. In this paper, we present a novel model-agnostic pruning scheme based on gradient decay and adaptive layer-wise distillation. Gradient decay enhances parameter exploration during sparsification whilst preventing runaway sparsity and is superior to the standard Straight-Through Estimation. The adaptive layer-wise distillation regulates the sparse training in various stages based on the distortion of intermediate features. This stage-wise design efficiently updates parameters with minimal computational overhead. The proposed approach has been applied to three popular end-to-end learnt video codecs, FVC, DCVC, and DCVC-HEM. Results confirm that our method yields up to 65% reduction in MACs and 2x speed-up with less than 0.3dB drop in BD-PSNR. Supporting code and supplementary material can be downloaded from: https://jasminepp.github.io/lightweightdvc/ △ Less

Submitted 5 December, 2023; originally announced December 2023.

Report number: 2312.02605

arXiv:2311.13117 [pdf]

Bile dynamics within the biliary tract and microfluidic-based bile component detection: A review

Authors: Tao Peng, Chenxiao Zhou, Zhexin Zhang, Yingying Liu, Xiaodong Lin, Yongqing, Yunlong Zhong, Ping Wang, Yanwei Jia

Abstract: Bilestones are solid masses found in the gallbladder or biliary tract, which block the normal bile flow and eventually result in severe life-threatening complications. Studies have shown that bilestone formation may be related to bile flow dynamics and the concentration level of bile components. The bile flow dynamics in the biliary tract play a critical role in disclosing the mechanism of bile st… ▽ More Bilestones are solid masses found in the gallbladder or biliary tract, which block the normal bile flow and eventually result in severe life-threatening complications. Studies have shown that bilestone formation may be related to bile flow dynamics and the concentration level of bile components. The bile flow dynamics in the biliary tract play a critical role in disclosing the mechanism of bile stasis and transportation. The concentration of bile composition is closely associated with processes such as nucleation and crystallization. Recently, microfluidic-based biosensors have been favored for multiple advantages over traditional bench-top detection assays for their less sample consumption, portability, low cost, and high sensitivity for real-time detection. Here, we reviewed the developments in bile dynamics study and microfluidics-based bile component detection methods. These studies may provide valuable insights into the bilestone formation mechanisms and better treatment, alongside our opinions on the future development of in vitro lithotriptic drug screening of bilestones and bile characterization tests. △ Less

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.12461 [pdf, other]

HiFi-Syn: Hierarchical Granularity Discrimination for High-Fidelity Synthesis of MR Images with Structure Preservation

Authors: Ziqi Yu, Botao Zhao, Shengjie Zhang, Xiang Chen, Jianfeng Feng, Tingying Peng, Xiao-Yong Zhang

Abstract: Synthesizing medical images while preserving their structural information is crucial in medical research. In such scenarios, the preservation of anatomical content becomes especially important. Although recent advances have been made by incorporating instance-level information to guide translation, these methods overlook the spatial coherence of structural-level representation and the anatomical i… ▽ More Synthesizing medical images while preserving their structural information is crucial in medical research. In such scenarios, the preservation of anatomical content becomes especially important. Although recent advances have been made by incorporating instance-level information to guide translation, these methods overlook the spatial coherence of structural-level representation and the anatomical invariance of content during translation. To address these issues, we introduce hierarchical granularity discrimination, which exploits various levels of semantic information present in medical images. Our strategy utilizes three levels of discrimination granularity: pixel-level discrimination using a Brain Memory Bank, structure-level discrimination on each brain structure with a re-weighting strategy to focus on hard samples, and global-level discrimination to ensure anatomical consistency during translation. The image translation performance of our strategy has been evaluated on three independent datasets (UK Biobank, IXI, and BraTS 2018), and it has outperformed state-of-the-art algorithms. Particularly, our model excels not only in synthesizing normal structures but also in handling abnormal (pathological) structures, such as brain tumors, despite the variations in contrast observed across different imaging modalities due to their pathological characteristics. The diagnostic value of synthesized MR images containing brain tumors has been evaluated by radiologists. This indicates that our model may offer an alternative solution in scenarios where specific MR modalities of patients are unavailable. Extensive experiments further demonstrate the versatility of our method, providing unique insights into medical image translation. △ Less

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.00217 [pdf]

doi 10.1371/journal.pclm.0000429

Can Large Language Models Capture Public Opinion about Global Warming? An Empirical Assessment of Algorithmic Fidelity and Bias

Authors: S. Lee, T. Q. Peng, M. H. Goldberg, S. A. Rosenthal, J. E. Kotcher, E. W. Maibach, A. Leiserowitz

Abstract: Large language models (LLMs) have demonstrated their potential in social science research by emulating human perceptions and behaviors, a concept referred to as algorithmic fidelity. This study assesses the algorithmic fidelity and bias of LLMs by utilizing two nationally representative climate change surveys. The LLMs were conditioned on demographics and/or psychological covariates to simulate su… ▽ More Large language models (LLMs) have demonstrated their potential in social science research by emulating human perceptions and behaviors, a concept referred to as algorithmic fidelity. This study assesses the algorithmic fidelity and bias of LLMs by utilizing two nationally representative climate change surveys. The LLMs were conditioned on demographics and/or psychological covariates to simulate survey responses. The findings indicate that LLMs can effectively capture presidential voting behaviors but encounter challenges in accurately representing global warming perspectives when relevant covariates are not included. GPT-4 exhibits improved performance when conditioned on both demographics and covariates. However, disparities emerge in LLM estimations of the views of certain groups, with LLMs tending to underestimate worry about global warming among Black Americans. While highlighting the potential of LLMs to aid social science research, these results underscore the importance of meticulous conditioning, model selection, survey question format, and bias assessment when employing LLMs for survey simulation. Further investigation into prompt engineering and algorithm auditing is essential to harness the power of LLMs while addressing their inherent limitations. △ Less

Submitted 7 February, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

Comments: 34 pages, 6 figures, 1 table

Journal ref: PLOS Climate, 3(2024), e0000429

arXiv:2310.05663 [pdf, ps, other]

Secondary proximity effect in a side-coupled double quantum dot structure

Authors: Jia-Ning Wang, Yong-Chen Xiong, Wang-Huai Zhou, Tan Peng, Ziyu Wang

Abstract: Semiconductor quantum dots in close proximity to superconductors may provoke localized bound states within the superconducting energy gap known as Yu-Shiba-Rusinov (YSR) state, which is a promising candidate for constructing Majorana zero modes and topological qubits. Side-coupled double quantum dot systems are ideal platforms revealing the secondary proximity effect. Numerical renormalization gro… ▽ More Semiconductor quantum dots in close proximity to superconductors may provoke localized bound states within the superconducting energy gap known as Yu-Shiba-Rusinov (YSR) state, which is a promising candidate for constructing Majorana zero modes and topological qubits. Side-coupled double quantum dot systems are ideal platforms revealing the secondary proximity effect. Numerical renormalization group calculations show that, if the central quantum dot can be treated as a noninteracting resonant level, it acts as a superconducting medium due to the ordinary proximity effect. The bound state in the side dot behaves as the case of a single impurity connected to two superconducting leads. The side dot undergoes quantum phase transitions between a singlet state and a doublet state as the Coulomb repulsion, the interdot coupling strength, or the energy level sweeps. Phase diagrams indicate that the phase boundaries could be well illustrated by $Δ\approx c {T_{K2}}$ in all cases, with $Δ$ is the superconducting gap, $T_{K2}$ is the side Kondo temperature and $c$ is of the order $1.0$. These findings offer valuable insights into the secondary proximity effect, and show great importance for designing superconducting quantum devices. △ Less

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2310.04153 [pdf, other]

Fair coins tend to land on the same side they started: Evidence from 350,757 flips

Authors: František Bartoš, Alexandra Sarafoglou, Henrik R. Godmann, Amir Sahrani, David Klein Leunk, Pierre Y. Gui, David Voss, Kaleem Ullah, Malte J. Zoubek, Franziska Nippold, Frederik Aust, Felipe F. Vieira, Chris-Gabriel Islam, Anton J. Zoubek, Sara Shabani, Jonas Petter, Ingeborg B. Roos, Adam Finnemann, Aaron B. Lob, Madlen F. Hoffstadt, Jason Nak, Jill de Ron, Koen Derks, Karoline Huth, Sjoerd Terpstra , et al. (25 additional authors not shown)

Abstract: Many people have flipped coins but few have stopped to ponder the statistical and physical intricacies of the process. In a preregistered study we collected $350{,}757$ coin flips to test the counterintuitive prediction from a physics model of human coin tossing developed by Diaconis, Holmes, and Montgomery (DHM; 2007). The model asserts that when people flip an ordinary coin, it tends to land on… ▽ More Many people have flipped coins but few have stopped to ponder the statistical and physical intricacies of the process. In a preregistered study we collected $350{,}757$ coin flips to test the counterintuitive prediction from a physics model of human coin tossing developed by Diaconis, Holmes, and Montgomery (DHM; 2007). The model asserts that when people flip an ordinary coin, it tends to land on the same side it started -- DHM estimated the probability of a same-side outcome to be about 51%. Our data lend strong support to this precise prediction: the coins landed on the same side more often than not, $\text{Pr}(\text{same side}) = 0.508$, 95% credible interval (CI) [$0.506$, $0.509$], $\text{BF}_{\text{same-side bias}} = 2359$. Furthermore, the data revealed considerable between-people variation in the degree of this same-side bias. Our data also confirmed the generic prediction that when people flip an ordinary coin -- with the initial side-up randomly determined -- it is equally likely to land heads or tails: $\text{Pr}(\text{heads}) = 0.500$, 95% CI [$0.498$, $0.502$], $\text{BF}_{\text{heads-tails bias}} = 0.182$. Furthermore, this lack of heads-tails bias does not appear to vary across coins. Additional exploratory analyses revealed that the within-people same-side bias decreased as more coins were flipped, an effect that is consistent with the possibility that practice makes people flip coins in a less wobbly fashion. Our data therefore provide strong evidence that when some (but not all) people flip a fair coin, it tends to land on the same side it started. Our data provide compelling statistical support for the DHM physics model of coin tossing. △ Less

Submitted 2 June, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

arXiv:2310.02097 [pdf, other]

Leveraging Classic Deconvolution and Feature Extraction in Zero-Shot Image Restoration

Authors: Tomáš Chobola, Gesine Müller, Veit Dausmann, Anton Theileis, Jan Taucher, Jan Huisken, Tingying Peng

Abstract: Non-blind deconvolution aims to restore a sharp image from its blurred counterpart given an obtained kernel. Existing deep neural architectures are often built based on large datasets of sharp ground truth images and trained with supervision. Sharp, high quality ground truth images, however, are not always available, especially for biomedical applications. This severely hampers the applicability o… ▽ More Non-blind deconvolution aims to restore a sharp image from its blurred counterpart given an obtained kernel. Existing deep neural architectures are often built based on large datasets of sharp ground truth images and trained with supervision. Sharp, high quality ground truth images, however, are not always available, especially for biomedical applications. This severely hampers the applicability of current approaches in practice. In this paper, we propose a novel non-blind deconvolution method that leverages the power of deep learning and classic iterative deconvolution algorithms. Our approach combines a pre-trained network to extract deep features from the input image with iterative Richardson-Lucy deconvolution steps. Subsequently, a zero-shot optimisation process is employed to integrate the deconvolved features, resulting in a high-quality reconstructed image. By performing the preliminary reconstruction with the classic iterative deconvolution method, we can effectively utilise a smaller network to produce the final image, thus accelerating the reconstruction whilst reducing the demand for valuable computational resources. Our method demonstrates significant improvements in various real-world applications non-blind deconvolution tasks. △ Less

Submitted 3 October, 2023; originally announced October 2023.

arXiv:2309.01865 [pdf, other]

BigFUSE: Global Context-Aware Image Fusion in Dual-View Light-Sheet Fluorescence Microscopy with Image Formation Prior

Authors: Yu Liu, Gesine Muller, Nassir Navab, Carsten Marr, Jan Huisken, Tingying Peng

Abstract: Light-sheet fluorescence microscopy (LSFM), a planar illumination technique that enables high-resolution imaging of samples, experiences defocused image quality caused by light scattering when photons propagate through thick tissues. To circumvent this issue, dualview imaging is helpful. It allows various sections of the specimen to be scanned ideally by viewing the sample from opposing orientatio… ▽ More Light-sheet fluorescence microscopy (LSFM), a planar illumination technique that enables high-resolution imaging of samples, experiences defocused image quality caused by light scattering when photons propagate through thick tissues. To circumvent this issue, dualview imaging is helpful. It allows various sections of the specimen to be scanned ideally by viewing the sample from opposing orientations. Recent image fusion approaches can then be applied to determine in-focus pixels by comparing image qualities of two views locally and thus yield spatially inconsistent focus measures due to their limited field-of-view. Here, we propose BigFUSE, a global context-aware image fuser that stabilizes image fusion in LSFM by considering the global impact of photon propagation in the specimen while determining focus-defocus based on local image qualities. Inspired by the image formation prior in dual-view LSFM, image fusion is considered as estimating a focus-defocus boundary using Bayes Theorem, where (i) the effect of light scattering onto focus measures is included within Likelihood; and (ii) the spatial consistency regarding focus-defocus is imposed in Prior. The expectation-maximum algorithm is then adopted to estimate the focus-defocus boundary. Competitive experimental results show that BigFUSE is the first dual-view LSFM fuser that is able to exclude structured artifacts when fusing information, highlighting its abilities of automatic image fusion. △ Less

Submitted 3 November, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

Comments: paper in MICCAI 2023

arXiv:2308.07708 [pdf, ps, other]

A Real-time Non-contact Localization Method for Faulty Electric Energy Storage Components using Highly Sensitive Magnetometers

Authors: Tonghui Peng, Wei Gao, Ya Wu, Yulong Ma, Shiwu Zhang, Yinan Hu

Abstract: With the wide application of electric energy storage component arrays, such as battery arrays, capacitor arrays, inductor arrays, their potential safety risks have gradually drawn the public attention. However, existing technologies cannot meet the needs of non-contact and real-time diagnosis for faulty components inside these massive arrays. To solve this problem, this paper proposes a new method… ▽ More With the wide application of electric energy storage component arrays, such as battery arrays, capacitor arrays, inductor arrays, their potential safety risks have gradually drawn the public attention. However, existing technologies cannot meet the needs of non-contact and real-time diagnosis for faulty components inside these massive arrays. To solve this problem, this paper proposes a new method based on the beamforming spatial filtering algorithm to precisely locate the faulty components within the arrays in real-time. The method uses highly sensitive magnetometers to collect the magnetic signals from energy storage component arrays, without damaging or even contacting any component. The experimental results demonstrate the potential of the proposed method in securing energy storage component arrays. Within an imaging area of 80 mm $\times$ 80 mm, the one faulty component out of nine total components can be localized with an accuracy of 0.72 mm for capacitor arrays and 1.60 mm for battery arrays. △ Less

Submitted 15 August, 2023; originally announced August 2023.

arXiv:2308.02038 [pdf, other]

CLGT: A Graph Transformer for Student Performance Prediction in Collaborative Learning

Authors: Tianhao Peng, Yu Liang, Wenjun Wu, Jian Ren, Zhao Pengrui, Yanjun Pu

Abstract: Modeling and predicting the performance of students in collaborative learning paradigms is an important task. Most of the research presented in literature regarding collaborative learning focuses on the discussion forums and social learning networks. There are only a few works that investigate how students interact with each other in team projects and how such interactions affect their academic pe… ▽ More Modeling and predicting the performance of students in collaborative learning paradigms is an important task. Most of the research presented in literature regarding collaborative learning focuses on the discussion forums and social learning networks. There are only a few works that investigate how students interact with each other in team projects and how such interactions affect their academic performance. In order to bridge this gap, we choose a software engineering course as the study subject. The students who participate in a software engineering course are required to team up and complete a software project together. In this work, we construct an interaction graph based on the activities of students grouped in various teams. Based on this student interaction graph, we present an extended graph transformer framework for collaborative learning (CLGT) for evaluating and predicting the performance of students. Moreover, the proposed CLGT contains an interpretation module that explains the prediction results and visualizes the student interaction patterns. The experimental results confirm that the proposed CLGT outperforms the baseline models in terms of performing predictions based on the real-world datasets. Moreover, the proposed CLGT differentiates the students with poor performance in the collaborative learning paradigm and gives teachers early warnings, so that appropriate assistance can be provided. △ Less

Submitted 30 July, 2023; originally announced August 2023.

Comments: 8 pages, 5 figures, conference: AAAI

arXiv:2307.14522 [pdf, other]

doi 10.1145/3582515.3609559

CliniDigest: A Case Study in Large Language Model Based Large-Scale Summarization of Clinical Trial Descriptions

Authors: Renee D. White, Tristan Peng, Pann Sripitak, Alexander Rosenberg Johansen, Michael Snyder

Abstract: A clinical trial is a study that evaluates new biomedical interventions. To design new trials, researchers draw inspiration from those current and completed. In 2022, there were on average more than 100 clinical trials submitted to ClinicalTrials.gov every day, with each trial having a mean of approximately 1500 words [1]. This makes it nearly impossible to keep up to date. To mitigate this issue,… ▽ More A clinical trial is a study that evaluates new biomedical interventions. To design new trials, researchers draw inspiration from those current and completed. In 2022, there were on average more than 100 clinical trials submitted to ClinicalTrials.gov every day, with each trial having a mean of approximately 1500 words [1]. This makes it nearly impossible to keep up to date. To mitigate this issue, we have created a batch clinical trial summarizer called CliniDigest using GPT-3.5. CliniDigest is, to our knowledge, the first tool able to provide real-time, truthful, and comprehensive summaries of clinical trials. CliniDigest can reduce up to 85 clinical trial descriptions (approximately 10,500 words) into a concise 200-word summary with references and limited hallucinations. We have tested CliniDigest on its ability to summarize 457 trials divided across 27 medical subdomains. For each field, CliniDigest generates summaries of $μ=153,\ σ=69 $ words, each of which utilizes $μ=54\%,\ σ=30\% $ of the sources. A more comprehensive evaluation is planned and outlined in this paper. △ Less

Submitted 31 July, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

Comments: 7 pages, 3 figures, 3 tables, conference: ACM GoodIt 23'; Second co-author: Tristan Peng; Citation: White, Peng, et al

arXiv:2307.07998 [pdf, other]

LUCYD: A Feature-Driven Richardson-Lucy Deconvolution Network

Authors: Tomáš Chobola, Gesine Müller, Veit Dausmann, Anton Theileis, Jan Taucher, Jan Huisken, Tingying Peng

Abstract: The process of acquiring microscopic images in life sciences often results in image degradation and corruption, characterised by the presence of noise and blur, which poses significant challenges in accurately analysing and interpreting the obtained data. This paper proposes LUCYD, a novel method for the restoration of volumetric microscopy images that combines the Richardson-Lucy deconvolution fo… ▽ More The process of acquiring microscopic images in life sciences often results in image degradation and corruption, characterised by the presence of noise and blur, which poses significant challenges in accurately analysing and interpreting the obtained data. This paper proposes LUCYD, a novel method for the restoration of volumetric microscopy images that combines the Richardson-Lucy deconvolution formula and the fusion of deep features obtained by a fully convolutional network. By integrating the image formation process into a feature-driven restoration model, the proposed approach aims to enhance the quality of the restored images whilst reducing computational costs and maintaining a high degree of interpretability. Our results demonstrate that LUCYD outperforms the state-of-the-art methods in both synthetic and real microscopy images, achieving superior performance in terms of image quality and generalisability. We show that the model can handle various microscopy modalities and different imaging conditions by evaluating it on two different microscopy datasets, including volumetric widefield and light-sheet microscopy. Our experiments indicate that LUCYD can significantly improve resolution, contrast, and overall quality of microscopy images. Therefore, it can be a valuable tool for microscopy image restoration and can facilitate further research in various microscopy applications. We made the source code for the model accessible under https://github.com/ctom2/lucyd-deconvolution. △ Less

Submitted 16 July, 2023; originally announced July 2023.

Comments: Accepted by 26th International Conference on Medical Image Computing and Computer Assisted Intervention

arXiv:2306.17088 [pdf, other]

Pupil-driven quantitative differential phase contrast imaging

Authors: Shuhe Zhang, Hao Wu, Tao Peng, Zeyu Ke, Meng Shao, Tos T. J. M. Berendschot, Jinhua Zhou

Abstract: In this research, we reveal the inborn but hitherto ignored properties of quantitative differential phase contrast (qDPC) imaging: the phase transfer function being an edge detection filter. Inspired by this, we highlighted the duality of qDPC between optics and pattern recognition, and propose a simple and effective qDPC reconstruction algorithm, termed Pupil-Driven qDPC (pd-qDPC), to facilitate… ▽ More In this research, we reveal the inborn but hitherto ignored properties of quantitative differential phase contrast (qDPC) imaging: the phase transfer function being an edge detection filter. Inspired by this, we highlighted the duality of qDPC between optics and pattern recognition, and propose a simple and effective qDPC reconstruction algorithm, termed Pupil-Driven qDPC (pd-qDPC), to facilitate the phase reconstruction quality for the family of qDPC-based phase reconstruction algorithms. We formed a new cost function in which modified L0-norm was used to represent the pupil-driven edge sparsity, and the qDPC convolution operator is duplicated in the data fidelity term to achieve automatic background removal. Further, we developed the iterative reweighted soft-threshold algorithms based on split Bregman method to solve this modified L0-norm problem. We tested pd-qDPC on both simulated and experimental data and compare against state-of-the-art (SOTA) methods including L2-norm, total variation regularization (TV-qDPC), isotropic-qDPC, and Retinex qDPC algorithms. Results show that our proposed model is superior in terms of phase reconstruction quality and implementation efficiency, in which it significantly increases the experimental robustness while maintaining the data fidelity. In general, the pd-qDPC enables the high-quality qDPC reconstruction without any modification of the optical system. It simplifies the system complexity and benefits the qDPC community and beyond including but not limited to cell segmentation and PTF learning based on the edge filtering property. △ Less

Submitted 29 June, 2023; originally announced June 2023.

arXiv:2306.14913 [pdf, other]

FSUIE: A Novel Fuzzy Span Mechanism for Universal Information Extraction

Authors: Tianshuo Peng, Zuchao Li, Lefei Zhang, Bo Du, Hai Zhao

Abstract: Universal Information Extraction (UIE) has been introduced as a unified framework for various Information Extraction (IE) tasks and has achieved widespread success. Despite this, UIE models have limitations. For example, they rely heavily on span boundaries in the data during training, which does not reflect the reality of span annotation challenges. Slight adjustments to positions can also meet r… ▽ More Universal Information Extraction (UIE) has been introduced as a unified framework for various Information Extraction (IE) tasks and has achieved widespread success. Despite this, UIE models have limitations. For example, they rely heavily on span boundaries in the data during training, which does not reflect the reality of span annotation challenges. Slight adjustments to positions can also meet requirements. Additionally, UIE models lack attention to the limited span length feature in IE. To address these deficiencies, we propose the Fuzzy Span Universal Information Extraction (FSUIE) framework. Specifically, our contribution consists of two concepts: fuzzy span loss and fuzzy span attention. Our experimental results on a series of main IE tasks show significant improvement compared to the baseline, especially in terms of fast convergence and strong performance with small amounts of data and training epochs. These results demonstrate the effectiveness and generalization of FSUIE in different tasks, settings, and scenarios. △ Less

Submitted 19 June, 2023; originally announced June 2023.

Comments: ACL2023

arXiv:2306.11368 [pdf, other]

RoMe: Towards Large Scale Road Surface Reconstruction via Mesh Representation

Authors: Ruohong Mei, Wei Sui, Jiaxin Zhang, Xue Qin, Gang Wang, Tao Peng, Cong Yang

Abstract: In autonomous driving applications, accurate and efficient road surface reconstruction is paramount. This paper introduces RoMe, a novel framework designed for the robust reconstruction of large-scale road surfaces. Leveraging a unique mesh representation, RoMe ensures that the reconstructed road surfaces are accurate and seamlessly aligned with semantics. To address challenges in computational ef… ▽ More In autonomous driving applications, accurate and efficient road surface reconstruction is paramount. This paper introduces RoMe, a novel framework designed for the robust reconstruction of large-scale road surfaces. Leveraging a unique mesh representation, RoMe ensures that the reconstructed road surfaces are accurate and seamlessly aligned with semantics. To address challenges in computational efficiency, we propose a waypoint sampling strategy, enabling RoMe to reconstruct vast environments by focusing on sub-areas and subsequently merging them. Furthermore, we incorporate an extrinsic optimization module to enhance the robustness against inaccuracies in extrinsic calibration. Our extensive evaluations of both public datasets and wild data underscore RoMe's superiority in terms of speed, accuracy, and robustness. For instance, it costs only 2 GPU hours to recover a road surface of 600*600 square meters from thousands of images. Notably, RoMe's capability extends beyond mere reconstruction, offering significant value for autolabeling tasks in autonomous driving applications. All related data and code are available at https://github.com/DRosemei/RoMe. △ Less

Submitted 21 June, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: Published in: IEEE Transactions on Intelligent Vehicles

arXiv:2305.14450 [pdf, other]

An Empirical Study on Information Extraction using Large Language Models

Authors: Ridong Han, Chaohao Yang, Tao Peng, Prayag Tiwari, Xiang Wan, Lu Liu, Benyou Wang

Abstract: Human-like large language models (LLMs), especially the most powerful and popular ones in OpenAI's GPT family, have proven to be very helpful for many natural language processing (NLP) related tasks. Therefore, various attempts have been made to apply LLMs to information extraction (IE), which is a fundamental NLP task that involves extracting information from unstructured plain text. To demonstra… ▽ More Human-like large language models (LLMs), especially the most powerful and popular ones in OpenAI's GPT family, have proven to be very helpful for many natural language processing (NLP) related tasks. Therefore, various attempts have been made to apply LLMs to information extraction (IE), which is a fundamental NLP task that involves extracting information from unstructured plain text. To demonstrate the latest representative progress in LLMs' information extraction ability, we assess the information extraction ability of GPT-4 (the latest version of GPT at the time of writing this paper) from four perspectives: Performance, Evaluation Criteria, Robustness, and Error Types. Our results suggest a visible performance gap between GPT-4 and state-of-the-art (SOTA) IE methods. To alleviate this problem, considering the LLMs' human-like characteristics, we propose and analyze the effects of a series of simple prompt-based methods, which can be generalized to other LLMs and NLP tasks. Rich experiments show our methods' effectiveness and some of their remaining issues in improving GPT-4's information extraction ability. △ Less

Submitted 10 September, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: 52 pages, Version 2.0; This article has an original arxiv version entitled "Is Information Extraction Solved by ChatGPT? An Analysis of Performance, Evaluation Criteria, Robustness and Errors'', whose url link is arXiv:2305.14450v1

arXiv:2305.14243 [pdf, other]

Training Transitive and Commutative Multimodal Transformers with LoReTTa

Authors: Manuel Tran, Yashin Dicente Cid, Amal Lahiani, Fabian J. Theis, Tingying Peng, Eldad Klaiman

Abstract: Training multimodal foundation models is challenging due to the limited availability of multimodal datasets. While many public datasets pair images with text, few combine images with audio or text with audio. Even rarer are datasets that align all three modalities at once. Critical domains such as healthcare, infrastructure, or transportation are particularly affected by missing modalities. This m… ▽ More Training multimodal foundation models is challenging due to the limited availability of multimodal datasets. While many public datasets pair images with text, few combine images with audio or text with audio. Even rarer are datasets that align all three modalities at once. Critical domains such as healthcare, infrastructure, or transportation are particularly affected by missing modalities. This makes it difficult to integrate all modalities into a large pre-trained neural network that can be used out-of-the-box or fine-tuned for different downstream tasks. We introduce LoReTTa (Linking mOdalities with a tRansitive and commutativE pre-Training sTrAtegy) to address this understudied problem. Our self-supervised framework unifies causal modeling and masked modeling with the rules of commutativity and transitivity. This allows us to transition within and between modalities. As a result, our pre-trained models are better at exploring the true underlying joint probability distribution. Given a dataset containing only the disjoint combinations (A, B) and (B, C), LoReTTa can model the relation A <-> C with A <-> B <-> C. In particular, we show that a transformer pre-trained with LoReTTa can handle any mixture of modalities at inference time, including the never-seen pair (A, C) and the triplet (A, B, C). We extensively evaluate our approach on a synthetic, medical, and reinforcement learning dataset. Across different domains, our universal multimodal transformer consistently outperforms strong baselines such as GPT, BERT, and CLIP on tasks involving the missing modality tuple. △ Less

Submitted 16 January, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted at NeurIPS 2023 (poster). Camera-ready version

arXiv:2305.02542 [pdf, other]

Correcting for Interference in Experiments: A Case Study at Douyin

Authors: Vivek F. Farias, Hao Li, Tianyi Peng, Xinyuyang Ren, Huawei Zhang, Andrew Zheng

Abstract: Interference is a ubiquitous problem in experiments conducted on two-sided content marketplaces, such as Douyin (China's analog of TikTok). In many cases, creators are the natural unit of experimentation, but creators interfere with each other through competition for viewers' limited time and attention. "Naive" estimators currently used in practice simply ignore the interference, but in doing so i… ▽ More Interference is a ubiquitous problem in experiments conducted on two-sided content marketplaces, such as Douyin (China's analog of TikTok). In many cases, creators are the natural unit of experimentation, but creators interfere with each other through competition for viewers' limited time and attention. "Naive" estimators currently used in practice simply ignore the interference, but in doing so incur bias on the order of the treatment effect. We formalize the problem of inference in such experiments as one of policy evaluation. Off-policy estimators, while unbiased, are impractically high variance. We introduce a novel Monte-Carlo estimator, based on "Differences-in-Qs" (DQ) techniques, which achieves bias that is second-order in the treatment effect, while remaining sample-efficient to estimate. On the theoretical side, our contribution is to develop a generalized theory of Taylor expansions for policy evaluation, which extends DQ theory to all major MDP formulations. On the practical side, we implement our estimator on Douyin's experimentation platform, and in the process develop DQ into a truly "plug-and-play" estimator for interference in real-world settings: one which provides robust, low-bias, low-variance treatment effect estimates; admits computationally cheap, asymptotically exact uncertainty quantification; and reduces MSE by 99\% compared to the best existing alternatives in our applications. △ Less

Submitted 4 May, 2023; originally announced May 2023.

Showing 1–50 of 202 results for author: Peng, T