Showing 1–50 of 2,147 results for author: Chen, S

  1. arXiv:2409.03684  [pdf, ps, other]

    quant-ph cs.DS cs.LG

    Predicting quantum channels over general product distributions

    Authors: Sitan Chen, Jaume de Dios Pont, Jun-Ting Hsieh, Hsin-Yuan Huang, Jane Lange, Jerry Li

    Abstract: We investigate the problem of predicting the output behavior of unknown quantum channels. Given query access to an $n$-qubit channel $E$ and an observable $O$, we aim to learn the mapping $ρ\mapsto \mathrm{Tr}(O E[ρ])$ to within a small error for most $ρ$ sampled from a distribution $D$. Previously, Huang, Chen, and Preskill proved a surprising result that even if…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: 20 pages, comments welcome

  2. arXiv:2409.03213  [pdf, other]

    cs.CV

    Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction

    Authors: Shen Chen, Jiale Zhou, Lei Li

    Abstract: 3D Gaussian Splatting (3DGS) has emerged as a promising approach for 3D scene representation, offering a reduction in computational overhead compared to Neural Radiance Fields (NeRF). However, 3DGS is susceptible to high-frequency artifacts and demonstrates suboptimal performance under sparse viewpoint conditions, thereby limiting its applicability in robotics and computer vision. To address these…

    Submitted 4 September, 2024; originally announced September 2024.

  3. arXiv:2409.02889  [pdf, other]

    cs.CL cs.AI cs.CV cs.MM

    LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

    Authors: Xidong Wang, Dingjie Song, Shunian Chen, Chen Zhang, Benyou Wang

    Abstract: Expanding the long-context capabilities of Multi-modal Large Language Models (MLLMs) is crucial for video understanding, high-resolution image understanding, and multi-modal agents. This involves a series of systematic optimizations, including model architecture, data construction and training strategy, particularly addressing challenges such as \textit{degraded performance with more images} and \…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 19 pages, 7 figures, 6 tables

  4. Does the Vulnerability Threaten Our Projects? Automated Vulnerable API Detection for Third-Party Libraries

    Authors: Fangyuan Zhang, Lingling Fan, Sen Chen, Miaoying Cai, Sihan Xu, Lida Zhao

    Abstract: Developers usually use third-party libraries (TPLs) to facilitate project development and avoid reinventing the wheel; however, vulnerable TPLs pose severe security threats. The majority of existing research only considered whether projects used vulnerable TPLs but neglected whether the vulnerable code of the TPLs was indeed used by the projects, which inevitably results in false positives and fur…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 15 pages, 4 figures

  5. arXiv:2409.02650  [pdf, other]

    cs.CR cs.ET

    SoK: Bitcoin Layer Two (L2)

    Authors: Minfeng Qi, Qin Wang, Zhipeng Wang, Manvir Schneider, Tianqing Zhu, Shiping Chen, William Knottenbelt, Thomas Hardjono

    Abstract: We present the first Systematization of Knowledge (SoK) on constructing Layer Two (L2) solutions for Bitcoin. We carefully examine a representative subset of ongoing Bitcoin L2 solutions (40 out of 335 extensively investigated cases) and provide a concise yet impactful identification of six classic design patterns through two approaches (i.e., modifying transactions & creating proofs). Notably,…

    Submitted 4 September, 2024; originally announced September 2024.

  6. arXiv:2409.02426  [pdf, other]

    cs.LG cs.CV

    Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering

    Authors: Peng Wang, Huijie Zhang, Zekai Zhang, Siyi Chen, Yi Ma, Qing Qu

    Abstract: Recent empirical studies have demonstrated that diffusion models can effectively learn the image distribution and generate new samples. Remarkably, these models can achieve this even with a small number of training samples despite a large image dimension, circumventing the curse of dimensionality. In this work, we provide theoretical insights into this phenomenon by leveraging key empirical observ…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 39 pages, 9 figures

  7. arXiv:2409.02374  [pdf, other]

    cs.CV cs.LG

    Exploring Low-Dimensional Subspaces in Diffusion Models for Controllable Image Editing

    Authors: Siyi Chen, Huijie Zhang, Minzhe Guo, Yifu Lu, Peng Wang, Qing Qu

    Abstract: Recently, diffusion models have emerged as a powerful class of generative models. Despite their success, there is still limited understanding of their semantic spaces. This makes it challenging to achieve precise and disentangled image generation without additional training, especially in an unsupervised way. In this work, we improve the understanding of their semantic spaces from intriguing obser…

    Submitted 3 September, 2024; originally announced September 2024.

  8. arXiv:2409.02371  [pdf, other]

    cs.CV

    Unfolding Videos Dynamics via Taylor Expansion

    Authors: Siyi Chen, Minkyu Choi, Zesen Zhao, Kuan Han, Qing Qu, Zhongming Liu

    Abstract: Taking inspiration from physical motion, we present a new self-supervised dynamics learning strategy for videos: Video Time-Differentiation for Instance Discrimination (ViDiDi). ViDiDi is a simple and data-efficient strategy, readily applicable to existing self-supervised video representation learning frameworks based on instance discrimination. At its core, ViDiDi observes different aspects of a…

    Submitted 3 September, 2024; originally announced September 2024.

  9. arXiv:2409.02346  [pdf, other]

    cs.LG cs.DC

    Robust Federated Finetuning of Foundation Models via Alternating Minimization of LoRA

    Authors: Shuangyi Chen, Yue Ju, Hardik Dalal, Zhongwen Zhu, Ashish Khisti

    Abstract: Parameter-Efficient Fine-Tuning (PEFT) has risen as an innovative training strategy that updates only a select few model parameters, significantly lowering both computational and memory demands. PEFT also helps to decrease data transfer in federated learning settings, where communication depends on the size of updates. In this work, we explore the constraints of previous studies that integrate a w…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Presented at ES-FOMO-II@ICML2024

  10. arXiv:2409.01931  [pdf, other]

    physics.chem-ph cs.AI cs.LG physics.bio-ph physics.comp-ph

    On the design space between molecular mechanics and machine learning force fields

    Authors: Yuanqing Wang, Kenichiro Takaba, Michael S. Chen, Marcus Wieder, Yuzhi Xu, Tong Zhu, John Z. H. Zhang, Arnav Nagle, Kuang Yu, Xinyan Wang, Daniel J. Cole, Joshua A. Rackers, Kyunghyun Cho, Joe G. Greener, Peter Eastman, Stefano Martiniani, Mark E. Tuckerman

    Abstract: A force field as accurate as quantum mechanics (QM) and as fast as molecular mechanics (MM), with which one can simulate a biomolecular system efficiently enough and meaningfully enough to get quantitative insights, is among the most ardent dreams of biophysicists -- a dream, nevertheless, not to be fulfilled any time soon. Machine learning force fields (MLFFs) represent a meaningful endeavor towa…

    Submitted 5 September, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

  11. arXiv:2409.01782  [pdf, other]

    cs.CV

    UWStereo: A Large Synthetic Dataset for Underwater Stereo Matching

    Authors: Qingxuan Lv, Junyu Dong, Yuezun Li, Sheng Chen, Hui Yu, Shu Zhang, Wenhan Wang

    Abstract: Despite recent advances in stereo matching, the extension to intricate underwater settings remains unexplored, primarily owing to: 1) the reduced visibility, low contrast, and other adverse effects of underwater images; 2) the difficulty in obtaining ground truth data for training deep learning models, i.e. simultaneously capturing an image and estimating its corresponding pixel-wise depth informa…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 12 pages

  12. arXiv:2409.01722  [pdf, other]

    cs.CR cs.DC

    ACCESS-FL: Agile Communication and Computation for Efficient Secure Aggregation in Stable Federated Learning Networks

    Authors: Niousha Nazemi, Omid Tavallaie, Shuaijun Chen, Anna Maria Mandalari, Kanchana Thilakarathna, Ralph Holz, Hamed Haddadi, Albert Y. Zomaya

    Abstract: Federated Learning (FL) is a promising distributed learning framework designed for privacy-aware applications. FL trains models on client devices without sharing the client's data and generates a global model on a server by aggregating model updates. Traditional FL approaches risk exposing sensitive client data when plain model updates are transmitted to the server, making them vulnerable to secur…

    Submitted 4 September, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

  13. arXiv:2409.01282  [pdf]

    cs.CV cs.CR cs.LG

    One-Index Vector Quantization Based Adversarial Attack on Image Classification

    Authors: Haiju Fan, Xiaona Qin, Shuang Chen, Hubert P. H. Shum, Ming Li

    Abstract: To improve storage and transmission, images are generally compressed. Vector quantization (VQ) is a popular compression method as it has a high compression ratio that surpasses other compression techniques. Despite this, existing adversarial attack methods on image classification are mostly performed in the pixel domain with few exceptions in the compressed domain, making them less applicable in…

    Submitted 2 September, 2024; originally announced September 2024.

  14. arXiv:2409.00858  [pdf, other]

    cs.RO cs.AI cs.HC cs.LG

    Trustworthy Human-AI Collaboration: Reinforcement Learning with Human Feedback and Physics Knowledge for Safe Autonomous Driving

    Authors: Zilin Huang, Zihao Sheng, Sikai Chen

    Abstract: In the field of autonomous driving, developing safe and trustworthy autonomous driving policies remains a significant challenge. Recently, Reinforcement Learning with Human Feedback (RLHF) has attracted substantial attention due to its potential to enhance training safety and sampling efficiency. Nevertheless, existing RLHF-enabled methods often falter when faced with imperfect human demonstration…

    Submitted 5 September, 2024; v1 submitted 1 September, 2024; originally announced September 2024.

    Comments: 33 pages, 20 figures

  15. arXiv:2409.00022  [pdf]

    cs.MM cs.AI cs.CV

    Detecting Misinformation in Multimedia Content through Cross-Modal Entity Consistency: A Dual Learning Approach

    Authors: Zhe Fu, Kanlun Wang, Wangjiaxuan Xin, Lina Zhou, Shi Chen, Yaorong Ge, Daniel Janies, Dongsong Zhang

    Abstract: The landscape of social media content has evolved significantly, extending from text to multimodal formats. This evolution presents a significant challenge in combating misinformation. Previous research has primarily focused on single modalities or text-image combinations, leaving a gap in detecting multimodal misinformation. While the concept of entity consistency holds promise in detecting multi…

    Submitted 16 August, 2024; originally announced September 2024.

    Comments: Accepted to PACIS 2024. 15 pages, 3 figures

    Journal ref: https://aisel.aisnet.org/pacis2024/track07_secprivacy/track07_secprivacy/2

  16. arXiv:2408.17380  [pdf, other]

    cs.AI cs.LG

    Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control

    Authors: Zihao Sheng, Zilin Huang, Sikai Chen

    Abstract: Model-based reinforcement learning (RL) is anticipated to exhibit higher sample efficiency compared to model-free RL by utilizing a virtual environment model. However, it is challenging to obtain sufficiently accurate representations of the environmental dynamics due to uncertainties in complex systems and environments. An inaccurate environment model may degrade the sample efficiency and performa…

    Submitted 30 August, 2024; originally announced August 2024.

  17. arXiv:2408.17272  [pdf, ps, other]

    cs.CR cs.DM cs.IT math.NT

    Further Investigation on Differential Properties of the Generalized Ness-Helleseth Function

    Authors: Yongbo Xia, Chunlei Li, Furong Bao, Shaoping Chen, Tor Helleseth

    Abstract: Let $n$ be an odd positive integer, $p$ be a prime with $p\equiv3\pmod4$, $d_{1} = {{p^{n}-1}\over {2}} -1 $ and $d_{2} =p^{n}-2$. The function defined by $f_u(x)=ux^{d_{1}}+x^{d_{2}}$ is called the generalized Ness-Helleseth function over $\mathbb{F}_{p^n}$, where $u\in\mathbb{F}_{p^n}$. It was initially studied by Ness and Helleseth in the ternary case. In this paper, for $p^n \equiv 3 \pmod 4$…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 34 pages

    MSC Class: 94A60; 11T71; 11T06; 05-08

  18. arXiv:2408.17065  [pdf, other]

    cs.CV

    Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning

    Authors: Zhiyuan Yan, Yandan Zhao, Shen Chen, Xinghe Fu, Taiping Yao, Shouhong Ding, Li Yuan

    Abstract: Three key challenges hinder the development of current deepfake video detection: (1) Temporal features can be complex and diverse: how can we identify general temporal artifacts to enhance model generalization? (2) Spatiotemporal models often lean heavily on one type of artifact and ignore the other: how can we ensure balanced learning from both? (3) Videos are naturally resource-intensive: how ca…

    Submitted 30 August, 2024; originally announced August 2024.

  19. arXiv:2408.16414  [pdf, other]

    cs.LG cs.AI math.NA physics.comp-ph

    Fourier Spectral Physics Informed Neural Network: An Efficient and Low-Memory PINN

    Authors: Tianchi Yu, Yiming Qi, Ivan Oseledets, Shiyi Chen

    Abstract: With growing investigations into solving partial differential equations by physics-informed neural networks (PINNs), more accurate and efficient PINNs are required to meet the practical demands of scientific computing. One bottleneck of current PINNs is computing the high-order derivatives via automatic differentiation which often necessitates substantial computing resources. In this paper, we foc…

    Submitted 29 August, 2024; originally announced August 2024.

  20. arXiv:2408.16245  [pdf, other]

    cs.LG q-bio.BM

    Large-Scale Multi-omic Biosequence Transformers for Modeling Peptide-Nucleotide Interactions

    Authors: Sully F. Chen, Robert J. Steele, Beakal Lemeneh, Shivanand P. Lad, Eric Oermann

    Abstract: The transformer architecture has revolutionized bioinformatics and driven progress in the understanding and prediction of the properties of biomolecules. Almost all research on large-scale biosequence transformers has focused on one domain at a time (single-omic), usually nucleotides or peptides. These models have seen incredible success in downstream tasks in each domain and have achieved particu…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 27 pages, 5 figures

  21. arXiv:2408.16068  [pdf, other]

    q-bio.GN cs.AI stat.ML

    Identification of Prognostic Biomarkers for Stage III Non-Small Cell Lung Carcinoma in Female Nonsmokers Using Machine Learning

    Authors: Huili Zheng, Qimin Zhang, Yiru Gong, Zheyan Liu, Shaohan Chen

    Abstract: Lung cancer remains a leading cause of cancer-related deaths globally, with non-small cell lung cancer (NSCLC) being the most common subtype. This study aimed to identify key biomarkers associated with stage III NSCLC in non-smoking females using gene expression profiling from the GDS3837 dataset. Utilizing XGBoost, a machine learning algorithm, the analysis achieved a strong predictive performanc…

    Submitted 29 August, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted for publication in the IEEE ICBASE 2024 conference

  22. arXiv:2408.15815  [pdf, other]

    cs.SE

    MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing

    Authors: Congying Xu, Songqiang Chen, Jiarong Wu, Shing-Chi Cheung, Valerio Terragni, Hengcheng Zhu, Jialun Cao

    Abstract: While a recent study reveals that many developer-written test cases can encode a reusable Metamorphic Relation (MR), over 70% of them directly hard-code the source input and follow-up input in the encoded relation. Such encoded MRs, which do not contain an explicit input transformation to transform the source inputs to corresponding follow-up inputs, cannot be reused with new source inputs to enha…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: This paper is accepted to ASE 2024

  23. arXiv:2408.15710  [pdf, other]

    cs.CL

    Conan-embedding: General Text Embedding with More and Better Negative Samples

    Authors: Shiyu Li, Yang Tang, Shizhe Chen, Xi Chen

    Abstract: With the growing popularity of RAG, the capabilities of embedding models are gaining increasing attention. Embedding models are primarily trained through contrastive loss learning, with negative examples being a key component. Previous work has proposed various hard negative mining strategies, but these strategies are typically employed as preprocessing steps. In this paper, we propose the conan-e…

    Submitted 29 August, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

  24. arXiv:2408.15667  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    Towards reliable respiratory disease diagnosis based on cough sounds and vision transformers

    Authors: Qian Wang, Zhaoyang Bu, Jiaxuan Mao, Wenyu Zhu, Jingya Zhao, Wei Du, Guochao Shi, Min Zhou, Si Chen, Jieming Qu

    Abstract: Recent advancements in deep learning techniques have sparked performance boosts in various real-world applications including disease diagnosis based on multi-modal medical data. Cough sound data-based respiratory disease (e.g., COVID-19 and Chronic Obstructive Pulmonary Disease) diagnosis has also attracted much attention. However, existing works usually utilise traditional machine learning or dee…

    Submitted 2 September, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

  25. arXiv:2408.14954  [pdf, other]

    cs.NI eess.SP

    Stochastic Geometry Based Modelling and Analysis of Uplink Cooperative Satellite-Aerial-Terrestrial Networks for Nomadic Communications with Weak Satellite Coverage

    Authors: Wen-Yu Dong, Shaoshi Yang, Ping Zhang, Sheng Chen

    Abstract: Cooperative satellite-aerial-terrestrial networks (CSATNs), where unmanned aerial vehicles (UAVs) are utilized as nomadic aerial relays (A), are highly valuable for many important applications, such as post-disaster urban reconstruction. In this scenario, direct communication between terrestrial terminals (T) and satellites (S) is often unavailable due to poor propagation conditions for satellite…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 17 pages, 16 pages, 2 tables, accepted to appear on IEEE Journal on Selected Areas in Communications, Aug. 2024

  26. arXiv:2408.14868  [pdf, other]

    cs.CV

    ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning

    Authors: Wenjin Hou, Dingjie Fu, Kun Li, Shiming Chen, Hehe Fan, Yi Yang

    Abstract: Zero-shot learning (ZSL) aims to recognize unseen classes by transferring semantic knowledge from seen classes to unseen ones, guided by semantic information. To this end, existing works have demonstrated remarkable performance by utilizing global visual features from Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) for visual-semantic interactions. Due to the limited receptive f…

    Submitted 27 August, 2024; originally announced August 2024.

  27. arXiv:2408.14511  [pdf, other]

    cs.AI cs.CL cs.LG math.ST stat.ML

    Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods

    Authors: Xinyang Hu, Fengzhuo Zhang, Siyu Chen, Zhuoran Yang

    Abstract: Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems using pretrained large language models (LLMs). In this work, we analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity. To this end, we introduce a multi-step latent variable model that…

    Submitted 28 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: 150 pages, 18 figures, 3 tables

  28. arXiv:2408.14158  [pdf, other]

    cs.DC cs.AI

    Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

    Authors: Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, et al. (27 additional authors not shown)

    Abstract: The rapid progress in Deep Learning (DL) and Large Language Models (LLMs) has exponentially increased demands of computational power and bandwidth. This, combined with the high costs of faster computing chips and interconnects, has significantly inflated High Performance Computing (HPC) construction costs. To address these challenges, we introduce the Fire-Flyer AI-HPC architecture, a synergistic…

    Submitted 31 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: This is the preprint version of the paper accepted for presentation at the 2024 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'24). © 2024 IEEE. Personal use of this material is permitted. For other uses, permission from IEEE must be obtained. Please refer to IEEE Xplore for the final published version

  29. arXiv:2408.14144  [pdf, other]

    cs.LG cs.DC

    Neighborhood and Global Perturbations Supported SAM in Federated Learning: From Local Tweaks To Global Awareness

    Authors: Boyuan Li, Zihao Peng, Yafei Li, Mingliang Xu, Shengbo Chen, Baofeng Ji, Cong Shen

    Abstract: Federated Learning (FL) can be coordinated under the orchestration of a central server to collaboratively build a privacy-preserving model without the need for data exchange. However, participant data heterogeneity leads to local optima divergence, subsequently affecting convergence outcomes. Recent research has focused on global sharpness-aware minimization (SAM) and dynamic regularization techni…

    Submitted 29 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  30. arXiv:2408.13983  [pdf, other]

    cs.CV

    Dual-Path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation

    Authors: Yushun Tang, Shuoshuo Chen, Zhihe Lu, Xinchao Wang, Zhihai He

    Abstract: Transformer-based methods have achieved remarkable success in various machine learning tasks. How to design efficient test-time adaptation methods for transformer models becomes an important research task. In this work, motivated by the dual-subband wavelet lifting scheme developed in multi-scale signal processing which is able to efficiently separate the input signals into principal components an…

    Submitted 25 August, 2024; originally announced August 2024.

  31. arXiv:2408.13890  [pdf, other]

    cs.CV

    Making Large Language Models Better Planners with Reasoning-Decision Alignment

    Authors: Zhijian Huang, Tao Tang, Shaoxiang Chen, Sihao Lin, Zequn Jie, Lin Ma, Guangrun Wang, Xiaodan Liang

    Abstract: Data-driven approaches for autonomous driving (AD) have been widely adopted in the past decade but are confronted with dataset bias and uninterpretability. Inspired by the knowledge-driven nature of human driving, recent approaches explore the potential of large language models (LLMs) to improve understanding and decision-making in traffic scenarios. They find that the pretrain-finetune paradigm o…

    Submitted 25 August, 2024; originally announced August 2024.

  32. arXiv:2408.12815  [pdf, other]

    cs.CV cs.AI

    Staircase Cascaded Fusion of Lightweight Local Pattern Recognition and Long-Range Dependencies for Structural Crack Segmentation

    Authors: Hui Liu, Chen Jia, Fan Shi, Xu Cheng, Mianzhao Wang, Shengyong Chen

    Abstract: Detecting cracks with pixel-level precision for key structures is a significant challenge, as existing methods struggle to effectively integrate local textures and pixel dependencies of cracks. Furthermore, these methods often possess numerous parameters and substantial computational requirements, complicating deployment on edge devices. In this paper, we propose a staircase cascaded fusion crack…

    Submitted 22 August, 2024; originally announced August 2024.

  33. arXiv:2408.12665  [pdf, ps, other]

    cs.LG cs.AI cs.GR

    Fairness-Aware Streaming Feature Selection with Causal Graphs

    Authors: Leizhen Zhang, Lusi Li, Di Wu, Sheng Chen, Yi He

    Abstract: Its crux lies in the optimization of a tradeoff between accuracy and fairness of resultant models on the selected feature subset. The technical challenge of our setting is twofold: 1) streaming feature inputs, such that an informative feature may become obsolete or redundant for prediction if its information has been covered by other similar features that arrived prior to it, and 2) non-associatio…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted by the 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC 2024)

  34. arXiv:2408.12354  [pdf, other]

    eess.AS cs.SD

    LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation

    Authors: Shihao Chen, Yu Gu, Jianwei Cui, Jie Zhang, Rilin Chen, Lirong Dai

    Abstract: Any-to-any singing voice conversion (SVC) aims to transfer a target singer's timbre to other songs using a short voice sample. However, many diffusion-model-based any-to-any SVC methods, while achieving impressive results, usually suffer from low efficiency caused by a large number of inference steps. In this paper, we propose LCM-SVC, a latent consistency distillation (LCD) based latent diffusion mo…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted to ISCSLP 2024. arXiv admin note: text overlap with arXiv:2406.05325

  35. arXiv:2408.11878  [pdf, other]

    cs.CL cs.CE q-fin.CP

    Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

    Authors: Qianqian Xie, Dong Li, Mengxi Xiao, Zihao Jiang, Ruoyu Xiang, Xiao Zhang, Zhengyu Chen, Yueru He, Weiguang Han, Yuzhe Yang, Shunian Chen, Yifei Zhang, Lihang Shen, Daniel Kim, Zhiwei Liu, Zheheng Luo, Yangyang Yu, Yupeng Cao, Zhiyang Deng, Zhiyuan Yao, Haohang Li, Duanyu Feng, Yongfu Dai, VijayaSai Somasundaram, Peng Lu, et al. (14 additional authors not shown)

    Abstract: Large language models (LLMs) have advanced financial applications, yet they often lack sufficient financial knowledge and struggle with tasks involving multi-modal inputs like tables and time series data. To address these limitations, we introduce Open-FinLLMs, a series of Financial LLMs. We begin with FinLLaMA, pre-trained on a 52 billion token financial corpus, incorporating text, table…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 33 pages, 13 figures

  36. arXiv:2408.11854  [pdf, other]

    cs.CL cs.AI cs.LG

    When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications?

    Authors: Yanjun Gao, Skatje Myers, Shan Chen, Dmitriy Dligach, Timothy A Miller, Danielle Bitterman, Matthew Churpek, Majid Afshar

    Abstract: The introduction of Large Language Models (LLMs) has advanced data representation and analysis, bringing significant progress in their use for medical questions and answering. Despite these advancements, integrating tabular data, especially numerical data pivotal in clinical contexts, into LLM paradigms has not been thoroughly explored. In this study, we examine the effectiveness of vector represe…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Under review

  37. arXiv:2408.11138  [pdf, other]

    cs.RO cs.CV

    Target-Oriented Object Grasping via Multimodal Human Guidance

    Authors: Pengwei Xie, Siang Chen, Dingchang Hu, Yixiang Dai, Kaiqin Yang, Guijin Wang

    Abstract: In the context of human-robot interaction and collaboration scenarios, robotic grasping still encounters numerous challenges. Traditional grasp detection methods generally analyze the entire scene to predict grasps, leading to redundancy and inefficiency. In this work, we reconsider 6-DoF grasp detection from a target-referenced perspective and propose a Target-Oriented Grasp Network (TOGNet). TOG…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024 Workshop on Assistive Computer Vision and Robotics (ACVR 2024)

  38. arXiv:2408.11085  [pdf, other]

    cs.CV

    GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting

    Authors: Changkun Liu, Shuai Chen, Yash Bhalgat, Siyan Hu, Zirui Wang, Ming Cheng, Victor Adrian Prisacariu, Tristan Braud

    Abstract: We leverage 3D Gaussian Splatting (3DGS) as a scene representation and propose a novel test-time camera pose refinement framework, GSLoc. This framework enhances the localization accuracy of state-of-the-art absolute pose regression and scene coordinate regression methods. The 3DGS model renders high-quality synthetic images and depth maps to facilitate the establishment of 2D-3D correspondences.…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: The project page is available at https://gsloc.active.vision

  39. arXiv:2408.10663  [pdf, other]

    cs.CL

    REInstruct: Building Instruction Data from Unlabeled Corpus

    Authors: Shu Chen, Xinyan Guan, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun

    Abstract: Manually annotating instruction data for large language models is difficult, costly, and hard to scale. Meanwhile, current automatic annotation methods typically rely on distilling synthetic data from proprietary LLMs, which not only limits the upper bound of the quality of the instruction data but also raises potential copyright issues. In this paper, we propose REInstruct, a simple and scalable…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted by ACL2024 Findings

  40. arXiv:2408.10039  [pdf, other]

    cs.AI

    MSDiagnosis: An EMR-based Dataset for Clinical Multi-Step Diagnosis

    Authors: Ruihui Hou, Shencheng Chen, Yongqi Fan, Lifeng Zhu, Jing Sun, Jingping Liu, Tong Ruan

    Abstract: Clinical diagnosis is critical in medical practice, typically requiring a continuous and evolving process that includes primary diagnosis, differential diagnosis, and final diagnosis. However, most existing clinical diagnostic tasks are single-step processes, which does not align with the complex multi-step diagnostic procedures found in real-world clinical settings. In this paper, we propose a mu…

    Submitted 29 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  41. arXiv:2408.09261  [pdf, other]

    cs.CV

    Adaptify: A Refined Adaptation Scheme for Frame Classification in Atrophic Gastritis Videos

    Authors: Zinan Xiong, Shuijiao Chen, Yizhe Zhang, Yu Cao, Benyuan Liu, Xiaowei Liu

    Abstract: Atrophic gastritis is a significant risk factor for developing gastric cancer. The incorporation of machine learning algorithms can efficiently elevate the possibility of accurately detecting atrophic gastritis. Nevertheless, when the trained model is applied in real-life circumstances, its output is often not consistently reliable. In this paper, we propose Adaptify, an adaptation scheme in which…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: ISBI 2024 Proceeding

  42. arXiv:2408.08699  [pdf, other]

    cs.LG cs.DC

    RBLA: Rank-Based-LoRA-Aggregation for Fine-tuning Heterogeneous Models in FLaaS

    Authors: Shuaijun Chen, Omid Tavallaie, Niousha Nazemi, Albert Y. Zomaya

    Abstract: Federated Learning (FL) is a promising privacy-aware distributed learning framework that can be deployed on various devices, such as mobile phones, desktops, and devices equipped with CPUs or GPUs. In the context of server-based Federated Learning as a Service (FLaaS), FL enables the central server to coordinate the training process across multiple devices without direct access to the local data,…

    Submitted 16 August, 2024; originally announced August 2024.

  43. arXiv:2408.08284  [pdf, other]

    physics.chem-ph cs.LG

    Accurate and efficient structure elucidation from routine one-dimensional NMR spectra using multitask machine learning

    Authors: Frank Hu, Michael S. Chen, Grant M. Rotskoff, Matthew W. Kanan, Thomas E. Markland

    Abstract: Rapid determination of molecular structures can greatly accelerate workflows across many chemical disciplines. However, elucidating structure using only one-dimensional (1D) NMR spectra, the most readily accessible data, remains an extremely challenging problem because of the combinatorial explosion of the number of possible molecules as the number of constituent atoms is increased. Here, we intro…

    Submitted 15 August, 2024; originally announced August 2024.

  44. arXiv:2408.07967  [pdf, other]

    cs.CV

    FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering

    Authors: Guofeng Feng, Siyan Chen, Rong Fu, Zimu Liao, Yi Wang, Tao Liu, Zhilin Pei, Hengjie Li, Xingcheng Zhang, Bo Dai

    Abstract: This work introduces FlashGS, an open-source CUDA Python library, designed to facilitate the efficient differentiable rasterization of 3D Gaussian Splatting through algorithmic and kernel-level optimizations. FlashGS is developed based on the observations from a comprehensive analysis of the rendering process to enhance computational efficiency and bring the technique to wide adoption. The paper i…

    Submitted 19 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  45. arXiv:2408.07253  [pdf, other]

    cs.LG cs.CV

    All-around Neural Collapse for Imbalanced Classification

    Authors: Enhao Zhang, Chaohua Li, Chuanxing Geng, Songcan Chen

    Abstract: Neural Collapse (NC) presents an elegant geometric structure that enables individual activations (features), class means and classifier (weights) vectors to reach \textit{optimal} inter-class separability during the terminal phase of training on a \textit{balanced} dataset. Once shifted to imbalanced classification, such an optimal structure of NC can be readily destroyed by the notorious \textit{…

    Submitted 13 August, 2024; originally announced August 2024.

  46. arXiv:2408.06967  [pdf, ps, other]

    quant-ph cs.CC cs.DS cs.LG

    Stabilizer bootstrapping: A recipe for efficient agnostic tomography and magic estimation

    Authors: Sitan Chen, Weiyuan Gong, Qi Ye, Zhihan Zhang

    Abstract: We study the task of agnostic tomography: given copies of an unknown $n$-qubit state $ρ$ which has fidelity $τ$ with some state in a given class $C$, find a state which has fidelity $\ge τ- ε$ with $ρ$. We give a new framework, stabilizer bootstrapping, for designing computationally efficient protocols for this task, and use this to get new agnostic tomography protocols for the following classes:…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 68 pages

  47. arXiv:2408.06478  [pdf, other]

    cs.CR cs.PL

    Theorem-Carrying-Transaction: Runtime Certification to Ensure Safety for Smart Contract Transactions

    Authors: Nikolaj S. Bjørner, Ashley J. Chen, Shuo Chen, Yang Chen, Zhongxin Guo, Tzu-Han Hsu, Peng Liu, Nanqing Luo

    Abstract: Security bugs and trapdoors in smart contracts have been impacting the Ethereum community since its inception. Conceptually, Ethereum's 1.45 million contracts form a single "gigantic program" whose behaviors are determined by the complex reference-topology between the contracts. Can the Ethereum community be assured that this gigantic program conforms to its design-level safety properties, des…

    Submitted 12 August, 2024; originally announced August 2024.

  48. arXiv:2408.06445  [pdf, other]

    cs.LG cs.AI

    Multi-View Neural Differential Equations for Continuous-Time Stream Data in Long-Term Traffic Forecasting

    Authors: Zibo Liu, Zhe Jiang, Shigang Chen

    Abstract: Long-term traffic flow forecasting plays a crucial role in intelligent transportation as it allows traffic managers to adjust their decisions in advance. However, the problem is challenging due to spatio-temporal correlations and complex dynamic patterns in continuous-time stream data. Neural Differential Equations (NDEs) are among the state-of-the-art methods for learning continuous-time traffic…

    Submitted 12 August, 2024; originally announced August 2024.

  49. arXiv:2408.05899  [pdf, other]

    quant-ph cs.AI cs.LG

    Quantum Gradient Class Activation Map for Model Interpretability

    Authors: Hsin-Yi Lin, Huan-Hsin Tseng, Samuel Yen-Chi Chen, Shinjae Yoo

    Abstract: Quantum machine learning (QML) has recently made significant advancements in various topics. Despite the successes, the safety and interpretability of QML applications have not been thoroughly investigated. This work proposes using Variational Quantum Circuits (VQCs) for activation mapping to enhance model transparency, introducing the Quantum Gradient Class Activation Map (QGrad-CAM). This hybrid…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: Submitted to IEEE SiPS 2024

  50. arXiv:2408.05849  [pdf, other]

    cs.LG stat.ML

    An End-to-End Model for Time Series Classification In the Presence of Missing Values

    Authors: Pengshuai Yao, Mengna Liu, Xu Cheng, Fan Shi, Huan Li, Xiufeng Liu, Shengyong Chen

    Abstract: Time series classification with missing data is a prevalent issue in time series analysis, as temporal data often contain missing values in practical applications. The traditional two-stage approach, which handles imputation and classification separately, can result in sub-optimal performance as label information is not utilized in the imputation process. On the other hand, a one-stage approach ca…

    Submitted 11 August, 2024; originally announced August 2024.