-
CF-PRNet: Coarse-to-Fine Prototype Refining Network for Point Cloud Completion and Reconstruction
Authors:
Zhi Chen,
Tianqi Wei,
Zecheng Zhao,
Jia Syuen Lim,
Yadan Luo,
Hu Zhang,
Xin Yu,
Scott Chapman,
Zi Huang
Abstract:
In modern agriculture, precise monitoring of plants and fruits is crucial for tasks such as high-throughput phenotyping and automated harvesting. This paper addresses the challenge of reconstructing accurate 3D shapes of fruits from partial views, which is common in agricultural settings. We introduce CF-PRNet, a coarse-to-fine prototype refining network that leverages high-resolution 3D data during the training phase but requires only a single RGB-D image for real-time inference. Our approach begins by extracting features from the incomplete point cloud constructed from a partial view of a fruit, using a series of convolutional blocks. The extracted features inform the generation of scaling vectors that refine two sequentially constructed 3D mesh prototypes, one coarse and one fine-grained. This progressive refinement facilitates the completion of the final point clouds, achieving detailed and accurate reconstructions. CF-PRNet achieves a Chamfer Distance of 3.78, an F1 Score of 66.76%, a Precision of 56.56%, and a Recall of 85.31%, and won first place in the Shape Completion and Reconstruction of Sweet Peppers Challenge.
Submitted 12 September, 2024;
originally announced September 2024.
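The metrics quoted in the CF-PRNet abstract above can be illustrated with a small sketch. It assumes one common definition of Chamfer Distance (the sum of the two mean nearest-neighbour distances) and a distance threshold `tau` for precision and recall. The challenge's exact conventions, units, and threshold are not given in the abstract, and all function names here are hypothetical.

```python
import math


def nearest_dists(src, dst):
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def chamfer_distance(pred, gt):
    """Symmetric Chamfer Distance: mean pred->gt distance plus mean gt->pred distance.

    Assumption: unsquared distances, summed over both directions.
    """
    d_pred_to_gt = nearest_dists(pred, gt)
    d_gt_to_pred = nearest_dists(gt, pred)
    return sum(d_pred_to_gt) / len(d_pred_to_gt) + sum(d_gt_to_pred) / len(d_gt_to_pred)


def precision_recall_f1(pred, gt, tau):
    """Precision: share of predicted points within tau of the ground truth.
    Recall: share of ground-truth points within tau of the prediction.
    F1: harmonic mean of the two.
    """
    d_pred_to_gt = nearest_dists(pred, gt)
    d_gt_to_pred = nearest_dists(gt, pred)
    precision = sum(d <= tau for d in d_pred_to_gt) / len(d_pred_to_gt)
    recall = sum(d <= tau for d in d_gt_to_pred) / len(d_gt_to_pred)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy clouds: the prediction misses one ground-truth point.
    pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(chamfer_distance(pred, gt))
    print(precision_recall_f1(pred, gt, tau=0.5))
```

The brute-force nearest-neighbour search is quadratic in the number of points; it is meant only to make the definitions concrete, and a spatial index would be the usual choice for full-resolution clouds.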
-
Multiplex Graph Contrastive Learning with Soft Negatives
Authors:
Zhenhao Zhao,
Minhong Zhu,
Chen Wang,
Sijia Wang,
Jiqiang Zhang,
Li Chen,
Weiran Cai
Abstract:
Graph Contrastive Learning (GCL) seeks to learn nodal or graph representations that contain maximal consistent information from graph-structured data. While node-level contrasting modes dominate, some efforts have begun to explore consistency across different scales. Yet they tend to lose consistent information and to be contaminated by disturbing features. Here, we introduce MUX-GCL, a novel cross-scale contrastive learning paradigm that utilizes multiplex representations as effective patches. While this learning mode minimizes contaminating noise, a commensurate contrasting strategy using positional affinities further avoids information loss by correcting false negative pairs across scales. Extensive downstream experiments demonstrate that MUX-GCL yields multiple state-of-the-art results on public datasets. Our theoretical analysis further guarantees that the new objective function is a stricter lower bound of the mutual information between raw input features and output embeddings, which rationalizes this paradigm. Code is available at https://github.com/MUX-GCL/Code.
Submitted 12 September, 2024;
originally announced September 2024.
-
Characterization and Design of A Hollow Cylindrical Ultrasonic Motor
Authors:
Zhanyue Zhao,
Yang Wang,
Charles Bales,
Daniel Ruiz-Cadalso,
Howard Zheng,
Cosme Furlong-Vazquez,
Gregory Fischer
Abstract:
Piezoelectric ultrasonic motors offer the advantages of compact design, faster reaction time, and simpler setup compared to other motion units such as pneumatic and hydraulic motors. In particular, their non-ferromagnetic property makes them well suited to MRI-compatible robotic systems compared to traditional DC motors. Hollow shaft motors offer the advantages of being lightweight and comparable to solid shafts of the same diameter, low rotational inertia, high tolerance to rotational imbalance due to low weight, and tolerance to high temperature due to low specific mass. This article presents a prototype of a hollow cylindrical ultrasonic motor (HCM) to perform direct drive, eliminate mechanical non-linearity, and reduce the size and complexity of the actuator or end effector assembly. Two equivalent HCMs are presented in this work. Under a 50 g prepressure on the rotor, the motor achieved a rotation speed of 383.3333 rpm and a torque output of 57.3504 mNm at a driving voltage of 282 $V_{pp}$.
Submitted 11 September, 2024;
originally announced September 2024.
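To put the speed and torque figures of the hollow cylindrical motor entry above into common units, the sketch below converts rpm to rad/s and multiplies by torque ($P = \tau\omega$). This is only a unit-conversion illustration: the abstract does not say whether the quoted speed and torque occur at the same operating point, so the product should be read as an upper-bound estimate rather than a measured output power. The function names are hypothetical.

```python
import math


def rpm_to_rad_per_s(rpm: float) -> float:
    """Convert revolutions per minute to angular velocity in rad/s."""
    return rpm * 2.0 * math.pi / 60.0


def mechanical_power_w(speed_rpm: float, torque_mnm: float) -> float:
    """Mechanical power P = torque * angular velocity, in watts.

    Assumption: speed and torque are taken at the same operating point.
    """
    torque_nm = torque_mnm / 1000.0  # mNm -> Nm
    return torque_nm * rpm_to_rad_per_s(speed_rpm)


if __name__ == "__main__":
    # Figures quoted in the abstract: 383.3333 rpm and 57.3504 mNm.
    print(f"{rpm_to_rad_per_s(383.3333):.3f} rad/s")
    print(f"{mechanical_power_w(383.3333, 57.3504):.3f} W")
```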
-
Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering
Authors:
Dafei Qin,
Hongyang Lin,
Qixuan Zhang,
Kaichun Qiao,
Longwen Zhang,
Zijun Zhao,
Jun Saito,
Jingyi Yu,
Lan Xu,
Taku Komura
Abstract:
We propose GauFace, a novel Gaussian Splatting representation, tailored for efficient animation and rendering of physically-based facial assets. Leveraging strong geometric priors and constrained optimization, GauFace ensures a neat and structured Gaussian representation, delivering high fidelity and real-time facial interaction of 30fps@1440p on a Snapdragon 8 Gen 2 mobile platform.
Then, we introduce TransGS, a diffusion transformer that instantly translates physically-based facial assets into the corresponding GauFace representations. Specifically, we adopt a patch-based pipeline to handle the vast number of Gaussians effectively. We also introduce a novel pixel-aligned sampling scheme with UV positional encoding to ensure the throughput and rendering quality of GauFace assets generated by our TransGS. Once trained, TransGS can instantly translate facial assets with lighting conditions to GauFace representation. With the rich conditioning modalities, it also enables editing and animation capabilities reminiscent of traditional CG pipelines.
We conduct extensive evaluations and user studies against traditional offline and online renderers, as well as recent neural rendering methods, which demonstrate the superior performance of our approach for facial asset rendering. We also showcase diverse immersive applications of facial assets using our TransGS approach and GauFace representation, across various platforms like PCs, phones and even VR headsets.
Submitted 11 September, 2024;
originally announced September 2024.
-
STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM
Authors:
Qijiong Liu,
Jieming Zhu,
Lu Fan,
Zhou Zhao,
Xiao-Ming Wu
Abstract:
Traditional recommendation models often rely on unique item identifiers (IDs) to distinguish between items, which can hinder their ability to effectively leverage item content information and generalize to long-tail or cold-start items. Recently, semantic tokenization has been proposed as a promising solution that aims to tokenize each item's semantic representation into a sequence of discrete tokens. In this way, it preserves the item's semantics within these tokens and ensures that semantically similar items are represented by similar tokens. These semantic tokens have become fundamental in training generative recommendation models. However, existing generative recommendation methods typically involve multiple sub-models for embedding, quantization, and recommendation, leading to an overly complex system. In this paper, we propose to streamline the semantic tokenization and generative recommendation process with a unified framework, dubbed STORE, which leverages a single large language model (LLM) for both tasks. Specifically, we formulate semantic tokenization as a text-to-token task and generative recommendation as a token-to-token task, supplemented by a token-to-text reconstruction task and a text-to-token auxiliary task. All these tasks are framed in a generative manner and trained using a single LLM backbone. Extensive experiments have been conducted to validate the effectiveness of our STORE framework across various recommendation tasks and datasets. We will release the source code and configurations for reproducible research.
Submitted 13 September, 2024; v1 submitted 11 September, 2024;
originally announced September 2024.
-
Measurements of the $CP$-even fractions of $D^0\toπ^{+}π^{-}π^{0}$ and $D^0\to K^{+}K^{-}π^{0}$ at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (648 additional authors not shown)
Abstract:
The $CP$-even fractions ($F_{+}$) of the decays $D^0\toπ^{+}π^{-}π^{0}$ and $D^0\to K^{+}K^{-}π^{0}$ are measured with a quantum-correlated $ψ(3770)\to D\bar{D}$ data sample collected by the BESIII experiment corresponding to an integrated luminosity of 7.93 $\mathrm{fb}^{-1}$. The results are $F_{+}^{π^{+}π^{-}π^{0}}=0.9406\pm0.0036\pm0.0021$ and $F_{+}^{K^{+}K^{-}π^{0}}=0.631\pm0.014\pm0.011$, where the first uncertainties are statistical and the second systematic. These measurements are consistent with the previous determinations, and the uncertainties for $F_{+}^{π^{+}π^{-}π^{0}}$ and $F_{+}^{K^{+}K^{-}π^{0}}$ are reduced by factors of 3.9 and 2.6, respectively. The reported results provide important inputs for the precise measurement of the angle $γ$ of the Cabibbo-Kobayashi-Maskawa matrix and indirect $CP$ violation in charm mixing.
Submitted 11 September, 2024;
originally announced September 2024.
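For readers who want a single uncertainty per result in the BESIII $CP$-even fraction entry above, a common convention is to add the statistical and systematic uncertainties in quadrature. This assumes the two are uncorrelated, which the abstract does not state; the rounded totals below are a worked illustration, not values reported by the collaboration.

```latex
\begin{align}
\sigma_{\mathrm{tot}} &= \sqrt{\sigma_{\mathrm{stat}}^{2} + \sigma_{\mathrm{syst}}^{2}} \\
\sigma_{\mathrm{tot}}\!\left(F_{+}^{\pi^{+}\pi^{-}\pi^{0}}\right) &= \sqrt{0.0036^{2} + 0.0021^{2}} \approx 0.0042 \\
\sigma_{\mathrm{tot}}\!\left(F_{+}^{K^{+}K^{-}\pi^{0}}\right) &= \sqrt{0.014^{2} + 0.011^{2}} \approx 0.018
\end{align}
```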
-
CipherDM: Secure Three-Party Inference for Diffusion Model Sampling
Authors:
Xin Zhao,
Xiaojun Chen,
Xudong Chen,
He Li,
Tingyu Fan,
Zhendong Zhao
Abstract:
Diffusion Models (DMs) achieve state-of-the-art synthesis results in image generation and have been applied to various fields. However, DMs sometimes seriously violate user privacy during usage, making the protection of privacy an urgent issue. Using traditional privacy computing schemes like Secure Multi-Party Computation (MPC) directly in DMs faces significant computation and communication challenges. To address these issues, we propose CipherDM, the first novel, versatile and universal framework applying MPC technology to DMs for secure sampling, which can be widely implemented on multiple DM-based tasks. We thoroughly analyze the sampling latency breakdown, identify the time-consuming parts, and design corresponding secure MPC protocols for computing nonlinear activations including SoftMax, SiLU and Mish. CipherDM is evaluated on popular architectures (DDPM, DDIM) using the MNIST dataset and on SD deployed by diffusers. Compared to direct implementation on SPU, our approach improves running time by approximately $1.084\times \sim 2.328\times$, and reduces communication costs by approximately $1.212\times \sim 1.791\times$.
Submitted 9 September, 2024;
originally announced September 2024.
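The three nonlinear activations named in the CipherDM abstract above have standard plaintext definitions, sketched here for reference. This is not the paper's secure protocol: it shows only the cleartext functions that an MPC protocol would have to evaluate or approximate, and the function names are chosen for illustration.

```python
import math


def softmax(xs):
    """SoftMax over a list of floats, shifted by the max for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def silu(x):
    """SiLU(x) = x * sigmoid(x)."""
    return x * sigmoid(x)


def softplus(x):
    """softplus(x) = log(1 + exp(x)), in an overflow-safe form."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish(x) = x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    print(softmax([1.0, 2.0, 3.0]))
    print(silu(1.0), mish(1.0))
```

All three rely on exponentials, and Mish additionally on a logarithm and a hyperbolic tangent, which is consistent with the abstract singling them out as the time-consuming parts under MPC.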
-
HyperSMOTE: A Hypergraph-based Oversampling Approach for Imbalanced Node Classifications
Authors:
Ziming Zhao,
Tiehua Zhang,
Zijian Yi,
Zhishu Shen
Abstract:
Hypergraphs are increasingly utilized in both unimodal and multimodal data scenarios due to their superior ability to model and extract higher-order relationships among nodes, compared to traditional graphs. However, current hypergraph models are encountering challenges related to imbalanced data, as this imbalance can lead to biases in the model towards the more prevalent classes. While existing techniques, such as GraphSMOTE, have improved classification accuracy for minority samples in graph data, they still fall short when addressing the unique structure of hypergraphs. Inspired by the SMOTE concept, we propose HyperSMOTE as a solution to alleviate the class imbalance issue in hypergraph learning. This method involves a two-step process: first synthesizing minority class nodes, then integrating these nodes into the original hypergraph. We synthesize new nodes based on samples from minority classes and their neighbors. At the same time, in order to solve the problem of integrating the new node into the hypergraph, we train a decoder based on the original hypergraph incidence matrix to adaptively associate the augmented node with hyperedges. We conduct extensive evaluation on multiple single-modality datasets, such as Cora, Cora-CA and Citeseer, as well as the multimodal conversation dataset MELD, to verify the effectiveness of HyperSMOTE, showing an average performance gain of 3.38% and 2.97% on accuracy, respectively.
Submitted 9 September, 2024;
originally announced September 2024.
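The node-synthesis step mentioned in the HyperSMOTE abstract above builds on the classic SMOTE idea, which can be sketched as linear interpolation between a minority sample and one of its nearest minority neighbours. The code below is the plain feature-space SMOTE step only. How HyperSMOTE chooses neighbours in a hypergraph and attaches the new node to hyperedges is not described in enough detail in the abstract to reproduce, so nothing here should be read as the authors' method. Function names are hypothetical.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Return up to k candidates closest to point (Euclidean), excluding point itself."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: math.dist(point, c))[:k]


def smote_sample(minority, k, rng):
    """Create one synthetic sample: x_new = x + lam * (x_nn - x), with lam in [0, 1)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    base = rng.choice(minority)
    neighbour = rng.choice(k_nearest(base, minority, k))
    lam = rng.random()
    return [b + lam * (n - b) for b, n in zip(base, neighbour)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(smote_sample(minority, k=2, rng=rng))
```

Because the synthetic point lies on the segment between two real minority samples, it stays inside the region the minority class already occupies, which is the property oversampling methods in this family rely on.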
-
Investigating Material Interface Diffusion Phenomena through Graph Neural Networks in Applied Materials
Authors:
Zirui Zhao,
Hai-Feng Li
Abstract:
Understanding and predicting interface diffusion phenomena in materials is crucial for various industrial applications, including semiconductor manufacturing, battery technology, and catalysis. In this study, we propose a novel approach utilizing Graph Neural Networks (GNNs) to investigate and model material interface diffusion. We begin by collecting experimental and simulated data on diffusion coefficients, concentration gradients, and other relevant parameters from diverse material systems. The data are preprocessed, and key features influencing interface diffusion are extracted. Subsequently, we construct a GNN model tailored to the diffusion problem, with a graph representation capturing the atomic structure of materials. The model architecture includes multiple graph convolutional layers for feature aggregation and update, as well as optional graph attention layers to capture complex relationships between atoms. We train and validate the GNN model using the preprocessed data, achieving accurate predictions of diffusion coefficients, diffusion rates, concentration profiles, and potential diffusion pathways. Our approach offers insights into the underlying mechanisms of interface diffusion and provides a valuable tool for optimizing material design and engineering. Additionally, our method offers possible strategies to solve the longstanding problems related to materials interface diffusion.
Submitted 12 September, 2024; v1 submitted 8 September, 2024;
originally announced September 2024.
-
Improving Multimodal Emotion Recognition by Leveraging Acoustic Adaptation and Visual Alignment
Authors:
Zhixian Zhao,
Haifeng Chen,
Xi Li,
Dongmei Jiang,
Lei Xie
Abstract:
Multimodal Emotion Recognition (MER) aims to automatically identify and understand human emotional states by integrating information from various modalities. However, the scarcity of annotated multimodal data significantly hinders the advancement of this research field. This paper presents our solution for the MER-SEMI sub-challenge of MER 2024. First, to better adapt acoustic modality features for the MER task, we experimentally evaluate the contributions of different layers of the pre-trained speech model HuBERT in emotion recognition. Based on these observations, we perform Parameter-Efficient Fine-Tuning (PEFT) on the layers identified as most effective for emotion recognition tasks, thereby achieving optimal adaptation for emotion recognition with a minimal number of learnable parameters. Second, leveraging the strengths of the acoustic modality, we propose a feature alignment pre-training method. This approach uses large-scale unlabeled data to train a visual encoder, thereby promoting the semantic alignment of visual features within the acoustic feature space. Finally, using the adapted acoustic features, aligned visual features, and lexical features, we employ an attention mechanism for feature fusion. On the MER2024-SEMI test set, the proposed method achieves a weighted F1 score of 88.90%, ranking fourth among all participating teams, validating the effectiveness of our approach.
Submitted 10 September, 2024; v1 submitted 8 September, 2024;
originally announced September 2024.
-
Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
Authors:
Jiaxin Cheng,
Zixu Zhao,
Tong He,
Tianjun Xiao,
Yicong Zhou,
Zheng Zhang
Abstract:
Recent advancements in generative models have significantly enhanced their capacity for image generation, enabling a wide range of applications such as image editing, completion and video editing. A specialized area within generative modeling is layout-to-image (L2I) generation, where predefined layouts of objects guide the generative process. In this study, we introduce a novel regional cross-attention module tailored to enrich layout-to-image generation. This module notably improves the representation of layout regions, particularly in scenarios where existing methods struggle with highly complex and detailed textual descriptions. Moreover, while current open-vocabulary L2I methods are trained in an open-set setting, their evaluations often occur in closed-set environments. To bridge this gap, we propose two metrics to assess L2I performance in open-vocabulary scenarios. Additionally, we conduct a comprehensive user study to validate the consistency of these metrics with human preferences.
Submitted 7 September, 2024;
originally announced September 2024.
-
POINTS: Improving Your Vision-language Model with Affordable Strategies
Authors:
Yuan Liu,
Zhongyin Zhao,
Ziyuan Zhuang,
Le Tian,
Xiao Zhou,
Jie Zhou
Abstract:
In recent years, vision-language models have made significant strides, excelling in tasks like optical character recognition and geometric problem-solving. However, several critical issues remain: 1) Proprietary models often lack transparency about their architectures, while open-source models need more detailed ablations of their training strategies. 2) Pre-training data in open-source works is under-explored, with datasets added empirically, making the process cumbersome. 3) Fine-tuning often focuses on adding datasets, leading to diminishing returns. To address these issues, we propose the following contributions: 1) We trained a robust baseline model using the latest advancements in vision-language models, introducing effective improvements and conducting comprehensive ablation and validation for each technique. 2) Inspired by recent work on large language models, we filtered pre-training data using perplexity, selecting the lowest perplexity data for training. This approach allowed us to train on a curated 1M dataset, achieving competitive performance. 3) During visual instruction tuning, we used model soup on different datasets when adding more datasets yielded marginal improvements. These innovations resulted in a 9B parameter model that performs competitively with state-of-the-art models. Our strategies are efficient and lightweight, making them easily adoptable by the community.
Submitted 7 September, 2024;
originally announced September 2024.
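The perplexity-based filtering described in the POINTS abstract above can be sketched as follows. Perplexity is computed as the exponential of the negative mean token log-probability, and the lowest-perplexity samples are kept. The abstract does not say which model scores the data or what fraction is retained, so the per-token log-probabilities, the `keep_fraction` parameter, and the function names are all assumptions for illustration.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-(1/N) * sum of natural-log token probabilities)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def keep_lowest_perplexity(samples, keep_fraction):
    """Keep the lowest-perplexity share of the data.

    samples: list of (sample_id, token_logprobs) pairs, where token_logprobs
    are assumed to come from some scoring language model.
    Returns the ids of the retained samples, lowest perplexity first.
    """
    scored = sorted(samples, key=lambda s: perplexity(s[1]))
    n_keep = max(1, int(len(scored) * keep_fraction))
    return [sample_id for sample_id, _ in scored[:n_keep]]


if __name__ == "__main__":
    # Hypothetical scored samples; lower (more negative) log-probs mean higher perplexity.
    samples = [
        ("a", [-0.1, -0.2]),
        ("b", [-2.0, -3.0]),
        ("c", [-0.5, -0.5]),
        ("d", [-1.0, -1.0]),
    ]
    print(keep_lowest_perplexity(samples, keep_fraction=0.5))
```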
-
Study of the decay $D^0\rightarrow ρ(770)^-e^+ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (646 additional authors not shown)
Abstract:
We present a study of the semileptonic decay $D^0\rightarrow π^-π^0e^{+}ν_{e}$ using an $e^+e^-$ annihilation data sample of $7.93~\mathrm{fb}^{-1}$ collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The branching fraction of $D^0\to ρ(770)^-e^+ν_e$ is measured to be $(1.439 \pm 0.033(\rm stat.) \pm 0.027(\rm syst.)) \times10^{-3}$, which is a factor of 1.6 more precise than previous measurements. By performing an amplitude analysis, we measure the hadronic form-factor ratios of $D^0\to ρ(770)^-e^+ν_e$ at $q^2=0$ assuming the single-pole-dominance parametrization: $r_{V}=V(0)/A_1(0)=1.548\pm0.079(\rm stat.)\pm0.041(\rm syst.)$ and $r_{2}=A_2(0)/A_1(0)=0.823\pm0.056(\rm stat.)\pm0.026(\rm syst.)$.
Submitted 6 September, 2024;
originally announced September 2024.
-
An overview of domain-specific foundation model: key technologies, applications and challenges
Authors:
Haolong Chen,
Hanzhi Chen,
Zijian Zhao,
Kaifeng Han,
Guangxu Zhu,
Yichen Zhao,
Ying Du,
Wei Xu,
Qingjiang Shi
Abstract:
The impressive performance of ChatGPT and other foundation-model-based products in human language understanding has prompted both academia and industry to explore how these models can be tailored for specific industries and application scenarios. This process, known as the customization of domain-specific foundation models, addresses the limitations of general-purpose models, which may not fully capture the unique patterns and requirements of domain-specific data. Despite its importance, there is a notable lack of comprehensive overview papers on building domain-specific foundation models, while numerous resources exist for general-purpose models. To bridge this gap, this article provides a timely and thorough overview of the methodology for customizing domain-specific foundation models. It introduces basic concepts, outlines the general architecture, and surveys key methods for constructing domain-specific models. Furthermore, the article discusses various domains that can benefit from these specialized models and highlights the challenges ahead. Through this overview, we aim to offer valuable guidance and reference for researchers and practitioners from diverse fields to develop their own customized foundation models.
Submitted 6 September, 2024;
originally announced September 2024.
-
Fast Algorithms for Fourier extension based on boundary interval data
Authors:
Z. Y. Zhao,
Y. F Wang
Abstract:
This paper presents a new algorithm for the computation of Fourier extension based on boundary data, which can obtain a super-algebraically convergent Fourier approximation for non-periodic functions. The algorithm calculates the extension part from boundary data and connects it with the original function to form a smooth periodic function. By testing the key parameters involved, their impact on the algorithm is clarified and a scheme for setting them optimally is proposed. Compared with the FFT, the algorithm increases the computational complexity by only a small fixed amount.
Submitted 6 September, 2024;
originally announced September 2024.
-
The Prevalence of Neural Collapse in Neural Multivariate Regression
Authors:
George Andriopoulos,
Zixuan Dong,
Li Guo,
Zifan Zhao,
Keith Ross
Abstract:
Recently it has been observed that neural networks exhibit Neural Collapse (NC) during the final stage of training for the classification problem. We empirically show that multivariate regression, as employed in imitation learning and other applications, exhibits Neural Regression Collapse (NRC), a new form of neural collapse: (NRC1) The last-layer feature vectors collapse to the subspace spanned by the $n$ principal components of the feature vectors, where $n$ is the dimension of the targets (for univariate regression, $n=1$); (NRC2) The last-layer feature vectors also collapse to the subspace spanned by the last-layer weight vectors; (NRC3) The Gram matrix for the weight vectors converges to a specific functional form that depends on the covariance matrix of the targets. After empirically establishing the prevalence of (NRC1)-(NRC3) for a variety of datasets and network architectures, we provide an explanation of these phenomena by modeling the regression task in the context of the Unconstrained Feature Model (UFM), in which the last layer feature vectors are treated as free variables when minimizing the loss function. We show that when the regularization parameters in the UFM model are strictly positive, then (NRC1)-(NRC3) also emerge as solutions in the UFM optimization problem. We also show that if the regularization parameters are equal to zero, then there is no collapse. To our knowledge, this is the first empirical and theoretical study of neural collapse in the context of regression. This extension is significant not only because it broadens the applicability of neural collapse to a new category of problems but also because it suggests that the phenomena of neural collapse could be a universal behavior in deep learning.
Submitted 6 September, 2024;
originally announced September 2024.
-
Low scale leptogenesis under neutrino $μ$-$τ$ reflection symmetry
Authors:
Yan Shao,
Zhen-hua Zhao
Abstract:
In the literature, the neutrino $μ$-$τ$ reflection symmetry (which has the interesting predictions $θ^{}_{23} =π/4$ and $δ= \pm π/2$ for the atmospherical neutrino mixing angle and Dirac CP phase) is an attractive and widely studied candidate for the flavor symmetries in the neutrino sector. But it is known that, when the seesaw model is furnished with this symmetry, the leptogenesis mechanism (wh…
▽ More
In the literature, the neutrino $μ$-$τ$ reflection symmetry (which has the interesting predictions $θ^{}_{23} =π/4$ and $δ= \pm π/2$ for the atmospherical neutrino mixing angle and Dirac CP phase) is an attractive and widely studied candidate for the flavor symmetries in the neutrino sector. But it is known that, when the seesaw model is furnished with this symmetry, the leptogenesis mechanism (which provides an elegant explanation for the baryon-antibaryon asymmetry of the Universe) can only work in the two-flavor regime (which only holds for the right-handed neutrino masses in the range $10^9-10^{12}$ GeV). This prohibits us to have a low scale seesaw model (which has the potential to be directly accessed by running or upcoming collider experiments) that can have the $μ$-$τ$ reflection symmetry and successful leptogenesis simultaneously. In this paper, for the first time, we demonstrate that the successful leptogenesis may also be achieved in low scale seesaw models furnished with the $μ$-$τ$ reflection symmetry, by means of the flavor non-universality of the conversion efficiencies from the flavored lepton asymmetries to the baryon asymmetry via the sphaleron process. We perform the study in both the resonant leptogenesis regime and the leptogenesis via oscillations (ARS leptogenesis) regime.
Submitted 6 September, 2024;
originally announced September 2024.
-
Design and Characterization of MRI-compatible Plastic Ultrasonic Motor
Authors:
Zhanyue Zhao,
Charles Bales,
Gregory Fischer
Abstract:
Precise surgical procedures may benefit from intra-operative image guidance using magnetic resonance imaging (MRI). However, the MRI's strong magnetic fields, fast switching gradients, and constrained space create the need for an MR-guided robotic system to assist the surgeon. Piezoelectric actuators can be used in an MRI environment by utilizing the inverse piezoelectric effect for different application purposes. The piezoelectric ultrasonic motor (USM) is one type of MRI-compatible actuator that can actuate these robots with fast response times, compactness, and simple configuration. Although piezoelectric motors are mostly made of nonferromagnetic material, the generation of eddy currents due to the MRI's gradient fields can lead to magnetic field distortions causing image artifacts. Motor vibrations due to interactions between the MRI's magnetic fields and those generated by the eddy currents can further degrade image quality by causing image artifacts. In this work, a plastic piezoelectric ultrasonic motor (USM) with a higher degree of MRI compatibility was developed and preliminarily optimized. Several parameters, namely the number of teeth, the notch size, the edge type (beveled or straight), and the surface finish level, were tested against the prepressure. The results suggest that 48 teeth, a thin tooth notch of 0.39 mm, a beveled edge, and a surface finished with sandpaper of approximately 1000 grit give better output in both rotary speed and torque. With this combination, the highest speed reached 436.6665 rpm when the prepressure was low, and the highest torque reached 0.0348 Nm when the prepressure was approximately 500 g.
Submitted 5 September, 2024;
originally announced September 2024.
-
Development of Advanced FEM Simulation Technology for Pre-Operative Surgical Planning
Authors:
Zhanyue Zhao,
Yiwei Jiang,
Charles Bales,
Yang Wang,
Gregory Fischer
Abstract:
Intracorporeal needle-based therapeutic ultrasound (NBTU) offers a minimally invasive approach for the thermal ablation of malignant brain tumors, including both primary and metastatic cancers. NBTU utilizes a high-frequency alternating electric field to excite a piezoelectric transducer, generating acoustic waves that cause localized heating and tumor cell ablation, and it provides a more precise ablation by delivering lower acoustic power doses directly to targeted tumors while sparing surrounding healthy tissue. Building on our previous work, this study introduces a database for optimizing pre-operative surgical planning by simulating ablation effects in varied tissue environments and develops an extended simulation model incorporating various tumor types and sizes to evaluate thermal damage under trans-tissue conditions. A comprehensive database is created from these simulations, detailing critical parameters such as CEM43 isodose maps, temperature changes, thermal dose areas, and maximum ablation distances for four directional probes. This database serves as a valuable resource for future studies, aiding in complex trajectory planning and parameter optimization for NBTU procedures. Moreover, a novel probe selection method is proposed to enhance pre-surgical planning, providing a strategic approach to selecting probes that maximize therapeutic efficiency and minimize ablation time. By avoiding unnecessary thermal propagation and optimizing probe angles, this method has the potential to improve patient outcomes and streamline surgical procedures. Overall, the findings of this study contribute significantly to the field of NBTU, offering a robust framework for enhancing treatment precision and efficacy in clinical settings.
Submitted 9 September, 2024; v1 submitted 5 September, 2024;
originally announced September 2024.
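The CEM43 isodose maps mentioned in the abstract above are built from a per-voxel thermal dose. A minimal sketch of the standard cumulative-equivalent-minutes-at-43°C formula, $\mathrm{CEM43} = \sum_i \Delta t_i \, R^{\,43 - T_i}$ with $R = 0.5$ for $T_i \ge 43$°C and $R = 0.25$ otherwise, is given below. The function name `cem43` and the uniform sampling interval are assumptions for this example, and nothing here reflects the authors' FEM implementation.

```python
def cem43(temps_c, dt_s):
    """Cumulative equivalent minutes at 43 degC for one voxel.

    temps_c: temperature samples in degC, taken every dt_s seconds.
    """
    total_min = 0.0
    for temp in temps_c:
        # Standard piecewise constant: 0.5 at or above 43 degC, 0.25 below.
        r = 0.5 if temp >= 43.0 else 0.25
        total_min += (dt_s / 60.0) * r ** (43.0 - temp)
    return total_min


# Hypothetical temperature history: 60 one-second samples at 44 degC.
dose = cem43([44.0] * 60, dt_s=1.0)
```

Each degree above 43°C doubles the dose rate, so one minute at 44°C counts as two equivalent minutes.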
-
VFLGAN-TS: Vertical Federated Learning-based Generative Adversarial Networks for Publication of Vertically Partitioned Time-Series Data
Authors:
Xun Yuan,
Zilong Zhao,
Prosanta Gope,
Biplab Sikdar
Abstract:
In the current artificial intelligence (AI) era, the scale and quality of the dataset play a crucial role in training a high-quality AI model. However, original data often cannot be shared due to privacy concerns and regulations. A potential solution is to release a synthetic dataset with a similar distribution to the private dataset. Nevertheless, in some scenarios, the attributes required to train an AI model are distributed among different parties, and the parties cannot share their local data for synthetic data construction due to privacy regulations. In PETS 2024, we recently introduced the first Vertical Federated Learning-based Generative Adversarial Network (VFLGAN) for publishing vertically partitioned static data. However, VFLGAN cannot effectively handle time-series data, which have both temporal and attribute dimensions. In this article, we propose VFLGAN-TS, which combines the ideas of attribute discriminator and vertical federated learning to generate synthetic time-series data in the vertically partitioned scenario. The performance of VFLGAN-TS is close to that of its counterpart, which is trained in a centralized manner and represents the upper limit for VFLGAN-TS. To further protect privacy, we apply a Gaussian mechanism to make VFLGAN-TS satisfy $(ε,δ)$-differential privacy. Besides, we develop an enhanced privacy auditing scheme to evaluate the potential privacy breach through the framework of VFLGAN-TS and synthetic datasets.
Submitted 5 September, 2024;
originally announced September 2024.
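For readers unfamiliar with the Gaussian mechanism named in the abstract above, here is a minimal sketch of its classic calibration, $σ = Δ_2 \sqrt{2 \ln(1.25/δ)} / ε$, which is valid for $0 < ε < 1$. The function names and the example values are assumptions for illustration; this is not necessarily how VFLGAN-TS calibrates or accounts for its noise.

```python
import math
import random


def gaussian_sigma(epsilon, delta, l2_sensitivity):
    """Noise scale of the classic Gaussian mechanism (requires 0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("the classic bound needs 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def privatize(values, epsilon, delta, l2_sensitivity, rng):
    """Add independent Gaussian noise to every coordinate of `values`."""
    sigma = gaussian_sigma(epsilon, delta, l2_sensitivity)
    return [v + rng.gauss(0.0, sigma) for v in values]


# Hypothetical example: a vector whose L2 sensitivity is assumed to be 1.
sigma = gaussian_sigma(epsilon=0.5, delta=1e-5, l2_sensitivity=1.0)
noisy = privatize([0.2, -0.1, 0.4], 0.5, 1e-5, 1.0, random.Random(0))
```

The noise scale grows linearly with the sensitivity and shrinks as ε grows, which is the usual privacy-utility trade-off.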
-
Deep learning-driven evaluation and prediction of ion-doped NASICON materials for enhanced solid-state battery performance
Authors:
Zirui Zhao,
Xiaoke Wang,
Si Wu,
Pengfei Zhou,
Qian Zhao,
Guanping Xu,
Kaitong Sun,
Hai-Feng Li
Abstract:
We developed a convolutional neural network (CNN) model capable of predicting the performance of various ion-doped NASICON compounds by leveraging extensive datasets from prior experimental investigations. The model demonstrated high accuracy and efficiency in predicting ionic conductivity and electrochemical properties. Key findings include the successful synthesis and validation of three NASICON materials predicted by the model, with experimental results closely matching the model predictions. This research not only enhances the understanding of ion-doping effects in NASICON materials but also establishes a robust framework for material design and practical applications. It bridges the gap between theoretical predictions and experimental validations.
Submitted 8 September, 2024; v1 submitted 1 September, 2024;
originally announced September 2024.
-
Joint Beamforming for Backscatter Integrated Sensing and Communication
Authors:
Zongyao Zhao,
Tiankuo Wei,
Zhenyu Liu,
Xinke Tang,
Xiao-Ping Zhang,
Yuhan Dong
Abstract:
Integrated sensing and communication (ISAC) is a key technology for next-generation wireless communication. Backscatter communication (BackCom) plays an important role in the Internet of Things (IoT). Integrating ISAC with BackCom technology enables low-power data transmission while enhancing the system's sensing ability, which is expected to provide a potentially revolutionary solution for IoT applications. In this paper, we propose a novel backscatter-ISAC (B-ISAC) system and focus on the joint beamforming design for the system. We formulate the communication and sensing model of the B-ISAC system and derive the metrics of communication and sensing performance, i.e., communication rate and detection probability, respectively. We propose a joint beamforming scheme that aims to optimize the communication rate under a sensing constraint and a power budget. A successive convex approximation (SCA) based algorithm and an iterative algorithm are developed for solving the complicated non-convex optimization problem. Numerical results validate the effectiveness of the proposed scheme and associated algorithms. The proposed B-ISAC system has broad application prospects in IoT scenarios.
Submitted 4 September, 2024; v1 submitted 4 September, 2024;
originally announced September 2024.
-
Searching for the massless dark photon in $c\to uγ'$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (648 additional authors not shown)
Abstract:
In the effective field theory, the massless dark photon $γ'$ can only couple to Standard Model particles through operators of dimension higher than four, thereby offering a high sensitivity to the new physics energy scale. Using $7.9~\rm{fb^{-1}}$ of $e^+e^-$ collision data collected at $\sqrt{s}=3.773$ GeV with the BESIII detector at the BEPCII collider, we measure the effective flavor-changing neutral current coupling of $cuγ'$ in $D^0\toωγ'$ and $D^0\toγγ'$ processes to search for the massless dark photon. No significant signals are observed, and the upper limits at the 90% confidence level on the massless dark photon branching fraction are set to be $1.1\times10^{-5}$ and $2.0\times10^{-6}$ for $D^0\toωγ'$ and $D^0\toγγ'$, respectively. These results provide the world's most stringent constraint on the new physics energy scale associated with the $cuγ'$ coupling, with the new physics energy scale related parameter $|\mathbb{C}|^2+|\mathbb{C}_5|^2<8.2\times10^{-17}~\rm{GeV}^{-2}$ at the 90% confidence level, playing a unique role in the dark sector search with the charm sector.
Submitted 4 September, 2024;
originally announced September 2024.
-
Deep Brain Ultrasound Ablation Thermal Dose Modeling with in Vivo Experimental Validation
Authors:
Zhanyue Zhao,
Benjamin Szewczyk,
Matthew Tarasek,
Charles Bales,
Yang Wang,
Ming Liu,
Yiwei Jiang,
Chitresh Bhushan,
Eric Fiveland,
Zahabiya Campwala,
Rachel Trowbridge,
Phillip M. Johansen,
Zachary Olmsted,
Goutam Ghoshal,
Tamas Heffter,
Katie Gandomi,
Farid Tavakkolmoghaddam,
Christopher Nycz,
Erin Jeannotte,
Shweta Mane,
Julia Nalwalk,
E. Clif Burdette,
Jiang Qian,
Desmond Yeo,
Julie Pilitsis
, et al. (1 additional author not shown)
Abstract:
Intracorporeal needle-based therapeutic ultrasound (NBTU) is a minimally invasive option for intervening in malignant brain tumors, commonly used in thermal ablation procedures. This technique is suitable for both primary and metastatic cancers, utilizing a high-frequency alternating electric field (up to 10 MHz) to excite a piezoelectric transducer. The resulting rapid deformation of the transducer produces an acoustic wave that propagates through tissue, leading to localized high-temperature heating at the target tumor site and inducing rapid cell death. To optimize the design of NBTU transducers for thermal dose delivery during treatment, numerical modeling of the acoustic pressure field generated by the deforming piezoelectric transducer is frequently employed. The bioheat transfer process generated by the input pressure field is used to track the thermal propagation of the applicator over time. Magnetic resonance thermal imaging (MRTI) can be used to experimentally validate these models. Validation results using MRTI demonstrated the feasibility of this model, showing a consistent thermal propagation pattern. However, a thermal damage isodose map is more advantageous for evaluating therapeutic efficacy. To achieve a more accurate simulation based on the actual brain tissue environment, a new finite element method (FEM) simulation with enhanced damage evaluation capabilities was conducted. The results showed that the highest temperature and ablated volume differed between experimental and simulation results by 2.1884°C (3.71%) and 0.0631 cm$^3$ (5.74%), respectively. The lowest Pearson correlation coefficient (PCC) for peak temperature was 0.7117, and the lowest Dice coefficient for the ablated area was 0.7021, indicating a good agreement in accuracy between simulation and experiment.
Submitted 4 September, 2024; v1 submitted 3 September, 2024;
originally announced September 2024.
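The two agreement measures quoted in the abstract above, the Dice coefficient for ablated areas and the Pearson correlation coefficient (PCC) for temperatures, can be sketched as follows. The function names and the toy inputs are assumptions for illustration, and the masks are taken to be flattened binary arrays; the authors' own evaluation pipeline is not described in the abstract.

```python
import math


def dice_coefficient(mask_a, mask_b):
    """Dice overlap 2|A and B| / (|A| + |B|) of two flattened binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    a = [bool(x) for x in mask_a]
    b = [bool(x) for x in mask_b]
    overlap = sum(1 for x, y in zip(a, b) if x and y)
    size = sum(a) + sum(b)
    return 1.0 if size == 0 else 2.0 * overlap / size


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical simulated vs. measured ablation masks (1 = ablated voxel).
overlap_score = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
```

Both scores equal 1 for perfect agreement; Dice falls to 0 when the masks do not overlap, and the PCC falls to 0 when the two temperature series are linearly unrelated.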
-
Unfolding Videos Dynamics via Taylor Expansion
Authors:
Siyi Chen,
Minkyu Choi,
Zesen Zhao,
Kuan Han,
Qing Qu,
Zhongming Liu
Abstract:
Taking inspiration from physical motion, we present a new self-supervised dynamics learning strategy for videos: Video Time-Differentiation for Instance Discrimination (ViDiDi). ViDiDi is a simple and data-efficient strategy, readily applicable to existing self-supervised video representation learning frameworks based on instance discrimination. At its core, ViDiDi observes different aspects of a video through various orders of temporal derivatives of its frame sequence. These derivatives, along with the original frames, support the Taylor series expansion of the underlying continuous dynamics at discrete times, where higher-order derivatives emphasize higher-order motion features. ViDiDi learns a single neural network that encodes a video and its temporal derivatives into consistent embeddings following a balanced alternating learning algorithm. By learning consistent representations for original frames and derivatives, the encoder is steered to emphasize motion features over static backgrounds and uncover the hidden dynamics in original frames. Hence, video representations are better separated by dynamic features. We integrate ViDiDi into existing instance discrimination frameworks (VICReg, BYOL, and SimCLR) for pretraining on UCF101 or Kinetics and test on standard benchmarks including video retrieval, action recognition, and action detection. The performances are enhanced by a significant margin without the need for large models or extensive datasets.
Submitted 7 September, 2024; v1 submitted 3 September, 2024;
originally announced September 2024.
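The temporal derivatives of a frame sequence described in the abstract above can be approximated with finite differences. The sketch below uses repeated forward differences on flattened frames; the function name `temporal_derivative`, the forward-difference scheme, and the toy frames are assumptions for illustration, and ViDiDi's actual discretisation may differ.

```python
def temporal_derivative(frames, order=1):
    """Apply forward differences `order` times along the time axis.

    frames: list of equally long lists of pixel values (one list per frame).
    Returns a sequence that is `order` frames shorter than the input.
    """
    current = [list(frame) for frame in frames]
    for _ in range(order):
        if len(current) < 2:
            raise ValueError("not enough frames for this derivative order")
        current = [
            [nxt - prev for prev, nxt in zip(frame_a, frame_b)]
            for frame_a, frame_b in zip(current, current[1:])
        ]
    return current


# Hypothetical 2-pixel video: pixel 0 moves quadratically, pixel 1 is static.
video = [[0.0, 7.0], [1.0, 7.0], [4.0, 7.0], [9.0, 7.0]]
velocity = temporal_derivative(video, order=1)
acceleration = temporal_derivative(video, order=2)
```

The static pixel vanishes in every derivative while the moving pixel keeps a non-zero signal, which matches the abstract's point that derivatives emphasise motion over static backgrounds.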
-
PuYun: Medium-Range Global Weather Forecasting Using Large Kernel Attention Convolutional Networks
Authors:
Shengchen Zhu,
Yiming Chen,
Peiying Yu,
Xiang Qu,
Yuxiao Zhou,
Yiming Ma,
Zhizhan Zhao,
Yukai Liu,
Hao Mi,
Bin Wang
Abstract:
Accurate weather forecasting is essential for understanding and mitigating weather-related impacts. In this paper, we present PuYun, an autoregressive cascade model that leverages large kernel attention convolutional networks. The model's design inherently supports extended weather prediction horizons while broadening the effective receptive field. The integration of large kernel attention mechanisms within the convolutional layers enhances the model's capacity to capture fine-grained spatial details, thereby improving its predictive accuracy for meteorological phenomena.
We introduce PuYun, comprising PuYun-Short for 0-5 day forecasts and PuYun-Medium for 5-10 day predictions. This approach enhances the accuracy of 10-day weather forecasting. Through evaluation, we demonstrate that PuYun-Short alone surpasses the performance of both GraphCast and FuXi-Short in generating accurate 10-day forecasts. Specifically, on the 10th day, PuYun-Short reduces the RMSE for Z500 to 720 $m^2/s^2$, compared to 732 $m^2/s^2$ for GraphCast and 740 $m^2/s^2$ for FuXi-Short. Additionally, the RMSE for T2M is reduced to 2.60 K, compared to 2.63 K for GraphCast and 2.65 K for FuXi-Short. Furthermore, when employing a cascaded approach by integrating PuYun-Short and PuYun-Medium, our method achieves superior results compared to the combined performance of FuXi-Short and FuXi-Medium. On the 10th day, the RMSE for Z500 is further reduced to 638 $m^2/s^2$, compared to 641 $m^2/s^2$ for FuXi. These findings underscore the effectiveness of our model ensemble in advancing medium-range weather prediction. Our training code and model will be open-sourced.
Submitted 12 September, 2024; v1 submitted 1 September, 2024;
originally announced September 2024.
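The cascaded use of PuYun-Short (days 0-5) and PuYun-Medium (days 5-10) described in the abstract above can be pictured as an autoregressive rollout that switches models at day 5. The sketch below is a hypothetical illustration only: `cascade_forecast`, the step callables, and the assumed four forecast steps per day are not taken from the paper.

```python
def cascade_forecast(state, short_step, medium_step,
                     steps_per_day=4, switch_day=5, total_days=10):
    """Roll a state forward, feeding each prediction back in as the next input.

    short_step / medium_step: callables mapping one state to the next state.
    The short-range model is used before `switch_day`, the medium-range after.
    """
    trajectory = []
    switch_step = switch_day * steps_per_day
    for step in range(total_days * steps_per_day):
        model = short_step if step < switch_step else medium_step
        state = model(state)  # autoregressive: output becomes next input
        trajectory.append(state)
    return trajectory


# Hypothetical stand-in "models" acting on a scalar state.
trajectory = cascade_forecast(0, lambda s: s + 1, lambda s: s + 10)
```

With these stand-ins, the first 20 steps come from the short-range callable and the last 20 from the medium-range one, so the hand-over point is easy to verify.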
-
Study of $D^{+} \to K_{S}^{0}K^{*}(892)^{+}$ in $D^{+} \to K_{S}^{0} K_{S}^{0} π^{+}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using a data sample of $e^+e^-$ collisions corresponding to an integrated luminosity of 7.93 $\rm fb^{-1}$ collected with the BESIII detector at the center-of-mass energy 3.773~GeV, we perform the first amplitude analysis of the decay $D^{+} \to K_{S}^{0} K_{S}^{0} π^{+}$. The absolute branching fraction of $D^{+} \to K_{S}^{0}K_{S}^{0} π^{+}$ is measured to be $(2.97 \pm 0.09_{\rm stat.} \pm 0.05_{\rm syst.})\times10^{-3}$. The dominant intermediate process is $D^{+} \to K_{S}^{0}K^{*}(892)^{+}$, whose branching fraction is determined to be $(8.72 \pm 0.28_{\rm stat.} \pm 0.15_{\rm syst.}) \times 10^{-3}$, including all the $K^*(892)^+$ decays.
Submitted 2 September, 2024;
originally announced September 2024.
-
Measurement of Born cross sections of $e^+e^-\toΞ^0\barΞ^0$ and search for charmonium(-like) states at $\sqrt{s}$ = 3.51-4.95 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (648 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data collected by the BESIII detector at BEPCII corresponding to an integrated luminosity of 30 $\rm fb^{-1}$, we measure Born cross sections and effective form factors for the process $e^+e^-\toΞ^0\barΞ^0$ at forty-five center-of-mass energies between 3.51 and 4.95 GeV. The dressed cross section is fitted, assuming a power-law function plus a charmonium(-like) state, i.e., $ψ(3770)$, $ψ(4040)$, $ψ(4160)$, $ψ(4230)$, $ψ(4360)$, $ψ(4415)$ or $ψ(4660)$. No significant charmonium(-like) state decaying into $Ξ^0\barΞ^0$ is observed. Upper limits at the 90% confidence level on the product of the branching fraction and the electronic partial width are provided for each decay. In addition, ratios of the Born cross sections and the effective form factors for $e^+e^-\toΞ^0\barΞ^0$ and $e^+e^-\toΞ^-\barΞ^+$ are also presented to test isospin symmetry and the vector meson dominance model.
Submitted 31 August, 2024;
originally announced September 2024.
-
Efficient Multi-task Prompt Tuning for Recommendation
Authors:
Ting Bai,
Le Huang,
Yue Yu,
Cheng Yang,
Cheng Hou,
Zhe Zhao,
Chuan Shi
Abstract:
With the expansion of business scenarios, real recommender systems face the challenge of handling constantly emerging new tasks in multi-task learning frameworks. In this paper, we attempt to improve the generalization ability of multi-task recommendations when dealing with new tasks. We find that, in most multi-task learning methods, joint training enhances the performance of the new task but negatively impacts existing tasks. Besides, such a re-training mechanism with new tasks increases the training costs, limiting the generalization ability of multi-task recommendation models. Based on this consideration, we aim to design a suitable sharing mechanism among different tasks while maintaining joint optimization efficiency in new task learning. A novel two-stage prompt-tuning MTL framework (MPT-Rec) is proposed to address task irrelevance and training efficiency problems in multi-task recommender systems. Specifically, we disentangle the task-specific and task-sharing information in the multi-task pre-training stage, then use task-aware prompts to transfer knowledge from other tasks to the new task effectively. By freezing parameters in the pre-training tasks, MPT-Rec avoids the negative impacts that may be brought by the new task and greatly reduces the training costs. Extensive experiments on three real-world datasets show the effectiveness of our proposed multi-task learning framework. MPT-Rec achieves the best performance compared to the SOTA multi-task learning method. Besides, it maintains comparable model performance while vastly improving training efficiency in new task learning (i.e., it trains at most 10% of the parameters used in full training).
Submitted 30 August, 2024;
originally announced August 2024.
-
A note on promotion time cure models with a new biological consideration
Authors:
Zhi Zhao,
Fatih Kızılaslan
Abstract:
We introduce a generalized promotion time cure model motivated by a new biological consideration. The new approach is flexible enough to model heterogeneous survival data, in particular for addressing intra-sample heterogeneity.
Submitted 30 August, 2024;
originally announced August 2024.
-
Search for $h_c \to π^+π^-J/ψ$ via $ψ(3686)\to π^0h_c$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (653 additional authors not shown)
Abstract:
Using $(2712.4 \pm 14.3) \times 10^6$ $ψ(3686)$ events collected with the BESIII detector operating at the BEPCII collider, we search for the hadronic transition $h_c \to π^+π^-J/ψ$ via $ψ(3686)\to π^0 h_c$. No significant signal is observed. We set the most stringent upper limits to date on the branching fractions $\mathcal{B}(ψ(3686)\to π^0 h_c)\times\mathcal{B}(h_c\toπ^+π^-J/ψ)$ and $\mathcal{B}(h_c \to π^+π^-J/ψ)$ at the 90% confidence level, which are determined to be $6.7\times 10^{-7}$ and $9.4 \times10^{-4}$, respectively.
Submitted 30 August, 2024;
originally announced August 2024.
-
Measurement of the Decay $Ξ^{0}\toΛγ$ with Entangled $Ξ^{0}\barΞ^{0}$ Pairs
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
In this Letter, a systematic study of the weak radiative hyperon decay $Ξ^{0}\toΛγ$ at an electron-positron collider using entangled $Ξ^{0}\barΞ^{0}$ pair events is presented. The absolute branching fraction for this decay has been measured for the first time, and is $\left(1.347 \pm 0.066_{\mathrm{stat.}}\pm0.054_{\mathrm{syst.}}\right)\times 10^{-3}$. The decay asymmetry parameter, which characterizes the effect of parity violation in the decay, is determined to be $-0.741 \pm 0.062_{\mathrm{stat.}}\pm 0.019_{\mathrm{syst.}}$. The obtained results are consistent with the world average values within the uncertainties, offering valuable insights into the underlying mechanism governing the weak radiative hyperon decays. The charge conjugation parity ($CP$) symmetries of branching fraction and decay asymmetry parameter in the decay are also studied. No statistically significant violation of charge conjugation parity symmetry is observed.
Submitted 29 August, 2024; v1 submitted 29 August, 2024;
originally announced August 2024.
-
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Authors:
Shengpeng Ji,
Ziyue Jiang,
Xize Cheng,
Yifu Chen,
Minghui Fang,
Jialong Zuo,
Qian Yang,
Ruiqi Li,
Ziang Zhang,
Xiaoda Yang,
Rongjie Huang,
Yidi Jiang,
Qian Chen,
Siqi Zheng,
Wen Wang,
Zhou Zhao
Abstract:
Language models have been effectively applied to modeling natural signals, such as images, video, speech, and audio. A crucial component of these models is the codec tokenizer, which compresses high-dimensional natural signals into lower-dimensional discrete tokens. In this paper, we introduce WavTokenizer, which offers several advantages over previous SOTA acoustic codec models in the audio domain: 1) Extreme compression: by compressing the layers of quantizers and the temporal dimension of the discrete codec, one second of audio at a 24 kHz sampling rate requires only a single quantizer with 40 or 75 tokens. 2) Improved subjective quality: despite the reduced number of tokens, WavTokenizer achieves state-of-the-art reconstruction quality with outstanding UTMOS scores and inherently contains richer semantic information. Specifically, we achieve these results by designing a broader VQ space, extended contextual windows, and improved attention networks, as well as introducing a powerful multi-scale discriminator and an inverse Fourier transform structure. We conducted extensive reconstruction experiments in the domains of speech, audio, and music. WavTokenizer exhibited strong performance across various objective and subjective metrics compared to state-of-the-art models. We also tested semantic information, VQ utilization, and adaptability to generative models. Comprehensive ablation studies confirm the necessity of each module in WavTokenizer. The related code, demos, and pre-trained models are available at https://github.com/jishengpeng/WavTokenizer.
Submitted 29 August, 2024;
originally announced August 2024.
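The compression figures in the abstract above (24 kHz audio, one quantizer, 40 or 75 tokens per second) translate into a temporal downsampling factor and, given a codebook size, a bitrate. The sketch below works through that arithmetic; the function names are hypothetical, and the codebook size of 1024 is a placeholder chosen for illustration, since the abstract does not state one.

```python
import math


def samples_per_token(sample_rate_hz, tokens_per_second):
    """Number of raw audio samples summarised by one discrete token."""
    return sample_rate_hz / tokens_per_second


def bitrate_bps(tokens_per_second, codebook_size, num_quantizers=1):
    """Bits per second needed to transmit the token stream."""
    return tokens_per_second * num_quantizers * math.log2(codebook_size)


# Figures from the abstract: 24 kHz audio at 40 or 75 tokens per second.
hop_40 = samples_per_token(24_000, 40)
hop_75 = samples_per_token(24_000, 75)
# Placeholder codebook size (an assumption, not a value from the abstract).
example_bitrate = bitrate_bps(40, codebook_size=1024)
```

At 40 tokens per second each token covers 600 samples, and at 75 tokens per second each covers 320 samples.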
-
Model-independent determination of the strong-phase difference between $D^0$ and $\bar{D}^0 \to π^+π^-π^+π^-$ decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere, et al. (647 additional authors not shown)
Abstract:
Measurements of the strong-phase difference between $D^0$ and $\bar{D}^0\toπ^+π^-π^+π^-$ are performed in bins of phase space. The study exploits a sample of quantum-correlated $D\bar{D}$ mesons collected by the BESIII experiment in $e^+e^-$ collisions at a center-of-mass energy of 3.773~GeV, corresponding to an integrated luminosity of 2.93~fb$^{-1}$. Here, $D$ denotes a neutral charm meson in a superposition of flavor eigenstates. The reported results are valuable for measurements of the $C\!P$-violating phase $γ$ (also denoted $φ_3$) in $B^\pm \to DK^\pm$, $D \to π^+π^-π^+π^-$ decays, and the binning schemes are designed to provide good statistical sensitivity to this parameter. The expected uncertainty on $γ$ arising from the precision of the strong-phase measurements, when applied to very large samples of $B$-meson decays, is around $1.5^\circ$ or $2^\circ$, depending on the binning scheme. The binned strong-phase parameters are combined to give a value of $F_+^{4π} = 0.746 \pm 0.010 \pm 0.004$ for the $C\!P$-even fraction of $D^0 \to π^+π^-π^+π^-$ decays, which is around 30\% more precise than the previous best measurement of this quantity.
Submitted 29 August, 2024;
originally announced August 2024.
-
Improving Adversarial Robustness in Android Malware Detection by Reducing the Impact of Spurious Correlations
Authors:
Hamid Bostani,
Zhengyu Zhao,
Veelasha Moonsamy
Abstract:
Machine learning (ML) has demonstrated significant advancements in Android malware detection (AMD); however, the resilience of ML against realistic evasion attacks remains a major obstacle for AMD. One of the primary factors contributing to this challenge is the scarcity of reliable generalizations. Malware classifiers with limited generalizability tend to overfit spurious correlations derived from biased features. Consequently, adversarial examples (AEs), generated by evasion attacks, can modify these features to evade detection. In this study, we propose a domain adaptation technique to improve the generalizability of AMD by aligning the distribution of malware samples and AEs. Specifically, we utilize meaningful feature dependencies, reflecting domain constraints in the feature space, to establish a robust feature space. Training on the proposed robust feature space enables malware classifiers to learn from predefined patterns associated with app functionality rather than from individual features. This approach helps mitigate spurious correlations inherent in the initial feature space. Our experiments conducted on DREBIN, a renowned Android malware detector, demonstrate that our approach surpasses the state-of-the-art defense, Sec-SVM, when facing realistic evasion attacks. In particular, our defense can improve adversarial robustness by up to 55% against realistic evasion attacks compared to Sec-SVM.
Submitted 27 August, 2024;
originally announced August 2024.
-
Integer Topological Defects Reveal Anti-Symmetric Forces in Active Nematics
Authors:
Zihui Zhao,
Yisong Yao,
He Li,
Yongfeng Zhao,
Yujia Wang,
Hepeng Zhang,
Hugues Chaté,
Masaki Sano
Abstract:
Cell layers are often categorized as contractile or extensile active nematics but recent experiments on neural progenitor cells with induced $+1$ topological defects challenge this classification. In a bottom-up approach, we first study a relevant particle-level model and then analyze a continuous theory derived from it. We show that both model and theory account qualitatively for the main experimental result, i.e. accumulation of cells at the core of any type of +1 defect. We argue that cell accumulation is essentially due to two generally ignored 'effective active forces'.
We finally discuss the relevance and consequences of our findings in the context of other cellular active nematics experiments and previously proposed theories.
Submitted 12 September, 2024; v1 submitted 27 August, 2024;
originally announced August 2024.
-
Bridging the Gap: Unpacking the Hidden Challenges in Knowledge Distillation for Online Ranking Systems
Authors:
Nikhil Khani,
Shuo Yang,
Aniruddh Nath,
Yang Liu,
Pendo Abbo,
Li Wei,
Shawn Andrews,
Maciej Kula,
Jarrod Kahn,
Zhe Zhao,
Lichan Hong,
Ed Chi
Abstract:
Knowledge Distillation (KD) is a powerful approach for compressing a large model into a smaller, more efficient model, particularly beneficial for latency-sensitive applications like recommender systems. However, current KD research predominantly focuses on Computer Vision (CV) and NLP tasks, overlooking unique data characteristics and challenges inherent to recommender systems. This paper addresses these overlooked challenges, specifically: (1) mitigating data distribution shifts between teacher and student models, (2) efficiently identifying optimal teacher configurations within time and budgetary constraints, and (3) enabling computationally efficient and rapid sharing of teacher labels to support multiple students. We present a robust KD system developed and rigorously evaluated on multiple large-scale personalized video recommendation systems within Google. Our live experiment results demonstrate significant improvements in student model performance while ensuring consistent and reliable generation of high-quality teacher labels from a continuous data stream.
Submitted 26 August, 2024;
originally announced August 2024.
-
Distilling Long-tailed Datasets
Authors:
Zhenghao Zhao,
Haoxuan Wang,
Yuzhang Shang,
Kai Wang,
Yan Yan
Abstract:
Dataset distillation (DD) aims to distill a small, information-rich dataset from a larger one for efficient neural network training. However, existing DD methods struggle with long-tailed datasets, which are prevalent in real-world scenarios. By investigating the reasons behind this unexpected result, we identified two main causes: 1) Expert networks trained on imbalanced data develop biased gradients, leading to the synthesis of similarly imbalanced distilled datasets. Parameter matching, a common technique in DD, involves aligning the learning parameters of the distilled dataset with that of the original dataset. However, in the context of long-tailed datasets, matching biased experts leads to inheriting the imbalance present in the original data, causing the distilled dataset to inadequately represent tail classes. 2) The experts trained on such datasets perform suboptimally on tail classes, resulting in misguided distillation supervision and poor-quality soft-label initialization. To address these issues, we propose a novel long-tailed dataset distillation method, Long-tailed Aware Dataset distillation (LAD). Specifically, we propose Weight Mismatch Avoidance to avoid directly matching the biased expert trajectories. It reduces the distance between the student and the biased expert trajectories and prevents the tail class bias from being distilled to the synthetic dataset. Moreover, we propose Adaptive Decoupled Matching, which jointly matches the decoupled backbone and classifier to improve the tail class performance and initialize reliable soft labels. This work pioneers the field of long-tailed dataset distillation (LTDD), marking the first effective effort to distill long-tailed datasets.
Submitted 24 August, 2024;
originally announced August 2024.
-
Language-specific Calibration for Pruning Multilingual Language Models
Authors:
Simon Kurz,
Jian-Jia Chen,
Lucie Flek,
Zhixue Zhao
Abstract:
Recent advances in large language model (LLM) pruning have shown state-of-the-art compression results in post-training and retraining-free settings while maintaining high predictive performance. However, such research mainly considers calibrating pruning using English text, despite the multilingual nature of modern LLMs and their frequent use in non-English languages. In this paper, we set out to explore effective strategies for calibrating the pruning of multilingual language models. We present the first comprehensive empirical study, comparing different calibration languages for pruning multilingual models across diverse tasks, models, and state-of-the-art pruning techniques. Our results yield practical suggestions; for example, calibrating in the target language can efficiently yield lower perplexity but does not necessarily benefit downstream tasks. Further analysis reveals that calibration in the target language mainly contributes to preserving language-specific features related to fluency and coherence, but might not contribute to capturing language-agnostic features such as language understanding and reasoning. Finally, we provide practical recommendations for future practitioners.
Submitted 28 August, 2024; v1 submitted 26 August, 2024;
originally announced August 2024.
-
IntOPE: Off-Policy Evaluation in the Presence of Interference
Authors:
Yuqi Bai,
Ziyu Zhao,
Minqin Zhu,
Kun Kuang
Abstract:
Off-Policy Evaluation (OPE) is employed to assess the potential impact of a hypothetical policy using logged contextual bandit feedback, which is crucial in areas such as personalized medicine and recommender systems, where online interactions are associated with significant risks and costs. Traditionally, OPE methods rely on the Stable Unit Treatment Value Assumption (SUTVA), which assumes that the reward for any given individual is unaffected by the actions of others. However, this assumption often fails in real-world scenarios due to the presence of interference, where an individual's reward is affected not just by their own actions but also by the actions of their peers. This realization reveals significant limitations of existing OPE methods in real-world applications. To address this limitation, we propose IntIPW, an IPW-style estimator that extends the Inverse Probability Weighting (IPW) framework by integrating marginalized importance weights to account for both individual actions and the influence of adjacent entities. Extensive experiments are conducted on both synthetic and real-world data to demonstrate the effectiveness of the proposed IntIPW method.
Submitted 24 August, 2024;
originally announced August 2024.
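For readers unfamiliar with the baseline that the IntOPE abstract above extends, here is a minimal sketch of the standard Inverse Probability Weighting estimator under SUTVA (no interference). It is not the authors' IntIPW estimator, which the abstract says additionally uses marginalized importance weights to account for neighbours' actions. The function and argument names are hypothetical.

```python
def ipw_estimate(rewards, target_probs, logging_probs):
    """Standard IPW value estimate from logged bandit feedback.

    rewards[i]       : observed reward for logged action a_i in context x_i
    target_probs[i]  : probability the evaluated policy assigns to a_i given x_i
    logging_probs[i] : probability the logging policy assigned to a_i given x_i
    """
    if not (len(rewards) == len(target_probs) == len(logging_probs)):
        raise ValueError("all inputs must have the same length")
    if not rewards:
        raise ValueError("at least one logged sample is required")

    total = 0.0
    for r, p_new, p_log in zip(rewards, target_probs, logging_probs):
        if p_log <= 0.0:
            raise ValueError("logging probabilities must be positive")
        # Reweight each reward by how much more (or less) likely the
        # evaluated policy is to take the logged action.
        total += (p_new / p_log) * r
    return total / len(rewards)


# Two logged samples from a uniform logging policy over two actions.
print(ipw_estimate([1.0, 0.0], [0.8, 0.2], [0.5, 0.5]))
```

The estimate is unbiased only when each unit's reward depends on its own action alone, which is exactly the assumption the abstract says fails under interference.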
-
Semantic Alignment for Multimodal Large Language Models
Authors:
Tao Wu,
Mengze Li,
Jingyuan Chen,
Wei Ji,
Wang Lin,
Jinyang Gao,
Kun Kuang,
Zhou Zhao,
Fei Wu
Abstract:
Research on Multi-modal Large Language Models (MLLMs) for multi-image cross-modal instruction has received increasing attention and made significant progress, particularly in scenarios involving closely resembling images (e.g., change captioning). Existing MLLMs typically follow a two-step process in their pipelines: first, extracting visual tokens independently for each input image, and then aligning these visual tokens from different images with the Large Language Model (LLM) in its textual feature space. However, the independent extraction of visual tokens for each image may result in different semantics being prioritized for different images in the first step, leading to a lack of preservation of linking information among images for subsequent LLM analysis. This issue becomes more serious in scenarios where significant variations exist among the images (e.g., visual storytelling). To address this challenge, we introduce Semantic Alignment for Multi-modal large language models (SAM). By involving bidirectional semantic guidance between different images in the visual-token extraction process, SAM aims to enhance the preservation of linking information for coherent analysis and align the semantics of different images before feeding them into the LLM. As a test bed, we propose a large-scale dataset named MmLINK consisting of 69K samples. Different from most existing datasets for MLLM fine-tuning, our MmLINK dataset comprises multi-modal instructions with significantly diverse images. Extensive experiments on the group captioning task and the storytelling task demonstrate the effectiveness of our SAM model, surpassing the state-of-the-art methods by a large margin (+37% for group captioning and +22% for storytelling on CIDEr score). Project page: https://mccartney01.github.io/SAM.
Submitted 23 August, 2024;
originally announced August 2024.
-
Generating Realistic X-ray Scattering Images Using Stable Diffusion and Human-in-the-loop Annotations
Authors:
Zhuowen Zhao,
Xiaoya Chong,
Tanny Chavez,
Alexander Hexemer
Abstract:
We fine-tuned a foundational stable diffusion model using X-ray scattering images and their corresponding descriptions to generate new scientific images from given prompts. However, some of the generated images exhibit significant unrealistic artifacts, commonly known as "hallucinations". To address this issue, we trained various computer vision models on a dataset composed of 60% human-approved generated images and 40% experimental images to detect unrealistic images. The classified images were then reviewed and corrected by human experts, and subsequently used to further refine the classifiers in subsequent rounds of training and inference. Our evaluations demonstrate the feasibility of generating high-fidelity, domain-specific images using a fine-tuned diffusion model. We anticipate that generative AI will play a crucial role in enhancing data augmentation and driving the development of digital twins in scientific research facilities.
Submitted 22 August, 2024;
originally announced August 2024.
-
Ant Backpressure Routing for Wireless Multi-hop Networks with Mixed Traffic Patterns
Authors:
Negar Erfaniantaghvayi,
Zhongyuan Zhao,
Kevin Chan,
Gunjan Verma,
Ananthram Swami,
Santiago Segarra
Abstract:
A mixture of streaming and short-lived traffic presents a common yet challenging scenario for Backpressure routing in wireless multi-hop networks. Although state-of-the-art shortest-path biased backpressure (SP-BP) can significantly improve the latency of backpressure routing while retaining throughput optimality, it still suffers from the last-packet problem due to its inherent per-commodity queue structure and link capacity assignment. To address this challenge, we propose Ant Backpressure (Ant-BP), a fully distributed routing scheme that incorporates the multi-path routing capability of SP-BP into ant colony optimization (ACO) routing, which allows packets of different commodities to share link capacity in a first-in-first-out (FIFO) manner. Numerical evaluations show that Ant-BP can improve the latency and delivery ratio over SP-BP and ACO routing schemes, while achieving the same throughput as SP-BP under low-to-medium traffic loads.
Submitted 22 August, 2024;
originally announced August 2024.
-
Kondo spectral functions at low-temperatures: A dynamical-exchange-correlation-field perspective
Authors:
Zhen Zhao
Abstract:
We calculate the low-temperature spectral function of the symmetric single impurity Anderson model using a recently proposed dynamical exchange-correlation (xc) field formalism. The xc field, coupled to the one-particle Green's function, is obtained through analytic analysis and numerical extrapolation based on finite clusters. In the Kondo regime, the xc field consists of a complex constant term and a main quasiparticle-like oscillation term. The constant term represents the Hubbard side-band contribution, containing a bath-induced broadening effect, while the quasiparticle-like term is related to the Kondo resonance peak at low-temperature. We illustrate these features in terms of analytical and numerical calculations for small and medium-size finite clusters, and in the thermodynamic limit. The results indicate that the xc field formalism provides a good trade-off between accuracy and complexity in solving impurity problems. Consequently, it can significantly reduce the complexity of the many-body problem faced by first-principles approaches to strongly correlated materials.
Submitted 22 August, 2024;
originally announced August 2024.
-
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
Authors:
Zhongyu Zhao,
Menghang Dong,
Rongyu Zhang,
Wenzhao Zheng,
Yunpeng Zhang,
Huanrui Yang,
Dalong Du,
Kurt Keutzer,
Shanghang Zhang
Abstract:
Recent research has demonstrated that Feed-Forward Networks (FFNs) in Large Language Models (LLMs) play a pivotal role in storing diverse linguistic and factual knowledge. Conventional methods frequently face challenges due to knowledge confusion stemming from their monolithic and redundant architectures, which calls for more efficient solutions with minimal computational overhead, particularly for LLMs. In this paper, we explore the FFN computation paradigm in LLMs and introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications, while maintaining the same level of performance. Furthermore, we embed a router from the Mixture-of-Experts (MoE), combined with our devised Prior-Approximate (PA) loss term that facilitates the dynamic activation of experts and knowledge adaptation, thereby accelerating computational processes and enhancing performance using minimal training data and fine-tuning steps. FactorLLM thus enables efficient knowledge factorization and activates select groups of experts specifically tailored to designated tasks, emulating the interactive functional segmentation of the human brain. Extensive experiments across various benchmarks demonstrate the effectiveness of our proposed FactorLLM, which achieves performance comparable to the source model, securing up to 85% of model performance while obtaining an increase of over 30% in inference speed. Code: https://github.com/zhenwuweihe/FactorLLM.
Submitted 15 August, 2024;
originally announced August 2024.
-
Automatic Dataset Construction (ADC): Sample Collection, Data Curation, and Beyond
Authors:
Minghao Liu,
Zonglin Di,
Jiaheng Wei,
Zhongruo Wang,
Hengxiang Zhang,
Ruixuan Xiao,
Haoyu Wang,
Jinlong Pang,
Hao Chen,
Ankit Shah,
Hongxin Wei,
Xinlei He,
Zhaowei Zhao,
Haobo Wang,
Lei Feng,
Jindong Wang,
James Davis,
Yang Liu
Abstract:
Large-scale data collection is essential for developing personalized training data, mitigating the shortage of training data, and fine-tuning specialized models. However, creating high-quality datasets quickly and accurately remains a challenge due to annotation errors and the substantial time and costs associated with human labor. To address these issues, we propose Automatic Dataset Construction (ADC), an innovative methodology that automates dataset creation with negligible cost and high efficiency. Taking the image classification task as a starting point, ADC leverages LLMs for the detailed class design and code generation to collect relevant samples via search engines, significantly reducing the need for manual annotation and speeding up the data generation process. Despite these advantages, ADC also encounters real-world challenges such as label errors (label noise) and imbalanced data distributions (label bias). We provide open-source software that incorporates existing methods for label error detection and robust learning under noisy and biased data, ensuring higher-quality training data and a more robust model training procedure. Furthermore, we design three benchmark datasets focused on label noise detection, label noise learning, and class-imbalanced learning. These datasets are vital because there are few existing datasets specifically for label noise detection, despite its importance. Finally, we evaluate the performance of existing popular methods on these datasets, thereby facilitating further research in the field.
Submitted 21 August, 2024;
originally announced August 2024.
-
Unlocking Adversarial Suffix Optimization Without Affirmative Phrases: Efficient Black-box Jailbreaking via LLM as Optimizer
Authors:
Weipeng Jiang,
Zhenting Wang,
Juan Zhai,
Shiqing Ma,
Zhengyu Zhao,
Chao Shen
Abstract:
Despite prior safety alignment efforts, mainstream LLMs can still generate harmful and unethical content when subjected to jailbreaking attacks. Existing jailbreaking methods fall into two main categories: template-based and optimization-based methods. The former requires significant manual effort and domain knowledge, while the latter, exemplified by Greedy Coordinate Gradient (GCG), which seeks to maximize the likelihood of harmful LLM outputs through token-level optimization, also encounters several limitations: requiring white-box access, necessitating pre-constructed affirmative phrases, and suffering from low efficiency. In this paper, we present ECLIPSE, a novel and efficient black-box jailbreaking method utilizing optimizable suffixes. Drawing inspiration from LLMs' powerful generation and optimization capabilities, we employ task prompts to translate jailbreaking goals into natural language instructions. This guides the LLM to generate adversarial suffixes for malicious queries. In particular, a harmfulness scorer provides continuous feedback, enabling LLM self-reflection and iterative optimization to autonomously and efficiently produce effective suffixes. Experimental results demonstrate that ECLIPSE achieves an average attack success rate (ASR) of 0.92 across three open-source LLMs and GPT-3.5-Turbo, surpassing GCG by a factor of 2.4. Moreover, ECLIPSE is on par with template-based methods in ASR while offering superior attack efficiency, reducing the average attack overhead by 83%.
Submitted 20 August, 2024;
originally announced August 2024.
-
CooPre: Cooperative Pretraining for V2X Cooperative Perception
Authors:
Seth Z. Zhao,
Hao Xiang,
Chenfeng Xu,
Xin Xia,
Bolei Zhou,
Jiaqi Ma
Abstract:
Existing Vehicle-to-Everything (V2X) cooperative perception methods rely on accurate multi-agent 3D annotations. Nevertheless, it is time-consuming and expensive to collect and annotate real-world data, especially for V2X systems. In this paper, we present a self-supervised learning method for V2X cooperative perception, which utilizes the vast amount of unlabeled 3D V2X data to enhance the perception performance. Beyond simply extending the previous pre-training methods for point-cloud representation learning, we introduce a novel self-supervised Cooperative Pretraining framework (termed as CooPre) customized for a collaborative scenario. We point out that cooperative point-cloud sensing compensates for information loss among agents. This motivates us to design a novel proxy task for the 3D encoder to reconstruct LiDAR point clouds across different agents. Besides, we develop a V2X bird-eye-view (BEV) guided masking strategy which effectively allows the model to pay attention to 3D features across heterogeneous V2X agents (i.e., vehicles and infrastructure) in the BEV space. Noticeably, such a masking strategy effectively pretrains the 3D encoder and is compatible with mainstream cooperative perception backbones. Our approach, validated through extensive experiments on representative datasets (i.e., V2X-Real, V2V4Real, and OPV2V), leads to a performance boost across all V2X settings. Additionally, we demonstrate the framework's improvements in cross-domain transferability, data efficiency, and robustness under challenging scenarios. The code will be made publicly available.
Submitted 20 August, 2024;
originally announced August 2024.
-
CrossFi: A Cross Domain Wi-Fi Sensing Framework Based on Siamese Network
Authors:
Zijian Zhao,
Tingwei Chen,
Zhijie Cai,
Xiaoyang Li,
Hang Li,
Qimei Chen,
Guangxu Zhu
Abstract:
In recent years, Wi-Fi sensing has garnered significant attention due to its numerous benefits, such as privacy protection, low cost, and penetration ability. Extensive research has been conducted in this field, focusing on areas such as gesture recognition, people identification, and fall detection. However, many data-driven methods encounter challenges related to domain shift, where the model fails to perform well in environments different from the training data. One major factor contributing to this issue is the limited availability of Wi-Fi sensing datasets, which makes models learn excessive irrelevant information and over-fit to the training set. Unfortunately, collecting large-scale Wi-Fi sensing datasets across diverse scenarios is a challenging task. To address this problem, we propose CrossFi, a siamese network-based approach that excels in both in-domain and cross-domain scenarios, including few-shot and zero-shot scenarios, and even works in the few-shot new-class scenario, where the testing set contains new categories. The core component of CrossFi is a sample-similarity calculation network called CSi-Net, which improves the structure of the siamese network by using an attention mechanism to capture similarity information, instead of simply calculating the distance or cosine similarity. Based on it, we develop an extra Weight-Net that can generate a template for each class, so that our CrossFi can work in different scenarios. Experimental results demonstrate that our CrossFi achieves state-of-the-art performance across various scenarios. In the gesture recognition task, our CrossFi achieves an accuracy of 98.17% in the in-domain scenario, 91.72% in the one-shot cross-domain scenario, 64.81% in the zero-shot cross-domain scenario, and 84.75% in the one-shot new-class scenario. To facilitate future research, we will release the code for our model upon publication.
Submitted 20 August, 2024; v1 submitted 20 August, 2024;
originally announced August 2024.
-
MambaEVT: Event Stream based Visual Object Tracking using State Space Model
Authors:
Xiao Wang,
Chao Wang,
Shiao Wang,
Xixi Wang,
Zhicheng Zhao,
Lin Zhu,
Bo Jiang
Abstract:
Event camera-based visual tracking has drawn increasing attention in recent years due to its unique imaging principle and its advantages of low energy consumption, high dynamic range, and dense temporal resolution. Current event-based tracking algorithms are gradually hitting their performance bottlenecks due to their reliance on vision Transformers and a static template for target object localization. In this paper, we propose a novel Mamba-based visual tracking framework that adopts the state space model with linear complexity as a backbone network. The search regions and target template are fed into the vision Mamba network for simultaneous feature extraction and interaction. The output tokens of the search regions are then fed into the tracking head for target localization. More importantly, we consider introducing a dynamic template update strategy into the tracking framework using the Memory Mamba network. By considering the diversity of samples in the target template library and making appropriate adjustments to the template memory module, a more effective dynamic template can be integrated. The effective combination of dynamic and static templates allows our Mamba-based tracking algorithm to achieve a good balance between accuracy and computational cost on multiple large-scale datasets, including EventVOT, VisEvent, and FE240hz. The source code will be released at https://github.com/Event-AHU/MambaEVT
Submitted 19 August, 2024;
originally announced August 2024.