AutoJournaling: A Context-Aware Journaling System Leveraging MLLMs on Smartphone Screenshots

Authors: Tianyi Zhang, Shiquan Zhang, Le Fang, Hong Jia, Vassilis Kostakos, Simon D'Alfonso

Abstract: Journaling offers significant benefits, including fostering self-reflection, enhancing writing skills, and aiding in mood monitoring. However, many people abandon the practice because traditional journaling is time-consuming, and detailed life events may be overlooked if not recorded promptly. Given that smartphones are the most widely used devices for entertainment, work, and socialization, they present an ideal platform for innovative approaches to journaling. Despite their ubiquity, the potential of using digital phenotyping, a method of unobtrusively collecting data from digital devices to gain insights into psychological and behavioral patterns, for automated journal generation has been largely underexplored. In this study, we propose AutoJournaling, the first-of-its-kind system that automatically generates journals by collecting and analyzing screenshots from smartphones. This system captures life events and corresponding emotions, offering a novel approach to digital phenotyping. We evaluated AutoJournaling by collecting screenshots every 3 seconds from three students over five days, demonstrating its feasibility and accuracy. AutoJournaling is the first framework to utilize seamlessly collected screenshots for journal generation, providing new insights into psychological states through digital phenotyping.

Submitted 15 September, 2024; originally announced September 2024.

arXiv:2409.05881 [pdf, other]

Thermodynamics for Reduced Models of Breakable Amyloid Filaments Based on Maximum Entropy Principle

Authors: Xinyu Zhang, Haiyang Jia, Wuyue Yang, Liangrong Peng, Liu Hong

Abstract: Amyloid filaments are associated with neurodegenerative diseases such as Alzheimer's and Parkinson's. Simplified models of amyloid aggregation are crucial because the original mass-action equations involve numerous variables, complicating analysis and understanding. While dynamical aspects of simplified models have been widely studied, their thermodynamic properties are less understood. In this study, we explore the Maximum Entropy Principle (MEP)-reduced models, initially developed for dynamical analysis, from a brand-new thermodynamic perspective. Analytical expressions along with numerical simulations demonstrate that the discrete MEP-reduced model strictly retains laws of thermodynamics, which holds true even when filament lengths transit from discrete values to continuous real numbers. Our findings not only clarify the thermodynamic consistency between the MEP-reduced models and the original models of amyloid filaments for the first time, but also suggest avenues for future research into the model-reduction thermodynamics.

Submitted 25 August, 2024; originally announced September 2024.

Comments: 23 pages, 3 figures

arXiv:2409.03308 [pdf, ps, other]

The Dirichlet problem for a class of curvature equations in Minkowski space

Authors: Mengru Guo, Heming Jiao

Abstract: In this paper, we study the Dirichlet problem for a class of prescribed curvature equations in Minkowski space. We prove the existence of smooth spacelike hypersurfaces with a class of prescribed curvature and general boundary data based on establishing the \emph{a priori} $C^2$ estimates.

Submitted 5 September, 2024; originally announced September 2024.

arXiv:2408.16498 [pdf, other]

A Survey on Evaluating Large Language Models in Code Generation Tasks

Authors: Liguo Chen, Qi Guo, Hongrui Jia, Zhengran Zeng, Xin Wang, Yijiang Xu, Jian Wu, Yidong Wang, Qing Gao, Jindong Wang, Wei Ye, Shikun Zhang

Abstract: This paper provides a comprehensive review of the current methods and metrics used to evaluate the performance of Large Language Models (LLMs) in code generation tasks. With the rapid growth in demand for automated software development, LLMs have demonstrated significant potential in the field of code generation. The paper begins by reviewing the historical development of LLMs and their applications in code generation. Next, it details various methods and metrics for assessing the code generation capabilities of LLMs, including code correctness, efficiency, readability, and evaluation methods based on expert review and user experience. The paper also evaluates the widely used benchmark datasets, identifying their limitations and proposing directions for future improvements. Specifically, the paper analyzes the performance of code generation models across different tasks by combining multiple evaluation metrics, such as code compilation/interpretation success rates, unit test pass rates, and performance and efficiency metrics, to comprehensively assess the practical application of LLMs in code generation. Finally, the paper discusses the challenges faced in evaluating LLMs in code generation, particularly how to ensure the comprehensiveness and accuracy of evaluation methods and how to adapt to the evolving practices of software development. These analyses and discussions provide valuable insights for further optimizing and improving the application of LLMs in code generation tasks.

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2408.13747 [pdf, other]

Sharp Asymptotic Stability of Blasius Profile in the Steady Prandtl Equation

Authors: Hao Jia, Zhen Lei, Cheng Yuan

Abstract: This work presents an asymptotic stability result concerning the self-similar Blasius profiles $[\bar{u}, \bar{v}]$ of the stationary Prandtl boundary layer equation. Initially demonstrated by Serrin \cite{MR0282585}, the profiles $[\bar{u}, \bar{v}]$ were shown to act as a self-similar attractor of solutions $[u, v]$ to the Prandtl equation through the use of von Mises transform and maximal principle techniques. Specifically, as $x \to \infty$, $\|u - \bar{u}\|_{L^{\infty}_{y}} \to 0$. Iyer \cite{MR4097332} employed refined energy methods to derive an explicit convergence rate for initial data close to Blasius. Wang and Zhang \cite{MR4657422} utilized barrier function methods, removing smallness assumptions but imposing stronger asymptotic conditions on the initial data. It was suggested that the optimal convergence rate should be $\|u-\bar{u}\|_{L^{\infty}_{y}}\lesssim (x+1)^{-\frac{1}{2}}$, treating the stationary Prandtl equation as a 1-D parabolic equation in the entire space. In this study, we establish that $\|u - \bar{u}\|_{L^{\infty}_{y}} \lesssim (x+1)^{-1}$. Our proof relies on discovering nearly conserved low-frequency quantities and inherent degenerate structures at the boundary, which enhance the convergence rate through iteration techniques. Notably, the convergence rate we have demonstrated is optimal. We can find special solutions of Prandtl's equation such that the convergence between the solutions and the Blasius profile is exact, represented as $(x+1)^{-1}$.

Submitted 25 August, 2024; originally announced August 2024.

arXiv:2408.12297 [pdf]

The Critical Metallization of Hydrogen in Pressurized LaBeH8 Hydride

Authors: Zihan Zhang, Tian Cui, Yansun Yao, Tianchen Ma, Qiwen Jiang, Haojun Jia, Bartomeu Monserrat, Chris J. Pickard, Defang Duan

Abstract: Behaviours of hydrogen, such as fluidity and metallicity, are crucial for our understanding of planetary interiors and the emerging field of high-temperature superconducting hydrides. These behaviours were discovered in complex phase diagrams of hydrogen and hydrides; however, the transition mechanism of behaviours driven by temperature, pressure and chemical compression remains unclear, particularly in the processes of metallization. Until now, a comprehensive theoretical framework to quantify atomization and metallization of hydrogen in the phase diagram of hydrides has been lacking. In this study, we address this gap by combining molecular dynamics and electronic structure analysis to propose a theoretical framework, which clarifies the content and properties of atomic hydrogen under various temperature and pressure conditions and chemical compression exerted by non-hydrogen elements in hydrides. Applying this framework to the superconducting hydride LaBeH8, we identify three general hydrogen orderings within its phase diagram: molecular, sublattice and warm hydrogens. During the phase transition from molecule to sublattice, hydrogen exhibits different properties from three general hydrogen orderings, such as fast superionicity, metallicity and unusual atomic content response to temperature. These abnormal behaviours were defined as the critical metallization of hydrogen, which not only suggests a potential synthesis route for the metastable phase but also provides valuable insights into the complex synthetic products of superconducting hydrides.

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.11499 [pdf, other]

doi 10.1145/3689635

Power-Domain Interference Graph Estimation for Multi-hop BLE Networks

Authors: Haifeng Jia, Yichen Wei, Yibo Pi, Cailian Chen

Abstract: Traditional wisdom for network management allocates network resources separately for the measurement and communication tasks. Heavy measurement tasks may compete for limited resources with communication tasks and significantly degrade overall network performance. It is therefore challenging for the interference graph, deemed as incurring heavy measurement overhead, to be used in practice in wireless networks. To address this challenge in wireless sensor networks, our core insight is to use power as a new dimension for interference graph estimation (IGE) such that IGE can be done simultaneously with the communication tasks using the same frequency-time resources. We propose to marry power-domain IGE with concurrent flooding to achieve simultaneous measurement and communication in BLE networks, where the power linearity prerequisite for power-domain IGE holds naturally true in concurrent flooding. With extensive experiments, we conclude the necessary conditions for the power linearity to hold and analyze several nonlinearity issues of power related to hardware imperfections. We design and implement network protocols and power control algorithms for IGE in multi-hop BLE networks and conduct experiments to show that the marriage is mutually beneficial for both IGE and concurrent flooding. Furthermore, we demonstrate the potential of IGE in improving channel map convergence and convergecast in BLE networks.

Submitted 22 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

Comments: This paper is accepted for publication in the ACM Transactions on Sensor Networks (TOSN), and is an extension of our conference paper accepted at EWSN'23 (arXiv:2312.16807)

arXiv:2408.11467 [pdf, ps, other]

How to Read and Update Coded Distributed Storage Robustly and Optimally?

Authors: Haobo Jia, Zhuqing Jia

Abstract: We consider the problem of robust dynamic coded distributed storage (RDCDS) that is associated with the coded distributed storage of a message with $N$ servers where 1) it suffices to recover the message from the storage at any $R_r$ servers; and 2) each of the servers stores a coded portion of the message that is at most $\frac{1}{K_c}$ the size of the message. The goal is to enable two main functionalities: the read operation and the update operation of the message. Specifically, at time slot $t$, the user may execute either the read operation or the update operation, where the read operation allows the user to recover the message from the servers, and the update operation allows the user to update the message to the servers in the form of an additive increment so that any up to $X^{(t)}$ colluding servers reveal nothing about the increment. The two functionalities are robust if at any time slot $t$ 1) they tolerate temporarily dropout servers up to certain thresholds (the read threshold is $R_r$ and the update threshold is denoted as $R_u^{(t)}$); and 2) the user may remain oblivious to prior server states. The communication efficiency is measured by the download cost $C_r^{(t)}$ of the read operation and the upload cost $C_u^{(t)}$ of the update operation. Given $K_c$ and $R_r$, we are curious about the optimal $(R_u^{(t)},C_r^{(t)},C_u^{(t)})$ tuple. In this work, we settle the fundamental limits of RDCDS. In particular, denoting the number of dropout servers at time slot $t$ as $|\mathcal{D}^{(t)}|$, we first show that 1) $R_u^{(t)}\geq N-R_r+\lceil K_c\rceil+X^{(t)}$; and 2) $C_r^{(t)}\geq \frac{N-|\mathcal{D}^{(t)}|}{N-R_r+\lceil K_c\rceil-|\mathcal{D}^{(t)}|}, C_u^{(t)}\geq \frac{N-|\mathcal{D}^{(t)}|}{R_r-X^{(t)}-|\mathcal{D}^{(t)}|}$. Then, inspired by the idea of staircase codes, we construct an RDCDS scheme that simultaneously achieves the above lower bounds.

Submitted 21 August, 2024; originally announced August 2024.

Comments: 40 pages, 3 figures

arXiv:2408.07313 [pdf, other]

doi 10.1145/3675094.3678494

Exploring Large-Scale Language Models to Evaluate EEG-Based Multimodal Data for Mental Health

Authors: Yongquan Hu, Shuning Zhang, Ting Dang, Hong Jia, Flora D. Salim, Wen Hu, Aaron J. Quigley

Abstract: Integrating physiological signals such as electroencephalogram (EEG) with other data such as interview audio may offer valuable multimodal insights into psychological states or neurological disorders. Recent advancements with Large Language Models (LLMs) position them as prospective "health agents" for mental health assessment. However, current research predominantly focuses on single data modalities, presenting an opportunity to advance understanding through multimodal data. Our study aims to advance this approach by investigating multimodal data using LLMs for mental health assessment, specifically through zero-shot and few-shot prompting. Three datasets are adopted for depression and emotion classifications incorporating EEG, facial expressions, and audio (text). The results indicate that multimodal information confers substantial advantages over single modality approaches in mental health assessment. Notably, integrating EEG alongside commonly used LLM modalities such as audio and images demonstrates promising potential. Moreover, our findings reveal that 1-shot learning offers greater benefits compared to zero-shot learning methods.

Submitted 14 August, 2024; originally announced August 2024.

Comments: 6 pages; UbiComp Companion '24, Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing, October 5–9, 2024, Melbourne, VIC, Australia

arXiv:2408.05208 [pdf, other]

Holographic thermal correlators and quasinormal modes from semiclassical Virasoro blocks

Authors: Hewei Frederic Jia, Mukund Rangamani

Abstract: Motivated by its relevance for thermal correlators in strongly coupled holographic CFTs, we refine and further develop a recent exact analytic approach to the black hole perturbation problem, based on the semiclassical Virasoro blocks, or equivalently via the AGT relation, the Nekrasov partition functions in the Nekrasov-Shatashvili limit. Focusing on asymptotically $\text{AdS}_5$ black hole backgrounds, we derive new universal exact expressions for holographic thermal two-point functions, both for scalar operators and conserved currents. Relatedly, we also obtain exact quantization conditions of the associated quasinormal modes (QNMs). Our expressions for the holographic $\text{CFT}_4$ closely resemble the well-known results for 2d thermal CFTs on $\mathbb{R}^{1,1}$. This structural similarity stems from the locality of fusion transformation for Virasoro blocks. We provide numerical checks of our quantization conditions for QNMs. Additionally, we discuss the application of our results to understand specific physical properties of QNMs, including their near-extremal and asymptotic limits. The latter is related to a certain large-momentum regime of semiclassical Virasoro blocks dual to Seiberg-Witten prepotentials.

Submitted 9 August, 2024; originally announced August 2024.

Comments: 69 pages, 3 figures

arXiv:2407.18715 [pdf, other]

BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation

Authors: Peng Hao, Xiaobing Wang, Yingying Jiang, Hanchao Jia, Xiaoshuai Hao

Abstract: Scene Graph Generation (SGG) remains a challenging task due to its compositional property. Previous approaches improve prediction efficiency by learning in an end-to-end manner. However, these methods exhibit limited performance as they assume unidirectional conditioning between entities and predicates, leading to insufficient information interaction. To address this limitation, we propose a novel bidirectional conditioning factorization for SGG, introducing efficient interaction between entities and predicates. Specifically, we develop an end-to-end scene graph generation model, Bidirectional Conditioning Transformer (BCTR), to implement our factorization. BCTR consists of two key modules. First, the Bidirectional Conditioning Generator (BCG) facilitates multi-stage interactive feature augmentation between entities and predicates, enabling mutual benefits between the two predictions. Second, Random Feature Alignment (RFA) regularizes the feature space by distilling multi-modal knowledge from pre-trained models, enhancing BCTR's ability on tailed categories without relying on statistical priors. We conduct a series of experiments on Visual Genome and Open Image V6, demonstrating that BCTR achieves state-of-the-art performance on both benchmarks. The code will be available upon acceptance of the paper.

Submitted 26 July, 2024; originally announced July 2024.

Comments: 9 pages, 3 figures

arXiv:2407.13202 [pdf, ps, other]

Scaled packing pressures on subsets for amenable group actions

Authors: Zubiao Xiao, Hongwei Jia, Zhengyu Yin

Abstract: In this paper, we study the properties of the scaled packing topological pressures for topological dynamical system $(X,G)$, where $G$ is a countable discrete infinite amenable group. We show that the scaled packing topological pressures can be determined by the scaled Bowen topological pressures. We obtain Billingsley's Theorem for the scaled packing pressures with a $G$-action. Then we get a variational principle between the scaled packing pressures and the scaled measure-theoretic upper local pressures. Finally, we give some restrictions on the scaled sequence $\mathbf{b}$, then in the case of the set $X_\mu$ of generic points, we prove that $$P^{P}(X_\mu,\left\{F_{n}\right\},f,\mathbf{b})=h_\mu(X)+\int_{X} f \mathrm{d}\mu,$$ if $\left\{F_{n}\right\}$ is tempered and $\mu$ is a $G$-invariant ergodic Borel probability measure.

Submitted 18 July, 2024; originally announced July 2024.

Comments: 30 pages

MSC Class: 11K55; 28D20; 37A15

arXiv:2407.08240 [pdf, other]

Leveraging LLMs to Predict Affective States via Smartphone Sensor Features

Authors: Tianyi Zhang, Songyan Teng, Hong Jia, Simon D'Alfonso

Abstract: As mental health issues for young adults present a pressing public health concern, daily digital mood monitoring for early detection has become an important prospect. An active research area, digital phenotyping, involves collecting and analysing data from personal digital devices such as smartphones (usage and sensors) and wearables to infer behaviours and mental health. Whilst this data is standardly analysed using statistical and machine learning approaches, the emergence of large language models (LLMs) offers a new approach to make sense of smartphone sensing data. Despite their effectiveness across various domains, LLMs remain relatively unexplored in digital mental health, particularly in integrating mobile sensor data. Our study aims to bridge this gap by employing LLMs to predict affect outcomes based on smartphone sensing data from university students. We demonstrate the efficacy of zero-shot and few-shot embedding LLMs in inferring general wellbeing. Our findings reveal that LLMs can make promising predictions of affect measures using solely smartphone sensing data. This research sheds light on the potential of LLMs for affective state prediction, emphasizing the intricate link between smartphone behavioral patterns and affective states. To our knowledge, this is the first work to leverage LLMs for affective state prediction and digital phenotyping tasks.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.06153 [pdf, other]

What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

Authors: Shihan Dou, Haoxiang Jia, Shenxi Wu, Huiyuan Zheng, Weikang Zhou, Muling Wu, Mingxu Chai, Jessica Fan, Caishuang Huang, Yunbo Tao, Yan Liu, Enyu Zhou, Ming Zhang, Yuhao Zhou, Yueming Wu, Rui Zheng, Ming Wen, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Xipeng Qiu, Qi Zhang, Xuanjing Huang

Abstract: The increasing development of large language models (LLMs) in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and leveraging diverse training technologies. However, there is a notable lack of comprehensive studies examining the limitations and boundaries of these existing methods. To bridge this gap, we conducted an extensive empirical study evaluating the performance of three leading closed-source LLMs and four popular open-source LLMs on three commonly used benchmarks. Our investigation, which evaluated the length, cyclomatic complexity and API number of the generated code, revealed that these LLMs face challenges in generating successful code for more complex problems, and tend to produce code that is shorter yet more complicated as compared to canonical solutions. Additionally, we developed a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types. Furthermore, to better understand the performance of LLMs in real-world projects, we manually created a real-world benchmark comprising 140 code generation tasks. Our analysis highlights distinct differences in bug distributions between actual scenarios and existing benchmarks. Finally, we propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback. Experimental results demonstrate that our approach can significantly mitigate bugs and increase the passing rate by 29.2% after two iterations, indicating substantial potential for LLMs to handle more complex problems.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 17 pages, 7 figures

arXiv:2407.05795 [pdf]

HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels

Authors: Yingying Jiang, Hanchao Jia, Xiaobing Wang, Peng Hao

Abstract: Composed Image Retrieval (CIR) aims to retrieve images based on a query image with text. Current Zero-Shot CIR (ZS-CIR) methods try to solve CIR tasks without using expensive triplet-labeled training datasets. However, the gap between ZS-CIR and triplet-supervised CIR is still large. In this work, we propose Hybrid CIR (HyCIR), which uses synthetic labels to boost the performance of ZS-CIR. A new label Synthesis pipeline for CIR (SynCir) is proposed, in which only unlabeled images are required. First, image pairs are extracted based on visual similarity. Second, query text is generated for each image pair based on vision-language model and LLM. Third, the data is further filtered in language space based on semantic similarity. To improve ZS-CIR performance, we propose a hybrid training strategy to work with both ZS-CIR supervision and synthetic CIR triplets. Two kinds of contrastive learning are adopted. One is to use large-scale unlabeled image dataset to learn an image-to-text mapping with good generalization. The other is to use synthetic CIR triplets to learn a better mapping for CIR tasks. Our approach achieves SOTA zero-shot performance on the common CIR benchmarks: CIRR and CIRCO.

Submitted 8 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

Comments: 8 pages, 5 figures

arXiv:2407.04418 [pdf, other]

Enabling On-Device LLMs Personalization with Smartphone Sensing

Authors: Shiquan Zhang, Ying Ma, Le Fang, Hong Jia, Simon D'Alfonso, Vassilis Kostakos

Abstract: This demo presents a novel end-to-end framework that combines on-device large language models (LLMs) with smartphone sensing technologies to achieve context-aware and personalized services. The framework addresses critical limitations of current personalization solutions via cloud LLMs, such as privacy concerns, latency and cost, and limited personal information. To achieve this, we innovatively proposed deploying LLMs on smartphones with multimodal sensor data through context-aware sensing and customized prompt engineering, ensuring privacy and enhancing personalization performance. A case study involving a university student demonstrated the capability of the framework to provide tailored recommendations. In addition, we show that the framework achieves the best trade-off in privacy, performance, latency, cost, battery and energy consumption between on-device and cloud LLMs. To the best of our knowledge, this is the first framework to provide on-device LLMs personalization with smartphone sensing. Future work will incorporate more diverse sensor data and involve extensive user studies to enhance personalization. Our proposed framework has the potential to substantially improve user experiences across domains including healthcare, productivity, and entertainment.

Submitted 23 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

Comments: 5 pages, 3 figures, conference demo paper

arXiv:2407.03063 [pdf, other]

ScreenTK: Seamless Detection of Time-Killing Moments Using Continuous Mobile Screen Text and On-Device LLMs

Authors: Le Fang, Shiquan Zhang, Hong Jia, Jorge Goncalves, Vassilis Kostakos

Abstract: Smartphones have become essential to people's digital lives, providing a continuous stream of information and connectivity. However, this constant flow can lead to moments where users are simply passing time rather than engaging meaningfully. This underscores the importance of developing methods to identify these "time-killing" moments, enabling the delivery of important notifications in a way that minimizes interruptions and enhances user engagement. Recent work has utilized screenshots taken every 5 seconds to detect time-killing activities on smartphones. However, this method often misses to capture phone usage between intervals. We demonstrate that up to 50% of time-killing instances go undetected using screenshots, leading to substantial gaps in understanding user behavior. To address this limitation, we propose a method called ScreenTK that detects time-killing moments by leveraging continuous screen text monitoring and on-device large language models (LLMs). Screen text contains more comprehensive information than screenshots and allows LLMs to summarize detailed phone usage. To verify our framework, we conducted experiments with six participants, capturing 1,034 records of different time-killing moments. Initial results show that our framework outperforms state-of-the-art solutions by 38% in our case study.

Submitted 24 August, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.00440 [pdf]

Three-dimensional non-reciprocal transport in photonic topological heterostructure of arbitrary shape

Authors: Mudi Wang, Ruo-Yang Zhang, Chenyu Zhang, Haoran Xue, Hongwei Jia, Jing Hu, Dongyang Wang, Tianshu Jiang, C. T. Chan

Abstract: Electromagnetic wave propagation in three-dimensional space typically suffers omnidirectional scattering when encountering obstacles. In this study, we employed Chern vectors to construct a topological heterostructure, where large-volume non-reciprocal topological transport in three-dimension is achieved. The shape of the cross-section in the heterostructure can be arbitrary designed, and we experimentally observed the distinctive cross-shaped field pattern transport, non-reciprocal energy harvesting, and most importantly, the remarkable ability of electromagnetic wave to traverse obstacles and abrupt structure changes without encountering reflections in 3D space.

Submitted 29 June, 2024; originally announced July 2024.

Comments: 17 pages, 3 figures

arXiv:2406.18947 [pdf, ps, other]

The Variable Muckenhoupt Weight Revisited

Authors: Hongchao Jia, Xianjie Yan

Abstract: Let $p(\cdot):\ \mathbb R^n\to(0,\infty)$ be a variable exponent function and $X$ a ball quasi-Banach function space. In this paper, we first study the relationship between two kinds of variable weights $\mathcal{W}_{p(\cdot)}(\mathbb{R}^n)$ and $A_{p(\cdot)}(\mathbb{R}^n)$. Then, by regarding the weighted variable Lebesgue space $L^{p(\cdot)}_ω(\mathbb{R}^n)$ with $ω\in\mathcal{W}_{p(\cdot)}(\mathbb{R}^n)$ as a special case of $X$ and applying known results of the Hardy-type space $H_{X}(\mathbb{R}^n)$ associated with $X$, we further obtain several equivalent characterizations of the weighted variable Hardy space $H^{p(\cdot)}_ω(\mathbb{R}^n)$ and the boundedness of some sublinear operators on $H^{p(\cdot)}_ω(\mathbb{R}^n)$. All of these results coincide with or improve existing ones, or are completely new.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 25 pages

MSC Class: 42B25; 42B30; 42B20; 46E30

arXiv:2406.18900 [pdf, other]

The Rise of Artificial Intelligence in Educational Measurement: Opportunities and Ethical Challenges

Authors: Okan Bulut, Maggie Beiting-Parrish, Jodi M. Casabianca, Sharon C. Slater, Hong Jiao, Dan Song, Christopher M. Ormerod, Deborah Gbemisola Fabiyi, Rodica Ivan, Cole Walsh, Oscar Rios, Joshua Wilson, Seyma N. Yildirim-Erbasli, Tarid Wongvorachan, Joyce Xinle Liu, Bin Tan, Polina Morilova

Abstract: The integration of artificial intelligence (AI) in educational measurement has revolutionized assessment methods, enabling automated scoring, rapid content analysis, and personalized feedback through machine learning and natural language processing. These advancements provide timely, consistent feedback and valuable insights into student performance, thereby enhancing the assessment experience. However, the deployment of AI in education also raises significant ethical concerns regarding validity, reliability, transparency, fairness, and equity. Issues such as algorithmic bias and the opacity of AI decision-making processes pose risks of perpetuating inequalities and affecting assessment outcomes. Responding to these concerns, various stakeholders, including educators, policymakers, and organizations, have developed guidelines to ensure ethical AI use in education. The National Council of Measurement in Education's Special Interest Group on AI in Measurement and Education (AIME) also focuses on establishing ethical standards and advancing research in this area. In this paper, a diverse group of AIME members examines the ethical implications of AI-powered tools in educational measurement, explores significant challenges such as automation bias and environmental impact, and proposes solutions to ensure AI's responsible and effective use in education.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 59 pages, 3 figures, a joint work of the Special Interest Group on Artificial Intelligence in Measurement and Education (AIME) from the National Council of Measurement in Education (NCME)

arXiv:2406.12917 [pdf, other]

doi 10.1117/12.3019835

The Black Hole Explorer: Motivation and Vision

Authors: Michael D. Johnson, Kazunori Akiyama, Rebecca Baturin, Bryan Bilyeu, Lindy Blackburn, Don Boroson, Alejandro Cardenas-Avendano, Andrew Chael, Chi-kwan Chan, Dominic Chang, Peter Cheimets, Cathy Chou, Sheperd S. Doeleman, Joseph Farah, Peter Galison, Ronald Gamble, Charles F. Gammie, Zachary Gelles, Jose L. Gomez, Samuel E. Gralla, Paul Grimes, Leonid I. Gurvits, Shahar Hadar, Kari Haworth, Kazuhiro Hada , et al. (43 additional authors not shown)

Abstract: We present the Black Hole Explorer (BHEX), a mission that will produce the sharpest images in the history of astronomy by extending submillimeter Very-Long-Baseline Interferometry (VLBI) to space. BHEX will discover and measure the bright and narrow "photon ring" that is predicted to exist in images of black holes, produced from light that has orbited the black hole before escaping. This discovery will expose universal features of a black hole's spacetime that are distinct from the complex astrophysics of the emitting plasma, allowing the first direct measurements of a supermassive black hole's spin. In addition to studying the properties of the nearby supermassive black holes M87* and Sgr A*, BHEX will measure the properties of dozens of additional supermassive black holes, providing crucial insights into the processes that drive their creation and growth. BHEX will also connect these supermassive black holes to their relativistic jets, elucidating the power source for the brightest and most efficient engines in the universe. BHEX will address fundamental open questions in the physics and astrophysics of black holes that cannot be answered without submillimeter space VLBI. The mission is enabled by recent technological breakthroughs, including the development of ultra-high-speed downlink using laser communications, and it leverages billions of dollars of existing ground infrastructure. We present the motivation for BHEX, its science goals and associated requirements, and the pathway to launch within the next decade.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Proceedings for SPIE Astronomical Telescopes and Instrumentation

arXiv:2406.08698 [pdf, other]

Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (255 additional authors not shown)

Abstract: In this work we try to search for signals generated by ultra-heavy dark matter at the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes of astrophysical $γ$-ray background while large amount of dark matter. By analyzing more than 700 days observational data at LHAASO, no significant dark matter signal from 1 TeV to 1 EeV is detected. Accordingly we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 17 pages, 12 figures, accepted by PRL

arXiv:2406.06443 [pdf, other]

LLM Dataset Inference: Did you train on my dataset?

Authors: Pratyush Maini, Hengrui Jia, Nicolas Papernot, Adam Dziedzic

Abstract: The proliferation of large language models (LLMs) in the real world has come with a rise in copyright cases against companies for training their models on unlicensed data from the internet. Recent works have presented methods to identify if individual text sequences were members of the model's training data, known as membership inference attacks (MIAs). We demonstrate that the apparent success of these MIAs is confounded by selecting non-members (text sequences not used for training) belonging to a different distribution from the members (e.g., temporally shifted recent Wikipedia articles compared with ones used to train the model). This distribution shift makes membership inference appear successful. However, most MIA methods perform no better than random guessing when discriminating between members and non-members from the same distribution (e.g., in this case, the same period of time). Even when MIAs work, we find that different MIAs succeed at inferring membership of samples from different distributions. Instead, we propose a new dataset inference method to accurately identify the datasets used to train large language models. This paradigm sits realistically in the modern-day copyright landscape, where authors claim that an LLM is trained over multiple documents (such as a book) written by them, rather than one particular paragraph.
While dataset inference shares many of the challenges of membership inference, we solve it by selectively combining the MIAs that provide positive signal for a given distribution, and aggregating them to perform a statistical test on a given dataset. Our approach successfully distinguishes the train and test sets of different subsets of the Pile with statistically significant p-values < 0.1, without any false positives.

Submitted 10 June, 2024; originally announced June 2024.

Comments: Code is available at https://github.com/pratyushmaini/llm_dataset_inference/

arXiv:2406.04594 [pdf, other]

Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach

Authors: Jianbo Dong, Bin Luo, Jun Zhang, Pengcheng Zhang, Fei Feng, Yikai Zhu, Ang Liu, Zian Chen, Yi Shi, Hairong Jiao, Gang Lu, Yu Guan, Ennan Zhai, Wencong Xiao, Hanyu Zhao, Man Yuan, Siran Yang, Xiang Li, Jiamang Wang, Rui Men, Jianwei Zhang, Huang Zhong, Dennis Cai, Yuan Xie, Binzhang Fu

Abstract: The emergence of Large Language Models (LLMs) has necessitated the adoption of parallel training techniques, involving the deployment of thousands of GPUs to train a single model. Unfortunately, we have found that the efficiency of current parallel training is often suboptimal, largely due to the following two main issues. Firstly, hardware failures are inevitable, leading to interruptions in the training tasks. The inability to quickly identify the faulty components results in a substantial waste of GPU resources. Secondly, since GPUs must wait for parameter synchronization to complete before proceeding to the next round of computation, network congestions can greatly increase the waiting time for GPUs. To address these challenges, this paper introduces a communication-driven solution, namely the C4. The key insights of C4 are two folds. First, in parallel training, collective communication exhibits periodic and homogeneous characteristics, so any anomalies are certainly due to some form of hardware malfunction. By leveraging this feature, C4 can rapidly identify the faulty components, swiftly isolate the anomaly, and restart the task, thereby avoiding resource wastage caused by delays in anomaly detection. Second, the predictable communication model of collective communication, involving few large flows, allows C4 to efficiently execute traffic planning, substantially reducing network congestion.
C4 has been extensively implemented across our production systems, cutting error-induced overhead by roughly 30% and enhancing runtime performance by about 15% for certain applications with moderate communication costs.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03929 [pdf, ps, other]

Ringing Thick Braneworld with Finite Extra Dimension

Authors: Hai-Long Jia, Wen-Di Guo, Qin Tan, Yu-Xiao Liu

Abstract: In this work, we investigate the quasinormal modes of the Poincaré thick brane with a finite extra dimension. Unlike the case with an infinite extra dimension, the gravitational effective potential exhibits three distinct shapes within different ranges of the parameter $n$ in the warp factor: harmonic oscillator potential, Pöschl-Teller potential, and volcano-like potential. We then study various types of perturbations in this system. Utilizing a combination of analytical, semi-analytical, and numerical methods, we obtain the quasinormal modes of the perturbed fields. Our findings reveal a set of discrete quasinormal modes for the thick brane, similar to those of black holes. Interestingly, when $n=1$, the quasinormal modes exhibit purely imaginary behavior. This study may provide a new way to detect the existence of extra dimensions.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.01014 [pdf, other]

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

Authors: Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang

Abstract: Mobile device operation tasks are increasingly becoming a popular multi-modal AI application scenario. Current Multi-modal Large Language Models (MLLMs), constrained by their training data, lack the capability to function effectively as operation assistants. Instead, MLLM-based agents, which enhance capabilities through tool invocation, are gradually being applied to this scenario. However, the two major navigation challenges in mobile device operation tasks, task progress navigation and focus content navigation, are significantly complicated under the single-agent architecture of existing work. This is due to the overly long token sequences and the interleaved text-image data format, which limit performance. To address these navigation challenges effectively, we propose Mobile-Agent-v2, a multi-agent architecture for mobile device operation assistance. The architecture comprises three agents: planning agent, decision agent, and reflection agent. The planning agent generates task progress, making the navigation of history operations more efficient. To retain focus content, we design a memory unit that updates with task progress. Additionally, to correct erroneous operations, the reflection agent observes the outcomes of each operation and handles any mistakes accordingly. Experimental results indicate that Mobile-Agent-v2 achieves over a 30% improvement in task completion compared to the single-agent architecture of Mobile-Agent. The code is open-sourced at https://github.com/X-PLUG/MobileAgent.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 22 pages, 11 figures, 10 Tables

arXiv:2406.00659 [pdf, other]

High Performance Operation of a Direct-Current and Superconducting Radio-Frequency Combined Photocathode Gun

Authors: H. Jia, T. Li, T. Wang, Y. Zhao, X. Zhang, H. Xu, Z. Liu, J. Liu, L. Lin, H. Xie, L. Feng, F. Wang, F. Zhu, J. Hao, S. Quan, K. Liu, S. Huang

Abstract: Superconducting radio-frequency (SRF) guns are promising candidates to deliver high brightness continuous-wave (CW) electron beams for new generations of coherent linac light sources, ultrafast electron diffractions, MeV pulsed beam applications, etc. To solve the compatibility problem of semiconductor photocathodes, a hybrid gun combining a direct-current gap and an SRF cavity has been developed. The gun, employing K2CsSb photocathodes driven by a green laser, has been brought into stable CW operation with a dark current below 100 pA, delivering electron beams at an energy gain of 2.4 MeV, an electron bunch charge of 100 pC, and a repetition rate of 1 MHz. A normalized beam emittance of 0.54 mm-mrad has been achieved at the bunch charge of 100 pC and peak current of about 6 A. CW operation at 81.25 MHz repetition rate has also been tested with the maximum average beam current reaching 3 mA.

Submitted 2 June, 2024; originally announced June 2024.

Comments: 6 pages, 5 figures

arXiv:2406.00440 [pdf, other]

Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture

Authors: Xuanchen Li, Yuhao Cheng, Xingyu Ren, Haozhe Jia, Di Xu, Wenhan Zhu, Yichao Yan

Abstract: 4D head capture aims to generate dynamic topological meshes and corresponding texture maps from videos, which is widely utilized in movies and games for its ability to simulate facial muscle movements and recover dynamic textures in pore-squeezing. The industry often adopts the method involving multi-view stereo and non-rigid alignment. However, this approach is prone to errors and heavily reliant on time-consuming manual processing by artists. To simplify this process, we propose Topo4D, a novel framework for automatic geometry and texture generation, which optimizes densely aligned 4D heads and 8K texture maps directly from calibrated multi-view time-series images. Specifically, we first represent the time-series faces as a set of dynamic 3D Gaussians with fixed topology in which the Gaussian centers are bound to the mesh vertices. Afterward, we perform alternative geometry and texture optimization frame-by-frame for high-quality geometry and texture learning while maintaining temporal topology stability. Finally, we can extract dynamic facial meshes in regular wiring arrangement and high-fidelity textures with pore-level details from the learned Gaussians. Extensive experiments show that our method achieves superior results than the current SOTA face reconstruction methods both in the quality of meshes and textures. Project page: https://xuanchenli.github.io/Topo4D/.

Submitted 15 July, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.20641 [pdf, other]

Query Provenance Analysis for Robust and Efficient Query-based Black-box Attack Defense

Authors: Shaofei Li, Ziqi Zhang, Haomin Jia, Ding Li, Yao Guo, Xiangqun Chen

Abstract: Query-based black-box attacks have emerged as a significant threat to machine learning systems, where adversaries can manipulate the input queries to generate adversarial examples that can cause misclassification of the model. To counter these attacks, researchers have proposed Stateful Defense Models (SDMs) for detecting adversarial query sequences and rejecting queries that are "similar" to the history queries. Existing state-of-the-art (SOTA) SDMs (e.g., BlackLight and PIHA) have shown great effectiveness in defending against these attacks. However, recent studies have shown that they are vulnerable to Oracle-guided Adaptive Rejection Sampling (OARS) attacks, which is a stronger adaptive attack strategy. It can be easily integrated with existing attack algorithms to evade the SDMs by generating queries with fine-tuned direction and step size of perturbations utilizing the leaked decision information from the SDMs. In this paper, we propose a novel approach, Query Provenance Analysis (QPA), for more robust and efficient SDMs. QPA encapsulates the historical relationships among queries as the sequence feature to capture the fundamental difference between benign and adversarial query sequences. To utilize the query provenance, we propose an efficient query provenance analysis algorithm with dynamic management. We evaluate QPA compared with two baselines, BlackLight and PIHA, on four widely used datasets with six query-based black-box attack algorithms.
The results show that QPA outperforms the baselines in terms of defense effectiveness and efficiency on both non-adaptive and adaptive attacks. Specifically, QPA reduces the Attack Success Rate (ASR) of OARS to 4.08%, comparing to 77.63% and 87.72% for BlackLight and PIHA, respectively. Moreover, QPA also achieves 7.67x and 2.25x higher throughput than BlackLight and PIHA.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.17792 [pdf, other]

JUNO Sensitivity to Invisible Decay Modes of Neutrons

Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick , et al. (635 additional authors not shown)

Abstract: We explore the bound neutrons decay into invisible particles (e.g., $n\rightarrow 3 ν$ or $nn \rightarrow 2 ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\barν_e$, natural radioactivity, cosmogenic isotopes and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $τ/B( n \rightarrow { inv} ) > 5.0 \times 10^{31} \, {\rm yr}$ and $τ/B( nn \rightarrow { inv} ) > 1.4 \times 10^{32} \, {\rm yr}$.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 28 pages, 7 figures, 4 tables

arXiv:2405.11826 [pdf, other]

Data quality control system and long-term performance monitor of the LHAASO-KM2A

Authors: Zhen Cao, F. Aharonian, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen , et al. (263 additional authors not shown)

Abstract: The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.

Submitted 13 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 15 pages, 9 figures

arXiv:2405.10980 [pdf]

Research on the Quantum confinement of Carriers in the Type-I Quantum Wells Structure

Authors: Xinxin Li, Zhen Deng, Yang Jiang, Chunhua Du, Haiqiang Jia, Wenxin Wang, Hong Chen

Abstract: Quantum confinement is recognized to be an inherent property in low-dimensional structures. Traditionally it is believed that the carriers trapped within the well cannot escape due to the discrete energy levels. However, our previous research has revealed efficient carrier escape in low-dimensional structures, contradicting this conventional understanding. In this study, we review the energy band structure of quantum wells considering it as a superposition of the bulk material dispersion and quantization energy dispersion resulting from the quantum confinement across the whole Brillouin zone. By accounting for all wave vectors, we obtain a certain distribution of carrier energy at each quantization energy level, giving rise to the energy subbands. These results enable carriers to escape from the well under the influence of an electric field. Additionally, we have compiled a comprehensive summary of various energy band scenarios in quantum well structures, relevant to carrier transport. Such a new interpretation holds significant value in deepening our comprehension of low-dimensional energy bands, discovering new physical phenomena, and designing novel devices with superior performance.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 16 pages, 3 figures and 1 table

arXiv:2405.08804 [pdf, other]

Photon Ring Interferometric Signatures Beyond The Universal Regime

Authors: He Jia, Eliot Quataert, Alexandru Lupsasca, George N. Wong

Abstract: We calculate the interferometric signatures of black hole photon rings beyond the universal regime by perturbatively including the effects of finite ring width. Our approach first slices a thick ring into a series of thin rings, each of which falls within the universal regime. We thus calculate the visibility of the thick ring by aggregating the contributions from each thin ring, and then perturbatively expand the result into polynomials of the baseline length $u$. We show that the visibility amplitude of a thick ring depends on its "center-of-light" diameter; it also includes additional higher-order corrections due to the width of the ring, with the leading correction terms proportional to $u^2$ for the envelope and $u^3$ for the phase. We apply our method to images ray traced from general-relativistic magnetohydrodynamic (GRMHD) simulations and demonstrate that incorporating the higher-order corrections is crucial for accurately modeling the visibility of the first photon ring around M87*.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 10+6 pages, 7+3 figures, to be submitted

arXiv:2405.07691 [pdf, other]

Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (255 additional authors not shown)

Abstract: The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma-ray source, 1LHAASO J1219+2915. In this paper, a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variation analysis shows an indication of variability at the level of a few months in the TeV band, which is consistent with low frequency observations. Based on these observations, we report the detection of TeV $γ$-ray emissions from this low-luminosity AGN NGC 4278. The observations by LHAASO-WCDA during the active period have a significance level of 8.8\,$σ$ with best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 11 pages, 5 figures

arXiv:2405.00438 [pdf, other]

MetaRM: Shifted Distributions Alignment via Meta-Learning

Authors: Shihan Dou, Yan Liu, Enyu Zhou, Tianlong Li, Haoxiang Jia, Limao Xiong, Xin Zhao, Junjie Ye, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

Abstract: The success of Reinforcement Learning from Human Feedback (RLHF) in language model alignment is critically dependent on the capability of the reward model (RM). However, as the training process progresses, the output distribution of the policy model shifts, leading to the RM's reduced ability to distinguish between responses. This issue is further compounded when the RM, trained on a specific data distribution, struggles to generalize to examples outside of that distribution. These two issues can be united as a challenge posed by the shifted distribution of the environment. To surmount this challenge, we introduce MetaRM, a method leveraging meta-learning to align the RM with the shifted environment distribution. MetaRM is designed to train the RM by minimizing data loss, particularly for data that can improve the differentiation ability to examples of the shifted target distribution. Extensive experiments demonstrate that MetaRM significantly improves the RM's distinguishing ability in iterative RLHF optimization, and also provides the capacity to identify subtle differences in out-of-distribution samples.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 11 pages, 6 figures. arXiv admin note: text overlap with arXiv:2401.06080

arXiv:2405.00428 [pdf, other]

CC2Vec: Combining Typed Tokens with Contrastive Learning for Effective Code Clone Detection

Authors: Shihan Dou, Yueming Wu, Haoxiang Jia, Yuhao Zhou, Yan Liu, Yang Liu

Abstract: With the development of the open source community, code is often copied, spread, and evolved in multiple software systems, which brings uncertainty and risk to those systems (e.g., bug propagation and copyright infringement). Therefore, it is important to conduct code clone detection to discover similar code pairs. Many approaches have been proposed to detect code clones, among which token-based tools can scale to big code. However, due to the lack of program details, they cannot handle more complicated code clones, i.e., semantic code clones. In this paper, we introduce CC2Vec, a novel code encoding method designed to swiftly identify simple code clones while also enhancing the capability for semantic code clone detection. To retain the program details between tokens, CC2Vec divides them into different categories (i.e., typed tokens) according to the syntactic types and then applies two self-attention mechanism layers to encode them. To resist changes in the code structure of semantic code clones, CC2Vec performs contrastive learning to reduce the differences introduced by different code implementations. We evaluate CC2Vec on two widely used datasets (i.e., BigCloneBench and Google Code Jam) and the results show that our method can effectively detect simple code clones. In addition, CC2Vec not only attains comparable performance to widely used semantic code clone detection systems such as ASTNN, SCDetector, and FCCA by simply fine-tuning, but also significantly surpasses these methods in detection efficiency.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 21 pages, 7 figures

arXiv:2404.17701 [pdf, other]

doi 10.1088/1748-0221/19/08/P08023

Embedded FPGA Developments in 130nm and 28nm CMOS for Machine Learning in Particle Detector Readout

Authors: Julia Gonski, Aseem Gupta, Haoyi Jia, Hyunjoon Kim, Lorenzo Rota, Larry Ruckman, Angelo Dragone, Ryan Herbst

Abstract: Embedded field programmable gate array (eFPGA) technology allows the implementation of reconfigurable logic within the design of an application-specific integrated circuit (ASIC). This approach offers the low power and efficiency of an ASIC along with the ease of FPGA configuration, particularly beneficial for the use case of machine learning in the data pipeline of next-generation collider experiments. An open-source framework called "FABulous" was used to design eFPGAs using 130 nm and 28 nm CMOS technology nodes, which were subsequently fabricated and verified through testing. The capability of an eFPGA to act as a front-end readout chip was assessed using simulation of high energy particles passing through a silicon pixel sensor. A machine learning-based classifier, designed for reduction of sensor data at the source, was synthesized and configured onto the eFPGA. A successful proof-of-concept was demonstrated through reproduction of the expected algorithm result on the eFPGA with perfect accuracy. Further development of the eFPGA technology and its application to collider detector readout is discussed.

Submitted 28 August, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

Comments: 16 pages, 12 figures

Journal ref: Journal of Instrumentation, Volume 19, P08023 (August 2024)

arXiv:2404.13991 [pdf, other]

5GC$^2$ache: Improving 5G UPF Performance via Cache Optimization

Authors: Haonan Jia, Meng Wang, Biyi Li, Yirui Liu, Junchen Guo, Pengyu Zhang

Abstract: Last Level Cache (LLC) is a precious and critical resource that impacts the performance of applications running on top of CPUs. In this paper, we reveal the significant impact of LLC on the performance of the 5G user plane function (UPF) when running a cloudified 5G core on general-purpose servers. With extensive measurements showing that the throughput can degrade by over 50\% when the precious LLC resource of UPF is not properly allocated, we identify three categories of performance degradation caused by incorrect LLC usage: the DMA leakage problem, the hot/cold mbuf problem, and cache contention. To address these problems, we introduce the design and implementation of 5GC$^2$ache, which monitors the LLC status as well as the throughput performance and dynamically adjusts key parameters of the LLC resource allocation. Our experiments show that 5GC$^2$ache enables a commercial 5G core to increase its throughput to 76.41Gbps, 39.41\% higher than the original performance and 29.55\% higher than the state-of-the-art.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.13430 [pdf, other]

React-OT: Optimal Transport for Generating Transition State in Chemical Reactions

Authors: Chenru Duan, Guan-Horng Liu, Yuanqi Du, Tianrong Chen, Qiyuan Zhao, Haojun Jia, Carla P. Gomes, Evangelos A. Theodorou, Heather J. Kulik

Abstract: Transition states (TSs) are transient structures that are key in understanding reaction mechanisms and designing catalysts but challenging to capture in experiments. Alternatively, many optimization algorithms have been developed to search for TSs computationally. Yet the cost of these algorithms driven by quantum chemistry methods (usually density functional theory) is still high, posing challenges for their applications in building large reaction networks for reaction exploration. Here we developed React-OT, an optimal transport approach for generating unique TS structures from reactants and products. React-OT generates highly accurate TS structures with a median structural root mean square deviation (RMSD) of 0.053Å and a median barrier height error of 1.06 kcal/mol, requiring only 0.4 seconds per reaction. The RMSD and barrier height error are further improved by roughly 25% through pretraining React-OT on a large reaction dataset obtained with a lower level of theory, GFN2-xTB. We envision the great accuracy and fast inference of React-OT being useful in targeting TSs when exploring chemical reactions with unknown mechanisms.

Submitted 20 April, 2024; originally announced April 2024.

Comments: 5 figures, 1 table

arXiv:2404.04801 [pdf, ps, other]

doi 10.1007/s41605-024-00467-8

LHAASO-KM2A detector simulation using Geant4

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (254 additional authors not shown)

Abstract: KM2A is one of the main sub-arrays of LHAASO, working on gamma ray astronomy and cosmic ray physics at energies above 10 TeV. Detector simulation is the important foundation for estimating detector performance and data analysis. It is a big challenge to simulate the KM2A detector in the framework of Geant4 due to the need to track numerous photons from a large number of detector units (>6000) with large altitude difference (30 m) and huge coverage (1.3 km^2). In this paper, the design of the KM2A simulation code G4KM2A based on Geant4 is introduced. The process of G4KM2A is optimized mainly in memory consumption to avoid memory overflow. Some simplifications are used to significantly speed up the execution of G4KM2A. The running time is reduced by at least 30 times compared to full detector simulation. The particle distributions and the core/angle resolution comparison between simulation and experimental data of the full KM2A array are also presented, which show good agreement.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.01941 [pdf, other]

LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging

Authors: Haoyang Ge, Qiao Feng, Hailong Jia, Xiongzheng Li, Xiangjun Yin, You Zhou, Jingyu Yang, Kun Li

Abstract: Human pose and shape (HPS) estimation with lensless imaging is not only beneficial to privacy protection but also can be used in covert surveillance scenarios due to the small size and simple structure of this device. However, this task presents significant challenges due to the inherent ambiguity of the captured measurements and the lack of effective methods for directly estimating human pose and shape from lensless data. In this paper, we propose what is, to our knowledge, the first end-to-end framework to recover 3D human poses and shapes from lensless measurements. We specifically design a multi-scale lensless feature decoder to decode the lensless measurements through the optically encoded mask for efficient feature extraction. We also propose a double-head auxiliary supervision mechanism to improve the estimation accuracy of human limb ends. Besides, we establish a lensless imaging system and verify the effectiveness of our method on various datasets acquired by our lensless imaging system.

Submitted 8 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: Accepted to CVPR 2024. More results available at https://cic.tju.edu.cn/faculty/likun/projects/LPSNet

arXiv:2403.14487 [pdf, other]

DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing

Authors: Yueru Jia, Yuhui Yuan, Aosong Cheng, Chuke Wang, Ji Li, Huizhu Jia, Shanghang Zhang

Abstract: Recently, how to achieve precise image editing has attracted increasing attention, especially given the remarkable success of text-to-image generation models. To unify various spatial-aware image editing abilities into one framework, we adopt the concept of layers from the design domain to manipulate objects flexibly with various operations. The key insight is to transform the spatial-aware image editing task into a combination of two sub-tasks: multi-layered latent decomposition and multi-layered latent fusion. First, we segment the latent representations of the source images into multiple layers, which include several object layers and one incomplete background layer that necessitates reliable inpainting. To avoid extra tuning, we further explore the inner inpainting ability within the self-attention mechanism. We introduce a key-masking self-attention scheme that can propagate the surrounding context information into the masked region while mitigating its impact on the regions outside the mask. Second, we propose an instruction-guided latent fusion that pastes the multi-layered latent representations onto a canvas latent. We also introduce an artifact suppression scheme in the latent space to enhance the inpainting quality. Due to the inherent modular advantages of such multi-layered representations, we can achieve accurate image editing, and we demonstrate that our approach consistently surpasses the latest spatial editing methods, including Self-Guidance and DiffEditor. Last, we show that our approach is a unified framework that supports accurate image editing across more than six different editing tasks.

Submitted 21 March, 2024; originally announced March 2024.

Comments: technical report, 15 pages, webpage: https://design-edit.github.io/

arXiv:2403.13248 [pdf, other]

Mora: Enabling Generalist Video Generation via A Multi-Agent Framework

Authors: Zhengqing Yuan, Ruoxi Chen, Zhaoxu Li, Haolong Jia, Lifang He, Chi Wang, Lichao Sun

Abstract: Sora is the first large-scale generalist video generation model that garnered significant attention across society. Since its launch by OpenAI in February 2024, no other video generation models have paralleled Sora's performance or its capacity to support a broad spectrum of video generation tasks. Additionally, there are only a few fully published video generation models, with the majority being closed-source. To address this gap, this paper proposes a new multi-agent framework Mora, which incorporates several advanced visual AI agents to replicate generalist video generation demonstrated by Sora. In particular, Mora can utilize multiple visual agents and successfully mimic Sora's video generation capabilities in various tasks, such as (1) text-to-video generation, (2) text-conditional image-to-video generation, (3) extend generated videos, (4) video-to-video editing, (5) connect videos and (6) simulate digital worlds. Our extensive experimental results show that Mora achieves performance that is proximate to that of Sora in various tasks. However, there exists an obvious performance gap between our work and Sora when assessed holistically. In summary, we hope this project can guide the future trajectory of video generation through collaborative AI agents.

Submitted 22 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.13104 [pdf, ps, other]

Uniform vorticity depletion and inviscid damping for periodic shear flows in the high Reynolds number regime

Authors: Rajendra Beekie, Shan Chen, Hao Jia

Abstract: We study the dynamics of the two dimensional Navier-Stokes equations linearized around a shear flow on a (non-square) torus which possesses exactly two non-degenerate critical points. We obtain linear inviscid damping and vorticity depletion estimates for the linearized flow that are uniform with respect to the viscosity, and enhanced dissipation type decay estimates. The main task is to understand the associated Rayleigh and Orr-Sommerfeld equations, under the natural assumption that the linearized operator around the shear flow in the inviscid case has no discrete eigenvalues. The key difficulty is to understand the behavior of the solution to Orr-Sommerfeld equations in three distinct regimes depending on the spectral parameter: the non-degenerate case when the spectral parameter is away from the critical values, the intermediate case when the spectral parameter is close to but still separated from the critical values, and the most singular case when the spectral parameter is inside the viscous layer.

Submitted 26 April, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: 70 pages; comments welcome; Several typos and small technical glitches fixed

arXiv:2403.12363 [pdf, other]

E-DoH: Elegantly Detecting the Depths of Open DoH Service on the Internet

Authors: Cong Dong, Jiahai Yang, Yun Li, Yue Wu, Yufan Chen, Chenglong Li, Haoran Jiao, Xia Yin, Yuling Liu

Abstract: In recent years, DNS over Encrypted (DoE) methods have been regarded as a novel trend within the realm of the DNS ecosystem. In these DoE methods, DNS over HTTPS (DoH) provides encryption to protect data confidentiality while providing better obfuscation to avoid censorship by multiplexing port 443 with web services. This development introduced certain inconveniences in discovering publicly available DoH services. In this paper, we propose the E-DoH method for elegant and efficient DoH service detection. First, we optimized the probing mechanism to enable a single DoH connection to accomplish multiple tasks including service discovery, correctness validation and dependency construction. Second, we propose an efficient DoH detection tool. This tool can enhance probing efficiency while significantly reducing the required traffic volume. Third, based on the above optimization methods, we conducted an exploration of the IPv4 space and performed an in-depth analysis of DoH based on the collected information. Through experiments, our approach demonstrates a remarkable 80% improvement in time efficiency, and only requires 4%-20% traffic volume to complete the detection task. In wild detection, our approach discovered 46k DoH services, which nearly doubles the number discovered by the state-of-the-art. Based on the collected data, we present several intriguing conclusions about the current DoH service ecosystem.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.10010 [pdf, other]

doi 10.1103/PhysRevLett.132.131002

Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A

Authors: The LHAASO Collaboration, Zhen Cao, F. Aharonian, Q. An, A. Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen , et al. (256 additional authors not shown)

Abstract: We present the measurements of all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range of 0.3-30 PeV using data collected from LHAASO-KM2A between September 2021 and December 2022, which is based on a nearly composition-independent energy reconstruction method, achieving unprecedented accuracy. Our analysis reveals the position of the knee at $3.67 \pm 0.05 \pm 0.15$ PeV. Below the knee, the spectral index is found to be -$2.7413 \pm 0.0004 \pm 0.0050$, while above the knee, it is -$3.128 \pm 0.005 \pm 0.027$, with the sharpness of the transition measured with a statistical error of 2%. The mean logarithmic mass of cosmic rays is almost heavier than helium in the whole measured energy range. It decreases from 1.7 at 0.3 PeV to 1.3 at 3 PeV, representing a 24% decline following a power law with an index of -$0.1200 \pm 0.0003 \pm 0.0341$. This is equivalent to an increase in abundance of light components. Above the knee, the mean logarithmic mass exhibits a power law trend towards heavier components, which is the reverse of the behavior observed in the all-particle energy spectrum. Additionally, the knee position and the change in power-law index are approximately the same. These findings suggest that the knee observed in the all-particle spectrum corresponds to the knee of the light component, rather than the medium-heavy components.

Submitted 26 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 8 pages, 3 figures

Journal ref: Physical Review Letters 132, 131002 (2024)

arXiv:2403.09397 [pdf, ps, other]

Remarks on the rate of linear vortex symmetrization

Authors: Hao Jia

Abstract: We reformulate results from the paper ``Linear vortex symmetrization: The spectral density function" by Ionescu and the author in simplified forms and derive rigorously the bounds given in Bassom and Gilbert (J. Fluid Mech., 1998), which provided interesting insights on the vortex symmetrization phenomenon.

Submitted 14 March, 2024; originally announced March 2024.

Comments: 10 pages, comments welcome. arXiv admin note: text overlap with arXiv:2109.12815

arXiv:2403.07500 [pdf, other]

Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation

Authors: Likun Li, Haoqi Zeng, Changpeng Yang, Haozhe Jia, Di Xu

Abstract: The objective of personalization and stylization in text-to-image is to instruct a pre-trained diffusion model to analyze new concepts introduced by users and incorporate them into expected styles. Recently, parameter-efficient fine-tuning (PEFT) approaches have been widely adopted to address this task and have greatly propelled the development of this field. Despite their popularity, existing efficient fine-tuning methods still struggle to achieve effective personalization and stylization in T2I generation. To address this issue, we propose block-wise Low-Rank Adaptation (LoRA) to perform fine-grained fine-tuning for different blocks of SD, which can generate images faithful to input prompts and target identity and also with desired style. Extensive experiments demonstrate the effectiveness of the proposed method.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.01444 [pdf, other]

3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos

Authors: Jiakai Sun, Han Jiao, Guangyuan Li, Zhanjie Zhang, Lei Zhao, Wei Xing

Abstract: Constructing photo-realistic Free-Viewpoint Videos (FVVs) of dynamic scenes from multi-view videos remains a challenging endeavor. Despite the remarkable advancements achieved by current neural rendering techniques, these methods generally require complete video sequences for offline training and are not capable of real-time rendering. To address these constraints, we introduce 3DGStream, a method designed for efficient FVV streaming of real-world dynamic scenes. Our method achieves fast on-the-fly per-frame reconstruction within 12 seconds and real-time rendering at 200 FPS. Specifically, we utilize 3D Gaussians (3DGs) to represent the scene. Instead of the naïve approach of directly optimizing 3DGs per-frame, we employ a compact Neural Transformation Cache (NTC) to model the translations and rotations of 3DGs, markedly reducing the training time and storage required for each FVV frame. Furthermore, we propose an adaptive 3DG addition strategy to handle emerging objects in dynamic scenes. Experiments demonstrate that 3DGStream achieves competitive performance in terms of rendering speed, image quality, training time, and model storage when compared with state-of-the-art methods.

Submitted 11 June, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: CVPR 2024 Accepted (Highlight). Project Page: https://sjojok.github.io/3dgstream

arXiv:2403.00486 [pdf, other]

Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching

Authors: Xianqi Wang, Gangwei Xu, Hao Jia, Xin Yang

Abstract: Stereo matching methods based on iterative optimization, like RAFT-Stereo and IGEV-Stereo, have evolved into a cornerstone in the field of stereo matching. However, these methods struggle to simultaneously capture high-frequency information in edges and low-frequency information in smooth regions due to the fixed receptive field. As a result, they tend to lose details, blur edges, and produce false matches in textureless areas. In this paper, we propose Selective Recurrent Unit (SRU), a novel iterative update operator for stereo matching. The SRU module can adaptively fuse hidden disparity information at multiple frequencies for edge and smooth regions. To perform adaptive fusion, we introduce a new Contextual Spatial Attention (CSA) module to generate attention maps as fusion weights. The SRU empowers the network to aggregate hidden disparity information across multiple frequencies, mitigating the risk of vital hidden disparity information loss during iterative processes. To verify SRU's universality, we apply it to representative iterative stereo matching methods, collectively referred to as Selective-Stereo. Our Selective-Stereo ranks $1^{st}$ on KITTI 2012, KITTI 2015, ETH3D, and Middlebury leaderboards among all published methods. Code is available at https://github.com/Windsrain/Selective-Stereo.

Submitted 1 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

Showing 1–50 of 463 results for author: Jia, H