
CAMAv2: A Vision-Centric Approach for Static Map Element Annotation

Authors: Shiyuan Chen, Jiaxin Zhang, Ruohong Mei, Yingfeng Cai, Haoran Yin, Tao Chen, Wei Sui, Cong Yang

Abstract: The recent development of online static map element (a.k.a. HD map) construction algorithms has raised a vast demand for data with ground truth annotations. However, available public datasets currently cannot provide high-quality training data in terms of consistency and accuracy. For instance, the manually labelled (and therefore low-efficiency) nuScenes dataset still contains misalignment and inconsistency between the HD maps and images (e.g., around 8.03 pixels of reprojection error on average). To this end, we present CAMAv2: a vision-centric approach for Consistent and Accurate Map Annotation. Without LiDAR inputs, our proposed framework can still generate high-quality 3D annotations of static map elements. Specifically, the annotations achieve high reprojection accuracy across all surrounding cameras and are spatially and temporally consistent across the whole sequence. We apply our proposed framework to the popular nuScenes dataset to provide efficient and highly accurate annotations. Compared with the original nuScenes static map elements, our CAMAv2 annotations achieve lower reprojection errors (e.g., 4.96 vs. 8.03 pixels). Models trained with annotations from CAMAv2 also achieve lower reprojection errors (e.g., 5.62 vs. 8.43 pixels).

Submitted 31 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2309.11754
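The reprojection errors quoted above (e.g., 4.96 vs. 8.03 pixels) measure how far annotated 3D map points land from their true image positions once projected into a camera. The abstract does not spell out the evaluation protocol, so the following is only a minimal sketch of the usual computation, assuming an undistorted pinhole camera, points already expressed in the camera frame, and the mean Euclidean pixel distance as the metric. The function names and numbers are hypothetical.

```python
import math


def project(point_cam, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates (Z pointing forward) with a pinhole model."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return fx * x / z + cx, fy * y / z + cy


def mean_reprojection_error(points_cam, observed_px, fx, fy, cx, cy):
    """Mean Euclidean distance, in pixels, between projected points and observed pixels."""
    errors = []
    for point, (u_obs, v_obs) in zip(points_cam, observed_px):
        u, v = project(point, fx, fy, cx, cy)
        errors.append(math.hypot(u - u_obs, v - v_obs))
    return sum(errors) / len(errors)


# Two annotated points: the first is off by (3, 4) pixels, the second is exact.
points = [(0.0, 0.0, 10.0), (1.0, 0.5, 20.0)]
observed = [(963.0, 544.0), (1010.0, 565.0)]
print(mean_reprojection_error(points, observed, 1000.0, 1000.0, 960.0, 540.0))  # 2.5
```

For a surround-view rig, the same per-point error would simply be averaged over every camera in which the map element is visible.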

arXiv:2405.13571

Incomplete Multimodal Industrial Anomaly Detection via Cross-Modal Distillation

Authors: Wenbo Sui, Daniel Lichau, Josselin Lefèvre, Harold Phelippeau

Abstract: Recent studies of multimodal industrial anomaly detection (IAD) based on 3D point clouds and RGB images have highlighted the importance of exploiting the redundancy and complementarity among modalities for accurate classification and segmentation. However, achieving multimodal IAD in practical production lines remains a work in progress. It is essential to consider the trade-offs between the costs and benefits associated with introducing new modalities while ensuring compatibility with current processes. Existing quality control processes combine rapid in-line inspections, such as optical and infrared imaging, with high-resolution but time-consuming near-line characterization techniques, including industrial CT and electron microscopy, to manually or semi-automatically locate and analyze defects in the production of Li-ion batteries and composite materials. Given the cost and time limitations, only a subset of the samples can be inspected by all in-line and near-line methods, and the remaining samples are evaluated only through one or two forms of in-line inspection. To fully exploit the data for deep learning-driven automatic defect detection, the models must be able to leverage multimodal training and handle incomplete modalities during inference. In this paper, we propose CMDIAD, a Cross-Modal Distillation framework for IAD, to demonstrate the feasibility of a Multi-modal Training, Few-modal Inference (MTFI) pipeline. Our findings show that the MTFI pipeline can utilize incomplete multimodal information more effectively than applying only a single modality for training and inference. Moreover, we investigate the reasons behind the asymmetric performance improvement when using point clouds or RGB images as the main modality for inference. This provides a foundation for our future multimodal dataset construction with additional modalities from manufacturing scenarios.

Submitted 15 August, 2024; v1 submitted 22 May, 2024; originally announced May 2024.
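In an MTFI pipeline, a modality that is available at inference time has to learn to stand in for one that is only available during training. The abstract does not give CMDIAD's architecture or loss, so the following is a deliberately tiny sketch of the general cross-modal feature-distillation idea, assuming scalar features, a frozen "teacher" feature from the training-only modality (here, point clouds), a linear "student" fed by the inference modality (RGB), and a mean-squared-error distillation loss minimized by gradient descent. All names and numbers are hypothetical.

```python
def distillation_loss(student_feats, teacher_feats):
    """Mean squared error between student predictions and frozen teacher features."""
    n = len(student_feats)
    return sum((s - t) ** 2 for s, t in zip(student_feats, teacher_feats)) / n


def train_linear_student(inputs, teacher_feats, lr=0.05, steps=2000):
    """Fit teacher ~ w * input + b by gradient descent on the distillation loss."""
    w, b = 0.0, 0.0
    n = len(inputs)
    for _ in range(steps):
        residuals = [w * x + b - t for x, t in zip(inputs, teacher_feats)]
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, inputs)) / n
        grad_b = 2.0 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Training: both modalities are observed (teacher features happen to equal 2 * rgb + 1).
rgb_feats = [0.0, 1.0, 2.0, 3.0]
point_feats = [1.0, 3.0, 5.0, 7.0]
w, b = train_linear_student(rgb_feats, point_feats)
# Inference: only the RGB feature is available; the student predicts the missing one.
print(round(w * 4.0 + b, 6))  # 9.0
```

The only point of the toy is the asymmetry: the teacher's modality is needed to compute the loss during training but never at inference.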

arXiv:2403.15026

VRSO: Visual-Centric Reconstruction for Static Object Annotation

Authors: Chenyao Yu, Yingfeng Cai, Jiaxin Zhang, Hui Kong, Wei Sui, Cong Yang

Abstract: As part of the perception output of intelligent driving systems, static object detection (SOD) in 3D space provides crucial cues for understanding the driving environment. With the rapid deployment of deep neural networks for SOD tasks, the demand for high-quality training samples soars. The traditional, and reliable, way is manual labelling over dense LiDAR point clouds and reference images. Though most public driving datasets adopt this strategy to provide SOD ground truth (GT), it is still expensive and time-consuming in practice. This paper introduces VRSO, a visual-centric approach for static object annotation. Experiments on the Waymo Open Dataset show that the mean reprojection error of VRSO annotations is only 2.6 pixels, around four times lower than that of the Waymo Open Dataset labels (10.6 pixels). VRSO is distinguished by low cost, high efficiency, and high quality: (1) it recovers static objects in 3D space with only camera images as input, and (2) manual annotation is barely involved, since GT for SOD tasks is generated by an automatic reconstruction and annotation pipeline.

Submitted 29 August, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

Comments: Accepted at 2024 IEEE International Conference on Intelligent Robots and Systems (IROS)

arXiv:2402.06854

Gyroscope-Assisted Motion Deblurring Network

Authors: Simin Luan, Cong Yang, Zeyd Boukhers, Xue Qin, Dongfeng Cheng, Wei Sui, Zhijun Li

Abstract: Deblurring networks have received substantial attention in image research in recent years. Yet their practical use in real-world deblurring, especially of motion blur, remains limited due to the lack of pixel-aligned training triplets (background, blurred image, and blur heat map) and the restricted information inherent in blurred images. This paper presents a simple yet efficient framework to synthesize and restore motion-blurred images using Inertial Measurement Unit (IMU) data. Notably, the framework includes a strategy for training triplet generation and a Gyroscope-Aided Motion Deblurring (GAMD) network for blurred image restoration. The rationale is that by harnessing IMU data, we can determine the transformation of the camera pose during the exposure phase, which allows the motion trajectory (a.k.a. blur trajectory) of each point in three-dimensional space to be deduced. Thus, the triplets synthesized with our strategy are inherently close to natural motion blur, strictly pixel-aligned, and mass-producible. Through comprehensive experiments, we demonstrate the advantages of the proposed framework: errors of only two pixels between our synthetic and real-world blur trajectories, and a marked improvement (around 33.17%) of the state-of-the-art deblurring method MIMO in Peak Signal-to-Noise Ratio (PSNR).

Submitted 9 February, 2024; originally announced February 2024.
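The rationale in the abstract (camera motion during exposure determines each point's blur trajectory) can be illustrated under a simplifying assumption the abstract does not make: pure camera rotation, so a pixel's trajectory depends only on the integrated gyroscope rotation and the intrinsics, not on scene depth. The sketch below also assumes gyroscope axes aligned with the camera axes, constant angular velocity within each sample interval, and a single focal length `f`; translation and rolling shutter are ignored. It is not the paper's triplet-generation procedure, and all names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_from_gyro(omega, dt):
    """Rodrigues formula: rotation exp([omega]_x * dt) for angular velocity omega in rad/s."""
    rate = math.sqrt(sum(w * w for w in omega))
    if rate * dt < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (w / rate for w in omega)
    k = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]  # skew-symmetric axis matrix
    k2 = matmul(k, k)
    s, c = math.sin(rate * dt), 1.0 - math.cos(rate * dt)
    return [[(1.0 if i == j else 0.0) + s * k[i][j] + c * k2[i][j] for j in range(3)]
            for i in range(3)]


def blur_trajectory(pixel, gyro_samples, dt, f, cx, cy):
    """Pixel positions of a fixed distant scene point during exposure (pure rotation)."""
    u, v = pixel
    ray = [(u - cx) / f, (v - cy) / f, 1.0]  # back-projected ray in the initial camera frame
    orientation = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    trajectory = [(u, v)]
    for omega in gyro_samples:
        orientation = matmul(orientation, rotation_from_gyro(omega, dt))
        # Express the ray in the current camera frame: x = R^T * ray.
        x = [sum(orientation[k][i] * ray[k] for k in range(3)) for i in range(3)]
        trajectory.append((f * x[0] / x[2] + cx, f * x[1] / x[2] + cy))
    return trajectory


# A 10 ms exposure at a constant 0.1 rad/s about the camera y axis, sampled at 1 kHz.
traj = blur_trajectory((960.0, 540.0), [(0.0, 0.1, 0.0)] * 10, 0.001, 1000.0, 960.0, 540.0)
print(len(traj), round(traj[-1][0], 3), round(traj[-1][1], 3))  # 11 959.0 540.0
```

The principal point moves horizontally by f * tan(0.001), roughly one pixel; a full blur kernel would be obtained by accumulating such trajectories for every pixel.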

arXiv:2309.11754

A Vision-Centric Approach for Static Map Element Annotation

Authors: Jiaxin Zhang, Shiyuan Chen, Haoran Yin, Ruohong Mei, Xuan Liu, Cong Yang, Qian Zhang, Wei Sui

Abstract: The recent development of online static map element (a.k.a. HD Map) construction algorithms has raised a vast demand for data with ground truth annotations. However, available public datasets currently cannot provide high-quality training data in terms of consistency and accuracy. To this end, we present CAMA: a vision-centric approach for Consistent and Accurate Map Annotation. Without LiDAR inputs, our proposed framework can still generate high-quality 3D annotations of static map elements. Specifically, the annotations achieve high reprojection accuracy across all surrounding cameras and are spatially and temporally consistent across the whole sequence. We apply our proposed framework to the popular nuScenes dataset to provide efficient and highly accurate annotations. Compared with the original nuScenes static map elements, models trained with annotations from CAMA achieve lower reprojection errors (e.g., 4.73 vs. 8.03 pixels).

Submitted 16 February, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: Accepted at 2024 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2306.11368

RoMe: Towards Large Scale Road Surface Reconstruction via Mesh Representation

Authors: Ruohong Mei, Wei Sui, Jiaxin Zhang, Xue Qin, Gang Wang, Tao Peng, Cong Yang

Abstract: In autonomous driving applications, accurate and efficient road surface reconstruction is paramount. This paper introduces RoMe, a novel framework designed for the robust reconstruction of large-scale road surfaces. Leveraging a unique mesh representation, RoMe ensures that the reconstructed road surfaces are accurate and seamlessly aligned with semantics. To address challenges in computational efficiency, we propose a waypoint sampling strategy, enabling RoMe to reconstruct vast environments by focusing on sub-areas and subsequently merging them. Furthermore, we incorporate an extrinsic optimization module to enhance robustness against inaccuracies in extrinsic calibration. Our extensive evaluations on both public datasets and in-the-wild data underscore RoMe's superiority in terms of speed, accuracy, and robustness. For instance, it takes only 2 GPU hours to recover a 600 m × 600 m road surface from thousands of images. Notably, RoMe's capability extends beyond mere reconstruction, offering significant value for autolabeling tasks in autonomous driving applications. All related data and code are available at https://github.com/DRosemei/RoMe.

Submitted 21 June, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: Published in: IEEE Transactions on Intelligent Vehicles
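The abstract describes the waypoint sampling strategy only at a high level (reconstruct sub-areas, then merge them). One plausible reading, sketched below as an assumption rather than RoMe's actual code, is to keep trajectory positions that are at least `spacing` metres apart as waypoints and to reconstruct a square sub-area of half-width `half_size` around each, using only the frames captured inside it. The function names and parameters are hypothetical, and the merging step is not shown.

```python
import math


def sample_waypoints(positions, spacing):
    """Greedily keep positions at least `spacing` metres from the last kept waypoint."""
    waypoints = [positions[0]]
    for p in positions[1:]:
        if math.dist(p, waypoints[-1]) >= spacing:
            waypoints.append(p)
    return waypoints


def frames_per_subarea(positions, waypoints, half_size):
    """For each waypoint, list indices of frames whose (x, y) lies in its square sub-area."""
    return [
        [i for i, (x, y) in enumerate(positions)
         if abs(x - wx) <= half_size and abs(y - wy) <= half_size]
        for wx, wy in waypoints
    ]


# A straight 100 m drive with one frame per metre.
trajectory = [(float(x), 0.0) for x in range(101)]
wps = sample_waypoints(trajectory, spacing=30.0)
groups = frames_per_subarea(trajectory, wps, half_size=20.0)
print(wps)  # [(0.0, 0.0), (30.0, 0.0), (60.0, 0.0), (90.0, 0.0)]
print([(g[0], g[-1]) for g in groups])  # [(0, 20), (10, 50), (40, 80), (70, 100)]
```

Choosing `spacing` smaller than twice `half_size` makes neighbouring sub-areas overlap, which leaves shared regions that a merging step can use to stitch the pieces together.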

arXiv:2306.07467

ELF Codes: Concatenated Codes with an Expurgating Linear Function as the Outer Code

Authors: Richard Wesel, Amaael Antonini, Linfang Wang, Wenhui Sui, Brendan Towell, Holden Grissett

Abstract: An expurgating linear function (ELF) is a linear outer code that disallows the low-weight codewords of the inner code. ELFs can be designed either to maximize the minimum distance or to minimize the codeword error rate (CER) of the expurgated code. A list-decoding sieve of the inner code starting from the noiseless all-zeros codeword is an efficient way to identify ELFs that maximize the minimum distance of the expurgated code. For convolutional inner codes, this paper provides distance spectrum union (DSU) upper bounds on the CER of the concatenated code. For short codeword lengths, ELFs transform a good inner code into a great concatenated code. For a constant message size of $K=64$ bits or constant codeword blocklength of $N=152$ bits, an ELF can reduce the gap at CER $10^{-6}$ between the DSU and the random-coding union (RCU) bounds from over 1 dB for the inner code alone to 0.23 dB for the concatenated code. The DSU bounds can also characterize puncturing that mitigates the rate overhead of the ELF while maintaining the DSU-to-RCU gap. The reduction in DSU-to-RCU gap comes with a minimal increase in average complexity at desired CER operating points. List Viterbi decoding guided by the ELF approaches maximum likelihood (ML) decoding of the concatenated code, and average list size converges to 1 as SNR increases. Thus, average complexity is similar to Viterbi decoding on the trellis of the inner code at high SNR. For rare large-magnitude noise events, which occur less often than the FER of the inner code, a deep search in the list finds the ML codeword.

Submitted 1 August, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

Comments: 6 arXiv pages (the actual ISTC paper is 5 pages with more compressed spacing), 6 figures, accepted to the 2023 International Symposium on Topics in Coding. The latest version is the camera-ready version for ISTC, edited for clarity and to reflect reviewer suggestions, with added references.
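The abstract defines an ELF as a linear outer code that rules out the inner code's low-weight codewords. The sketch below assumes the simplest CRC-like case: outer codewords are the message polynomials divisible by a generator g(x) over GF(2), so a low-weight inner codeword survives expurgation only if g(x) divides its input sequence u(x). It then searches degree-m generators for one that divides none of a given list of low-weight input sequences. The input sequences and function names are hypothetical; in the paper they would come from the list-decoding sieve, and the actual design also accounts for distance spectra and trellis termination, which this omits.

```python
def gf2_mod(a, b):
    """Remainder of a(x) / b(x) over GF(2); bit i of an int is the coefficient of x^i."""
    db = b.bit_length() - 1
    while a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a


def find_elf(low_weight_inputs, degree):
    """First degree-`degree` generator g(x) (constant term 1) dividing none of the inputs."""
    for g in range(1 << degree, 1 << (degree + 1)):
        if (g & 1) == 0:
            continue  # skip generators divisible by x
        if all(gf2_mod(u, g) != 0 for u in low_weight_inputs):
            return g
    return None


low_weight_inputs = [0b1001, 0b101]  # x^3 + 1 and x^2 + 1 (hypothetical sieve output)
print(find_elf(low_weight_inputs, 2))  # None: x^2 + 1 and x^2 + x + 1 both divide something
print(bin(find_elf(low_weight_inputs, 3)))  # 0b1011, i.e. x^3 + x + 1
```

Raising the degree buys more freedom to expurgate at the cost of more redundancy, which is the rate overhead the abstract mentions mitigating with puncturing.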

arXiv:2304.09807

VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene

Authors: Shaoyu Chen, Yunchi Zhang, Bencheng Liao, Jiafeng Xie, Tianheng Cheng, Wei Sui, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang

Abstract: The high-definition (HD) map serves as essential infrastructure for autonomous driving. In this work, we build a systematic vectorized map annotation framework (termed VMA) for efficiently generating HD maps of large-scale driving scenes. We design a divide-and-conquer annotation scheme to solve the spatial extensibility problem of HD map generation, and abstract map elements with a variety of geometric patterns as a unified point sequence representation, which can be extended to most map elements in the driving scene. VMA is highly efficient and extensible, requires negligible human effort, and is flexible in terms of spatial scale and element type. We quantitatively and qualitatively validate the annotation performance on real-world urban and highway scenes, as well as on the NYC Planimetric Database. VMA can significantly improve map generation efficiency and requires little human effort. On average, VMA takes 160 min to annotate a scene spanning hundreds of meters and reduces human cost by 52.3%, showing great application value. Code: https://github.com/hustvl/VMA.

Submitted 27 August, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: https://github.com/hustvl/VMA
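VMA abstracts map elements with different geometric patterns into a unified point sequence representation. The abstract does not say how, so the following assumes one common choice: resampling each polyline to a fixed number of points spaced evenly by arc length, so that lane dividers, road boundaries and crossings all become equal-length point lists. The function name and the fixed count `n` are hypothetical.

```python
import math


def resample_polyline(points, n):
    """Resample a 2D polyline to n points spaced evenly by arc length."""
    if len(points) < 2 or n < 2:
        raise ValueError("need at least two polyline vertices and n >= 2")
    seg = [math.dist(points[i], points[i + 1]) for i in range(len(points) - 1)]
    total = sum(seg)
    out = []
    i, acc = 0, 0.0  # current segment index and arc length at its start
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment that contains the target arc length.
        while i < len(seg) - 1 and acc + seg[i] < target:
            acc += seg[i]
            i += 1
        t = 0.0 if seg[i] == 0 else min(max((target - acc) / seg[i], 0.0), 1.0)
        (x0, y0), (x1, y1) = points[i], points[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


# An L-shaped boundary of total length 4 m, resampled to 5 points (1 m apart).
print(resample_polyline([(0, 0), (2, 0), (2, 2)], 5))
# [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
```

A fixed-length representation like this lets elements of any shape share one data structure, which is what makes a single pipeline extensible to new element types.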

arXiv:2212.04224

doi 10.3390/s22239375

Towards Accurate Ground Plane Normal Estimation from Ego-Motion

Authors: Jiaxin Zhang, Wei Sui, Qian Zhang, Tao Chen, Cong Yang

Abstract: In this paper, we introduce a novel approach for ground plane normal estimation of wheeled vehicles. In practice, the ground plane changes dynamically due to braking and unstable road surfaces. As a result, the vehicle pose, especially the pitch angle, oscillates, sometimes subtly and sometimes noticeably. Estimating the ground plane normal is therefore meaningful, since it can be encoded to improve the robustness of various autonomous driving tasks (e.g., 3D object detection, road surface reconstruction, and trajectory planning). Our proposed method uses only odometry as input and estimates accurate ground plane normal vectors in real time. In particular, it fully utilizes the underlying connection between the ego pose odometry (ego-motion) and its nearby ground plane. Built on that, an Invariant Extended Kalman Filter (IEKF) is designed to estimate the normal vector in the sensor's coordinate frame. Thus, our proposed method is simple yet efficient and supports both camera- and inertial-based odometry algorithms. Its usability and the marked improvement in robustness are validated through multiple experiments on public datasets. For instance, we achieve state-of-the-art accuracy on the KITTI dataset with an estimated vector error of 0.39°. Our code is available at github.com/manymuch/ground_normal_filter.

Submitted 8 December, 2022; originally announced December 2022.

Journal ref: Sensors 2022, 22(23), 9375
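The 0.39° figure above is an angular error between estimated and reference ground-plane normals. The abstract does not define it precisely; a natural definition, assumed here, is the angle between the two vectors, computed as below. The IEKF itself is not reproduced, and the function name is hypothetical.

```python
import math


def normal_angle_error_deg(n_est, n_ref):
    """Angle in degrees between an estimated and a reference normal vector."""
    dot = sum(a * b for a, b in zip(n_est, n_ref))
    norms = math.sqrt(sum(a * a for a in n_est)) * math.sqrt(sum(b * b for b in n_ref))
    cos_angle = max(-1.0, min(1.0, dot / norms))  # clamp against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_angle))


# A normal tilted by 0.39 degrees (e.g., a small pitch change) relative to the reference.
tilt = math.radians(0.39)
print(round(normal_angle_error_deg((0.0, math.cos(tilt), math.sin(tilt)), (0.0, 1.0, 0.0)), 6))  # 0.39
```

Normalizing inside the function makes the metric insensitive to the scale of the vectors, and the clamp avoids a domain error from `acos` when the vectors are almost parallel.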

arXiv:2212.04064

CRC-Aided High-Rate Convolutional Codes With Short Blocklengths for List Decoding

Authors: Wenhui Sui, Brendan Towell, Ava Asmani, Hengjie Yang, Holden Grissett, Richard D. Wesel

Abstract: Recently, rate-1/n zero-terminated (ZT) and tail-biting (TB) convolutional codes (CCs) with cyclic redundancy check (CRC)-aided list decoding have been shown to closely approach the random-coding union (RCU) bound for short blocklengths. This paper designs CRC polynomials for rate-(n-1)/n ZT and TB CCs with short blocklengths. This paper considers both standard rate-(n-1)/n CC polynomials and rate-(n-1)/n designs resulting from puncturing a rate-1/2 code. The CRC polynomials are chosen to maximize the minimum distance $d_{\min}$ and minimize the number of nearest neighbors $A_{d_{\min}}$. For the standard rate-(n-1)/n codes, utilization of the dual trellis proposed by Yamada et al. lowers the complexity of CRC-aided serial list Viterbi decoding (SLVD). CRC-aided SLVD of the TBCCs closely approaches the RCU bound at a blocklength of 128. This paper compares the FER performance (gap to the RCU bound) and complexity of the CRC-aided standard and punctured ZTCCs and TBCCs. This paper also explores the complexity-performance trade-off for three TBCC decoders: a single-trellis approach, a multi-trellis approach, and a modified single-trellis approach with pre-processing using the wrap-around Viterbi algorithm.

Submitted 9 October, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2111.07929
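The abstract contrasts standard rate-(n-1)/n convolutional codes with designs obtained by puncturing a rate-1/2 code. The sketch below shows what puncturing means mechanically, using the textbook memory-2 (7, 5) octal encoder and the pattern [[1, 1], [1, 0]], which deletes one of every four coded bits to turn rate 1/2 into rate 2/3. Both the generators and the pattern are illustrative choices, not the codes designed in the paper; no CRC is attached, and the function names are hypothetical.

```python
def conv_encode_r12(bits, gens=(0b111, 0b101)):
    """Zero-terminated rate-1/2 feedforward convolutional encoder.

    Both generators are written with memory + 1 bits; the most significant bit taps
    the current input. (0b111, 0b101) is the (7, 5) octal code with memory 2.
    """
    memory = max(g.bit_length() for g in gens) - 1
    state = 0  # the `memory` most recent inputs, newest in the highest bit
    out = []
    for b in list(bits) + [0] * memory:  # trailing zeros return the trellis to state 0
        reg = (b << memory) | state
        for g in gens:
            out.append(bin(reg & g).count("1") % 2)  # parity of the tapped bits
        state = reg >> 1
    return out


def puncture(coded, pattern):
    """Drop coded bits where the periodic pattern has a 0.

    pattern[j][t] says whether output j of the mother code is sent at time t (mod period).
    """
    n_out, period = len(pattern), len(pattern[0])
    return [bit for idx, bit in enumerate(coded)
            if pattern[idx % n_out][(idx // n_out) % period]]


coded = conv_encode_r12([1])  # impulse response of the (7, 5) code
print(coded)  # [1, 1, 1, 0, 1, 1]
print(puncture(coded, [[1, 1], [1, 0]]))  # [1, 1, 1, 1, 1]
```

Puncturing keeps a single rate-1/2 trellis for decoding, whereas a standard rate-(n-1)/n code has a denser trellis; this is one side of the complexity-performance comparison the abstract describes.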

arXiv:2112.08635

Road-aware Monocular Structure from Motion and Homography Estimation

Authors: Wei Sui, Teng Chen, Jiaxin Zhang, Jiao Lu, Qian Zhang

Abstract: Structure from motion (SFM) and ground plane homography estimation are critical to autonomous driving and other robotics applications. Recently, much progress has been made in using deep neural networks for SFM and homography estimation, respectively. However, directly applying existing methods to ground plane homography estimation may fail because the road is often a small part of the scene. In addition, the performance of deep SFM approaches is still inferior to that of traditional methods. In this paper, we propose a method that learns to solve both problems in an end-to-end manner, improving performance on both. The proposed networks consist of a Depth-CNN, a Pose-CNN and a Ground-CNN. The Depth-CNN and Pose-CNN estimate the dense depth map and ego-motion, respectively, solving SFM, while the Pose-CNN and Ground-CNN, followed by a homography layer, solve the ground plane estimation problem. By enforcing coherency between the SFM and homography estimation results, the whole network can be trained end to end using a photometric loss and a homography loss without any ground truth except the road segmentation provided by an off-the-shelf segmenter. Comprehensive experiments on the KITTI benchmark demonstrate promising results compared with various state-of-the-art approaches.

Submitted 16 December, 2021; originally announced December 2021.

Comments: 10 pages
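The homography layer that follows the Pose-CNN and Ground-CNN maps road pixels between frames. The abstract does not give its formula; the standard plane-induced homography, assumed here, is H = K (R + t n^T / d) K^-1 for a road plane n^T X = d in the first camera frame and a motion X2 = R X1 + t (sign conventions vary between papers). The sketch computes H from such quantities and checks it on a synthetic road point; the function names and numbers are hypothetical.

```python
def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def ground_homography(fx, fy, cx, cy, R, t, n, d):
    """Homography H = K (R + t n^T / d) K^-1 induced by the plane n^T X = d (camera-1 frame)."""
    K = [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]
    K_inv = [[1.0 / fx, 0.0, -cx / fx], [0.0, 1.0 / fy, -cy / fy], [0.0, 0.0, 1.0]]
    A = [[R[i][j] + t[i] * n[j] / d for j in range(3)] for i in range(3)]
    return matmul(matmul(K, A), K_inv)


def warp_pixel(H, u, v):
    """Apply H to pixel (u, v) in homogeneous coordinates and dehomogenize."""
    x = [H[i][0] * u + H[i][1] * v + H[i][2] for i in range(3)]
    return x[0] / x[2], x[1] / x[2]


# Camera 1.5 m above a flat road (y axis pointing down), driving 1 m forward.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
H = ground_homography(500.0, 500.0, 320.0, 240.0, I3, [0.0, 0.0, -1.0], [0.0, 1.0, 0.0], 1.5)
# Road point (1.0, 1.5, 10.0) projects to (370, 315) in frame 1 and lies at depth 9 m in frame 2.
print(warp_pixel(H, 370.0, 315.0))  # approximately (375.556, 323.333)
```

Because H depends only on the pose (R, t) and the plane (n, d), a photometric loss on warped road pixels can supervise both the pose and ground estimates at once.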

arXiv:2111.11089

doi 10.1109/TIP.2023.3289323

Monocular Road Planar Parallax Estimation

Authors: Haobo Yuan, Teng Chen, Wei Sui, Jiafeng Xie, Lefei Zhang, Yuan Li, Qian Zhang

Abstract: Estimating the 3D structure of the drivable surface and surrounding environment is a crucial task for assisted and autonomous driving. It is commonly solved either by using 3D sensors such as LiDAR or by directly predicting the depth of points via deep learning. However, the former is expensive, and the latter does not exploit the geometric information of the scene. In this paper, instead of following existing methodologies, we propose the Road Planar Parallax Attention Network (RPANet), a new deep neural network for 3D sensing from monocular image sequences based on planar parallax, which takes full advantage of the omnipresent road plane geometry in driving scenes. RPANet takes a pair of images aligned by the homography of the road plane as input and outputs a $γ$ map (the ratio of height to depth) for 3D reconstruction. The $γ$ map has the potential to construct a two-dimensional transformation between two consecutive frames. It implies planar parallax and can be combined with the road plane, which serves as a reference, to estimate the 3D structure by warping the consecutive frames. Furthermore, we introduce a novel cross-attention module to make the network better perceive the displacements caused by planar parallax. To verify the effectiveness of our method, we sample data from the Waymo Open Dataset and construct annotations related to planar parallax. Comprehensive experiments on the sampled dataset demonstrate the 3D reconstruction accuracy of our approach in challenging scenarios.

Submitted 9 July, 2023; v1 submitted 22 November, 2021; originally announced November 2021.

Comments: Accepted by IEEE TIP
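The $γ$ map stores, per pixel, the ratio of a point's height above the road to its depth. Assuming a road plane n^T X = d in the camera frame, with n the unit normal pointing from the camera toward the road and d the camera height (a convention chosen here, not taken from the paper), the height of X is h = d - n^T X and γ = h / Z. Since X = Z r for the normalized pixel ray r = (x, y, 1), depth can be recovered from γ as Z = d / (γ + n^T r), which is why a γ map suffices for 3D reconstruction. The sketch checks both directions; it does not model RPANet itself, and the function names are hypothetical.

```python
def gamma_of_point(X, n, d):
    """gamma = height above the road plane / depth, for a point X in camera coordinates."""
    height = d - sum(ni * xi for ni, xi in zip(n, X))
    return height / X[2]


def depth_from_gamma(gamma, ray, n, d):
    """Invert gamma for a normalized pixel ray (x, y, 1): Z = d / (gamma + n^T ray)."""
    return d / (gamma + sum(ni * ri for ni, ri in zip(n, ray)))


n, d = (0.0, 1.0, 0.0), 1.5  # camera 1.5 m above the road, y axis pointing down
print(gamma_of_point((1.0, 1.5, 10.0), n, d))  # 0.0 (point on the road)
g = gamma_of_point((1.0, 0.5, 10.0), n, d)  # 1 m above the road, 10 m away
print(round(g, 6), round(depth_from_gamma(g, (0.1, 0.05, 1.0), n, d), 6))  # 0.1 10.0
```

Road pixels have γ = 0 and are perfectly aligned by the road homography; any non-zero γ shows up as residual planar parallax after that alignment.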

arXiv:2111.07929

High-Rate Convolutional Codes with CRC-Aided List Decoding for Short Blocklengths

Authors: Wenhui Sui, Hengjie Yang, Brendan Towell, Ava Asmani, Richard D. Wesel

Abstract: Recently, rate-$1/ω$ zero-terminated and tail-biting convolutional codes (ZTCCs and TBCCs) with cyclic-redundancy-check (CRC)-aided list decoding have been shown to closely approach the random-coding union (RCU) bound for short blocklengths. This paper designs CRCs for rate-$(ω-1)/ω$ CCs with short blocklengths, considering both the ZT and TB cases. The CRC design seeks to optimize the frame error rate (FER) performance of the code resulting from the concatenation of the CRC and the CC. Utilization of the dual trellis proposed by Yamada et al. lowers the complexity of CRC-aided serial list Viterbi decoding (SLVD) of ZTCCs and TBCCs. CRC-aided SLVD of the TBCCs closely approaches the RCU bound at a blocklength of $128$.

Submitted 7 June, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

Comments: 6 pages; submitted to 2022 IEEE International Conference on Communications (ICC 2022)
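CRC-aided list decoding works because the CRC turns most wrong candidates into detectable errors: the list decoder proposes candidate messages in order of likelihood, and the first one whose CRC checks is accepted. The sketch below shows the GF(2) arithmetic involved, using the small illustrative generator x^3 + x + 1 (0b1011) rather than any CRC designed in the paper; the helper names are hypothetical, and the list decoder producing the candidates is not shown.

```python
def crc_bits(message, poly):
    """CRC of `message` (bits, most significant first): message(x) * x^r mod poly(x)."""
    r = poly.bit_length() - 1
    reg = 0
    for bit in list(message) + [0] * r:  # appending r zeros multiplies by x^r
        reg = (reg << 1) | bit
        if (reg >> r) & 1:
            reg ^= poly
    return [(reg >> (r - 1 - i)) & 1 for i in range(r)]


def crc_ok(codeword, poly):
    """True if codeword(x) is divisible by poly(x) over GF(2)."""
    r = poly.bit_length() - 1
    reg = 0
    for bit in codeword:
        reg = (reg << 1) | bit
        if (reg >> r) & 1:
            reg ^= poly
    return reg == 0


def first_passing(candidates, poly):
    """Return the first candidate in a likelihood-ordered list that passes the CRC check."""
    for cand in candidates:
        if crc_ok(cand, poly):
            return cand
    return None


msg = [1, 1, 0, 1]
codeword = msg + crc_bits(msg, 0b1011)
print(codeword)  # [1, 1, 0, 1, 0, 0, 1]
print(first_passing([[1, 1, 0, 1, 0, 0, 0], codeword], 0b1011))  # [1, 1, 0, 1, 0, 0, 1]
```

Here the most likely candidate fails the check, so the decoder moves down the list; choosing the CRC polynomial well makes such false acceptances rarer.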

arXiv:2103.10029

doi 10.1109/ICRA48506.2021.9561642

Deep Online Correction for Monocular Visual Odometry

Authors: Jiaxin Zhang, Wei Sui, Xinggang Wang, Wenming Meng, Hongmei Zhu, Qian Zhang

Abstract: In this work, we propose a novel deep online correction (DOC) framework for monocular visual odometry. The whole pipeline has two stages. First, depth maps and initial poses are obtained from convolutional neural networks (CNNs) trained in a self-supervised manner. Second, the poses predicted by the CNNs are further improved by minimizing photometric errors via gradient updates of the poses during inference. The benefits of our proposed method are twofold: 1) Unlike online-learning methods, DOC does not need to propagate gradients to the parameters of the CNNs, so it saves computational resources during inference. 2) Unlike hybrid methods that combine CNNs with traditional methods, DOC relies fully on deep learning (DL) frameworks. Even without complex back-end optimization modules, our method achieves outstanding performance, with a relative transform error (RTE) of 2.0% on Seq. 09 of the KITTI Odometry benchmark, which outperforms traditional monocular VO frameworks and is comparable to hybrid methods.

Submitted 18 March, 2021; originally announced March 2021.

Comments: Accepted at 2021 IEEE International Conference on Robotics and Automation (ICRA)
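DOC's second stage refines only the pose, by gradient descent on a photometric error, while the CNN weights stay fixed. The toy below illustrates that idea in one dimension, as an analogy rather than DOC's method: a known 1D "image" is observed with an unknown shift (standing in for the pose), an inaccurate initial estimate plays the role of the network's prediction, and gradient steps on the mean squared intensity difference refine it. The signal, learning rate, iteration count and function names are hypothetical; the real method warps images using CNN depth and a 6-DoF pose.

```python
import math


def image(x):
    """Hypothetical smooth 1D intensity profile."""
    return math.sin(x) + 0.5 * math.sin(2.3 * x)


def image_grad(x):
    """Analytic derivative of image(x)."""
    return math.cos(x) + 1.15 * math.cos(2.3 * x)


def refine_shift(observed, xs, s0, lr=0.2, iters=200):
    """Gradient descent on E(s) = mean((image(x + s) - observed)^2); only s is updated."""
    s = s0
    for _ in range(iters):
        grad = 0.0
        for x, y in zip(xs, observed):
            residual = image(x + s) - y
            grad += 2.0 * residual * image_grad(x + s)
        s -= lr * grad / len(xs)
    return s


xs = [0.05 * i for i in range(400)]
true_shift = 0.5
observed = [image(x + true_shift) for x in xs]
initial = 0.3  # stands in for an imperfect network prediction
print(round(refine_shift(observed, xs, initial), 6))  # 0.5
```

As in the abstract's first benefit, each update touches a single pose-like parameter rather than every network weight, which is what keeps inference-time refinement cheap.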

arXiv:1907.09136

Combustion Phasing Modelling and Control for Compression Ignition Engines with High Dilution and Boost Levels

Authors: Wenbo Sui, Carrie M. Hall

Abstract: Because fuel efficiency is significantly impacted by the timing of combustion in internal combustion engines, accurate control of combustion phasing is critical. In this paper, a nonlinear combustion phasing model is introduced and calibrated, and both a feedforward model-based control strategy and an adaptive model-based control strategy are investigated for combustion phasing control. The combustion phasing model combines a knock integral model, a burn duration model and a Wiebe function to predict the combustion phasing of a diesel engine. This model is simplified to be more suitable for combustion phasing control, and it is calibrated and validated using simulations and experimental data that include conditions with high exhaust gas recirculation fractions and high boost levels. Based on this model, an adaptive nonlinear model-based controller is designed for closed-loop control, and a feedforward model-based controller is designed for open-loop control. These two control approaches were tested in simulations. The simulation results show that during transient changes, CA50 (the crank angle at which 50% of the mass of fuel has burned) can reach steady state in no more than 5 cycles, and the steady-state errors are less than ±0.1 crank angle degree (CAD) for adaptive control and less than ±0.5 CAD for feedforward model-based control.

Submitted 18 July, 2019; originally announced July 2019.

Comments: Journal of Automobile Engineering, 2018
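The combustion phasing model above chains a knock integral (start of combustion), a burn duration model and a Wiebe function. The Wiebe step is standard: with start of combustion θ_soc, burn duration Δθ and shape parameters a and m, the burned mass fraction is x_b(θ) = 1 - exp(-a((θ - θ_soc)/Δθ)^(m+1)), and setting x_b = 0.5 gives CA50 in closed form as θ_soc + Δθ (ln 2 / a)^(1/(m+1)). The sketch assumes the commonly quoted values a = 5 and m = 2 and hypothetical inputs; the paper's calibrated parameters are not given in the abstract.

```python
import math


def wiebe(theta, soc, duration, a=5.0, m=2.0):
    """Burned mass fraction x_b(theta) from the Wiebe function (zero before combustion)."""
    if theta <= soc:
        return 0.0
    return 1.0 - math.exp(-a * ((theta - soc) / duration) ** (m + 1))


def ca50(soc, duration, a=5.0, m=2.0):
    """Crank angle at which x_b = 0.5, from solving the Wiebe function in closed form."""
    return soc + duration * (math.log(2.0) / a) ** (1.0 / (m + 1))


soc, duration = -5.0, 40.0  # crank angle degrees (hypothetical)
theta50 = ca50(soc, duration)
print(round(theta50, 2), round(wiebe(theta50, soc, duration), 6))  # 15.7 0.5
```

The closed form is what makes the model control-oriented: a controller can invert it directly to find the start of combustion that places CA50 at a target angle.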

arXiv:1907.07747

doi 10.1177/1468087418804087

Cylinder-Specific Model-Based Control of Combustion Phasing for Multiple-Cylinder Diesel Engines Operating with High Dilution and Boost Levels

Authors: Wenbo Sui, Carrie M. Hall, Gina Kapadia

Abstract: Accurate control of combustion phasing is indispensable for diesel engines due to the strong impact of combustion timing on efficiency. In this work, a nonlinear combustion phasing model is developed and integrated with a cylinder-specific model of intake gas. The combustion phasing model uses a knock integral model, a burn duration model and a Wiebe function to predict CA50 (the crank angle at which 50% of the mass of fuel has burned). Meanwhile, the intake gas property model predicts the EGR fraction and the in-cylinder pressure and temperature at intake valve closing (IVC) for different cylinders. As such, cylinder-to-cylinder variation of the pressure and temperature at IVC is also considered in this model. This combined model is simplified for controller design and validated. Based on these models, two combustion phasing control strategies are explored. The first is an adaptive controller designed for closed-loop control, and the second is a feedforward model-based control strategy for open-loop control. These two control approaches were tested in simulations for all six cylinders, and the results demonstrate that CA50 can reach steady-state conditions within 10 cycles. In addition, the steady-state errors are less than ±0.1 crank angle degree (CAD) with the adaptive control approach and less than ±1.3 CAD with feedforward model-based control. The impact of errors on the control algorithms is also discussed in the paper.

Submitted 17 July, 2019; originally announced July 2019.

Comments: International Journal of Engine Research, 2018

arXiv:1906.05791 [pdf]

doi 10.1115/1.4041871

Modeling and Control of Combustion Phasing in Dual-Fuel Compression Ignition Engines

Authors: Wenbo Sui, Jorge Pulpeiro González, Carrie M. Hall

Abstract: Dual-fuel engines can achieve high efficiencies and low emissions but can also exhibit large cylinder-to-cylinder variations on multi-cylinder engines. To avoid these variations, they require a more sophisticated method of combustion phasing control, such as model-based control. Since the combustion process in these engines is complex, typical models of the system are complex as well, and there is a need for simpler, computationally efficient, control-oriented models of the dual-fuel combustion process. In this paper, a mean-value combustion phasing model is designed and calibrated, and two control strategies are proposed. Combustion phasing is predicted using a knock integral model, a burn duration model and a Wiebe function, and this model is used in both an adaptive closed-loop controller and an open-loop controller. The two control methodologies are tested and compared in simulation. Both strategies reach steady state within 5 cycles after a transient, with steady-state CA50 errors of less than 0.1 crank angle degree (CAD) for the adaptive strategy and less than 1.5 CAD for the model-based feedforward method.
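The knock integral model mentioned in this abstract (and the previous one) predicts start of combustion as the crank angle at which the integral of dθ/τ(θ) reaches 1, where τ is the ignition delay. The following is a minimal sketch under stated assumptions, not the authors' implementation: τ comes from an Arrhenius-type correlation τ = A p^(-n) exp(B/T) in milliseconds with illustrative constants (A = 3.52, n = 1.022, B = 2100), the in-cylinder state follows polytropic compression from IVC with slider-crank kinematics, and every operating-point value (IVC state, engine speed, compression ratio, start of integration) is hypothetical.

```python
import math


def knock_integral_soc(theta_start, theta_end, tau_cad, step=0.1):
    """First crank angle at which the knock integral of dtheta / tau(theta) reaches 1.

    tau_cad(theta) returns the ignition delay in crank-angle degrees.
    Uses the trapezoidal rule with linear interpolation inside the final step.
    Returns None if the integral stays below 1 up to theta_end.
    """
    integral = 0.0
    theta = theta_start
    prev = 1.0 / tau_cad(theta)
    while theta < theta_end:
        next_theta = min(theta + step, theta_end)
        cur = 1.0 / tau_cad(next_theta)
        increment = 0.5 * (prev + cur) * (next_theta - theta)
        if integral + increment >= 1.0:
            frac = (1.0 - integral) / increment
            return theta + frac * (next_theta - theta)
        integral += increment
        theta, prev = next_theta, cur
    return None


def arrhenius_delay_cad(p_bar, t_kelvin, rpm, a=3.52, n=1.022, b=2100.0):
    """Ignition delay tau = a * p**(-n) * exp(b / T) in ms, converted to CAD.

    The constants are illustrative placeholders, not the paper's calibration.
    """
    tau_ms = a * p_bar ** (-n) * math.exp(b / t_kelvin)
    return tau_ms * 1e-3 * 6.0 * rpm  # the crank turns 6 * rpm degrees per second


def cylinder_volume(theta_deg, cr=16.0, rod_ratio=3.5):
    """Cylinder volume normalized to unit displacement; theta = 0 at firing TDC."""
    th = math.radians(theta_deg)
    clearance = 1.0 / (cr - 1.0)
    return clearance + 0.5 * (rod_ratio + 1.0 - math.cos(th)
                              - math.sqrt(rod_ratio ** 2 - math.sin(th) ** 2))


if __name__ == "__main__":
    # Hypothetical operating point: IVC at -150 CAD aTDC, 1.5 bar, 330 K, 1500 rpm.
    theta_ivc, p_ivc, t_ivc, rpm, k = -150.0, 1.5, 330.0, 1500.0, 1.35
    v_ivc = cylinder_volume(theta_ivc)

    def tau(theta):
        # Polytropic compression from the IVC state
        ratio = v_ivc / cylinder_volume(theta)
        return arrhenius_delay_cad(p_ivc * ratio ** k, t_ivc * ratio ** (k - 1.0), rpm)

    # Integrate from a hypothetical start of injection at -10 CAD aTDC
    print("predicted SOC (CAD aTDC):", knock_integral_soc(-10.0, 30.0, tau))
```

Passing τ as a callable keeps the integrator independent of the ignition-delay correlation, so a calibrated correlation or a dual-fuel-specific one could be swapped in without touching the integration step.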

Submitted 13 June, 2019; originally announced June 2019.

Journal ref: J. Eng. Gas Turbines Power 141(5), 051005 (Nov 28, 2018)

arXiv:1811.08611 [pdf, other]

A Novel Integrated Framework for Learning both Text Detection and Recognition

Authors: Wanchen Sui, Qing Zhang, Jun Yang, Wei Chu

Abstract: In this paper, we propose a novel integrated framework for learning both text detection and recognition. Most existing methods treat detection and recognition as two isolated tasks and train them separately, since the detection and recognition models have different parameters and each model optimizes its own loss function during individual training. In contrast, by sharing model parameters, we merge the detection and recognition models into a single end-to-end trainable model and train the joint model on both tasks simultaneously. The shared parameters not only effectively reduce the computational load during inference but also improve end-to-end text detection-recognition accuracy. In addition, we design a simpler and faster sequence learning method for the recognition network based on a succession of stacked convolutional layers without any recurrent structure; this design proves feasible and dramatically improves inference speed. Extensive experiments on different datasets demonstrate that the proposed method achieves very promising results.

Submitted 21 November, 2018; originally announced November 2018.

arXiv:1602.04502 [pdf, other]

Do We Need Binary Features for 3D Reconstruction?

Authors: Bin Fan, Qingqun Kong, Wei Sui, Zhiheng Wang, Xinchao Wang, Shiming Xiang, Chunhong Pan, Pascal Fua

Abstract: Binary features have become increasingly popular in the past few years due to their low memory footprint and the efficient computation of the Hamming distance between binary descriptors. They have shown promising results in some real-time applications, e.g., SLAM, where matching operations are relatively few. However, many computer vision applications, such as 3D reconstruction, require large numbers of matching operations between local features. A natural question is therefore whether binary features are still a promising solution for this kind of application. To answer it, this paper conducts a comparative study of binary features and their matching methods in the context of 3D reconstruction on a recently proposed large-scale multiview stereo dataset. Our evaluations reveal that not all binary features are capable of this task. Most of them are inferior to the classical SIFT-based method in terms of reconstruction accuracy and completeness, while offering computational performance that is not significantly better.
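The efficiency argument for binary descriptors rests on the Hamming distance, which reduces to an XOR followed by a bit count. This sketch is not taken from the paper: it shows that distance plus a brute-force nearest-neighbour matcher, with byte-string descriptors, an arbitrary max_distance threshold of 64 bits, and made-up example descriptors as assumptions.

```python
from typing import List, Sequence, Tuple


def hamming_distance(d1: bytes, d2: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    if len(d1) != len(d2):
        raise ValueError("descriptors must have the same length")
    # XOR marks the differing bits; counting the 1s gives the Hamming distance
    return sum(bin(x ^ y).count("1") for x, y in zip(d1, d2))


def brute_force_match(query: Sequence[bytes], train: Sequence[bytes],
                      max_distance: int = 64) -> List[Tuple[int, int, int]]:
    """For each query descriptor, return (query_idx, train_idx, distance) of its
    nearest train descriptor, keeping only matches within max_distance bits."""
    matches = []
    for qi, q in enumerate(query):
        best_j, best_d = -1, None
        for tj, t in enumerate(train):
            d = hamming_distance(q, t)
            if best_d is None or d < best_d:
                best_j, best_d = tj, d
        if best_d is not None and best_d <= max_distance:
            matches.append((qi, best_j, best_d))
    return matches


if __name__ == "__main__":
    # Two hypothetical 32-byte (256-bit) descriptors
    a = bytes([0b10101010] * 32)
    b = bytes([0b10101011] * 32)
    print(hamming_distance(a, b))  # 32: one differing bit per byte
    print(brute_force_match([a], [bytes(32), b]))  # [(0, 1, 32)]
```

Each distance is cheap, but the nested loop still performs one comparison per query-train pair, so the total matching cost grows with the product of the two feature counts; that is the regime where the abstract's comparison against SIFT becomes relevant.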

Submitted 14 February, 2016; originally announced February 2016.

Showing 1–19 of 19 results for author: Sui, W