-
Lookism: The overlooked bias in computer vision
Authors:
Aditya Gulati,
Bruno Lepri,
Nuria Oliver
Abstract:
In recent years, there have been significant advancements in computer vision which have led to the widespread deployment of image recognition and generation systems in socially relevant applications, from hiring to security screening. However, the prevalence of biases within these systems has raised significant ethical and social concerns. The most extensively studied biases in this context are related to gender, race and age. Yet, other biases are equally pervasive and harmful, such as lookism, i.e., the preferential treatment of individuals based on their physical appearance. Lookism remains under-explored in computer vision but can have profound implications not only by perpetuating harmful societal stereotypes but also by undermining the fairness and inclusivity of AI technologies. Thus, this paper advocates for the systematic study of lookism as a critical bias in computer vision models. Through a comprehensive review of existing literature, we identify three areas of intersection between lookism and computer vision. We illustrate them by means of examples and a user study. We call for an interdisciplinary approach to address lookism, urging researchers, developers, and policymakers to prioritize the development of equitable computer vision systems that respect and reflect the diversity of human appearances.
Submitted 21 August, 2024;
originally announced August 2024.
-
Large Language Models for Multimodal Deformable Image Registration
Authors:
Mingrui Ma,
Weijie Wang,
Jie Ning,
Jianfeng He,
Nicu Sebe,
Bruno Lepri
Abstract:
The challenge of Multimodal Deformable Image Registration (MDIR) lies in the conversion and alignment of features between images of different modalities. Generative models (GMs) cannot retain enough of the necessary information from the source modality to the target one, while non-GMs struggle to align features across these two modalities. In this paper, we propose a novel coarse-to-fine MDIR framework, LLM-Morph, which is applicable to various pre-trained Large Language Models (LLMs) to solve these concerns by aligning the deep features from different modal medical images. Specifically, we first utilize a CNN encoder to extract deep visual features from cross-modal image pairs. Second, we use the first adapter to adjust these tokens and use LoRA in pre-trained LLMs to fine-tune their weights, both aimed at eliminating the domain gap between the pre-trained LLMs and the MDIR task. Third, for the alignment of tokens, we utilize four other adapters to transform the LLM-encoded tokens into multi-scale visual features, generating multi-scale deformation fields and facilitating the coarse-to-fine MDIR task. Extensive experiments on the MR-CT Abdomen and SR-Reg Brain datasets demonstrate the effectiveness of our framework and the potential of pre-trained LLMs for the MDIR task. Our code is available at: https://github.com/ninjannn/LLM-Morph.
Submitted 20 August, 2024;
originally announced August 2024.
-
Hopfield Networks for Asset Allocation
Authors:
Carlo Nicolini,
Monisha Gopalan,
Jacopo Staiano,
Bruno Lepri
Abstract:
We present the first application of modern Hopfield networks to the problem of portfolio optimization. We performed an extensive study based on combinatorial purged cross-validation over several datasets and compared our results to both traditional and deep-learning-based methods for portfolio selection. Compared to state-of-the-art deep-learning methods such as Long-Short Term Memory networks and Transformers, we find that the proposed approach performs on par or better, while providing faster training times and better stability. Our results show that Modern Hopfield Networks represent a promising approach to portfolio optimization, allowing for an efficient, scalable, and robust solution for asset allocation, risk management, and dynamic rebalancing.
Submitted 24 July, 2024;
originally announced July 2024.
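For readers unfamiliar with modern Hopfield networks, the following is a minimal sketch of their standard retrieval update, in which a query state is replaced by a softmax-weighted combination of stored patterns. It illustrates the building block only and is not the portfolio model from the paper; the function names, the inverse temperature `beta`, and the toy random patterns are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hopfield_retrieve(patterns, query, beta=8.0, n_steps=3):
    """Modern Hopfield update: state <- X^T softmax(beta * X state).

    patterns: array of shape (num_patterns, dim), one stored pattern per row.
    query:    array of shape (dim,), a possibly corrupted pattern.
    """
    state = query.copy()
    for _ in range(n_steps):
        attention = softmax(beta * patterns @ state)  # weight per stored pattern
        state = patterns.T @ attention                # convex combination of patterns
    return state

# Toy data (hypothetical): four random +/-1 patterns in 32 dimensions.
rng = np.random.default_rng(0)
patterns = rng.choice([-1.0, 1.0], size=(4, 32))

# Corrupt the third pattern by flipping its first four entries.
noisy = patterns[2].copy()
noisy[:4] *= -1

retrieved = hopfield_retrieve(patterns, noisy)
```

With a large enough `beta`, the softmax concentrates on the closest stored pattern, so `retrieved` is expected to land very close to `patterns[2]` despite the flipped entries.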
-
Physical partisan proximity outweighs online ties in predicting US voting outcomes
Authors:
Marco Tonin,
Bruno Lepri,
Michele Tizzoni
Abstract:
Affective polarization and increasing social divisions affect social mixing and the spread of information across online and physical spaces, reinforcing social and electoral cleavages and influencing political outcomes. Here, using aggregated and de-identified co-location and online network data, we investigate the relationship between partisan exposure and voting patterns in the USA by comparing three dimensions of partisan exposure: physical proximity and exposure to the same social contexts, online social ties, and residential sorting. By leveraging various statistical modeling approaches, we consistently find that partisan exposure in the physical space, as captured by co-location patterns, more accurately predicts electoral outcomes in US counties, outperforming online and residential exposures across metropolitan and non-metro areas. Moreover, our results show that physical partisan proximity is the best predictor of voting patterns in swing counties, where the election results are most uncertain. We also estimate county-level experienced partisan segregation and examine its relationship with individuals' demographic and socioeconomic characteristics. Focusing on metropolitan areas, our results confirm the presence of extensive partisan segregation in the US and show that offline partisan isolation, both considering physical encounters or residential sorting, is higher than online segregation and is primarily associated with educational attainment. Our findings emphasize the importance of physical space in understanding the relationship between social networks and political behavior, in contrast to the intense scrutiny focused on online social networks and elections.
Submitted 16 July, 2024;
originally announced July 2024.
-
What is Beautiful is Still Good: The Attractiveness Halo Effect in the era of Beauty Filters
Authors:
Aditya Gulati,
Marina Martinez-Garcia,
Daniel Fernandez,
Miguel Angel Lozano,
Bruno Lepri,
Nuria Oliver
Abstract:
The impact of cognitive biases on decision-making in the digital world remains under-explored despite its well-documented effects in physical contexts. This study addresses this gap by investigating the attractiveness halo effect using AI-based beauty filters. We conduct a large-scale online user study involving 2,748 participants who rated facial images from a diverse set of 462 distinct individuals in two conditions: original and attractive after applying a beauty filter. Our study reveals that the same individuals receive statistically significantly higher ratings of attractiveness and other traits, such as intelligence and trustworthiness, in the attractive condition. We also study the impact of age, gender, and ethnicity and identify a weakening of the halo effect in the beautified condition, resolving conflicting findings from the literature and suggesting that filters could mitigate this cognitive bias. Finally, our findings raise ethical concerns regarding the use of beauty filters.
Submitted 29 May, 2024;
originally announced July 2024.
-
Separation Power of Equivariant Neural Networks
Authors:
Marco Pacini,
Xiaowen Dong,
Bruno Lepri,
Gabriele Santin
Abstract:
The separation power of a machine learning model refers to its capacity to distinguish distinct inputs, and it is often employed as a proxy for its expressivity. In this paper, we propose a theoretical framework to investigate the separation power of equivariant neural networks with point-wise activations. Using the proposed framework, we can derive an explicit description of inputs indistinguishable by a family of neural networks with given architecture, demonstrating that it remains unaffected by the choice of non-polynomial activation function employed. We are able to understand the role played by activation functions in separability. Indeed, we show that all non-polynomial activations, such as ReLU and sigmoid, are equivalent in terms of expressivity, and that they reach maximum discrimination capacity. We demonstrate how assessing the separation power of an equivariant neural network can be simplified to evaluating the separation power of minimal representations. We conclude by illustrating how these minimal components form a hierarchy in separation power.
Submitted 13 June, 2024;
originally announced June 2024.
-
Large Language Models are Zero-Shot Next Location Predictors
Authors:
Ciro Beneduce,
Bruno Lepri,
Massimiliano Luca
Abstract:
Predicting the locations an individual will visit in the future is crucial for solving many societal issues like disease diffusion and reduction of pollution. However, next-location predictors require a significant amount of individual-level information that may be scarce or unavailable in some scenarios (e.g., cold-start). Large Language Models (LLMs) have shown good generalization and reasoning capabilities and are rich in geographical knowledge, leading us to believe that these models can act as zero-shot next-location predictors. We tested more than 15 LLMs on three real-world mobility datasets and found that LLMs can obtain accuracies up to 36.2%, a significant relative improvement of almost 640% when compared to other models specifically designed for human mobility. We also tested for data contamination and explored the possibility of using LLMs as text-based explainers for next-location prediction, showing that, regardless of model size, LLMs can explain their decisions.
Submitted 23 August, 2024; v1 submitted 31 May, 2024;
originally announced May 2024.
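To make the "almost 640%" figure concrete, the short sketch below works through the arithmetic of a relative improvement. The abstract reports only the LLM accuracy (36.2%) and the approximate relative gain, so the baseline computed here is an implied, approximate value under the usual definition (new - baseline) / baseline, not a number stated by the authors; the function names are hypothetical.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0

def implied_baseline(new: float, improvement_pct: float) -> float:
    """Baseline value implied by a reported relative improvement (in percent)."""
    return new / (1.0 + improvement_pct / 100.0)

llm_accuracy = 36.2    # percent, from the abstract
reported_gain = 640.0  # percent; the abstract says "almost 640%"

# Assumption: treating the gain as exactly 640% gives an approximate baseline.
baseline_accuracy = implied_baseline(llm_accuracy, reported_gain)
print(f"Implied baseline accuracy: {baseline_accuracy:.1f}%")
```

Under these assumptions the comparison models would sit at roughly 5% accuracy, which shows how a large relative gain can coexist with a modest absolute accuracy.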
-
UVMap-ID: A Controllable and Personalized UV Map Generative Model
Authors:
Weijie Wang,
Jichao Zhang,
Chang Liu,
Xia Li,
Xingqian Xu,
Humphrey Shi,
Nicu Sebe,
Bruno Lepri
Abstract:
Recently, diffusion models have made significant strides in synthesizing realistic 2D human images based on provided text prompts. Building upon this, researchers have extended 2D text-to-image diffusion models into the 3D domain for generating human textures (UV Maps). However, some important problems about UV Map Generative models are still not solved, i.e., how to generate personalized texture maps for any given face image, and how to define and evaluate the quality of these generated texture maps. To solve the above problems, we introduce a novel method, UVMap-ID, which is a controllable and personalized UV Map generative model. Unlike traditional large-scale training methods in 2D, we propose to fine-tune a pre-trained text-to-image diffusion model which is integrated with a face fusion module for achieving ID-driven customized generation. To support the finetuning strategy, we introduce a small-scale attribute-balanced training dataset, including high-quality textures with labeled text and Face ID. Additionally, we introduce some metrics to evaluate the multiple aspects of the textures. Finally, both quantitative and qualitative analyses demonstrate the effectiveness of our method in controllable and personalized UV Map generation. Code is publicly available via https://github.com/twowwj/UVMap-ID.
Submitted 9 August, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Exploiting Preference Elicitation in Interactive and User-centered Algorithmic Recourse: An Initial Exploration
Authors:
Seyedehdelaram Esfahani,
Giovanni De Toni,
Bruno Lepri,
Andrea Passerini,
Katya Tentori,
Massimo Zancanaro
Abstract:
Algorithmic Recourse aims to provide actionable explanations, or recourse plans, to overturn potentially unfavourable decisions taken by automated machine learning models. In this paper, we propose an interaction paradigm based on a guided interaction pattern aimed at both eliciting the users' preferences and steering them toward effective recourse interventions. In a fictional task of money lending, we compare this approach with an exploratory interaction pattern based on a combination of alternative plans and the possibility for users to freely change the configurations themselves. Our results suggest that users may recognize that the guided interaction paradigm improves efficiency. However, they also feel less freedom to experiment with "what-if" scenarios. Nevertheless, the time spent on the purely exploratory interface tends to be perceived as a lack of efficiency, which reduces attractiveness, perspicuity, and dependability. Conversely, for the guided interface, more time on the interface seems to increase its attractiveness, perspicuity, and dependability while not impacting the perceived efficiency. This might suggest that this type of interface should combine the two approaches by trying to support exploratory behavior while gently pushing toward a guided effective solution.
Submitted 8 April, 2024;
originally announced April 2024.
-
Unveiling LLMs: The Evolution of Latent Representations in a Dynamic Knowledge Graph
Authors:
Marco Bronzini,
Carlo Nicolini,
Bruno Lepri,
Jacopo Staiano,
Andrea Passerini
Abstract:
Large Language Models (LLMs) demonstrate an impressive capacity to recall a vast range of factual knowledge. However, understanding their underlying reasoning and internal mechanisms in exploiting this knowledge remains a key research area. This work unveils the factual information an LLM represents internally for sentence-level claim verification. We propose an end-to-end framework to decode factual knowledge embedded in token representations from a vector space to a set of ground predicates, showing its layer-wise evolution using a dynamic knowledge graph. Our framework employs activation patching, a vector-level technique that alters a token representation during inference, to extract encoded knowledge. Accordingly, we rely on neither training nor external models. Using factual and common-sense claims from two claim verification datasets, we showcase interpretability analyses at local and global levels. The local analysis highlights entity centrality in LLM reasoning, from claim-related information and multi-hop reasoning to representation errors causing erroneous evaluation. On the other hand, the global analysis reveals trends in the underlying evolution, such as word-based knowledge evolving into claim-related facts. By interpreting semantics from LLM latent representations and enabling graph-related analyses, this work enhances the understanding of the factual knowledge resolution process.
Submitted 6 August, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
-
Mixing Individual and Collective Behaviours to Predict Out-of-Routine Mobility
Authors:
Sebastiano Bontorin,
Simone Centellegher,
Riccardo Gallotti,
Luca Pappalardo,
Bruno Lepri,
Massimiliano Luca
Abstract:
Predicting human displacements is crucial for addressing various societal challenges, including urban design, traffic congestion, epidemic management, and migration dynamics. While predictive models like deep learning and Markov models offer insights into individual mobility, they often struggle with out-of-routine behaviours. Our study introduces an approach that dynamically integrates individual and collective mobility behaviours, leveraging collective intelligence to enhance prediction accuracy. Evaluating the model on millions of privacy-preserving trajectories across three US cities, we demonstrate its superior performance in predicting out-of-routine mobility, surpassing even advanced deep learning methods. Spatial analysis highlights the model's effectiveness near urban areas with a high density of points of interest, where collective behaviours strongly influence mobility. During disruptive events like the COVID-19 pandemic, our model retains predictive capabilities, unlike individual-based models. By bridging the gap between individual and collective behaviours, our approach offers transparent and accurate predictions, crucial for addressing contemporary mobility challenges.
Submitted 6 August, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
The long-term and disparate impact of job loss on individual mobility behaviour
Authors:
Simone Centellegher,
Marco De Nadai,
Marco Tonin,
Bruno Lepri,
Lorenzo Lucchini
Abstract:
In today's interconnected world of widespread mobility, ubiquitous social interaction, and rapid information dissemination, the demand for individuals to swiftly adapt their behaviors has increased dramatically. Timely decision-making faces new challenges due to the necessity of using finely temporal-resolved anonymised individual data to keep up with fast-paced behavioural changes. To tackle this issue, we propose a general framework that leverages privacy-enhanced GPS data from mobile devices alongside census information to infer the employment status of individuals over time. By analysing the mobility patterns of employed and unemployed individuals, we unveil significant differences in behaviours between the two groups, showing a contraction in visited locations and a general decline in the exploratory behaviour of unemployed individuals. Remarkably, these differences intensify over time since job loss, particularly affecting individuals from more vulnerable demographic groups. These findings highlight the importance of early monitoring of unemployed individuals who may face enduring levels of distress. Overall, our findings shed light on the dynamics of employment-related behaviour, emphasizing the importance of implementing timely interventions to support the unemployed and vulnerable populations.
Submitted 15 March, 2024;
originally announced March 2024.
-
The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models
Authors:
Carlo Nicolini,
Jacopo Staiano,
Bruno Lepri,
Raffaele Marino
Abstract:
A substantial gap persists in understanding the reasons behind the exceptional performance of the Transformer architecture in NLP. A particularly unexplored area involves the mechanistic description of how the distribution of parameters evolves over time during training. In this work, we suggest that looking at the time evolution of the statistical distribution of model parameters, and specifically at bifurcation effects, can help in understanding model quality, potentially reducing training costs and evaluation efforts and empirically showing the reasons behind the effectiveness of weight sparsification.
Submitted 13 March, 2024;
originally announced March 2024.
-
Putting Context in Context: the Impact of Discussion Structure on Text Classification
Authors:
Nicolò Penzo,
Antonio Longa,
Bruno Lepri,
Sara Tonelli,
Marco Guerini
Abstract:
Current text classification approaches usually focus on the content to be classified. Contextual aspects (both linguistic and extra-linguistic) are usually neglected, even in tasks based on online discussions. Still in many cases the multi-party and multi-turn nature of the context from which these elements are selected can be fruitfully exploited. In this work, we propose a series of experiments on a large dataset for stance detection in English, in which we evaluate the contribution of different types of contextual information, i.e. linguistic, structural and temporal, by feeding them as natural language input into a transformer-based model. We also experiment with different amounts of training data and analyse the topology of local discussion networks in a privacy-compliant way. Results show that structural information can be highly beneficial to text classification but only under certain circumstances (e.g. depending on the amount of training data and on discussion chain complexity). Indeed, we show that contextual information on smaller datasets from other classification tasks does not yield significant improvements. Our framework, based on local discussion networks, allows the integration of structural information, while minimising user profiling, thus preserving their privacy.
Submitted 5 February, 2024;
originally announced February 2024.
-
A Characterization Theorem for Equivariant Networks with Point-wise Activations
Authors:
Marco Pacini,
Xiaowen Dong,
Bruno Lepri,
Gabriele Santin
Abstract:
Equivariant neural networks have shown improved performance, expressiveness and sample complexity on symmetrical domains. But for some specific symmetries, representations, and choice of coordinates, the most common point-wise activations, such as ReLU, are not equivariant, hence they cannot be employed in the design of equivariant neural networks. The theorem we present in this paper describes all possible combinations of finite-dimensional representations, choice of coordinates and point-wise activations to obtain an exactly equivariant layer, generalizing and strengthening existing characterizations. Notable cases of practical relevance are discussed as corollaries. Indeed, we prove that rotation-equivariant networks can only be invariant, as it happens for any network which is equivariant with respect to connected compact groups. Then, we discuss implications of our findings when applied to important instances of exactly equivariant networks. First, we completely characterize permutation equivariant networks such as Invariant Graph Networks with point-wise nonlinearities and their geometric counterparts, highlighting a plethora of models whose expressive power and performance are still unknown. Second, we show that feature spaces of disentangled steerable convolutional neural networks are trivial representations.
Submitted 17 January, 2024;
originally announced January 2024.
-
Zero-Shot Point Cloud Registration
Authors:
Weijie Wang,
Guofeng Mei,
Bin Ren,
Xiaoshui Huang,
Fabio Poiesi,
Luc Van Gool,
Nicu Sebe,
Bruno Lepri
Abstract:
Learning-based point cloud registration approaches have significantly outperformed their traditional counterparts. However, they typically require extensive training on specific datasets. In this paper, we propose ZeroReg, the first zero-shot point cloud registration approach that eliminates the need for training on point cloud datasets. The cornerstone of ZeroReg is the novel transfer of image features from keypoints to the point cloud, enriched by aggregating information from 3D geometric neighborhoods. Specifically, we extract keypoints and features from 2D image pairs using a frozen pretrained 2D backbone. These features are then projected in 3D, and patches are constructed by searching for neighboring points. We integrate the geometric and visual features of each point using our novel parameter-free geometric decoder. Subsequently, the task of determining correspondences between point clouds is formulated as an optimal transport problem. Extensive evaluations of ZeroReg demonstrate its competitive performance against both traditional and learning-based methods. On benchmarks such as 3DMatch, 3DLoMatch, and ScanNet, ZeroReg achieves impressive Recall Ratios (RR) of over 84%, 46%, and 75%, respectively.
Submitted 8 December, 2023; v1 submitted 5 December, 2023;
originally announced December 2023.
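The abstract above states that correspondences are found by solving an optimal transport problem but does not say which solver is used. As a rough illustration of the idea only, the sketch below matches two small sets of feature vectors with entropy-regularised optimal transport (Sinkhorn iterations). The choice of Sinkhorn, the squared-distance cost, the uniform marginals, and every name and parameter here are assumptions, not details taken from ZeroReg.

```python
import numpy as np

def sinkhorn(cost, epsilon=1.0, n_iters=200):
    """Entropy-regularised optimal transport with uniform marginals.

    cost: (n, m) matrix of pairwise matching costs.
    Returns an (n, m) transport plan whose entries sum to 1.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)       # uniform mass on source points
    b = np.full(m, 1.0 / m)       # uniform mass on target points
    K = np.exp(-cost / epsilon)   # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)           # rescale rows toward marginal a
        v = b / (K.T @ u)         # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

# Toy data (hypothetical): target features are a shuffled, slightly noisy
# copy of the source features, so the true correspondence is known.
rng = np.random.default_rng(0)
feats_src = rng.normal(size=(5, 8))
perm = rng.permutation(5)
feats_tgt = feats_src[perm] + 0.01 * rng.normal(size=(5, 8))

# Squared Euclidean distance between every source/target feature pair.
cost = ((feats_src[:, None, :] - feats_tgt[None, :, :]) ** 2).sum(-1)

plan = sinkhorn(cost)
matches = plan.argmax(axis=1)  # best target index for each source point
```

In this toy setting the highest-mass entry in each row of `plan` is expected to recover the hidden permutation, i.e. `matches` should agree with `np.argsort(perm)`.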
-
Glitter or Gold? Deriving Structured Insights from Sustainability Reports via Large Language Models
Authors:
Marco Bronzini,
Carlo Nicolini,
Bruno Lepri,
Andrea Passerini,
Jacopo Staiano
Abstract:
Over the last decade, several regulatory bodies have started requiring the disclosure of non-financial information from publicly listed companies, in light of the investors' increasing attention to Environmental, Social, and Governance (ESG) issues. Publicly released information on sustainability practices is often disclosed in diverse, unstructured, and multi-modal documentation. This poses a challenge in efficiently gathering and aligning the data into a unified framework to derive insights related to Corporate Social Responsibility (CSR). Thus, using Information Extraction (IE) methods becomes an intuitive choice for delivering insightful and actionable data to stakeholders. In this study, we employ Large Language Models (LLMs), In-Context Learning, and the Retrieval-Augmented Generation (RAG) paradigm to extract structured insights related to ESG aspects from companies' sustainability reports. We then leverage graph-based representations to conduct statistical analyses concerning the extracted insights. These analyses revealed that ESG criteria cover a wide range of topics, exceeding 500, often beyond those considered in existing categorizations, and are addressed by companies through a variety of initiatives. Moreover, disclosure similarities emerged among companies from the same region or sector, validating ongoing hypotheses in the ESG literature. Lastly, by incorporating additional company attributes into our analyses, we investigated which factors impact the most on companies' ESG ratings, showing that ESG disclosure affects the obtained ratings more than other financial or company data.
Submitted 16 January, 2024; v1 submitted 9 October, 2023;
originally announced October 2023.
-
Autonomous and Human-Driven Vehicles Interacting in a Roundabout: A Quantitative and Qualitative Evaluation
Authors:
Laura Ferrarotti,
Massimiliano Luca,
Gabriele Santin,
Giorgio Previati,
Gianpiero Mastinu,
Massimiliano Gobbi,
Elena Campi,
Lorenzo Uccello,
Antonino Albanese,
Praveen Zalaya,
Alessandro Roccasalva,
Bruno Lepri
Abstract:
Optimizing traffic dynamics in an evolving transportation landscape is crucial, particularly in scenarios where autonomous vehicles (AVs) with varying levels of autonomy coexist with human-driven cars. While optimizing Reinforcement Learning (RL) policies for such scenarios is becoming more and more common, little has been said about realistic evaluations of such trained policies. This paper presents an evaluation of the effects of AV penetration among human drivers in a roundabout scenario, considering both quantitative and qualitative aspects. In particular, we learn a policy to minimize traffic jams (i.e., minimize the time to cross the scenario) and to minimize pollution in a roundabout in Milan, Italy. Through empirical analysis, we demonstrate that the presence of AVs can reduce time and pollution levels. Furthermore, we qualitatively evaluate the learned policy using a cutting-edge cockpit to assess its performance in near-real-world conditions. To gauge the practicality and acceptability of the policy, we conduct evaluations with human participants using the simulator, focusing on a range of metrics like traffic smoothness and safety perception. In general, our findings show that human-driven vehicles benefit from optimizing AV dynamics. Also, participants in the study highlight that the scenario with 80% AVs is perceived as safer than the scenario with 20%. The same result is obtained for traffic smoothness perception.
Submitted 23 February, 2024; v1 submitted 15 September, 2023;
originally announced September 2023.
-
Turn Fake into Real: Adversarial Head Turn Attacks Against Deepfake Detection
Authors:
Weijie Wang,
Zhengyu Zhao,
Nicu Sebe,
Bruno Lepri
Abstract:
Malicious use of deepfakes leads to serious public concerns and reduces people's trust in digital media. Although effective deepfake detectors have been proposed, they are substantially vulnerable to adversarial attacks. To evaluate the detector's robustness, recent studies have explored various attacks. However, all existing attacks are limited to 2D image perturbations, which are hard to translate into real-world facial changes. In this paper, we propose adversarial head turn (AdvHeat), the first attempt at 3D adversarial face views against deepfake detectors, based on face view synthesis from a single-view fake image. Extensive experiments validate the vulnerability of various detectors to AdvHeat in realistic, black-box scenarios. For example, AdvHeat based on a simple random search yields a high attack success rate of 96.8% with 360 searching steps. When additional query access is allowed, we can further reduce the step budget to 50. Additional analyses demonstrate that AdvHeat is better than conventional attacks on both the cross-detector transferability and robustness to defenses. The adversarial images generated by AdvHeat are also shown to have natural looks. Our code, including that for generating a multi-view dataset consisting of 360 synthetic views for each of 1000 IDs from FaceForensics++, is available at https://github.com/twowwj/AdvHeaT.
Submitted 3 September, 2023;
originally announced September 2023.
-
Temporal clustering of social interactions trades-off disease spreading and knowledge diffusion
Authors:
Giulia Cencetti,
Lorenzo Lucchini,
Gabriele Santin,
Federico Battiston,
Esteban Moro,
Alex Pentland,
Bruno Lepri
Abstract:
Non-pharmaceutical measures such as preventive quarantines, remote working, school and workplace closures, lockdowns, etc. have shown effectiveness from an epidemic control perspective; however, they also have significant negative consequences on social life and relationships, work routines, and community engagement. In particular, complex ideas, work and school collaborations, innovative discoveries, and resilient norms formation and maintenance, which often require face-to-face interactions of two or more parties to be developed and synergically coordinated, are particularly affected. In this study, we propose an alternative hybrid solution that balances the slowdown of epidemic diffusion with the preservation of face-to-face interactions. Our approach involves a two-step partitioning of the population. First, we tune the level of node clustering, creating "social bubbles" with increased contacts within each bubble and fewer outside, while maintaining the average number of contacts in each network. Second, we tune the level of temporal clustering by pairing, for a certain time interval, nodes from specific social bubbles. Our results demonstrate that a hybrid approach can achieve better trade-offs between epidemic control and complex knowledge diffusion. The versatility of our model enables tuning and refining clustering levels to optimally achieve the desired trade-off, based on the potentially changing characteristics of a disease or knowledge diffusion process.
Submitted 14 August, 2023;
originally announced August 2023.
-
Adaptation of Student Behavioural Routines during COVID-19: A Multimodal Approach
Authors:
Nicolò A. Girardini,
Simone Centellegher,
Andrea Passerini,
Ivano Bison,
Fausto Giunchiglia,
Bruno Lepri
Abstract:
One population group that had to significantly adapt and change their behaviour during the COVID-19 pandemic is students. While previous studies have extensively investigated the impact of the pandemic on their psychological well-being and academic performance, limited attention has been given to their activity routines. In this work, we analyze students' behavioural changes by examining qualitative and quantitative differences in their daily routines between two distinct periods (2018 and 2020). Using an Experience Sampling Method (ESM) that captures multimodal self-reported data on students' activity, locations and sociality, we apply Non-Negative Matrix Factorization (NMF) to extract meaningful behavioural components, and quantify the variations in behaviour between students in 2018 and 2020. Surprisingly, despite the presence of COVID-19 restrictions, we find minimal changes in the activities performed by students, and the diversity of activities also remains largely unaffected. Leveraging the richness of the data at our disposal, we discover that the adaptation of activities to the pandemic primarily occurred in the location and sociality dimensions.
Submitted 14 June, 2023;
originally announced June 2023.
-
T2TD: Text-3D Generation Model based on Prior Knowledge Guidance
Authors:
Weizhi Nie,
Ruidong Chen,
Weijie Wang,
Bruno Lepri,
Nicu Sebe
Abstract:
In recent years, 3D models have been utilized in many applications, such as autonomous driving, 3D reconstruction, VR, and AR. However, the scarcity of 3D model data does not meet practical demands. Thus, generating high-quality 3D models efficiently from textual descriptions is a promising but challenging way to solve this problem. In this paper, inspired by the ability of human beings to complement visual information details from ambiguous descriptions based on their own experience, we propose a novel text-3D generation model (T2TD), which introduces the related shapes or textual information as the prior knowledge to improve the performance of the 3D generation model. In this process, we first introduce the text-3D knowledge graph to save the relationship between 3D models and textual semantic information, which can provide the related shapes to guide the target 3D model generation. Second, we integrate an effective causal inference model to select useful feature information from these related shapes, which removes the unrelated shape information and only maintains feature information that is strongly relevant to the textual description. Meanwhile, to effectively integrate multi-modal prior knowledge into textual information, we adopt a novel multi-layer transformer structure to progressively fuse related shape and textual information, which can effectively compensate for the lack of structural information in the text and enhance the final performance of the 3D generation model. The final experimental results demonstrate that our approach significantly improves 3D model generation quality and outperforms the SOTA methods on the text2shape datasets.
Submitted 25 May, 2023;
originally announced May 2023.
-
The Rhythms of Transient Relationships: Allocating time between weekdays and weekends
Authors:
Valentín Vergara Hidd,
Mailun Zhang,
Simone Centellegher,
Sam G. B. Roberts,
Bruno Lepri,
Eduardo López
Abstract:
A fundamental question of any new relationship is, will it last? Transient relationships, recently defined by the authors, are an ideal type of social tie to explore this question: these relationships are characterized by distinguishable starting and ending temporal points, linking the question of tie longevity to relationship finite lifetime. In this study, we use mobile phone data sets from the UK and Italy to analyze the weekly allocation of time invested in maintaining transient relationships. We find that more relationships are created during weekdays, with a greater proportion of them receiving more contact during these days of the week in the long term. The smaller group of relationships that receive more phone calls during the weekend tend to remain active for more time. We uncover a sorting process by which some ties are moved from weekdays to weekends and vice versa, mostly in the first half of the relationship. This process also carries more information about the ultimate lifetime of a tie than the part of the week when the relationship started, which suggests an early evaluation period that leads to a decision on how to allocate time to different types of transient ties.
Submitted 28 August, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Socioeconomic disparities in mobility behavior during the COVID-19 pandemic in developing countries
Authors:
Lorenzo Lucchini,
Ollin Langle-Chimal,
Lorenzo Candeago,
Lucio Melito,
Alex Chunet,
Aleister Montfort,
Bruno Lepri,
Nancy Lozano-Gracia,
Samuel P. Fraiberger
Abstract:
Mobile phone data have played a key role in quantifying human mobility during the COVID-19 pandemic. Existing studies on mobility patterns have primarily focused on regional aggregates in high-income countries, obfuscating the accentuated impact of the pandemic on the most vulnerable populations. By combining geolocation data from mobile phones and population census for 6 middle-income countries across 3 continents between March and December 2020, we uncovered common disparities in the behavioral response to the pandemic across socioeconomic groups. When the pandemic hit, urban users living in low-wealth neighborhoods were less likely to respond by self-isolating at home, relocating to rural areas, or refraining from commuting to work. The gap in the behavioral responses between socioeconomic groups persisted during the entire observation period. Among low-wealth users, those who used to commute to work in high-wealth neighborhoods pre-pandemic were particularly at risk, facing both the reduction in activity in high-wealth neighborhoods and a higher likelihood of being affected by public transport closures due to their longer commute. While confinement policies were predominantly country-wide, these results suggest a role for place-based policies informed by mobility data to target aid to the most vulnerable.
Submitted 19 October, 2023; v1 submitted 11 May, 2023;
originally announced May 2023.
-
Analysis of COVID-19 first wave in the US based on demographic, mobility, and environmental variables
Authors:
Dario Spiller,
Gabriele Santin,
Alessandro Sebastianelli,
Lorenzo Lucchini,
Riccardo Gallotti,
Brennan Lake,
Silvia Liberata Ullo,
Bertrand Le Saux,
Bruno Lepri
Abstract:
COVID-19 had a strong and disruptive impact on our society, and yet further analyses of the most relevant factors explaining the spread of the pandemic are needed. Interdisciplinary studies linking epidemiological, mobility, environmental, and socio-demographic data analysis can help in understanding how historical conditions, concurrent social policies and environmental factors affected the evolution of the pandemic crisis. This work deals with a regression analysis linking COVID-19 mortality to socio-demographic, mobility, and environmental data in the US during the first half of 2020, i.e., during the COVID-19 pandemic first wave. This study can provide very useful insights about risk factors enhancing mortality rates before non-pharmaceutical interventions or vaccination campaigns took place. Our cross-sectional ecological regression analysis demonstrates that, when considering the entire US area, the socio-demographic variables globally play the most important role with respect to environmental and mobility variables in describing COVID-19 mortality. Compared to the complete generalized linear model considering all socio-demographic, mobility, and environmental data, the regression based only on socio-demographic data provides a better approximation and proves to be a better explanatory model when compared to the mobility-based and environmental-based models. However, when looking at single entries within each of the three groups, we see that the mobility data can become relevant descriptive predictors at local scale, as in New Jersey where the time spent at work is one of the most relevant explanatory variables, while environmental data play contradictory roles.
Submitted 22 February, 2023;
originally announced February 2023.
-
Graph Neural Networks for temporal graphs: State of the art, open challenges, and opportunities
Authors:
Antonio Longa,
Veronica Lachi,
Gabriele Santin,
Monica Bianchini,
Bruno Lepri,
Pietro Lio,
Franco Scarselli,
Andrea Passerini
Abstract:
Graph Neural Networks (GNNs) have become the leading paradigm for learning on (static) graph-structured data. However, many real-world systems are dynamic in nature, since the graph and node/edge attributes change over time. In recent years, GNN-based models for temporal graphs have emerged as a promising area of research to extend the capabilities of GNNs. In this work, we provide the first comprehensive overview of the current state-of-the-art of temporal GNN, introducing a rigorous formalization of learning settings and tasks and a novel taxonomy categorizing existing approaches in terms of how the temporal aspect is represented and processed. We conclude the survey with a discussion of the most relevant open challenges for the field, from both research and application perspectives.
Submitted 8 July, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
-
Emergence of complex network topologies from flow-weighted optimization of network efficiency
Authors:
Sebastiano Bontorin,
Giulia Cencetti,
Riccardo Gallotti,
Bruno Lepri,
Manlio De Domenico
Abstract:
Transportation and distribution networks are a class of spatial networks that have been of interest in recent years. These networks are often characterized by the presence of complex structures such as central loops paired with peripheral branches, which can appear both in natural and man-made systems, such as subway and railway networks. In this study, we investigate the conditions for the emergence of these non-trivial topological structures in the context of human transportation in cities. We propose a minimal model for spatial network generation, where a network lattice acts as a spatial substrate and edge velocities and distances define an effective temporal distance which quantifies the efficiency in exploring the urban space. Complex network topologies can be recovered from the optimization of joint network paths, and we study how the interplay between a flow probability between two nodes in space and the associated travel cost influences the resulting optimal network. From the perspective of urban transportation, we simulate these flows by means of human mobility models to obtain Origin-Destination matrices. We find that when using simple lattices, the obtained optimal topologies transition from tree-like structures to more regular networks, depending on the spatial range of flows. Remarkably, we find that branches paired with large loop structures appear as optimal structures when the network is optimized for an interplay between heterogeneous mobility patterns of small-range travels and longer-range ones typical of commuting. Finally, we show that our framework is able to recover the statistical spatial properties of the Greater London Area subway network.
Submitted 20 January, 2023;
originally announced January 2023.
-
Inequality, Crime and Public Health: A Survey of Emerging Trends in Urban Data Science
Authors:
Massimiliano Luca,
Gian Maria Campedelli,
Simone Centellegher,
Michele Tizzoni,
Bruno Lepri
Abstract:
Urban agglomerations are constantly and rapidly evolving ecosystems, with globalization and increasing urbanization posing new challenges in sustainable urban development well summarized in the United Nations' Sustainable Development Goals (SDGs). The advent of the digital age generated by modern alternative data sources provides new tools to tackle these challenges with spatio-temporal scales that were previously unavailable with census statistics. In this review, we present how new digital data sources are employed to provide data-driven insights to study and track (i) urban crime and public safety; (ii) socioeconomic inequalities and segregation; and (iii) public health, with a particular focus on the city scale.
Submitted 15 December, 2022;
originally announced December 2022.
-
Explaining the Explainers in Graph Neural Networks: a Comparative Study
Authors:
Antonio Longa,
Steve Azzolin,
Gabriele Santin,
Giulia Cencetti,
Pietro Liò,
Bruno Lepri,
Andrea Passerini
Abstract:
Following a fast initial breakthrough in graph based learning, Graph Neural Networks (GNNs) have reached a widespread application in many science and engineering fields, prompting the need for methods to understand their decision process.
GNN explainers have started to emerge in recent years, with a multitude of methods either novel or adapted from other domains. To sort out this plethora of alternative approaches, several studies have benchmarked the performance of different explainers in terms of various explainability metrics. However, these earlier works make no attempts at providing insights into why different GNN architectures are more or less explainable, or which explainer should be preferred in a given setting.
In this survey, we fill these gaps by devising a systematic experimental study, which tests ten explainers on eight representative architectures trained on six carefully designed graph and node classification datasets. With our results we provide key insights on the choice and applicability of GNN explainers, we isolate key components that make them usable and successful and provide recommendations on how to avoid common interpretation pitfalls. We conclude by highlighting open questions and directions of possible future research.
Submitted 1 July, 2024; v1 submitted 27 October, 2022;
originally announced October 2022.
-
BIASeD: Bringing Irrationality into Automated System Design
Authors:
Aditya Gulati,
Miguel Angel Lozano,
Bruno Lepri,
Nuria Oliver
Abstract:
Human perception, memory and decision-making are impacted by tens of cognitive biases and heuristics that influence our actions and decisions. Despite the pervasiveness of such biases, they are generally not leveraged by today's Artificial Intelligence (AI) systems that model human behavior and interact with humans. In this theoretical paper, we claim that the future of human-machine collaboration will entail the development of AI systems that model, understand and possibly replicate human cognitive biases. We propose the need for a research agenda on the interplay between human cognitive biases and Artificial Intelligence. We categorize existing cognitive biases from the perspective of AI systems, identify three broad areas of interest and outline research directions for the design of AI systems that have a better understanding of our own biases.
Submitted 1 December, 2023; v1 submitted 30 September, 2022;
originally announced October 2022.
-
Smooth image-to-image translations with latent space interpolations
Authors:
Yahui Liu,
Enver Sangineto,
Yajing Chen,
Linchao Bao,
Haoxian Zhang,
Nicu Sebe,
Bruno Lepri,
Marco De Nadai
Abstract:
Multi-domain image-to-image (I2I) translations can transform a source image according to the style of a target domain. One important, desired characteristic of these transformations is their graduality, which corresponds to a smooth change between the source and the target image when their respective latent-space representations are linearly interpolated. However, state-of-the-art methods usually perform poorly when evaluated using inter-domain interpolations, often producing abrupt changes in the appearance or non-realistic intermediate images. In this paper, we argue that one of the main reasons behind this problem is the lack of sufficient inter-domain training data and we propose two different regularization methods to alleviate this issue: a new shrinkage loss, which compacts the latent space, and a Mixup data-augmentation strategy, which flattens the style representations between domains. We also propose a new metric to quantitatively evaluate the degree of the interpolation smoothness, an aspect which is not sufficiently covered by the existing I2I translation metrics. Using both our proposed metric and standard evaluation protocols, we show that our regularization techniques can improve the state-of-the-art multi-domain I2I translations by a large margin. Our code will be made publicly available upon the acceptance of this article.
Submitted 14 March, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Rethinking the Learning Paradigm for Facial Expression Recognition
Authors:
Weijie Wang,
Nicu Sebe,
Bruno Lepri
Abstract:
Due to the subjective crowdsourcing annotations and the inherent inter-class similarity of facial expressions, the real-world Facial Expression Recognition (FER) datasets usually exhibit ambiguous annotation. To simplify the learning paradigm, most previous methods convert ambiguous annotation results into precise one-hot annotations and train FER models in an end-to-end supervised manner. In this paper, we rethink the existing training paradigm and propose that it is better to use weakly supervised strategies to train FER models with original ambiguous annotation.
Submitted 3 September, 2024; v1 submitted 30 September, 2022;
originally announced September 2022.
-
Play&Go Corporate: An End-to-End Solution for Facilitating Urban Cyclability
Authors:
Antonio Bucchiarone,
Simone Bassanelli,
Massimiliano Luca,
Simone Centellegher,
Piergiorgio Cipriano,
Luca Giovannini,
Bruno Lepri,
Annapaola Marconi
Abstract:
Mobility plays a fundamental role in modern cities. How citizens experience the urban environment, access city core services, and participate in city life strongly depends on its mobility organization and efficiency. The challenges that municipalities face are very ambitious: on the one hand, administrators must guarantee their citizens the right to mobility and to easily access local services; on the other hand, they need to minimize the economic, social, and environmental costs of the mobility system. Municipalities are increasingly facing problems of traffic congestion, road safety, energy dependency and air pollution, and therefore encouraging a shift towards sustainable mobility habits based on active mobility is of central importance. Active modes, such as cycling, should be particularly encouraged, especially for local recurrent journeys (e.g., home-to-school, home-to-work). In this context, addressing and mitigating commuter-generated traffic requires engaging public and private stakeholders through innovative and collaborative approaches that focus not only on supply (e.g., roads and vehicles) but also on transportation demand management. In this paper, we present an end-to-end solution, called Play&Go Corporate, for enabling urban cyclability and its concrete exploitation in the realization of a home-to-work sustainable mobility campaign (i.e., Bike2Work) targeting employees of public and private companies. To evaluate the effectiveness of the proposed solution, we developed two analyses: the first to carefully analyze the user experience and any behaviour change related to the Bike2Work mobility campaign, and the second to demonstrate how, by exploiting the collected data, we can potentially inform and guide the involved municipality (i.e., Ferrara, a city in Northern Italy) in improving urban cyclability.
Submitted 22 April, 2023; v1 submitted 6 September, 2022;
originally announced September 2022.
-
Spatial Entropy as an Inductive Bias for Vision Transformers
Authors:
Elia Peruzzo,
Enver Sangineto,
Yahui Liu,
Marco De Nadai,
Wei Bi,
Bruno Lepri,
Nicu Sebe
Abstract:
Recent work on Vision Transformers (VTs) showed that introducing a local inductive bias in the VT architecture helps reduce the number of samples necessary for training. However, the architecture modifications lead to a loss of generality of the Transformer backbone, partially contradicting the push towards the development of uniform architectures, shared, e.g., by both the Computer Vision and the Natural Language Processing areas. In this work, we propose a different and complementary direction, in which a local bias is introduced using an auxiliary self-supervised task, performed jointly with standard supervised training. Specifically, we exploit the observation that the attention maps of VTs, when trained with self-supervision, can contain a semantic segmentation structure which does not spontaneously emerge when training is supervised. Thus, we explicitly encourage the emergence of this spatial clustering as a form of training regularization. In more detail, we exploit the assumption that, in a given image, objects usually correspond to a few connected regions, and we propose a spatial formulation of the information entropy to quantify this object-based inductive bias. By minimizing the proposed spatial entropy, we include an additional self-supervised signal during training. Using extensive experiments, we show that the proposed regularization leads to equivalent or better results than other VT proposals which include a local bias by changing the basic Transformer architecture, and it can drastically boost the VT final accuracy when using small-medium training sets. The code is available at https://github.com/helia95/SAR.
Submitted 14 March, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
Personalized Algorithmic Recourse with Preference Elicitation
Authors:
Giovanni De Toni,
Paolo Viappiani,
Stefano Teso,
Bruno Lepri,
Andrea Passerini
Abstract:
Algorithmic Recourse (AR) is the problem of computing a sequence of actions that -- once performed by a user -- overturns an undesirable machine decision. It is paramount that the sequence of actions does not require too much effort for users to implement. Yet, most approaches to AR assume that actions cost the same for all users, and thus may recommend unfairly expensive recourse plans to certain users. Prompted by this observation, we introduce PEAR, the first human-in-the-loop approach capable of providing personalized algorithmic recourse tailored to the needs of any end-user. PEAR builds on insights from Bayesian Preference Elicitation to iteratively refine an estimate of the costs of actions by asking choice set queries to the target user. The queries themselves are computed by maximizing the Expected Utility of Selection, a principled measure of information gain accounting for uncertainty on both the cost estimate and the user's responses. PEAR integrates elicitation into a Reinforcement Learning agent coupled with Monte Carlo Tree Search to quickly identify promising recourse plans. Our empirical evaluation on real-world datasets highlights how PEAR produces high-quality personalized recourse in only a handful of iterations.
Submitted 23 January, 2024; v1 submitted 26 May, 2022;
originally announced May 2022.
-
Generating fine-grained surrogate temporal networks
Authors:
Antonio Longa,
Giulia Cencetti,
Sune Lehmann,
Andrea Passerini,
Bruno Lepri
Abstract:
Temporal networks are essential for modeling and understanding systems whose behavior varies in time, from social interactions to biological systems. Often, however, real-world data are prohibitively expensive to collect at a large scale or unshareable due to privacy concerns. A promising way to bypass the problem consists in generating arbitrarily large and anonymized synthetic graphs with the properties of real-world networks, namely 'surrogate networks'. Until now, the generation of realistic surrogate temporal networks has remained an open problem, due to the difficulty of capturing both the temporal and topological properties of the input network, as well as their correlations, in a scalable model. Here, we propose a novel and simple method for generating surrogate temporal networks. Our method decomposes the input network into star-like structures evolving in time. Then those structures are used as building blocks to generate a surrogate temporal network. Our model vastly outperforms current methods across multiple examples of temporal networks in terms of both topological and dynamical similarity. We further show that beyond generating realistic interaction patterns, our method is able to capture the intrinsic temporal periodicity of temporal networks, all with an execution time lower than competing methods by multiple orders of magnitude. The simplicity of our algorithm makes it easily interpretable, extendable and algorithmically scalable.
Submitted 22 August, 2023; v1 submitted 18 May, 2022;
originally announced May 2022.
-
The Stability of Transient Relationships
Authors:
Valentin Vergara Hidd,
Eduardo Lopez,
Simone Centellegher,
Sam Roberts,
Bruno Lepri,
Robin Dunbar
Abstract:
In contrast to long-term relationships, far less is known about the temporal evolution of transient relationships, although these constitute a substantial fraction of people's communication networks. Previous literature suggests that ratings of relationship emotional intensity decay gradually until the relationship ends. Using mobile phone data from three countries (US, UK, and Italy), we demonstrate that the volume of communication between ego and its transient alters does not display such a systematic decay, instead showing a lack of any dominant trends. This means that the communication volume of egos to groups of similar transient alters is stable. We show that alters with longer lifetimes in ego's network receive more calls, with the lifetime of the relationship being predictable from call volume within the first few weeks of first contact. This is observed across all three countries, which include samples of egos at different life stages. The relation between early call volume and lifetime is consistent with the suggestion that individuals initially engage with a new alter so as to evaluate their potential as a tie in terms of homophily.
Submitted 23 May, 2023; v1 submitted 11 April, 2022;
originally announced April 2022.
-
A Framework for Verifiable and Auditable Federated Anomaly Detection
Authors:
Gabriele Santin,
Inna Skarbovsky,
Fabiana Fournier,
Bruno Lepri
Abstract:
Federated Learning is an emerging approach to manage cooperation between a group of agents for the solution of Machine Learning tasks, with the goal of improving each agent's performance without disclosing any data. In this paper we present a novel algorithmic architecture that tackles this problem in the particular case of Anomaly Detection (or classification of rare events), a setting where typical applications often comprise data with sensitive information, but where the scarcity of anomalous examples encourages collaboration. We show how Random Forests can be used as a tool for the development of accurate classifiers with an effective insight-sharing mechanism that does not break the data integrity. Moreover, we explain how the new architecture can be readily integrated in a blockchain infrastructure to ensure the verifiable and auditable execution of the algorithm. Furthermore, we discuss how this work may set the basis for a more general approach for the design of federated ensemble-learning methods beyond the specific task and architecture discussed in this paper.
Submitted 15 March, 2022;
originally announced March 2022.
-
Reprogramming FairGANs with Variational Auto-Encoders: A New Transfer Learning Model
Authors:
Beatrice Nobile,
Gabriele Santin,
Bruno Lepri,
Pierpaolo Brutti
Abstract:
Fairness-aware GANs (FairGANs) exploit the mechanisms of Generative Adversarial Networks (GANs) to impose fairness on the generated data, freeing them from both disparate impact and disparate treatment. Given the model's advantages and performance, we introduce a novel learning framework to transfer a pre-trained FairGAN to other tasks. This reprogramming process has the goal of maintaining the FairGAN's main targets of data utility, classification utility, and data fairness, while widening its applicability and ease of use. In this paper we present the technical extensions required to adapt the original architecture to this new framework (and in particular the use of Variational Auto-Encoders), and discuss the benefits, trade-offs, and limitations of the new model.
Submitted 11 March, 2022;
originally announced March 2022.
-
Trajectory Test-Train Overlap in Next-Location Prediction Datasets
Authors:
Massimiliano Luca,
Luca Pappalardo,
Bruno Lepri,
Gianni Barlacchi
Abstract:
Next-location prediction, consisting of forecasting a user's location given their historical trajectories, has important implications in several fields, such as urban planning, geo-marketing, and disease spreading. Several predictors have been proposed in the last few years to address it, including last-generation ones based on deep learning. This paper tests the generalization capability of these predictors on public mobility datasets, stratifying the datasets by whether the trajectories in the test set also appear fully or partially in the training set. We consistently discover a severe problem of trajectory overlapping in all analyzed datasets, highlighting that predictors memorize trajectories while having limited generalization capacities. We thus propose a methodology to rerank the outputs of the next-location predictors based on spatial mobility patterns. With these techniques, we significantly improve the predictors' generalization capability, with a relative improvement in accuracy of up to 96.15% on the trajectories that cannot be memorized (i.e., low overlap with the training set).
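The abstract does not say how test-train overlap is measured. As a rough illustration only, the sketch below assumes one plausible definition: the share of a test trajectory covered by its longest contiguous run of locations that also appears in some training trajectory. The function names and the definition itself are hypothetical, not taken from the paper.

```python
def longest_common_run(a, b):
    """Length of the longest contiguous run of locations shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the run that ended at the previous position in both.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def overlap(test_traj, train_trajs):
    """Assumed overlap score in [0, 1]: best shared run / test length."""
    if not test_traj:
        return 0.0
    best = max((longest_common_run(test_traj, t) for t in train_trajs), default=0)
    return best / len(test_traj)


if __name__ == "__main__":
    train = [["x", "b", "c", "y"], ["a", "z"]]
    print(overlap(["a", "b", "c", "d"], train))
```

Under this assumed score, test trajectories could be binned (for example, below or above some threshold) to compare a predictor's accuracy on low-overlap and high-overlap subsets.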
Submitted 7 March, 2022;
originally announced March 2022.
-
Generating Synthetic Mobility Networks with Generative Adversarial Networks
Authors:
Giovanni Mauro,
Massimiliano Luca,
Antonio Longa,
Bruno Lepri,
Luca Pappalardo
Abstract:
The increasingly crucial role of human displacements in complex societal phenomena, such as traffic congestion, segregation, and the diffusion of epidemics, is attracting the interest of scientists from several disciplines. In this article, we address mobility network generation, i.e., generating a city's entire mobility network, a weighted directed graph in which nodes are geographic locations and weighted edges represent people's movements between those locations, thus describing the entire set of mobility flows within a city. Our solution is MoGAN, a model based on Generative Adversarial Networks (GANs) to generate realistic mobility networks. We conduct extensive experiments on public datasets of bike and taxi rides to show that MoGAN outperforms the classical Gravity and Radiation models regarding the realism of the generated networks. Our model can be used for data augmentation and for performing simulations and what-if analysis.
Submitted 14 December, 2022; v1 submitted 22 February, 2022;
originally announced February 2022.
-
Synthesizing explainable counterfactual policies for algorithmic recourse with program synthesis
Authors:
Giovanni De Toni,
Bruno Lepri,
Andrea Passerini
Abstract:
Being able to provide counterfactual interventions - sequences of actions we would have had to take for a desirable outcome to happen - is essential to explain how to change an unfavourable decision by a black-box machine learning model (e.g., being denied a loan request). Existing solutions have mainly focused on generating feasible interventions without providing explanations on their rationale. Moreover, they need to solve a separate optimization problem for each user. In this paper, we take a different approach and learn a program that outputs a sequence of explainable counterfactual actions given a user description and a causal graph. We leverage program synthesis techniques, reinforcement learning coupled with Monte Carlo Tree Search for efficient exploration, and rule learning to extract explanations for each recommended action. An experimental evaluation on synthetic and real-world datasets shows how our approach generates effective interventions by making orders of magnitude fewer queries to the black-box classifier with respect to existing solutions, with the additional benefit of complementing them with interpretable explanations.
Submitted 12 October, 2022; v1 submitted 18 January, 2022;
originally announced January 2022.
-
Modeling International Mobility using Roaming Cell Phone Traces during COVID-19 Pandemic
Authors:
Massimiliano Luca,
Bruno Lepri,
Enrique Frias-Martinez,
Andra Lutu
Abstract:
Most of the studies related to human mobility are focused on intra-country mobility. However, there are many scenarios (e.g., spreading diseases, migration) in which timely data on international commuters are vital. Mobile phones represent a unique opportunity to monitor international mobility flows in a timely manner and with proper spatial aggregation. This work proposes using roaming data generated by mobile phones to model incoming and outgoing international mobility. We use the gravity and radiation models to capture mobility flows before and during the introduction of non-pharmaceutical interventions. However, traditional models have some limitations: for instance, mobility restrictions are not explicitly captured and may play a crucial role. To overcome such limitations, we propose the COVID Gravity Model (CGM), namely an extension of the traditional gravity model that is tailored for the pandemic scenario. The proposed approach outperforms the traditional models in terms of accuracy by 126.9% for incoming mobility and by 63.9% for outgoing mobility flows.
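For readers unfamiliar with the traditional gravity model mentioned above, it is commonly written as T_ij = K * m_i^alpha * m_j^beta / d_ij^gamma, where m_i and m_j are the sizes (e.g., populations) of origin and destination and d_ij is their distance. A minimal sketch of this textbook form follows; the parameter names and default exponents are illustrative assumptions, not fitted values from the paper, and the additional terms of the CGM are not described in the abstract and are therefore not reproduced.

```python
def gravity_flow(m_origin, m_dest, distance, k=1.0, alpha=1.0, beta=1.0, gamma=2.0):
    """Textbook gravity-model flow estimate between two places.

    All defaults are illustrative assumptions; in practice k, alpha, beta
    and gamma are fitted to observed flows.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return k * (m_origin ** alpha) * (m_dest ** beta) / (distance ** gamma)


if __name__ == "__main__":
    # With gamma = 2, doubling the distance divides the estimated flow by four.
    print(gravity_flow(100.0, 200.0, 10.0))
    print(gravity_flow(100.0, 200.0, 20.0))
```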
Submitted 21 March, 2022; v1 submitted 7 January, 2022;
originally announced January 2022.
-
An Efficient Procedure for Mining Egocentric Temporal Motifs
Authors:
Antonio Longa,
Giulia Cencetti,
Bruno Lepri,
Andrea Passerini
Abstract:
Temporal graphs are structures which model relational data between entities that change over time. Due to the complex structure of data, mining statistically significant temporal subgraphs, also known as temporal motifs, is a challenging task. In this work, we present an efficient technique for extracting temporal motifs in temporal networks. Our method is based on the novel notion of egocentric temporal neighborhoods, namely multi-layer structures centered on an ego node. Each temporal layer of the structure consists of the first-order neighborhood of the ego node, and corresponding nodes in sequential layers are connected by an edge. The strength of this approach lies in the possibility of encoding these structures into a unique bit vector, thus bypassing the problem of graph isomorphism in searching for temporal motifs. This allows our algorithm to mine substantially larger motifs than alternative approaches. Furthermore, by focusing on the temporal dynamics of the interactions of a specific node, our model allows mining temporal motifs which are visibly interpretable. Experiments on a number of complex networks of social interactions confirm the advantage of the proposed approach over alternative non-egocentric solutions. The egocentric procedure is indeed more efficient in revealing similarities and discrepancies among different social environments, independently of the different technologies used to collect data, which instead affect standard non-egocentric measures.
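The abstract states only that egocentric temporal neighborhoods can be encoded into a unique bit vector. One way such an encoding could work is sketched below, purely as an assumption: each neighbor gets a bit pattern recording in which layers it is connected to the ego, and the patterns are sorted before concatenation so that the result does not depend on neighbor labels. The function name and the scheme are hypothetical and may differ from the paper's actual encoding.

```python
def etn_signature(layers):
    """Label-independent bit string for an egocentric temporal neighborhood.

    `layers` is a list of sets; layers[t] holds the ego's neighbors at time t.
    """
    neighbors = set().union(*layers)
    patterns = sorted(
        "".join("1" if n in layer else "0" for layer in layers)
        for n in neighbors
    )
    # Sorting makes the result invariant to how neighbors are named, so two
    # structurally identical neighborhoods compare equal as plain strings.
    return "".join(patterns)


if __name__ == "__main__":
    print(etn_signature([{"a", "b"}, {"b"}]))
    print(etn_signature([{"x", "y"}, {"x"}]))
```

Because equal signatures can be compared and counted with ordinary string or hash operations, a scheme of this kind avoids explicit graph-isomorphism tests when tallying motif occurrences.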
Submitted 4 October, 2021;
originally announced October 2021.
-
ISF-GAN: An Implicit Style Function for High-Resolution Image-to-Image Translation
Authors:
Yahui Liu,
Yajing Chen,
Linchao Bao,
Nicu Sebe,
Bruno Lepri,
Marco De Nadai
Abstract:
Recently, there has been an increasing interest in image editing methods that employ pre-trained unconditional image generators (e.g., StyleGAN). However, applying these methods to translate images to multiple visual domains remains challenging. Existing works often do not preserve the domain-invariant part of the image (e.g., the identity in human face translations), usually do not handle multiple domains, or do not allow for multi-modal translations. This work proposes an implicit style function (ISF) to straightforwardly achieve multi-modal and multi-domain image-to-image translation from pre-trained unconditional generators. The ISF manipulates the semantics of an input latent code so that the image generated from it lies in the desired visual domain. Our results on human face and animal manipulations show significant improvements over the baselines. Our model enables cost-effective multi-modal unsupervised image-to-image translations at high resolution using pre-trained unconditional GANs. The code and data are available at: https://github.com/yhlleo/stylegan-mmuit.
Submitted 23 February, 2022; v1 submitted 26 September, 2021;
originally announced September 2021.
-
Click to Move: Controlling Video Generation with Sparse Motion
Authors:
Pierfrancesco Ardino,
Marco De Nadai,
Bruno Lepri,
Elisa Ricci,
Stéphane Lathuilière
Abstract:
This paper introduces Click to Move (C2M), a novel framework for video generation where the user can control the motion of the synthesized video through mouse clicks specifying simple object trajectories of the key objects in the scene. Our model receives as input an initial frame, its corresponding segmentation map and the sparse motion vectors encoding the input provided by the user. It outputs a plausible video sequence starting from the given frame and with a motion that is consistent with user input. Notably, our proposed deep architecture incorporates a Graph Convolution Network (GCN) modelling the movements of all the objects in the scene in a holistic manner and effectively combining the sparse user motion information and image features. Experimental results show that C2M outperforms existing methods on two publicly available datasets, thus demonstrating the effectiveness of our GCN framework at modelling object interactions. The source code is publicly available at https://github.com/PierfrancescoArdino/C2M.
Submitted 19 August, 2021;
originally announced August 2021.
-
Living in a pandemic: adaptation of individual mobility and social activity in the US
Authors:
Lorenzo Lucchini,
Simone Centellegher,
Luca Pappalardo,
Riccardo Gallotti,
Filippo Privitera,
Bruno Lepri,
Marco De Nadai
Abstract:
The non-pharmaceutical interventions (NPIs), aimed at reducing the diffusion of the COVID-19 pandemic, have dramatically influenced our behaviour in everyday life. In this work, we study how individuals adapted their daily movements and person-to-person contact patterns over time in response to the COVID-19 pandemic and the NPIs. We leverage longitudinal GPS mobility data of hundreds of thousands of anonymous individuals in four US states and empirically show the dramatic disruption in people's lives. We find that local interventions did not just impact the number of visits to different venues but also how people experience them. Individuals spend less time in venues, preferring simpler and more predictable routines and reducing person-to-person contact activities. Moreover, we show that the stringency of interventions alone does not explain the number and duration of visits to venues: individual patterns of visits seem to be influenced by the local severity of the pandemic and a risk adaptation factor, which increases people's mobility regardless of the stringency of interventions.
Submitted 17 August, 2021; v1 submitted 26 July, 2021;
originally announced July 2021.
-
Measuring close proximity interactions in summer camps during the COVID-19 pandemic
Authors:
E. Leoni,
G. Cencetti,
G. Santin,
T. Istomin,
D. Molteni,
G. P. Picco,
E. Farella,
B. Lepri,
A. M. Murphy
Abstract:
Policy makers have implemented multiple non-pharmaceutical strategies to mitigate the COVID-19 worldwide crisis. Interventions had the aim of reducing close proximity interactions, which drive the spread of the disease. A deeper knowledge of human physical interactions has proved necessary, especially in all settings involving children, whose education and gathering activities should be preserved. Despite their relevance, almost no data are available on close proximity contacts among children in schools or other educational settings during the pandemic. Contact data are usually gathered via Bluetooth, which nonetheless offers a low temporal and spatial resolution. Recently, ultra-wideband (UWB) radios emerged as a more accurate alternative that nonetheless exhibits a significantly higher energy consumption, limiting in-field studies. In this paper, we leverage a novel approach, embodied by the Janus system, which combines these radios by exploiting their complementary benefits. The very accurate proximity data gathered in-field by Janus, once augmented with several metadata, unlock unprecedented levels of information, enabling the development of novel multi-level risk analyses. By means of this technology, we have collected real contact data of children and educators in three summer camps during summer 2020 in the province of Trento, Italy. The wide variety of daily activities performed induced multiple individual behaviors, allowing a rich investigation of social environments from the contagion risk perspective. We consider risk based on duration and proximity of contacts and classify interactions according to different risk levels. We can then evaluate the summer camps' organization, observe the effect of partitioning into small groups, or social bubbles, and identify the organized activities that mitigate the riskier behaviors. [...]
Submitted 28 June, 2021;
originally announced June 2021.
-
Smoothing the Disentangled Latent Style Space for Unsupervised Image-to-Image Translation
Authors:
Yahui Liu,
Enver Sangineto,
Yajing Chen,
Linchao Bao,
Haoxian Zhang,
Nicu Sebe,
Bruno Lepri,
Wei Wang,
Marco De Nadai
Abstract:
Image-to-Image (I2I) multi-domain translation models are usually also evaluated on the quality of their semantic interpolation results. However, state-of-the-art models frequently show abrupt changes in the image appearance during interpolation, and usually perform poorly in interpolations across domains. In this paper, we propose a new training protocol based on three specific losses which help a translation network to learn a smooth and disentangled latent style space in which: 1) both intra- and inter-domain interpolations correspond to gradual changes in the generated images and 2) the content of the source image is better preserved during the translation. Moreover, we propose a novel evaluation metric to properly measure the smoothness of the latent style space of I2I translation models. The proposed method can be plugged into existing translation approaches, and our extensive experiments on different datasets show that it can significantly boost the quality of the generated images and the graduality of the interpolations.
Submitted 16 June, 2021;
originally announced June 2021.
-
Efficient Training of Visual Transformers with Small Datasets
Authors:
Yahui Liu,
Enver Sangineto,
Wei Bi,
Nicu Sebe,
Bruno Lepri,
Marco De Nadai
Abstract:
Visual Transformers (VTs) are emerging as an architectural paradigm alternative to Convolutional networks (CNNs). Unlike CNNs, VTs can capture global relations between image elements and they potentially have a larger representation capacity. However, the lack of the typical convolutional inductive bias makes these models more data-hungry than common CNNs. In fact, some local properties of the visual domain which are embedded in the CNN architectural design must, in VTs, be learned from samples. In this paper, we empirically analyse different VTs, comparing their robustness in a small training-set regime, and we show that, despite having a comparable accuracy when trained on ImageNet, their performance on smaller datasets can differ widely. Moreover, we propose a self-supervised task which can extract additional information from images with only a negligible computational overhead. This task encourages the VTs to learn spatial relations within an image and makes the VT training much more robust when training data are scarce. Our task is used jointly with the standard (supervised) training and it does not depend on specific architectural choices, thus it can be easily plugged into existing VTs. Using an extensive evaluation with different VTs and datasets, we show that our method can improve (sometimes dramatically) the final accuracy of the VTs. Our code is available at: https://github.com/yhlleo/VTs-Drloc.
Submitted 14 November, 2021; v1 submitted 7 June, 2021;
originally announced June 2021.