-
Journalists are most likely to receive abuse: Analysing online abuse of UK public figures across sport, politics, and journalism on Twitter
Authors:
Liam Burke-Moore,
Angus R. Williams,
Jonathan Bright
Abstract:
Engaging with online social media platforms is an important part of life as a public figure in modern society, enabling connection with broad audiences and providing a platform for spreading ideas. However, public figures are often disproportionate recipients of hate and abuse on these platforms, degrading public discourse. While significant research on abuse received by groups such as politicians…
▽ More
Engaging with online social media platforms is an important part of life as a public figure in modern society, enabling connection with broad audiences and providing a platform for spreading ideas. However, public figures are often disproportionate recipients of hate and abuse on these platforms, degrading public discourse. While significant research on abuse received by groups such as politicians and journalists exists, little has been done to understand the differences in the dynamics of abuse across different groups of public figures, systematically and at scale. To address this, we present analysis of a novel dataset of 45.5M tweets targeted at 4,602 UK public figures across 3 domains (members of parliament, footballers, journalists), labelled using fine-tuned transformer-based language models. We find that MPs receive more abuse in absolute terms, but that journalists are most likely to receive abuse after controlling for other factors. We show that abuse is unevenly distributed in all groups, with a small number of individuals receiving the majority of abuse, and that for some groups, abuse is more temporally uneven, being driven by specific events, particularly for footballers. We also find that a more prominent online presence and being male are indicative of higher levels of abuse across all 3 domains.
△ Less
Submitted 5 September, 2024;
originally announced September 2024.
-
Prompto: An open source library for asynchronous querying of LLM endpoints
Authors:
Ryan Sze-Yin Chan,
Federico Nanni,
Edwin Brown,
Ed Chapman,
Angus R. Williams,
Jonathan Bright,
Evelina Gabasova
Abstract:
Recent surge in Large Language Model (LLM) availability has opened exciting avenues for research. However, efficiently interacting with these models presents a significant hurdle since LLMs often reside on proprietary or self-hosted API endpoints, each requiring custom code for interaction. Conducting comparative studies between different models can therefore be time-consuming and necessitate sign…
▽ More
Recent surge in Large Language Model (LLM) availability has opened exciting avenues for research. However, efficiently interacting with these models presents a significant hurdle since LLMs often reside on proprietary or self-hosted API endpoints, each requiring custom code for interaction. Conducting comparative studies between different models can therefore be time-consuming and necessitate significant engineering effort, hindering research efficiency and reproducibility. To address these challenges, we present prompto, an open source Python library which facilitates asynchronous querying of LLM endpoints enabling researchers to interact with multiple LLMs concurrently, while maximising efficiency and utilising individual rate limits. Our library empowers researchers and developers to interact with LLMs more effectively and enabling faster experimentation and evaluation. prompto is released with an introductory video (https://youtu.be/-eZAmlV4ypk) under MIT License and is available via GitHub (https://github.com/alan-turing-institute/prompto).
△ Less
Submitted 12 August, 2024;
originally announced August 2024.
-
Large language models can consistently generate high-quality content for election disinformation operations
Authors:
Angus R. Williams,
Liam Burke-Moore,
Ryan Sze-Yin Chan,
Florence E. Enock,
Federico Nanni,
Tvesha Sippy,
Yi-Ling Chung,
Evelina Gabasova,
Kobi Hackenburg,
Jonathan Bright
Abstract:
Advances in large language models have raised concerns about their potential use in generating compelling election disinformation at scale. This study presents a two-part investigation into the capabilities of LLMs to automate stages of an election disinformation operation. First, we introduce DisElect, a novel evaluation dataset designed to measure LLM compliance with instructions to generate con…
▽ More
Advances in large language models have raised concerns about their potential use in generating compelling election disinformation at scale. This study presents a two-part investigation into the capabilities of LLMs to automate stages of an election disinformation operation. First, we introduce DisElect, a novel evaluation dataset designed to measure LLM compliance with instructions to generate content for an election disinformation operation in localised UK context, containing 2,200 malicious prompts and 50 benign prompts. Using DisElect, we test 13 LLMs and find that most models broadly comply with these requests; we also find that the few models which refuse malicious prompts also refuse benign election-related prompts, and are more likely to refuse to generate content from a right-wing perspective. Secondly, we conduct a series of experiments (N=2,340) to assess the "humanness" of LLMs: the extent to which disinformation operation content generated by an LLM is able to pass as human-written. Our experiments suggest that almost all LLMs tested released since 2022 produce election disinformation operation content indiscernible by human evaluators over 50% of the time. Notably, we observe that multiple models achieve above-human levels of humanness. Taken together, these findings suggest that current LLMs can be used to generate high-quality content for election disinformation operations, even in hyperlocalised scenarios, at far lower costs than traditional methods, and offer researchers and policymakers an empirical benchmark for the measurement and evaluation of these capabilities in current and future models.
△ Less
Submitted 13 August, 2024;
originally announced August 2024.
-
Behind the Deepfake: 8% Create; 90% Concerned. Surveying public exposure to and perceptions of deepfakes in the UK
Authors:
Tvesha Sippy,
Florence Enock,
Jonathan Bright,
Helen Z. Margetts
Abstract:
This article examines public exposure to and perceptions of deepfakes based on insights from a nationally representative survey of 1403 UK adults. The survey is one of the first of its kind since recent improvements in deepfake technology and widespread adoption of political deepfakes. The findings reveal three key insights. First, on average, 15% of people report exposure to harmful deepfakes, in…
▽ More
This article examines public exposure to and perceptions of deepfakes based on insights from a nationally representative survey of 1403 UK adults. The survey is one of the first of its kind since recent improvements in deepfake technology and widespread adoption of political deepfakes. The findings reveal three key insights. First, on average, 15% of people report exposure to harmful deepfakes, including deepfake pornography, deepfake frauds/scams and other potentially harmful deepfakes such as those that spread health/religious misinformation/propaganda. In terms of common targets, exposure to deepfakes featuring celebrities was 50.2%, whereas those featuring politicians was 34.1%. And 5.7% of respondents recall exposure to a selection of high profile political deepfakes in the UK. Second, while exposure to harmful deepfakes was relatively low, awareness of and fears about deepfakes were high (and women were significantly more likely to report experiencing such fears than men). As with fears, general concerns about the spread of deepfakes were also high; 90.4% of the respondents were either very concerned or somewhat concerned about this issue. Most respondents (at least 91.8%) were concerned that deepfakes could add to online child sexual abuse material, increase distrust in information and manipulate public opinion. Third, while awareness about deepfakes was high, usage of deepfake tools was relatively low (8%). Most respondents were not confident about their detection abilities and were trustful of audiovisual content online. Our work highlights how the problem of deepfakes has become embedded in public consciousness in just a few years; it also highlights the need for media literacy programmes and other policy interventions to address the spread of harmful deepfakes.
△ Less
Submitted 7 July, 2024;
originally announced July 2024.
-
Evidence of a log scaling law for political persuasion with large language models
Authors:
Kobi Hackenburg,
Ben M. Tappin,
Paul Röttger,
Scott Hale,
Jonathan Bright,
Helen Margetts
Abstract:
Large language models can now generate political messages as persuasive as those written by humans, raising concerns about how far this persuasiveness may continue to increase with model size. Here, we generate 720 persuasive messages on 10 U.S. political issues from 24 language models spanning several orders of magnitude in size. We then deploy these messages in a large-scale randomized survey ex…
▽ More
Large language models can now generate political messages as persuasive as those written by humans, raising concerns about how far this persuasiveness may continue to increase with model size. Here, we generate 720 persuasive messages on 10 U.S. political issues from 24 language models spanning several orders of magnitude in size. We then deploy these messages in a large-scale randomized survey experiment (N = 25,982) to estimate the persuasive capability of each model. Our findings are twofold. First, we find evidence of a log scaling law: model persuasiveness is characterized by sharply diminishing returns, such that current frontier models are barely more persuasive than models smaller in size by an order of magnitude or more. Second, mere task completion (coherence, staying on topic) appears to account for larger models' persuasive advantage. These findings suggest that further scaling model size will not much increase the persuasiveness of static LLM-generated messages.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
PitcherNet: Powering the Moneyball Evolution in Baseball Video Analytics
Authors:
Jerrin Bright,
Bavesh Balaji,
Yuhao Chen,
David A Clausi,
John S Zelek
Abstract:
In the high-stakes world of baseball, every nuance of a pitcher's mechanics holds the key to maximizing performance and minimizing runs. Traditional analysis methods often rely on pre-recorded offline numerical data, hindering their application in the dynamic environment of live games. Broadcast video analysis, while seemingly ideal, faces significant challenges due to factors like motion blur and…
▽ More
In the high-stakes world of baseball, every nuance of a pitcher's mechanics holds the key to maximizing performance and minimizing runs. Traditional analysis methods often rely on pre-recorded offline numerical data, hindering their application in the dynamic environment of live games. Broadcast video analysis, while seemingly ideal, faces significant challenges due to factors like motion blur and low resolution. To address these challenges, we introduce PitcherNet, an end-to-end automated system that analyzes pitcher kinematics directly from live broadcast video, thereby extracting valuable pitch statistics including velocity, release point, pitch position, and release extension. This system leverages three key components: (1) Player tracking and identification by decoupling actions from player kinematics; (2) Distribution and depth-aware 3D human modeling; and (3) Kinematic-driven pitch statistics. Experimental validation demonstrates that PitcherNet achieves robust analysis results with 96.82% accuracy in pitcher tracklet identification, reduced joint position error by 1.8mm and superior analytics compared to baseline methods. By enabling performance-critical kinematic analysis from broadcast video, PitcherNet paves the way for the future of baseball analytics by optimizing pitching strategies, preventing injuries, and unlocking a deeper understanding of pitcher mechanics, forever transforming the game.
△ Less
Submitted 12 May, 2024;
originally announced May 2024.
-
Women are less comfortable expressing opinions online than men and report heightened fears for safety: Surveying gender differences in experiences of online harms
Authors:
Francesca Stevens,
Florence E. Enock,
Tvesha Sippy,
Jonathan Bright,
Miranda Cross,
Pica Johansson,
Judy Wajcman,
Helen Z. Margetts
Abstract:
Online harms, such as hate speech, trolling and self-harm promotion, continue to be widespread. While some work suggests women are disproportionately affected, other studies find mixed evidence for gender differences in experiences with content of this kind. Using a nationally representative survey of UK adults (N=1992), we examine exposure to a variety of harms, fears surrounding being targeted,…
▽ More
Online harms, such as hate speech, trolling and self-harm promotion, continue to be widespread. While some work suggests women are disproportionately affected, other studies find mixed evidence for gender differences in experiences with content of this kind. Using a nationally representative survey of UK adults (N=1992), we examine exposure to a variety of harms, fears surrounding being targeted, the psychological impact of online experiences, the use of safety tools to protect against harm, and comfort with various forms of online participation across men and women. We find that while men and women see harmful content online to a roughly similar extent, women are more at risk than men of being targeted by harms including online misogyny, cyberstalking and cyberflashing. Women are significantly more fearful of being targeted by harms overall, and report greater negative psychological impact as a result of particular experiences. Perhaps in an attempt to mitigate risk, women report higher use of a range of safety tools and less comfort with several forms of online participation, with just 23% of women comfortable expressing political views online compared to 40% of men. We also find direct associations between fears surrounding harms and comfort with online behaviours. For example, fear of being trolled significantly decreases comfort expressing opinions, and fear of being targeted by misogyny significantly decreases comfort sharing photos. Our results are important because with much public discourse happening online, we must ensure all members of society feel safe and able to participate in online spaces.
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
AI for bureaucratic productivity: Measuring the potential of AI to help automate 143 million UK government transactions
Authors:
Vincent J. Straub,
Youmna Hashem,
Jonathan Bright,
Satyam Bhagwanani,
Deborah Morgan,
John Francis,
Saba Esnaashari,
Helen Margetts
Abstract:
There is currently considerable excitement within government about the potential of artificial intelligence to improve public service productivity through the automation of complex but repetitive bureaucratic tasks, freeing up the time of skilled staff. Here, we explore the size of this opportunity, by mapping out the scale of citizen-facing bureaucratic decision-making procedures within UK centra…
▽ More
There is currently considerable excitement within government about the potential of artificial intelligence to improve public service productivity through the automation of complex but repetitive bureaucratic tasks, freeing up the time of skilled staff. Here, we explore the size of this opportunity, by mapping out the scale of citizen-facing bureaucratic decision-making procedures within UK central government, and measuring their potential for AI-driven automation. We estimate that UK central government conducts approximately one billion citizen-facing transactions per year in the provision of around 400 services, of which approximately 143 million are complex repetitive transactions. We estimate that 84% of these complex transactions are highly automatable, representing a huge potential opportunity: saving even an average of just one minute per complex transaction would save the equivalent of approximately 1,200 person-years of work every year. We also develop a model to estimate the volume of transactions a government service undertakes, providing a way for government to avoid conducting time consuming transaction volume measurements. Finally, we find that there is high turnover in the types of services government provide, meaning that automation efforts should focus on general procedures rather than services themselves which are likely to evolve over time. Overall, our work presents a novel perspective on the structure and functioning of modern government, and how it might evolve in the age of artificial intelligence.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
Domain-Guided Masked Autoencoders for Unique Player Identification
Authors:
Bavesh Balaji,
Jerrin Bright,
Sirisha Rambhatla,
Yuhao Chen,
Alexander Wong,
John Zelek,
David A Clausi
Abstract:
Unique player identification is a fundamental module in vision-driven sports analytics. Identifying players from broadcast videos can aid with various downstream tasks such as player assessment, in-game analysis, and broadcast production. However, automatic detection of jersey numbers using deep features is challenging primarily due to: a) motion blur, b) low resolution video feed, and c) occlusio…
▽ More
Unique player identification is a fundamental module in vision-driven sports analytics. Identifying players from broadcast videos can aid with various downstream tasks such as player assessment, in-game analysis, and broadcast production. However, automatic detection of jersey numbers using deep features is challenging primarily due to: a) motion blur, b) low resolution video feed, and c) occlusions. With their recent success in various vision tasks, masked autoencoders (MAEs) have emerged as a superior alternative to conventional feature extractors. However, most MAEs simply zero-out image patches either randomly or focus on where to mask rather than how to mask. Motivated by human vision, we devise a novel domain-guided masking policy for MAEs termed d-MAE to facilitate robust feature extraction in the presence of motion blur for player identification. We further introduce a new spatio-temporal network leveraging our novel d-MAE for unique player identification. We conduct experiments on three large-scale sports datasets, including a curated baseball dataset, the SoccerNet dataset, and an in-house ice hockey dataset. We preprocess the datasets using an upgraded keyframe identification (KfID) module by focusing on frames containing jersey numbers. Additionally, we propose a keyframe-fusion technique to augment keyframes, preserving spatial and temporal context. Our spatio-temporal network showcases significant improvements, surpassing the current state-of-the-art by 8.58%, 4.29%, and 1.20% in the test set accuracies, respectively. Rigorous ablations highlight the effectiveness of our domain-guided masking approach and the refined KfID module, resulting in performance enhancements of 1.48% and 1.84% respectively, compared to original architectures.
△ Less
Submitted 17 March, 2024;
originally announced March 2024.
-
Distribution and Depth-Aware Transformers for 3D Human Mesh Recovery
Authors:
Jerrin Bright,
Bavesh Balaji,
Harish Prakash,
Yuhao Chen,
David A Clausi,
John Zelek
Abstract:
Precise Human Mesh Recovery (HMR) with in-the-wild data is a formidable challenge and is often hindered by depth ambiguities and reduced precision. Existing works resort to either pose priors or multi-modal data such as multi-view or point cloud information, though their methods often overlook the valuable scene-depth information inherently present in a single image. Moreover, achieving robust HMR…
▽ More
Precise Human Mesh Recovery (HMR) with in-the-wild data is a formidable challenge and is often hindered by depth ambiguities and reduced precision. Existing works resort to either pose priors or multi-modal data such as multi-view or point cloud information, though their methods often overlook the valuable scene-depth information inherently present in a single image. Moreover, achieving robust HMR for out-of-distribution (OOD) data is exceedingly challenging due to inherent variations in pose, shape and depth. Consequently, understanding the underlying distribution becomes a vital subproblem in modeling human forms. Motivated by the need for unambiguous and robust human modeling, we introduce Distribution and depth-aware human mesh recovery (D2A-HMR), an end-to-end transformer architecture meticulously designed to minimize the disparity between distributions and incorporate scene-depth leveraging prior depth information. Our approach demonstrates superior performance in handling OOD data in certain scenarios while consistently achieving competitive results against state-of-the-art HMR methods on controlled datasets.
△ Less
Submitted 13 March, 2024;
originally announced March 2024.
-
Who is driving the conversation? Analysing the nodality of British MPs and journalists on Twitter
Authors:
Leonardo Castro-Gonzalez,
Sukankana Chakraborty,
Helen Margetts,
Hardik Rajpal,
Daniele Guariso,
Jonathan Bright
Abstract:
Who sets the policy agenda? In this paper, we explore the roles of policy actors in agenda setting by studying their relative influence in policy-related discussions. Our approach builds on ``nodality'' \textemdash a concept in political science that determines the capacity of an actor to share information and to be at the centre of information networks. We propose a novel methodology that quantif…
▽ More
Who sets the policy agenda? In this paper, we explore the roles of policy actors in agenda setting by studying their relative influence in policy-related discussions. Our approach builds on ``nodality'' \textemdash a concept in political science that determines the capacity of an actor to share information and to be at the centre of information networks. We propose a novel methodology that quantifies the nodality of all individual actors in any conversation by analysing a comprehensive set of their centrality measures in the related information network. We combine this with the analysis of the activity time-series, of the related conversation (or topic), to demonstrate how nodality scores relate to the capacity to drive topic-related activity. Here we analyse policy-related discussions on X (previously Twitter) and quantify the nodality of two sets of actors in the UK political system \textemdash Members of Parliament (MPs) and accredited journalists - on four policy topics: The Russia-Ukraine War, the Cost-of-Living Crisis, Brexit and COVID-19. Our results show that the capacity to influence the activity related to a topic is significantly and positively associated with nodality. In particular, we identify two dimensions of nodality that drive the capacity to influence topic-related activity. The first is ``active nodality", which reflects the level of topic-related engagement an individual actor has on the platform. The second dimension is ``inherent nodality" which is entirely independent of the platform and reflects the actor's institutional position (such as an MP in a front-bench role, or a journalist's position at a prominent media outlet).
△ Less
Submitted 12 June, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
Exploring responsible applications of Synthetic Data to advance Online Safety Research and Development
Authors:
Pica Johansson,
Jonathan Bright,
Shyam Krishna,
Claudia Fischer,
David Leslie
Abstract:
The use of synthetic data provides an opportunity to accelerate online safety research and development efforts while showing potential for bias mitigation, facilitating data storage and sharing, preserving privacy and reducing exposure to harmful content. However, the responsible use of synthetic data requires caution regarding anticipated risks and challenges. This short report explores the poten…
▽ More
The use of synthetic data provides an opportunity to accelerate online safety research and development efforts while showing potential for bias mitigation, facilitating data storage and sharing, preserving privacy and reducing exposure to harmful content. However, the responsible use of synthetic data requires caution regarding anticipated risks and challenges. This short report explores the potential applications of synthetic data to the domain of online safety, and addresses the ethical challenges that effective use of the technology may present.
△ Less
Submitted 7 February, 2024;
originally announced February 2024.
-
Understanding gender differences in experiences and concerns surrounding online harms: A short report on a nationally representative survey of UK adults
Authors:
Florence E. Enock,
Francesca Stevens,
Jonathan Bright,
Miranda Cross,
Pica Johansson,
Judy Wajcman,
Helen Z. Margetts
Abstract:
Online harms, such as hate speech, misinformation, harassment and self-harm promotion, continue to be widespread. While some work suggests that women are disproportionately affected by such harms, other studies find little evidence for gender differences in overall exposure. Here, we present preliminary results from a large, nationally representative survey of UK adults (N = 2000). We asked about…
▽ More
Online harms, such as hate speech, misinformation, harassment and self-harm promotion, continue to be widespread. While some work suggests that women are disproportionately affected by such harms, other studies find little evidence for gender differences in overall exposure. Here, we present preliminary results from a large, nationally representative survey of UK adults (N = 2000). We asked about exposure to 15 specific harms, along with fears surrounding exposure and comfort engaging in certain online behaviours. While men and women report seeing online harms to a roughly equal extent overall, we find that women are significantly more fearful of experiencing every type of harm that we asked about, and are significantly less comfortable partaking in several online behaviours. Strikingly, just 24% of women report being comfortable expressing political opinions online compared with almost 40% of men, with similar overall proportions for challenging certain content. Our work suggests that women may suffer an additional psychological burden in response to the proliferation of harmful online content, doing more 'safety work' to protect themselves. With much public discourse happening online, gender inequality in public voice is likely to be perpetuated if women feel too fearful to participate. Our results are important because to establish greater equality in society, we must take measures to ensure all members feel safe and able to participate in the online space.
△ Less
Submitted 1 February, 2024;
originally announced February 2024.
-
Cheap Learning: Maximising Performance of Language Models for Social Data Science Using Minimal Data
Authors:
Leonardo Castro-Gonzalez,
Yi-Ling Chung,
Hannak Rose Kirk,
John Francis,
Angus R. Williams,
Pica Johansson,
Jonathan Bright
Abstract:
The field of machine learning has recently made significant progress in reducing the requirements for labelled training data when building new models. These `cheaper' learning techniques hold significant potential for the social sciences, where development of large labelled training datasets is often a significant practical impediment to the use of machine learning for analytical tasks. In this ar…
▽ More
The field of machine learning has recently made significant progress in reducing the requirements for labelled training data when building new models. These `cheaper' learning techniques hold significant potential for the social sciences, where development of large labelled training datasets is often a significant practical impediment to the use of machine learning for analytical tasks. In this article we review three `cheap' techniques that have developed in recent years: weak supervision, transfer learning and prompt engineering. For the latter, we also review the particular case of zero-shot prompting of large language models. For each technique we provide a guide of how it works and demonstrate its application across six different realistic social science applications (two different tasks paired with three different dataset makeups). We show good performance for all techniques, and in particular we demonstrate how prompting of large language models can achieve high accuracy at very low cost. Our results are accompanied by a code repository to make it easy for others to duplicate our work and use it in their own research. Overall, our article is intended to stimulate further uptake of these techniques in the social sciences.
△ Less
Submitted 22 January, 2024;
originally announced January 2024.
-
Understanding engagement with platform safety technology for reducing exposure to online harms
Authors:
Jonathan Bright,
Florence E. Enock,
Pica Johansson,
Helen Z. Margetts,
Francesca Stevens
Abstract:
User facing 'platform safety technology' encompasses an array of tools offered by platforms to help people protect themselves from harm, for example allowing people to report content and unfollow or block other users. These tools are an increasingly important part of online safety: in the UK, legislation has made it a requirement for large platforms to offer them. However, little is known about us…
▽ More
User facing 'platform safety technology' encompasses an array of tools offered by platforms to help people protect themselves from harm, for example allowing people to report content and unfollow or block other users. These tools are an increasingly important part of online safety: in the UK, legislation has made it a requirement for large platforms to offer them. However, little is known about user engagement with such tools. We present findings from a nationally representative survey of UK adults covering their awareness of and experiences with seven common safety technologies. We show that experience of online harms is widespread, with 67% of people having seen what they perceived as harmful content online; 26% of people have also had at least one piece of content removed by content moderation. Use of safety technologies is also high, with more than 80\% of people having used at least one. Awareness of specific tools is varied, with people more likely to be aware of 'post-hoc' safety tools, such as reporting, than preventative measures. However, satisfaction with safety technologies is generally low. People who have previously seen online harms are more likely to use safety tools, implying a 'learning the hard way' route to engagement. Those higher in digital literacy are also more likely to use some of these tools, raising concerns about the accessibility of these technologies to all users. Additionally, women are more likely to engage in particular types of online 'safety work'. We discuss the implications of our results for those seeking a safer online environment.
△ Less
Submitted 3 January, 2024;
originally announced January 2024.
-
Generative AI is already widespread in the public sector
Authors:
Jonathan Bright,
Florence E. Enock,
Saba Esnaashari,
John Francis,
Youmna Hashem,
Deborah Morgan
Abstract:
Generative AI has the potential to transform how public services are delivered by enhancing productivity and reducing time spent on bureaucracy. Furthermore, unlike other types of artificial intelligence, it is a technology that has quickly become widely available for bottom-up adoption: essentially anyone can decide to make use of it in their day to day work. But to what extent is generative AI a…
▽ More
Generative AI has the potential to transform how public services are delivered by enhancing productivity and reducing time spent on bureaucracy. Furthermore, unlike other types of artificial intelligence, it is a technology that has quickly become widely available for bottom-up adoption: essentially anyone can decide to make use of it in their day to day work. But to what extent is generative AI already in use in the public sector? Our survey of 938 public service professionals within the UK (covering education, health, social work and emergency services) seeks to answer this question. We find that use of generative AI systems is already widespread: 45% of respondents were aware of generative AI usage within their area of work, while 22% actively use a generative AI system. Public sector professionals were positive about both current use of the technology and its potential to enhance their efficiency and reduce bureaucratic workload in the future. For example, those working in the NHS thought that time spent on bureaucracy could drop from 50% to 30% if generative AI was properly exploited, an equivalent of one day per week (an enormous potential impact). Our survey also found a high amount of trust (61%) around generative AI outputs, and a low fear of replacement (16%). While respondents were optimistic overall, areas of concern included feeling like the UK is missing out on opportunities to use AI to improve public services (76%), and only a minority of respondents (32%) felt like there was clear guidance on generative AI usage in their workplaces. In other words, it is clear that generative AI is already transforming the public sector, but uptake is happening in a disorganised fashion without clear guidelines. The UK's public sector urgently needs to develop more systematic methods for taking advantage of the technology.
△ Less
Submitted 2 January, 2024;
originally announced January 2024.
-
Approaches to the Algorithmic Allocation of Public Resources: A Cross-disciplinary Review
Authors:
Saba Esnaashari,
Jonathan Bright,
John Francis,
Youmna Hashem,
Vincent Straub,
Deborah Morgan
Abstract:
Allocation of scarce resources is a recurring challenge for the public sector: something that emerges in areas as diverse as healthcare, disaster recovery, and social welfare. The complexity of these policy domains and the need for meeting multiple and sometimes conflicting criteria has led to increased focus on the use of algorithms in this type of decision. However, little engagement between res…
▽ More
Allocation of scarce resources is a recurring challenge for the public sector: something that emerges in areas as diverse as healthcare, disaster recovery, and social welfare. The complexity of these policy domains and the need for meeting multiple and sometimes conflicting criteria has led to increased focus on the use of algorithms in this type of decision. However, little engagement between researchers across these domains has happened, meaning a lack of understanding of common problems and techniques for approaching them. Here, we performed a cross disciplinary literature review to understand approaches taken for different areas of algorithmic allocation including healthcare, organ transplantation, homelessness, disaster relief, and welfare. We initially identified 1070 papers by searching the literature, then six researchers went through them in two phases of screening resulting in 176 and 75 relevant papers respectively. We then analyzed the 75 papers from the lenses of optimization goals, techniques, interpretability, flexibility, bias, ethical considerations, and performance. We categorized approaches into human-oriented versus resource-oriented perspective, and individual versus aggregate and identified that 76% of the papers approached the problem from a human perspective and 60% from an aggregate level using optimization techniques. We found considerable potential for performance gains, with optimization techniques often decreasing waiting times and increasing success rate by as much as 50%. However, there was a lack of attention to responsible innovation: only around one third of the papers considered ethical issues in choosing the optimization goals while just a very few of them paid attention to the bias issues. Our work can serve as a guide for policy makers and researchers wanting to use an algorithm for addressing a resource allocation problem.
△ Less
Submitted 10 October, 2023;
originally announced October 2023.
-
Jersey Number Recognition using Keyframe Identification from Low-Resolution Broadcast Videos
Authors:
Bavesh Balaji,
Jerrin Bright,
Harish Prakash,
Yuhao Chen,
David A Clausi,
John Zelek
Abstract:
Player identification is a crucial component in vision-driven soccer analytics, enabling various downstream tasks such as player assessment, in-game analysis, and broadcast production. However, automatically detecting jersey numbers from player tracklets in videos presents challenges due to motion blur, low resolution, distortions, and occlusions. Existing methods, utilizing Spatial Transformer Ne…
▽ More
Player identification is a crucial component in vision-driven soccer analytics, enabling various downstream tasks such as player assessment, in-game analysis, and broadcast production. However, automatically detecting jersey numbers from player tracklets in videos presents challenges due to motion blur, low resolution, distortions, and occlusions. Existing methods, utilizing Spatial Transformer Networks, CNNs, and Vision Transformers, have shown success in image data but struggle with real-world video data, where jersey numbers are not visible in most of the frames. Hence, identifying frames that contain the jersey number is a key sub-problem to tackle. To address these issues, we propose a robust keyframe identification module that extracts frames containing essential high-level information about the jersey number. A spatio-temporal network is then employed to model spatial and temporal context and predict the probabilities of jersey numbers in the video. Additionally, we adopt a multi-task loss function to predict the probability distribution of each digit separately. Extensive evaluations on the SoccerNet dataset demonstrate that incorporating our proposed keyframe identification module results in a significant 37.81% and 37.70% increase in the accuracies of 2 different test sets with domain gaps. These results highlight the effectiveness and importance of our approach in tackling the challenges of automatic jersey number detection in sports videos.
△ Less
Submitted 12 September, 2023;
originally announced September 2023.
-
Mitigating Motion Blur for Robust 3D Baseball Player Pose Modeling for Pitch Analysis
Authors:
Jerrin Bright,
Yuhao Chen,
John Zelek
Abstract:
Using videos to analyze pitchers in baseball can play a vital role in strategizing and injury prevention. Computer vision-based pose analysis offers a time-efficient and cost-effective approach. However, the use of accessible broadcast videos, with a 30fps framerate, often results in partial body motion blur during fast actions, limiting the performance of existing pose keypoint estimation models.…
▽ More
Using videos to analyze pitchers in baseball can play a vital role in strategizing and injury prevention. Computer vision-based pose analysis offers a time-efficient and cost-effective approach. However, the use of accessible broadcast videos, with a 30fps framerate, often results in partial body motion blur during fast actions, limiting the performance of existing pose keypoint estimation models. Previous works have primarily relied on fixed backgrounds, assuming minimal motion differences between frames, or utilized multiview data to address this problem. To this end, we propose a synthetic data augmentation pipeline to enhance the model's capability to deal with the pitcher's blurry actions. In addition, we leverage in-the-wild videos to make our model robust under different real-world conditions and camera positions. By carefully optimizing the augmentation parameters, we observed a notable reduction in the loss by 54.2% and 36.2% on the test dataset for 2D and 3D pose estimation respectively. By applying our approach to existing state-of-the-art pose estimators, we demonstrate an average improvement of 29.2%. The findings highlight the effectiveness of our method in mitigating the challenges posed by motion blur, thereby enhancing the overall quality of pose estimation.
△ Less
Submitted 2 September, 2023;
originally announced September 2023.
-
DoDo Learning: DOmain-DemOgraphic Transfer in Language Models for Detecting Abuse Targeted at Public Figures
Authors:
Angus R. Williams,
Hannah Rose Kirk,
Liam Burke,
Yi-Ling Chung,
Ivan Debono,
Pica Johansson,
Francesca Stevens,
Jonathan Bright,
Scott A. Hale
Abstract:
Public figures receive a disproportionate amount of abuse on social media, impacting their active participation in public life. Automated systems can identify abuse at scale but labelling training data is expensive, complex and potentially harmful. So, it is desirable that systems are efficient and generalisable, handling both shared and specific aspects of online abuse. We explore the dynamics of…
▽ More
Public figures receive a disproportionate amount of abuse on social media, impacting their active participation in public life. Automated systems can identify abuse at scale but labelling training data is expensive, complex and potentially harmful. So, it is desirable that systems are efficient and generalisable, handling both shared and specific aspects of online abuse. We explore the dynamics of cross-group text classification in order to understand how well classifiers trained on one domain or demographic can transfer to others, with a view to building more generalisable abuse classifiers. We fine-tune language models to classify tweets targeted at public figures across DOmains (sport and politics) and DemOgraphics (women and men) using our novel DODO dataset, containing 28,000 labelled entries, split equally across four domain-demographic pairs. We find that (i) small amounts of diverse data are hugely beneficial to generalisation and model adaptation; (ii) models transfer more easily across demographics but models trained on cross-domain data are more generalisable; (iii) some groups contribute more to generalisability than others; and (iv) dataset similarity is a signal of transferability.
△ Less
Submitted 25 April, 2024; v1 submitted 31 July, 2023;
originally announced July 2023.
-
Understanding Counterspeech for Online Harm Mitigation
Authors:
Yi-Ling Chung,
Gavin Abercrombie,
Florence Enock,
Jonathan Bright,
Verena Rieser
Abstract:
Counterspeech offers direct rebuttals to hateful speech by challenging perpetrators of hate and showing support to targets of abuse. It provides a promising alternative to more contentious measures, such as content moderation and deplatforming, by contributing a greater amount of positive online speech rather than attempting to mitigate harmful content through removal. Advances in the development…
▽ More
Counterspeech offers direct rebuttals to hateful speech by challenging perpetrators of hate and showing support to targets of abuse. It provides a promising alternative to more contentious measures, such as content moderation and deplatforming, by contributing a greater amount of positive online speech rather than attempting to mitigate harmful content through removal. Advances in the development of large language models mean that the process of producing counterspeech could be made more efficient by automating its generation, which would enable large-scale online campaigns. However, we currently lack a systematic understanding of several important factors relating to the efficacy of counterspeech for hate mitigation, such as which types of counterspeech are most effective, what are the optimal conditions for implementation, and which specific effects of hate it can best ameliorate. This paper aims to fill this gap by systematically reviewing counterspeech research in the social sciences and comparing methodologies and findings with computer science efforts in automatic counterspeech generation. By taking this multi-disciplinary view, we identify promising future directions in both fields.
△ Less
Submitted 1 July, 2023;
originally announced July 2023.
-
'Team-in-the-loop': Ostrom's IAD framework 'rules in use' to map and measure contextual impacts of AI
Authors:
Deborah Morgan,
Youmna Hashem,
John Francis,
Saba Esnaashari,
Vincent J. Straub,
Jonathan Bright
Abstract:
This article explores how the 'rules in use' from Ostrom's Institutional Analysis and Development Framework (IAD) can be developed as a context analysis approach for AI. AI risk assessment frameworks increasingly highlight the need to understand existing contexts. However, these approaches do not frequently connect with established institutional analysis scholarship. We outline a novel direction i…
▽ More
This article explores how the 'rules in use' from Ostrom's Institutional Analysis and Development Framework (IAD) can be developed as a context analysis approach for AI. AI risk assessment frameworks increasingly highlight the need to understand existing contexts. However, these approaches do not frequently connect with established institutional analysis scholarship. We outline a novel direction illustrated through a high-level example to understand how clinical oversight is potentially impacted by AI. Much current thinking regarding oversight for AI revolves around the idea of decision makers being in-the-loop and, thus, having capacity to intervene to prevent harm. However, our analysis finds that oversight is complex, frequently made by teams of professionals and relies upon explanation to elicit information. Professional bodies and liability also function as institutions of polycentric oversight. These are all impacted by the challenge of oversight of AI systems. The approach outlined has potential utility as a policy tool of context analysis aligned with the 'Govern and Map' functions of the National Institute of Standards and Technology (NIST) AI Risk Management Framework; however, further empirical research is needed. Our analysis illustrates the benefit of existing institutional analysis approaches in foregrounding team structures within oversight and, thus, in conceptions of 'human in the loop'.
△ Less
Submitted 30 June, 2024; v1 submitted 24 March, 2023;
originally announced March 2023.
-
A multidomain relational framework to guide institutional AI research and adoption
Authors:
Vincent J. Straub,
Deborah Morgan,
Youmna Hashem,
John Francis,
Saba Esnaashari,
Jonathan Bright
Abstract:
Calls for new metrics, technical standards and governance mechanisms to guide the adoption of Artificial Intelligence (AI) in institutions and public administration are now commonplace. Yet, most research and policy efforts aimed at understanding the implications of adopting AI tend to prioritize only a handful of ideas; they do not fully connect all the different perspectives and topics that are…
▽ More
Calls for new metrics, technical standards and governance mechanisms to guide the adoption of Artificial Intelligence (AI) in institutions and public administration are now commonplace. Yet, most research and policy efforts aimed at understanding the implications of adopting AI tend to prioritize only a handful of ideas; they do not fully connect all the different perspectives and topics that are potentially relevant. In this position paper, we contend that this omission stems, in part, from what we call the relational problem in socio-technical discourse: fundamental ontological issues have not yet been settled--including semantic ambiguity, a lack of clear relations between concepts and differing standard terminologies. This contributes to the persistence of disparate modes of reasoning to assess institutional AI systems, and the prevalence of conceptual isolation in the fields that study them including ML, human factors, social science and policy. After developing this critique, we offer a way forward by proposing a simple policy and research design tool in the form of a conceptual framework to organize terms across fields--consisting of three horizontal domains for grouping relevant concepts and related methods: Operational, Epistemic, and Normative. We first situate this framework against the backdrop of recent socio-technical discourse at two premier academic venues, AIES and FAccT, before illustrating how developing suitable metrics, standards, and mechanisms can be aided by operationalizing relevant concepts in each of these domains. Finally, we outline outstanding questions for developing this relational approach to institutional AI research and adoption.
△ Less
Submitted 17 July, 2023; v1 submitted 17 March, 2023;
originally announced March 2023.
-
How can we combat online misinformation? A systematic overview of current interventions and their efficacy
Authors:
Pica Johansson,
Florence Enock,
Scott Hale,
Bertie Vidgen,
Cassidy Bereskin,
Helen Margetts,
Jonathan Bright
Abstract:
The spread of misinformation is a pressing global problem that has elicited a range of responses from researchers, policymakers, civil society and industry. Over the past decade, these stakeholders have developed many interventions to tackle misinformation that vary across factors such as which effects of misinformation they hope to target, at what stage in the misinformation lifecycle they are ai…
▽ More
The spread of misinformation is a pressing global problem that has elicited a range of responses from researchers, policymakers, civil society and industry. Over the past decade, these stakeholders have developed many interventions to tackle misinformation that vary across factors such as which effects of misinformation they hope to target, at what stage in the misinformation lifecycle they are aimed at, and who they are implemented by. These interventions also differ in how effective they are at reducing susceptibility to (and curbing the spread of) misinformation. In recent years, a vast amount of scholarly work on misinformation has become available, which extends across multiple disciplines and methodologies. It has become increasingly difficult to comprehensively map all of the available interventions, assess their efficacy, and understand the challenges, opportunities and tradeoffs associated with using them. Few papers have systematically assessed and compared the various interventions, which has led to a lack of understanding in civic and policymaking discourses. With this in mind, we develop a new hierarchical framework for understanding interventions against misinformation online. The framework comprises three key elements: Interventions that Prepare people to be less susceptible; Interventions that Curb the spread and effects of misinformation; and Interventions that Respond to misinformation. We outline how different interventions are thought to work, categorise them, and summarise the available evidence on their efficacy; offering researchers, policymakers and practitioners working to combat online misinformation both an analytical framework that they can use to understand and evaluate different interventions (and which could be extended to address new interventions that we do not describe here) and a summary of the range of interventions that have been proposed to date.
△ Less
Submitted 22 December, 2022;
originally announced December 2022.
-
Artificial intelligence in government: Concepts, standards, and a unified framework
Authors:
Vincent J. Straub,
Deborah Morgan,
Jonathan Bright,
Helen Margetts
Abstract:
Recent advances in artificial intelligence (AI), especially in generative language modelling, hold the promise of transforming government. Given the advanced capabilities of new AI systems, it is critical that these are embedded using standard operational procedures, clear epistemic criteria, and behave in alignment with the normative expectations of society. Scholars in multiple domains have subs…
▽ More
Recent advances in artificial intelligence (AI), especially in generative language modelling, hold the promise of transforming government. Given the advanced capabilities of new AI systems, it is critical that these are embedded using standard operational procedures, clear epistemic criteria, and behave in alignment with the normative expectations of society. Scholars in multiple domains have subsequently begun to conceptualize the different forms that AI applications may take, highlighting both their potential benefits and pitfalls. However, the literature remains fragmented, with researchers in social science disciplines like public administration and political science, and the fast-moving fields of AI, ML, and robotics, all developing concepts in relative isolation. Although there are calls to formalize the emerging study of AI in government, a balanced account that captures the full depth of theoretical perspectives needed to understand the consequences of embedding AI into a public sector context is lacking. Here, we unify efforts across social and technical disciplines by first conducting an integrative literature review to identify and cluster 69 key terms that frequently co-occur in the multidisciplinary study of AI. We then build on the results of this bibliometric analysis to propose three new multifaceted concepts for understanding and analysing AI-based systems for government (AI-GOV) in a more unified way: (1) operational fitness, (2) epistemic alignment, and (3) normative divergence. Finally, we put these concepts to work by using them as dimensions in a conceptual typology of AI-GOV and connecting each with emerging AI technical measurement standards to encourage operationalization, foster cross-disciplinary dialogue, and stimulate debate among those aiming to rethink government with AI.
△ Less
Submitted 25 October, 2023; v1 submitted 31 October, 2022;
originally announced October 2022.
-
Regression test of various versions of STRmix
Authors:
Jo-Anne Bright,
Judi Morawitz,
Duncan Taylor,
John Buckleton
Abstract:
STRmix has been in operational use since 2012 for the interpretation of forensic DNA profiles. During that time incremental improvements have been made to the modelling of variance for single and composite peaks, and drop-in. The central element of the algorithm has remained as the Markov chain Monte Carlo (MCMC) based on the Metropolis-Hastings algorithm. Regression experiments are reported compa…
▽ More
STRmix has been in operational use since 2012 for the interpretation of forensic DNA profiles. During that time incremental improvements have been made to the modelling of variance for single and composite peaks, and drop-in. The central element of the algorithm has remained as the Markov chain Monte Carlo (MCMC) based on the Metropolis-Hastings algorithm. Regression experiments are reported comparing versions V2.3 to V2.9. There is a high degree of similarity in LRs for true contributors across multiple versions of STRmix that span 7 years of development (from V2.3 to V2.9). This is due to the fact that the underlying core models of STRmix that are used to describe DNA profile behaviour have remained stable across that time.
△ Less
Submitted 13 August, 2022;
originally announced August 2022.
-
Probabilistic genotyping code review and testing
Authors:
John Buckleton,
Jo-Anne Bright,
Kevin Cheng,
Duncan Taylor
Abstract:
We discuss a range of miscodes found in probabilistic genotyping (PG) software and from other industries that have been reported in the literature and have been used to inform PG admissibility hearings. Every instance of the discovery of a miscode in PG software with which we have been associated has occurred either because of testing, use, or repeat calculation of results either by us or other us…
▽ More
We discuss a range of miscodes found in probabilistic genotyping (PG) software and from other industries that have been reported in the literature and have been used to inform PG admissibility hearings. Every instance of the discovery of a miscode in PG software with which we have been associated has occurred either because of testing, use, or repeat calculation of results either by us or other users. In all cases found during testing or use something has drawn attention to an anomalous result. Intelligent investigation has led to the examination of a small section of the code and detection of the miscode. Previously, three instances from other industries quoted by the Electronic Frontier Foundation Amicus brief as part of a PG admissibility hearing (atmospheric ozone, NIMIS, and VW) and two previous examples raised in relation to PG admissibility (Kerberos and Therac-25) were presented as examples of miscodes and how an extensive code review could have resolved these situations. However, we discuss how these miscodes might not have been discovered through code review alone. These miscodes could only have been detected through use of the software or through testing. Once the symptoms of the miscode(s) have been detected, a code review serves as a beneficial approach to try and diagnose to the issue.
△ Less
Submitted 19 May, 2022;
originally announced May 2022.
-
ME-CapsNet: A Multi-Enhanced Capsule Networks with Routing Mechanism
Authors:
Jerrin Bright,
Suryaprakash Rajkumar,
Arockia Selvakumar Arockia Doss
Abstract:
Convolutional Neural Networks need the construction of informative features, which are determined by channel-wise and spatial-wise information at the network's layers. In this research, we focus on bringing in a novel solution that uses sophisticated optimization for enhancing both the spatial and channel components inside each layer's receptive field. Capsule Networks were used to understand the…
▽ More
Convolutional Neural Networks need the construction of informative features, which are determined by channel-wise and spatial-wise information at the network's layers. In this research, we focus on bringing in a novel solution that uses sophisticated optimization for enhancing both the spatial and channel components inside each layer's receptive field. Capsule Networks were used to understand the spatial association between features in the feature map. Standalone capsule networks have shown good results on comparatively simple datasets than on complex datasets as a result of the inordinate amount of feature information. Thus, to tackle this issue, we have proposed ME-CapsNet by introducing deeper convolutional layers to extract important features before passing through modules of capsule layers strategically to improve the performance of the network significantly. The deeper convolutional layer includes blocks of Squeeze-Excitation networks which use a stochastic sampling approach for progressively reducing the spatial size thereby dynamically recalibrating the channels by reconstructing their interdependencies without much loss of important feature information. Extensive experimentation was done using commonly used datasets demonstrating the efficiency of the proposed ME-CapsNet, which clearly outperforms various research works by achieving higher accuracy with minimal model complexity in complex datasets.
△ Less
Submitted 31 March, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
How do climate change skeptics engage with opposing views? Understanding mechanisms of social identity and cognitive dissonance in an online forum
Authors:
Lisa Oswald,
Jonathan Bright
Abstract:
Does engagement with opposing views help break down ideological `echo chambers'; or does it backfire and reinforce them? This question remains critical as academics, policymakers and activists grapple with the question of how to regulate political discussion on social media. In this study, we contribute to the debate by examining the impact of opposing views within a major climate change skeptic o…
▽ More
Does engagement with opposing views help break down ideological `echo chambers'; or does it backfire and reinforce them? This question remains critical as academics, policymakers and activists grapple with the question of how to regulate political discussion on social media. In this study, we contribute to the debate by examining the impact of opposing views within a major climate change skeptic online community on Reddit. A large sample of posts (N = 3000) was manually coded as either dissonant or consonant which allowed the automated classification of the full dataset of more than 50,000 posts, with codes inferred from linked websites. We find that ideologically dissonant submissions act as a stimulant to activity in the community: they received more attention (comments) than consonant submissions, even though they received lower scores through up-voting and down-voting. Users who engaged with dissonant submissions were also more likely to return to the forum. Consistent with identity theory, confrontation with opposing views triggered activity in the forum, particularly among users that are highly engaged with the community. In light of the findings, theory of social identity and echo chambers is discussed and enhanced.
△ Less
Submitted 12 February, 2021;
originally announced February 2021.
-
Echo Chambers Exist! (But They're Full of Opposing Views)
Authors:
Jonathan Bright,
Nahema Marchal,
Bharath Ganesh,
Stevan Rudinac
Abstract:
The theory of echo chambers, which suggests that online political discussions take place in conditions of ideological homogeneity, has recently gained popularity as an explanation for patterns of political polarization and radicalization observed in many democratic countries. However, while micro-level experimental work has shown evidence that individuals may gravitate towards information that sup…
▽ More
The theory of echo chambers, which suggests that online political discussions take place in conditions of ideological homogeneity, has recently gained popularity as an explanation for patterns of political polarization and radicalization observed in many democratic countries. However, while micro-level experimental work has shown evidence that individuals may gravitate towards information that supports their beliefs, recent macro-level studies have cast doubt on whether this tendency generates echo chambers in practice, instead suggesting that cross-cutting exposures are a common feature of digital life. In this article, we offer an explanation for these diverging results. Building on cognitive dissonance theory, and making use of observational trace data taken from an online white nationalist website, we explore how individuals in an ideological 'echo chamber' engage with opposing viewpoints. We show that this type of exposure, far from being detrimental to radical online discussions, is actually a core feature of such spaces that encourages people to stay engaged. The most common 'echoes' in this echo chamber are in fact the sound of opposing viewpoints being undermined and marginalized. Hence echo chambers exist not only in spite of but thanks to the unifying presence of oppositional viewpoints. We conclude with reflections on policy implications of our study for those seeking to promote a more moderate political internet.
△ Less
Submitted 30 January, 2020;
originally announced January 2020.
-
SciPy 1.0--Fundamental Algorithms for Scientific Computing in Python
Authors:
Pauli Virtanen,
Ralf Gommers,
Travis E. Oliphant,
Matt Haberland,
Tyler Reddy,
David Cournapeau,
Evgeni Burovski,
Pearu Peterson,
Warren Weckesser,
Jonathan Bright,
Stéfan J. van der Walt,
Matthew Brett,
Joshua Wilson,
K. Jarrod Millman,
Nikolay Mayorov,
Andrew R. J. Nelson,
Eric Jones,
Robert Kern,
Eric Larson,
CJ Carey,
İlhan Polat,
Yu Feng,
Eric W. Moore,
Jake VanderPlas,
Denis Laxalde
, et al. (10 additional authors not shown)
Abstract:
SciPy is an open source scientific computing library for the Python programming language. SciPy 1.0 was released in late 2017, about 16 years after the original version 0.1 release. SciPy has become a de facto standard for leveraging scientific algorithms in the Python programming language, with more than 600 unique code contributors, thousands of dependent packages, over 100,000 dependent reposit…
▽ More
SciPy is an open source scientific computing library for the Python programming language. SciPy 1.0 was released in late 2017, about 16 years after the original version 0.1 release. SciPy has become a de facto standard for leveraging scientific algorithms in the Python programming language, with more than 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories, and millions of downloads per year. This includes usage of SciPy in almost half of all machine learning projects on GitHub, and usage by high profile projects including LIGO gravitational wave analysis and creation of the first-ever image of a black hole (M87). The library includes functionality spanning clustering, Fourier transforms, integration, interpolation, file I/O, linear algebra, image processing, orthogonal distance regression, minimization algorithms, signal processing, sparse matrix handling, computational geometry, and statistics. In this work, we provide an overview of the capabilities and development practices of the SciPy library and highlight some recent technical developments.
△ Less
Submitted 23 July, 2019;
originally announced July 2019.
-
Estimating Traffic Disruption Patterns with Volunteered Geographic Information
Authors:
Chico Q. Camargo,
Jonathan Bright,
Graham McNeill,
Sridhar Raman,
Scott A. Hale
Abstract:
Accurate understanding and forecasting of traffic is a key contemporary problem for policymakers. Road networks are increasingly congested, yet traffic data is often expensive to obtain, making informed policy-making harder. This paper explores the extent to which traffic disruption can be estimated from static features from the volunteered geographic information site OpenStreetMap (OSM). We use O…
▽ More
Accurate understanding and forecasting of traffic is a key contemporary problem for policymakers. Road networks are increasingly congested, yet traffic data is often expensive to obtain, making informed policy-making harder. This paper explores the extent to which traffic disruption can be estimated from static features from the volunteered geographic information site OpenStreetMap (OSM). We use OSM features as predictors for linear regressions of counts of traffic disruptions and traffic volume at 6,500 points in the road network within 112 regions of Oxfordshire, UK. We show that more than half the variation in traffic volume and disruptions can be explained with static features alone, and use cross-validation and recursive feature elimination to evaluate the predictive power and importance of different land use categories. Finally, we show that using OSM's granular point of interest data allows for better predictions than the aggregate categories typically used in studies of transportation and land use.
△ Less
Submitted 11 July, 2019;
originally announced July 2019.
-
Diagnosing the performance of human mobility models at small spatial scales using volunteered geographic information
Authors:
Chico Q. Camargo,
Jonathan Bright,
Scott A. Hale
Abstract:
Accurate modelling of local population movement patterns is a core contemporary concern for urban policymakers, affecting both the short term deployment of public transport resources and the longer term planning of transport infrastructure. Yet, while macro-level population movement models (such as the gravity and radiation models) are well developed, micro-level alternatives are in much shorter s…
▽ More
Accurate modelling of local population movement patterns is a core contemporary concern for urban policymakers, affecting both the short term deployment of public transport resources and the longer term planning of transport infrastructure. Yet, while macro-level population movement models (such as the gravity and radiation models) are well developed, micro-level alternatives are in much shorter supply, with most macro-models known to perform badly in smaller geographic confines. In this paper we take a first step to remedying this deficit, by leveraging two novel datasets to analyse where and why macro-level models of human mobility break down at small scales. In particular, we use an anonymised aggregate dataset from a major mobility app and combine this with freely available data from OpenStreetMap concerning land-use composition of different areas around the county of Oxfordshire in the United Kingdom. We show where different models fail, and make the case for a new modelling strategy which moves beyond rough heuristics such as distance and population size towards a detailed, granular understanding of the opportunities presented in different areas of the city.
△ Less
Submitted 20 May, 2019;
originally announced May 2019.
-
Understanding news story chains using information retrieval and network clustering techniques
Authors:
Tom Nicholls,
Jonathan Bright
Abstract:
Content analysis of news stories (whether manual or automatic) is a cornerstone of the communication studies field. However, much research is conducted at the level of individual news articles, despite the fact that news events (especially significant ones) are frequently presented as "stories" by news outlets: chains of connected articles covering the same event from different angles. These stori…
▽ More
Content analysis of news stories (whether manual or automatic) is a cornerstone of the communication studies field. However, much research is conducted at the level of individual news articles, despite the fact that news events (especially significant ones) are frequently presented as "stories" by news outlets: chains of connected articles covering the same event from different angles. These stories are theoretically highly important in terms of increasing public recall of news items and enhancing the agenda-setting power of the press. Yet thus far, the field has lacked an efficient method for detecting groups of articles which form stories in a way that enables their analysis.
In this work, we present a novel, automated method for identifying linked news stories from within a corpus of articles. This method makes use of techniques drawn from the field of information retrieval to identify textual closeness of pairs of articles, and then clustering techniques taken from the field of network analysis to group these articles into stories. We demonstrate the application of the method to a corpus of 61,864 articles, and show how it can efficiently identify valid story clusters within the corpus. We use the results to make observations about the prevalence and dynamics of stories within the UK news media, showing that more than 50% of news production takes place within stories.
△ Less
Submitted 24 January, 2018;
originally announced January 2018.
-
Does Campaigning on Social Media Make a Difference? Evidence from candidate use of Twitter during the 2015 and 2017 UK Elections
Authors:
Jonathan Bright,
Scott A Hale,
Bharath Ganesh,
Andrew Bulovsky,
Helen Margetts,
Phil Howard
Abstract:
Social media are now a routine part of political campaigns all over the world. However, studies of the impact of campaigning on social platform have thus far been limited to cross-sectional datasets from one election period which are vulnerable to unobserved variable bias. Hence empirical evidence on the effectiveness of political social media activity is thin. We address this deficit by analysing…
▽ More
Social media are now a routine part of political campaigns all over the world. However, studies of the impact of campaigning on social platform have thus far been limited to cross-sectional datasets from one election period which are vulnerable to unobserved variable bias. Hence empirical evidence on the effectiveness of political social media activity is thin. We address this deficit by analysing a novel panel dataset of political Twitter activity in the 2015 and 2017 elections in the United Kingdom. We find that Twitter based campaigning does seem to help win votes, a finding which is consistent across a variety of different model specifications including a first difference regression. The impact of Twitter use is small in absolute terms, though comparable with that of campaign spending. Our data also support the idea that effects are mediated through other communication channels, hence challenging the relevance of engaging in an interactive fashion.
△ Less
Submitted 27 July, 2018; v1 submitted 19 October, 2017;
originally announced October 2017.
-
Estimating Local Commuting Patterns From Geolocated Twitter Data
Authors:
Graham McNeill,
Jonathan Bright,
Scott A. Hale
Abstract:
The emergence of large stores of transactional data generated by increasing use of digital devices presents a huge opportunity for policymakers to improve their knowledge of the local environment and thus make more informed and better decisions. A research frontier is hence emerging which involves exploring the type of measures that can be drawn from data stores such as mobile phone logs, Internet…
▽ More
The emergence of large stores of transactional data generated by increasing use of digital devices presents a huge opportunity for policymakers to improve their knowledge of the local environment and thus make more informed and better decisions. A research frontier is hence emerging which involves exploring the type of measures that can be drawn from data stores such as mobile phone logs, Internet searches and contributions to social media platforms, and the extent to which these measures are accurate reflections of the wider population. This paper contributes to this research frontier, by exploring the extent to which local commuting patterns can be estimated from data drawn from Twitter. It makes three contributions in particular. First, it shows that simple heuristics drawn from geolocated Twitter data offer a good proxy for local commuting patterns; one which outperforms the major existing method for estimating these patterns (the radiation model). Second, it investigates sources of error in the proxy measure, showing that the model performs better on short trips with higher volumes of commuters; it also looks at demographic biases but finds that, surprisingly, measurements are not significantly affected by the fact that the demographic makeup of Twitter users differs significantly from the population as a whole. Finally, it looks at potential ways of going beyond simple heuristics by incorporating temporal information into models.
△ Less
Submitted 6 December, 2016;
originally announced December 2016.
-
Explaining the emergence of echo chambers on social media: the role of ideology and extremism
Authors:
Jonathan Bright
Abstract:
The emergence of politically driven divisions in online discussion networks has attracted a wealth of literature, but also one which has thus far been largely limited to single country studies. Hence whilst there is good evidence that these networks do divide and fragment into what are often described as "echo chambers", we know little about the factors which might explain this division or make ne…
▽ More
The emergence of politically driven divisions in online discussion networks has attracted a wealth of literature, but also one which has thus far been largely limited to single country studies. Hence whilst there is good evidence that these networks do divide and fragment into what are often described as "echo chambers", we know little about the factors which might explain this division or make networks more or less fragmented, as studies have been limited to a small number of political groupings with limited possibilities for systematic comparison.
This paper seeks to remedy this deficit, by providing a systematic large scale study of fragmentation on Twitter which considers discussion networks surrounding 90 different political parties in 23 different countries. It shows that political party groupings which are further apart in ideological terms interact less, and that individuals and parties which sit at the extreme ends of the ideological scale are particularly likely to form echo chambers. Indeed, exchanges between centrist parties who sit on different sides of the left-right divide are more likely than communication between centrist and extremist parties who are, notionally, from the same ideological wing. In light of the results, theory about exposure to different ideological viewpoints online is discussed and enhanced.
△ Less
Submitted 10 March, 2017; v1 submitted 16 September, 2016;
originally announced September 2016.
-
Wikipedia traffic data and electoral prediction: towards theoretically informed models
Authors:
Taha Yasseri,
Jonathan Bright
Abstract:
This aim of this article is to explore the potential use of Wikipedia page view data for predicting electoral results. Responding to previous critiques of work using socially generated data to predict elections, which have argued that these predictions take place without any understanding of the mechanism which enables them, we first develop a theoretical model which highlights why people might se…
▽ More
This aim of this article is to explore the potential use of Wikipedia page view data for predicting electoral results. Responding to previous critiques of work using socially generated data to predict elections, which have argued that these predictions take place without any understanding of the mechanism which enables them, we first develop a theoretical model which highlights why people might seek information online at election time, and how this activity might relate to overall electoral outcomes, focussing especially on how different types of parties such as new and established parties might generate different information seeking patterns. We test this model on a novel dataset drawn from a variety of countries in the 2009 and 2014 European Parliament elections. We show that while Wikipedia offers little insight into absolute vote outcomes, it offers a good information about changes in both overall turnout at elections and in vote share for particular parties. These results are used to enhance existing theories about the drivers of aggregate patterns in online information seeking.
△ Less
Submitted 22 January, 2016; v1 submitted 5 May, 2015;
originally announced May 2015.
-
Can electoral popularity be predicted using socially generated big data?
Authors:
Taha Yasseri,
Jonathan Bright
Abstract:
Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of can…
▽ More
Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large scale collections of these socially generated footprints, often known as big data, could help us to re-investigate different aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through the analysis of socially generated data on the web during electoral campaigns. Such data offer considerable possibility for improving our awareness of popularity dynamics. However they also suffer from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss potential ways around such problems, suggesting the nature of different political systems and contexts might lend differing levels of predictive power to certain types of data source. We offer an initial exploratory test of these ideas, focussing on two data streams, Wikipedia page views and Google search queries. On the basis of this data, we present popularity dynamics from real case examples of recent elections in three different countries.
△ Less
Submitted 8 August, 2014; v1 submitted 10 December, 2013;
originally announced December 2013.