-
Incorporating Human Translator Style into English-Turkish Literary Machine Translation
Authors:
Zeynep Yirmibeşoğlu,
Olgun Dursun,
Harun Dallı,
Mehmet Şahin,
Ena Hodzik,
Sabri Gürses,
Tunga Güngör
Abstract:
Although machine translation systems are mostly designed to serve in the general domain, there is a growing tendency to adapt these systems to other domains like literary translation. In this paper, we focus on English-Turkish literary translation and develop machine translation models that take into account the stylistic features of translators. We fine-tune a pre-trained machine translation mode…
▽ More
Although machine translation systems are mostly designed to serve in the general domain, there is a growing tendency to adapt these systems to other domains like literary translation. In this paper, we focus on English-Turkish literary translation and develop machine translation models that take into account the stylistic features of translators. We fine-tune a pre-trained machine translation model by the manually-aligned works of a particular translator. We make a detailed analysis of the effects of manual and automatic alignments, data augmentation methods, and corpus size on the translations. We propose an approach based on stylistic features to evaluate the style of a translator in the output translations. We show that the human translator style can be highly recreated in the target machine translations by adapting the models to the style of the translator.
△ Less
Submitted 21 July, 2023;
originally announced July 2023.
-
Back-to-the-Future Whois: An IP Address Attribution Service for Working with Historic Datasets
Authors:
Florian Streibelt,
Martina Lindorfer,
Seda Gürses,
Carlos H. Gañán,
Tobias Fiebig
Abstract:
Researchers and practitioners often face the issue of having to attribute an IP address to an organization. For current data this is comparably easy, using services like whois or other databases. Similarly, for historic data, several entities like the RIPE NCC provide websites that provide access to historic records. For large-scale network measurement work, though, researchers often have to attri…
▽ More
Researchers and practitioners often face the issue of having to attribute an IP address to an organization. For current data this is comparably easy, using services like whois or other databases. Similarly, for historic data, several entities like the RIPE NCC provide websites that provide access to historic records. For large-scale network measurement work, though, researchers often have to attribute millions of addresses. For current data, Team Cymru provides a bulk whois service which allows bulk address attribution. However, at the time of writing, there is no service available that allows historic bulk attribution of IP addresses. Hence, in this paper, we introduce and evaluate our 'Back-to-the-Future whois' service, allowing historic bulk attribution of IP addresses on a daily granularity based on CAIDA Routeviews aggregates. We provide this service to the community for free, and also share our implementation so researchers can run instances themselves.
△ Less
Submitted 31 January, 2023; v1 submitted 11 November, 2022;
originally announced November 2022.
-
Exploring Data Pipelines through the Process Lens: a Reference Model forComputer Vision
Authors:
Agathe Balayn,
Bogdan Kulynych,
Seda Guerses
Abstract:
Researchers have identified datasets used for training computer vision (CV) models as an important source of hazardous outcomes, and continue to examine popular CV datasets to expose their harms. These works tend to treat datasets as objects, or focus on particular steps in data production pipelines. We argue here that we could further systematize our analysis of harms by examining CV data pipelin…
▽ More
Researchers have identified datasets used for training computer vision (CV) models as an important source of hazardous outcomes, and continue to examine popular CV datasets to expose their harms. These works tend to treat datasets as objects, or focus on particular steps in data production pipelines. We argue here that we could further systematize our analysis of harms by examining CV data pipelines through a process-oriented lens that captures the creation, the evolution and use of these datasets. As a step towards cultivating a process-oriented lens, we embarked on an empirical study of CV data pipelines informed by the field of method engineering. We present here a preliminary result: a reference model of CV data pipelines. Besides exploring the questions that this endeavor raises, we discuss how the process lens could support researchers in discovering understudied issues, and could help practitioners in making their processes more transparent.
△ Less
Submitted 5 July, 2021;
originally announced July 2021.
-
Heads in the Clouds: Measuring the Implications of Universities Migrating to Public Clouds
Authors:
Tobias Fiebig,
Seda Gürses,
Carlos H. Gañán,
Erna Kotkamp,
Fernando Kuipers,
Martina Lindorfer,
Menghua Prisse,
Taritha Sari
Abstract:
With the emergence of remote education and work in universities due to COVID-19, the `zoomification' of higher education, i.e., the migration of universities to the clouds, reached the public discourse. Ongoing discussions reason about how this shift will take control over students' data away from universities, and may ultimately harm the privacy of researchers and students alike. However, there h…
▽ More
With the emergence of remote education and work in universities due to COVID-19, the `zoomification' of higher education, i.e., the migration of universities to the clouds, reached the public discourse. Ongoing discussions reason about how this shift will take control over students' data away from universities, and may ultimately harm the privacy of researchers and students alike. However, there has been no comprehensive measurement of universities' use of public clouds and reliance on Software-as-a-Service offerings to assess how far this migration has already progressed.
We perform a longitudinal study of the migration to public clouds among universities in the U.S. and Europe, as well as institutions listed in the Times Higher Education (THE) Top100 between January 2015 and October. We find that cloud adoption differs between countries, with one cluster (Germany, France, Austria, Switzerland) showing a limited move to clouds, while the other (U.S., U.K, the Netherlands, THE Top100) frequently outsources universities' core functions and services -- starting long before the COVID-19 pandemic. We attribute this clustering to several socio-economic factors in the respective countries, including the general culture of higher education and the administrative paradigm taken towards running universities. We then analyze and interpret our results, finding that the implications reach beyond individuals' privacy towards questions of academic independence and integrity.
△ Less
Submitted 23 August, 2023; v1 submitted 19 April, 2021;
originally announced April 2021.
-
Privacy Engineering Meets Software Engineering. On the Challenges of Engineering Privacy ByDesign
Authors:
Blagovesta Kostova,
Seda Gürses,
Carmela Troncoso
Abstract:
Current day software development relies heavily on the use of service architectures and on agile iterative development methods to design, implement, and deploy systems. These practices result in systems made up of multiple services that introduce new data flows and evolving designs that escape the control of a single designer. Academic privacy engineering literature typically abstracts away such c…
▽ More
Current day software development relies heavily on the use of service architectures and on agile iterative development methods to design, implement, and deploy systems. These practices result in systems made up of multiple services that introduce new data flows and evolving designs that escape the control of a single designer. Academic privacy engineering literature typically abstracts away such conditions of software production in order to achieve generalizable results. Yet, through a systematic study of the literature, we show that proposed solutions inevitably make assumptions about software architectures, development methods and scope of designer control that are misaligned with current practices. These misalignments are likely to pose an obstacle to operationalizing privacy engineering solutions in the wild. Specifically, we identify important limitations in the approaches that researchers take to design and evaluate privacy enhancing technologies which ripple to proposals for privacy engineering methodologies. Based on our analysis, we delineate research and actions needed to re-align research with practice, changes that serve a precondition for the operationalization of academic privacy results in common software engineering practices.
△ Less
Submitted 16 July, 2020;
originally announced July 2020.
-
Decentralized Privacy-Preserving Proximity Tracing
Authors:
Carmela Troncoso,
Mathias Payer,
Jean-Pierre Hubaux,
Marcel Salathé,
James Larus,
Edouard Bugnion,
Wouter Lueks,
Theresa Stadler,
Apostolos Pyrgelis,
Daniele Antonioli,
Ludovic Barman,
Sylvain Chatel,
Kenneth Paterson,
Srdjan Čapkun,
David Basin,
Jan Beutel,
Dennis Jackson,
Marc Roeschlin,
Patrick Leu,
Bart Preneel,
Nigel Smart,
Aysajan Abidin,
Seda Gürses,
Michael Veale,
Cas Cremers
, et al. (9 additional authors not shown)
Abstract:
This document describes and analyzes a system for secure and privacy-preserving proximity tracing at large scale. This system, referred to as DP3T, provides a technological foundation to help slow the spread of SARS-CoV-2 by simplifying and accelerating the process of notifying people who might have been exposed to the virus so that they can take appropriate measures to break its transmission chai…
▽ More
This document describes and analyzes a system for secure and privacy-preserving proximity tracing at large scale. This system, referred to as DP3T, provides a technological foundation to help slow the spread of SARS-CoV-2 by simplifying and accelerating the process of notifying people who might have been exposed to the virus so that they can take appropriate measures to break its transmission chain. The system aims to minimise privacy and security risks for individuals and communities and guarantee the highest level of data protection. The goal of our proximity tracing system is to determine who has been in close physical proximity to a COVID-19 positive person and thus exposed to the virus, without revealing the contact's identity or where the contact occurred. To achieve this goal, users run a smartphone app that continually broadcasts an ephemeral, pseudo-random ID representing the user's phone and also records the pseudo-random IDs observed from smartphones in close proximity. When a patient is diagnosed with COVID-19, she can upload pseudo-random IDs previously broadcast from her phone to a central server. Prior to the upload, all data remains exclusively on the user's phone. Other users' apps can use data from the server to locally estimate whether the device's owner was exposed to the virus through close-range physical proximity to a COVID-19 positive person who has uploaded their data. In case the app detects a high risk, it will inform the user.
△ Less
Submitted 25 May, 2020;
originally announced May 2020.
-
Questioning the assumptions behind fairness solutions
Authors:
Rebekah Overdorf,
Bogdan Kulynych,
Ero Balsa,
Carmela Troncoso,
Seda Gürses
Abstract:
In addition to their benefits, optimization systems can have negative economic, moral, social, and political effects on populations as well as their environments. Frameworks like fairness have been proposed to aid service providers in addressing subsequent bias and discrimination during data collection and algorithm design. However, recent reports of neglect, unresponsiveness, and malevolence cast…
▽ More
In addition to their benefits, optimization systems can have negative economic, moral, social, and political effects on populations as well as their environments. Frameworks like fairness have been proposed to aid service providers in addressing subsequent bias and discrimination during data collection and algorithm design. However, recent reports of neglect, unresponsiveness, and malevolence cast doubt on whether service providers can effectively implement fairness solutions. These reports invite us to revisit assumptions made about the service providers in fairness solutions. Namely, that service providers have (i) the incentives or (ii) the means to mitigate optimization externalities. Moreover, the environmental impact of these systems suggests that we need (iii) novel frameworks that consider systems other than algorithmic decision-making and recommender systems, and (iv) solutions that go beyond removing related algorithmic biases. Going forward, we propose Protective Optimization Technologies that enable optimization subjects to defend against negative consequences of optimization systems.
△ Less
Submitted 27 November, 2018;
originally announced November 2018.
-
POTs: Protective Optimization Technologies
Authors:
Bogdan Kulynych,
Rebekah Overdorf,
Carmela Troncoso,
Seda Gürses
Abstract:
Algorithmic fairness aims to address the economic, moral, social, and political impact that digital systems have on populations through solutions that can be applied by service providers. Fairness frameworks do so, in part, by mapping these problems to a narrow definition and assuming the service providers can be trusted to deploy countermeasures. Not surprisingly, these decisions limit fairness f…
▽ More
Algorithmic fairness aims to address the economic, moral, social, and political impact that digital systems have on populations through solutions that can be applied by service providers. Fairness frameworks do so, in part, by mapping these problems to a narrow definition and assuming the service providers can be trusted to deploy countermeasures. Not surprisingly, these decisions limit fairness frameworks' ability to capture a variety of harms caused by systems.
We characterize fairness limitations using concepts from requirements engineering and from social sciences. We show that the focus on algorithms' inputs and outputs misses harms that arise from systems interacting with the world; that the focus on bias and discrimination omits broader harms on populations and their environments; and that relying on service providers excludes scenarios where they are not cooperative or intentionally adversarial.
We propose Protective Optimization Technologies (POTs). POTs provide means for affected parties to address the negative impacts of systems in the environment, expanding avenues for political contestation. POTs intervene from outside the system, do not require service providers to cooperate, and can serve to correct, shift, or expose harms that systems impose on populations and their environments. We illustrate the potential and limitations of POTs in two case studies: countering road congestion caused by traffic-beating applications, and recalibrating credit scoring for loan applicants.
△ Less
Submitted 26 January, 2020; v1 submitted 7 June, 2018;
originally announced June 2018.
-
Emerging Markets for RFID Traces
Authors:
Matthias Bauer,
Benjamin Fabian,
Matthias Fischmann,
Seda Gürses
Abstract:
RFID tags are held to become ubiquitous in logistics in the near future, and item-level tagging will pave the way for Ubiquitous Computing, for example in application fields like smart homes. Our paper addresses the value and the production cost of information that can be gathered by observing these tags over time and different locations. We argue that RFID technology will induce a thriving mark…
▽ More
RFID tags are held to become ubiquitous in logistics in the near future, and item-level tagging will pave the way for Ubiquitous Computing, for example in application fields like smart homes. Our paper addresses the value and the production cost of information that can be gathered by observing these tags over time and different locations. We argue that RFID technology will induce a thriving market for such information, resulting in easy data access for analysts to infer business intelligence and individual profiles of unusually high detail. Understanding these information markets is important for many reasons: They represent new business opportunities, and market players need to be aware of their roles in these markets. Policy makers need to confirm that the market structure will not negatively affect overall welfare. Finally, though we are not addressing the complex issue of privacy, we are convinced that market forces will have a significant impact on the effectiveness of deployed security enhancements to RFID technology. In this paper we take a few first steps into a relatively new field of economic research and conclude with a list of research problems that promise deeper insights into the matter.
△ Less
Submitted 5 June, 2006;
originally announced June 2006.