Sybil-Proof Mechanism for Information Propagation with Budgets
Junjie Zheng¹, Xu Ge¹, Bin Li², Dengji Zhao¹∗
¹ShanghaiTech University
²Nanjing University of Science and Technology
{zhengjj, gexu, zhaodj}@shanghaitech.edu.cn, cs.libin@njust.edu.cn
Abstract
This paper examines the problem of distributing rewards on social networks to improve the efficiency of crowdsourcing tasks for sponsors.
To complete the tasks efficiently, we aim to design reward mechanisms that incentivize early-joining agents to invite more participants to the tasks. Nonetheless, participants could potentially engage in strategic behaviors, e.g., not inviting others to the tasks, misreporting their capacity for the tasks, or creating fake identities (aka Sybil attacks), to maximize their own rewards. The focus of this study is to address the challenge outlined above by designing effective reward mechanisms. To this end,
we propose a novel reward mechanism, called Propagation Reward Distribution Mechanism (PRDM), for the general information propagation model with limited budgets. It is proved that the PRDM can not only incentivize all agents to contribute their full efforts to the tasks and share the task information to all their neighbors in the social networks, but can also prevent them from Sybil attacks.
1 Introduction
The widespread availability of mobile Internet devices has fostered greater interconnectedness among individuals via social networks and amplified the impact of information spread through social connections.
∗Corresponding author.
In certain fields, including viral marketing Leskovec et al. (2006), crowdsourcing distribution Singer and Mittal (2011); Doan et al. (2011), and answer querying Kleinberg and Raghavan (2005), sponsors frequently incentivize participants with monetary rewards to gather as much data or sell as many products as possible. In 2005, Amazon launched a crowdsourcing platform called Amazon Mechanical Turk (MTurk) to gather data from non-professionals. On the MTurk platform, sponsors post tasks and rewards, and workers claim the tasks and receive payments based on the quantity and quality of their completed tasks. Many studies requiring extensive data started collecting data through MTurk Sorokin and Forsyth (2008). One study in 2019 showed that more than 250,000 people have completed at least one task on MTurk Robinson et al. (2019). However, a large percentage of these workers are fixed, mainly because inviting new people to join brings them no benefit. Incentivizing existing workers to invite more people to participate can significantly improve efficiency.
In this paper, we aim to adequately utilize people’s connections in the network to design a reward distribution mechanism Zhang and Zhao (2022). This mechanism incentivizes agents to invite more people to participate by the reward distribution, which eventually improves the overall completion efficiency. The first difficulty is distributing the rewards within a constrained budget. The mechanism should motivate agents to spread the information in their social network as much as possible. In the DARPA network challenge Pickard et al. (2010); Tang et al. (2011), the winning team from MIT used a pioneering mechanism to effectively motivate people to spread information and quickly found all ten red balloons. In multi-level marketing Emek et al. (2011a); Drucker and Fleischer (2012), the seller expects to sell more products by attracting more people to purchase. In our problem setting, we also need to properly allocate the limited budget to participants.
Another difficulty is resolving Sybil attacks in social networks. A Sybil attack occurs when a participant creates multiple false identities to accomplish specific purposes. Sybil attacks are widespread and easily performed, affecting eventual results and harming others Alothali et al. (2018); Yu et al. (2006); Zhang et al. (2014). Traditional defense approaches are mainly focused on the communication domain Chen et al. (2021); Jamshidi et al. (2019); Zhang and Lee (2019). Scholars have extensively studied this phenomenon in various domains. For example, the Vickrey-Clarke-Groves mechanism in auction theory is vulnerable to Sybil attacks Yokoo et al. (2004), and Yokoo et al. (2001) developed a new protocol against false-name bids. In Bitcoin transactions, Babaioff et al. (2012) devised a scheme that rewards information propagation while preventing Sybil attacks aimed at gaining more revenue. Emek et al. (2011b) solved the problem of Sybil attacks in viral marketing by rewarding propagation behavior based on the size of a maximum perfect binary tree. In crowdsourcing, individuals have different abilities, such as computing power, purchasing advertising, or providing data. We aim to use this authentic contribution information to design an information propagation mechanism that defends against Sybil attacks.
In this paper, our mechanism drives improvements in the following dimensions.
• We propose a model that quantifies an agent’s contribution by introducing the concept of capacity. The model considers the general setting of Sybil attacks.
• We propose a novel natural mechanism to allocate rewards that maximizes information propagation within a limited budget while resisting Sybil attacks.
Related work. With a fixed budget, Shi et al. (2020) devised a mechanism that maximizes information propagation but is not resistant to Sybil attacks. Chen and Li (2021) designed a special scenario of a free market with lotteries, where participants have a strong incentive to maximize the diffusion of information, and false-name manipulations fail to yield excessive rewards. In the answer querying problem, Zhang et al. (2020) designed a mechanism that incentivizes the agents to propagate the requestor’s query information while ensuring that a Sybil attack yields no additional gain. However, their mechanism only handles the setting of a single query in a tree. Hong et al. Chen et al. (2022) solved the problem of Sybil attacks in diffusion auctions by removing possible fake agents with graph-structured methods, providing a new approach to tackle similar issues.
The remainder of this paper is organized as follows. Section 2 describes the fundamental setup and definition of the model. Section 3 shows our mechanism and an example of running the mechanism. Section 4 shows the properties of our mechanism. In Section 5, we discuss these properties. In Section 6, we summarize our work and discuss possible future directions.
2 The Model
We consider the crowdsourcing problem powered by social networks, where a sponsor expects to leverage social connections to recruit more participants (or agents) to some crowdsourcing task, e.g., data collection. For convenience, we model the social connections of all agents as a directed graph $G = (V, E)$, where $V$ represents the set of vertices and $E$ denotes the edge set. Except for the sponsor $s$, the graph consists of a set $N$ of agents who can contribute to the task, i.e., $V = N \cup \{s\}$. For each agent $i \in N$, we denote by $c_i$ the maximum contribution capacity (or simply, capacity) of $i$ for the task, e.g., $c_i$ can denote the number of pictures that $i$ can afford to label. For any two agents $i, j \in V$, there is an edge $(i, j) \in E$ if and only if agent $i$ can invite agent $j$. Given an edge $(i, j) \in E$, we say $j$ is a child of $i$ and use $n_i$ to denote the set of $i$’s children in $G$. Without promotions, the sponsor can only recruit her direct children $n_s$ to the task, and with such a small number of participants the task may fail to be accomplished. To attract more agents, the sponsor plans to reward the participants to incentivize them to further spread the task information to their children, under a total budget of $B$, and the amount of each participant’s reward is determined by her reports, including her performance on the task and her diffusion efforts.
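As usual, let $\theta_i = (n_i, c_i)$ be agent $i$’s private type, where $n_i$ denotes the set of her children and $c_i$ is her capacity. In addition, denote by $\theta$ the type profile of all agents, and by $\theta_{-i}$ the type profile of all agents except agent $i$, i.e., $\theta = (\theta_i, \theta_{-i})$. For convenience, we use $\Theta_i = \mathcal{P}(N) \times \mathbb{R}_{\geq 0}$ to denote the type space of agent $i$, where $\mathcal{P}(N)$ is the power set of the set $N$, and $\Theta$ to denote the space of all type profiles.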
As $\theta_i$ is private information, agent $i$ can misreport it to the sponsor to benefit herself. Let $\theta'_i = (n'_i, c'_i)$ be the type reported by agent $i$, i.e., $i$ diffused the information to $n'_i$ and contributed $c'_i$ to the task.
Since agent $i$ is unaware of other agents in the graph who are not her children and cannot contribute more than her capacity, we require that $n'_i \subseteq n_i$ and $c'_i \leq c_i$. Similarly, let $\theta' = (\theta'_i, \theta'_{-i})$ denote the report profile of all agents, where $\theta'_{-i}$ represents the report profile of all agents except agent $i$. Accordingly, we use $\Theta'_i$ to denote the space of $\theta'_i$, $\Theta'_{-i}$ the space of $\theta'_{-i}$, and $\Theta'$ the space of $\theta'$.
Definition 1.
Given a report profile , we say agent is active if there exists a sequence of agents , where and holds for any .
That is, an agent is an active agent if there is a “diffusion path” from the sponsor to her. Note that only active agents are real participants of the crowdsourcing task. Based on the definition of active agents, we next introduce the concept of active network.
Definition 2.
Given a report profile , we use (or for short) to denote the active network generated by , where is the set of all active agents and .
The active network represents all agents that actually participate in the task. Given any report profile , the sponsor only needs to reward agents in the active network.
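Definitions 1 and 2 amount to a reachability computation from the sponsor. The following is a minimal sketch, assuming the report profile is stored as a dictionary from each node to the set of children it reports inviting. The names `active_network`, `reports`, and the sponsor label `"s"` are illustrative choices, not notation from the paper.

```python
from collections import deque


def active_network(reports, sponsor="s"):
    """Return (active agents, active edges) for a report profile.

    `reports` maps each node to the set of children it reports inviting
    (an assumed encoding); the sponsor's entry lists her direct children.
    """
    seen = {sponsor}
    active = set()
    queue = deque([sponsor])
    while queue:
        node = queue.popleft()
        for child in reports.get(node, ()):
            if child not in seen:
                # A diffusion path from the sponsor reaches this child.
                seen.add(child)
                active.add(child)
                queue.append(child)
    # Only invitations sent by reached nodes belong to the active network.
    edges = {(u, v) for u in seen for v in reports.get(u, ())}
    return active, edges
```

Agents that are invited only by unreached agents never enter `active`, which matches the remark that only active agents are real participants.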
Definition 3.
A reward distribution mechanism on the social network consists of a set of reward functions $\{r_i\}_{i \in N}$, where $r_i$ is the reward function for agent $i$ and $r_i(\theta') = 0$ for an inactive agent $i$.
Given any report profile , outputs the reward to . If an agent is not in the active network, her reward is always zero as she does not participate in the task and contributes nothing.
When is clear from the context, we write as and for short. In the following, we define some desirable properties that a reward mechanism should satisfy. First, the reward mechanism should be individually rational, which guarantees that each participant is willing to stay in the mechanism.
Definition 4.
A reward distribution mechanism is individually rational (IR) if $r_i(\theta') \geq 0$ for all graphs $G$, all agents $i \in N$, and all report profiles $\theta'$.
If a reward mechanism is not individually rational, then in certain cases some participants would have to pay the sponsor, and their best response is to leave the mechanism. Therefore, the individual rationality property is also known as the participation constraint.
Besides the IR property, the sponsor also expects an agent to authentically contribute all her abilities and invite all her children to the task.
Definition 5.
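A reward distribution mechanism is incentive compatible (IC) if the following inequality
$$r_i(\theta_i, \theta'_{-i}) \geq r_i(\theta'_i, \theta'_{-i}) \qquad (1)$$
holds for all graphs $G$, all agents $i \in N$, all types $\theta_i$, all reports $\theta'_i$, and all report profiles $\theta'_{-i}$ of the other agents.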
Incentive compatibility implies that diffusing the task information to all children and contributing all her efforts to the task is a dominant strategy for all agents.
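For small instances, the IC condition can be probed by brute force: fix the others' reports, enumerate every feasible misreport of one agent, and compare against the truthful report. The sketch below assumes a mechanism is given as a function from a report dictionary to a reward dictionary. The names `truthful_is_best`, `mechanism`, `others`, and `capacity_grid` are hypothetical, and passing this check for one profile is only a necessary condition for IC, not a proof.

```python
from itertools import combinations


def truthful_is_best(mechanism, agent, true_children, true_capacity,
                     others, capacity_grid):
    """Check that no feasible misreport beats truthful reporting.

    `mechanism(reports)` returns a dict of rewards, where `reports` maps
    each node to a pair (frozenset of invited children, contribution).
    Feasible misreports invite a subset of the true children and
    contribute at most the true capacity.
    """
    def reward(children, contribution):
        reports = dict(others)
        reports[agent] = (frozenset(children), contribution)
        return mechanism(reports).get(agent, 0.0)

    truthful = reward(true_children, true_capacity)
    kids = sorted(true_children)
    for size in range(len(kids) + 1):
        for subset in combinations(kids, size):
            for c in capacity_grid:
                # Skip contributions the agent cannot actually afford.
                if c <= true_capacity and reward(subset, c) > truthful + 1e-12:
                    return False
    return True
```

A full check would repeat this for every agent and every profile of the others, which is only practical on toy networks.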
As the sponsor is endowed with a fixed budget, the total rewards to agents are limited.
Definition 6.
A reward distribution mechanism is budget balanced (BB) if
(2)
for all graph , all and all report profile .
Definition 7.
A reward distribution mechanism is asymptotically budget balanced (ABB) if
(3)
for all graph , all and all report profile .
The ABB property requires the sponsor’s budget to be fully distributed to agents when the sum of all agents’ contributions goes to infinity. If a reward mechanism is IR and IC, then agents are motivated to contribute their full capacities and propagate the task information to all their children. However, as the agents are individuals distributed in the network, they can easily create fake identities or even fake social networks to gain more reward. Such behaviors are called Sybil attacks or false-name attacks, and a good reward mechanism should prevent them.
Next, we give a formal definition of Sybil attacks.
Definition 8.
A Sybil attack of agent is denoted by an attacking type report , where is a set of fake identities and accordingly are their reports, where
•
;
•
for all .
In other words, agent can create an arbitrary number of fake identities and arbitrary social connections between these identities. Let us consider a special case of Sybil attack in which all the fake nodes are invited by the inviters of node .
Definition 9.
A parallel Sybil attack of agent is a special kind of Sybil attack, where is a set of fake identities invited by the parents of .
A parallel Sybil attack creates fake identities only in parallel with the attacker: every fake participant is invited by at least one inviter of the agent committing the attack. With the definition of Sybil attacks, we intend to design reward mechanisms that can defend against Sybil attacks.
Definition 10.
A reward distribution mechanism is Sybil-proof (SP), if the inequality
(4)
holds for all graph , all , all , all and , where is the report profile of all agents under Sybil attack . The mechanism is parallel Sybil-proof (PSP) if the Sybil attacks satisfy the situation of parallel Sybil attacks.
The SP property may be too strong to satisfy, so we next introduce a milder condition for Sybil-proofness, called -SP.
Definition 11.
A reward distribution mechanism is -Sybil-proof (-SP), if the inequality
(5)
holds for all graph , all , all , all and .
In the following, we focus on designing reward mechanisms that satisfy IR, IC, and the other desired properties.
3 Propagation Reward Distribution Mechanism
This section introduces a novel reward distribution mechanism called Propagation Reward Distribution Mechanism (PRDM). PRDM starts by layering a given network and then determines the final rewards for each agent by the contribution phase and propagation phase.
The goal of every agent is to obtain more reward, whereas the sponsor wants to maximize the information propagation rather than receive a reward. The sponsor always diffuses the information to all her children. For a given report profile , we generate the active network . In , define the depth of agent as the length of the shortest path from to , written as . Agents can then be divided into layers based on their depths, and we define the -th layer as the set of all agents with depth .
Since we only allow information to be propagated from the previous layer to the next layer, for all , only the edges from agent to the agents in the -th layer are retained. By the above processing, we construct a layered directed graph based on . Figure 1 shows an example of how to get the corresponding layered graph from an active network. In the obtained layered graph, for any , define as the set of all parents of in -th layer.
Figure 1: An example of transforming an active network (a) into a layered graph (b).
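The layering step can be sketched as a breadth-first search from the sponsor followed by a filter that keeps only edges going from one layer to the next. This is a minimal illustration under the assumption that the active network is given as a dictionary of child lists. The names `layer_graph`, `children`, and the sponsor label `"s"` are hypothetical.

```python
from collections import deque


def layer_graph(children, sponsor="s"):
    """Return (depth, layers, parents) for an active network.

    `children` maps each node to the nodes it invited (assumed encoding).
    `depth[v]` is the shortest-path length from the sponsor to v,
    `layers[k]` is the set of agents at depth k, and `parents[v]` keeps
    only the inviters of v that sit exactly one layer above v.
    """
    depth = {sponsor: 0}
    queue = deque([sponsor])
    while queue:
        u = queue.popleft()
        for v in children.get(u, ()):
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)

    layers = {}
    for node, d in depth.items():
        if node != sponsor:
            layers.setdefault(d, set()).add(node)

    parents = {node: set() for node in depth if node != sponsor}
    for u, kids in children.items():
        if u not in depth:
            continue  # u is not reachable from the sponsor
        for v in kids:
            # Drop edges within a layer, skipping layers, or pointing back.
            if v in parents and depth[v] == depth[u] + 1:
                parents[v].add(u)
    return depth, layers, parents
```

Edges between agents of the same layer, or from a deeper layer back to a shallower one, are discarded, so the result is a layered directed graph.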
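Algorithm 1: Propagation Reward Distribution Mechanism
Input: a report profile, a fixed budget, and two parameters (the sponsor's virtual capacity and the transfer proportion)
  Construct the active network;
  Compute the depth of each agent in the active network to obtain the layer sets;
  For each layer, compute the total contribution of the sponsor and the layer;
  Contribution phase: initialize each agent's weight and the initial budget of the first layer;
  for each layer do
    foreach agent in the layer do
      compute the agent's weight from her contribution and the layer's budget;
    compute the initial budget of the next layer;
  Propagation phase: initialize each agent's reward from her weight in the contribution phase;
  for each layer do
    foreach agent in the layer do
      foreach parent of the agent in the previous layer do
        transfer a share of the agent's weight to that parent as reward;
Output: the reward vector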
PRDM is divided into a contribution phase and a propagation phase. In the contribution phase, each agent's weight is determined by her depth and contribution. In the propagation phase, the weights are redistributed according to the agents’ propagation, yielding each agent's final reward. In PRDM, the parameter is a virtual capacity of the sponsor, which is used to pass the budget on to the following layers. The parameter measures what proportion of her reward an agent gives to her inviters. With the above definitions, the general procedure of PRDM is shown in Algorithm 1.
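The two-phase structure can be illustrated with a short sketch. The exact update formulas are not reproduced here, so the rules below are assumptions chosen to match the description: each layer shares its budget in proportion to contribution against the cumulative total of the sponsor's virtual capacity and all layers so far, the undistributed part is passed on to the next layer, and each agent below the first layer hands a fraction `beta` of her weight to her parents in equal shares. The names `prdm_sketch`, `sponsor_capacity`, and `beta` are hypothetical.

```python
def prdm_sketch(layers, parents, contribution, budget, sponsor_capacity, beta):
    """Two-phase reward sketch under ASSUMED update rules (see lead-in).

    layers:       list of lists of agents, layers[0] being the first layer
    parents:      dict agent -> list of parents in the previous layer
                  (only needed for agents below the first layer)
    contribution: dict agent -> reported contribution
    Requires sponsor_capacity > 0 (or a positive first-layer contribution)
    so that no division by zero occurs.
    Returns (reward dict, undistributed budget).
    """
    # Contribution phase: assumed proportional sharing per layer.
    weight = {}
    layer_budget = budget
    cumulative = sponsor_capacity
    for layer in layers:
        cumulative += sum(contribution[i] for i in layer)
        for i in layer:
            weight[i] = layer_budget * contribution[i] / cumulative
        # What this layer does not take is passed on to the next layer.
        layer_budget -= sum(weight[i] for i in layer)

    # Propagation phase: assumed equal split of the transfer among parents.
    reward = dict(weight)
    for layer in layers[1:]:
        for i in layer:
            share = beta * weight[i] / len(parents[i])
            for p in parents[i]:
                reward[p] += share
            reward[i] -= beta * weight[i]
    return reward, layer_budget
```

Under these assumed rules the propagation phase only moves reward between agents, so the rewards plus the undistributed budget always add up to the sponsor's budget.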
Figure 2: An example of PRDM on input , , , each agent has a contribution of . (a) the invitation relationship among the sponsor and each agent. (b) each layer’s initial budget and each agent’s weight in contribution phase. (c) the transfer of reward during propagation phase and each agent’s final reward .
3.1 An Example of PRDM
In this subsection, we show an example of the mechanism in operation. An instance is shown in Figure 2 to give an illustration of PRDM. The sponsor transmits the information to the first layer . After that, and . The invitation relationships among all the agents are presented in Figure 2(a).
Assuming a budget , we set and , all agents report a contribution of . The process of distributing rewards using PRDM is as follows.
Contribution phase:
• Step 1: is the total contribution of sponsor and agents , , and . We can calculate and the budget , so that each of them has weight
• Step 2: Calculate the budget and . Then we obtain the weight of the agent , , and as
• Step 3: Similarly, , , so the weight of agents and is
Propagation phase:
• Step 4: The initial reward for agents is the weight calculated in the contribution phase
• Step 5: Agent and agent transfer of their weights to agent respectively as rewards; agent transfers of her weights to agent and agent
• Step 6: Similarly, we consider the transfer of agent and agent
The final reward is according to PRDM. Each component of represents the reward of the corresponding agent. Note that we still have available for further propagation.
4 Properties of PRDM
In this section, we show several properties of PRDM. We start by discussing the straightforward properties of PRDM, and then we illustrate how PRDM maximizes information propagation and defends against Sybil attacks.
For convenience in the following formulation, denote as the sum of the contributions of the set , e.g., is the total contribution of -th layer. Recall that when is an integer, denotes the total contribution of the first layers.
Theorem 1.
The Propagation Reward Distribution Mechanism is asymptotically budget balanced.
Proof.
In PRDM, the division of the initial budget is performed only in the contribution phase, which implies . Recall that for an active network , the sponsor has a virtual contribution and is the total contribution of and all the agents in layer .
According to PRDM, each layer can only divide a part of the remaining reward from the previous layer. Suppose that there are layers. We focus on , which is the residual budget of layer inherited from the upper layer. Generally, for , we have . In particular, let be the budget that has not been distributed. Then, we can infer that
Next, we show that converges to 0 when the total contribution goes to infinity. Starting from the first layer, we can get
Similarly, for , we have . Then, when the total contribution goes to infinity, , hence .
∎
The above theorem indicates that PRDM will allocate all of the sponsor’s budget to the agents when the total contribution is large enough. Meanwhile, the sponsor does not need to pay extra budgets for the contributions of extra participants.
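The asymptotic behaviour can be checked numerically on a simplified model. The sketch below assumes, purely for illustration, that each layer takes a share of its inherited budget equal to its own contribution divided by the cumulative contribution of the sponsor's virtual capacity and all layers so far. Under that assumption the undistributed budget telescopes to the budget times the sponsor's virtual capacity divided by the overall total. The names `residual_budget`, `sponsor_capacity`, and `layer_totals` are hypothetical.

```python
def residual_budget(budget, sponsor_capacity, layer_totals):
    """Undistributed budget after all layers, under an ASSUMED sharing rule.

    layer_totals[k] is the total contribution of the (k+1)-th layer.
    Each layer keeps the fraction total / cumulative of what it inherits,
    where cumulative includes the sponsor's virtual capacity.
    """
    remaining = budget
    cumulative = sponsor_capacity
    for total in layer_totals:
        cumulative += total
        remaining -= remaining * total / cumulative
    return remaining
```

For example, scaling every layer's contribution by a factor of 100 shrinks the residual from a quarter of the budget to well under one percent in the test case used here, in line with the statement that the budget is fully allocated when the total contribution is large enough.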
Theorem 2.
The Propagation Reward Distribution Mechanism is individually rational.
Proof.
Intuitively, any agent in a social network , at any stage of PRDM, does not need to pay a fee, so holds.
∎
In fact, every agent in the active network always receives a positive reward. Furthermore, Theorem 3 shows that an agent maximizes her reward when she truthfully reports her type.
Theorem 3.
The Propagation Reward Distribution Mechanism is incentive compatible.
Proof.
By the definition of incentive compatibility, PRDM needs to satisfy that, for any agent and any report profile of the others, truthfully reporting her private type is a dominant strategy. The report of agent consists of the contribution and the set of children . Hence, for any agent , we need to prove the following two parts:
• Agent maximizes her reward by contributing as much as she is capable of.
• Agent maximizes her reward by inviting all her children.
Part 1: If agent is not in the active network, the reward is zero regardless of how much she contributes, so maximizes her reward. For any , assume that agent is in the -th layer () in the layered graph with layers and agent is the only parent of her children in -th layer. Thus for any , any and , we have
(6)
where is the total contribution in -th layer except , is the total contribution of ’s children in -th layer. The first term of in Equation (6) is the reward reserved by . The second term is the reward coming from the next layer. All quantities except are fixed, so the first term increases as increases and the second term decreases as increases. Consider the worst case: , , when the first term decreases the fastest while the second term increases the slowest, can be reduced as
(7)
Since is a monotonically increasing function of , agent receives the highest reward when . Furthermore, if , agent is in the first layer and is not required to distribute rewards to the previous layer, the first term in Equation (6) will be larger. If , agent is in the last layer and has no rewards from the next layer, so the second term in Equation (6) is . If agent is not the only parent of her children in -th layer, the second term in the equation (6) decreases more slowly. All of these cases will be better than the worst case we discussed in Equation (7). Therefore maximizes the reward of agent .
Part 2: If agent is not in the active network, her reward is again always equal to . If , for all , suppose she adds one more child into . If agent is already in , then either is in the layer below , in which case gets an additional reward without affecting the existing reward, or is in another layer, in which case ’s reward remains unchanged. Otherwise, is a new agent in the active network, so must be in the next layer of , and the reward of changes from to , which is an increase. Hence agent maximizes her reward when she invites all her children.
In conclusion, PRDM is incentive compatible, which means that truthful reporting is a dominant strategy for all agents. In other words, all agents will maximize information propagation while making the largest contributions within their capacity.
∎
Next, we will discuss the property of Sybil-proofness.
Theorem 4.
The Propagation Reward Distribution Mechanism is parallel Sybil-proof.
Proof.
Suppose agent () commits a parallel Sybil attack . It follows directly from the proof of incentive compatibility that, for all nodes in the set , the dominant strategy is to make the largest contribution within their capacity and invite all their children. However, their capacity is limited by , which means that truthfully reporting without creating fake nodes maximizes the benefit of agent .
∎
We now discuss the more general case of Sybil attacks. Before giving the main conclusion, we first present two lemmas. Lemma 1 states that an agent cannot increase her weight in the contribution phase by creating fake nodes.
Figure 3: (a) is the case where agent does not commit Sybil attacks; the black node represents agent , and the white nodes represent real participants that invites. (b) shows the situation where creates fake nodes one layer down, in which the dashed node represents all the nodes generated by . (c) is the most general form of a Sybil attack.
Lemma 1.
No agent can increase her total weight in the contribution phase by committing a Sybil attack.
Proof.
Suppose agent (). When agent does not commit a Sybil attack, the network is shown in Figure 3(a), the weight of is . Let us first show that an agent cannot increase her weight by making several fake nodes as her own children. For convenience, we denote .
Without loss of generality, let . After committing Sybil attack , agent can transfer part of her contribution to her fake nodes () and . Let be the total weight of and all her fake nodes. According to PRDM, as shown in Figure 3(b), when all the fake nodes are in the next layer of , we have