Classification of gossip and epidemic protocols
Explore the intricate world of gossip and epidemic protocols, essential algorithms for communication in distributed networks.
Written by Ajay Rathore (Senior Java Software Engineer), in the December 2023 issue of Today Software Magazine. Read the article in Romanian here.
Gossip or epidemic protocols are a class of algorithms that pertain to communication over a distributed network. These algorithms have several applications from communication in distributed ledger networks and IOT networks to synchronizing updates across replication of databases, etc.
The following study focuses on classifying these protocols – materials were selected only if they presented performance data for the implementations discussed.
Introduction gossip and epidemic protocol
These protocols follow a similar approach as an epidemic or gossip spread in real world.
In the implementations of these algorithms the focus is usually set on communicating data to randomly selected nodes, and the expectation is to reach all the nodes in the network.
To classify the Gossip protocol, the first thing to look at is the origin of these protocols in the paper published by Alan Demers et al – Epidemic Algorithms for Replicated Database Maintenance – which addressed the issue of replicating database updates to thousands of different sites.
The challenge was to achieve this on an unreliable and slowly changing network in a way that is efficient and scalable.
These 3 approaches are considered:
Direct mail: Intuitively, it can be understood as sending an update to all the nodes and every node in the network will send their update to all nodes. This way for n updates we will generate m messages where m is no. of nodes. So, it is possible to have m*n messages in the network at any point. Also, for unstable networks, this might mean that updates are not sent to all the nodes.
Anti-entropy: Different sites resolve differences between their states. A random pair is selected, and the entire database is analysed. Since analysing delta between entire state is slow, this approach is much slower than the first one. Nevertheless, it is much more reliable. In this approach, we are generating a few counts of messages as two nodes exchange state which could contain multiple updates. Even if they might have received updates from other nodes, and in one exchange, they are transmitting those updates, too. However, there is data exchange throughout the entire database.
Rumour Mongering: Each update is treated as a hot rumor. When getting a hot rumour, a node selects another node and shares this hot rumour with a random node. The same behaviour continues until the expectation of communicating with every node is reached. Since this method only sends new updates, the size of messages is small. Although, there is a non-zero probability that a rumor might die before reaching all the nodes.
Anti-entropy and Rumour Mongering are variations of epidemic protocols. They define performance of such protocols can be analysed using 3 criteria:
Residue: How many nodes didn’t receive an update when there are no more rumours being passed.
Traffic: Number of update messages sent per site.
Delay: Average time it took for an update to arrive at a node after initial injection in the network and the time the message arrived at the last node.
Classifying epidemic protocols
Dissemination and aggregation
From the origin of such protocols, we can see that authors are trying to spread information (dissemination) while processing the information as they aggregate the updates (Aggregation). We can say the first two classifications can be done based on these two tasks.
We will divide the algorithms into these two sections considering which implementations/ strategies are good at disseminating and which are aggregating.
Here are some implementations and the categories they can fit into:
Dissemination | Aggregation |
Rumor Mongering | Anti-Entropy |
Median-Counter Algorithms | P-Q epidemic |
Multi-factor weighting function (MFWF) | epidemic with immunity |
| Epidemic with TTL |
Table 1. Created by the author: Classification of algorithms based on their focus on processing data vs faster/low-cost rumour spreading
P-Q epidemic, epidemic with immunity and epidemic with TTL were studied through a unified study that provides enhancements of these algorithms – Feng, Zuyong and Chin, Kwan-Wu: A unified study of epidemic routing protocols and their enhancements, 2012.
This classification uses the focus of protocols in providing good aggregations of information across the network vs. focusing on rapidly transferring information; a tradeoff between them can be observed based on the Delay information.
| tlast | Tave | Cost |
Anti-entropy | 7.8 | 5.3s | Size of db |
Rumour Mongering | Similar time as above | Similar time as above | Size of one list of hot rumours {assumed to be much smaller than db} |
Epidemic with TTL | N/A | Order of 104 s |
|
P-Q Epidemic | N/A | Order of 104 s |
|
Epidemic with immunity | N/A | Order of 104 s |
|
MFWF | N/A | 5s |
|
Table 2. Created by the author: Performance comparison for different algorithms
It can be observed that dissemination focused algorithms work on reducing the delay time. Hence, we can summarise pros and cons of this classification as follows:
| Pros | Cons |
Dissemination | It provides fast spreading of information. | It can have a non-zero probability of missing out nodes. |
Aggregation | It provides a sure way of processing the data at nodes. Providing guarantees of operations done at each node. | It is costly and hence can be slower to execute. |
Table 3. Created by the author: Pros and cons of dissemination and aggregation
Synchronous and asynchronous communication
Next criteria of classification can be seen as the type of communication used between nodes. The same classification is used by Bunsel, Yann & Bertier in the paper called “Bridging the Gap between Population and Gossip-based Protocols”. They define Synchronous as “A synchronous communication is modeled by a bounded transmission delay” and Asynchronous as “An asynchronous communication does not impose a bound on the transmission delay of a message.”
Here are some implementations that can fit into this classification:
Synchronous | Asynchronous |
MFWF | Asynchronous Sum-Weight Gossip Protocols |
Synchronous Consensus During Incomplete Synchrony | Epidemic Asynchronous Ru- mour Spreading |
Table 4. Created by the author: Classification of algorithms based on their communication model
Although four specific algorithms are added to the table above, any implementation of algorithms which finish in bounded time will be considered synchronous and the opposite will be asynchronous. One example is MFWF, which was considered synchronous for this study as there is no evidence in presented algorithms if there is any mechanism to provide asynchronous processing.
The performance of the selected algorithms is not expressed in the same units. Below is a table with units proposed in the implementations and their values.
Algorithm | Unit | Value | Remarks |
MFWF | Percentage of saved rebroadcasts | Up to 84% | Indicates very few nodes have to rebroadcast the message. |
Synchronous Consensus During Incomplete Synchrony | N/A | N/A | Focuses performance by providing proof for high tolerance to fault under synchronous implementation. |
Asynchronous Sum-Weight Gossip Protocols | Number of messages/node | Up to 17 | Showcases standard deviation from empirical values. |
Epidemic Asynchronous Ru- mour Spreading | N/A | N/A | Focuses on proving f<n failures can be tolerated. And present the time complexity and message complexity of the algorithm. |
Table 5. Created by the author: Performance description of algorithms
One major pro for Asynchronous communication that can be observed in these implementations is the possibility of implementing consensus without leader selection.
Below are pros and cons for each category.
Category | Pros | Cons |
Asynchronous | Can be used for networks with high entropy where respecting bounded delivery could be difficult. | Requires extra mechanism to define behaviour of node that are processing rumour and receive another at the same time. |
Synchronous | Atomicity for pushing and processing rumours. Defined time bound for finishing operations. Consensus algorithm implementation without leader selection |
|
Table 6. Created by the author: Pros and cons of synchronous and asynchronous communication
Anonymous and non-anonymous communication
The definition of these categories are also studied inspired from Bridging the Gap between Population and Gossip-based Protocols paper. The authors define Anonymous Gossip Protocols (AGP) as “protocols that do not require being aware of the identities of any peer for any of the three functions of the generic protocol.” and Non-Anonymous Gossip protocol (NGP) as “are not oblivious to the identities of peers they are communicating with or any other”.
Considering these definitions, we can find the following protocol implementation in each category:
Anonymous | Non-Anonymous |
Rumour Mongering | Gossip-Based Peer Sampling |
Anti-Entropy | T-Man |
Gossip Based Aggregation |
|
Table 7. Created by the author: Anonymous and non-anonymous protocol implementation
General usage of AGP seems to be applicable where a gossip protocol is used for both dissemination and aggregation. NGPs are being used for the purpose of building topologies.
The performance of Rumor Mongering and Anti-Entropy has been discussed above. The performance of the remaining 3 algorithms can be seen using the speed of convergence.
Even though, convergence could hold different meanings, we will assume that convergence towards the goal of the algorithm.
Algorithm | Convergence Factor | Remark |
Gossip Based Aggregation | 0.3 -0.8 | Range differs based on the topology of the network. |
T-Man | 1 – 2 | Based on the no. of cycles done. |
Gossip-Based Peer Sampling |
| Values converge fast, and overall good performance is provided. |
Table 8. Created by the author: Algorithms performance considering the convergence factor
It is noticeable that topological awareness requires special mechanisms to share the known peers in NGP.
There are no clear pros and cons here as this class of algorithms are being implemented for very different purposes.
Error correction vs no-error correction
This is a category inspired by the fact that algorithms like Rumour Mongering have a non-zero probability of not spreading all the updates.
It implies that there might be nodes in the system that have not received updates. In contrast, Anti-Entropy is a reliable algorithm that ensures that all the updates reach throughout the system.
Pros of No-Error Correction algorithm such as Rumour Mongering is that they are very fast. The same is lacking in Error-Correction algorithm such as Anti-Entropy.
Other considerations
Several other classifications can be built by considering combinations of these different classes, for example, Anonymous Synchronized Gossip Protocol.
Other classes of Gossip Protocol are based on Location and other criteria such as:
Location-Based Gossiping Protocol
Chance-Based Gossiping Protocol
Fair Efficient Location-Based Gossiping
Fuzzy-Gossip
Fitness-Gossiping (FGP)
Acknowledgement
This Report was prepared under the guidance of Dr. Elena Simona Apostol at Universitatea Politehnica from Bucharest during the Distributed Algorithms course.