# mj1.atMichael Jaros' Techblog

15Jul/120

## Looking into the future with prediction markets

#### Posted by mj

Decision events of common interest such as elections or contests are often preceded by measures to predict their outcome. Conventional measures include polls and interviews. From a perspective of collective knowledge, the accuracy of such measures is naturally limited, because the opinions of insiders have the same weight as the opinions of clueless individuals.

Prediction markets [1] are a way to map the probability of an event to the price of a market share by allowing participants to bet on or against the event and aggregating their opinions. The advantage of this method emerges from a self-controlling mechanism of the market's participants: Insiders will place a much higher bet than individuals with little knowledge about the event [2]. Prediction markets thus draw much of their accuracy from insider trading, a behavior that is frowned upon or even prosecuted on many other markets. Participants generally have a motivation to get more information and thus increase their predictions' accuracy.

Prediction markets have not always been able to beat other methods' accuracy [3], but their predictions are considered better than that of "almost any of the individual participants in the market" [4]. However, prediction markets can suffer from problems known from traditional markets like market manipulation attempts and speculative bubbles. While prediction markets based on real currency are not legal in many parts of the world, real money (or real risk) is considered a key ingredient to their accuracy. Therefore many of the existing initiatives use virtual money combined with prizes for well-performing participants.

Early examples of successful prediction markets are the Iowa Electronic Markets [5] and markets used internally by well-known corporations such as Hewlett-Packard and Intel for sales- and production-related predictions.

References:
1. Wikipedia: Prediction Market []
2. Howe (2008): Crowdsourcing: - Why the Power of the Crowd is Driving the Future of Business []
3. Graefe et al. (2011): Comparing face-to-face meetings, nominal groups, delphi, and prediction markets on an estimation task []
4. Hubbard (2010): How to Measure Anything: Finding the Value of Intangibles in Business, Second Edition, Chapter 13 []
5. Iowa Electronic Markets (IEM) []
8Apr/120

## From SIR to AISI – adapting SIR for tweetflows

#### Posted by mj

The length of the infectious period $t_I$ as used in a SIR model corresponds best to the time a service request is visible for a follower on Twitter. This number is difficult to model however, because it depends on the number of tweets in the follower's timeline at the moment of the service request. In other words: The infectious period is a parameter of each potentially receiving node in contrast to biological contagion where the infectious period is a general parameter of the disease.

Instead of analyzing the infectious period for each individual node, I propose to heuristically classify nodes in the originator's follower network as active (A) or inactive (I) nodes. Only active nodes can be susceptible or infectious. AISI (active/inactive susceptible/infectious) means a Tweetflow-specific simplification compared to the SIR model: Instead of considering the infectious period for each vertex in the graph, a simple heuristical algorithm determines nodes that are likely to react to a service request at the beginning using data available via the Twitter API.

AISI actually adapts a concept referred to as percolation [1]: Each vertex in the follower network is considered open or closed. Instead of making this decision randomly, it is more accurate in the tweetflow scenario to classify vertices as open or closed based on existing node classification data (active/inactive).

Determining active nodes among the followers is obviously the first sub-problem that goes hand in hand with the AISI model. The problem can be solved by analyzing the frequency of passive or active Twitter use. The Twitter API currently does not provide any way to determine a user's last login date [2]. However the date of the last status update is available and should allow for a better approximation than the last login date because passive users are unlikely to retweet or answer a service request. The time window in which a last status update has to occur for the node to be considered active should be similar to a typical $t_i$.

Last but not least, an open vertex (leading to an active node) is a necessary, but not a sufficient condition for an infection of that node. The probability of infection for such an open vertex depends on several both general and node-based factors (e.g. skill, payoff).

References:
1. Easley and Kleinberg (2010): Networks, Crowds, and Markets, p.572 []
25Mar/120

## Infection modeling: The SIR model

#### Posted by mj

In the branching process model, a very simplified network structure is assumed. For real-world networks, a more general model should be applied. In order to allow for arbitrary graphs with cycles, we have to distinguish three states for each node:

• Susceptible nodes have not been infected yet and are therefore available for infection. They do not infect other nodes.
• Infectious nodes have been infected and infect other nodes with a certain probability.
• Removed (recovered) nodes have gone through an infectious period and cannot take part in further infection (neither actively nor passively).

Using these three states S, I, and R, and the length of the infectious period $t_I$ as an aditional parameter, a SIR model [1] [2] can describe infections in any network structure: Susceptible nodes are infected with a certain probability and infected nodes are removed from the model after the infectious period.

A SIR model assumes that a disease can be caught at most once by each node and is therefore adequate for the modelling of the tweetflow discovery phase. SIS (susceptible - infectious - susceptible) models allow re-infections and apply to many real-world diseases.

References:
1. Hethcote (1989): Three basic epidemiological models []
2. Easley and Kleinberg (2010): Networks, Crowds, and Markets, p.572 []
25Mar/120

## Infection modeling: The branching process model

#### Posted by mj

When examining the spread of diseases inside a population, not only the contagiousness of the disease, but also the structure of the network connecting the population determine the progress of the infection, as Easley and Kleinberg describe in [1]. Because messages in a social network spread in a very similar way as diseases or ideas, we try to model the discovery phase of a tweetflow invocation using infection modelling.

In tweetflow terms, the contagiousness of a disease for a node corresponds to the payoff (reward - effort) of the tweetflow and the skill of the node. The length of the infectious period corresponds to the period of validity (time to live, ttl). The severity of the infection could match the priority of a tweetflow, if applicable.

Diseases spread in a population as members of the population infect other members (biological contagion). Ideas can spread in the same way inside a social network (social contagion). However, there is an important difference in the way these types of infections are usually analyzed: In biological contagion, there is no decision-making, but a random choice (infection or no infection). Sometimes, these randomized models are useful for social contagion too, if the decision processes are too complex to model or have too many unknown parameters.

The simplest model for infections is the branching process. The contact network is considered a regular tree with k children per node, and the distance from the root is measured in waves. Beginning from this root, the (infected) originator, each infected person passes on the infection to each of the k people in the subsequent wave with probability p. The basic reproductive number $R_0=pk$ is the expected number of new infections caused by a single infected node. If $R_0>1$, the disease persists in the network, if R_0 < 1[/math], it will die out after a finite number of waves. So both a high infection probability and high numbers of connected nodes are factors of persistence of the disease.

References:
1. Easley and Kleinberg (2010): Networks, Crowds, and Markets []
10Mar/120

## Infection Phase of Tweetflow Execution

#### Posted by mj

Before a tweetflow can be executed, the requesting node must distribute it among its followers. We call this the infection phase: Starting from the requestor, each node in the follower network has three choices:

• Ignore: The infection stops at this node, none of its followers receives the request.
• Accept: The infection stops at this node, none of its followers receives the request, but the node signals that it is willing to fulfill the request.
• Retweet: The infection continues across this node, and all of its followers receive the request.

I have written a small python program that posts a message in a follower network. It then selects one of the 3 choices mentioned above randomly for each node that receives the message. The result can be drawn as a graph, where red stands for "ignore", yellow means "retweet" and green stands for "accept".

It is clearly visible that in order to reach a high infection rate:

• High follower counts are most important for nodes with a small distance to the requestor.
• Retweeting the information is most important for nodes with a small distance to the requestor.

The infection rate of a follower network can be seen as a random variable, so an expected value for the infection rate can be calculated if there are usable estimates for the probabilities of each choice (accept, retweet, ignore).

16Oct/110

## From Tuple Spaces to Tweetflows

#### Posted by mj

The coordination of asynchronous actors has always been a major problem in distributed systems and parallel computing. About 20 years ago, David Gelernter and Nicholas Carriero have introduced the coordination language Linda. The most important difference to existing approaches was that they treated the coordination problem seperately from the computation problem.

Linda implementations offer the four basic operations in, out, rd, and eval for access to a virtual shared memory called the tuple space. Developers do not have to worry about network protocols or message formats because coordination is achieved just by reading data from and writing data to the tuple space. This is facilitated by the facts that read operations block by default until data is available and data can be consumed when read.

Today, another worldwide space connects millions of asynchronous actors: The Twitter network has some features very similar to a tuple space, and it supports basic operations as well. One major difference is that Twitter users do not use the system for coordination in a uniform way so far. Tweetflows could be a solution that not only coordinates human actors, but also integrates computer-provided services.

Another interesting difference between Linda implementations and Tweetflows lies in the type of coupling provided by each system and the resulting semantics: As mentioned before, Linda's actors are only coupled by the data put into the space -- Tweetflows on the other hand implement some kind of message-passing.

1Sep/110

## Crowdsourcing Workflow Execution

#### Posted by mj

Service-oriented architectures map business processes to software components both inside and across organizations. Processes are described as workflows and define the coordination of activities performed by potentially distributed actors.

In my work I examine the requirements and implications of crowdsourced workflow execution in social networks. The traditional workflow approaches are well established in many domains. However they are not suitable for an increasing number of applications due to lack of flexibility, scalability and availability as well as need for simplicity and loose coupling.

Building upon scientific work by Martin Treiber and others (Tweetflows), I try to find a prediction model for the execution probability of crowdsourced lightweight workflows.

Abstract