CS791: 11/16 Lecture Notes
2 papers today: (3 if we're lucky)
- SRM (S. Floyd, V. Jacobson, C. Liu, S. McCanne, L. Zhang)
- PPV (C. Papadopoulos, G. Parulkar and G. Varghese)
- Digital Fountain (next time)
Slide 1
Difficulties in changing from a unicast to a multicast paradigm for data transport
- ACK implosion : 1-to-1, there is data/ack balance
1-to-many, "acks >> data" and are hard to merge
- General scalability issues
10-20 is easy. 100s of 1000s of clients is hard.
- Heterogeneous clients: congestion control:
receivers w/ different connectivity want good performance
layering - a general/common approach
packet loss handling/reaction
Slide 2
Defining Reliability for the purpose of Multicast:
- "Reliable Transfers"
- Goal: all rcvrs eventually receive all data
No notion of "sequenced delivery" : only "total delivery"
(abstraction = complete content // rather than // ongoing stream)
- Timeliness not a major consideration (except PPV, INFOCOM '98)
- Multiple senders makes things more interesting
"Very Much Harder problem" (whiteboard - "wb")
- One sender (DF model - data dissemination)
Slide 3
SRM (Scalable Reliable Multicast): Key ideas
- "1 sz fits all" doesn't work for more complicated applications
- Application level framing (ALF): the app knows the right defn of reliability
- Notion: SRM as "core" functions that ALL reliable multicast
  protos will need; the app can then build additional functionality
  on top of it, e.g.:
  - cong control
  - sequencing, strict ordering
- assumes best-effort IP MC
- Fate-sharing in unicast implies that either the
sender or receiver can do loss recovery
- TCP :
- sender-driven (sender retransmits until ACK)
- fate sharing: if either end dies, the connection is gone, so
sender and rcvr-driven recovery are equivalent
- Notion of WEAK fate-sharing:
It's easy for all rcvrs to know about the server,
harder for server to know about all rcvrs
If a rcvr goes down, other rcvrs (even sender)
generally don't care, therefore:
Receiver-driven reliability (NAK) repeatedly
until you get the data you want
- while this was being debated,
lots of sender v. receiver papers claimed receiver preferable
Slide 4
Objective: "minimal" definition of reliable multicast
- Use IP multicast as baseline/network paradigm
- Rcvr-driven reliability
- Target Application: "shared whiteboard"
- "moderate" number of participants
- all participants are (potentially) senders
- Any sender can send a "drawop" - an idempotent operation
- idempotency: ordering is not important, duplication is not an issue
  (bank deposits are NOT idempotent! drawops are!
  see the sketch after this list)
  reliable mc with duplicate suppression - HARD!
  (not touched in these papers)
- Receiver set may be unmaintainable -
  no one knows the complete set
- User/event Naming, esp of pieces of state, is an
interesting and difficult problem (not discussed today)
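A minimal sketch (Python) of the idempotency point above; the
whiteboard structure and names are illustrative, not from the paper.
State is keyed by each drawop's unique name, so duplicates and
reordering are harmless by construction:

    # Toy whiteboard: state is a map keyed by each drawop's name,
    # so re-applying a drawop is a no-op and order doesn't matter.
    whiteboard = {}

    def apply_drawop(name, data):
        whiteboard[name] = data

    apply_drawop(("host-a", 7), "line (0,0)-(10,10)")
    apply_drawop(("host-a", 7), "line (0,0)-(10,10)")  # duplicate: harmless
    assert whiteboard == {("host-a", 7): "line (0,0)-(10,10)"}

    # Contrast: a bank deposit ("balance += 100") applied twice gives a
    # different result - deposits are NOT idempotent.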
Slide 5
Weak, "eventual" reliability
- no attempt at globally consistent whiteboard
- display whatever is currently available
- fixup problems on the fly (naming conventions help here)
Feedback control (interesting part for us) is NACK-based
- but not "all-the-way-to-sender" : that's unscalable
(NACK implosion if pkt is lost close to source)
- Receivers cooperate : other rcvrs can retransmit
what we may have lost.
- NACK to "neighbors", one or more (hopefully)
rcvs and re-xmits over MC group
going across MC group - potential waste,
but it's safe because of idempotency
- Receiver collaboration
(and one can act as a retransmission source)
- Idempotency
(duplicates are not a problem - defined away by the app)
- Scope + time reqs/resps appropriately
  (one of the nice features of the SRM paper; see sketch below)
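A small sketch of TTL-based scoping, using standard UDP multicast
socket options (the group address, port, and payload here are
placeholders, not from the paper):

    import socket

    # IP_MULTICAST_TTL limits how many hops the packet travels, so a
    # NACK/repair stays within a small neighborhood of the sender.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 3)
    sock.sendto(b"NACK <drawop name>", ("224.2.2.2", 9999))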
Slide 6
Algorithm for loss recovery (essential aspects of paper):
- Sender S, hosts A, B
- 1st: A figures out he's missing something (remember - no sequencing between sources)
i.e. a gap in a sequence from a particular sender
- 2nd: Randomization ; before recovery attempt,
wait for random timer (avoids implosion)
- moreover - nacks scoped locally,
so don't necessarily reach the source
best algo not yet determined
heuristics: set low TTLs and experiment.
- Host B can service a request
- Sets a random timer - prevents response implosion
- could unicast repairs - bad:
  no one else knows you serviced the request - can't cancel their timers
  - doesn't help with clusters of loss
- multicast repairs (again, w/ limited TTL)
- "local recovery, setting random timers"
- Randomization suppresses both duplicate requests and duplicate responses
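A rough sketch (Python) of the randomized request/repair timers.
The window constants follow the paper's C1/C2 (requests) and D1/D2
(repairs) notation; the pending-timer bookkeeping is our own
illustration:

    import random

    C1, C2 = 2.0, 2.0   # request window, in units of one-way delay
    D1, D2 = 1.0, 1.0   # repair window

    pending = {}   # data name -> time at which we will send

    def schedule_request(name, dist_to_source, now):
        # Wait a random time proportional to our distance from the
        # source before multicasting the NACK; nearer hosts tend to
        # fire first, and their request suppresses everyone else's.
        pending[name] = now + random.uniform(
            C1 * dist_to_source, (C1 + C2) * dist_to_source)

    def schedule_repair(name, dist_to_requester, now):
        # Same idea on the reply side before re-xmitting the data.
        pending[name] = now + random.uniform(
            D1 * dist_to_requester, (D1 + D2) * dist_to_requester)

    def on_overheard(name):
        # Suppression: someone else multicast the same request or
        # repair first, so cancel our own pending timer.
        pending.pop(name, None)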
Question:
"can we distinguish sender S from re-sender B?"
Answer:
yes - IP sender address is different. not an issue for SRM, may be for others
Slide 7
Simulation / measurement : loss analysis
- Simple topologies:
- req vs. repairs vs. repair latency (fig 3-6)
- repair latency is O(5rtt) - penalty of timers
- "small price to pay for traffic savings"
- Feeling of mbone/community: it's hard to beat the SRM approach
Question:
Gabe: "sltn to repair latency - what if we pair receivers?"
Answer:
Problems:
- how do you set up the pairs? How do you make them loss-disjoint?
This makes them distant - higher latencies
- what if your pair quits the session - keepalive, ping, etc
pair-changing, etc
- what if you both lose anyway
- How realistic is expectation of people cooperating?
SRM isn't dependent on any individual -
just enlarge scope until you find someone who can help you
- no knowledge about who is there
Question:
Khaled: "what keeps C and T from re-transmitting after B?"
Answer:
B's re-xmission also reaches C and T
(does it? scoped? what size is the scope?) and they suppress.
What about out-of-order? Hmm - this makes it trickier.
This paper: "really nice ideas, seems to work well in practice,
but all the issues haven't been solved"
Slide 8
Scalability issues:
- session messages / representatives (in local domains:
  primarily responsible for loss handling in local subtree)
- setting timers - can of worms -
  need to know distances between hosts (rtt estimates?)
  to set stuff correctly, we need pairwise distances -
  unclear how this can be known (see the sketch after this list)
- local recovery - still open - many papers address it
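One way the pairwise distances can be estimated is from SRM-style
session-message timestamps; a minimal sketch, assuming roughly
symmetric paths (the field names are invented):

    def estimate_one_way_delay(t_sent, hold_time, t_recv):
        # B sent a session message at t_sent (B's clock); A echoed it
        # back, reporting how long it held the message (hold_time);
        # B receives the echo at t_recv.  Round trip minus A's hold
        # time, halved, approximates the one-way distance B<->A.
        # Only per-host clock *differences* are needed, not
        # synchronized clocks.
        rtt = (t_recv - t_sent) - hold_time
        return rtt / 2.0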
Slide 9
Local Recovery (next paper will use this extensively)
- goal: recovery traffic should coincide with the loss neighborhood
  (recovery is focused in "bad" areas)
- imperfect tactics
- what about multiple uncorrelated loss regions?
- scaling problems (RLM)
Question:
Khaled: "are we using app-level framing?"
Answer:
yes - assumption about drawops and idempotency -
drawops fit into individual packets (generally)
problem: what if an op spans multiple packets?
have to send/request multiple packets.
in other words: application-level objects are "named"
so they can be requested.
logically = "presentation layer" (if you believe in such a thing)
point/issue : naming : sequence numbers may not be all that useful,
whereas ADU (application data unit) may convey
the needed (by application) information
assumption: non-conflicting namespaces
(globally-unique host IP : host-wide-unique seq)
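A toy sketch of that namespace and the per-source gap detection it
enables (the helper names are ours, not SRM's):

    # An ADU name = (source host IP, per-host sequence number) -
    # unique across the session with no coordination between senders.
    def missing_adus(received, highest_seq, source_ip):
        # Per-source gap detection: any sequence number up to the
        # highest seen from this source that we lack is requestable.
        return [(source_ip, s) for s in range(highest_seq + 1)
                if (source_ip, s) not in received]

    received = {("10.0.0.1", 0), ("10.0.0.1", 2)}
    assert missing_adus(received, 2, "10.0.0.1") == [("10.0.0.1", 1)]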
Slide 10
PPV: A Response to SRM.
Problems w/ SRM:
- Recovery latencies are too large (10-20 RTTs? are you kidding?)
- Requestor/replier mapping may be very random
- Excessive exposure to redundant replies
if they are poorly scoped (or bad overlap cases)
PPV aims to do a better job of addressing these 3 problems
Slide 11
Error Correction: Goals
- Avoid req implosion (SRM: good)
- Minimize dup replies (SRM: good)
- Minimize recovery latency (SRM: bad)
- Maximize recovery isolation (SRM: weak)
- Adapt to dynamic membership (SRM: good)
- Minimize overall recovery traffic
Slide 12
Error Correction: Algorithms (use picture from paper, e.g. fig 4)
- Small subset of interested nodes become "repliers"
- (repliership not mandatory -
"announce candidacy" to upstream router
who selects among its child candidates)
- Crucial idea: every subtree has a replier.
- How are they chosen? not discussed.
  Probably very important to choose carefully.
- Once every subtree has a replier,
we have a careful loss-handling scenario:
When a whole subtree has lost:
- client NACKS, goes to subtree's replier
replier doesn't have it either,
NACKs its parent node
(propagate upward, toward server) until we find a replier
who GOT IT.
- UNICAST a request UPSTREAM.
- implicit suppression: upstream links see only ONE
packet even if whole tree saw loss,
since only one request "ascends" the tree
- Hard part: getting replies out to all who are interested
- Crucial router in this scheme:
point at which upstream "turns around" to a downstream =
"turning point"
- on the way back: want to MC, but only to the target subtree
- Any node that has been a turning pt for a repair request,
  when it sees the (unicast) repair response, it transforms it
  into a multicast response which it sends ONLY down to subtrees
  that asked for repairs
- "sub-cast" : generalization of multicast
- defn: "Multicast transmission (limited)
to a particular sub-tree"
- alternative implementation: turning point notices
repair multicast and limits it to a sub-set of
MC-subscribed links
Why this approach is difficult: it requires routers to be smarter/do more.
Simulations in this paper =
"good start at analyzing this" = powerful mechanism for local recovery
Clarification: "turning point" =
router "above" a replier in distribution tree
Question:
"Can a 'turning point' also be a replier?"
Answer:
yes, but that would SEVERELY/MAJORLY change the router's role
(as opposed to just a BIG change in the router's role)
Only a replier's NACK gets forwarded upstream -
so router has to know who its immediate replier is
- assume replier isn't dead
- assume repliers are reliable and trustworthy
- it's a problem -
JB: "don't know what the solutio is, don't know if there is one"
- also important: handle failure/death of repliers (keepalive, etc)
THIS IS A GENERAL PROBLEM with hierarchical scheme.
intact tree of trusted representatives is the baseline assumption
- Lots of engineering involved -
lots of work (potential for lots of bugs?)
- Nice properties:
- completely deterministic
- make repair requests immediately, respond to them immediately
- issues: implementation, electing repliers (ill-defined)
- Another interesting project:
  intelligent placement of repliers (open/unsolved problem)
Slide XXX
Pros/Cons:
- Pro: Achieves objectives
- Pro: reliable, sequenced delivery
- Pro: scales "well" (transmissions are countable and understandable)
- Pro: recovery traffic is low and localized
- Con: engineering complexity of implementation
- Con: election procedure unclear, difficult problem
(esp if most clients don't want to be repliers)
- Con: Routers must implement and activate subcast functionality
Slide XXX
That's it - next week:
- Quiz thursday (45 mins)
- Guest lecture (45 mins) - Mystery Guest!