[tor/master] Topic documentation on our publish-subscribe architecture.

2 Dec 2019

commit d700dc7801a9e23ddda0d482eeb2b64ced3ca756
Author: Nick Mathewson <nickm@torproject.org>
Date:   Sat Nov 16 14:31:49 2019 -0500

    Topic documentation on our publish-subscribe architecture.
---
 src/core/mainloop/mainloop_pubsub.h |  32 ++++++++
 src/lib/pubsub/publish_subscribe.md | 141 ++++++++++++++++++++++++++++++++++++
 src/mainpage.md                     |   3 +-
 3 files changed, 175 insertions(+), 1 deletion(-)

diff --git a/src/core/mainloop/mainloop_pubsub.h b/src/core/mainloop/mainloop_pubsub.h
index bd57c0c17..c02127401 100644
--- a/src/core/mainloop/mainloop_pubsub.h
+++ b/src/core/mainloop/mainloop_pubsub.h
@@ -14,9 +14,41 @@
 
 struct pubsub_builder_t;
 
+/**
+ * Describe when and how messages are delivered on a message channel.
+ *
+ * Every message channel must be associated with one of these strategies.
+ **/
 typedef enum {
+   /**
+    * Never deliver messages automatically.
+    *
+    * If a message channel uses this strategy, then no matter how many
+    * messages are published on it, they are not delivered until something
+    * manually calls dispatch_flush() for that channel.
+    **/
    DELIV_NEVER=0,
+   /**
+    * Deliver messages promptly, via the event loop.
+    *
+    * If a message channel uses this strategy, then publishing a message on
+    * that channel activates an event that causes messages to be handled
+    * later in the mainloop.  The messages will be processed at some point
+    * very soon, delaying only for pending IO events and the like.
+    *
+    * Generally this is the best choice for a delivery strategy, since
+    * it avoids stack explosion.
+    **/
    DELIV_PROMPT,
+   /**
+    * Deliver messages immediately, skipping the event loop.
+    *
+    * Every message on this channel is flushed immediately after it is
+    * queued, on the publisher's call stack.
+    *
+    * This delivery type should be used with caution, since it can cause
+    * unexpected call chains, resource starvation, and the like.
+    **/
    DELIV_IMMEDIATE,
 } deliv_strategy_t;
 
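For readers skimming the patch, the three strategies documented in mainloop_pubsub.h can be modeled with a minimal, self-contained C sketch. It is only an illustration under stated assumptions:
- `toy_channel_t`, `toy_publish()`, `toy_flush()` and `toy_run_mainloop_once()` are hypothetical names. They are not Tor's dispatcher API.
- The event loop is reduced to a single "flush scheduled" flag rather than a real mainloop callback.
- `toy_flush()` plays the role that dispatch_flush() plays for a real channel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of deliv_strategy_t; not Tor's actual header. */
typedef enum {
  TOY_DELIV_NEVER = 0,
  TOY_DELIV_PROMPT,
  TOY_DELIV_IMMEDIATE,
} toy_deliv_strategy_t;

#define TOY_QUEUE_MAX 16

/* A toy channel: a fixed-size queue plus delivery bookkeeping. */
typedef struct {
  toy_deliv_strategy_t strategy;
  int queue[TOY_QUEUE_MAX];  /* pending message payloads, oldest first */
  int n_queued;
  bool flush_scheduled;      /* stands in for an activated mainloop event */
  int n_delivered;           /* total messages handed to "subscribers" */
  int last_delivered;        /* payload of the most recent delivery */
} toy_channel_t;

/* Deliver every queued message in order; the analogue of dispatch_flush(). */
static void toy_flush(toy_channel_t *chan) {
  assert(chan != NULL);
  for (int i = 0; i < chan->n_queued; i++) {
    /* A real dispatcher would invoke each subscriber's callback here. */
    chan->last_delivered = chan->queue[i];
    chan->n_delivered++;
  }
  chan->n_queued = 0;
  chan->flush_scheduled = false;
}

/* Queue one message, then act according to the channel's strategy.
 * Returns false (and drops the message) if the queue is full. */
static bool toy_publish(toy_channel_t *chan, int payload) {
  if (chan->n_queued >= TOY_QUEUE_MAX)
    return false;
  chan->queue[chan->n_queued++] = payload;
  switch (chan->strategy) {
    case TOY_DELIV_IMMEDIATE:
      toy_flush(chan);               /* runs on the publisher's stack */
      break;
    case TOY_DELIV_PROMPT:
      chan->flush_scheduled = true;  /* deferred to the next loop pass */
      break;
    case TOY_DELIV_NEVER:
      break;                         /* waits for an explicit toy_flush() */
  }
  return true;
}

/* One pass of a pretend event loop: run the flush if one was scheduled. */
static void toy_run_mainloop_once(toy_channel_t *chan) {
  if (chan->flush_scheduled)
    toy_flush(chan);
}
```

Suppose two messages are published on each kind of channel:
- The DELIV_IMMEDIATE channel delivers both inside toy_publish() itself. This is the extra stack depth the DELIV_IMMEDIATE comment warns about.
- The DELIV_PROMPT channel delivers both on the next pass of the pretend loop.
- The DELIV_NEVER channel delivers nothing until toy_flush() is called explicitly.
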
diff --git a/src/lib/pubsub/publish_subscribe.md b/src/lib/pubsub/publish_subscribe.md
new file mode 100644
index 000000000..019591bc1
--- /dev/null
+++ b/src/lib/pubsub/publish_subscribe.md
@@ -0,0 +1,141 @@
+
+@page publish_subscribe Publish-subscribe message passing in Tor
+
+@tableofcontents
+
+## Introduction
+
+Tor has introduced a generic publish-subscribe mechanism for delivering
+messages internally.  It is meant to help us improve the modularity of
+our code, by avoiding direct coupling between modules that don't
+actually need to invoke one another.
+
+This publish-subscribe mechanism is *not* meant for handling
+multithreading or multiprocess issues, though we hope that eventually
+it might be extended and adapted for that purpose.  Instead, we use
+publish-subscribe today to decouple modules that shouldn't be calling
+each other directly.
+
+For example, there are numerous parts of our code that might need to
+take action when a circuit is completed: a controller might need to be
+informed, an onion service negotiation might need to be attached, a
+guard might need to be marked as working, or a client connection might
+need to be attached.  But many of those actions occur at a higher layer
+than circuit completion: calling them directly is a layering violation,
+and makes our code harder to understand and analyze.
+
+But with message-passing, we can invert this layering violation: circuit
+completion can become a "message" that the circuit code publishes, and
+to which higher-level layers subscribe.  This means that circuit
+handling can be decoupled from higher-level modules, and stay nice and
+simple. (@ref pubsub_notyet "1")
+
+> @anchor pubsub_notyet 1. Unfortunately, like most of our code, circuit
+> handling is _not_ yet refactored to use publish-subscribe throughout.
+> Instead, layer violations of the type described here are pretty common
+> in Tor today.  To see a small part of what happens when a circuit is
+> completed today, have a look at circuit_build_no_more_hops() and its
+> associated code.
+
+## Channels and delivery policies
+
+To work with messages, especially when refactoring existing code, you'll
+need to understand "channels" and "delivery policies".
+
+Every message is delivered on a "message channel".  Each channel is
+(conceptually) a queue-like structure that can support an arbitrary
+number of message types.  Where channels vary is in their delivery
+mechanisms, and their guarantees about when messages are processed.
+
+Currently, three delivery policies are possible:
+
+   - `DELIV_PROMPT` -- causes messages to be processed via a callback in
+      Tor's event loop.  This is generally the best choice, since it
+      avoids unexpected growth of the stack.
+
+   - `DELIV_IMMEDIATE` -- causes messages to be processed immediately
+      on the call stack when they are published.  This choice grows the
+      stack, and can lead to unexpected complexity in the call graph.
+      We should only use it when necessary.
+
+   - `DELIV_NEVER` -- causes messages not to be delivered by the message
+      dispatch system at all. Instead, some other part of the code must
+      call dispatch_flush() to get the messages delivered.
+
+## Layers: Dispatch vs publish-subscribe vs mainloop
+
+At the lowest level, messages are sent via the "dispatcher" module in
+@refdir{lib/dispatch}.  For performance, this dispatcher works with
+untyped messages.  Publishers, subscribers, channels, and messages are
+distinguished by short integers.  Associated data is handled as
+dynamically-typed data pointers, and its types are also stored as short
+integers.
+
+Naturally, this results in a type-unsafe C API, so most other modules
+shouldn't invoke @refdir{lib/dispatch} directly.  At a higher level,
+@refdir{lib/pubsub} defines a set of functions and macros that make
+messages named and type-safe.  This is the layer that other modules should
+use when they want to send or receive a message.
+
+The two modules above do not handle message delivery.  Instead, the
+dispatch module takes a callback that it can invoke when a channel
+becomes nonempty, and defines a dispatch_flush() function to deliver all
+the messages queued in a channel.  The work of actually making sure that
+dispatch_flush() is called when appropriate falls to the main loop,
+which needs to integrate the message dispatcher with the rest of our
+events and callbacks.  This work happens in mainloop_pubsub.c.
+
+
+## How to publish and subscribe
+
+This section gives an overview of how to make new messages and how to
+use them.  For full details, see pubsub_macros.h.
+
+Before anybody can publish or subscribe to a message, the message must
+be declared, typically in a header.  This uses DECLARE_MESSAGE() or
+DECLARE_MESSAGE_INT().
+
+Only subsystems can publish or subscribe to messages.  For more information
+about the subsystems architecture, see @ref initialization.
+
+To publish a message, you must:
+   - Include the header that declares the message.
+   - Declare a set of helper functions via DECLARE_PUBLISH().  These
+     must be visible wherever you call PUBLISH().
+   - Call PUBLISH() to actually send a message.
+   - Connect your subsystem to the dispatcher by calling
+     DISPATCH_ADD_PUB() from your subsystem's subsys_fns_t.add_pubsub
+     callback.
+
+To subscribe to a message, you must:
+   - Include the header that declares the message.
+   - Declare a callback function to be invoked when the message is delivered.
+   - Use DECLARE_SUBSCRIBE() at file scope to define a set of wrapper
+     functions to call your callback function with the appropriate type.
+   - Connect your subsystem to the dispatcher by calling
+     DISPATCH_ADD_SUB() from your subsystem's subsys_fns_t.add_pubsub
+     callback.
+
+Again, the file-level documentation for pubsub_macros.h describes how to
+declare a message, how to publish it, and how to subscribe to it.
+
+## Designing good messages
+
+**Frequency**:
+The publish-subscribe system uses a few function calls
+and allocations for each message sent. This makes it unsuitable for
+very-high-bandwidth events, like "receiving a single data cell" or "a
+socket has become writable."  It's fine, however, for events that
+ordinarily happen a bit less frequently than that, like a circuit
+being completed, a new connection being opened, and so on.
+
+**Semantics**:
+A message should declare that something has happened or is happening,
+not that something in particular should be done.
+
+For example, suppose you want to set up a message so that onion services
+clean up their replay caches whenever we're low on memory.  The message
+should be something like `memory_low`, not `clean_up_replay_caches`.
+The latter name would imply that the publisher knew who was subscribing
+to the message and what they intended to do about it, which would be a
+layering violation.
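The layering described in the new publish_subscribe.md puts an untyped, integer-keyed dispatcher underneath a typed, named layer. That split can be sketched in plain C as follows.

This is a hypothetical illustration, not the real contents of lib/dispatch or pubsub_macros.h:
- `toy_raw_subscribe()`, `toy_raw_publish()` and `TOY_DECLARE_MESSAGE()` are invented names.
- Delivery is synchronous, as with DELIV_IMMEDIATE, to keep the sketch short.
- The `memory_low_t` payload and both subscribers are made up to echo the `memory_low` naming advice in the document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- Untyped layer (hypothetical; loosely inspired by lib/dispatch) ----
 * Messages are identified by short integers and payloads are untyped. */
typedef uint16_t toy_msg_id_t;
typedef void (*toy_raw_cb_t)(const void *payload);

#define TOY_MAX_MSGS 8
#define TOY_MAX_SUBS 4

static toy_raw_cb_t toy_subs[TOY_MAX_MSGS][TOY_MAX_SUBS];
static int toy_n_subs[TOY_MAX_MSGS];

static int toy_raw_subscribe(toy_msg_id_t id, toy_raw_cb_t cb) {
  if (id >= TOY_MAX_MSGS || toy_n_subs[id] >= TOY_MAX_SUBS)
    return -1;
  toy_subs[id][toy_n_subs[id]++] = cb;
  return 0;
}

/* Synchronous delivery (like DELIV_IMMEDIATE) keeps the sketch short. */
static void toy_raw_publish(toy_msg_id_t id, const void *payload) {
  assert(id < TOY_MAX_MSGS);
  for (int i = 0; i < toy_n_subs[id]; i++)
    toy_subs[id][i](payload);
}

/* ---- Typed layer (hypothetical; loosely inspired by pubsub_macros.h) ----
 * One macro per message generates a named, type-checked publish and
 * subscribe pair. The trampoline is the only place the untyped payload
 * is cast back to its real type. */
#define TOY_DECLARE_MESSAGE(name, id, type)                              \
  enum { name##_msg_id = (id) };                                         \
  typedef void (*name##_cb_t)(const type *payload);                      \
  static name##_cb_t name##_cbs[TOY_MAX_SUBS];                           \
  static int name##_n_cbs;                                               \
  static void name##_trampoline(const void *payload) {                   \
    for (int i = 0; i < name##_n_cbs; i++)                               \
      name##_cbs[i]((const type *)payload);                              \
  }                                                                      \
  static int name##_subscribe(name##_cb_t cb) {                          \
    if (name##_n_cbs >= TOY_MAX_SUBS)                                    \
      return -1;                                                         \
    if (name##_n_cbs == 0 &&                                             \
        toy_raw_subscribe(name##_msg_id, name##_trampoline) < 0)         \
      return -1;                                                         \
    name##_cbs[name##_n_cbs++] = cb;                                     \
    return 0;                                                            \
  }                                                                      \
  static void name##_publish(const type *payload) {                      \
    toy_raw_publish(name##_msg_id, payload);                             \
  }

/* An event-style message: "memory is low", not "clean up replay caches". */
typedef struct {
  size_t bytes_needed;
} memory_low_t;

TOY_DECLARE_MESSAGE(memory_low, 0, memory_low_t)

/* Two hypothetical subscribers that the publisher knows nothing about. */
static size_t replay_cache_freed;
static void replay_cache_on_memory_low(const memory_low_t *msg) {
  replay_cache_freed += msg->bytes_needed;  /* pretend to free memory */
}

static int memory_low_events_logged;
static void log_on_memory_low(const memory_low_t *msg) {
  (void)msg;
  memory_low_events_logged++;
}
```

Subscribe both callbacks, then publish one `memory_low_t` with `bytes_needed = 4096`. Both callbacks run, yet the publisher never names either subscriber. Passing a pointer of the wrong type, such as an `int *`, to memory_low_publish() draws a compiler diagnostic. That check is the type safety the named layer adds over raw `const void *` payloads.
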
diff --git a/src/mainpage.md b/src/mainpage.md
index a2c5ec630..63a5b0a3f 100644
--- a/src/mainpage.md
+++ b/src/mainpage.md
@@ -41,6 +41,8 @@ Tor repository.
 
 @subpage time_periodic
 
+@subpage publish_subscribe
+
 
 @page intro A high-level overview
 
@@ -140,4 +142,3 @@ more connection types.
 
 A 'Node' (node_t) is a view of a Tor instance's current knowledge and opinions
 about a Tor relay or bridge.
-