haskell-distributed-next
diff --git a/_layouts/site.html b/_layouts/site.html
Lines changed: 1 addition & 1 deletion
diff --git a/img/OTP-Diagrams.png b/img/OTP-Diagrams.png
54.7 KB
diff --git a/img/one-for-all-left-to-right.png b/img/one-for-all-left-to-right.png
41 KB
diff --git a/img/one-for-all.png b/img/one-for-all.png
51.5 KB
diff --git a/img/one-for-one.png b/img/one-for-one.png
26.6 KB
diff --git a/img/sup1.png b/img/sup1.png
10.2 KB
diff --git a/tutorials/5ch.md b/tutorials/5ch.md
Lines changed: 137 additions & 21 deletions
diff --git a/wiki/contributing.md b/wiki/contributing.md
Lines changed: 13 additions & 12 deletions
diff --git a/wiki/maintainers.md b/wiki/maintainers.md
Lines changed: 3 additions & 11 deletions
diff --git a/wiki/reliability.md b/wiki/reliability.md
Lines changed: 0 additions & 2 deletions
@@ -21,7 +21,7 @@
             <div class="carousel-caption">
               <h1>Get Started</h1>
               <p class="lead">Learn how to build concurrent, distributed programs with Cloud Haskell</p>
-              <a class="btn btn-large btn-primary" href="/tutorials/ch1.html">Learn more</a>
+              <a class="btn btn-large btn-primary" href="/tutorials/1ch.html">Learn more</a>
             </div>
           </div>
         </div>
 
@@ -1,36 +1,152 @@
 ---
 layout: tutorial
 categories: tutorial
-sections: ['Introduction']
-title: Supervision Principles
+sections: ['Introduction', 'Quis custodiet ipsos custodes', 'Isolated Restarts', 'All or nothing restarts']
+title: 5. Supervision Principles
 ---
 
 ### Introduction
 
 In the previous tutorial, we looked at utilities for linking processes together
 and monitoring their lifecycle as it changes. The ability to link and monitor are
-foundational tools for building _reliable_ systems, and are the bedrock principles
+foundational tools for building _reliable_ systems and are the bedrock principles
 on which Cloud Haskell's supervision capabilities are built.
 
-The [`Supervisor`][1] provides a means to manage a set of _child processes_ and to construct
-a tree of processes, where some children are workers (e.g., regular processes) and
-others are themselves supervisors.
+A `Supervisor` manages a set of _child processes_ throughout their entire lifecycle,
+from birth (spawning) till death (exiting). Supervision is a key component in building
+fault-tolerant systems, providing applications with a structured way to recover from
+isolated failures without the whole system crashing. Supervisors allow us to structure
+our applications as independently managed subsystems, each with its own dependencies
+(and inter-dependencies with other subsystems) and specify various policies determining
+the fashion in which these subsystems are to be started, stopped (i.e., terminated)
+and how they should behave at each level in case of failures.
 
-The supervisor process is started with a list of _child specifications_, which
-tell the supervisor how to interact with its children. Each specification provides
-the supervisor with the following information about the child process:
+Supervisors also provide a convenient means to shut down a system (or subsystem) in a
+controlled fashion, since supervisors will always terminate their children before
+exiting themselves and do so based on the policies supplied when they were initially
+created.
 
-1. [`ChildKey`][2]: used to identify the child once it has been started
-2. [`ChildType`][3]: indicating whether the child is a worker or another (nested) supervisor
-3. [`RestartPolicy`][4]: tells the supervisor under what circumstances the child should be restarted
-4. [`ChildTerminationPolicy`][5]: tells the supervisor how to terminate the child, should it need to
-5. [`ChildStart`][6]: provides a means for the supervisor to start/spawn the child process
+### Quis custodiet ipsos custodes
 
-TBC
+Supervisors can be used to construct a tree of processes, where some children are
+workers (e.g., regular processes) and others are themselves supervisors. Each supervisor
+is responsible for monitoring its children and handling child failures by policy, as
+well as deliberately terminating children when instructed to do so (either explicitly
+per child, or when the supervisor is itself told to terminate).
+
+Each supervisor is started with a list of _child specifications_, which tell the supervisor
+how to interact with its children. Each specification provides the supervisor with the
+following information about the corresponding child process:
+
+1. `ChildKey`: used to identify the child specification and process (once it has started)
+2. `ChildType`: indicates whether the child is a worker or another (nested) supervisor
+3. `RestartPolicy`: tells the supervisor under what circumstances the child should be restarted
+4. `ChildTerminationPolicy`: tells the supervisor how to terminate the child, should it need to
+5. `ChildStart`: provides a means for the supervisor to start/spawn the child process
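
The five fields listed above can be pictured as a single record. The following is a minimal, self-contained sketch and not the definitions from distributed-process-platform: the field names, the `Delay` type with its millisecond unit, and the use of a plain `IO ()` action as a stand-in for `ChildStart` are all assumptions made for illustration.

```haskell
module Main where

-- Simplified stand-ins that only mirror the names used in the text.
newtype ChildKey = ChildKey String deriving (Eq, Show)

data ChildType = Worker | Supervisor deriving (Eq, Show)

data RestartPolicy = Permanent | Temporary | Transient | Intrinsic
  deriving (Eq, Show)

-- Hypothetical timeout representation (milliseconds is an assumption).
data Delay = Millis Int | Infinity deriving (Eq, Show)

data ChildTerminationPolicy = TerminateTimeout Delay | TerminateImmediately
  deriving (Eq, Show)

-- One child specification: everything the supervisor needs to know
-- about a child before it is started.
data ChildSpec = ChildSpec
  { childKey     :: ChildKey
  , childType    :: ChildType
  , childRestart :: RestartPolicy
  , childStop    :: ChildTerminationPolicy
  , childStart   :: IO ()  -- stand-in for ChildStart
  }

-- Hypothetical children, used only to show how the fields combine.
specs :: [ChildSpec]
specs =
  [ ChildSpec (ChildKey "logger") Worker Permanent
      (TerminateTimeout (Millis 5000)) (putStrLn "logger started")
  , ChildSpec (ChildKey "db-pool") Supervisor Transient
      (TerminateTimeout Infinity) (putStrLn "db-pool started")
  , ChildSpec (ChildKey "one-shot-job") Worker Temporary
      TerminateImmediately (putStrLn "one-shot-job started")
  ]

main :: IO ()
main = mapM_ describe specs
  where
    describe s = do
      childStart s
      putStrLn $ show (childKey s) ++ ": " ++ show (childType s)
        ++ ", " ++ show (childRestart s) ++ ", " ++ show (childStop s)
```

Keeping the start action inside the specification is what lets a supervisor restart a child later without any help from the code that originally created it.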
+
+The `RestartPolicy` determines the circumstances under which a child should be
+restarted when the supervisor detects that it has exited. A `Permanent` child will
+always be restarted, whilst a `Temporary` child is never restarted. `Transient` children
+are only restarted if they exit abnormally (i.e., the `DiedReason` the supervisor sees for
+the child is `DiedException` rather than `DiedNormal`). `Intrinsic` children behave
+exactly like `Transient` ones, except that if they terminate normally, the whole
+supervisor (i.e., all the other children) exits normally as well, as if someone had
+triggered the shutdown/terminate sequence for the supervisor's process explicitly.
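
The restart rules can be condensed into one pure function. This is a minimal sketch using simplified stand-in types rather than the library's own: `Decision` and `decide` are hypothetical names, and `DiedException` carries a plain message here. It follows the rule that transient children are restarted only after an abnormal exit.

```haskell
module Main where

data RestartPolicy = Permanent | Temporary | Transient | Intrinsic
  deriving (Eq, Show, Enum, Bounded)

-- Simplified: the failure is carried as a plain message.
data DiedReason = DiedNormal | DiedException String
  deriving (Eq, Show)

-- What the supervisor does once it has seen a child exit
-- (hypothetical type, for illustration only).
data Decision = Restart | LeaveStopped | StopSupervisor
  deriving (Eq, Show)

decide :: RestartPolicy -> DiedReason -> Decision
decide Permanent _                 = Restart         -- always restarted
decide Temporary _                 = LeaveStopped    -- never restarted
decide Transient DiedNormal        = LeaveStopped    -- a normal exit is final
decide Transient (DiedException _) = Restart         -- abnormal exit
decide Intrinsic DiedNormal        = StopSupervisor  -- takes the supervisor down
decide Intrinsic (DiedException _) = Restart         -- otherwise like Transient

main :: IO ()
main = mapM_ print
  [ (policy, reason, decide policy reason)
  | policy <- [minBound .. maxBound]
  , reason <- [DiedNormal, DiedException "boom"]
  ]
```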
+
+When a supervisor is told directly to terminate a child process, it uses the
+`ChildTerminationPolicy` to determine whether the child should be terminated
+_gracefully_ or _brutally killed_. This _shutdown protocol_ is used throughout
+[distributed-process-platform][dpp] and in order for a child process to be managed
+effectively by its supervisor, it is imperative that it understands the protocol.
+When a _graceful_ shutdown is required, the supervisor will send an exit signal to the
+child process, with the `ExitReason` set to `ExitShutdown`, whereupon the child process is
+expected to perform any required cleanup and then exit with the same `ExitReason`,
+indicating that the shutdown happened cleanly/gracefully. On the other hand, when
+the `ChildTerminationPolicy` is set to `TerminateImmediately`, the supervisor will not send
+an exit signal at all, calling the `kill` primitive instead of the `exit` primitive.
+This immediately kills the child process without giving it the opportunity to clean
+up its internal state at all. The graceful shutdown mode, `TerminateTimeout`, must
+provide a timeout value. The supervisor attempts a _graceful_ shutdown initially,
+but if the child does not exit within the given time window, the supervisor will
+automatically fall back to a _brutal kill_ using `TerminateImmediately`. If the
+timeout value is set to `Infinity`, the supervisor will wait indefinitely for the
+child to exit cleanly.
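
The shutdown protocol can be imitated with ordinary GHC threads. The sketch below is an analogy only and does not use Cloud Haskell: a hypothetical `Shutdown` exception stands in for an exit signal carrying `ExitShutdown`, `killThread` stands in for the `kill` primitive, and the `Delay`, `Outcome`, `spawnChild` and `terminate` names are assumptions made for illustration.

```haskell
module Main where

import Control.Concurrent (ThreadId, forkFinally, killThread, threadDelay, throwTo)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)
import Control.Exception (Exception, SomeException, catch, fromException, throwIO)
import Control.Monad (forever)
import System.Timeout (timeout)

-- Stand-in for an exit signal whose reason is ExitShutdown.
data Shutdown = Shutdown deriving Show
instance Exception Shutdown

data Delay = Micros Int | Infinity
data ChildTerminationPolicy = TerminateTimeout Delay | TerminateImmediately
data Outcome = ExitedCleanly | BrutallyKilled deriving (Eq, Show)

-- Fork a child and record how it ended once it stops.
spawnChild :: IO () -> IO (ThreadId, MVar Outcome)
spawnChild body = do
  done <- newEmptyMVar
  tid  <- forkFinally body (putMVar done . classify)
  pure (tid, done)
  where
    classify :: Either SomeException () -> Outcome
    classify (Left e) | Just Shutdown <- fromException e = ExitedCleanly
    classify (Left _)   = BrutallyKilled
    classify (Right ()) = ExitedCleanly

-- Apply a termination policy to a running child.
terminate :: ChildTerminationPolicy -> (ThreadId, MVar Outcome) -> IO Outcome
terminate TerminateImmediately (tid, done) = killThread tid >> readMVar done
terminate (TerminateTimeout delay) child@(tid, done) = do
  throwTo tid Shutdown                       -- ask politely first
  result <- case delay of
    Infinity -> Just <$> readMVar done       -- wait for as long as it takes
    Micros n -> timeout n (readMVar done)
  case result of
    Just outcome -> pure outcome
    Nothing      -> terminate TerminateImmediately child  -- fall back to a kill

-- Cleans up, then exits with the reason it was given.
politeChild :: IO ()
politeChild = forever (threadDelay 10000) `catch` \Shutdown -> do
  putStrLn "child: cleaning up"
  throwIO Shutdown

-- Ignores the shutdown request and keeps running.
stubbornChild :: IO ()
stubbornChild = forever (threadDelay 10000) `catch` \Shutdown ->
  forever (threadDelay 10000)

main :: IO ()
main = do
  polite   <- spawnChild politeChild
  stubborn <- spawnChild stubbornChild
  threadDelay 50000  -- give both children time to start
  terminate (TerminateTimeout (Micros 500000)) polite >>= print
  terminate (TerminateTimeout (Micros 100000)) stubborn >>= print
```

The child that ignores the request is the reason a timeout exists at all: without the fallback, one misbehaving child could block the shutdown of its entire supervisor.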
+
+When a supervisor detects a child exit, it will attempt a restart (subject to the
+child's `RestartPolicy`). Whilst explicitly
+terminating a child will **only** terminate the specified child process, unexpected
+child exits can trigger a _branch restart_, where other (sibling) child processes are
+restarted along with the child that failed. How the supervisor goes about this
+_branch restart_ is governed by the `RestartStrategy` given when the supervisor is
+first started.
+
+------
+> ![Info: ][info] Whenever a `RestartStrategy` causes multiple children to be restarted
+> in response to a single child failure, a _branch restart_ incorporating some (possibly
+> a subset) of the supervisor's remaining children will be triggered. The exceptions
+> to this rule are `Temporary` children and `Transient` children that exit normally,
+> therefore **not** triggering a restart. The basic rule of thumb is that, if a child
+> should be restarted and the `RestartStrategy` is not `RestartOne`, then a _branch_
+> containing some other children will be restarted as well.
+------
+
+### Isolated Restarts
+
+The `RestartOne` strategy is very simple. When one child fails, only that individual
+child is restarted and its siblings are left running. Use `RestartOne` whenever the
+processes being supervised are completely independent of one another, or a child can
+be restarted and lose its state without adversely affecting its siblings.
+
+-------
+![Sup1: ][sup1]
+-------
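
As a rough illustration of an isolated restart, the sketch below models a supervisor's children as a list of keys paired with a start count. The `Children` representation and the `restartOne` function are hypothetical and exist only to show that siblings are left untouched.

```haskell
module Main where

type ChildKey = String

-- Each child paired with the number of times it has been started.
type Children = [(ChildKey, Int)]

-- Restart only the failed child; every sibling keeps running as it was.
restartOne :: ChildKey -> Children -> Children
restartOne failed = map bump
  where
    bump (key, starts)
      | key == failed = (key, starts + 1)
      | otherwise     = (key, starts)

main :: IO ()
main = print (restartOne "child-2" [("child-1", 1), ("child-2", 1), ("child-3", 1)])
-- prints [("child-1",1),("child-2",2),("child-3",1)]
```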
+
+### All or nothing restarts
+
+The `RestartAll` strategy is used when our children are all inter-dependent and it's
+necessary to restart them all whenever one child crashes. This strategy triggers one of
+those _branch restarts_ we mentioned earlier, which in this case means that **all** the
+supervisor's children are restarted if any child fails.
+
+The order and manner in which the surviving children are restarted depends on the chosen
+`RestartMode` which parameterises the `RestartStrategy`. This comes in three flavours:
+
+1. `RestartEach`: stops then starts each child sequentially
+2. `RestartInOrder`: stops all children first (in order), then restarts them sequentially
+3. `RestartRevOrder`: stops all children in one order, then restarts them sequentially in the opposite order
+
+Each `RestartMode` is further parameterised by its `RestartOrder`, which is either left
+to right, or right to left. To illustrate, we will consider three alternative configurations
+here, starting with `RestartEach` and `LeftToRight`.
+
+-------
+![Sup2: ][sup2]
+-------
+
+There are times when we need to shut down all the children first, before restarting them.
+The `RestartInOrder` mode will do this, shutting the children down according to our chosen
+`RestartOrder` and then starting them up in the same way. Here's an example demonstrating
+`RestartInOrder` using `LeftToRight`.
+
+-------
+![Sup3: ][sup3]
+-------
+
+If we'd chosen `RightToLeft`, the children would have been stopped from right to left (i.e.,
+starting with child-3, then child-2, etc.) and then restarted in the same order.
+
+The astute reader might've noticed that so far, we've yet to demonstrate the behaviour that's
+default in [Erlang/OTP's Supervisor][erlsup], and it's a default for good reason. It is not
+uncommon for children to depend on one another and therefore need to be started in the correct
+order. Since these children rely on their siblings to function, we must stop them in the opposite
+order, otherwise the dependent children might crash whilst we're restarting other processes they
+rely on. It follows that, in this setup, we cannot subsequently (re)start the children in the
+same order we stopped them either.
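
The three restart modes and the two orders can be compared by computing the sequence of stop and start actions each one produces. This is a simplified model and not the library's implementation: `Action`, `ordered` and `plan` are hypothetical names, and the model ignores the fact that the failed child has already exited by the time the restart begins.

```haskell
module Main where

data RestartOrder = LeftToRight | RightToLeft deriving (Eq, Show)

data RestartMode
  = RestartEach RestartOrder
  | RestartInOrder RestartOrder
  | RestartRevOrder RestartOrder
  deriving (Eq, Show)

data Action = Stop String | Start String deriving (Eq, Show)

-- Arrange the children according to the chosen order.
ordered :: RestartOrder -> [a] -> [a]
ordered LeftToRight = id
ordered RightToLeft = reverse

-- The sequence of actions a supervisor would take for each mode.
plan :: RestartMode -> [String] -> [Action]
plan (RestartEach o) cs =
  concat [ [Stop c, Start c] | c <- ordered o cs ]
plan (RestartInOrder o) cs =
  map Stop (ordered o cs) ++ map Start (ordered o cs)
plan (RestartRevOrder o) cs =
  map Stop (ordered o cs) ++ map Start (reverse (ordered o cs))

main :: IO ()
main = do
  let children = ["child-1", "child-2", "child-3"]
  mapM_ (\mode -> print (mode, plan mode children))
    [ RestartEach LeftToRight
    , RestartInOrder LeftToRight
    , RestartRevOrder RightToLeft
    ]
```

In this model, `RestartRevOrder RightToLeft` stops the children from right to left and then starts them from left to right, which matches the dependency-respecting behaviour described above.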
+
+[dpp]: https://github.com/haskell-distributed/distributed-process-platform
+[sup1]: /img/one-for-one.png
+[sup2]: /img/one-for-all.png
+[sup3]: /img/one-for-all-left-to-right.png
+[alert]: /img/alert.png
+[info]: /img/info.png
+[erlsup]: http://www.erlang.org/doc/man/supervisor.html
 
-[1]: /static/doc/distributed-process-platform/Control-Distributed-Process-Platform-Supervisor.html
-[2]: /static/doc/distributed-process-platform/Control-Distributed-Process-Platform-Supervisor.html
-[3]: /static/doc/distributed-process-platform/Control-Distributed-Process-Platform-Supervisor.html
-[4]: /static/doc/distributed-process-platform/Control-Distributed-Process-Platform-Supervisor.html
-[5]: /static/doc/distributed-process-platform/Control-Distributed-Process-Platform-Supervisor.html
-[6]: /static/doc/distributed-process-platform/Control-Distributed-Process-Platform/Supervisor.html
 
@@ -24,11 +24,10 @@ We have a rather full backlog, so your help will be most welcome assisting
 us in clearing that. You can view the existing open issues on the
 [jira issue tracker](https://cloud-haskell.atlassian.net/issues/?filter=10001).
 
-If you wish to submit an issue there, you can do so without logging in,
-although you obviously won't get any email notifications unless you create
-an account and provide your email address.
+If you wish to submit a new issue there, you cannot do so without first
+creating an account (by providing your email address) and logging in.
 
-It is also important to work out which component or sub-system should be
+It is also helpful to work out which component or sub-system should be
 changed. You may wish to email the maintainers to discuss this first.
 
 ### __2. Make sure your patch merges cleanly__
@@ -47,7 +46,8 @@ local branch. For example:
 
 $ git checkout -b bugfix-issue123
 
-## make, add and commit your changes
+## add and commit your changes
+## base them on master for bugfixes or development for new features
 
 $ git checkout master
 $ git remote add upstream git://github.com/haskell-distributed/distributed-process.git
@@ -70,9 +70,9 @@ conventions page [here](http://hackage.haskell.org/trac/ghc/wiki/WorkingConventi
 
 1. try to make small patches - the bigger they are, the longer the pull request QA process will take
 2. strictly separate all changes that affect functionality from those that just affect code layout, indentation, whitespace, filenames etc
-3. always include the issue number (of the form `fixes #N`) in the final commit message for the patch - pull requests without an issue are unlikely to have been discussed (see above)
+3. always include the issue number (of the form `PROJECT_CODE #resolve Fixed`) in the final commit message for the patch - pull requests without an issue are unlikely to have been discussed (see above)
 4. use Unix conventions for line endings. If you are on Windows, ensure that git handles line-endings sanely by running `git config --global core.autocrlf false`
-5. make sure you have setup git to use the correct name and email for your commits - see the [github help guide](https://help.github.com/articles/setting-your-email-in-git)
+5. make sure you have set up git to use the correct name and email for your commits - see the [github help guide](https://help.github.com/articles/setting-your-email-in-git) - otherwise you won't be attributed in the scm history!
 
 ### __4. Make sure all the tests pass__
 
@@ -171,7 +171,7 @@ import Data.Blah
 import Data.Boom (Typeable)
 {% endhighlight %}
 
-Personally I don't care *that much* about alignment for other things,
+We generally don't care *that much* about alignment for other things,
 but as always, try to follow the convention in the file you're editing
 and don't change things just for the sake of it.
 
@@ -186,18 +186,18 @@ punctuation.
 
 Comment every top level function (particularly exported functions),
 and provide a type signature; use Haddock syntax in the comments.
-Comment every exported data type.  Function example:
+Comment every exported data type. Function example:
 
 {% highlight haskell %}
--- | Send a message on a socket.  The socket must be in a connected
--- state.  Returns the number of bytes sent.  Applications are
+-- | Send a message on a socket. The socket must be in a connected
+-- state. Returns the number of bytes sent. Applications are
 -- responsible for ensuring that all data has been sent.
 send :: Socket      -- ^ Connected socket
      -> ByteString  -- ^ Data to send
      -> IO Int      -- ^ Bytes sent
 {% endhighlight %}
 
-For functions the documentation should give enough information to
+For functions, the documentation should give enough information to
 apply the function without looking at the function's definition.
 
 ### Naming
@@ -214,3 +214,4 @@ abbreviation.  For example, write `HttpServer` instead of
 Use singular when naming modules e.g. use `Data.Map` and
 `Data.ByteString.Internal` instead of `Data.Maps` and
 `Data.ByteString.Internals`.
+
@@ -117,17 +117,9 @@ What's good for the goose...
 
 #### Making API documentation available on the website
 
-Currently this is a manual process. If you don't sed/awk out the
-reference/link paths, it'll be a mess. We will add a script to
-handle this some time soon. I tend to only update the static
-documentation for d-p and d-p-platform, at least until the process has
-been automated. I also do this *only* for mainline branches (i.e.,
-for development and master), although again, automation could solve
-a lot of issues there.
-
-There is also an open ticket to set up nightly builds, which will
-update the HEAD haddocks (on the website) and produce an 'sdist'
-bundle and add that to the website too.
+There is an open ticket to set up nightly builds, which will update
+the HEAD haddocks (on the website) and produce an 'sdist' bundle and
+add that to the website too.
 
 See https://cloud-haskell.atlassian.net/browse/INFRA-1 for details.
 
 
@@ -28,8 +28,6 @@ child processes. A supervisors *children* can be either worker processes or
 supervisors, which allows us to build hierarchical process structures (called
 supervision trees in Erlang parlance).
 
-The supervision APIs are a work in progress.
-
 [1]: http://en.wikipedia.org/wiki/Open_Telecom_Platform
 [2]: http://www.erlang.org/doc/design_principles/sup_princ.html
 [3]: http://www.erlang.org/doc/man/supervisor.html