I am reading Hacker's guide to Neural Networks. Since I am also learning Clojure, I tried to implement the examples in Clojure.
I would like feedback on what could be more idiomatic and better in the code below. I also have some questions, which I have asked at the end.
Here I have simple gates, like a multiply gate, that take inputs from other gates and produce an output. The circuit is built by composing these gates.
Two functions, forward and backward, are defined. forward computes the output value of the circuit starting from some input values. backward computes the gradient, or the pull, that should be applied to the inputs to make the output value more positive.
backward uses forward in its calculation. backward gives us values (pulls) which, when applied to the inputs, would produce an output value greater than before.
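For example, for a single multiply gate f = a * b, the pull on a is b and the pull on b is a (each scaled by the gradient arriving from the output), so nudging a by step * b and b by step * a makes f slightly larger.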
;; A single unit has a forward value and a backward gradient.
(defrecord Unit [value gradient name]
  Object
  (toString [_]
    (str name " : " value)))
;; A Gate has two input units.
(defrecord Gate [^Unit input-a ^Unit input-b])
(defprotocol GateOps
  "Basic gate operations: forward and backward are the two
  protocol operations that every gate needs to support."
  (forward [this] "Gives the output value computed from the inputs - used when
  going forward through the circuit.")
  (backward [this back-grad] "Gives the gradient to the inputs - the back-grad
  argument is the gradient coming from the output. backward calculates the
  derivative used to generate the backward pull."))
;; Unit is the basic building block of the circuit, so its operations are trivial.
(extend-protocol GateOps
Unit
(forward [this]
(:value this))
(backward [this back-grad]
{this back-grad}))
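;; For a bare Unit, forward just returns its value and backward maps the unit
;; to the incoming gradient. Illustrative REPL sketch (u is a throwaway unit,
;; not part of the circuit defined further below):
;; (def u (->Unit 2.0 0.0 "u"))
;; (forward u)      ;=> 2.0
;; (backward u 0.5) ;=> {u 0.5}, i.e. the unit mapped to the incoming gradient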
;; MultiplyGate takes two inputs and multiplies their values going forward.
(defrecord MultiplyGate [input-a input-b]
GateOps
(forward [this]
(* (forward (:input-a this)) (forward (:input-b this))))
(backward [this back-grad]
(let [input-a (:input-a this)
input-b (:input-b this)
val-a (forward input-a)
val-b (forward input-b)]
(merge-with + (backward input-a (* val-b back-grad))
(backward input-b (* val-a back-grad))))))
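;; Sketch of the product rule in action (p and q are throwaway units used only
;; for this illustration):
;; (def p (->Unit 3.0 0.0 "p"))
;; (def q (->Unit 4.0 0.0 "q"))
;; (backward (->MultiplyGate p q) 1.0) ;=> {p 4.0, q 3.0}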
;; AddGate adds the values of its two inputs; going backward the incoming
;; gradient passes through unchanged, since the local derivative is 1.
(defrecord AddGate [input-a input-b]
GateOps
(forward [this]
(+ (forward (:input-a this)) (forward (:input-b this))))
(backward [this back-grad]
(let [input-a (:input-a this)
input-b (:input-b this)]
(merge-with + (backward input-a (* 1.0 back-grad))
(backward input-b (* 1.0 back-grad))))))
(defn sig
  "Sigmoid function: f(x) = 1/(1 + exp(-x))"
  [x]
  (/ 1 (+ 1 (Math/exp (- x)))))
;; SigmoidGate applies sig to its input.
(defrecord SigmoidGate [gate]
GateOps
(forward [this]
(sig (forward (:gate this))))
(backward [this back-grad]
(let [s (forward this)
;; s is (sig input) i.e. output
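;; the derivative of the sigmoid is sig'(x) = sig(x) * (1 - sig(x)) = s * (1 - s)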
ds (* s (- 1 s) back-grad)]
(backward (:gate this) ds))))
(defmacro defunit
"Creates a Unit that also stores the name of the variable."
[var-name body]
`(def ~var-name (~@body (name '~var-name))))
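;; Illustrative expansion (roughly):
;; (macroexpand-1 '(defunit a (->Unit 1.0 0.0)))
;; => (def a (->Unit 1.0 0.0 (name 'a)))   ; i.e. (def a (->Unit 1.0 0.0 "a"))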
;; neural network : f(x,y) = sig(a*x + b*y + c)
(defunit a (->Unit 1.0 0.0))
(defunit b (->Unit 2.0 0.0))
(defunit c (->Unit -3.0 0.0))
(defunit x (->Unit -1.0 0.0))
(defunit y (->Unit 3.0 0.0))
(def ax (->MultiplyGate a x))
(def by (->MultiplyGate b y))
(def axc (->AddGate ax c))
(def axcby (->AddGate axc by))
(def sigaxcby (->SigmoidGate axcby))
Running:
neurals.core> (forward sigaxcby)
0.8807970779778823
neurals.core> (clojure.pprint/pprint (backward sigaxcby 1.0))
{{:value 2.0, :gradient 0.0, :name "b"} 0.31498075621051985,
{:value 3.0, :gradient 0.0, :name "y"} 0.20998717080701323,
{:value -3.0, :gradient 0.0, :name "c"} 0.10499358540350662,
{:value -1.0, :gradient 0.0, :name "x"} 0.10499358540350662,
{:value 1.0, :gradient 0.0, :name "a"} -0.10499358540350662}
nil
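As a sanity check: with output s = 0.8808, the gradient on c should be s * (1 - s) ≈ 0.10499, and the gradients on a, b, x and y should be that value scaled by the values of x, y, a and b respectively - which matches the numbers above.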
Queries:
1. backward is inefficient. It calculates forward on the same sub-circuit multiple times.
2. Once I know the gradients, how do I apply them to the circuit's input values, given that the records are immutable?
I think these problems show that there is a design flaw.
Any suggestions?