There are a couple of things to consider. The thread-per-subsystem route is easy to reason about, since the code separation is pretty apparent from the start. However, depending on how much your subsystems need to talk to each other, inter-thread communication could really kill your performance. In addition, this only scales to N cores, where N is the number of subsystems you abstract into threads.
If you're just looking to multithread an existing game, this is probably the path of least resistance. However, if you're working on some low-level engine systems that might be shared between several games or projects, I would consider another approach.
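To make the thread-per-subsystem layout concrete, here is a minimal sketch in Python. The two subsystems (`physics_subsystem`, `render_subsystem`) and the `run_subsystems` helper are hypothetical names made up for illustration, and the "simulation" is just a stand-in increment. The point is the shape: one thread per subsystem, with every hand-off between them going through a shared queue.

```python
import queue
import threading


def physics_subsystem(frames, outbox):
    """Hypothetical physics subsystem: produces one position per frame."""
    position = 0.0
    for frame in range(frames):
        position += 1.5  # stand-in for a real simulation step
        outbox.put((frame, position))  # each hand-off crosses a thread boundary
    outbox.put(None)  # sentinel: no more frames are coming


def render_subsystem(inbox, drawn):
    """Hypothetical render subsystem: consumes whatever physics produced."""
    while True:
        item = inbox.get()  # blocks until physics has something ready
        if item is None:
            break
        drawn.append(item)  # stand-in for actually drawing the frame


def run_subsystems(frames=3):
    """Run one thread per subsystem and return what the renderer received."""
    channel = queue.Queue()
    drawn = []
    threads = [
        threading.Thread(target=physics_subsystem, args=(frames, channel)),
        threading.Thread(target=render_subsystem, args=(channel, drawn)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return drawn
```

Note that the thread count here is fixed at two no matter how many cores the machine has, and the renderer spends its time blocked on `inbox.get()` whenever physics falls behind. Those are the two costs described above.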
It can take a bit of mind twisting, but if you can break things up into a job queue serviced by a set of worker threads, it will scale much better in the long run. As the latest and greatest chips come out with a gazillion cores, your game's performance will scale along with them: just fire up more worker threads.
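A rough sketch of the job queue idea, again in Python. The names `worker`, `run_jobs`, and `update_entity` are hypothetical, and the per-entity work is a placeholder. Jobs are small independent units pushed onto one queue, and the number of workers is a parameter that defaults to the core count.

```python
import os
import queue
import threading


def worker(jobs, results):
    """Pull jobs off the shared queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        func, arg = job
        results.put((arg, func(arg)))


def run_jobs(func, args, num_workers=None):
    """Run func(arg) for every arg on a pool of worker threads."""
    # Assumption for this sketch: one worker per core unless told otherwise.
    num_workers = num_workers or os.cpu_count() or 1
    jobs = queue.Queue()
    results = queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for arg in args:
        jobs.put((func, arg))
    for _ in workers:
        jobs.put(None)  # one sentinel per worker so each one shuts down
    for w in workers:
        w.join()
    # All producers have finished, so draining until empty is safe here.
    out = {}
    while not results.empty():
        arg, value = results.get()
        out[arg] = value
    return out


def update_entity(entity_id):
    """Hypothetical per-entity job; stands in for real game work."""
    return entity_id * entity_id
```

Calling `run_jobs(update_entity, range(8), num_workers=4)` gives the same result as `num_workers=1`; only the degree of parallelism changes, which is what lets this design follow the core count. One caveat on the language choice: in standard CPython builds the global interpreter lock keeps CPU-bound pure-Python jobs from running in parallel, so treat this as a sketch of the structure rather than a benchmark. In an engine you would write the same pattern in native code.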
So basically, if you're looking to bolt on some parallelism to an existing project, I'd parallelize across subsystems. If you're building a new engine from scratch with parallel scalability in mind, I'd look into a job queue.