A Node.js client application needs to sync a folder with a remote Node.js server. Both are running on Windows. The sync only needs to be one-way, from server to client, and some way of knowing when it has completed would be good. Bandwidth is not a key consideration: an entire file could be re-downloaded if there is a partial change. As for frequency, batch update attempts every 15 minutes would be fine, for example.

What approach or library would be preferable to, say, passing XML representations of the folder contents and downloading each changed file?

Thanks

  • This sounds like a very broad question. You're basically asking how to implement a dropbox-like sync client and server. There are a lot of approaches to data synchronization. One could write an entire book about all the various approaches, each with different advantages and disadvantages depending upon exactly what you want to optimize for, how continuous your sync is, how much you have to deal with potential sync conflicts from multiple clients making conflicting changes, how much you have to worry about bandwidth usage, whether you monitor continuously or sync in batch, etc. Commented Mar 13, 2015 at 0:46
  • Thank you jfriend00. The changes would be in one direction, towards the client apps, so I am not concerned with sync conflicts, if that makes sense. Bandwidth is not a big problem; an entire file could be re-downloaded if there is a partial change. Batch update attempts every 15 minutes, say, would be OK, if that helps. Commented Mar 13, 2015 at 3:39
  • Your question should be edited to say it's one-way sync only from server to client because that drastically simplifies the problem (no sync conflicts, only looking at changes on the server). Can you elaborate on what you mean when you say "folder updates need to be atomic"? Do you mean that an entire folder must be updated together or not at all, so that there could never be a situation where one file in the folder was updated while another was still pending an update? If so, that adds some complication. Commented Mar 13, 2015 at 6:11
  • Thanks for the guidance jfriend. I have edited the question to say one-way sync and, instead of atomic, to ask for some way of knowing when a sync update has completed. Commented Mar 13, 2015 at 23:48

2 Answers

The simplest way I can think of to write your own one-way sync of a single directory of files works as follows:

  1. The client collects a list of the files it currently has and some identifying version information for each file (version number, CRC, or original file creation date/time).

  2. The client sends that list to the server in an ajax-style HTTP request.

  3. Server receives the list of client files and compares it to its own file list. It then returns back to the client three lists of files: 1) files to update by downloading the newest version, 2) files on the client to remove, 3) new files for the client to download. Lists 1) and 3) could be merged in some implementations, but sometimes it's useful to know which files are new.

  4. The client goes to work processing those commands, downloading new/changed files and removing any files that should be removed.

  5. When the client has finished the downloads, it can create its own notification that the process has completed.
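
As a rough illustration of the comparison in step 3, here is a minimal sketch in Node.js. It assumes each side describes its folder as a plain object mapping file name to a version number; the function name `computeSyncPlan` and the shape of the returned object are made up for this example and are not part of any library.

```javascript
// Hypothetical helper: both arguments map a file name to its version number,
// e.g. { 'a.txt': 1, 'b.txt': 2 }. The listing format is an assumption.
function computeSyncPlan(clientFiles, serverFiles) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const plan = { update: [], remove: [], add: [] };

  for (const [name, serverVersion] of Object.entries(serverFiles)) {
    if (!has(clientFiles, name)) {
      plan.add.push(name);        // list 3: new files for the client
    } else if (clientFiles[name] !== serverVersion) {
      plan.update.push(name);     // list 1: files whose version differs
    }
  }

  for (const name of Object.keys(clientFiles)) {
    if (!has(serverFiles, name)) {
      plan.remove.push(name);     // list 2: files no longer on the server
    }
  }

  return plan;
}
```

The client would then work through `add` and `update` by downloading each file, delete everything in `remove`, and raise its "sync complete" notification once all three lists are done.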


There are a couple of key aspects to this process. First, some sort of identifying version information is important. The simplest scheme is for the server to keep a monotonically increasing version number for each file, incremented each time the file changes on the server. When a file is transferred, the client records that version number alongside it and must not lose it. If storing a separate version number is not convenient, the file modification date/time can be used instead, but the client must then be careful, whenever it updates its own files, to set the modification date/time to exactly the server's value. The time at which the file was last written locally on the client is not the last server modification time, so it cannot be used for the comparison.
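
If you go the modification-time route, a minimal sketch of the client side might look like the following. It uses Node's built-in `fs.utimesSync` to stamp the downloaded file with the server's time; the helper names `saveWithServerTime` and `isUpToDate`, and the idea that the server reports its modification time in milliseconds, are assumptions for this example.

```javascript
const fs = require('fs');

// Hypothetical helper: writes the downloaded contents, then overwrites the
// file's timestamps with the server's modification time (ms since epoch).
function saveWithServerTime(filePath, contents, serverMtimeMs) {
  fs.writeFileSync(filePath, contents);
  const serverMtime = new Date(serverMtimeMs);
  fs.utimesSync(filePath, serverMtime, serverMtime);
}

// Hypothetical helper: compares at whole-second resolution, on the assumption
// that the local filesystem may not preserve milliseconds.
function isUpToDate(filePath, serverMtimeMs) {
  if (!fs.existsSync(filePath)) return false;
  const localSeconds = Math.floor(fs.statSync(filePath).mtimeMs / 1000);
  return localSeconds === Math.floor(serverMtimeMs / 1000);
}
```

Some filesystems store timestamps more coarsely than one second, so the comparison granularity may need loosening depending on where the client folder lives.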

Version numbers can also be stored in the filename as an identifiable suffix such as core-scripts-v11. In this case, the actual filename to the outside world would be core-scripts, but it would be stored in the repository as core-scripts-v11 to indicate that it is version 11. If this file is changed to a new version, that new version would become core-scripts-v12. Any comparison of this with the client file list would need to compare both core name and versions separately, not just raw filenames.
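
Splitting the stored name into its core name and version could be sketched like this. The `-v<number>` suffix pattern follows the `core-scripts-v11` example above; the function names are hypothetical.

```javascript
// Hypothetical helper: splits 'core-scripts-v11' into its core name and
// numeric version. Returns null when the name has no '-v<digits>' suffix.
function parseVersionedName(storedName) {
  const match = /^(.+)-v(\d+)$/.exec(storedName);
  if (!match) return null;
  return { name: match[1], version: Number(match[2]) };
}

// Hypothetical helper: true when the server holds a newer version of the
// same core file than the client does.
function needsUpdate(clientStoredName, serverStoredName) {
  const client = parseVersionedName(clientStoredName);
  const server = parseVersionedName(serverStoredName);
  if (!client || !server || client.name !== server.name) {
    throw new Error('names do not refer to the same versioned file');
  }
  // Compare as numbers: as strings, 'v9' would sort after 'v12'.
  return server.version > client.version;
}
```

For example, `needsUpdate('core-scripts-v11', 'core-scripts-v12')` reports that the client copy is out of date.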


If you want an atomic sync operation, where a consistent set of files is always transferred and you can never get part of a newer batch of files and part of an older batch of files, then a bunch more work must be done. When files are updated on the server, they must be updated in an atomic way so that a client in the middle of syncing with a prior version is not interrupted. This would most likely be done by maintaining several versions of the server repository so that a client syncing with an existing version of the repository can continue and finish syncing with the repository and the installation of newer files won't interrupt that. Again, there are many possible ways to solve this particular problem.
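
One possible shape for the "several versions of the repository" idea is sketched below, purely in memory. The class name `SnapshotRepository` and its methods are invented for this example; a real server would presumably keep each snapshot's files in its own directory and prune snapshots that no client is still using.

```javascript
// Hypothetical in-memory sketch: every publish creates a new immutable
// snapshot, so a client that started syncing against an older snapshot
// keeps seeing a consistent file listing.
class SnapshotRepository {
  constructor() {
    this.snapshots = new Map(); // snapshot id -> frozen { name: version }
    this.currentId = 0;         // 0 means nothing has been published yet
  }

  // Store a copy of the listing, then make it current in a single step.
  publish(listing) {
    const id = this.currentId + 1;
    this.snapshots.set(id, Object.freeze({ ...listing }));
    this.currentId = id;
    return id;
  }

  // A client calls this once at the start of a sync and keeps the id.
  beginSync() {
    return this.currentId;
  }

  // All later requests in the same sync pass the pinned snapshot id.
  listing(snapshotId) {
    const listing = this.snapshots.get(snapshotId);
    if (!listing) throw new Error('unknown snapshot ' + snapshotId);
    return listing;
  }
}
```

A client that called `beginSync()` before a new `publish()` continues to read the older listing until it finishes, and picks up the newer snapshot on its next batch run.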


You are looking for a Dropbox clone that will monitor files for changes and so on, so may I suggest:
