Retry of Failed Writes

There’s no real need for remotestorage.js to have code to retry a failed read - the next regularly scheduled poll should succeed if the read failure was due to some temporary server condition.

Writes that fail due to a temporary server condition are more of a problem - if nothing is done, the document appears normal when accessed from the local store, but is never synced to the server. If the user is actively editing the document, presumably the app will alert the user that an error occurred, and the user interface will give them some mechanism to attempt the save again, either explicitly or implicitly. But if writes continue to fail, or the user doesn’t know to retry, the document is left in an anomalous state.

If the user is importing many documents, and some writes fail, it may not be clear which documents need to be saved again. Indeed, a number of document types (such as calendar items and bookmarks) can’t be accessed by anything like a file name, so it would be hard to inform the user which documents failed to save. Importing many documents is also more likely to trigger a failed write, as there’s no mechanism to control the rate at which write requests are sent. It’s also worth noting that importing is likely to be one of the earliest interactions a user has with RS - if errors crop up, it will be hard for the user to trust RS.
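To make that concrete, here’s a minimal sketch of how an app could throttle a bulk import and track which writes failed. The Client interface is just a structural stand-in for the relevant part of remotestorage.js’s BaseClient (storeFile is real API; the paths, delay value, and helper are made up for illustration). Note that if caching is enabled for the path, storeFile resolves against the local store and failures surface via sync instead of here.

```typescript
// Structural stand-in for the relevant part of remotestorage.js's BaseClient
interface Client {
  storeFile(mimeType: string, path: string, body: string): Promise<unknown>;
}

// Minimal sketch of app-side throttling for a bulk import
async function importDocuments(
  client: Client,
  docs: { path: string; body: string }[],
  delayMs = 250 // pause between PUTs so the server isn't flooded
): Promise<string[]> {
  const failed: string[] = [];
  for (const doc of docs) {
    try {
      await client.storeFile('text/plain', doc.path, doc.body);
    } catch (err) {
      // Track which documents failed, so the app can tell the user
      // more than just "some writes failed"
      failed.push(doc.path);
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return failed;
}
```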

It’s not clear whether remotestorage.js or the app is responsible for retrying. Servers might send a 503 Service Unavailable status with a Retry-After HTTP header, but I can’t find any code in remotestorage.js that references any 5xx status codes (aside from some Dropbox tests).
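For comparison, honoring Retry-After on a 503 could look roughly like the following. This is a hypothetical sketch using fetch, not code from remotestorage.js:

```typescript
// Hypothetical sketch of honoring 503 + Retry-After on a PUT.
// Nothing like this exists in remotestorage.js as far as I can tell.
async function putWithRetry(
  url: string,
  body: string,
  maxAttempts = 3
): Promise<Response> {
  let response!: Response;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    response = await fetch(url, { method: 'PUT', body });
    if (response.status !== 503) return response;

    let waitMs = 1000; // fallback if the header is missing or unparseable
    const header = response.headers.get('Retry-After');
    if (header) {
      const seconds = Number(header);
      if (Number.isFinite(seconds)) {
        waitMs = seconds * 1000;
      } else {
        // Retry-After may also be an HTTP date
        const date = Date.parse(header);
        if (!Number.isNaN(date)) waitMs = Math.max(0, date - Date.now());
      }
    }
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
  return response; // still a 503 after exhausting all attempts
}
```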

Is this scenario one that RS currently doesn’t handle well, or have I overlooked something?

I tried to understand this issue, but as described I fail to see how this would not be a bug (one which, AFAIK, hasn’t been reported yet). If a write fails, then I think the next sync should absolutely try to PUT that document again, no matter why it wasn’t synced last time.

Regarding bulk imports, I can see how servers could be improved by implementing tree-level locking/queuing to prevent race conditions with parent folder ETag updates, or something in that direction. Then again, the actual folder ETags aren’t really important (only that they change when a document in their tree has changed), since the worst case is just that the client checks that directory tree for document updates again. But it might prevent some exceptions or bugs, depending on how the server is implemented.
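For what it’s worth, a rough sketch of what such per-tree queuing could look like on the server side (entirely hypothetical; I don’t know of any RS server that works this way):

```typescript
// Entirely hypothetical server-side sketch: serialize writes per
// top-level folder so concurrent PUTs can't race on parent folder
// ETag updates.
const treeQueues = new Map<string, Promise<void>>();

function topLevelFolder(path: string): string {
  // e.g. '/documents/notes/1' -> '/documents/'
  return path.slice(0, path.indexOf('/', 1) + 1);
}

function enqueueWrite(path: string, write: () => Promise<void>): Promise<void> {
  const key = topLevelFolder(path);
  const previous = treeQueues.get(key) ?? Promise.resolve();
  // Chain onto the previous write for this tree; run even if it failed
  const next = previous.then(write, write);
  treeQueues.set(key, next);
  return next;
}
```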

Anything that isn’t 200 or 201 means the PUT failed:

https://github.com/remotestorage/remotestorage.js/blob/9bd1292a830dbea03c990788aa189c96ee70c7c3/src/baseclient.ts#L274

503 Service Unavailable and 504 Gateway Timeout should definitely be retried. 500 Internal Server Error, 502 Bad Gateway, and 507 Insufficient Storage could also be retried.
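In code, that distinction might look like this hypothetical helper (remotestorage.js has nothing like it today):

```typescript
// Hypothetical helper capturing the distinction suggested above;
// remotestorage.js does not currently discern between these codes.
const RETRYABLE_STATUS_CODES = new Set([
  503, // Service Unavailable
  504, // Gateway Timeout
  500, // Internal Server Error
  502, // Bad Gateway
  507, // Insufficient Storage
]);

function shouldRetry(status: number): boolean {
  return RETRYABLE_STATUS_CODES.has(status);
}
```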

I have double-checked and verified that those are already retried automatically, as long as caching is enabled for the folder/path.
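For anyone reading along, enabling caching for a subtree is what turns on that behavior. The ‘/documents/’ below is just an example path:

```typescript
// Enabling caching for a subtree turns on automatic sync
// (and thus retry of failed writes) for everything under it.
import RemoteStorage from 'remotestoragejs';

const remoteStorage = new RemoteStorage();
remoteStorage.caching.enable('/documents/'); // example path; use your module's root
```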

Any response code that isn’t 200 or 201 should be retried automatically. As far as I can see, there is no need to discern between other response codes, since anything that isn’t a success means the document isn’t synced yet.

That’s good to hear.

Do we have any automated tests for that? I’m not seeing any in baseclient-suite.js on master in remotestorage/remotestorage.js, or anywhere else.
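Something along these lines is what I’d expect, written here as a generic mocha-style sketch rather than in the project’s actual suite format. The client fixture and the respondWith, runSync, and isSynced helpers are all assumed; a real test would have to provide them:

```typescript
import { expect } from 'chai';

// Assumed test fixtures; none of these exist in remotestorage.js.
declare const client: {
  storeFile(mimeType: string, path: string, body: string): Promise<unknown>;
};
declare function respondWith(r: { statusCode: number }): void;
declare function runSync(): Promise<void>;
declare function isSynced(path: string): Promise<boolean>;

describe('failed writes', () => {
  it('retries a PUT that failed with a 5xx response on the next sync', async () => {
    await client.storeFile('text/plain', '/notes/1', 'hello');

    respondWith({ statusCode: 503 }); // next remote response is a 503
    await runSync();                  // first sync: the PUT fails

    respondWith({ statusCode: 201 }); // server recovers
    await runSync();                  // second sync: the PUT is retried

    // The document should now be marked as synced in the local store
    expect(await isSynced('/notes/1')).to.equal(true);
  });
});
```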