How to tell the baseClient you have data to store in the next flush

michielbdejong · March 25, 2014, 4:51am

as an example, let’s consider how litewrite wants to pass data to the documents module and thus to the baseClient on each keystroke.

the current situation (0.10) is that each change triggers an IndexedDB write, but the corresponding http PUT is only initiated if none is currently running. this is acceptable as long as you average fewer than 5 keystrokes per second, since an IndexedDB commit takes about 200ms.

but storing keystrokes is not the only example. yesterday i tried to write a script that exports all tosdr mailing list posts out of my mailbox (about 1000 out of 20,000 email messages), and ran into a situation where there were 2000 IndexedDB gets and 400 IndexedDB puts running at once. because of these long queues, requests were taking up to 5 minutes to complete.

it is not the IndexedDB access by itself that is slow; only write commits are slow. so we need a way to group IndexedDB writes into batches.

michielbdejong · March 25, 2014, 4:55am

i tried to add this write batching at the SyncedMap level (the data structure used to store data in-memory in the module), but it would still be inefficient to have a registry of dirty entries and a flushing timer for each SyncedMap instance. so i thought it would be better to move this into the baseClient, and add an ‘askMeForObject’ and an ‘askMeForFile’ function to the baseClient, which would call a callback when it’s time to flush. you could then call these functions as often as you want, but the baseClient would take care of flushing each dirty entry only once in each round.
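to make the ‘askMeFor’ idea concrete, here is a rough sketch of how such a registry could behave. the class name `FlushRegistry`, the `(path, getValue)` signature and the `write` argument to `flush` are all assumptions made up for illustration; none of this exists in remoteStorage.js.

```javascript
// Hypothetical sketch of the 'askMeFor' idea. FlushRegistry and its
// method signatures are illustrative assumptions, not remoteStorage.js API.
class FlushRegistry {
  constructor() {
    // path -> callback that returns the current value at flush time
    this.dirty = new Map();
  }

  // May be called as often as you like; each path is registered only once
  // per round, and the most recent callback wins.
  askMeForObject(path, getValue) {
    this.dirty.set(path, getValue);
  }

  // Calls each registered callback exactly once, hands the value to
  // `write`, and starts a fresh round. Returns the number of entries flushed.
  flush(write) {
    const round = this.dirty;
    this.dirty = new Map();
    for (const [path, getValue] of round) {
      write(path, getValue());
    }
    return round.size;
  }
}
```

with this shape, three keystrokes on the same document between two flushes would result in a single write carrying the latest text.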

michielbdejong · March 25, 2014, 5:00am

then i realised that this ‘askMeFor’ API would be silly, you might as well pass the latest value instead of a callback, and then the baseClient has the latest value always, flushing it when it’s time. it could then also guarantee that unflushed changes are reflected in gets. so that effectively amounts to adding an in-memory cache layer into the baseClient which caches unflushed writes, but nothing else (as soon as a change is written through to IndexedDB, it would be flushed from the baseClient-level cache, to keep it small).

doing this at the baseClient level would avoid code-duplication at the module level, and most importantly, hide this optimization from the module developer, yet make it effective also for existing apps without the need to change any module- or app code.

michielbdejong · March 25, 2014, 5:12am

i propose to make the new behavior of baseClient .storeFile, .storeObject and .remove:

- when called, the change is only stored in a list of pending changes
- every 10 seconds or so, one batched call to remoteStorage.local is made
- the whole list is then emptied, and can build up again during the next 10 seconds
- the baseClient checks this list on reads to avoid inconsistencies
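a minimal sketch of what this could look like, under loud assumptions: `PendingChanges`, `FakeLocal` and the `setBatch` method are invented here for illustration and are not the actual baseClient or remoteStorage.local interface. the timer is also started on the first pending change rather than running on a fixed 10-second tick, which is just one possible reading of "every 10 seconds or so".

```javascript
// Hypothetical sketch: PendingChanges, FakeLocal and setBatch are
// illustrative assumptions, not the remoteStorage.js API.
class FakeLocal {
  constructor() {
    this.data = new Map();
    this.commits = 0; // counts batched writes (stand-in for IndexedDB commits)
  }
  get(path) {
    return this.data.get(path);
  }
  setBatch(changes) {
    this.commits += 1;
    for (const [path, value] of changes) {
      if (value === undefined) this.data.delete(path);
      else this.data.set(path, value);
    }
  }
}

class PendingChanges {
  constructor(local, intervalMs = 10000) {
    this.local = local;
    this.intervalMs = intervalMs;
    this.pending = new Map(); // path -> latest unflushed value (undefined = removed)
    this.timer = null;
  }

  // Stand-in for storeFile / storeObject: only records the change.
  store(path, value) {
    this.pending.set(path, value);
    this.schedule();
  }

  // Stand-in for remove: recorded as a pending deletion.
  remove(path) {
    this.pending.set(path, undefined);
    this.schedule();
  }

  // Reads check the pending list first, so unflushed changes are visible.
  get(path) {
    if (this.pending.has(path)) return this.pending.get(path);
    return this.local.get(path);
  }

  schedule() {
    if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.intervalMs);
    }
  }

  // One batched call to the local store, then the list starts empty again.
  flush() {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.pending.size === 0) return;
    const batch = this.pending;
    this.pending = new Map();
    this.local.setBatch(batch);
  }
}
```

because the pending list is keyed by path, any number of writes to the same path within one interval collapse into a single entry, and the whole interval costs one commit.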

michielbdejong · March 25, 2014, 5:31am

obviously we need to release 0.10 before starting to work on this