remoteStorage

What should the default caching behavior be?

continuing the discussion from https://github.com/remotestorage/remotestorage.js/pull/595#discussion_r9298194

@raucao storing data locally by default is what http://offlinefirst.org/ is all about, and it’s not less important on mobile than on desktop (rather the opposite, actually).

as briefly mentioned in https://github.com/remotestorage/remotestorage.js/blob/sync-per-node/doc/data-format.md#caching-strategies, caching control is now more fine-grained than the “on”/“off”/“tree only” it used to be:

  • SEEN: cache all documents that were seen since the session started, including their parent folders up to the root (corresponds to the old “no”)
  • SEEN_AND_FOLDERS: for a given subtree, also cache all folders under there (corresponds to old “tree-only”)
  • ALL: cache the whole subtree, folders and documents (corresponds to old “on”)
  • flush(): flushes all cached data in the subtree

the default behavior in 0.6 and 0.7 was “ALL”
the default behavior in 0.8 and 0.9 was like “SEEN” for outgoing, and no caching at all for incoming nodes (unless they had also been cached as outgoing before).

given @raucao’s request we can add the option back to not cache incoming items by default. do you also want an option to automatically flush outgoing items as soon as they have been successfully pushed out?

once it’s clear what exact caching behavior people desire, we can give them names (maybe a name like “OUTGOING” for the first option or “FLUSH” for the second)

btw i think the second option would make more sense than the current behavior - even though as a cache-all-the-things fan, i wouldn’t be using it much myself :)

For a lot of apps, that is true, but there are apps, e.g. Sharesome, that are designed for online use only: the use case only applies while you’re online. At the same time, it’s a lot of data that you don’t want to cache at all, especially not by default when you explicitly turn caching off.

As stated on GitHub, when I turn caching OFF, I expect that to be the default. Why else would I say “don’t cache this”?

I don’t even understand the reasoning behind this question. Of course I don’t want to keep outgoing items in cache, when I turn off caching. Why would I want to keep a copy of all files uploaded via Sharesome in local storage, ever?

That all sounds very complicated even to experienced users of the library, let alone newcomers, who will not understand what they’re setting. There’s no way around having sane defaults, and to me that means not caching things when people turn caching off.

You’re also completely forgetting about use cases like node modules etc., which only run certain tasks and don’t need to cache anything. You’d just increase their memory usage with no benefit at all. That sort of thing happens whenever you assume use cases for people, or rather assume people don’t have a use case, just because you yourself don’t have it.

Does anyone else think it’s a good idea to not have a no-cache option by default?

I am confused.

What does caching mean here? Saving data in the caching layer (IndexedDB by default)?

The behaviour of the 0.8 version was: when caching was turned off, the caching layer didn’t cache anything, and requests were forwarded to the wireclient or other remote backends. The promise was fulfilled as soon as the push finished or the get completed. If the same data was requested again, another request was sent out to fetch it. Version control was impossible because the ETags weren’t saved.

So SEEN caches the documents that have been requested, plus the whole folder tree, in IndexedDB, which does not correspond to the caching-disabled mode of 0.8.

And SEEN_AND_FOLDERS now caches the whole tree and, in addition to that, each document that has been explicitly requested or written?

Did I get that right?

So there is no way around having explicitly requested or created items in my caching layer? When the app crashes between storing an item and pushing the change out, do they just linger around in IndexedDB until I reload the tab and connect to my storage again? (I can construct scenarios where this would be a security issue.)

How do I get notified that my changes have actually been pushed out? Do I have to bind to the next sync-done event and then call flush() for the written document if I don’t want to have it in cache?

for all these use-cases you can just use the nocache build.

How is that a response to “this doesn’t make any sense as default behaviour”?

I agree that it might make sense to recommend that, but then we need to rename the caching options for the other builds and remove the current “cache-off” config syntax (and throw an error), so that nobody runs into unexpected behaviour when upgrading.

ok, let’s do:

  • no-cache build: nothing is ever stored persistently, except for things like the bearer token and the user address
  • standard build: the api changes and the new config options are as follows:
    • FLUSH (incoming data is never cached; outgoing data is only persisted to make retries possible, and flushed afterwards)
    • SEEN (incoming and outgoing data is cached when seen, flushed on disconnect) (default)
    • SEEN AND FOLDERS (like SEEN, plus all folders are fetched proactively during idle time)
    • ALL (same, also for documents)

when setting a caching strategy for a certain subtree, it will trigger flushing or syncing as appropriate, so it also makes sense to switch caching strategies for different subtrees as the user navigates to different parts of the app (e.g. archives, infinite scroll).
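To make the per-subtree idea concrete, here is a minimal sketch of how a strategy could be resolved for a given path. It is not the remotestorage.js API: the `CachingConfig` class, its `set` and `strategyFor` methods, and the longest-prefix rule are assumptions made for illustration. Only the strategy names and the SEEN default come from the proposal above.

```javascript
// Hypothetical sketch, not the remotestorage.js API.
// Strategy names are taken from the proposal in this thread.
const STRATEGIES = new Set(['FLUSH', 'SEEN', 'SEEN_AND_FOLDERS', 'ALL']);

class CachingConfig {
  constructor() {
    this.rootPaths = new Map(); // subtree path -> strategy name
  }

  // Assign a strategy to a subtree such as '/photos/'.
  set(path, strategy) {
    if (!path.endsWith('/')) {
      throw new Error('subtree path must end in a slash: ' + path);
    }
    if (!STRATEGIES.has(strategy)) {
      throw new Error('unknown caching strategy: ' + strategy);
    }
    this.rootPaths.set(path, strategy);
  }

  // Assumption: the most specific (longest) configured subtree wins,
  // and paths outside every configured subtree fall back to SEEN.
  strategyFor(path) {
    let bestRoot = '';
    let result = 'SEEN';
    for (const [root, strategy] of this.rootPaths) {
      if (path.startsWith(root) && root.length > bestRoot.length) {
        bestRoot = root;
        result = strategy;
      }
    }
    return result;
  }
}

// Usage: cache a photo collection fully, but keep its archive out of the cache.
const caching = new CachingConfig();
caching.set('/photos/', 'ALL');
caching.set('/photos/archive/', 'FLUSH');
```

With this setup, a path under `/photos/archive/` resolves to FLUSH, other paths under `/photos/` resolve to ALL, and everything else gets the SEEN default.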

the FLUSH strategy doesn’t currently exist, i’ll add it.
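As a rough illustration of what FLUSH could mean for outgoing data (kept only so a push can be retried, dropped once the push is confirmed), here is a small sketch. The `OutgoingCache` class, its method names, and the revision counter are hypothetical, and an in-memory `Map` stands in for the persistent cache layer.

```javascript
// Hypothetical sketch of FLUSH behaviour for outgoing data.
// A Map stands in for the persistent caching layer (e.g. IndexedDB).
class OutgoingCache {
  constructor() {
    this.pending = new Map(); // path -> { body, rev }
    this.nextRev = 1;
  }

  // Keep the write locally so a failed push can be retried.
  // Returns a revision number identifying this particular write.
  write(path, body) {
    const rev = this.nextRev++;
    this.pending.set(path, { body, rev });
    return rev;
  }

  // Called when the remote confirms a push. The entry is flushed only if
  // it has not been overwritten by a newer write in the meantime.
  onPushed(path, rev) {
    const entry = this.pending.get(path);
    if (entry && entry.rev === rev) {
      this.pending.delete(path);
      return true;
    }
    return false;
  }

  // Paths still waiting for a successful push.
  pendingPaths() {
    return [...this.pending.keys()];
  }
}

// Usage: two writes to the same path; only the confirmation of the
// latest one removes the entry from the local cache.
const outgoing = new OutgoingCache();
const rev1 = outgoing.write('/sharesome/a.png', 'first');
const rev2 = outgoing.write('/sharesome/a.png', 'second');
const staleFlushed = outgoing.onPushed('/sharesome/a.png', rev1);
const latestFlushed = outgoing.onPushed('/sharesome/a.png', rev2);
```

Anything left in `pendingPaths()` after a crash or a failed push is exactly the data that lingers locally until the next successful sync, which is the window the security concern above refers to.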

Ok, let’s do it like that then.

Sigh, changing the API again :) this time let’s try to make it stick?

I must say, I’m not sure I understand these different options completely. Namely why “SEEN AND FOLDERS”?

The name “SEEN” is a little weird and feels counter-intuitive, but that could also be because I don’t really understand the new features completely.

What is a case where you’d want SEEN but not SEEN_AND_FOLDERS? (By folders I assume you just mean the existence of folders, so empty directory structures. Is that so resource intensive that we need a separate flag for just that?)

Also, I assume the only difference between SEEN and ALL is that SEEN just caches stuff you’ve actually touched, whereas ALL will pull down historical data that you may not even be using at all. Right?

yes. the setting to only pull down folders and not documents was introduced somewhere in the 0.6 days iirc mainly because of the remotestorage-browser app i think. it was accessed by passing {data: false} as an extra parameter, and was referred to as ‘treeOnly’ in the code, i think. so it’s not a new feature, it’s actually a legacy feature. i personally don’t find it that useful either. if nobody wants it then we could also remove it, that would be a significant simplification.

the difference between SEEN and ALL is indeed as you describe. suggestions for better naming welcome!

+1 for increasing simplicity and clarity. FLUSH, SEEN, ALL seems good enough to me.

Why not have cache on/off, where turning it on without any parameters sets up the reasonable behavior of, essentially, SEEN?

This, in my opinion, greatly simplifies things for users.

Then, additional options can be:
TREE: true, preemptively retrieve directory tree
ALL: true, preemptively retrieve all documents in the module.

This way, caching is kept simple, and the normal behavior is the default. Preemptive behavior is then requested through a param, rather than confusing the namespace with two different areas of behavior.
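A small sketch of how such an on/off switch with optional params could map onto the strategy names discussed earlier. The `resolveCaching` function is hypothetical, and mapping "off" to FLUSH is an assumption: the suggestion above does not say what "off" should do.

```javascript
// Hypothetical helper, not part of remotestorage.js.
// Maps an on/off switch plus optional params to a strategy name.
function resolveCaching(enabled, options = {}) {
  if (!enabled) {
    return 'FLUSH'; // assumption: "off" means nothing stays cached
  }
  if (options.ALL) {
    return 'ALL'; // preemptively retrieve folders and documents
  }
  if (options.TREE) {
    return 'SEEN_AND_FOLDERS'; // preemptively retrieve the directory tree
  }
  return 'SEEN'; // plain "on": cache only what has been touched
}

// Usage
const plainOn = resolveCaching(true);
const withTree = resolveCaching(true, { TREE: true });
const withAll = resolveCaching(true, { TREE: true, ALL: true });
const off = resolveCaching(false);
```

ALL takes precedence over TREE here because retrieving all documents already implies retrieving their folders.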

We do still have caching.enable and disable as shorthands. But I just realized they’re not using the default of seen:

https://github.com/remotestorage/remotestorage.js/blob/sync-per-node/src/caching.js#L90-L92

We should change that to use the default, as it would also silently change the behaviour of existing apps otherwise.

Scratch that. I think I still haven’t understood all vs seen. Which in itself is probably the best indicator that it needs MUCH better explanation for app developers, because I was even reviewing that pull request and still don’t get it. When we’ve reviewed the sync part, we’ll know what it means and then propose some explanation/documentation for it (or to change the options).