Folder organization for performance

rosano · April 9, 2020, 3:02pm

I’m working on an app where the user will generally create a few Parent objects that have thousands of Child objects, and in the application it only makes sense to show Child objects in the context of their Parent. If the total number of Child objects will probably not exceed 100,000, I would like to know if:
a) it is worthwhile to store the Child objects in a subfolder of the Parent (/parent_objects/ID123/child_objects/…) to avoid fetching all at once and then filtering, or if
b) it makes no performance difference and I can keep my simple ‘lazy’ organization of all Parent objects in one folder and all Child objects in another folder (/parent_objects/… and /child_objects/…)

It seems to me that remoteStorage’s “module” pattern is designed for the second option, so I would appreciate links to any examples of the first one.

raucao · April 10, 2020, 11:20am

In order for you to be able to find the best layout for your particular use case, I’ll explain how RS sync works in general.

Every document and folder carries an ETag, which changes whenever that document, or any document inside that folder, is created, updated, or deleted (as shorthand, we just call all of these “modified”).
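
To make the ETag rule concrete, here is a minimal Python sketch of a hypothetical in-memory tree (not remoteStorage.js code). The `Tree` class, its `modify` method and the counter-based ETags are illustrative assumptions. Modifying one document gives that document and every folder above it a fresh ETag, while sibling folders keep theirs.

```python
import itertools

# Hypothetical in-memory model, not the remoteStorage.js API.
# ETags here are simple counter-based strings.
_counter = itertools.count(1)


def new_etag() -> str:
    return f"etag-{next(_counter)}"


class Tree:
    def __init__(self) -> None:
        # Maps paths like "/", "/bookmarks/", "/bookmarks/x" to their ETags.
        self.etags: dict[str, str] = {"/": new_etag()}

    @staticmethod
    def ancestors(doc_path: str) -> list[str]:
        """Every folder that contains doc_path, from the root downwards."""
        parts = doc_path.strip("/").split("/")[:-1]
        return ["/"] + ["/" + "/".join(parts[: i + 1]) + "/" for i in range(len(parts))]

    def modify(self, doc_path: str, deleted: bool = False) -> None:
        """Create, update, or delete a document; all three count as "modified"."""
        if deleted:
            self.etags.pop(doc_path, None)
        else:
            self.etags[doc_path] = new_etag()
        # The change propagates: each containing folder gets a new ETag too.
        for folder in self.ancestors(doc_path):
            self.etags[folder] = new_etag()


tree = Tree()
tree.modify("/bookmarks/archive/x")
before = dict(tree.etags)
tree.modify("/bookmarks/read-later/y")
changed = sorted(p for p, e in tree.etags.items() if before.get(p) != e)
print(changed)
# ['/', '/bookmarks/', '/bookmarks/read-later/', '/bookmarks/read-later/y']
```

Note that "/bookmarks/archive/" keeps its old ETag, which is what lets a client skip that whole subtree.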

With remoteStorage.js, when you tell it to cache a certain path (i.e. a root folder/directory), it will regularly check only that root folder for a changed ETag. When it sees an update, it checks the ETags in that folder’s listing to find out what has changed. If one or more subfolders carry new ETags, it does the same for each updated subfolder, until it has found all documents that have actually been modified.
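
The descent described above can be sketched roughly as follows. This is a simplified Python model, not remoteStorage.js internals: the server is a hypothetical dict of folder listings (child name to ETag, with subfolder names ending in "/"), and `poll`/`find_changes` are illustrative names. It only compares the root ETag on a regular check and only fetches listings of subfolders whose ETag changed.

```python
# Hypothetical snapshot of the server: each folder listing maps child names to
# ETags; names ending in "/" are subfolders. Not the remoteStorage.js API.
Listing = dict[str, str]


def find_changes(remote: dict[str, Listing], local: dict[str, str],
                 folder: str) -> tuple[list[str], int]:
    """Descend only into children whose ETag differs from the cached one.

    Returns the modified document paths and the number of listings fetched.
    """
    fetched = 1  # downloading this folder's listing
    changed: list[str] = []
    for name, etag in remote[folder].items():
        path = folder + name
        if local.get(path) == etag:
            continue  # unchanged document, or an entire unchanged subtree
        if name.endswith("/"):
            sub_changed, sub_fetched = find_changes(remote, local, path)
            changed += sub_changed
            fetched += sub_fetched
        else:
            changed.append(path)
    return changed, fetched


def poll(remote: dict[str, Listing], root_etag: str, local: dict[str, str],
         root: str = "/") -> tuple[list[str], int]:
    """Regular check: compare only the root's ETag before walking anything."""
    if local.get(root) == root_etag:
        return [], 0
    return find_changes(remote, local, root)


remote = {
    "/": {"a/": "e2", "b/": "e1"},
    "/a/": {"doc1": "e5", "doc2": "e3"},
    "/b/": {"doc3": "e4"},
}
local = {"/": "r1", "/a/": "e0", "/b/": "e1",
         "/a/doc1": "e0", "/a/doc2": "e3", "/b/doc3": "e4"}
print(poll(remote, "r1", local))  # ([], 0): root unchanged, nothing fetched
print(poll(remote, "r2", local))  # (['/a/doc1'], 2): "/b/" is never listed
```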

So the considerations I see are:

The more items you have, the larger your folder listings will be, and downloading a large folder listing anew is more expensive than downloading a small one.
The more items are being modified regularly, the more often clients have to download the folder listing for them.
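
To put rough numbers on these two considerations, here is a back-of-the-envelope Python model that counts listing entries a client re-downloads after a sync. It assumes (hypothetically) that items are spread evenly over some number of subfolders of the synced root, and it ignores entry size and per-request overhead; `entries_downloaded` is an illustrative name, and 100,000 is the upper bound from the original question.

```python
def entries_downloaded(total_items: int, buckets: int,
                       changed_buckets: int = 1) -> int:
    """Rough count of listing entries re-downloaded after some items change.

    Hypothetical model: items are spread evenly over `buckets` subfolders of
    the synced root, and `changed_buckets` of them contain modified items.
    buckets == 1 means the flat layout (every item in the root listing).
    Ignores entry size and per-request overhead.
    """
    if buckets == 1:
        return total_items
    per_bucket = -(-total_items // buckets)  # ceiling division
    # Root listing (one entry per bucket) plus each changed bucket's listing.
    return buckets + changed_buckets * per_bucket


print(entries_downloaded(100_000, 1))        # 100000 (flat layout)
print(entries_downloaded(100_000, 100))      # 1100
print(entries_downloaded(100_000, 100, 50))  # 50100
```

The last line reflects the second consideration: when modifications are spread over many folders, the savings from splitting shrink.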

In my opinion, a rule of thumb from these considerations is that the more items you expect, the more folders you (usually) want to create, and that it is the job of the data module to handle the complexity of managing more folders.

In fact, for my own bookmarking app, I’m already considering adding more folders for various types of bookmarks, so that I can sync them more efficiently. For example, when “read later” bookmarks go in their own folder (until they’re archived), clients only have to sync that much smaller folder to update the reading list in their app. Also, apps can then decide to not even care about “read later” bookmarks or archive bookmarks, which means they don’t have to sync and cache those types at all in the first place.

raucao · April 10, 2020, 11:24am

The “module pattern” is first and foremost designed for granular access control via OAuth scopes/permissions, so that apps only need permissions for the types of data they actually access and/or modify.
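
As a rough illustration of per-category permissions, here is a simplified Python model. It assumes scope strings of the form `<category>:<level>`, with level `r` (read-only) or `rw` (read/write), each covering the paths under `/<category>/`. The `allows` function is hypothetical, and the real protocol has additional rules that this sketch ignores.

```python
def allows(scopes: list[str], path: str, write: bool) -> bool:
    """Hypothetical check: does any granted scope cover `path` for this access?"""
    for scope in scopes:
        category, _, level = scope.partition(":")
        if not path.startswith(f"/{category}/"):
            continue  # scope is for a different category (top-level folder)
        if level == "rw" or (level == "r" and not write):
            return True
    return False


granted = ["bookmarks:rw", "contacts:r"]
print(allows(granted, "/bookmarks/read-later/x", write=True))  # True
print(allows(granted, "/contacts/alice", write=True))          # False: read-only
print(allows(granted, "/chat-messages/x", write=False))        # False: not granted
```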

It is also designed to abstract the data layout (and other functionality) away from the apps themselves to a certain degree, so that different apps can more easily access and modify the same data. This also means that the module can help with sync performance by establishing a folder layout and providing folder management features.
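
A minimal sketch of that idea, assuming a hypothetical `ChildModule` class backed by a plain dict instead of real storage: apps only call `add` and `children_of`, and the path layout lives in one private method, so it could later gain date subfolders without any app changes.

```python
class ChildModule:
    """Hypothetical data module: apps call these methods and never build paths,
    so the folder layout can be changed here without touching app code."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}  # path -> document; stand-in for storage

    @staticmethod
    def _path(parent_id: str, child_id: str) -> str:
        # The only place that knows the layout.
        return f"/parent_objects/{parent_id}/child_objects/{child_id}"

    def add(self, parent_id: str, child_id: str, doc: dict) -> None:
        self._docs[self._path(parent_id, child_id)] = doc

    def children_of(self, parent_id: str) -> list[str]:
        prefix = self._path(parent_id, "")
        return sorted(p[len(prefix):] for p in self._docs if p.startswith(prefix))


module = ChildModule()
module.add("ID123", "c1", {"title": "first"})
module.add("ID456", "c2", {"title": "other parent"})
print(module.children_of("ID123"))  # ['c1']
```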

rosano · April 12, 2020, 7:31pm

Thanks so much for sharing these details; it really helps to understand what’s going on. I thought about it for a while, and I think the ideal organization for thousands of objects might be something like /parent_objects/ID123/child_objects/creation_date/…, which would put them into somewhat meaningful groups without requiring clients to check a folder that contains thousands of items. Maybe I can add a note about this to the data modelling documentation?
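
Here is a small Python sketch of that date-grouped layout. The ISO `YYYY-MM-DD` folder name, the `child_path` and `largest_listing` helpers and the sample data are all assumptions for illustration. It shows that 3,000 children of one parent, created at 100 per day, never produce a listing larger than 100 entries.

```python
from collections import Counter
from datetime import date


def child_path(parent_id: str, child_id: str, created: date) -> str:
    """Hypothetical layout: one subfolder per creation date (ISO YYYY-MM-DD)."""
    return f"/parent_objects/{parent_id}/child_objects/{created.isoformat()}/{child_id}"


def largest_listing(paths: list[str]) -> int:
    """Size of the biggest date folder, i.e. the largest listing a client fetches."""
    folders = Counter(p.rsplit("/", 1)[0] for p in paths)
    return max(folders.values())


# 3,000 children of one parent, created 100 per day over 30 days (made-up data).
paths = [child_path("ID123", f"c{i}", date(2020, 4, 1 + i // 100)) for i in range(3000)]
print(paths[0])                 # /parent_objects/ID123/child_objects/2020-04-01/c0
print(largest_listing(paths))   # 100
```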

raucao · April 13, 2020, 8:10am

Yes, storing items by date (with varying resolutions depending on use case) is a common pattern actually.

For example, the chat-messages module stores one archive per day, and daily archives are stored in the folder for the respective month.
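
The "varying resolutions" idea can be sketched as a helper that builds nested year/month/day folders. This is a hypothetical Python sketch: `date_folder`, `daily_archive_path` and the `/chat-messages/example/` prefix are made up, and the real chat-messages module's paths may differ. A daily archive then becomes one document named after the day, inside its month folder.

```python
from datetime import date


def date_folder(d: date, resolution: str) -> str:
    """Nested date folders at a chosen resolution: "year", "month" or "day"."""
    parts = [f"{d.year:04d}", f"{d.month:02d}", f"{d.day:02d}"]
    depth = {"year": 1, "month": 2, "day": 3}[resolution]
    return "/".join(parts[:depth]) + "/"


def daily_archive_path(base: str, d: date) -> str:
    """One archive document per day, stored inside its month's folder.
    `base` is a hypothetical prefix; the real chat-messages layout may differ."""
    return base + date_folder(d, "month") + f"{d.day:02d}"


print(date_folder(date(2020, 4, 13), "year"))  # 2020/
print(daily_archive_path("/chat-messages/example/", date(2020, 4, 13)))
# /chat-messages/example/2020/04/13
```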

One consideration there is that changing the date of an item would require deleting and re-creating it, if it needs to go into a different parent folder. (I have thought about this in the past, and I think we should add a rename/move function to remoteStorage.js for this.) But many use cases never require changing the date anyway. I’m just adding this thought for a complete picture.
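
Until such a function exists, a move can be emulated as "write the new copy, then delete the old one". Below is a minimal Python sketch over an in-memory dict standing in for storage. The `dated_path` and `move` helpers are hypothetical, and writing before deleting is a design choice made here so that an interruption leaves a duplicate rather than a lost item.

```python
from datetime import date


def dated_path(parent_id: str, child_id: str, created: date) -> str:
    # Hypothetical date-folder layout, following the one suggested in this thread.
    return f"/parent_objects/{parent_id}/child_objects/{created.isoformat()}/{child_id}"


def move(docs: dict[str, bytes], old_path: str, new_path: str) -> None:
    """Emulate rename/move in this in-memory model: put the new copy, then delete.

    Writing the new copy first means a failure between the two steps leaves a
    duplicate rather than losing the document.
    """
    if old_path == new_path:
        return
    docs[new_path] = docs[old_path]
    del docs[old_path]


docs = {dated_path("ID123", "c1", date(2020, 4, 12)): b"{}"}
move(docs, dated_path("ID123", "c1", date(2020, 4, 12)),
     dated_path("ID123", "c1", date(2020, 4, 13)))
print(list(docs))  # ['/parent_objects/ID123/child_objects/2020-04-13/c1']
```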