Approaches to Ordering and Searching Items

In my app, the items must be kept in a user-editable order, and the app must be able to retrieve a subset of items based on a search term. (The current app is at https://serenenotes.hominidsoftware.com/ if you want to take a look.)

One possible approach to doing this with remotestorage is to hold all the items in memory, and do sorting and searching there. However, that does not necessarily scale to thousands of items. Also, before any items can be displayed, all items must be loaded and sorted. That makes for a slow app startup.
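For reference, the in-memory approach described above can be sketched roughly as follows. The item shape (`id`, `title`, `body`, `position`) and the function names are assumptions made for illustration, not anything defined by remoteStorage.js; every item has to be loaded before either function can run, which is exactly the startup cost in question.

```javascript
// Hypothetical item shape: { id, title, body, position }.
// `position` is an assumed numeric field holding the user-editable order.

// Return a new array sorted by the explicit position field.
function sortByPosition(items) {
  return [...items].sort((a, b) => a.position - b.position);
}

// Case-insensitive substring search over title and body,
// returning the matches in their stored order.
function searchItems(items, term) {
  const needle = term.trim().toLowerCase();
  if (needle === '') return sortByPosition(items);
  const matches = items.filter(item =>
    `${item.title}\n${item.body}`.toLowerCase().includes(needle));
  return sortByPosition(matches);
}

const notes = [
  { id: 'a', title: 'Groceries', body: 'milk, eggs', position: 2 },
  { id: 'b', title: 'Ideas', body: 'sync notes offline', position: 0 },
  { id: 'c', title: 'Travel', body: 'pack notes and maps', position: 1 },
];

const hitIds = searchItems(notes, 'notes').map(note => note.id);
```

Both operations scan the whole collection on every call, so the cost grows linearly with the number of items.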

Are other approaches to ordering and searching possible? Are there any apps which do so?

For the JavaScript client library, it would be possible to add ordering and search functionality, as the data is locally stored in IndexedDB. If I were to implement that, would the changes be accepted?


@DougReeder That would be most welcome! However, as not all apps need search and ordering, the current rough idea for this was to add it to the data modules, as a shared utility module that can be re-used in any module.

The last big step toward making this possible was remoteStorage.js 1.0, along with all of the module architecture and packaging changes it involved. So this field is wide open to be tackled now. More importantly, the necessary parts are now in place, so one can actually solve problems such as indexing and search.

I’m sure a few people here would gladly help with developing, reviewing, and testing something like that, as it’s one of the key missing pieces for implementing apps dealing with more complex and plentiful data.

Just to illustrate the point: a contacts module, for example, would just import remotestorage-util-search, then maybe have some declarative way of defining indexing, caching, and folder structure for its data. Another module, say email, would also do that for search, but in addition it would import remotestorage-util-pgp in order to deal with encrypted emails.

What do you think?

Just a side note: this is only the case when IndexedDB is available, of course. Otherwise it will fall back to localStorage (e.g. FF private window) or memory (e.g. node.js without Web Storage polyfills).

For an efficient implementation, I think BaseClient would need to be involved. Data modules could specify that the data be indexed on certain fields.
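As a rough illustration of what "indexed on certain fields" could mean, here is a minimal in-memory inverted index. `buildFieldIndex` and `lookupTerm` are hypothetical helpers written for this sketch; they are not part of BaseClient or any existing remoteStorage.js API, and a real implementation would presumably persist the index rather than rebuild it.

```javascript
// Hypothetical helpers, NOT part of remoteStorage.js or BaseClient.

// Build a map from lower-cased word -> Set of item ids,
// looking only at the fields the module asked to have indexed.
function buildFieldIndex(items, fields) {
  const index = new Map();
  for (const item of items) {
    for (const field of fields) {
      const text = String(item[field] ?? '').toLowerCase();
      for (const token of text.split(/[^a-z0-9]+/).filter(Boolean)) {
        if (!index.has(token)) index.set(token, new Set());
        index.get(token).add(item.id);
      }
    }
  }
  return index;
}

// Look up the ids of all items containing the given whole word.
function lookupTerm(index, term) {
  return [...(index.get(term.toLowerCase()) ?? [])];
}

const contacts = [
  { id: '1', name: 'Ada Lovelace', city: 'London', note: 'met in Paris' },
  { id: '2', name: 'Alan Turing', city: 'London', note: '' },
  { id: '3', name: 'Grace Hopper', city: 'New York', note: '' },
];

// Only `name` and `city` are declared as indexed; `note` is ignored.
const contactIndex = buildFieldIndex(contacts, ['name', 'city']);
const londonIds = lookupTerm(contactIndex, 'London');
const parisIds = lookupTerm(contactIndex, 'Paris');
```

A lookup then costs one map access instead of a scan over every item, at the price of keeping the index up to date whenever an item changes.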

This might be a good opportunity to set expectations around the number of items in a folder. I’m thinking the practical upper limit is governed by the longest practical JSON-LD document. That would be about 128 bytes * #items?

Maybe we could create the necessary public API for whatever is needed and either make it available to all data modules or add separate loading for low-level utility modules. Perhaps if you prototype it in core while keeping modularity in mind, the best solution may present itself along the way.

I’ll start work on that.

Regarding the number of items in a folder: if we think a gibibyte (1073741824 bytes) is the practical maximum size of the JSON-LD for a directory, we might require servers to allow at least 8388607 (2^23-1) items in a folder. Then:

  1. Server implementations know they have to either allocate a sizeable buffer, or stream out the JSON-LD.
  2. Utility programs, such as Inspektor, likewise know they have to either allocate a sizeable buffer, or use a stream to parse the JSON-LD. (Other apps don’t, because they will only read their own folders.)
  3. App and data module designers know what they can design for. If a movement-tracking app wants to store minute-by-minute activity levels, they know they can fit almost 16 years of data in one directory, and search and sort to pull the desired data for any given functionality.
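The figures above can be checked with a few lines of arithmetic. The 128 bytes per entry is the rough estimate from earlier in the thread, and the constant names are made up for this sketch.

```javascript
// Assumed average size of one item's entry in the folder's JSON-LD listing.
const BYTES_PER_ENTRY = 128;
// One gibibyte, the proposed practical maximum for a folder listing.
const MAX_LISTING_BYTES = 2 ** 30; // 1073741824

// How many entries fit: 2^30 / 2^7 = 2^23 = 8388608.
const entriesThatFit = MAX_LISTING_BYTES / BYTES_PER_ENTRY;
// The proposed minimum servers must allow is one less: 2^23 - 1.
const proposedMinimum = entriesThatFit - 1;

// One record per minute, using 365.25 days per year.
const MINUTES_PER_YEAR = 60 * 24 * 365.25; // 525960
const yearsOfMinuteData = proposedMinimum / MINUTES_PER_YEAR;

console.log(proposedMinimum, yearsOfMinuteData.toFixed(2));
```

The result comes out a little under 16 years, which matches the "almost 16 years" estimate for minute-by-minute data.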

Fantastic! Looking forward to seeing what you come up with.

I’m not sure I understand the relevance of maximum item count with regard to search. Caching-wise it’s definitely always better to have more folders with fewer items per folder, because update checks are per-folder first, and per-item second. Even a very few large items that combine a bunch of data can sometimes make sense imo, e.g. for things like a search index or a sort list.
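To sketch the "sort list" idea: the editable order could live in a single document holding an array of item IDs, so reordering touches one item instead of many. The document shape `{ order: [...] }` and the helper names below are assumptions for illustration only.

```javascript
// Hypothetical sort-list document: { order: ['id1', 'id2', ...] }.

// Return a new order array with `id` moved to `newIndex`.
// Unknown ids leave the order unchanged.
function moveItem(order, id, newIndex) {
  const from = order.indexOf(id);
  if (from === -1) return order.slice();
  const next = order.slice();
  next.splice(from, 1);
  next.splice(newIndex, 0, id);
  return next;
}

// Arrange loaded items according to the sort list,
// skipping ids whose items are missing (e.g. not yet synced).
function applyOrder(items, order) {
  const byId = new Map(items.map(item => [item.id, item]));
  return order.filter(id => byId.has(id)).map(id => byId.get(id));
}

const sortList = { order: ['a', 'b', 'c'] };
const reordered = moveItem(sortList.order, 'c', 0);

const loadedItems = [{ id: 'b', title: 'B' }, { id: 'c', title: 'C' }];
const displayed = applyOrder(loadedItems, reordered).map(item => item.title);
```

One trade-off of this design is that two devices editing the order at the same time will conflict on that single document, so some merge strategy would be needed.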

Oh, by the way, the legacy contacts module has some indexing and search going on in the code. I never really looked into it, but I’m sure @michielbdejong would be able to tell you more about his approach there.

Apps where two-way syncing is important (such as contacts or bookmarks) will indeed gravitate to fewer items per folder.

Other apps are oriented around one or more devices creating records, and other devices slicing up the data for analysis. (Exercise-tracking apps are in this category.) Such apps may not sync much, and will gravitate toward many items per folder (if allowed), so the searching and sorting is simpler.