Public protocols

rosano · April 26, 2020, 1:41am

Since we brought it up on the call last week, I was thinking about the issue of multiple apps trying to use the same schema to achieve interoperability (for example multiple note-taking apps writing to /documents/notes with title, content…) and how it doesn’t seem to be universally adopted.

It’s possible that coupling the schema with the app and perhaps with the ‘storage path’ may feel limiting. It would be easier if there was no need to worry about breaking compatibility with another app, if each developer can organize things however they want and still have some way to expose their data to other apps.

A solution for this could be ‘public protocols’ for the types of content that are common use cases in remoteStorage apps. Similar to what https://schema.org does for structured data, we could define them for notes, files, photos, links, etc. I imagine this as something that works at the library level instead of the application level:

- a developer can create a completely custom schema for an app
- the developer can optionally do a little extra work to specify a mapping between their custom schema and the public protocol
- the library can translate between the two.
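
To make the mapping idea a bit more concrete, here is a minimal sketch in Python (chosen only for brevity; remoteStorage.js itself is JavaScript). The field names `heading` and `body`, the `FIELD_MAP` table, and both helper functions are hypothetical and only illustrate the shape of the idea; nothing like this is claimed to exist in the library.

```python
# Hypothetical mapping from one app's private note schema to a shared
# "public protocol" for notes. All names here are made up for illustration.
FIELD_MAP = {
    "heading": "title",  # app field -> public field
    "body": "content",
}


def to_public(app_doc: dict) -> dict:
    """Translate an app-specific document into the public shape.

    Fields without a mapping are left out of the public document.
    """
    return {pub: app_doc[app] for app, pub in FIELD_MAP.items() if app in app_doc}


def from_public(public_doc: dict) -> dict:
    """Translate a public document back into the app-specific shape."""
    return {app: public_doc[pub] for app, pub in FIELD_MAP.items() if pub in public_doc}


note = {"heading": "Groceries", "body": "Milk, eggs", "pinned": True}
public_note = to_public(note)
print(public_note)
```

In this sketch the app keeps full control over its own fields (such as `pinned`), and only the mapped fields are exposed to other apps.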

This way:

- each app can organize the object locations and schemas in ways that best suit the app
- compatibility with other apps is maintained in a more robust way than what exists at the moment
- the shared protocol can advance at a controlled pace, collectively, publicly
- the library can maybe handle migrations for older versions of a protocol
- unintended possibilities of presenting data may manifest (maybe an app that is not for note-taking can expose its data in plaintext)

Not sure if this makes sense from a technical perspective but I’m just thinking about the developer experience.

raucao · April 26, 2020, 9:46am

I think you basically describe what remoteStorage.js data modules are intended for.

From my point of view, the main issue so far wasn’t so much the technical possibility of sharing the modules between apps (which exists and has been done for a few, e.g. bookmarks). But I think it’s a problem that there just aren’t many existing modules, and usually new app developers don’t want to start new shared modules, when they’re only just getting started with remoteStorage to begin with.

What you call “protocol”, I would perhaps call “data model”. What we’re talking about is the documents’ JSON-LD vocabulary (which we describe using JSON Schema in rs.js data modules), combined with the data ontology, i.e. folder and document structure, links between documents (not supported in current modules, but likely necessary), and the naming of them all (including category names).

So, based on what you wrote, on what we currently have, and on what was discussed in the community thus far, personally, I would imagine that we need some kind of registry for the data models, which (ideally) would be generic enough so that it could even be used with other protocols (e.g. SOLID), or even with custom APIs (like e.g. ownCloud/Nextcloud). And perhaps we could/should make it easier to create your own models/modules, when you’re only starting out with RS, using things like best practices documentation, tooling, etc.

However, I’m not sure it’s desirable to make it too generic from the get-go, because you wouldn’t want to standardize before experimentation and usage, in my opinion. The first step I would propose is to separate the schemas from the code of the data modules, and to think about how we can (internally) describe and standardize things like folder structure and naming, as well as advanced use cases like linking (both schemas and documents), as well as search indexes and such.

I would be very interested in what others are thinking about this topic. Let’s ~~discuss~~ brainstorm!

raucao · April 28, 2020, 1:30pm

In order to make the brainstorming of potential routes a bit easier, here is an example of how I’m using remoteStorage without currently having any shared data model or rs.js module for it:

Public profile information

I’m storing some public profile information in my storage, which (as of now) I am the only producer and consumer of. However, I actually intended this as something that could be retrieved from other people’s storages as well in the future.

Use case: As a frequent traveler, I want other people to know where I currently am, and also in which timezone. To render it nicely on my personal website, I also want people to see a map with a rough location. So I use this simple app (source code) I wrote in order to upload two documents to my /public/profile:

Here’s the current content of the current-location:

{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [
      12.127262,
      47.8539273
    ]
  },
  "city": "Rosenheim",
  "state": "Bavaria",
  "country": "Germany",
  "timezone": {
    "name": "Europe/Berlin",
    "now_in_dst": 1,
    "offset_sec": 7200,
    "offset_string": "+0200",
    "short_name": "CEST"
  }
}

As you can see, I didn’t need to invent my own format for the location from scratch, as GeoJSON already standardizes how to describe a geographic location. However, that’s only true for the location data itself, not including e.g. city/state/etc. names or timezone information. (The GeoJSON spec allows using arbitrary custom properties in addition to the standardized ones.)
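
As a rough illustration of how a consumer might read such a document, here is a minimal Python sketch. It uses a trimmed copy of the document above; the `summarize` helper and its output keys are hypothetical. The one detail taken from the GeoJSON spec is that positions are ordered as longitude first, then latitude.

```python
import json
from datetime import datetime, timedelta, timezone

# Trimmed copy of the current-location document shown above.
DOC = """
{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [12.127262, 47.8539273]},
  "city": "Rosenheim",
  "timezone": {"name": "Europe/Berlin", "offset_sec": 7200}
}
"""


def summarize(feature: dict, now_utc: datetime) -> dict:
    """Pull out the standardized GeoJSON part and the custom extras."""
    # GeoJSON stores positions as [longitude, latitude], in that order.
    lon, lat = feature["geometry"]["coordinates"]
    # The custom timezone object carries the UTC offset in seconds.
    offset = timedelta(seconds=feature["timezone"]["offset_sec"])
    local = now_utc.astimezone(timezone(offset))
    return {
        "city": feature.get("city"),
        "lat": lat,
        "lon": lon,
        "local_time": local.strftime("%H:%M %z"),
    }


info = summarize(json.loads(DOC), datetime(2020, 4, 28, 11, 30, tzinfo=timezone.utc))
print(info)
```

A generic GeoJSON consumer would only need the `geometry` part, while an app that knows about the extra fields can also show the city and local time.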

The PNG file contains the rendered map I mentioned. Currently, it looks like this:

[Image: current-location]

Both the city name and map image are then rendered by a few lines of vanilla JS on my personal website. But obviously, it could be useful in a lot of different scenarios and more complex applications.

Questions

Now, these things would still easily fit in a current rs.js data module, which would have a few functions to store and retrieve the location and timezone data. I could extract some things from the rs-locate code, and maybe the module would even allow you to use OpenCage (geocoding) and Mapbox (map rendering) API keys with it, so it could do everything that the app currently does.

However, this raises some questions:

- How could the module “import”/extend the GeoJSON format and schema instead of redefining everything in our own schema?
- Should the schema actually be specified in the module code, meaning an “extended location object” (or whatever the name/@type would be) cannot be shared across modules?
- Should the /profile/current-location (and current-location.png) naming scheme be hardcoded in a rs.js module only? In essence, these are well-known URIs, same as the /.well-known/ ones that are specified over at IANA for example (but only for RS storage base URLs in our case).
- How would I go about proposing this for standardization (simply meaning: please, nobody else use these URIs for other things in their RS apps, and please do it the same way, if it solves a use case in your app)? How much peer review do we want to consider something standard? How would we discern between “standardized” and “custom” data?
- If any of this would be published separate from rs.js module code, then where and how? And how could it be used in either rs.js directly, or from rs.js modules?
- What about specifying details that don’t seem important, but can make or break UX in apps? E.g. the aspect ratio and resolution of the map image.
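
To illustrate the well-known URI question, here is a small Python sketch of what a registry of well-known paths, kept separate from module code, could look like. The `WELL_KNOWN` table, the `public_url` helper, and the storage base URL are all hypothetical; only the two document paths come from the example above.

```python
from urllib.parse import quote

# Hypothetical registry of well-known document paths inside a storage's
# public folder. The constant and helper names are illustrative only.
WELL_KNOWN = {
    "current-location": "profile/current-location",
    "current-location-map": "profile/current-location.png",
}


def public_url(storage_root: str, name: str) -> str:
    """Build the public URL of a well-known document for a given storage."""
    path = WELL_KNOWN[name]
    return f"{storage_root.rstrip('/')}/public/{quote(path)}"


# The base URL below is a made-up example, not a real storage.
url = public_url("https://storage.example.com/alice/", "current-location")
print(url)
```

Any client that knows a user's storage base URL could then look up the same documents without depending on a particular rs.js module.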

I realize the map image is a bit of a special case, but we actually already have a similar issue in the shares module (used in Sharesome e.g.), with the thumbnails it uses for previews.

The naming (or at least structure/ontology) for paths and documents, on the other hand, is something we need for virtually all data models/modules. Same for re-using, and linking and/or extending existing schemas I suppose.

These are a lot of questions, but anyway, I hope this creates a basis for some brainstorming. Let’s just throw around some ideas for potential answers to these questions, and maybe we can see some shapes forming.

rosano · April 30, 2020, 5:28pm

Cool idea - I wanted to make something like this so many times!

This made me think of another question: What aspects of RS are to be considered more like a filesystem (where the end-user controls where documents are stored), and what aspects are more like an API (where the apps decide where to store things and the user is not involved)?

I always imagined the ‘public profile information’ idea as an external service that asks for a URL that contains this data in the format it expects (either structured data as you presented, or a list of URLs that contain structured data). It wouldn’t matter where you store it (as a user or an app), as long as it is accessible via a public URL and formatted appropriately.

In the world of native apps, they don’t standardize anything and sandbox app data with reverse domain identifiers like com.xyzname, and I think this is what I do instinctively with my RS apps. It’s ugly and closed, but there are no conflicts because every app manages its own space. I believe in the benefits of the data being accessible by other apps, but I think it’s challenging to try and standardize this in a shared space like storage, so I was pitching a ‘conversion’ functionality that works at the library level.

raucao · May 6, 2020, 9:37am

Just stumbled upon https://github.com/YousefED/typescript-json-schema and thought I’d leave this here.

By the way, I just had another look at JSON Schema linking/extending, and that’s actually very straight-forward, just using $id URLs (or JSON pointers) and $ref. So I believe the details are mostly a question of tooling we set up around whatever registry scheme we come up with. For the JSON-LD data itself, and the client/module implementation of e.g. fetching (or storing) related documents, I’m not so sure.
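
As a sketch of the `$id`/`$ref` approach, here are two schemas written as Python dicts, where the second extends the first via `allOf` and `$ref`. The `schemas.example.org` URLs, the `REGISTRY` dict, and the `required_fields` helper are hypothetical, and the resolver only handles absolute `$ref` URLs (no JSON pointers or relative references), so it is far from a full JSON Schema implementation.

```python
# Hypothetical registry: schemas keyed by their $id URL.
REGISTRY = {
    "https://schemas.example.org/geo/point-feature": {
        "$id": "https://schemas.example.org/geo/point-feature",
        "type": "object",
        "required": ["type", "geometry"],
    },
    "https://schemas.example.org/profile/current-location": {
        "$id": "https://schemas.example.org/profile/current-location",
        # Extend the base schema instead of redefining its fields.
        "allOf": [
            {"$ref": "https://schemas.example.org/geo/point-feature"},
            {"type": "object", "required": ["city", "timezone"]},
        ],
    },
}


def required_fields(schema: dict) -> set:
    """Collect `required` names, following $ref and allOf through the registry."""
    if "$ref" in schema:
        return required_fields(REGISTRY[schema["$ref"]])
    fields = set(schema.get("required", []))
    for sub in schema.get("allOf", []):
        fields |= required_fields(sub)
    return fields


extended = REGISTRY["https://schemas.example.org/profile/current-location"]
print(sorted(required_fields(extended)))
```

The extended schema inherits the base requirements without copying them, which is the part that tooling around a registry would need to automate.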

raucao · May 12, 2020, 9:38am

Would be nice if more people joined the brainstorming here. Even just adding more questions to be answered would help a lot…

raucao · May 23, 2020, 8:03am

@michielbdejong Maybe you could add a bit of perspective/experience from SOLID here? How does data model/format standardization work for SOLID apps?

michielbdejong · June 8, 2020, 12:55pm

Yes! It’s a bit of an open question still in Solid. We have https://github.com/solid/solid-panes/blob/master/Documentation/conventions.md, which I plan to move under https://github.com/pdsinterop so that hopefully we can somehow achieve interoperability between Solid, remoteStorage, and Nextcloud. We received some funding from NGI-Zero:PET and I also applied for NGI-Pointer funding, which would help me to work more on this in the coming year.

I’ll look into your geo-location example and compare it to https://www.eventbrite.com/e/solid-world-june-tickets-104631158612# as well. Basically what Sharon did with https://github.com/SharonStrats/find-my-friends was to share location at 3 levels of precision, and you can choose which level is publicly viewable (e.g. a 100km x 100km box), and then your closer friends see a more precise location (e.g. a 100m x 100m box).
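
One simple way to get such precision levels is to snap coordinates to a grid. The Python sketch below is an assumption about how this could be done, not a description of how find-my-friends implements it. The level names and cell sizes are hypothetical; one degree of latitude is roughly 111 km, so a 1.0 degree cell approximates the coarse box and a 0.001 degree cell the fine one (cells get narrower in longitude away from the equator).

```python
import math

# Hypothetical precision levels, expressed as grid cell sizes in degrees.
PRECISION = {
    "public": 1.0,     # very roughly a 100 km box
    "friends": 0.001,  # very roughly a 100 m box
}


def snap(value: float, cell_deg: float) -> float:
    """Round a coordinate down to the edge of its grid cell."""
    return round(math.floor(value / cell_deg) * cell_deg, 6)


def coarsen(lat: float, lon: float, cell_deg: float) -> tuple:
    """Return the south-west corner of the grid cell containing the point."""
    return snap(lat, cell_deg), snap(lon, cell_deg)


print(coarsen(47.8539273, 12.127262, PRECISION["public"]))
print(coarsen(47.8539273, 12.127262, PRECISION["friends"]))
```

Each audience would then get a separate document containing only the coarsened coordinates for its level.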

melvincarvalho · September 24, 2020, 1:01pm

Good thread!

So there’s some advantage in reusing terms such as title, content etc. across different apps. The benefit is that you can join objects together or lists of objects together from different sources.

There are two ways to do this:

1. Simply use strings as keys, then join and merge on that string
2. Use full unique keys as URIs

JSON-LD is one tech for doing (2), but it’s pretty terrible at (1), whereas both are valid options with trade-offs.
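
The difference between the two approaches can be sketched in a few lines of Python. The two sample documents and the per-source contexts are made up for illustration, and mapping `title` to https://schema.org/name is just one possible choice. This is a plain dictionary exercise, not a JSON-LD processor.

```python
# Two documents from different (hypothetical) apps describing the same note.
SOURCE_A = {"title": "Trip notes", "content": "Pack light."}
SOURCE_B = {"name": "Trip notes", "keywords": "travel"}

# Per-source contexts mapping short keys to full URIs (illustrative choices).
CONTEXT_A = {"title": "https://schema.org/name", "content": "https://schema.org/text"}
CONTEXT_B = {"name": "https://schema.org/name", "keywords": "https://schema.org/keywords"}


def merge(*docs: dict) -> dict:
    """Merge documents key by key; later documents win on conflicts."""
    merged = {}
    for doc in docs:
        merged.update(doc)
    return merged


def expand(doc: dict, context: dict) -> dict:
    """Replace short string keys with the full URIs from a context."""
    return {context[key]: value for key, value in doc.items()}


# (1) Join on plain strings: "title" and "name" stay separate keys.
by_string = merge(SOURCE_A, SOURCE_B)
# (2) Join on URIs: both map to the same key and line up.
by_uri = merge(expand(SOURCE_A, CONTEXT_A), expand(SOURCE_B, CONTEXT_B))
print(len(by_string), len(by_uri))
```

Plain strings are simpler but only join when apps happen to pick the same key names, whereas URIs join reliably at the cost of maintaining a context per source.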

As mentioned, an increasingly popular taxonomy for doing this is schema.org, which, while not perfect, has a good network effect and docs, and is simple and easy to use. A big advantage is that you can also embed it in a plain old HTML page, and it also gains you SEO (for those that want that).

What happens when you want to extend schema.org? Well, you can make your own vocabulary and put it somewhere. Or you can reuse something existing. Or you can use permanent ids from:

https://w3id.org/

There’s a lot of formats and vocabs out there. In 2020 I would favour JSON(LD) based solutions, as many are in other formats such as rdf/xml, turtle, RDFa etc. From my experience I’ve been able to make much faster progress dealing with just one format

That’s one way to tackle the problem at the key/value level.

It does not solve the problem of how to tackle the reuse of objects at the object level. There are also two aspects to that. The first is to join/merge/mashup different objects together cross-origin.

Shameless self-promotion: I’ve been working on something that is still an early draft here:

It hopefully offers much of the benefits of Linked Data with a fraction of the complexity, and provides a full upgrade path

The final piece of the jigsaw is putting the object and fields together into a coherent thing. Some refer to this as a ‘shape’. Earlier json-schema was mentioned. I’ve not played around much with that, but there does look to be quite a lot of tooling around there

Just my 2 cents!

rosano · October 7, 2020, 2:41pm

I believe that this recent release from Ink and Switch implements what I was imagining. Project Cambria: Translate your data with lenses https://www.inkandswitch.com/cambria.html

It can be used for migrating bi-directionally between multiple versions of a schema, so that older clients can consume data in a newer schema. Interestingly, they propose publishing the schemas at a well-known URL.

I think this approach could solve some of the issues around multiple apps reading and writing to the same place. Still leaves the issue of migrating the document locations AFAIK.
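
To give a feel for what a bi-directional lens does, here is a toy Python sketch. It is not Cambria’s API or lens format; the step format, the operation names, and the `apply_lens` helper are all hypothetical. A lens is a list of steps that can be run forwards (old schema to new) or backwards (new schema to old).

```python
import copy

# Hypothetical lens from schema v1 to v2: (operation, first arg, second arg).
LENS_V1_TO_V2 = [
    ("rename", "body", "content"),  # v1 "body" becomes v2 "content"
    ("add", "tags", []),            # v2 gains a "tags" field with a default
]


def apply_lens(doc: dict, lens: list, reverse: bool = False) -> dict:
    """Run a lens forwards, or backwards when reverse=True."""
    out = dict(doc)
    steps = reversed(lens) if reverse else lens
    for op, first, second in steps:
        if op == "rename":
            src, dst = (second, first) if reverse else (first, second)
            if src in out:
                out[dst] = out.pop(src)
        elif op == "add":
            if reverse:
                out.pop(first, None)  # older schema has no such field
            else:
                out.setdefault(first, copy.deepcopy(second))
    return out


v1 = {"title": "Groceries", "body": "Milk"}
v2 = apply_lens(v1, LENS_V1_TO_V2)
print(v2)
print(apply_lens(v2, LENS_V1_TO_V2, reverse=True))
```

Note that this only translates document contents. Moving documents between storage locations would need a separate mechanism.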

raucao · October 21, 2020, 10:17am

Looks like the same people have created automerge, the goals of which sound very, very similar to the remoteStorage philosophy.

rosano · December 11, 2020, 12:33pm

An interesting idea from Fission about publishing schemas to the public directory, seems compatible with how remoteStorage works:

Apps are the main unit of things on Fission.
So it only makes sense they are available on the WNFS as well.

We’re thinking of putting them at:

/public/Apps/Published/APP_NAME.fission.app
Putting the already public published fission app in your WNFS (ie. the creator’s WNFS), we can enable some cool things, such as, sharing the data schemas.

/public/Apps/Published/APP_NAME.fission.app/schemas/playlist.json

I guess this means in certain cases you could forego ‘downloading a module’ and get the JSON schema directly from the user’s public folder. This could also tie in with Cambria’s idea of publishing to a well known URL to help with backwards compatibility.

rosano · March 7, 2021, 1:08pm

More discussion from Geoffrey Litt in Bring Your Own Client:

Schema compatibility: do all the editors need to agree on a single rigidly specified format? If there are reconcilable differences between formats, can we build “live converters” that convert between them on every change? (Essentially, imagine collaborating between Pages and Microsoft Word, running a file export in both directions on every keystroke from either app) This problem is closely related to the problem of schema versioning within a single editor, but BYOC can complicate things much further.

Preserving intent: the decoupling of git + text editors has a downside: the text format fails to capture the intent of edits, so git can’t be very smart about merging conflicts. Is this something fundamental to decoupling editors from collaboration? Or are there ways to design APIs that preserve intent better, while also supporting an open client ecosystem? (It seems like deciding on how you store your data in a CRDT is the key question here?)

Additional editor-specific metadata: Some editors need to store additional data that isn’t part of the “core data model.” Eg, Sublime Text stores my .sublime-workspace file alongside the code source. How does this work smoothly without polluting the data being used by other editors?

Code distribution: Traditionally code distribution happens through centralized means, but could code be distributed in a decentralized way alongside documents? If we’re collaborating together in a doc, can I directly share a little editor widget/tool that I’m using, without needing to send you a Github link? This might be overcomplicating things / orthogonal to the general idea here… (This idea inspired by Webstrates, linked below)

Innovation: Unfortunately stable open formats can limit product innovation—eg, email clients are tied down by adherence to the email standard. Can we mitigate that effect? I think web browsers have struck a good balance between progress and openness, despite frustrations in both directions.

rosano · October 14, 2021, 2:07pm

Noel De Martin from the SOLID community recently wrote up some ideas about apps interoperating spontaneously. He frames them in terms of Cory Doctorow’s three classifications and describes the shortcomings of centralized vocabularies and Cambria lenses. There’s also a nice thread with some thoughts from Boris at Fission; feel free to jump in.