Notes Module: Markdown or limited HTML

DougReeder · August 30, 2021, 4:00pm

I’m developing a Notes module for my note-taking application (Notes Together). To cover a range of user scenarios, I think notes need to allow semantic markup (headings, lists, emphasis, images, etc.) but not style markup. Notes are for quick personal reference, not presentation.

The general scenario is a collection of notes shared between a couple of users, using two or more apps. The protocol makes conflicts uncommon, but they will happen, particularly when collaborating in real time. The unit of conflict will typically be a paragraph or less. For this kind of textual material, automatic conflict resolution is possible, by retaining both versions of conflicting sections. (Unlike source code, ungrammatical or redundant text is usable. Users can edit out weirdness at their leisure.)
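As a rough sketch of the "retain both versions" idea, a three-way merge of a single paragraph might look like the following. The function name and the three-argument shape (common ancestor, local edit, remote edit) are assumptions for illustration, not part of any existing module.

```javascript
// Hypothetical helper: three-way merge of one paragraph.
// Returns an array holding the paragraph(s) to keep.
function mergeParagraph(base, local, remote) {
  if (local === remote) return [local];  // both sides made the same edit
  if (local === base) return [remote];   // only the remote side changed
  if (remote === base) return [local];   // only the local side changed
  return [local, remote];                // true conflict: keep both versions
}

// A conflicting edit keeps both paragraphs for the user to tidy up later.
console.log(mergeParagraph('Buy milk', 'Buy oat milk', 'Buy milk and eggs'));
```

A real merge would first have to align paragraphs between versions; this only shows the resolution rule once a paragraph has been matched.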

Therefore, the data storage format should be as close as possible to the users’ mental model, so merging markup produces reasonable results. There are two obvious alternatives: Markdown or a constrained subset of HTML. Markdown is simpler, but some fairly common markup isn’t yet supported, and other markup is unlikely to ever be supported. HTML allows advanced markup, but has many security issues and is difficult for editors to handle consistently. There are also other lightweight markup languages (reStructuredText, AsciiDoc, setext, etc.), but it’s not clear they have tooling that produces an abstract syntax tree.

I intend to decide in the next couple weeks on a data storage format. I’m leaning toward Markdown, but I would like to hear opinions on alternatives.

rosano · August 31, 2021, 1:36pm

I’m generally in favour of ‘do what works for you’. The schema discussion comes up quite often: Public protocols

raucao · August 31, 2021, 2:19pm

I’m not sure how this is helpful regarding the particular module and question.

I think Markdown makes the most sense, too, for multiple reasons:

It’s just text, so the content would already be compatible with other apps that store notes as text
Many technical users already use Markdown in their text notes (I know I do, and I know many others that do)
In theory, you could still put HTML in a Markdown document (but it could obviously create compatibility issues with other apps that do not support HTML)

Personally, I think it would be fantastic if the next note-taking app would be compatible with an existing one; ideally Litewrite, which still has the most users I think. We already have at least 4 or 5 different notes apps that all use their own categories and formats, and are not compatible with each other.

Litewrite’s data module is also extremely simple to begin with (albeit from the very early days, so it’s a little dated), so it could easily be extended and/or replaced by a newer module for example:

Hope this helps! Would also gladly test any alpha or beta versions, or review module code.

DougReeder · September 1, 2021, 2:05am

I think at least two textual document modules are required. A “word processing” module would support styling and a broad range of markup, and no particular relationship between documents. A “note” module would not support styling, and only supports basic markup, and also has some concept of collections of notes. I’m focusing on the latter. (Also, while Litewrite doesn’t appear to add markup, the module code doesn’t appear to enforce anything, which is key to compatibility.)

My thought is that the storage format doesn’t need to be human-readable - the module controls access, and could handle conversion to appropriate formats.

I would like to store notes so they are useable by other apps. It’s not clear that there’s an existing module with well-defined semantics.

Re: HTML in Markdown - that would require every app to deal with HTML, its security vulnerabilities, and difficult-to-edit document model. So, I would write the module to strip out HTML (or possibly try to convert it to Markdown).

raucao · September 1, 2021, 8:03am

The enforcement of the format is done by remoteStorage.js itself. You can declare types in the module using JSON Schema, which will automatically be validated using tv4 in rs.js:

github.com/litewrite/remotestorage-module-documents/blob/master/remotestorage-module-documents.js#L18-L38

          privateClient.declareType('text', {
            description: 'A text document',
            type: 'object',
            '$schema': 'http://json-schema.org/draft-03/schema#',
            additionalProperties: true,
            properties: {
              title: {
                type: 'string',
                required: true
              },
              content: {
                type: 'string',
                required: true,
                default: ''
              },
              lastEdited: {
                type: 'integer',
                required: true
              }
            }
          })

(The type alias “text” is the first argument for the storeObject() function.)
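To show how the alias ties declareType() and storeObject() together, here is a minimal module sketch. The module name 'documents', the saveText() helper and its arguments are assumptions for illustration, and the schema is abbreviated from the one quoted above; this is not Litewrite's actual module code.

```javascript
// Sketch of a remoteStorage.js data module (names are illustrative).
const Documents = {
  name: 'documents',
  builder: function (privateClient) {
    // Register the schema under the alias 'text'.
    privateClient.declareType('text', {
      type: 'object',
      additionalProperties: true,
      properties: {
        title: { type: 'string', required: true },
        content: { type: 'string', required: true, default: '' },
        lastEdited: { type: 'integer', required: true }
      }
    });

    return {
      exports: {
        // Hypothetical helper: the alias is the first argument to storeObject(),
        // which validates the object against the declared schema before saving.
        saveText: function (path, title, content) {
          return privateClient.storeObject('text', path, {
            title: title,
            content: content,
            lastEdited: Date.now()
          });
        }
      }
    };
  }
};
```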

additionalProperties: true in the schema means that validation will not fail when an object contains properties beyond those declared in that schema. This is what I mean by it being easy to extend.

However, the question would at least be whether Litewrite also loads and re-saves additional properties that it doesn’t know about! If it doesn’t, then editing a note with additional properties in Litewrite could silently destroy something added from another app.
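One way an app could avoid that kind of silent data loss is to layer its edits on top of the object it loaded, rather than building a fresh object from only the fields it knows. A minimal sketch, where mergeForSave() and the tags property are hypothetical:

```javascript
// Hypothetical helper: apply an app's edits on top of the stored object,
// so properties the app does not know about are carried over unchanged.
function mergeForSave(stored, edits) {
  return Object.assign({}, stored, edits);
}

// 'tags' stands in for a property added by some other app.
const storedNote = { title: 'Groceries', content: 'milk', lastEdited: 1, tags: ['home'] };
const updatedNote = mergeForSave(storedNote, { content: 'milk, eggs', lastEdited: 2 });
console.log(updatedNote); // still includes the unknown 'tags' property
```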

This is why I think we need to think about something like schema versioning or migrations (also discussed in the past, but so far unresolved). There are definitely things we could easily implement in a module in order to not mess up existing data. Importing/copying existing data to a new path could also be a good first step. An app could ask if I want to import e.g. my Litewrite or Hyperdraft notes when I first start it up, or from a settings screen.

This is actually such a common problem for apps and modules that I’m convinced by now that this should be solved by a utility module that we can share between modules. I’m working on the same thing, again in a custom way, for Webmarks/bookmarks collections right now. But almost every module I’ve touched before needed some kind of collections/lists.

raucao · September 1, 2021, 8:07am

Adding a quick brainstorming idea:

Simple versioning could be implemented by adding it to the context/alias, like so: text-v1.

The getAll() function actually used to have a filter option for types, which could be re-added, so we could get only objects of a certain type, and in this case version, for example.
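Until such a filter option exists, the filtering could be done on the result of getAll(). The sketch below assumes that each stored object carries an '@context' string ending in '/' plus its type alias; treat that as an assumption to verify against your rs.js version. filterByAlias() is a hypothetical helper.

```javascript
// Hypothetical filter over the path-to-object map returned by getAll().
// Assumes each stored object has an '@context' string ending in '/<alias>'.
function filterByAlias(objects, alias) {
  const result = {};
  for (const [path, obj] of Object.entries(objects)) {
    const context = obj && obj['@context'];
    if (typeof context === 'string' && context.endsWith('/' + alias)) {
      result[path] = obj;
    }
  }
  return result;
}
```

With versioned aliases, filterByAlias(all, 'text-v1') would then return only the objects stored as text-v1.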

DougReeder · September 4, 2021, 6:34pm

I’ve concluded that you can convert the abstract syntax tree (AST) of the editor to Markdown, convert the Markdown to an AST, and usually get the same AST back. However, it’s essentially impossible to have confidence that this is true for every reasonable case. So, I’m going with HTML to serialize rich text.

I’m also pursuing the idea of allowing different notes to have different serializations, so users could keep all their notes in /documents/notes, and the existing plain text notes need not be altered.

raucao · September 5, 2021, 8:57am

Great idea! I guess a simple way of doing this would be to add the respective file extensions to the various documents then.

DougReeder · September 5, 2021, 7:17pm

I was thinking each note would have an array of type records, each type record containing a MIME type and hint. For example:

[
{mime: "text/markdown", hint: "COMMONMARK"},
{mime: "text/plain"}
]

Apps could go through the list in order until they find a type they understand. So apps that understood CommonMark Markdown would parse as strict CommonMark, apps that knew some other dialect of Markdown would parse it with a compatibility flag, and other apps would just treat it as plain text.
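The lookup described above could be sketched roughly as follows. pickType() and the understands predicate are hypothetical names; each app would supply its own predicate.

```javascript
// Hypothetical: return the first type record this app can handle, or null.
// 'understands' is a predicate supplied by the app.
function pickType(typeRecords, understands) {
  for (const record of typeRecords) {
    if (understands(record)) return record;
  }
  return null;
}

const noteTypes = [
  { mime: 'text/markdown', hint: 'COMMONMARK' },
  { mime: 'text/plain' }
];

// An app that only knows plain text skips the Markdown record.
const plainOnly = pickType(noteTypes, (t) => t.mime === 'text/plain');
// An app that knows any Markdown dialect stops at the first record.
const anyMarkdown = pickType(noteTypes, (t) => t.mime === 'text/markdown');
```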

(Again, serializing anything else to Markdown is a bad idea, but there are text editors specialized for Markdown.)

When apps import from text files, if they don’t recognize the file type, they could set the type as

[{mime: "text/plain", hint: "EXT"}]

where EXT is the file extension.
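A minimal sketch of that import rule, assuming a small table of extensions the app recognizes. The table contents, the function name, and the choice to upper-case the hint (following the style of the hints in the examples here) are all assumptions.

```javascript
// Hypothetical table of file extensions this app recognizes.
const KNOWN_EXTENSIONS = new Map([
  ['md', [{ mime: 'text/markdown' }, { mime: 'text/plain' }]],
  ['txt', [{ mime: 'text/plain' }]]
]);

// Returns the type records to store for an imported file.
function typesForImportedFile(filename) {
  const dot = filename.lastIndexOf('.');
  const ext = dot > 0 ? filename.slice(dot + 1) : '';
  const known = KNOWN_EXTENSIONS.get(ext.toLowerCase());
  if (known) return known;
  // Unrecognized type: store as plain text, keeping the extension as the hint.
  return ext ? [{ mime: 'text/plain', hint: ext.toUpperCase() }]
             : [{ mime: 'text/plain' }];
}
```

For instance, typesForImportedFile('todo.textile') falls through to the plain-text record with the extension kept as its hint.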

Obviously, few apps will have custom code for more than a handful of types. This allows apps to separate notes into native, compatible and incompatible. It would behoove apps to allow saving as plain text.

I’ll probably keep the title field as plain text, so incompatible notes are at least labeled.

This system wouldn’t eliminate app incompatibility, but it would be no worse than saving as files.

raucao · September 6, 2021, 6:57am

What’s actually the difference between Markdown and plain text? I always thought the best thing about Markdown is that it’s just plain text by itself, and can be used and read even when there’s no specific Markdown support, as I currently do with Litewrite for example.

If it’s only about editor support, as in e.g. buttons for formatting text, then I would imagine in the case that someone prefers a different plain text markup language, they would likely prefer it for all of their notes. In which case, maybe an app-wide switch for preferred plain text markup/editor setting could make a bit more sense.

I think, currently, all RS-enabled notes apps use Markdown in case there’s extra formatting, by the way. And I think they all store it as plain text. So maybe it would also make sense to make Markdown the de-facto standard for plain text note formatting in RS-enabled apps, since that’s already almost the case for most plain-text editing on the Web. This could simplify things a little bit perhaps.

DougReeder · September 6, 2021, 9:19pm

The type {mime: "text/plain", hint: "MARKDOWN"} would be synonymous with {mime: "text/markdown"}

Much of the time, {mime: "text/markdown"} would be treated the same as {mime: "text/plain"}. One scenario where they should be treated differently is changing a note from plain text to rich text. {mime: "text/markdown"} or {mime: "text/plain", hint: "MARKDOWN"} should be run through a Markdown processor, {mime: "text/plain"} should not. The same applies when publishing a note (as Hyperdraft does).
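In code, that rule could be a small predicate over a type record. isMarkdown() is a hypothetical name; the record shape follows the examples in this thread.

```javascript
// Hypothetical check: should this type record go through a Markdown processor?
function isMarkdown(record) {
  if (record.mime === 'text/markdown') return true;
  // The synonym: plain text explicitly hinted as Markdown.
  return record.mime === 'text/plain' && record.hint === 'MARKDOWN';
}
```

An app converting a note to rich text, or publishing it, would call this before deciding whether to parse the content or just escape it.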

I’m fond of Markdown and use it a lot. But it’s not the be-all and end-all of lightweight markup. When an AsciiDoc, BBCode or Textile enthusiast wants to write support for that in an RS notes app, I don’t want to turn them away or make it more difficult.

Realistically, I don’t expect any given app will support more than a couple of formats.

The only additional requirement this imposes is that an app must record the type and flavor of its saved data (much as we expect apps that save data in files to do). This does minimize the hassle if you have to switch from using one app (because it’s no longer maintained, or whatever) to a newer one.

raucao · September 7, 2021, 8:43am

Fully agree. My question about why not just store plain text was aimed at that, too. But I do see the issue when publishing Textile while assuming Markdown for example.

Actually, if the conversion and publishing functions are part of the data module, then it would be incredibly easy for any app to support a variety of formats without even implementing anything special, no? I think it would make sense to abstract such functionality away from app code in the first place. And sharing it in the data module means other apps can easily, and safely, access the same data.
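As a sketch of what that could look like, the module might keep a registry of converters keyed by MIME type and expose a single toHtml() function. Everything here is hypothetical: the registry, the function names, and the note shape ({content, types}). Only a plain-text converter is shown; a Markdown or Textile converter would be registered the same way.

```javascript
// Hypothetical registry mapping MIME types to HTML converters.
const converters = new Map();

function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Plain text: escape the content and wrap it so whitespace is preserved.
converters.set('text/plain', (content) => '<pre>' + escapeHtml(content) + '</pre>');

// Use the first type record that has a registered converter.
function toHtml(note) {
  for (const record of note.types) {
    const convert = converters.get(record.mime);
    if (convert) return convert(note.content);
  }
  throw new Error('No converter registered for this note');
}
```

Apps would then call toHtml(note) without knowing which markup the note uses, and a note in an unsupported format fails loudly instead of being misread as another format.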

DougReeder · September 7, 2021, 3:56pm

An intriguing possibility.