Stanisław Małolepszy (stasm), @google, Warsaw, Poland, https://piesku.com
80% Cloud, 20% Message Format Working Group @google. In my spare time, I make games that I like playing with my friends.

piesku/goodluck 132

🤞 A hackable template for creating small and fast browser games.

potch/glow 102

Client page for the Glow project

clouserw/tower 34

A library that builds on Babel and Jinja to make extracting strings easy and uniform

michalbe/cervus 28

🎮 tiny WebGL 3D game engine

piesku/backcountry 18

A submission for js13kGames 2019

piesku/urwisek 8

Submission to js13kGames.com 2021. The theme was 'Space.'

piesku/mirrorisk 6

The Risk board game like you don't remember it from childhood. A submission to the Gamedev.js Jam 2021.

fwenzel/pontoon 4

A proof-of-concept implementation of an in-place web localization tool

piesku/duszki 4

#js13k 2022

piesku/roar 3

A WebXR submission to js13kGames 2020.

Pull request review comment: unicode-org/message-format-wg

Create 0483-beauty-contest.md

+# Syntax Variation Beauty Contest
+
+As discussed in the 2023-09-25 teleconference, we need to choose a syntax to proceed with.
+This page hosts various options for considering in the 2023-10-02 call.
+
+## Current
+
+This is the current syntax.
+The list of messages in this section also serves as the basis for all other examples.
+
+```
+{Hello world!}
+
+{Hello {$user}}
+
+input $var :function option=value
+{Hello {$var}}
+
+input $var :function option=value
+local $foo = {$bar :function option=value}
+{Hello {$var}, you have a {$foo}}
+
+match {$foo} {$bar}
+when foo bar {Hello {$foo} you have a {$var}}
+when * * {{$foo} hello you have a {$var}}
+
+match {$foo :function option=value} {$bar :function option=value}
+when a b {  {$foo} is {$bar}  }
+when x y {  {$foo} is {$bar}  }
+when * * {  {$foo} is {$bar}  }
+
+input $var :function option=value local $foo = {$bar :function option=value}{Hello {$var}, you have a {$foo}}
+
+match {$foo} {$bar} when foo bar {Hello {$foo} you have a {$var}} when * * {{$foo} hello you have a {$var}}
+
+match {$foo :function option=value} {$bar :function option=value}when a b {  {$foo} is {$bar}  }when x y {  {$foo} is {$bar}  }when * * {  {$foo} is {$bar}  }
+```
+
+## Invert for Text Mode
+
+Consumes exterior whitespace.
+
+```
+Hello world!
+
+Hello {$user}
+
+{input $var :function option=value}
+Hello {$var}
+
+{input $var :function option=value}
+{local $foo = $bar :function option=value}
+Hello {$var}, you have a {$foo}
+
+{match {$foo} {$bar}}
+{when foo bar} Hello {$foo} you have a {$var}
+{when * *} {$foo} hello you have a {$var}
+
+{match {$foo :function option=value} {$bar :function option=value}}
+{when a b} {  {$foo} is {$bar}  }
+{when x y} {  {$foo} is {$bar}  }
+{when * *} {|  |}{$foo} is {$bar}{|  |}
+
+{input $var :function option=value}{local $foo = $bar :function option=value}Hello {$var}, you have a {$foo}
+
+{match {$foo} {$bar}}{when foo bar} Hello {$foo} you have a {$var}{when * *} {$foo} hello you have a {$var}
+
+{match {$foo :function option=value}{$bar :function option=value}}{when a b} {  {$foo} is {$bar}  }{when x y} {  {$foo} is {$bar}  }{when * *} {|  |}{$foo} is {$bar}{|  |}
+```
+
+## Use sigils for code mode
+
+Try to reduce the use of `{`/`}` to just expressions and placeholders instead of the three
+uses we have now (the other use is for patterns). This requires escaping whitespace or using
+a placeholder for it.
+
+The sigil `#` was chosen because `#define` type constructs are fairly common. Introduces `[`/`]` for keys.
+
+```
+#{input $var :function option=value}
+Hello {$var}
+
+#{input $var :function option=value}

(Same for the single-line version.)

aphillips

comment created 3 days ago


Pull request review comment: unicode-org/message-format-wg

Create 0483-beauty-contest.md


For consistency with #local and #match, this could be:

#input {$var :function option=value}
Hello {$var}

#input {$var :function option=value}
aphillips

comment created 3 days ago


Pull request review comment: unicode-org/message-format-wg

Add message parse mode (code vs text) design doc

+# Message Parse Mode
+
+Status: **Proposed**
+
+<details>
+	<summary>Metadata</summary>
+	<dl>
+		<dt>Contributors</dt>
+		<dd>@eemeli</dd>
+		<dd>@aphillips</dd><!-- Seville and other inserted edits -->
+		<dt>First proposed</dt>
+		<dd>2023-09-13</dd>
+		<dt>Pull Request</dt>
+		<dd><a href="https://github.com/unicode-org/message-format-wg/pull/474">#474</a></dd>
+	</dl>
+</details>
+
+## Objective
+
+Decide whether text patterns or code statements should be enclosed in MF2.
+
+## Background
+
+Existing message and template formatting languages tend to start in "text" mode,
+and require special syntax like `{{` or `{%` to enter "code" mode.
+
+ICU MessageFormat and Fluent both support inline selectors
+separated from the text using `{ ... }` for multi-variant messages.
+
+[Mustache templates](https://mustache.github.io/mustache.5.html)
+and related languages wrap "code" in `{{ ... }}`.
+In addition to placeholders that are replaced by their interpolated value during formatting,
+this also includes conditional blocks using `{{#...}}`/`{{/...}}` wrappers.
+
+[Handlebars](https://handlebarsjs.com/guide/) extends Mustache expressions
+with operators such as `{{#if ...}}` and `{{#each ...}}`,
+as well as custom formatting functions that become available as e.g. `{{bold ...}}`.
+
+[Jinja templates](https://jinja.palletsprojects.com/en/3.1.x/templates/) separate
+`{% statements %}` and `{{ expressions }}` from the base text.
+The former may define tests that determine the inclusion of subsequent text blocks in the output.
+
+A cost that the message formatting and templating languages mentioned above need to rely on
+is some rule or behaviour that governs how to deal with whitespace at the beginning and end of a pattern,
+as statements may be separated from each other by newlines or other constructs for legibility.
+
+Other formats supporting multiple message variants tend to rely on a surrounding resource format to define variants,
+such as [Rails internationalization](https://guides.rubyonrails.org/i18n.html#pluralization) in Ruby or YAML
+and [Android String Resources](https://developer.android.com/guide/topics/resources/string-resource.html#Plurals) in XML.
+These formats rely on the resource format providing clear delineation of the beginning and end of a pattern.
+
+## Use-Cases
+
+Most messages in any localization system do not contain any expressions, statements or variants.
+These should be expressible as easily as possible.
+
+Many messages include expressions that are to be interpolated during formatting.
+For example, a greeting like "Hello, user!" may be formatted in many locales with the `user`
+being directly set by an input variable.
+
+Sometimes, interpolated values need explicit formatting within a message.
+For example, formatting a message like "You have eaten 3.2 apples"
+may require the input numerical value
+to be formatted with an explicit `minimumFractionDigits` option.
+
+Some messages require multiple variants.
+This is often related to plural cases, such as "You have 3 new messages",
+where the value `3` is an input and the "messages" needs to correspond with its plural category.
+
+Rarely, messages need to include leading or trailing whitespace due to
+e.g. how they will be concatenated with other text,
+or as a result of being segmented from some larger volume of text.
+
+---
+
+Users editing a simple message and who wish to add an `input` or `local` annotation
+to the message do not wish to reformat the message extensively.
+
+Users who have messages that include leading or trailing whitespace
+want to ensure that this whitespace is included in the translatable
+text portion of the message. Which whitespace characters are displayed at runtime
+should not be surprising.
+
+## Requirements
+
+Common things should be easy, uncommon things should be possible.
+
+Developers and translators should be able to read and write the syntax easily in a text editor.
+
+Translators (and their tools) are not software engineers, so we want our syntax
+to be as simple, robust, and non-fussy as possible.
+Multiple levels of complex nesting should be avoided,
+along with any constructs that require an excessive
+level of precision on the part of non-technical users.
+
+As MessageFormat 2 will be at best a secondary language to all its users,
+it should conform to user expectations and require as little learning as possible.
+
+The syntax should avoid footguns,
+in particular as it's passed through various tools during formatting.
+
+ASCII-compatible syntax. While support for non-ASCII characters for variable names,
+values, literals, options, and the like are important, the syntax itself should
+be restricted to ASCII characters. This allows the message to be parsed
+visually by humans even when embedded in a syntax that requires escaping.
+
+Whitespace is forgiving. We _require_ the minimum amount of whitespace and allow
+users to format or change unimportant whitespace as much as they want.
+This avoids the need for translators or tools to be super pedantic about
+formatting.
+
+## Constraints
+
+Limiting the range of characters that need to be escaped in plain text is important.
+Following past precedent,
+this design doc will only consider encapsulation styles which
+start with `{` and end with `}`.
+
+The current syntax includes some plain-ascii keywords:
+`input`, `local`, `match`, and `when`.
+
+The current syntax and active proposals include some sigil + name combinations,
+such as `:number`, `$var`, `|literal|`, `+bold`, and `@attr`.
+
+The current syntax supports unquoted literal values as operands.
+
+Messages themselves are "simple strings" and must be considered to be a single
+line of text. In many containing formats, newlines will be represented as the local
+equivalent of `\n`.
+
+## Proposed Design
+
+**Start in text mode**
+
+In this option, whitespace is trimmed from `pattern` constructs unless
+the pattern is quoted.
+
+```
+Hello world!
+
+Hello {$user}!
+
+{input $now :datetime dateStyle=long}
+Hello {$user}. Today is {$now}
+
+{local $now = {:systemGetCurrentTime :datetime dateStyle=medium}}
+Hello {$user}. Today is {$now}
+
+{match {$count :number integer=true}}
+{when 0}   Hello {$user}. Today is {$now} and you have no geese.
+{when one} Hello {$user}. Today is {$now} and you have {$count} goose.

Overall, I see potential in the #[...] syntax for code (statements). A lot of languages allow this kind of extra directive for code, often called attributes: C++ uses [[foo]], C# uses [foo], Rust uses #[foo].

eemeli

comment created 5 days ago


Pull request review comment: unicode-org/message-format-wg

Add message parse mode (code vs text) design doc


> With the warning that # is a comment in some of the container formats (for example Java properties and gettext .po files)

OTOH, it's only a comment when it's the first character on a line. When paired with some identifier, this shouldn't be as big of an issue? E.g.:

my.res=#match {$foo} #[*] My pattern
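
To illustrate the point, here is a minimal sketch in Python of how a `.properties`-style reader could treat `#`. It is a deliberate simplification written for this example: the function name `parse_properties_line` is hypothetical, only `=` is handled as a separator, and escapes and line continuations are ignored. The only behaviour it demonstrates is that `#` counts as a comment marker when it is the first non-blank character of a line, so a value that starts with `#match` is kept.

```python
def parse_properties_line(line):
    """Return (key, value) for a `key=value` line, or None for blanks and comments.

    Simplified on purpose: real .properties files also allow `:` or
    whitespace as separators, backslash escapes, and line continuations.
    """
    stripped = line.lstrip()
    # A comment is recognised only at the start of the (left-trimmed) line.
    if not stripped or stripped[0] in "#!":
        return None
    key, sep, value = stripped.partition("=")
    if not sep:
        return None  # not a key=value line in this simplified sketch
    return key.strip(), value.lstrip()


# The `#` inside the value is ordinary content, not a comment.
entry = parse_properties_line("my.res=#match {$foo} #[*] My pattern")
comment = parse_properties_line("# my.res is a selector message")
```

Here `entry` is `("my.res", "#match {$foo} #[*] My pattern")` and `comment` is `None`. A message placed on its own line in a multi-line container would still need care, since its leading `#` would then be the first character on that line.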
eemeli

comment created 5 days ago

Pull request review comment: unicode-org/message-format-wg

Document the design of quoted literals

+# Quoted Literals
+
+Status: **Accepted**
+
+<details>
+	<summary>Metadata</summary>
+	<dl>
+		<dt>Pull Request</dt>
+		<dd><a href="https://github.com/unicode-org/message-format-wg/pull/477">#477</a></dd>
+	</dl>
+</details>
+
+## Objective
+
+Document the rationale for including quoted literals in MessageFormat and for delimiting them with the vertical line character, `|`.
+
+## Background
+
+MessageFormat allows both quoted and unquoted literals. Unquoted literals satisfy many common use-cases for literals: they are sufficient to represent numbers and single-word option values and variant keys. Quoted literals are helpful in exotic use-cases.
+
+In early drafts of the MessageFormat syntax, quoted literals used to be delimited first with quotation marks (`"foo bar"`), and then with round parentheses, e.g. `(foo bar)`. See [#263](https://github.com/unicode-org/message-format-wg/issues/263).
+
+[#414](https://github.com/unicode-org/message-format-wg/pull/414) proposed to revert these changes and go back to using single and/or double quotes as delimiters. The proposal was rejected. This document is an artifact of that rejection.
+
+## Use-Cases
+
+_What use-cases do we see? Ideally, quote concrete examples._
+
+In general, quoted literals are useful for:
+
+1. encoding literals containing whitespace, like literals consisting of multiple words,
+1. encoding literals containing exotic characters that do not conform to the `unquoted` production in ABNF.
+
+More specifically:
+
+- Message authors and translators need to be able to use the apostrophe in the message content, and may want to use the single quote character to represent it instead of the typographic (curly) apostrophe.
+
+  > ```
+  > {…{|New Year's Eve|}…}
+  > ```
+
+- Message authors may want to use literals to define locale-aware dates as literals in the RFC 7231 format:
+
+  > ```
+  > {The Unix epoch is defined as {|Thu, 01 Jan 1970 00:00:00 GMT| :datetime}.}
+  > ```
+
+- Message authors may want to use multiple words as values of certain options passed to custom functions and markup elements:
+
+  > ```
+  > {{+button title=|Click here!|}Submit{-button}}
+  > ```
+  >
+  > Note that quoted literals cannot contain placeholders, making interpolating data into them impossible.
+  >
+  > ```
+  > -- This is impossible in MessageFormat 2.0.
+  > {{+button title=|Goodbye, {$userName}!|}Sign out{-button}}
+  > ```
+
+- Selector function implementers may want to support exotic characters in variant keys to effectively create "mini-DSLs" for the matching logic:
+
+  > ```
+  > match {$count :myNumber}
+  > when |<10| {A handful.}
+  > when * {Lots.}
+  > ```
+
+- Message authors may want to protect untranslatable strings:
+
+  > ```
+  > {Visit {|http://www.example.com| @translate=false}.}
+  > ```
+  >
+  > See [design proposal 0002](https://github.com/unicode-org/message-format-wg/blob/main/exploration/0002-expression-attributes.md).
+
+- Message authors may want to decorate substrings as being written in a particular language, different from the message's language, for the purpose of accessibility, text-to-speech, and semantic correctness.
+
+  > ```
+  > {The official native name of the Republic of Poland is {|Rzeczpospolita Polska| @lang=pl}.}
+  > ```
+  >
+  > See [design proposal 0002](https://github.com/unicode-org/message-format-wg/blob/main/exploration/0002-expression-attributes.md).
+
+- Developers may want to embed messages with quoted literals in code written in another programming language which uses single or double quotes to delimit strings.
+
+  > ```js
+  > let message = new MessageFormat("en", "{A message with {|a literal|}.}");
+  > ```

@eemeli ended up fixing this via a prettier-ignore comment, thanks.

stasm

comment created 6 days ago


push event: unicode-org/message-format-wg

Stanisław Małolepszy

commit sha fcd1904f9dd0d5ba26de06630166b9dc3940af4c

Use single quotes for the JS example

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>


pushed 6 days ago

Pull request review comment: unicode-org/message-format-wg

Document the design of quoted literals

+# Quoted Literals
+
+Status: **Accepted**
+
+<details>
+	<summary>Metadata</summary>
+	<dl>
+		<dt>Pull Request</dt>
+		<dd><a href="https://github.com/unicode-org/message-format-wg/pull/477">#477</a></dd>
+	</dl>
+</details>
+
+## Objective
+
+Document the rationale for including quoted literals in MessageFormat and for delimiting them with the vertical line character, `|`.
+
+## Background
+
+MessageFormat allows both quoted and unquoted literals. Unquoted literals satisfy many common use-cases for literals: they are sufficient to represent numbers and single-word option values and variant keys. Quoted literals are helpful in exotic use-cases.
+
+In early drafts of the MessageFormat syntax, quoted literals used to be delimited first with quotation marks (`"foo bar"`), and then with round parentheses, e.g. `(foo bar)`. See [#263](https://github.com/unicode-org/message-format-wg/issues/263).
+
+[#414](https://github.com/unicode-org/message-format-wg/pull/414) proposed to revert these changes and go back to using single and/or double quotes as delimiters. The proposal was rejected. This document is an artifact of that rejection.
+
+## Use-Cases
+
+_What use-cases do we see? Ideally, quote concrete examples._
+
+In general, quoted literals are useful for:
+
+1. encoding literals containing whitespace, like literals consisting of multiple words,
+1. encoding literals containing exotic characters that do not conform to the `unquoted` production in ABNF.
+
+More specifically:
+
+- Message authors and translators need to be able to use the apostrophe in the message content, and may want to use the single quote character to represent it instead of the typographic (curly) apostrophe.
+
+  > ```
+  > {…{|New Year's Eve|}…}
+  > ```
+
+- Message authors may want to use literals to define locale-aware dates as literals in the RFC 7231 format:
+
+  > ```
+  > {The Unix epoch is defined as {|Thu, 01 Jan 1970 00:00:00 GMT| :datetime}.}
+  > ```
+
+- Message authors may want to use multiple words as values of certain options passed to custom functions and markup elements:
+
+  > ```
+  > {{+button title=|Click here!|}Submit{-button}}
+  > ```
+  >
+  > Note that quoted literals cannot contain placeholders, making interpolating data into them impossible.
+  >
+  > ```
+  > -- This is impossible in MessageFormat 2.0.
+  > {{+button title=|Goodbye, {$userName}!|}Sign out{-button}}
+  > ```
+
+- Selector function implementers may want to support exotic characters in variant keys to effectively create "mini-DSLs" for the matching logic:
+
+  > ```
+  > match {$count :myNumber}
+  > when |<10| {A handful.}
+  > when * {Lots.}
+  > ```
+
+- Message authors may want to protect untranslatable strings:
+
+  > ```
+  > {Visit {|http://www.example.com| @translate=false}.}
+  > ```
+  >
+  > See [design proposal 0002](https://github.com/unicode-org/message-format-wg/blob/main/exploration/0002-expression-attributes.md).
+
+- Message authors may want to decorate substrings as being written in a particular language, different from the message's language, for the purpose of accessibility, text-to-speech, and semantic correctness.
+
+  > ```
+  > {The official native name of the Republic of Poland is {|Rzeczpospolita Polska| @lang=pl}.}
+  > ```
+  >
+  > See [design proposal 0002](https://github.com/unicode-org/message-format-wg/blob/main/exploration/0002-expression-attributes.md).
+
+- Developers may want to embed messages with quoted literals in code written in another programming language which uses single or double quotes to delimit strings.
+
+  > ```js
+  > let message = new MessageFormat("en", "{A message with {|a literal|}.}");
+  > ```
+
+- Developers and localization engineers may want to embed messages with quoted literals in a container format, such as JSON.
+
+  > ```json
+  > {
+  >   "msg": "{A message with {|a literal|}.}"
+  > }
+  > ```
+
+## Requirements
+
+_What properties does the solution have to manifest to enable the use-cases above?_
+
+- **[r1; high priority]** Minimize the need to escape characters inside literals. In particular, choose a delimiter that isn't frequently used in translation content. Having to escape characters inside literals is inconvenient and error-prone when done by hand, and it also introduces the backslash into the message, `\`, which is the escape introducer. The backslash then needs to be escaped too, when the message is embedded in code or containers. (This is how some syntaxes produce the gnarly `\\\`.)
+- **[r2; high priority]** Minimize the need to escape characters when embedding messages in code or containers. In particular, choose a delimiter that isn't frequently used as a string delimiter in programming languages and container formats. However, note that many programming languages also provide alternative ways of delimiting strings, e.g. _raw strings_ or triple-quoted literals.
+- **[r3; medium priority]** Minimize the need to change the message in other ways than to escape some of its characters (e.g. rephrase content, use typographic apostrophes, or switch to using a second set of delimiters).
+- **[r4; medium priority]** Don't surprise users with syntax that's too exotic. We expect quoted literals to be rare, which means fewer opportunities to get used to their syntax and remember it.
+- **[r5; low priority]** Be able to pair the opening and the closing delimiter, to aid parsers recover from syntax errors, and to leverage IDE's ability to highlight matching pairs of delimiters, to visually indicate to the user editing a message the bounds of the literal under caret. However, quoted literals are usually short and already enclosed in a placeholder (which has its own delimiters) or are outside patterns (when used as variant keys).
+
+## Constraints
+
+_What prior decisions and existing conditions limit the possible design?_
+
+- **[c1]** MessageFormat uses the backslash, `\`, as the escape sequence introducer.
+- **[c2]** Straight quotation marks, `'` and `"`, are common in content across many languages, even if other Unicode codepoints should be used in well-formatted text.
+- **[c3]** Straight quotation marks, `'` and `"`, are common as string delimiters in many programming languages.
+
+## Proposed Design
+
+_Describe the proposed solution. Consider syntax, formatting, errors, registry, tooling, interchange._
+
+Use the vertical line character, `|`, to delimit quoted strings. For example:
+
+> ```
+> {The Unix epoch is defined as {|Thu, 01 Jan 1970 00:00:00 GMT| :datetime}.}
+> ```
+
+```abnf
+literal       = quoted / unquoted
+quoted        = "|" *(quoted-char / quoted-escape) "|"
+quoted-char   = %x0-5B         ; omit \
+              / %x5D-7B        ; omit |
+              / %x7D-D7FF      ; omit surrogates
+              / %xE000-10FFFF
+quoted-escape = backslash ( backslash / "|" )
+```
+
+By being both uncommon in text content and uncommon as a string delimiter in other programming languages, the vertical line sidesteps the "inwards" and "outwards" problems of escaping.
+
+- [r1 GOOD] Writing `"` and `'` in literals doesn't require escaping them via `\`. This means no extra `\` that need escaping.
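
The `quoted` production in the ABNF of this hunk can be sketched as a regular expression. The Python below is an illustration under stated assumptions, not code from the pull request or from any MessageFormat implementation: the helper name `parse_quoted` is hypothetical, and it handles only a standalone quoted literal, not a whole message.

```python
import re

# Character ranges copied from the `quoted-char` rule: everything except
# the backslash (U+005C), the vertical line (U+007C), and the surrogates.
_QUOTED_CHAR = r"[\x00-\x5B\x5D-\x7B\x7D-\uD7FF\uE000-\U0010FFFF]"
# `quoted-escape`: a backslash followed by a backslash or a vertical line.
_QUOTED_ESCAPE = r"\\[\\|]"
_QUOTED = re.compile(rf"\|((?:{_QUOTED_CHAR}|{_QUOTED_ESCAPE})*)\|")


def parse_quoted(source):
    """Return the unescaped value of a `|...|` literal, or raise ValueError."""
    match = _QUOTED.fullmatch(source)
    if match is None:
        raise ValueError(f"not a quoted literal: {source!r}")
    # Drop the escape introducer in front of `\` and `|`.
    return re.sub(r"\\([\\|])", r"\1", match.group(1))


# Straight quotes need no escaping inside the literal (requirement r1).
eve = parse_quoted("|New Year's Eve|")
# Only the delimiter and the backslash itself are ever escaped.
piped = parse_quoted(r"|a\|b|")
```

Here `eve` is `New Year's Eve` and `piped` is `a|b`. An unescaped `|` inside the body, as in `|a|b|`, fails to match and raises `ValueError`.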

I added a comparison matrix in 9d53b4d6ad4c1932d7e3e142065391844606e294.

stasm

comment created 6 days ago


push event: unicode-org/message-format-wg

Stanisław Małolepszy

commit sha 9d53b4d6ad4c1932d7e3e142065391844606e294

Add a comparison table


Stanisław Małolepszy

commit sha 875f0d4e83e16084c88444edb6dee6348949ac45

Change angle brackets' r2 from POOR to FAIR

It's POOR when embedding into an XML dialect, but otherwise, <> shouldn't cause too many issues for r2.


pushed 6 days ago

pull request comment: unicode-org/message-format-wg

Allow colon in name-start, matching XML Name

> > I feel rather strongly about not differing by a single character from XML's Nmtoken
>
> Can you clarify why you feel this way? Is it because Nmtoken is important as a namespace construct in its own right? Because Nmtoken is implemented elsewhere, so it will be easier to validate? Or does it have technical qualities that are important to our design?

Thanks for asking. I should have elaborated in my previous comment, but it was already late here. What I meant to say is effectively the same as you did:

I tend to think that being "mostly compatible" with some standard, such as Nmtoken, is the same thing as "not compatible", and we should either be compatible or not even mention Nmtoken.

The "feel strongly" part was about the fact that today we keep talking about nmtoken when in fact it's not XML's Nmtoken. If we keep referencing it, then I strongly want us not to differ from XML's definition by just one or a couple of characters.

Then, there's also the matter of principles of design. I don't want us to reinvent concepts where well-established alternatives exist, in particular in matters not directly related to i18n. I think nmtoken is potentially one such alternative for the concept of unquoted literals (are there others?). The 100% compatibility with LDML is a nice touch, but as noted, we have a solution for that: quoted literals.

Related to the above point is this:

Because Nmtoken is implemented elsewhere, so it will be easier to validate?

This is a nice side effect of reusing well-established concepts. It's likely not enough on its own to be the reason for sticking to XML's Nmtoken, but it's an example of additional benefits that we can reap if we do.


Most importantly, I agree with you that we should first solve other issues currently in flight and then come back here and figure out what we want name and nmtoken to be like.

eemeli

comment created time in 7 days

Pull request review commentunicode-org/message-format-wg

Add message parse mode (code vs text) design doc

+# Message Parse Mode++Status: **Proposed**++<details>+	<summary>Metadata</summary>+	<dl>+		<dt>Contributors</dt>+		<dd>@eemeli</dd>+		<dd>@aphillips</dd><!-- Seville and other inserted edits -->+		<dt>First proposed</dt>+		<dd>2023-09-13</dd>+		<dt>Pull Request</dt>+		<dd><a href="https://github.com/unicode-org/message-format-wg/pull/474">#474</a></dd>+	</dl>+</details>++## Objective++Decide whether text patterns or code statements should be enclosed in MF2.++## Background++Existing message and template formatting languages tend to start in "text" mode,+and require special syntax like `{{` or `{%` to enter "code" mode.++ICU MessageFormat and Fluent both support inline selectors+separated from the text using `{ ... }` for multi-variant messages.++[Mustache templates](https://mustache.github.io/mustache.5.html)+and related languages wrap "code" in `{{ ... }}`.+In addition to placeholders that are replaced by their interpolated value during formatting,+this also includes conditional blocks using `{{#...}}`/`{{/...}}` wrappers.++[Handlebars](https://handlebarsjs.com/guide/) extends Mustache expressions+with operators such as `{{#if ...}}` and `{{#each ...}}`,+as well as custom formatting functions that become available as e.g. 
`{{bold ...}}`.++[Jinja templates](https://jinja.palletsprojects.com/en/3.1.x/templates/) separate+`{% statements %}` and `{{ expressions }}` from the base text.+The former may define tests that determine the inclusion of subsequent text blocks in the output.++A cost that the message formatting and templating languages mentioned above need to rely on+is some rule or behaviour that governs how to deal with whitespace at the beginning and end of a pattern,+as statements may be separated from each other by newlines or other constructs for legibility.++Other formats supporting multiple message variants tend to rely on a surrounding resource format to define variants,+such as [Rails internationalization](https://guides.rubyonrails.org/i18n.html#pluralization) in Ruby or YAML+and [Android String Resources](https://developer.android.com/guide/topics/resources/string-resource.html#Plurals) in XML.+These formats rely on the resource format providing clear delineation of the beginning and end of a pattern.++## Use-Cases++Most messages in any localization system do not contain any expressions, statements or variants.+These should be expressible as easily as possible.++Many messages include expressions that are to be interpolated during formatting.+For example, a greeting like "Hello, user!" may be formatted in many locales with the `user`+being directly set by an input variable.++Sometimes, interpolated values need explicit formatting within a message.+For example, formatting a message like "You have eaten 3.2 apples"+may require the input numerical value+to be formatted with an explicit `minimumFractionDigits` option.++Some messages require multiple variants.+This is often related to plural cases, such as "You have 3 new messages",+where the value `3` is an input and the "messages" needs to correspond with its plural category.++Rarely, messages needs to include leading or trailing whitespace due to+e.g. 
how they will be concatenated with other text,+or as a result of being segmented from some larger volume of text.++---++Users editing a simple message and who wish to add an `input` or `local` annotiation+to the message do not wish to reformat the message extensively.++Users who have messages that include leading or trailing whitespace+want to ensure that this whitespace is included in the translatable+text portion of the message. Which whitespace characters are displayed at runtime+should not be surprising.++## Requirements++Common things should be easy, uncommon things should be possible.++Developers and translators should be able to read and write the syntax easily in a text editor.++Translators (and their tools) are not software engineers, so we want our syntax+to be as simple, robust, and non-fussy as possible.+Multiple levels of complex nesting should be avoided,+along with any constructs that require an excessive+level of precision on the part of non-technical users.++As MessageFormat 2 will be at best a secondary language to all its users,+it should conform to user expectations and require as little learning as possible.++The syntax should avoid footguns,+in particular as it's passed through various tools during formatting.++ASCII-compatible syntax. While support for non-ASCII characters for variable names,+values, literals, options, and the like are important, the syntax itself should+be restricted to ASCII characters. This allows the message to be parsed+visually by humans even when embedded in a syntax that requires escaping.++Whitespace is forgiving. 
We _require_ the minimum amount of whitespace and allow+users to format or change unimportant whitespace as much as they want.+This avoids the need for translators or tools to be super pedantic about+formatting.++## Constraints++Limiting the range of characters that need to be escaped in plain text is important.+Following past precedent,+this design doc will only consider encapsulation styles which+start with `{` and end with `}`.++The current syntax includes some plain-ascii keywords:+`input`, `local`, `match`, and `when`.++The current syntax and active proposals include some sigil + name combinations,+such as `:number`, `$var`, `|literal|`, `+bold`, and `@attr`.++The current syntax supports unquoted literal values as operands.++Messages themselves are "simple strings" and must be considered to be a single+line of text. In many containing formats, newlines will be represented as the local+equivalent of `\n`.++## Proposed Design++**Start in text mode**++In this option, whitespace is trimmed from `pattern` constructs unless+the pattern is quoted.++```+Hello world!++Hello {$user}!++{input $now :datetime dateStyle=long}+Hello {$user}. Today is {$now}++{local $now = {:systemGetCurrentTime :datetime dateStyle=medium}}+Hello {$user}. Today is {$now}++{match {$count :number integer=true}}+{when 0}   Hello {$user}. Today is {$now} and you have no geese.+{when one} Hello {$user}. Today is {$now} and you have {$count} goose.

The {when one} Hello {$user} part makes me really nervous: it looks like two placeholders on both sides of Hello. Furthermore, I think the fact that the space left of Hello will be trimmed but the one on the right will not is a footgun. I think this is the main reason why I've been opposed to trimming (although I understand which problems it addresses).
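To make the asymmetry concrete, here is a minimal sketch of the trimming rule as I read it from the text-mode option. It assumes that only the whitespace at the outer edges of a variant's pattern is dropped. The helper `split_variant` is hypothetical, and `str.strip()` stands in for whatever whitespace set the spec would define.

```python
import re

# Hypothetical helper: split a "{when ...}" variant line into its key and its
# pattern, trimming only the pattern's exterior whitespace.
VARIANT = re.compile(r"\{when ([^}]*)\}(.*)", re.DOTALL)


def split_variant(line: str):
    match = VARIANT.match(line)
    if match is None:
        raise ValueError("not a variant line")
    key, pattern = match.group(1), match.group(2)
    # Assumption: str.strip() approximates the trimming rule.
    return key, pattern.strip()
```

With this sketch, the space between `{when one}` and `Hello` disappears, while the space between `Hello` and `{$user}` survives, even though both sit next to a closing or opening brace.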

Could we consider using a different set of brackets for statements? For example, if we made # special in patterns, too, we could consider something like the following:

#[one] Hello {$user}.

Does this look like less of a footgun now when it comes to the rules about which space will be trimmed and which one won't?


Interestingly, there's something about square brackets that makes me not actually mind the following:

#[one]Hello {$user}.
eemeli

comment created time in 7 days

PullRequestReviewEvent

Pull request review commentunicode-org/message-format-wg

Add message parse mode (code vs text) design doc

+# Message Parse Mode++Status: **Proposed**++<details>+	<summary>Metadata</summary>+	<dl>+		<dt>Contributors</dt>+		<dd>@eemeli</dd>+		<dd>@aphillips</dd><!-- Seville and other inserted edits -->+		<dt>First proposed</dt>+		<dd>2023-09-13</dd>+		<dt>Pull Request</dt>+		<dd><a href="https://github.com/unicode-org/message-format-wg/pull/474">#474</a></dd>+	</dl>+</details>++## Objective++Decide whether text patterns or code statements should be enclosed in MF2.++## Background++Existing message and template formatting languages tend to start in "text" mode,+and require special syntax like `{{` or `{%` to enter "code" mode.++ICU MessageFormat and Fluent both support inline selectors+separated from the text using `{ ... }` for multi-variant messages.++[Mustache templates](https://mustache.github.io/mustache.5.html)+and related languages wrap "code" in `{{ ... }}`.+In addition to placeholders that are replaced by their interpolated value during formatting,+this also includes conditional blocks using `{{#...}}`/`{{/...}}` wrappers.++[Handlebars](https://handlebarsjs.com/guide/) extends Mustache expressions+with operators such as `{{#if ...}}` and `{{#each ...}}`,+as well as custom formatting functions that become available as e.g. 
`{{bold ...}}`.++[Jinja templates](https://jinja.palletsprojects.com/en/3.1.x/templates/) separate+`{% statements %}` and `{{ expressions }}` from the base text.+The former may define tests that determine the inclusion of subsequent text blocks in the output.++A cost that the message formatting and templating languages mentioned above need to rely on+is some rule or behaviour that governs how to deal with whitespace at the beginning and end of a pattern,+as statements may be separated from each other by newlines or other constructs for legibility.++Other formats supporting multiple message variants tend to rely on a surrounding resource format to define variants,+such as [Rails internationalization](https://guides.rubyonrails.org/i18n.html#pluralization) in Ruby or YAML+and [Android String Resources](https://developer.android.com/guide/topics/resources/string-resource.html#Plurals) in XML.+These formats rely on the resource format providing clear delineation of the beginning and end of a pattern.++## Use-Cases++Most messages in any localization system do not contain any expressions, statements or variants.+These should be expressible as easily as possible.++Many messages include expressions that are to be interpolated during formatting.+For example, a greeting like "Hello, user!" may be formatted in many locales with the `user`+being directly set by an input variable.++Sometimes, interpolated values need explicit formatting within a message.+For example, formatting a message like "You have eaten 3.2 apples"+may require the input numerical value+to be formatted with an explicit `minimumFractionDigits` option.++Some messages require multiple variants.+This is often related to plural cases, such as "You have 3 new messages",+where the value `3` is an input and the "messages" needs to correspond with its plural category.++Rarely, messages needs to include leading or trailing whitespace due to+e.g. 
how they will be concatenated with other text,+or as a result of being segmented from some larger volume of text.++---++Users editing a simple message and who wish to add an `input` or `local` annotiation+to the message do not wish to reformat the message extensively.++Users who have messages that include leading or trailing whitespace+want to ensure that this whitespace is included in the translatable+text portion of the message. Which whitespace characters are displayed at runtime+should not be surprising.++## Requirements++Common things should be easy, uncommon things should be possible.++Developers and translators should be able to read and write the syntax easily in a text editor.++Translators (and their tools) are not software engineers, so we want our syntax+to be as simple, robust, and non-fussy as possible.+Multiple levels of complex nesting should be avoided,+along with any constructs that require an excessive+level of precision on the part of non-technical users.++As MessageFormat 2 will be at best a secondary language to all its users,+it should conform to user expectations and require as little learning as possible.++The syntax should avoid footguns,+in particular as it's passed through various tools during formatting.++ASCII-compatible syntax. While support for non-ASCII characters for variable names,+values, literals, options, and the like are important, the syntax itself should+be restricted to ASCII characters. This allows the message to be parsed+visually by humans even when embedded in a syntax that requires escaping.++Whitespace is forgiving. 
We _require_ the minimum amount of whitespace and allow+users to format or change unimportant whitespace as much as they want.+This avoids the need for translators or tools to be super pedantic about+formatting.++## Constraints++Limiting the range of characters that need to be escaped in plain text is important.+Following past precedent,+this design doc will only consider encapsulation styles which+start with `{` and end with `}`.++The current syntax includes some plain-ascii keywords:+`input`, `local`, `match`, and `when`.++The current syntax and active proposals include some sigil + name combinations,+such as `:number`, `$var`, `|literal|`, `+bold`, and `@attr`.++The current syntax supports unquoted literal values as operands.++Messages themselves are "simple strings" and must be considered to be a single+line of text. In many containing formats, newlines will be represented as the local+equivalent of `\n`.++## Proposed Design++**Start in text mode**++In this option, whitespace is trimmed from `pattern` constructs unless+the pattern is quoted.++```+Hello world!++Hello {$user}!++{input $now :datetime dateStyle=long}+Hello {$user}. Today is {$now}++{local $now = {:systemGetCurrentTime :datetime dateStyle=medium}}+Hello {$user}. Today is {$now}++{match {$count :number integer=true}}+{when 0}   Hello {$user}. Today is {$now} and you have no geese.+{when one} Hello {$user}. Today is {$now} and you have {$count} goose.+{when few} {  Hello {$user}, this message has spaces on the front and end.  }

How would this work, parsing-wise?

{when few} {  Hello {$user}, this message has spaces on the front and end.  }
           ^
           how does a parser know this isn't a placeholder's open brace?

I think it may be a good idea to consider using double braces for certain features (e.g. for placeholders, or as pattern delimiters). Alternatively, we may want to revisit the idea of using double sigils for different meanings, e.g. {%, {[, and {{. See #269.
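A minimal sketch of why a distinct opener helps: assuming, hypothetically, that quoted patterns were delimited by `{{ ... }}` while placeholders kept a single `{`, one character of lookahead would be enough to tell them apart. The function and the returned labels are illustrative only.

```python
def classify_open_brace(source: str, pos: int) -> str:
    """Classify the construct starting at source[pos].

    Assumption: "{{" opens a quoted pattern and a single "{" opens a
    placeholder. With single braces for both, the first character alone
    cannot distinguish the two cases.
    """
    if not source.startswith("{", pos):
        return "text"
    if source.startswith("{{", pos):
        return "quoted-pattern"
    return "placeholder"
```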

eemeli

comment created time in 7 days

PullRequestReviewEvent

Pull request review commentunicode-org/message-format-wg

Design document for extending functions/options/namespacing

+# Design Proposal Template++Status: **Proposed**++<details>+	<summary>Metadata</summary>+	<dl>+		<dt>Contributors</dt>+		<dd>@aphillips</dd>+		<dt>First proposed</dt>+		<dd>2023-09-13</dd>+		<dt>Pull Request</dt>+		<dd>#475</dd>+	</dl>+</details>++## Objective++_What is this proposal trying to achieve?_++Implementations will provide the functionality for selection and formatting,+including options and option values.+Much of this functionality will be mandated by the default registry.+We expect that default registry entries will serve the core needs for MF2 users.+However, there are many capabilities available in platform, library,+or operating environment APIs that could be useful to developers and translators+or which might be expected on a specific platform or in a specific programming language.+In addition, we expect to provide support for markup and templating regimes.+These need to be implemented using values not found in the default registry.++An additional hope is that a robust ecosystem of function libraries will be created.+A successful ecosystem will allow users to pick-and-choose or cherry-pick fuctions or+options to use in a given development environment.+Each function, option, or option value extension needs to work as seamlessly as possible+with other add-ons and with the built-in functionality.++To that end, we need to define how externally-authored functions appear in a _message_;+how externally-authored function options (and their values) can be supported;+and what, if any, effects this has on the namespace of functions and options.++## Background++_What context is helpful to understand this proposal?_++One example of potential add-on functionality that can help readers understand this proposal+is the use of _skeletons_ for date and number formatting.++The JavaScript `Intl.DateTimeFormat` API provides for the customization of date and time values+through the use of "option bags". 
For example:++```js+new Intl.DateTimeFormat("en-GB", {+  weekday: "long",+  year: "numeric",+  month: "short",+  day: "numeric",+  hour: "numeric",+  minute: "numeric",+});+```++This might be expressed in MessageFormat as:++```+Today is {$today :datetime weekday=long year=numeric month=short day=numeric hour=numeric minute=numeric}+```++The ICU family of libraries provide a shorthand mechanism called _skeletons_ for accessing+date and time format options without needing a verbose list of options.+The same message might look like this with a skeleton:++```+Today is {$today :datetime skeleton=EEEEyMdjm}+```++Skeletons are not proposed for inclusion in the default registry+because they are not universally available in all datetime formatting+libraries.++## Use-Cases++_What use-cases do we see? Ideally, quote concrete examples._++- Developers need to add options to the base functions to suit local needs.+  For example, ICU's skeletons as part of the `:datetime` function+  Support for this option needs to be specified for local implemented versions.++- Developers want to write a function and access it from messages.++- Developers want to import 3rd party formatting packages and use the package's+  features from within messages.++- Users want to import two or more formatting packages+  and these might have the same-named functions.+  For example, there might be both an HTML `p` and TTS `p`+  function.++- Users want to control how extensions are referenced in their messages.+  For example, they might wish to make a long namespace name shorter.

I've been thinking about these 3 use-cases, and I think there's a portability risk that we should call out.

I see two groups of users who could be in charge of controlling the namespace name:

  • (a) Authors of custom functions bundled in "packages"; this is similar to how C/C++ or Java handle naming.
  • (b) Developers who import those packages into their MFv2 implementation of choice; this is in line with how modern languages handle imports and namespaces, via some sort of import ... as mechanism.

The current solution proposed by this PR is to go with (b), and allow developers to adjust the namespace if necessary. The risk this brings is that these custom monikers chosen by developers are then hardcoded inside messages, hurting the portability of messages.

Example: If I choose myFoo as a namespace, this naming choice propagates down to all messages (myFoo:number etc.). These messages then require edits when they're migrated to another project, when codebases merge, or even when I change my mind and rename the namespace to myBar.

We can attempt to avoid or mitigate this risk (or accept it).

  • Avoid: Go with (a), i.e. authors choose and control the namespace name. Since this is still likely to produce conflicts, a naming scheme similar to Java's fully qualified names would be recommended. Unfortunately, it means that namespaced names become domain.name.like long.

  • Mitigate: Go with (b), i.e. developers importing packages can give them custom names. Embrace the fact that those custom names are then hardcoded in messages and make it easy for tooling to work with them and modify them.
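As an illustration of the tooling that the Mitigate option would rely on, here is a minimal sketch of a namespace rename over a single message. It assumes the `:` separator from the proposal, non-nested placeholders, and that a namespaced name always follows a sigil (`:`, `+`, `-`, `@`) or whitespace. The function `rename_namespace` is hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"\{[^{}]*\}")


def rename_namespace(message: str, old: str, new: str) -> str:
    """Rewrite the prefix `old:` to `new:` inside placeholders only."""
    # A prefix counts only right after a sigil or whitespace, so quoted
    # literals such as |old:x| and plain text outside braces are untouched.
    prefix = re.compile(r"(?<=[:+\-@\s])" + re.escape(old) + r":(?=\w)")

    def rewrite(match):
        return prefix.sub(lambda _: new + ":", match.group(0))

    return PLACEHOLDER.sub(rewrite, message)
```

A real migration tool would want to work on a parsed data model rather than on regular expressions, but even this sketch shows that every message using the old moniker has to be touched.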

aphillips

comment created time in 7 days

Pull request review commentunicode-org/message-format-wg

Design document for extending functions/options/namespacing

+# Design Proposal Template++Status: **Proposed**++<details>+	<summary>Metadata</summary>+	<dl>+		<dt>Contributors</dt>+		<dd>@aphillips</dd>+		<dt>First proposed</dt>+		<dd>2023-09-13</dd>+		<dt>Pull Request</dt>+		<dd>#475</dd>+	</dl>+</details>++## Objective++_What is this proposal trying to achieve?_++Implementations will provide the functionality for selection and formatting,+including options and option values.+Much of this functionality will be mandated by the default registry.+We expect that default registry entries will serve the core needs for MF2 users.+However, there are many capabilities available in platform, library,+or operating environment APIs that could be useful to developers and translators+or which might be expected on a specific platform or in a specific programming language.+In addition, we expect to provide support for markup and templating regimes.+These need to be implemented using values not found in the default registry.++An additional hope is that a robust ecosystem of function libraries will be created.+A successful ecosystem will allow users to pick-and-choose or cherry-pick fuctions or+options to use in a given development environment.+Each function, option, or option value extension needs to work as seamlessly as possible+with other add-ons and with the built-in functionality.++To that end, we need to define how externally-authored functions appear in a _message_;+how externally-authored function options (and their values) can be supported;+and what, if any, effects this has on the namespace of functions and options.++## Background++_What context is helpful to understand this proposal?_++One example of potential add-on functionality that can help readers understand this proposal+is the use of _skeletons_ for date and number formatting.++The JavaScript `Intl.DateTimeFormat` API provides for the customization of date and time values+through the use of "option bags". 
For example:++```js+new Intl.DateTimeFormat("en-GB", {+  weekday: "long",+  year: "numeric",+  month: "short",+  day: "numeric",+  hour: "numeric",+  minute: "numeric",+});+```++This might be expressed in MessageFormat as:++```+Today is {$today :datetime weekday=long year=numeric month=short day=numeric hour=numeric minute=numeric}+```++The ICU family of libraries provide a shorthand mechanism called _skeletons_ for accessing+date and time format options without needing a verbose list of options.+The same message might look like this with a skeleton:++```+Today is {$today :datetime skeleton=EEEEyMdjm}+```++Skeletons are not proposed for inclusion in the default registry+because they are not universally available in all datetime formatting+libraries.++## Use-Cases++_What use-cases do we see? Ideally, quote concrete examples._++- Developers need to add options to the base functions to suit local needs.+  For example, ICU's skeletons as part of the `:datetime` function+  Support for this option needs to be specified for local implemented versions.++- Developers want to write a function and access it from messages.++- Developers want to import 3rd party formatting packages and use the package's+  features from within messages.++- Users want to import two or more formatting packages+  and these might have the same-named functions.+  For example, there might be both an HTML `p` and TTS `p`+  function.++- Users want to control how extensions are referenced in their messages.+  For example, they might wish to make a long namespace name shorter.++- Translators and tools would like a machine-readable way to find out the names+  and option values for add-on packages.++## Requirements++_What properties does the solution have to manifest to enable the use-cases above?_++- Developers must be able to write functions that do not later collide with items in the default registry.+- Developers must be able to write function add-ons that do not later collide with items in the default 
registry.+- Users should be able to tell visually when an add-on feature has been used vs. a built-in+- Users should be able to resolve conflicts between add-on packages that use the same+  function names without altering add-on packages+-++## Constraints++_What prior decisions and existing conditions limit the possible design?_++- A syntactical prefix or its separator(s) must not collide with characters valid in either+  the prefix or in any of the name productions.++- A prefix must not collide with unquoted literal values.++## Proposed Design++_Describe the proposed solution. Consider syntax, formatting, errors, registry, tooling, interchange._++The actual addition and provisioning of features to an implementation is implementation specific.+Implementations are not required to read the registry format defined by MFv2+or use it for any particular purpose.++> For example, a Java implementation might use the `ServiceProvider` interface to load+> functionality, while a Node application might use `import`.++If an implementation supports user-installed formatters, selectors, function options,+or expression annotations, it must also support providing "namespace" prefixes for+each installed set of functionality.++In this design, each namespace prefix is a short string and is associated with a URL.+The URL is intended to point to some publically-available copy of the add-on library's+registry description, for use by tooling and as a reference to users such as translators.++There is no requirement that an implementation read the document at the end of the URL,+nor validate the contents in general or against the actual functionality installed.++> [!NOTE]+> It is a good idea to use ASCII strings for namespace identifiers.+> Remember that translators (and others) in many different languages and+> with many different keyboards need to be able to enter the prefix.++The namespace prefix is part of the `name` production.+The prefix must be at least one character in length.+It may 
be as long as desired, although users are cautioned that brevity+is desirable.+The prefix is separated from the name by a colon (U+003A COLON).++The choice of a `:` is intentional, as it already used for function identification+and might be familiar from similar usage in XML namespaces+as well as slightly similar to C++, e.g. `ns::function`.+This design leverages these sorts of "application familiarity"+as well as the current syntax's use of colon as the function sigil.++The namespace prefix `std` is reserved and refers to the default registry.+The default registry will have a well-known URL under `unicode.org`+but this URL is not yet specified.++```abnf+name      = [namespace] name-body+namespace = name-start *name-char namespace-sep+namespace-sep = ":"+name-body = name-start *name-char+```++> [!NOTE]+> The `name-start` and `name-char` productions will have to be altered to+> **_not_** permit U+003A COLON in a name and to otherwise address+> naming concerns.+> This design document does not show the naming changes because there are+> other issues in play for these names.+> For now, just consider that `name-char` will have no colon.++The `name` production as defined here applies to:++- function (selector/formatting) names+- option names+- spanable names+- expression annotation names (if approved)++Examples:++> Add-on function:+>+> ```+> Today is {$today :icu:datetime skeleton=EEEEyMdjm}+> ```+>+> Add on option:+>+> ```+> Today is {$today :datetime icu:skeleton=EEEEyMdjm}+> ```+>+> Add-on spannables (such as markup):+>+> ```+> Today is {+html:a}{$today}{-html:a}+> ```+>+> Add-on expression annotation:+>+> ```+> Today is {$today :datetime @my:annotation}+> ```+>+> Everything altogether all at once. 
This probably does not work+> correctly, since `std:datetime` may not understand `icu:skeleton`:+>+> ```+> Today is {+html:a}{$today :std:datetime icu:skeleton=EEEEyMdjm @my:annotation}{-html:a}+> ```++Users, such as developers writing messages or translators creating translations,+are not required to type the namespace prefix in message patterns unless there+is ambiguity in the given formatting content or in the runtime.+However, tooling might reject or have difficulty processing values without+the prefix being present.++> For example, if an implementation is using the ICU4J library, any of the+> following messages might be acceptable alternatives:+>+> ```+> Today is {$today :datetime skeleton=EEEEMd}+> Today is {$today :datetime icu:skeleton=EEEEMd}+> Today is {$today :icu:datetime skeleton=EEEEMd}+> Today is {$today :icu:datetime icu:skeleton=EEEEMd}+> ```++### Changes Required by This Design++Implementation of this design will require the following changes:++- Update the ABNF syntax and corresponding text in the syntax.md spec+- Changes to the `name`/`name-char` productions and related naming productions+- Additions to the formatting.md spec regarding namespace resolution+  to ensure that the correct function is called+- Additional error type for namespace resolution failure; alternatively+  this might take the form of the existing resolution error+- Addition of namespace to the data model for all relative items++It is possible that the registry description will need to include slots for URL+and default namespace name.++### Potential Negatives++This design is based on the assertion that implementors will provide an+extension mechanism and that users will want to use that mechanism to install formatting+or selection functionality.+Any non-standard functions, options, option values, or expressions have the potential+to be disruptive or fragmenting to the overall tooling or localization space.+Any extension that is widely adopted would thus be better off in the 
default registry+if at all possible.+On the other hand, language- or platform-specific extensions can make MFv2 feel+more "fluent" or consistent for users in a given environment.++## Alternatives Considered++_What other solutions are available?_+_How do they compare against the requirements?_+_What other properties they have?_++### Choice of namespace separator++The choice of the `:` character as a separator is, as noted above, a considered part+of the design, but it is not required.+Other characters are possible for the same purpose.++A concern is that `name-char` reserves `:`, `-`, and `.`, which are probably the+most likely candidates.+Choosing one of these characters would require altering the `name-char` (and+thus `name`) production.+Moving to use `NCName` instead of `Nmtoken` as a basis for `name` would address+the use of `:`.+Moving to `-` or `.` would not require alterations of the syntax if we decided+that namespacing is not formally a part of the specification but, rather, is+just a convention.++Of the remaining characters not currently in use, the most obvious candidates would+be `/` and `~`.++> Here are examples of the options:+>+> ```+> {$today :ns:function  ns:option=foo}{+ns:a}{-ns:a}+> {$today :ns-function  ns-option=foo}{+ns-a}{-ns-a}+> {$today :ns.function  ns.option=foo}{+ns.a}{-ns.a}+> {$today :ns/function  ns/option=foo}{+ns/a}{-ns/a}+> {$today :ns~function  ns~option=foo}{+ns~a}{-ns~a}+> ```++### No namespacing++Each implementation can install whatever additional functionality.+It is up to the implementation to describe what is permitted and to check messages.+Users will have to RTFM.++- **+** Flexible+- **-** Does not promote a healthy ecosystem of add-on packages+- **-** Does not supply a mechanism for tooling to leverage++### Reverse-domain-name namespacing++Use `com.foo.bar.baz.Function` type naming for functions, options, or expressions.++> ```+> Today is {$today :com.example.foo.datetime dateStyle=short}+> Today is {$today :datetime 
com.example.foo.skeleton=EEEEMd}+> ```++- **+** Familiarity. This is a familiar structure for developers.+- **-** Verbose. The resulting names are long and difficult to parse visually++### Namespacing sigils are customary not normative

Why do you prefer this one?

IIUC, it doesn't address the requirement about machine-readability: Translators and tools would like a machine-readable way to find out the names and option values for add-on packages.

Additionally, messages can end up using different namespacing conventions, in particular in big projects. Coupled with the first issue, unifying them automatically/programmatically can be problematic.

aphillips

comment created time in 8 days

PullRequestReviewEvent

pull request commentunicode-org/message-format-wg

Allow colon in name-start, matching XML Name

Our tie to XML is somewhat nebulous in any case: it's to ensure that LDML constructs are always supported. But it's doubtful that CLDR will test the limits of XML (and maybe they should be compatible with us in the future 🤣)

Right, this is important. Plus, realistically, any LDML troublemaker can still be quoted if needed. OTOH, I feel rather strongly about not differing by a single character from XML's Nmtoken.

Out of curiosity, I browsed the CLDR to look for any such troublemakers (i.e. LDML values which are XML Nmtoken but are not our current nmtoken). I only found -x and -Inf in RBNF.
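
For reference, a rough sketch of why those two values are awkward: both are valid XML Nmtokens, but neither is a valid XML Name, because of the leading `-`. The patterns below are an ASCII-only approximation of the XML 1.0 `Name` and `Nmtoken` productions (the real ones also allow many non-ASCII ranges), and they deliberately say nothing about our own `nmtoken`.

```js
// ASCII-only approximation of XML 1.0 NameStartChar and NameChar.
// Assumption: non-ASCII ranges are left out to keep the sketch short.
const NAME_START = "[:A-Z_a-z]";
const NAME_CHAR = "[:A-Z_a-z\\-.0-9]";

// Name    ::= NameStartChar (NameChar)*
// Nmtoken ::= (NameChar)+
const xmlName = new RegExp(`^${NAME_START}${NAME_CHAR}*$`);
const xmlNmtoken = new RegExp(`^${NAME_CHAR}+$`);

for (const value of ["-x", "-Inf", "ns:function"]) {
  console.log(value, {
    name: xmlName.test(value),
    nmtoken: xmlNmtoken.test(value),
  });
}
```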

eemeli

comment created time in 8 days

pull request commentunicode-org/message-format-wg

Allow colon in name-start, matching XML Name

Solving the syntax will help us close out naming, not the other way around.

Thanks, this sounds aligned with the process I'd like us to follow here:

  1. Identify problems (e.g. namespacing, spannables, text-first).
  2. Gather the use-cases.
  3. Distill requirements from them.
  4. Design a solution.
  5. Document newly imposed constraints.
  6. Change other parts of the syntax according to the constraints.

This is why I've been holding off on the name/nmtoken discussion: we might not even need it if some of our other in-flight discussions require us to go with something else.

I'd be OK hitting pause on this PR, too.

eemeli

comment created time in 8 days

PullRequestReviewEvent
PullRequestReviewEvent

pull request commentunicode-org/message-format-wg

Add negative-start rule

That agreement was based on the assumption that we would have completed the design of open/close in Seville. That didn't happen and the design is still ongoing in #470.

I want this PR to stay open because I want it to be a friction point, so that we don't get complacent with the currently proposed design for open/close spannables.

An alternative which I'd be happy to go with instead is to remove the current open/close features from the spec on the main branch, since it's pending a refactor in #470 anyways. We can then fix the negative literals the proper way, and then discuss if we want to compromise in order to enable the + and - syntax for spannables.

eemeli

comment created time in 9 days

Pull request review commentunicode-org/message-format-wg

Design document for extending functions/options/namespacing

+# Design Proposal Template++Status: **Proposed**++<details>+	<summary>Metadata</summary>+	<dl>+		<dt>Contributors</dt>+		<dd>@aphillips</dd>+		<dt>First proposed</dt>+		<dd>2023-09-13</dd>+		<dt>Pull Request</dt>+		<dd>#475</dd>+	</dl>+</details>++## Objective++_What is this proposal trying to achieve?_++Implementations will provide the functionality for selection and formatting,+including options and option values.+Much of this functionality will be mandated by the default registry.+We expect that default registry entries will serve the core needs for MF2 users.+However, there are many capabilities available in platform, library,+or operating environment APIs that could be useful to developers and translators+or which might be expected on a specific platform or in a specific programming language.+In addition, we expect to provide support for markup and templating regimes.+These need to be implemented using values not found in the default registry.++An additional hope is that a robust ecosystem of function libraries will be created.+A successful ecosystem will allow users to pick-and-choose or cherry-pick fuctions or+options to use in a given development environment.+Each function, option, or option value extension needs to work as seamlessly as possible+with other add-ons and with the built-in functionality.++To that end, we need to define how externally-authored functions appear in a _message_;+how externally-authored function options (and their values) can be supported;+and what, if any, effects this has on the namespace of functions and options.++## Background++_What context is helpful to understand this proposal?_++One example of potential add-on functionality that can help readers understand this proposal+is the use of _skeletons_ for date and number formatting.++The JavaScript `Intl.DateTimeFormat` API provides for the customization of date and time values+through the use of "option bags". 
For example:++```js+new Intl.DateTimeFormat("en-GB", {+  weekday: "long",+  year: "numeric",+  month: "short",+  day: "numeric",+  hour: "numeric",+  minute: "numeric",+});+```++This might be expressed in MessageFormat as:++```+Today is {$today :datetime weekday=long year=numeric month=short day=numeric hour=numeric minute=numeric}+```++The ICU family of libraries provide a shorthand mechanism called _skeletons_ for accessing+date and time format options without needing a verbose list of options.+The same message might look like this with a skeleton:++```+Today is {$today :datetime skeleton=EEEEyMdjm}+```++Skeletons are not proposed for inclusion in the default registry+because they are not universally available in all datetime formatting+libraries.++## Use-Cases++_What use-cases do we see? Ideally, quote concrete examples._++- Developers need to add options to the base functions to suit local needs.+  For example, ICU's skeletons as part of the `:datetime` function+  Support for this option needs to be specified for local implemented versions.++- Developers want to write a function and access it from messages.++- Developers want to import 3rd party formatting packages and use the package's+  features from within messages.++- Developers wish to import two or more formatting packages+  and these might have the same-named functions.+  For example, there might be both an HTML `p` and TTS `p`+  function.++- Translators and tools would like a machine-readable way to find out the names+  and option values for add-on packages.++## Requirements++_What properties does the solution have to manifest to enable the use-cases above?_++- Developers must be able to write functions that do not later collide with items in the default registry.+- Developers must be able to write function add-ons that do not later collide with items in the default registry.+- Users should be able to tell visually when an add-on feature has been used vs. 
a built-in+- Users should be able to resolve conflicts between add-on packages that use the same+  function names without altering add-on packages+-++## Constraints++_What prior decisions and existing conditions limit the possible design?_++- A syntactical prefix must not collide with `nmtoken`, to avoid parsing ambiguities with unquoted literals...++## Proposed Design++_Describe the proposed solution. Consider syntax, formatting, errors, registry, tooling, interchange._++The actual addition and provisioning of features to an implementation is implementation specific.+Implementations are not required to read the registry format defined by MFv2+or use it for any particular purpose.++> For example, a Java implementation might use the `ServiceProvider` interface to load+> functionality, while a Node application might use `import`.++If an implementation supports user-installed formatters, selectors, function options,+or expression annotations, it must also support providing "namespace" prefixes for+each installed set of functionality.++In this design, each namespace prefix is a short string and is associated with a URL.+The URL is intended to point to some publically-available copy of the add-on library's+registry description, for use by tooling and as a reference to users such as translators.++There is no requirement that an implementation read the document at the end of the URL,+nor validate the contents in general or against the actual functionality installed.++> [!NOTE]+> It is a good idea to use ASCII strings for namespace identifiers.+> Remember that translators (and others) in many different languages and+> with many different keyboards need to be able to enter the prefix.++The namespace prefix is part of the `name` production.+The prefix is limited to eight characters in length and MUST be at least two characters+in length.+The prefix is separated from the name by a colon (U+003A COLON).+The namespace prefix `std` is reserved and refers to the default registry.

The way I understood the design is that {{1234 :std:number}} is guaranteed to call the implementation's builtin number formatter (which conforms to the requirements stated by the default registry), while {{1234 :number}} may call a different one, depending on (a) whether the implementation allows registering packages of custom functions under the anonymous namespace, and (b) whether such a custom package has somehow been made available to the runtime.

If that's the intent, perhaps having some explicit mention of the anonymous namespace would be helpful?
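
To make the question concrete, here is a minimal sketch of how I read the resolution rules. Everything in it is hypothetical: `createResolver`, `allowAnonymousOverrides`, and the registry shape are names I made up for illustration, not anything from the spec or the design doc. Only the prefix rules (two to eight characters, `:` as separator, `std` reserved) come from the proposal text.

```js
// Hypothetical sketch; none of these names come from the spec.
const RESERVED_PREFIX = "std";

// Stand-in for the builtins that conform to the default registry.
const builtins = new Map([["number", (value) => `builtin:${value}`]]);

function createResolver({ allowAnonymousOverrides }) {
  const namespaces = new Map(); // prefix -> Map(name -> function)
  const anonymous = new Map(); // custom functions registered without a prefix

  return {
    register(prefix, name, fn) {
      if (prefix === null) {
        if (!allowAnonymousOverrides) {
          throw new Error("anonymous registration is disabled");
        }
        anonymous.set(name, fn);
        return;
      }
      // Per the proposal: 2 to 8 characters, and `std` is reserved.
      // (Length is counted in UTF-16 code units, which is fine for ASCII.)
      if (prefix.length < 2 || prefix.length > 8) {
        throw new Error(`invalid prefix length: ${prefix}`);
      }
      if (prefix === RESERVED_PREFIX) {
        throw new Error("the std prefix is reserved");
      }
      if (!namespaces.has(prefix)) namespaces.set(prefix, new Map());
      namespaces.get(prefix).set(name, fn);
    },

    // `annotation` is the function name without the leading ":".
    resolve(annotation) {
      const sep = annotation.indexOf(":");
      if (sep === -1) {
        // Anonymous namespace: a custom function wins if one was registered.
        return anonymous.get(annotation) ?? builtins.get(annotation);
      }
      const prefix = annotation.slice(0, sep);
      const name = annotation.slice(sep + 1);
      // `std:` always bypasses custom registrations.
      if (prefix === RESERVED_PREFIX) return builtins.get(name);
      return namespaces.get(prefix)?.get(name);
    },
  };
}

const resolver = createResolver({ allowAnonymousOverrides: true });
resolver.register(null, "number", (value) => `custom:${value}`);
console.log(resolver.resolve("std:number")(1234)); // builtin:1234
console.log(resolver.resolve("number")(1234)); // custom:1234
```

If `allowAnonymousOverrides` is false, `number` and `std:number` resolve to the same builtin, which is the case where the prefix is purely optional.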

aphillips

comment created time in 10 days

PullRequestReviewEvent