A method of converting machine codes to human-readable text in any language and phrasing.
Abstract
An on-chain system for providing user feedback by converting machine-efficient codes into human-readable strings in any language or phrasing. The system does not impose a list of languages, but rather lets users create, share, and use the localized text of their choice.
Motivation
There are many cases where an end user needs feedback or instruction from a smart contract. Directly exposing numeric codes does not make for good UX or DX. If Ethereum is to be a truly global system usable by experts and lay persons alike, systems to provide feedback on what happened during a transaction are needed in as many languages as possible.
Returning a hard-coded string (typically in English) only serves a small segment of the global population. This standard proposes a method to allow users to create, register, share, and use a decentralized collection of translations, enabling richer messaging that is more culturally and linguistically diverse.
There are several machine-efficient ways of representing intent, status, state transition, and other semantic signals, including booleans, enums, and ERC-1066 codes. Providing human-readable messages for these signals improves developer experience by returning easier-to-consume information with more context (e.g. in revert messages). End-user experience is improved by providing text that can be propagated up to the UI.
Specification
Contract Architecture
Two types of contract are involved: LocalizationPreferences and Localization.
The LocalizationPreferences contract functions as a proxy for tx.origin.
LocalizationPreferences
A proxy contract that allows users to set their preferred Localization. Text lookup is delegated to the user’s preferred contract.
A fallback Localization with all keys filled MUST be available. If the user-specified Localization has not explicitly set a localization (i.e. textFor returns ""), the LocalizationPreferences MUST redelegate to the fallback Localization.
set
Registers a user’s preferred Localization. The registering user SHOULD be considered tx.origin.
function set(Localization _localization) external;
textFor
Retrieves the text for a code from the user’s preferred Localization contract.
The first return value (bool _wasFound) indicates whether the text was available from that Localization or whether the fallback was used. If the fallback was used in this context, textFor’s first return value MUST be set to false; otherwise it is true.
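The lookup-with-fallback rule can be modelled off chain to make the wasFound semantics concrete. The sketch below is a minimal in-memory JavaScript model, not contract code. It makes several assumptions: Maps stand in for contract storage, an explicit origin argument stands in for tx.origin, and the codes ("0x01", "0x02") and messages are made up for illustration. It follows the reference implementation in the Implementation section, so a user with no registered preference is served by the fallback directly and receives wasFound = true.

```javascript
// Illustrative in-memory model of the two contract types (not Solidity).
// Maps stand in for contract storage; `origin` stands in for tx.origin.
class Localization {
  constructor(entries = {}) {
    this.dictionary = new Map(Object.entries(entries));
  }

  set(code, message) {
    this.dictionary.set(code, message);
  }

  // Like a Solidity mapping, an unset key reads back as the empty string.
  textFor(code) {
    return this.dictionary.get(code) ?? "";
  }
}

class LocalizationPreferences {
  constructor(fallback) {
    this.fallback = fallback; // should have every key filled
    this.registry = new Map(); // origin -> preferred Localization
  }

  set(origin, localization) {
    this.registry.set(origin, localization);
  }

  // Returns [wasFound, text]. If the preferred Localization has no text for
  // the code, the fallback is consulted and wasFound is false.
  textFor(origin, code) {
    const preferred = this.registry.get(origin) ?? this.fallback;
    const text = preferred.textFor(code);
    if (text !== "") return [true, text];
    return [false, this.fallback.textFor(code)];
  }
}

// Example: Alice prefers a (partial) French Localization; Bob has no preference.
const english = new Localization({ "0x01": "Transfer succeeded", "0x02": "Insufficient funds" });
const french = new Localization({ "0x01": "Transfert réussi" });
const prefs = new LocalizationPreferences(english);
prefs.set("alice", french);

console.log(prefs.textFor("alice", "0x01")); // [ true, 'Transfert réussi' ]
console.log(prefs.textFor("alice", "0x02")); // [ false, 'Insufficient funds' ]
console.log(prefs.textFor("bob", "0x02"));   // [ true, 'Insufficient funds' ]
```

Alice's partial French Localization shows both outcomes. A missing key silently falls through to the fallback, and the boolean records that this happened.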
String Format
Text strings are encoded as UTF-8, so all of the following are valid:
"Špeĉiäl chârãçtérs are permitted"
"As are non-Latin characters: アルミ缶の上にあるみかん。"
"Emoji are legal: 🙈🙉🙊🎉"
"Feel free to be creative: (ノ◕ヮ◕)ノ*:・゚✧"
Templates
Template strings are allowed, and MUST follow the ANSI C printf conventions.
"Satoshi's true identity is %s"
Text with 2 or more arguments SHOULD use the POSIX parameter field extension.
"Knock knock. Who's there? %1$s. %1$s who? %2$s!"
Rationale
bytes32 Keys
bytes32 is very efficient because it is the EVM’s base word size. Given the enormous number of possible values ($\mathrm{card}(A) = 2^{256} > 1.1579 \times 10^{77}$, where $A$ is the set of bytes32 values), it can embed nearly any practical signal, enum, or state. In cases where an application’s key is longer than bytes32, hashing that long key can map it into the correct width.
Designs that use datatypes with smaller widths than bytes32 (such as bytes1 in ERC-1066) can be directly embedded into the larger width. This is a trivial one-to-one mapping of the smaller set into the larger one.
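Both mappings described above can be sketched in JavaScript: widening a short code, and hashing a long key. Hex strings stand in for bytes32 values. Two assumptions apply. First, widening follows Solidity’s rule for converting a fixed-size bytes value to a larger one, where zero bytes are appended on the right. Second, SHA-256 from Node’s built-in crypto module stands in for the keccak256 a contract would normally use, to avoid a third-party dependency. The hashed codes therefore will not match on-chain keccak256 values. The function names and the example key are hypothetical.

```javascript
const { createHash } = require("crypto");

// Widen a short code (e.g. a 1-byte ERC-1066 status code) to a bytes32 hex
// string. Assumes Solidity's widening rule for fixed-size bytes: the original
// bytes stay at the front and zero bytes are appended on the right.
function widenToBytes32(hex) {
  const body = hex.replace(/^0x/, "").toLowerCase();
  if (!/^([0-9a-f]{2}){0,32}$/.test(body)) {
    throw new RangeError("expected between 0 and 32 bytes of hex");
  }
  return "0x" + body.padEnd(64, "0");
}

// Map an arbitrarily long key into 32 bytes by hashing it.
// SHA-256 is a stand-in here; on-chain code would typically use keccak256,
// which produces different digests.
function hashToBytes32(longKey) {
  return "0x" + createHash("sha256").update(longKey, "utf8").digest("hex");
}

console.log(widenToBytes32("0x01"));
console.log(hashToBytes32("my-app/errors/withdrawal-limit-exceeded"));
```

Either way, the result is a 66-character hex string (0x plus 32 bytes). It can be used directly as a key into a Localization.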
Local vs Globals and Singletons
This spec has opted not to force a single global registry, and instead allows any contract or use case to deploy its own system. This allows for more flexibility, and does not prevent the community from opting to use singleton LocalizationPreference contracts for common use cases, sharing Localizations between different proxies, delegating translations between Localizations, and so on.
There are many practical uses of agreed-upon singletons. For instance, translating codes that aim to be fairly universal and integrated directly into the broader ecosystem (wallets, frameworks, debuggers, and the like) will want to have a single LocalizationPreference.
Rather than dispersing several LocalizationPreferences for different use cases and codes, one could imagine a global “registry of registries”. While this approach allows for unified lookup of all translations in all use cases, it is antithetical to the spirit of decentralization and freedom. Such a system also increases lookup complexity, places an onus on getting the code right the first time (or adds the overhead of an upgradable contract), and needs to account for use case conflicts with a “unified” or centralized numbering system. Further, lookups should be lightweight (especially in cases like looking up revert text).
For these reasons, this spec chooses the more decentralized, lightweight, free approach, at the cost of on-chain discoverability. A registry could still be compiled, but would be difficult to enforce, and is out of scope of this spec.
Off Chain Storage
A viable alternative is to store text off chain, with a pointer to the translations on chain, and to emit or return a bytes32 code for another party to look up. However, it is difficult to guarantee that off-chain resources will remain available, and this approach requires coordination from some other system, such as a web server, to do the code-to-text matching. It is also not compatible with revert messages.
ASCII vs UTF-8 vs UTF-16
UTF-8 is the most widely used encoding at time of writing. It contains a direct embedding of ASCII, while providing characters for most natural languages, emoji, and special characters.
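A quick way to see the ASCII embedding, and the extra bytes other scripts need, is to compare code-point counts with UTF-8 byte counts. The cost of storing a string on chain grows with its byte length, which makes this distinction relevant. The sketch below uses Node’s built-in Buffer. The helper name utf8Report is hypothetical.

```javascript
// Summarise how a string is encoded in UTF-8 (hypothetical helper).
function utf8Report(text) {
  const bytes = Buffer.from(text, "utf8");
  return {
    codePoints: [...text].length, // iterating a string yields code points
    bytes: bytes.length,
    // Multi-byte UTF-8 sequences only use bytes >= 0x80, so every byte being
    // below 0x80 means the text is plain ASCII.
    asciiOnly: bytes.every((b) => b < 0x80),
  };
}

console.log(utf8Report("Hello"));  // { codePoints: 5, bytes: 5, asciiOnly: true }
console.log(utf8Report("みかん")); // { codePoints: 3, bytes: 9, asciiOnly: false }
console.log(utf8Report("🎉"));     // { codePoints: 1, bytes: 4, asciiOnly: false }
```

ASCII text costs exactly one byte per character, identical to its ASCII encoding. The kana and emoji examples take three and four bytes per character respectively.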
When No Text is Found
Returning a blank string to the requestor fully defeats the purpose of a localization system. The two options for handling missing text are:
A generic “text not found” message in the preferred language
The actual message, in a different language
Generic Option
This design opted not to use generic fallback text. Such text gives the user no useful information beyond, perhaps, a prompt to contact the Localization maintainer (if one even exists and updating is even possible).
Fallback Option
The design outlined in this proposal is to provide text in a commonly used language (e.g. English or Mandarin). First, this is the language that will be routed to if the user has yet to set a preference. Second, there is a good chance that the user has some proficiency with the language, or can at least use an automated translation service.
Knowing that the text fell back, via the boolean first return value of textFor, is much simpler than attempting language detection after the fact. This information is useful for certain UI cases, for example where there may be a desire to explain why localization fell back.
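With the boolean available, the fallback case becomes a simple branch on the UI side rather than a language-detection problem. The helper below is a hypothetical sketch. It takes the [wasFound, text] pair returned by textFor, represented as a JavaScript array, and attaches an explanatory notice when the text came from the fallback. The notice wording is made up.

```javascript
// Hypothetical UI helper: decide what to show for a [wasFound, text] result.
const FALLBACK_NOTICE = "No translation is available in your language; showing the default text.";

function renderFeedback([wasFound, text]) {
  return {
    text,
    // Only explain the language switch when the fallback was actually used.
    notice: wasFound ? null : FALLBACK_NOTICE,
  };
}

console.log(renderFeedback([true, "Transfert réussi"]));
console.log(renderFeedback([false, "Insufficient funds"]));
```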
Decentralized Text Crowdsourcing
In order for Ethereum to gain mass adoption, users must be able to interact with it in the language, phrasing, and level of detail that they are most comfortable with. Rather than imposing a fixed set of translations as in a traditional, centralized application, this EIP provides a way for anyone to create, curate, and use translations. This empowers the crowd to supply culturally and linguistically diverse messaging, leading to broader and more distributed access to information.
printf-style Format Strings
C-style printf templates have been the de facto standard for some time. They have wide compatibility across most languages (either in standard or third-party libraries). This makes it much easier for the consuming program to interpolate strings with low developer overhead.
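To show what the consuming program has to support, here is a deliberately small printf-style formatter in JavaScript. It is a sketch, not a full ANSI C printf. It handles only %s, %d, %% and the POSIX positional forms %N$s and %N$d. It ignores flags, widths and precision, and it throws when a referenced argument is missing. The function name format is hypothetical; a real consumer would more likely use an existing printf library for its language.

```javascript
// Minimal printf-style formatter (sketch): supports %s, %d, %% and the
// POSIX positional forms %N$s / %N$d. Flags, widths and precision are not
// implemented, and mixing positional and sequential specifiers is not checked.
function format(template, ...args) {
  let next = 0; // index of the next sequential (non-positional) argument
  return template.replace(/%%|%(?:(\d+)\$)?([sd])/g, (match, position, type) => {
    if (match === "%%") return "%";
    const index = position !== undefined ? Number(position) - 1 : next++;
    if (index < 0 || index >= args.length) {
      throw new RangeError(`no argument for ${match}`);
    }
    const value = args[index];
    // %d renders an integer; non-integral numbers are truncated toward zero.
    return type === "d" ? String(Math.trunc(Number(value))) : String(value);
  });
}

console.log(format("Satoshi's true identity is %s", "unknown"));
// Satoshi's true identity is unknown
console.log(format("Knock knock. Who's there? %1$s. %1$s who? %2$s!", "Boo", "Don't cry"));
// Knock knock. Who's there? Boo. Boo who? Don't cry!
```

The second call shows a single argument reused twice through %1$s. A purely sequential formatter could not express that without passing the value twice.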
Parameter Fields
The POSIX parameter field extension is important since languages do not share a common word order. Parameter fields enable the reuse and rearrangement of arguments in different localizations.
("%1$s is an element with the atomic number %2$d!", "Mercury", 80);
// => "Mercury is an element with the atomic number 80!"
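The English template above can be compared with a localization in a language whose word order differs to see why positional fields matter. The sketch below uses two hypothetical JavaScript helpers. One lists the argument positions a template references, in order. The other checks that none of them exceeds the number of arguments supplied. The Japanese template is an illustrative translation and is not part of this spec.

```javascript
// List the argument positions a template references, in order of appearance.
// Only the POSIX positional form %N$<conversion> is recognised.
function positions(template) {
  return [...template.matchAll(/%(\d+)\$[a-z]/g)].map((m) => Number(m[1]));
}

// True if every referenced position is backed by a supplied argument.
function fitsArguments(template, argCount) {
  return positions(template).every((p) => p >= 1 && p <= argCount);
}

const en = "%1$s is an element with the atomic number %2$d!";
// "The element with atomic number %2$d is %1$s.": the number comes first.
const ja = "原子番号%2$dの元素は%1$sです。";

console.log(positions(en)); // [ 1, 2 ]
console.log(positions(ja)); // [ 2, 1 ]
console.log(fitsArguments(ja, 2)); // true
```

Both templates consume the same ("Mercury", 80) arguments from the contract. Only the order in which they appear in the text changes.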
Simplified Localizations
Localization text does not need to use all parameters, and may simply ignore some values. This can be useful for hiding technical information from users who would otherwise find it confusing.
#!/usr/bin/env ruby
sprintf("%1$s é um elemento", "Mercurio", 80)
# => "Mercurio é um elemento"

#!/usr/bin/env clojure
(format "Element #%2$s" "Mercury" 80)
;; => Element #80
Interpolation Strategy
It is highly advisable to return the template string as is, with arguments as multiple return values or as fields in an event, leaving the actual interpolation to be done off chain.
#!/usr/bin/env node
const printf = require('printf');

const { returnValues: { templateCode, atomCode, atomicNumber } } = eventResponse;

const template = await AppText.textFor(templateCode);
// => "%1$s ist ein Element mit der Ordnungszahl %2$d!"

const atomName = await PeriodicTableText.textFor(atomCode);
// => "Merkur"

printf(template, atomName, atomicNumber);
// => "Merkur ist ein Element mit der Ordnungszahl 80!"
Unspecified Behaviour
This spec does not specify:
Public or private access to the default Localization
Who may set text
Deployer
onlyOwner
Anyone
Whitelisted users
and so on
When text is set
constructor
Any time
Write to empty slots, but not overwrite existing text
and so on
These are intentionally left open. There are valid use cases for each of these options, and restricting any of them is beyond the scope of this proposal.
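The sketch below shows how one combination of these open choices might look. It models a Localization that only its deployer may write to, and that accepts text for empty slots but refuses to overwrite existing text. It is an in-memory JavaScript model with hypothetical names (WriteOnceLocalization, owner), and it is not a recommended or required policy. Rejecting empty strings is an extra assumption: textFor returns "" for unset codes, so storing "" would be indistinguishable from leaving the slot empty.

```javascript
// Illustrative policy only: owner-restricted, write-once text slots.
class WriteOnceLocalization {
  constructor(owner) {
    this.owner = owner; // stands in for the deploying address
    this.dictionary = new Map();
  }

  set(sender, code, message) {
    if (sender !== this.owner) throw new Error("only the owner may set text");
    // "" is what textFor returns for unset codes, so refuse to store it.
    if (message === "") throw new Error("empty text cannot be stored");
    if (this.dictionary.has(code)) throw new Error(`text for ${code} is already set`);
    this.dictionary.set(code, message);
  }

  textFor(code) {
    return this.dictionary.get(code) ?? "";
  }
}

const loc = new WriteOnceLocalization("deployer");
loc.set("deployer", "0x01", "Transfer succeeded");
console.log(loc.textFor("0x01")); // Transfer succeeded
console.log(loc.textFor("0x02") === ""); // true
```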
Implementation
pragma solidity ^0.4.25;

contract Localization {
  mapping(bytes32 => string) private dictionary_;

  constructor() public {}

  // Currently overwrites anything
  function set(bytes32 _code, string _message) external {
    dictionary_[_code] = _message;
  }

  function textFor(bytes32 _code) external view returns (string _message) {
    return dictionary_[_code];
  }
}

contract LocalizationPreference {
  mapping(address => Localization) private registry_;
  Localization public defaultLocalization;

  bytes32 private empty_ = keccak256(abi.encodePacked(""));

  constructor(Localization _defaultLocalization) public {
    defaultLocalization = _defaultLocalization;
  }

  // The preference is always recorded against tx.origin
  function set(Localization _localization) external returns (bool) {
    registry_[tx.origin] = _localization;
    return true;
  }

  // Plays the role of textFor in the specification, for the current tx.origin
  function get(bytes32 _code) external view returns (bool, string) {
    return get(_code, tx.origin);
  }

  // Primarily for testing
  function get(bytes32 _code, address _who) public view returns (bool, string) {
    string memory text = getLocalizationFor(_who).textFor(_code);

    if (keccak256(abi.encodePacked(text)) != empty_) {
      return (true, text);
    } else {
      // Preferred Localization has no text: redelegate to the fallback
      return (false, defaultLocalization.textFor(_code));
    }
  }

  function getLocalizationFor(address _who) internal view returns (Localization) {
    if (Localization(registry_[_who]) == Localization(0)) {
      return Localization(defaultLocalization);
    } else {
      return Localization(registry_[_who]);
    }
  }
}