SABLE Programming Language

Example 3: Text Substitution Framework

Now let's look at a real-world, useful program. This is a framework that makes text substitutions within a string.

Imagine you're developing a text editor for programmers. The user can specify text to appear in the header and footer when printing, but some of the data needs to be parameterized. Using (simplified) Linux-style tags to represent a parameter, these settings might appear as:

Header:
Footer:

Furthermore, the user can create customized menu items or command buttons to execute a program to build a file. Again, some items will be parameterized, and we might even want to support parameters taken from the environment variables. Since we're creating commands here, when running under Windows we would want to use Windows-style substitutions. A command may appear as:

Menu Name:
Command:

We need some code which can remove the placeholders and replace them with values, e.g. replace '$PAGE' with '1' when printing the first page. In the two cases above, the format of placeholders is different, but the task is essentially the same. Plus, we can imagine needing to perform the same task in other programs with entirely different placeholder formats. This calls for a generalized framework.

Our substitution framework defines these terms:

Substitution Pattern: A part of the input text which needs to be replaced. Ex: '${TAB}'
Lookup Key: Part of the Substitution Pattern which acts as a key telling what to replace it with. Ex: 'TAB'
Substitution Result: The text to emit in place of a Pattern. Ex: A string containing only the TAB character.
Mapping: A set of STRING key to STRING value pairs, each mapping a potential Lookup Key to its Substitution Result.
Substitution Strategy: How a Substitution Pattern is identified, turned into a Lookup Key, and mapped to a Substitution Result, and what to do if there is no mapping defined.

The core of our framework is an abstract class Kuler.Text.SUBSTITUTER which scans a character stream looking for Substitution Patterns and writes the result to an output stream. Each specific subclass encapsulates a Substitution Strategy. They communicate using abstract methods whereby the SUBSTITUTER queries each character's role (if any) in a Substitution Pattern. The return value contains bit flags which the substituter uses to control its behavior.

SUBSTITUTER.SUBSTITUTION_CHAR_FLAGS
    {~ enumeration flags topsecret}
    "SUBSTITUTER uses these flags internally to control processing."
     "These flags tell the role of a character in a Substitution Pattern. A character..."
    |SubstFlag|.        "-is part of the Substitution Pattern"
    |LookupFlag|.       "-is part of the Lookup Key"
    |EndFlag|.          "-terminates the Substitution Pattern"
    |InvalidFlag|.      "-is invalid within a Substitution Pattern"

This enumeration class is nested in SUBSTITUTER, which is defined separately. This is a "flags" enumeration, so its literal values are assigned by bits. Not all flag combinations are valid. For example, a substituter should never describe a character with only #LookupFlag, because that would mean the character is part of the Lookup Key but not part of the Substitution Pattern. Therefore, this class is given >#topsecret accessibility, so that only SUBSTITUTER can reference it directly.

Specific substituters instead use this enumeration to describe characters:

SUBSTITUTER.SUBSTITUTION_CHAR_TYPE
    {~ enumeration restricted parent: SUBSTITUTION_CHAR_FLAGS}
    "Framework methods use these values to specify how to process input characters.
     See comments of the abstract methods in SUBSTITUTER for more information."
     "These are combinations of flags used internally by SUBSTITUTER."
    |NotSubst|  := 0.
    |IsSubst|   := #SubstFlag.
    |IsLookup|  := #SubstFlag | #LookupFlag.
    |EndLookup| := #SubstFlag | #LookupFlag | #EndFlag.
    |EndSubst|  := #SubstFlag | #EndFlag.
    |ExitSubst| := #EndFlag.
    |BadChar|   := #InvalidFlag.

With >#restricted accessibility, any concrete substituter subclass can access it. (This enumeration "inherits" the literals from its parent.)
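
For readers who don't yet read SABLE fluently, the relationship between the two enumerations can be sketched in Python. This is only an illustration, not part of the framework: the names `CharFlags`, `VALID_TYPES`, and `is_valid_type` are hypothetical, and the concrete bit values (1, 2, 4, 8) are an assumption, since the text above says only that the literals are "assigned by bits".

```python
from enum import IntFlag


class CharFlags(IntFlag):
    """Hypothetical stand-in for SUBSTITUTION_CHAR_FLAGS (bit values assumed)."""
    SUBST = 1     # is part of the Substitution Pattern
    LOOKUP = 2    # is part of the Lookup Key
    END = 4       # terminates the Substitution Pattern
    INVALID = 8   # is invalid within a Substitution Pattern


# Named combinations mirroring SUBSTITUTION_CHAR_TYPE.
NOT_SUBST = CharFlags(0)
IS_SUBST = CharFlags.SUBST
IS_LOOKUP = CharFlags.SUBST | CharFlags.LOOKUP
END_LOOKUP = CharFlags.SUBST | CharFlags.LOOKUP | CharFlags.END
END_SUBST = CharFlags.SUBST | CharFlags.END
EXIT_SUBST = CharFlags.END
BAD_CHAR = CharFlags.INVALID

VALID_TYPES = {NOT_SUBST, IS_SUBST, IS_LOOKUP, END_LOOKUP,
               END_SUBST, EXIT_SUBST, BAD_CHAR}


def is_valid_type(value):
    """True only for the combinations a concrete substituter may answer."""
    return value in VALID_TYPES


if __name__ == "__main__":
    # A bare LOOKUP flag is not one of the named combinations.
    print(is_valid_type(CharFlags.LOOKUP))   # prints: False
    print(is_valid_type(IS_LOOKUP))          # prints: True
```

The point of the split is visible here: the scanner tests individual bits, while strategy code only ever names one of the seven pre-approved combinations.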

Here is the assembly configuration and the definition of our main class. Most of what's here, you've seen before. We'll cover any new syntax in the next pages.

SABLE assembly
substituter.netmodule
    {~ module
        reference: 'mscorlib.dll';
        use: #System;
        use: #System.Collections.Generic;
        use: #System.IO;
        use: #System.Text}
/-#Kuler.Text
SUBSTITUTER
    {~ object abstract}
    "Root class of a text substitution framework. Instances replace text items
     within a STRING or STREAM based on mappings in a {DICTIONARY[STRING,STRING]}.
      Construct with 'Mappings' from a 'Lookup Key' to a 'Substitution Result'.
     The Lookup Key is derived from a 'Substitution Pattern' found in the
     input stream; that pattern is replaced by the mapped Substitution Result.
     How the Substitution Pattern is identified, turned into a Lookup Key, and
     mapped to a Substitution Result, and what to do if there is no Mapping, is a
     'Substitution Strategy'; a concrete subclass implements a Substitution Strategy
     by overriding certain methods."
     {~ durables restricted}
    |mapping| {DICTIONARY[STRING,STRING]}.  "-Maps Lookup Keys to Substitution Results"
    |substPattern| := {STRING_BUILDER} new.
    |lookupKey|    := {STRING_BUILDER} new.
     {~ fields topsecret}
    |charPosition|  := -1.      "-Current position on the input"
    |substPosition| := -1.      "-Start position of the current Substitution Pattern"
+-'constructors' restricted
newMapping: mapping {DICTIONARY[STRING,STRING]}.
    "Initialize with a set of Lookup Key -> Substitution Result mappings.
     This class is allowed to modify the :mapping, so if the caller needs
     it intact, it should pass in a copy."
     My.mapping := mapping.
    Me initialize.
=-'framework' restricted
initialize.
    {~ virtual}
    {~ cilName: 'Initialize'}
    "Subclass-specific initialization run by constructors after
     the internal :mapping is assigned. This version does nothing."
initiationCharUsage: ch {CHAR}  ^{SUBSTITUTION_CHAR_TYPE}.
    {~ abstract}
    {~ cilName: 'InitiationCharUsage'}
    "How should :ch be used as an Initiation Character? The character...
     #NotSubst  = is not part of a Substitution Pattern; emit it unchanged
     #IsSubst   = starts a Pattern, but is not part of the Lookup Key
     #IsLookup  = starts a Pattern and is part of the Lookup Key
     #EndLookup = starts and ends a Pattern and is part of the Lookup Key
     :ch will never be $END."
    {~ require: 'ch not end-of-stream' as: [ch ~= $END]}
    {~ ensure: 'valid result' as:
        [Result inByBits: ##(#NotSubst, #IsSubst, #IsLookup, #EndLookup)]}
patternCharUsage: ch {CHAR}  ^{SUBSTITUTION_CHAR_TYPE}.
    {~ abstract}
    {~ cilName: 'PatternCharUsage'}
    "How should :ch be used as a Pattern Character? The character...
     #IsSubst   = is part of the Pattern but not the Lookup Key
     #IsLookup  = is part of the Lookup Key
     #EndLookup = ends the Pattern and is part of the Lookup Key
     #EndSubst  = ends the Pattern but is not part of the Lookup Key
     #ExitSubst = is not part of the Substitution Pattern
     #BadChar   = is illegal inside the Substitution Pattern
      If end-of-stream ($END) is reached in error, answer #BadChar. If not an error,
     the result must be #ExitSubst."
    {~ ensure: 'valid result' as:
        [Result inByBits: ##(#IsSubst, #IsLookup, #EndLookup, #EndSubst, #ExitSubst, #BadChar)]}
    {~ ensure: '$END not in pattern' as: [ch = $END implies: [Result & #SubstFlag = 0]]}
substitutionResult  ^{STRING}.
    {~ abstract}
    {~ cilName: 'SubstitutionResult'}
    "The Substitution Result for the current :lookupKey, which was
     taken from a Substitution Pattern found in the input.
     It may come from the Mapping or from an algorithm."
throw: message {STRING}  ^{NO_RETURN}.
    {~ cilName: 'Throw'}
    "Throw a SUBSTITUTION_EXCEPTION with the given message and
     the character position of the current Substitution Pattern."
     ^^^{SUBSTITUTION_EXCEPTION} newMessage: message charPosition: substPosition
=-'substitute'
substitute: instring {STRING}  ^{STRING}.
    {~ cilName: 'Substitute'}
    "Return :instring with substitutions made according to this class's
     Substitution Strategy and the Mapping given at initialization time."
     |size| := instring size.
    size += (size / 10 max: 200).
    |output| := {STRING_WRITER} newBuilder: ({STRING_BUILDER} newCapacity: size).
    Me substitute: ({STRING_READER} on: instring) into: output.
    ^output to_STRING
substitute: input {TEXT_READER} into: output {TEXT_WRITER}.
    {~ cilName: 'Substitute'}
    "Stream :input to :output making substitutions according to this subclass's
     Substitution Strategy and the Mapping given at initialization time."
     charPosition := substPosition := -1.
    "Set up local access to these fields."
    |substPattern| := My.substPattern.
    |lookupKey|    := My.lookupKey.
     |ch| := input readChar. charPosition += 1.
     "Read input stream until we hit the end."
    [ch ~= $END] whileTrue:
        [|usage| := My initiationCharUsage: ch.
         usage = #NotSubst
          then:
            [output write: ch.
             ch := input readChar. charPosition += 1]
          else:
            [substPosition := charPosition.
             substPattern clear.
             lookupKey clear.
              "Read until we hit the end of the pattern."
             [usage & #SubstFlag  ~= 0 then: [substPattern append: ch].
              usage & #LookupFlag ~= 0 then: [lookupKey append: ch].
              usage & #EndFlag = 0
              ] whileTrue:
                [ch := input readChar. charPosition += 1.
                 usage := My patternCharUsage: ch.
                 usage & #InvalidFlag ~= 0 then:
                    [^^Me throw:
                        'Substitution has an illegal character after "' + substPattern + '"'] ].
              "If the substitution pattern ended with a character which is part
              of the pattern, read the next character."
             usage & #SubstFlag ~= 0 then: [ch := input readChar. charPosition += 1].
              output write: My substitutionResult]].
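
As a reading aid, here is a minimal Python sketch of the same scanning loop, paired with a percent-style strategy so that it can be run. It is a loose translation under stated assumptions, not the framework itself: the Python names are hypothetical, the flag bit values are assumed, an empty string stands in for $END, `ValueError` stands in for SUBSTITUTION_EXCEPTION, and the mapping is copied rather than modified in place.

```python
import io
from abc import ABC, abstractmethod

# Assumed bit values for the four flags, and the combinations used below.
SUBST, LOOKUP, END, INVALID = 1, 2, 4, 8
NOT_SUBST, IS_SUBST, IS_LOOKUP = 0, SUBST, SUBST | LOOKUP
END_SUBST, BAD_CHAR = SUBST | END, INVALID


class Substituter(ABC):
    def __init__(self, mapping):
        self.mapping = dict(mapping)   # copy; the SABLE version keeps the original
        self.subst_pattern = []
        self.lookup_key = []

    @abstractmethod
    def initiation_char_usage(self, ch): ...

    @abstractmethod
    def pattern_char_usage(self, ch): ...

    @abstractmethod
    def substitution_result(self): ...

    def substitute(self, text):
        out = io.StringIO()
        self.substitute_into(io.StringIO(text), out)
        return out.getvalue()

    def substitute_into(self, reader, writer):
        ch = reader.read(1)            # '' plays the role of $END
        while ch != "":
            usage = self.initiation_char_usage(ch)
            if usage == NOT_SUBST:
                writer.write(ch)
                ch = reader.read(1)
                continue
            self.subst_pattern.clear()
            self.lookup_key.clear()
            while True:                # statements first, loop test in the middle
                if usage & SUBST:
                    self.subst_pattern.append(ch)
                if usage & LOOKUP:
                    self.lookup_key.append(ch)
                if usage & END:
                    break
                ch = reader.read(1)
                usage = self.pattern_char_usage(ch)
                if usage & INVALID:
                    seen = "".join(self.subst_pattern)
                    raise ValueError(
                        f'Substitution has an illegal character after "{seen}"')
            # If the ending character belonged to the pattern, move past it.
            if usage & SUBST:
                ch = reader.read(1)
            writer.write(self.substitution_result())


class PercentSubstituter(Substituter):
    """%KEY% patterns; '%%' yields '%'; unmapped keys are emitted unchanged."""

    def __init__(self, mapping):
        super().__init__(mapping)
        self.mapping[""] = "%"

    def initiation_char_usage(self, ch):
        return IS_SUBST if ch == "%" else NOT_SUBST

    def pattern_char_usage(self, ch):
        if ch.isalnum() or ch == "_":
            return IS_LOOKUP
        if ch == "%":
            return END_SUBST
        return BAD_CHAR                # includes end of input ('')

    def substitution_result(self):
        key = "".join(self.lookup_key)
        return self.mapping.get(key, "".join(self.subst_pattern))


if __name__ == "__main__":
    demo = PercentSubstituter({"Last": "Crane"})
    print(demo.substitute("%Last% is 100%% nice"))   # prints: Crane is 100% nice
```

Note that the scanner never inspects characters itself; every decision comes back from the strategy as a combination of flags, which is what keeps the loop independent of the placeholder syntax.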

So that you don't have to remember whether to use >#length or >#count, all collections alias these as >#size. This also uses >#max:, a macro defined on COMPARABLE objects which returns the greater of two values.

The code above uses >#+=, familiar from C-family languages but not typically seen in Smalltalk. SABLE lets you define assignment operators on a class in terms of its matching operator. The compiler simply rewrites the code, using a plain assignment when the target is a variable or the corresponding setter message when the target is a getter message. For example:

field += expression.
aReceiver value += expression.     "Gets converted to..."
field := field + (expression).
|temp| := aReceiver. temp value: (temp value + (expression)).
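
Python happens to treat augmented assignment on an attribute in a comparable way, which makes the rewrite easy to observe. The sketch below is an analogy only and says nothing about how the SABLE compiler is implemented; `Receiver`, `a_receiver`, and the logging lists are hypothetical names introduced for the demonstration.

```python
class Receiver:
    """Hypothetical object with a getter/setter pair named `value`."""

    def __init__(self):
        self._value = 0
        self.log = []                  # records each getter or setter call

    @property
    def value(self):
        self.log.append("get")
        return self._value

    @value.setter
    def value(self, new_value):
        self.log.append("set")
        self._value = new_value


receiver = Receiver()
lookups = []


def a_receiver():
    """Stands in for the `aReceiver` expression and counts its evaluations."""
    lookups.append(1)
    return receiver


# The receiver expression is evaluated once, then one get and one set follow,
# matching the shape of: |temp| := aReceiver. temp value: (temp value + (expression)).
a_receiver().value += 2 * 3

if __name__ == "__main__":
    print(receiver.log)    # prints: ['get', 'set']
    print(len(lookups))    # prints: 1
```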

Our framework needs one more class, an exception to throw on errors.

/-#Kuler.Text
SUBSTITUTER.SUBSTITUTION_EXCEPTION
    {~ object parent: APPLICATION_EXCEPTION}
    "An exception which substituters can throw. In addition to an error message,
     this carries the character position in the input text of the substitution
     pattern where the error occurred."
     {~ fields public}
    |charPosition| {INT32} {~ cilName: 'CharPosition'}.
+-'constructors'
newMessage: message {STRING} charPosition: pos {INT32}.
    {Base} newMessage: message.
    charPosition := pos.
newMessage: message {STRING}
charPosition: pos {INT32}
innerException: innerException {EXCEPTION?}.
    {Base} newMessage: message innerException: innerException.
    charPosition := pos.
+-'constructors' restricted
deserializeFrom: info {System.Runtime.Serialization.SERIALIZATION_INFO}
context: context {System.Runtime.Serialization.STREAMING_CONTEXT}.

Let's look at the implementation of two strategies. Simple, aren't they?

/-#Kuler.Text
PERCENT_SUBSTITUTER
    {~ object parent: SUBSTITUTER}
    "Concrete subclass of SUBSTITUTER which defines a substitution strategy.
      Our substitution patterns consist of %TEXT% and are entirely replaced by the
     mapped result based on the mapping for TEXT defined in the properties.
     If the mapping is not defined, then the substitution pattern is emitted unchanged.
     Also, '%%' is replaced by '%'. See method comments for more details."
+-'constructors'
newMapping: mapping {DICTIONARY[STRING,STRING]}.
    "Initialize with a set of Lookup String -> Substitution Result mappings.
     There must not be a mapping for the empty string (as key); I will add one."
=-'framework' restricted
initialize.
    {~ override}
    "Insert a mapping from an empty Lookup String to the single Initiation Character.
     This way, the Substitution Pattern '%%' will map directly to the Result '%'."
     mapping at: '' add: '%'.
initiationCharUsage: ch {CHAR}  ^{SUBSTITUTION_CHAR_TYPE}.
    {~ override}
    "Initiate on $% but don't include it in the Lookup Key."
     ^ch ~= $% then: [#NotSubst] else: [#IsSubst]
patternCharUsage: ch {CHAR}  ^{SUBSTITUTION_CHAR_TYPE}.
    {~ override}
    "Terminate on '%' but don't include it in the Lookup Key.
     Also, it's a substitution exception if the Key has anything
     but letters, digits, and underscore."
     ^IF test
        if: [ch isLetterOrDigit] then: [#IsLookup];
        if: [ch = $_]            then: [#IsLookup];
        if: [ch = $%]            then: [#EndSubst];
        else:                          [#BadChar]
substitutionResult  ^{STRING}.
    {~ override}
    "This is a simple lookup in the Mapping, and >#initialize takes care
     of the degenerate case of substituting '%' for an empty Lookup Key.
     If the Key isn't mapped, then emit the Substitution Pattern unchanged."
     ^mapping at: lookupKey to_STRING ifAbsent: [substPattern to_STRING]
/-#Kuler.Text
DOLLAR_SUBSTITUTER
    {~ object parent: SUBSTITUTER}
    "Concrete subclass of SUBSTITUTER which defines a substitution strategy.
      Our substitution patterns consist of $TEXT or ${TEXT} and are entirely replaced
     by the mapped result based on the mapping for TEXT defined in the properties.
     If the mapping is not defined, then the substitution pattern is emitted unchanged.
     Also, '$$' is replaced by '$'. See method comments for more details."
     {~ literals}
    |ValidSymbols| := '.-:'.
     {~ fields}
    |inBraces| {BOOLEAN}.
+-'constructors'
newMapping: mapping {DICTIONARY[STRING,STRING]}.
    "Initialize with a set of Lookup String -> Substitution Result mappings.
     There must not be a mapping for the dollar sign (as key); I will add one."
=-'framework' restricted
initialize.
    {~ override}
    "Insert a mapping from a dollar sign Lookup String to the single Initiation Character.
     This way, the Substitution Pattern '$$' will map directly to the Result '$'."
     mapping at: '$' add: '$'.
initiationCharUsage: ch {CHAR}  ^{SUBSTITUTION_CHAR_TYPE}.
    {~ override}
    "Initiate on $$ but don't include it in the Lookup Key."
     ^ch ~= $$ then: [#NotSubst] else: [inBraces := False. #IsSubst]
patternCharUsage: ch {CHAR}  ^{SUBSTITUTION_CHAR_TYPE}.
    {~ override}
    "If the Substitution Pattern uses braces, e.g. ${TEXT}, then the Pattern
     ends with a closing brace, and the Lookup Key can contain letters, digits,
     underscores, and ValidSymbols; any other character inside braces is invalid.
     If the Pattern does not use braces, e.g. $TEXT, the Lookup Key contains
     letters, digits, and underscores; any other character terminates the pattern."
     substPattern size = 1 then:
        [IF test
          if: [ch = ${] then: [inBraces := True. ^#IsSubst];
          if: [ch = $$] then: [^#EndLookup] ].
     ch isLetterOrDigit  then: [^#IsLookup].
    ch = $_             then: [^#IsLookup].
    inBraces then:
        [IF test
            if: [ch = $}]                   then: [^#EndSubst];
            if: [ValidSymbols contains: ch] then: [^#IsLookup];
            else: [^#BadChar] ].
    ^#ExitSubst
substitutionResult  ^{STRING}.
    {~ override}
    "This is a simple lookup in the Mapping, and >#initialize takes care
     of the degenerate case of substituting '$' for '$$'.
     If the Key isn't mapped, then emit the Substitution Pattern unchanged."
     |lookup| := lookupKey to_STRING.
    lookup isEmpty
        then: [^^Me throw: 'Substitution has an empty lookup key.'].
    ^mapping at: lookup ifAbsent: [substPattern to_STRING]

We wrap up with a little test program.

SABLE assembly
substituterTest.exe
    {~ console
        entryClass: #Kuler.Text.Test.SUBSTITUTION_TESTER method: #test;
         reference: 'mscorlib.dll';
        use: #System;
        use: #System.Collections.Generic;
         adopt: 'substituter.netmodule';
        use: #Kuler.Text}
/-#Kuler.Text.Test
SUBSTITUTION_TESTER
    {~ object static}
    "Test the substitution framework."
*-'entrypoint'
test.
    "Just print some substituted text."
     |mapping| :=
        {DICTIONARY[STRING,STRING]} new
            at: 'TITLE' put: 'Dr.';
            at: 'FI'    put: 'F.';
            at: 'First' put: 'Frasier';
            at: 'Last'  put: 'Crane';
            at: 'job'   put: 'psychiatrist';
            yourself.
     |substituter| {SUBSTITUTER} := {DOLLAR_SUBSTITUTER} newMapping: ({} newFrom: mapping).
    CONSOLE writeLine: '--- DOLLAR_SUBSTITUTER'.
    CONSOLE writeLine:
      (substituter substitute:
        'Let me introduce you to my good friend.$LINE`'
        '$FI$Last has a $$1,000,000 smile.$LINE`'
        'He is famous; you may $know him$LINE`'
        'as ${TITLE} $First $Last, a $job.$LINE`$LINE`' lineEscaped).
     substituter := {PERCENT_SUBSTITUTER} newMapping: mapping.
    CONSOLE writeLine: '--- PERCENT_SUBSTITUTER'.
    CONSOLE writeLine:
      (substituter substitute:
        'Let me introduce you to my good friend.$LINE`'
        '%FI%%Last% is a 100%% nice fellow.$LINE`'
        'He is famous; you may %know% him$LINE`'
        'as %TITLE% %First% %Last%, a %job%.$LINE`' lineEscaped).

It's Better in SABLE

This framework was originally written in Java (by this author) for purposes similar to what was described above. It was translated into SABLE to become part of the SABLE compiler. A single subclass implements the Substitution Strategy described for the >#escaped primitive.

The algorithm improved when translated to SABLE. The major contributor was the >#whileTrue: loop, which accepts a sequence of statements before the loop test expression. The second such loop in >#substitute:into: uses this. Java's "while" loop allows only a Boolean expression in that position. While the current algorithm can be translated back into Java using [while (true) {... if (test) break; ...}], this buries the test inside the loop body, making it less clear whether and where the loop will end. Java's syntax didn't help reveal this solution; instead, the author used the kludgy BufferedReader.mark() feature. This is a case where language affected the way one thought about a problem, acting as a hindrance or a help. (Note: This author does not endorse all possible conclusions of the Sapir-Whorf hypothesis.)