Macro processing (M0 pre 0.4 macro processor)

3.1 Introduction

This chapter describes how m0 processes macros, which helps to understand the characteristics and flexibility of macros in m0. It is therefore advised to read this chapter before you start to define your own macros.

3.2 How m0 copies input to output

m0 has no concept of words, atoms or tokens for finding macros. It works with Bitap algorithms, which operate on individual characters or bytes.

The input bytes are copied to the output until a macro is recognised. When a macro is recognised, its name is replaced by its definition.

in:  | | | |m|a|c|r|o| |n|a|m|e| | | | | | | | | | | | | |
           +-------------------+
                macro name
                    |
                    |
                    v

out: | | | |d|e|f|i|n|i|t|i|o|n| | | | | | | | | | | | | |
           +-------------------+
                definition

The Bitap algorithm works sequentially on a stream of bytes. Therefore a macro is recognised when the last byte of the macro name is processed. The macro name is then removed from the output. After this, the definition is placed in the output. The copying of input to output bytes continues after the definition.

The definition can have any length, including 0, and can contain any character or byte.
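The copying described above can be sketched in a few lines of Python. This is only an illustration and not the m0 implementation: the function name copy_with_macros is hypothetical, a plain suffix comparison stands in for the Bitap matcher, and macro names are assumed to be non-empty.

```python
def copy_with_macros(data: bytes, macros: dict[bytes, bytes]) -> bytes:
    """Copy input bytes to the output, replacing macro names by definitions."""
    out = bytearray()
    scan_start = 0  # output before this index is not matched again
    for byte in data:
        out.append(byte)  # every input byte is first copied to the output
        for name, definition in macros.items():
            # A macro is recognised once the last byte of its name is processed.
            if name and out.endswith(name, scan_start):
                del out[len(out) - len(name):]  # remove the name from the output
                out += definition               # place the definition instead
                scan_start = len(out)           # copying continues after it
                break
    return bytes(out)


print(copy_with_macros(b"   macro name   ", {b"macro name": b"definition"}))
# b'   definition   '
```

Because the match is purely byte based, a name such as cat would also be replaced inside concatenate in this sketch.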

3.3 Macro names

m0 has two types of macro names: normal macros and variable length macros (vlm). Although the user does not have to handle these types differently when defining a macro, they offer different possibilities that should be known. Furthermore, the use of a vlm reduces performance, because a vlm needs an additional algorithm next to the one used for normal macros.

3.3.1 Normal macros

In most macro processors whitespace is used to recognise macros, and the characters that can be used in macro names are limited to a set of allowable characters. For example, in m4 a macro name may contain only alphanumeric characters and the underscore.

The macro names in m0 can contain any byte value (0 – 255). No whitespace is needed before or after the macro name. Because of this, a macro can be recognised even when its name is embedded in a larger name.

Because any byte is allowed, a macro name can itself start and end with whitespace, which is useful in e.g. emulation. Whitespace usually comprises the space, newline, tab and other bytes in the range 0 – 0x20. A macro whose name starts and ends with whitespace is no longer recognised inside larger names.

The current implementation, which uses the Bitap algorithm, limits the length of macro names: a macro name can have a maximum of 64 positions. This should be sufficient for most uses; to quote a remark often attributed to Bill Gates: 640 Kbytes ought to be enough for anyone!
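The following sketch shows the shift-and variant of the Bitap algorithm for a single name. It is an illustration under assumptions: the text does not say why the limit is 64 positions, and the sketch merely assumes that the match state is kept in one 64-bit word (WORD_BITS). The function names are hypothetical.

```python
WORD_BITS = 64  # assumed width of the state word; the text only states the limit


def build_masks(name: bytes) -> dict[int, int]:
    """For every byte value in the name, set one bit per position it occupies."""
    if not 0 < len(name) <= WORD_BITS:
        raise ValueError("a macro name must have 1 to 64 positions")
    masks: dict[int, int] = {}
    for position, byte in enumerate(name):
        masks[byte] = masks.get(byte, 0) | (1 << position)
    return masks


def match_ends(name: bytes, data: bytes) -> list[int]:
    """Return the index of the last byte of every occurrence of name in data."""
    masks = build_masks(name)
    accept = 1 << (len(name) - 1)    # bit that marks a complete match
    limit = (1 << WORD_BITS) - 1     # keep the state within one word
    state = 0
    ends = []
    for index, byte in enumerate(data):
        # Bit i of state is set when name[0..i] matches the text ending here.
        state = ((state << 1) | 1) & masks.get(byte, 0) & limit
        if state & accept:
            ends.append(index)
    return ends


print(match_ends(b"name", b"macro name name"))  # [9, 14]
```

The state is updated once per input byte, which fits the byte-by-byte recognition described in the previous section.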

3.3.2 Variable length macros

A macro name can also be defined with a variable length. This means that the macro name recognised in the processed text can have parts that are variable in length. A variable length macro is thus a macro whose name, unlike that of a normal macro, has parts of variable length. For example, a variable length macro could recognise names such as:

Name
ame
NNName
nnnnnname
NnNname

The variable length macro can be used for recognising patterns or regular expressions. In the current implementation the variable part is limited to a single character or set of characters, although it is possible to have multiple variable parts. The possibilities are equal to the pattern parts explained later, except that a one time trigger is not supported.

A vlm has extra options compared to a normal macro. For example, the match length can be set, so that the macro uses either the shortest or the longest match in a text.
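The difference between the shortest and the longest match can be illustrated with a small Python sketch. It rests on assumptions that the text does not confirm: the variable part is taken to be zero or more characters from a set (here N and n) in front of a fixed tail (ame), and the function name match_variable_name is hypothetical.

```python
from typing import Optional


def match_variable_name(seen: str, variable: str, tail: str,
                        longest: bool = True) -> Optional[str]:
    """Match zero or more characters from `variable` followed by `tail`
    at the end of `seen`, the text processed so far."""
    if not seen.endswith(tail):
        return None                      # the fixed part does not match
    start = len(seen) - len(tail)        # shortest match: the fixed tail only
    if longest:
        # Extend the match to the left over the variable characters.
        while start > 0 and seen[start - 1] in variable:
            start -= 1
    return seen[start:]


print(match_variable_name("NNName", "Nn", "ame", longest=True))   # NNName
print(match_variable_name("NNName", "Nn", "ame", longest=False))  # ame
```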

3.4 Macros with arguments

A macro whereby only text is replaced is often too limited. Therefore most macro processors allow the use of arguments to increase the usefulness of macros.

Arguments are added to the macro with a defined syntax to be able to distinguish the arguments from other text. A macro in e.g. m4 looks like a function call as in programming languages. In m0 the syntax for arguments is not predefined. The user can define the syntax by using patterns (see Patterns).

The arguments in m0 are first placed on a stack. The first position on the stack (stack position 0) holds the name of the macro. The further positions (stack position 1 and up) hold the arguments, whereby the first argument is placed on the second position on the stack.

No limit exists for the number of arguments, except memory limitations of the computer.

In m0 the stack can be manipulated by small programs (see Programs for patterns and macros) used in patterns and in a macro. This enables the flexibility of m0, whereby the user is able to define the syntax or even define a functionality of a macro that is not built in.

In the next section the use of arguments in the definition is explained. The following figure shows symbolically the working of a macro with arguments and stack:

in:  | | | |m|a|c|r|o| |n|a|m|e| |a|r|g|u|m|e|n|t|s| | | |
           +-------------------+---------------------+
                macro name            arguments
           
                                           |          argument n
                                           |          ----------
                                           |          ...
                                           +----->    ----------
                                                      argument 2
                     +---------------------------     ----------
                     |                                argument 1
                     v                                ----------

out: | | | |d|e|f|i|n|i|t|i|o|n| | | | | | | | | | | | | |
           +-------------------+
                definition

3.5 Symbols for argument substitution

The arguments on the stack must be substituted at specific positions in the definition. The default way of doing this is similar to m4. The position where an argument should be substituted is indicated by the symbol $, followed by the number of the stack position. For example, $1 is the first argument on the stack.

These symbols are defined by default at startup. The user can change them with the builtin function chararg. For the use of this function, see Functions for setting symbols.
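As an illustration, the default substitution can be sketched in Python. The function substitute is hypothetical. Two details are assumptions not stated in the text: a number may have several digits, and a stack position that does not exist is replaced by an empty string.

```python
import re


def substitute(definition: str, stack: list[str], symbol: str = "$") -> str:
    """Replace every `symbol` followed by a number with that stack position."""
    pattern = re.compile(re.escape(symbol) + r"(\d+)")

    def replace(match: re.Match) -> str:
        position = int(match.group(1))
        # Assumption: a missing stack position yields an empty string.
        return stack[position] if position < len(stack) else ""

    return pattern.sub(replace, definition)


# Stack position 0 holds the macro name, position 1 the first argument.
stack = ["add", "x", "y"]
print(substitute("$1 + $2", stack))  # x + y
```

The symbol parameter mirrors the fact that the default symbol can be changed; substitute("#1", stack, symbol="#") gives the first argument as well.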

3.6 Recursion

Recursion is running the output of a macro through the macro processor so that possibly more macros are detected from the output of the macro. Thus if the definition of a macro contains another macro, this macro will also be expanded.

In m0 this recursion can be activated per macro. This flexibility is needed depending on the purpose of a macro.

Another type of recursion is the detection of macros while collecting arguments of a macro. With this recursion it is possible to construct arguments using other macros. Also this recursion type can be activated per macro in m0.

These recursions can cause problems when unwanted macros occur in the arguments or in the definition. In e.g. m4 this is controlled by quoting. In m0, however, no quoting is built into the macro processor. If quoting is needed in m0, a macro can be defined that functions like quoting, with the recursion of this quote macro set to not active. The M4 emulation example shows such a quoting macro (see M4 emulation example).

The recursion in m0 goes even further than what is possible in other macro processors, because the macro processor uses a bitap algorithm that works on characters or bytes. It is possible to construct a macro that will be detected and expanded by using two other macros that will together expand to the constructed macro.

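The following example, which is also used in the description of define, demonstrates this. First two macros are defined, called aa and bb. Macro aa expands to te and macro bb expands to st. A third macro test is defined that expands to Hello World!. When the input aabb is then processed with recursion active, aa and bb expand to te and st, and the resulting text test is in turn detected as a macro and expanded to Hello World!.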


Define:aa;te
Define:bb;st
Define:test;Hello World!

aabb

Recursion is one of the difficult and confusing aspects of macro processors. It is therefore important to set the possible recursions properly when defining a macro. Other macro processors use quoting to overcome issues of recursion. This is also possible in m0. Another way to stop the recursion as in the example above is the use of a virtual character in the macro. This is further explained in the description of the define function.
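The aa, bb and test example above can be simulated with the following Python sketch. It is a simplified model and not the m0 implementation: the function expand is hypothetical, a suffix comparison replaces the Bitap matcher, and recursion is modelled by pushing the definition back in front of the remaining input. A macro that expands to its own name would loop forever in this sketch.

```python
from collections import deque


def expand(text: str, macros: dict[str, tuple[str, bool]]) -> str:
    """macros maps a name to (definition, recursion active)."""
    pending = deque(text)   # characters still to be processed
    out: list[str] = []     # output produced so far
    scan_start = 0          # matching only looks at out[scan_start:]
    while pending:
        out.append(pending.popleft())
        window = "".join(out[scan_start:])
        for name, (definition, recursive) in macros.items():
            if name and window.endswith(name):
                del out[len(out) - len(name):]  # remove the macro name
                if recursive:
                    # Run the definition through the processor again.
                    pending.extendleft(reversed(definition))
                else:
                    out.extend(definition)      # definition goes straight out
                    scan_start = len(out)       # and is not scanned again
                break
    return "".join(out)


macros = {"aa": ("te", True), "bb": ("st", True), "test": ("Hello World!", True)}
print(expand("aabb", macros))  # Hello World!
```

With recursion switched off for aa and bb, the same input yields the text test, because the definitions are then not scanned for macros.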

The settings of recursion are made when defining a macro. See the function define in Functions for defining macros.

3.7 The complete working of macros in m0

In the previous sections the main characteristics and some flexibility of macros in m0 have been described. The macros in m0 are however more complex and flexible. This section tries to explain the complete working of macros.

An important part of macros that gives a big flexibility is the use of patterns and small programs in macros. The use of patterns is even more complex than the macros and therefore is described in a separate chapter, see Patterns. The instructions for the small programs can be found in Programs for patterns and macros.

in:  | | | |m|a|c|r|o| |n|a|m|e|(|a|r|g|u|m|e|n|t|s|)| | |
           +-------------------+-+-------------------+
                macro name     : :     arguments
           +-+                 +-+            
         pre-size            post-size     |
                               : :         v
                               : :  
                               | | | | | | | | | | | | |
                               +-+---+-----+-+----------
                           pattern for collecting arguments +
                                     small programs
    
                                           |
                                           |          argument n
                                           +----->    ----------
                                                         ...
       optional program for stack manipulation -->    ----------
                                                      argument 3
                                     +------or----    ----------
                                     |      |         argument 2
macro definition: | |$|1| |$|2| |    |      |         ----------  <--+
                  +-------------+    |      |         argument 1  ---|--+
                   |                 |      |         ----------     |  |
                   |                 |      |                        |  |
                   |                 |      +--->  builtin function -+  |
                   |                 |                   |              |
                   |    +----------- or --+              +---------+    |
                   |    |                 |                        |    |
                   |    |                 |                        |    |
           +-------or -----------------------> or <--argument 1---------+
           |            |                 |    |                   |
           v            v                 v    v                   |
        default substitution      pattern substitution +           |
                                     small programs                |
                |                         |                        |
                |                +--------+                        |
                |                v                                 |
                +--------------> or <------------------------------+
                                 |    
                                 v    
          
out: | | | | |d|e|f|i|n|i|t|i|o|n| | | | | | | | | | | | |
           +-+---------------------------+
           +-+   replacement of macro
       pre-size

3.7.1 pre-size and post-size

At the top of the figure a pre-size and post-size of the macro name can be seen. These are options set during the definition of a macro.

The pre-size is the part of a macro name that reappears in the output. It is however still part of the macro name to detect the macro. Normally this can be used to recognise word boundaries by having a pre-size character containing all bytes except the valid characters possible in a macro name.

The post-size is the part of a macro name that reappears in the collection of arguments. Normally this is the character to start argument collection, so that a macro can be defined which requires arguments. It is also used to function as a word boundary like the pre-size.

If the collecting of arguments is aborted (see abort instruction in Collecting argument strings), the post-size is returned to the input. This is used for macros that have optional arguments to collect.

The possibility to abort the collecting of arguments has as a consequence that macros will not be called during the processing of the post-size. If this was not the case, macros could be called before the abort, whereas they should be called after the abort. The post-size is however still used to detect a (part of a) macro.
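How a detected name is divided into pre-size, actual name and post-size can be sketched as follows. The function split_match and the example name are hypothetical; the sketch only shows the division and does not model the detection or the abort of argument collection.

```python
def split_match(matched: str, pre_size: int, post_size: int) -> tuple[str, str, str]:
    """Split the detected text into (pre-size part, name, post-size part)."""
    if pre_size < 0 or post_size < 0 or pre_size + post_size > len(matched):
        raise ValueError("pre-size and post-size must fit in the macro name")
    end = len(matched) - post_size
    return matched[:pre_size], matched[pre_size:end], matched[end:]


pre, name, post = split_match(" foo(", pre_size=1, post_size=1)
# pre  reappears in the output
# name is the part without pre-size and post-size
# post reappears in the collection of arguments, or is returned
#      to the input when the collecting of arguments is aborted
print((pre, name, post))  # (' ', 'foo', '(')
```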

3.7.2 Pattern for collecting arguments

After the macro name is detected, the collecting of arguments is the first thing executed. This collecting is however only executed if a pattern for the collecting is defined in the macro.

The pattern is used to recognise the delimiters of the arguments and copy the arguments to the stack. For example in m4 the delimiters are "(", "," and ")". The default pattern available at startup uses the delimiter ":" to define the start of the argument collection, the ";" to separate the arguments and the "\n" (newline) to end the collecting of arguments.
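The effect of the default pattern can be sketched in Python. The function collect_arguments is hypothetical and fixed to the delimiters ":", ";" and newline, whereas in m0 the pattern is user definable. It is assumed here that the closing newline is consumed, that the end of the input also ends the collecting, and that no arguments are collected when the ":" is missing.

```python
def collect_arguments(name: str, following: str) -> tuple[list[str], str]:
    """Return (stack, remaining input) for the text that follows a macro name."""
    stack = [name]                      # stack position 0 holds the macro name
    if not following.startswith(":"):
        return stack, following         # assumption: no arguments collected
    body, _newline, rest = following[1:].partition("\n")
    stack.extend(body.split(";"))       # ";" separates the arguments
    return stack, rest


print(collect_arguments("Define", ":aa;te\nmore text"))
# (['Define', 'aa', 'te'], 'more text')
```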

Small Forth-like programs are used to copy the input to the stack. This use of programs allows for flexibility in the syntax of the argument collection and enables extra functionality that is not available in the builtin macros.

As an example of the flexibility an if else macro can be made using the pattern and the programs. See the ifelse macro of m4 in M4 emulation example as an example how this can be programmed. Another example is the eval macro of the m4 emulation that uses a big pattern for all the mathematical and logical operators.

3.7.3 The stack

The stack is the data store for all the programs. The stack is also the data store for the arguments of a macro.

There are actually eight stacks, whereby the first stack is used for the arguments. The additional stacks are used for storing other data during the collecting of arguments.

The first position on the first stack (position 0) holds the name of the macro. This is the name as in the input minus the pre-size and post-size. It can be used to get the actual name of the macro when multiple characters were used in the macro name. Further positions on the first stack are used to hold arguments. These arguments are placed on the stack by the small programs of patterns and can also be manipulated by these programs.

Also the optional program linked to a macro can be used to manipulate the stack. An example use is to set default values for a macro that are not set by the argument collection.

The life of the stack is limited. It is started when the macro is recognised and closed after the arguments in the definition are substituted. It is thus not possible to exchange information between macros through the stack.

3.7.4 Builtin functions

The builtin functions perform functions that are normally not possible using user defined macros. The most important are the functions that define macros and patterns.

The builtin functions get their arguments from the stack to perform their function. The functions and the used arguments are described in Builtin functions.

Some functions will have output and some will also put output on the stack. Normally the output of these functions will appear in the output, but this can be suppressed by defining a definition string. The definition string can use the information on the stack to output the wanted information.

The use of a builtin function and the definition string at the same time is not excluded. The builtin function will be executed before the argument substitution.

If the builtin function is defined in the macro it will unconditionally be executed and if it is not defined in the macro it will obviously not be executed.

3.7.5 Default argument substitution

To have a syntax for argument substitution available a default argument substitution is defined similar to the substitution in m4. The default argument substitution will use the definition text to substitute default codes with the arguments. If no definition string is defined, then there will obviously be no argument substitution.

The default is used when no pattern substitution is defined. The default characters used for argument substitution include the symbol $ followed by the stack position (see Symbols for argument substitution).

These default characters can be changed using the builtin function chararg. This function is described in Functions for setting symbols.

3.7.6 Pattern argument substitution

Instead of the default argument substitution a pattern argument substitution will be used when defined in a macro. This can be used for argument substitution with a syntax which you defined yourself. A pattern to handle this should be made before this can be defined in the macro.

If no definition string is defined in the macro, the first argument on the stack is used as replacement text. This can be used to have a macro whereby the first argument is e.g. a formatting text that is used by the pattern to substitute the arguments.
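The order of the steps described in this section can be summarised in a Python sketch. It is a model under assumptions and not the m0 implementation: the class Macro and the function replacement are hypothetical, Python callables stand in for the small programs, builtins and patterns, the optional program is assumed to run before the builtin function, and the output of a builtin is assumed to be suppressed whenever a substitution takes place.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

Stack = list[str]


@dataclass
class Macro:
    definition: Optional[str] = None
    program: Optional[Callable[[Stack], None]] = None   # optional stack program
    builtin: Optional[Callable[[Stack], str]] = None    # returns its output
    pattern_substitution: Optional[Callable[[str, Stack], str]] = None


def default_substitution(text: str, stack: Stack) -> str:
    """Replace $n by stack position n (missing positions become empty)."""
    def pick(m: re.Match) -> str:
        n = int(m.group(1))
        return stack[n] if n < len(stack) else ""
    return re.sub(r"\$(\d+)", pick, text)


def replacement(macro: Macro, stack: Stack) -> str:
    """Compute the replacement text once the arguments are on the stack."""
    if macro.program:
        macro.program(stack)             # e.g. set default values
    # A defined builtin is always executed, before any substitution.
    builtin_output = macro.builtin(stack) if macro.builtin else ""
    if macro.pattern_substitution:
        text = macro.definition
        if text is None:                 # no definition string:
            text = stack[1] if len(stack) > 1 else ""  # first argument is used
        return macro.pattern_substitution(text, stack)
    if macro.definition is not None:
        return default_substitution(macro.definition, stack)
    return builtin_output                # no definition: builtin output appears


print(replacement(Macro(definition="$2-$1"), ["swap", "a", "b"]))  # b-a
```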

3.8 Sets of macros

Only one macro set can be active at any one time. By switching between macro sets it is possible to switch between groups of macros to be active. It is also possible to call macros in another set by using a special macro that calls these macros.

Macro sets thus make it possible, for example, to prevent macros from being triggered and expanded when this is not wanted, or to switch between macros with a completely different syntax.

3.9 At startup

The macro 0_define exists at startup to name builtins and define new macros. It uses the pattern 0 to collect arguments and is defined in macro set 0. The pattern 0 uses the ":" as a start character for argument collection, the ";" as separator between the arguments and a newline (\n) as the stop character of the argument collection.

In the macro set 0 the characters for argument substitution and character sets are the default characters (see Functions for setting symbols).

3.10 Priority

It is possible to define a macro with a name whereby this name is part of the name of another macro.

The question in this situation is: will both macros be expanded or only one? And if only one, which one?

A macro is triggered when the last character of its name is matched. So if one macro is triggered at a position in the text before the position where another macro would be triggered, the first macro is expanded. This expansion will probably replace text that is part of the name of the second macro, so that the second macro is not triggered.

When two macros trigger at the same position in the text, the macro which has the longest name will be expanded and the other not. If also the length of the macro names is the same, then the first macro defined will be expanded and the other not.
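The rule for macros that trigger at the same position can be written down as a small Python sketch. The class Trigger and the function pick are hypothetical; each trigger is described abstractly by the length of the macro name and by the order in which the macro was defined.

```python
from dataclasses import dataclass


@dataclass
class Trigger:
    label: str        # hypothetical identifier of the macro
    name_length: int  # length of the macro name
    order: int        # definition order, 0 is the macro defined first


def pick(triggers: list[Trigger]) -> Trigger:
    """Choose the macro to expand among macros triggered at one position."""
    # Longest name first; for equal lengths the macro defined first wins.
    return min(triggers, key=lambda t: (-t.name_length, t.order))


print(pick([Trigger("short", 2, 0), Trigger("long", 4, 1)]).label)      # long
print(pick([Trigger("second", 3, 1), Trigger("first", 3, 0)]).label)    # first
```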
