Next: , Previous: , Up: Top   [Contents]

2 Algorithms in m0

Two implementations of the bitap algorithm are used in m0. The first is called vertical bitap in this document. The second is called horizontal bitap in this document. This is also the version that is normally discussed in the articles explaining the algorithm.

The following sections introduce these bitap algorithms and their use for macro recognition in m0.

2.1 Vertical bitap

The look-up result only requires a single bit. The word size in computers is normally far larger; nowadays 64 bits is common. It is possible to use all the bits in the cpu word so that multiple words can be searched in one table look-up. With 64 bits in a cpu word, it is thus possible to search up to 64 different search words. Every bit position in the cpu word is used for a search word. This is displayed in the following figure:

cpu word:               1 2 3 4 5 6 ...
bit position  
      1      search 1:  H e l l o
      2      search 2:  w o r l d
      3      search 3:  h a i
      4      search 4:  s o m e t h i n g
      5         .
      .         .
      .         .
      64     search 64: l a s t   o n e

The cpu word is displayed in a vertical position in the figure. Therefore this is called the vertical bitap in this document.

What can be seen from using the algorithm in this way is:

In a macro processor a large number of macros could exist. Using this algorithm it would be possible to quickly find all the macros in the text. This algorithm is therefore used in m0, but not without a first pass using the horizontal bitap explained in the next part.

2.2 Horizontal bitap

Multiple times the vertical bitap will need to use the same character in the text for look-up. This is similar to a naive string search, and its efficiency can be improved.

A solution to this is using the bit positions in a cpu word for the search word, instead of using the bit positions for different words.

The following figure illustrates this:

search word: wor

text:       | | | |H|e|l|l|o| |w|o|r|l|d | | | |
                           |
                           v
search                 r o w
  bits                 0 1 0
                             |
                             v
search                   r o w
  bits                   0 0 0
                               |
                               v
search                     r o w
  bits                     0 0 1
                                 |
                                 v
search                       r o w
  bits                       0 1 0
                                   |
                                   v
search                         r o w
  bits                         1 0 0
  

In the figure can be seen that the separate bits of the search word will become true when the character in the text matches. To test if the whole search word has been found it is necessary to check if the bits will sequentially become true. If this is the case up to the last bit, then the search word is found.

This can easily be accomplished by a left shift of the previous result with a bit AND of the new check. What is needed is:

The following figure illustrates this:

search word: wor

text:       | | | |H|e|l|l|o| |w|o|r|l|d | | | |
                           |
                           v
search                 r o w
check bits             0 1 0
mask                   0 0 1
result << 1            0 0 0
result OR=   mask      0 0 1
result AND= check      0 0 0
                             |
                             v
search                   r o w
check bits               0 0 0
mask                     0 0 1
result << 1              0 0 0
result OR=   mask        0 0 1
result AND= check        0 0 0
                               |
                               v
search                     r o w
check bits                 0 0 1
mask                       0 0 1
result << 1                0 0 0
result OR=   mask          0 0 1
result AND= check          0 0 1
                                 |
                                 v
search                       r o w
check bits                   0 1 0
mask                         0 0 1
result << 1                  0 1 0
result OR=   mask            0 1 1
result AND= check            0 1 0
                                   |
                                   v
search                         r o w
check bits                     1 0 0
mask                           0 0 1
result << 1                    1 0 0
result OR=   mask              1 0 1
result AND= check              1 0 0
                               ^  
                               search word found

The cpu word is displayed in a horizontal position in this figure. Therefore this is called the horizontal bitap in this document.

What can be seen from using the algorithm in this way is:

This algorithm is used in m0 for a first pass for recognizing macros and directly in patterns. The following parts will show how the two versions are combined in m0.

2.3 Two pass macro recognition

For performant macro recognition in a macro processor it is necessary to be able to recognise a large number of macros and also to not do too much unnecessary work.

The horizontal bitap is performant, but only if the macros fit into one or a limited number of cpu words. This will often not be the case.

The vertical bitap can hold a large number of macros, but performance is throttled by unnecessary checks. Similar improvements as used for a naive search could be used. However for m0 a solution is chosen to combine both bitaps in a two pass method.

For the first pass all search words with the same length are combined in a cpu word. This makes use of the ability to have multiple possible characters in a single position. Multiple combined words of different length are put in the cpu word.

This is illustrated in the following figure with the example search words #, `, aa, bb, hai, two, one, four, hello:

size of search word           5|      4|    3|  2|1|
cpu word             | | | | | | | | | | | | | | | |
examples              o l l e h|r u o f|i a h|a a|#|
                                       |o w t|b b|`|
                                       |e n o|

In this example 15 characters of the cpu word are used to hold all search words with a length up to 5. In a single 64 bit cpu word search words with a length up to 10 can be put (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = 55).

However it is not clear which macro was recognised if multiple search words with the same length are searched. Besides this, the above method leads to false triggers. For example "ab" or "ba" will in the example above also trigger the search word with length 2.

When this first pass triggers, the length of the search word and the position in the text are known. The second pass using the vertical bitap can than efficiently find the macro or check for a false trigger.

The two pass method can thus be summarized as:

2.4 Two pass macro recognition in m0

Up to a word length of 10 can be put in a single 64 bit cpu word using the two pass method. Using two 64 bit cpu words results in lengths up to 15 (11 + 12 + 13 + 14 + 15 = 65, 55 + 65 = 115) by properly dividing the lengths over the two words.

In m0 macros can have a length up to 64 characters. Using more cpu words to fit all these, results in too many cpu words to be efficient. Most macros will be relatively short and be shorter than 15 characters. Using cpu words for words up to 64 characters length is thus also wasting a lot of space and processing time.

In m0 a third cpu word is used to hold all words with a length of 16 up to 64. These words overlap each other from the lower characters. This solution is illustrated in the following figure:

cpu word chars          |64 ...... 18|17|16|15 ..... 1|
search word length 16                   |-------------|
search word length 17                |----------------|
search word length 18             |-------------------|
        .                                .
        .                                .
search word length 64   |-----------------------------|

This leads to a lot of false triggers, but this should not be a big issue if not many macros with a length of more than 15 are used, or if the text does not have a lot of these larger macros.

The vertical bitap is split up in two sets; one for words with a length of up to 15 and a second for words with a length of 16 up to 64.

Of course also other choices can be made regarding the number of cpu words for the first pass and the number of sets for the the vertical bitap.

2.5 Further possibilities for two pass macro recognition

The choices described above lead to:

This is using a relative small amount of memory for holding the macros. In this document this small set is named small-15.

It is also possible to use a vertical bitap set separately for every length of the words. This is named large in this document. If still using the two cpu words for length up to 15, then this is named large-15.

It is also possible to use 1 vertical bitap for 1 cpu word. With again the 2 cpu words for length up to 15 this is named medium-15.

Extending these possibilities leads to the following solutions for different usages:

up to length     |   small      |   medium     |   large
separate words   |              |              |
-----------------+--------------+--------------+--------------
        10       |              |              | cpu words: 2
                 |              |              | vertical : 11
-----------------+--------------+--------------+--------------
        15       | cpu words: 3 | cpu words: 3 | cpu words: 3
                 | vertical : 2 | vertical : 3 | vertical : 16
-----------------+--------------+--------------+--------------
        22       | cpu words: 5 | cpu words: 5 | cpu words: 5
                 | vertical : 2 | vertical : 5 | vertical : 23
-----------------+--------------+--------------+--------------
        27       |              | cpu words: 7 | cpu words: 7
                 |              | vertical : 7 | vertical : 29
-----------------+--------------+--------------+--------------
        31       |              | cpu words: 9 | cpu words: 9
                 |              | vertical : 9 | vertical : 32
-----------------+--------------+--------------+--------------
        64       |              |              | cpu words: 33
                 |              |              | vertical : 64

Which choice will be the most performant will depend mostly on the workload. For a huge number of macros in a large text with many macros the large choice will probably be most performant. The performance will also depend on the hardware; a large set requires a large amount of memory that might induce more cpu cache misses and reduce performance.

Till so far m0 uses only the small-15 choice.


Next: , Previous: , Up: Top   [Contents]