Bitap for patterns in m0 (Bitap algorithms used in M0)

The patterns used in m0 are normally not large and can fit easily in one cpu word. Therefore the horizontal bitap is sufficient for use in the patterns. The horizontal bitap also allows possibilities that are not possible with the two pass method for macros.

The following sections explain the use of the bitap algorithm for the patterns in m0.

3.1 Patterns

A pattern is a collection of pattern parts. The pattern parts are the search words that must be recognised to trigger a program in m0. The pattern parts can thus be easily used in the horizontal bitap.

This gives the characteristics of the horizontal bitap, which are the most relevant for the patterns:

These are nice features for patterns, but can be extended for more functionality. The following parts show how extra functionalities are added to the horizontal bitap.

3.2 Zero or one character

The possibility to have a character or not on a position is implemented by a mask and an extra shift. The shift in the bitap will normally bring the result from the previous character. If this previous character should have the possibility to not exist, then the result of the character before the previous character should also be used.

This is shown in the following figure using a search for "a[a]?b", whereby the "[a]?" can be a single or no "a" character:

search word: a[a]?b

text:                       | | |a|b|b| | |
                                 |
                                 v
search                       b ? a
check bit                    0 1 1
mask                         0 0 1
mask_z                       0 1 0
result << 1                  0 0 0
result_z = result AND mask_z 0 0 0
result_z << 1                0 0 0
result OR=   mask            0 0 1
result OR= result_z          0 0 1
result AND= check            0 0 1     found the a
                                   |
                                   v
search                         b ? a
check bits                     1 0 0
mask                           0 0 1
mask_z                         0 1 0
result << 1                    0 1 0
result_z = result AND mask_z   0 1 0   the previous a
result_z << 1                  1 0 0   shifted the previous a
result OR=   mask              0 1 1
result OR= result_z            1 1 1   the previous a sets bit
result AND= check              1 0 0
                               ^
                               found ab

3.3 One or more characters

The possibility to have one or more characters in a single position uses a similar method as the zero or one character, but needs a latched memory. This latched memory should be true so long as the character in that position is still correct.

This is shown in the following figure using a search for "a[c]*b", whereby the "[c]*" can be a single or multiple "c" characters:

text:                       | | |a|c|c|b | |
                                 |
                                 v
search                       b * a
check bit                    0 0 1
mask                         0 0 1
mask_m                       0 1 0
result << 1                  0 0 0
result OR=   mask            0 0 1
result OR= mem << 1          0 0 1     memory still empty 
result AND= check            0 0 1     found a
mem AND= check               0 0 0
mem OR= result AND mask_m    0 0 0  
                                   |
                                   v
search                         b * a
check bit                      0 1 0
mask                           0 0 1
mask_m                         0 1 0
result << 1                    0 1 0
result OR=   mask              0 1 1
result OR= mem << 1            0 1 1
result AND= check              0 1 0
mem AND= check                 0 0 0
mem OR= result AND mask_m      0 1 0   memory for c is true  
                                     |
                                     v
search                           b * a
check bit                        0 1 0
mask                             0 0 1
mask_m                           0 1 0
result << 1                      1 0 0
result OR=   mask                1 0 1
result OR= mem << 1              1 0 1
result AND= check                0 0 0   acb is not found
mem AND= check                   0 1 0
mem OR= result AND mask_m        0 1 0   memory for c is still true  
                                       |
                                       v
search                             b * a
check bit                          1 0 0
mask                               0 0 1
mask_m                             0 1 0
result << 1                        0 0 0
result OR=   mask                  0 0 1
result OR= mem << 1                1 0 1
result AND= check                  1 0 0  found accb
mem AND= check                     0 0 0  memory is reset
mem OR= result AND mask_m          0 0 0

A zero or more characters can be detected by combining the zero or one and the one or more detection.

3.4 Pattern part depending on another pattern part

From the previous explanations can be seen that the shift and the AND are the main characteristic of the bitap to sequentially test the characters. The start of a word and thus of a sequence is done by setting a start bit. This is normally done using a bit mask (init bitmask), and can also be done in a different way as the zero or one and the one or more characters show.

It is also possible to start a pattern part when another pattern part is triggered. This can be implemented using a bit mask. In m0 it is implemented by directly placing the depending pattern part behind the previous part. The default shift will in this case start the depending part, because the start bit is not set by the init bitmask. The two pattern parts are thus working like one word.

3.5 Regular expression possibilities

By using a mask with start bits, which is activated by a triggered pattern part, it would be possible to make many different links between pattern parts. Thus when a certain pattern part is triggered, it could start one or multiple other pattern parts or even itself.

This functionality would allow to make regular expressions possible using the bitap as described till so far.

3.6 Variable length macros

The variable length macros in m0 use the same bitap algorithm as the patterns. This enables the variable length macros and gives these macros the same possibilities as the patterns.

It is expected that not too much variable length macros are normally used and that therefore the number of cpu words that is needed is limited.