Next: , Up: Top   [Contents]

1 Introduction

1.1 For the reader

This document provides insight in the algorithms used to detect macros and patterns in the m0 macro processor.

Useful for the curious and persons who want to use these for similar purposes.

Bitap algorithms are already known in different implementations. It is also known as the shift-or, shift-and or Baeza-Yates–Gonnet algorithm (see e.g. wikipedia Bitap algorithm).

In Algorithms in m0 the bitap algorithms used in m0 in general and for detecting macros are described. In Bitap for patterns in m0 the implementation of the bitap algorithm for the patterns is described.

1.2 Main characteristics of the bitap algorithm

In many search algorithms the characters in a text are compared with the characters to be found using a compare instruction. A compare instruction is not used in the bitap algorithm, instead:

The bitap algorithm uses a look-up table instead of a compare.

The advantage of a look-up table is that multiple characters can be checked in a single position with a single look-up. The table has an index the size of the possible values of a character. When using all possible values in a byte the index size is thus 256.

The result of a look-up is either true or false, so that a single bit value is sufficient for the result. To check a word it is only necessary to use a bit AND to combine all the result of the positions of the word and if the result is true, then the word is found.

The following figure illustrates this:

search word: wor

text:  | | | |H|e|l|l|o| |w|o|r|l|d | | | |
search        w o r
  bits        0 0 0 = 0
search          w o r
  bits          0 0 0 = 0
search            w o r
  bits            0 0 0 = 0
search              w o r
  bits              0 1 0 = 0
search                w o r
  bits                0 0 0 = 0
search                  w o r
  bits                  0 0 0 = 0
search                    w o r
  bits                    1 1 1 = 1   found: wor

To check if a word has been found:

A bit AND is used on the look-up results to test if the word is found.

This AND is the "and" in the name of the shift-and. The shift is used to put the bits over each other. How this works is explained in the following parts.


Next: , Up: Top   [Contents]