FBB::Pattern(3bobcat)
Pattern matcher
(libbobcat-dev_6.13.00)
2005-2026
NAME
FBB::Pattern - Performs RE pattern matching
SYNOPSIS
#include <bobcat/pattern>
Linking option: -lbobcat
DESCRIPTION
Pattern objects may be used for Regular Expression (RE) pattern
matching. The class is a wrapper around the regcomp(3) family of
functions. By default it uses `extended regular expressions', requiring you to
escape multipliers and bounding-characters when they must be interpreted as
ordinary characters (i.e., *, +, ?, ^, $, |, (, ), [, ], {, } must be
escaped when used as literal characters).
The Pattern class supports the following (Perl-like)
special escape sequences:
\b - indicating a word-boundary
\d - indicating a digit ([[:digit:]]) character
\s - indicating a white-space ([[:space:]]) character
\w - indicating a word ([[:alnum:]]) character
The corresponding capitals (e.g., \W) define the complementary
character sets. Note that the capitalized character set shorthands are not
expanded inside explicit character-classes (i.e., [ ... ]
constructions). E.g., [\W] represents a set of two characters: \ and
W.
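The behavior described above can be checked against regcomp(3) directly. The following is a minimal sketch, not Bobcat code: the helper posixMatches is hypothetical, and it passes the expression to regcomp unchanged, so it shows what regcomp does with [[:digit:]] and with an unexpanded [\W].

```cpp
#include <regex.h>
#include <string>

// Hypothetical helper (not part of Bobcat): true if the POSIX extended
// regular expression `re` matches somewhere in `text`.
bool posixMatches(std::string const &re, std::string const &text)
{
    regex_t compiled;

    // REG_NOSUB: only a yes/no answer is needed, no offsets
    if (regcomp(&compiled, re.c_str(), REG_EXTENDED | REG_NOSUB) != 0)
        return false;                   // the expression did not compile

    bool ok = regexec(&compiled, text.c_str(), 0, nullptr, 0) == 0;
    regfree(&compiled);
    return ok;
}
```

Under these assumptions posixMatches("[[:digit:]]", "3") succeeds, while posixMatches(R"([\W])", "a") fails: inside a POSIX bracket expression the backslash is an ordinary character, so only \ and W are in the set.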
As the backslash (\) is treated as a special character it should be
handled carefully. Pattern converts the escape sequences \d \s \w (and
outside of explicit character classes the sequences \D \S \W) to their
respective character classes. All other escape sequences are kept as-is, and
the resulting regular expression is passed to the pattern matching compilation
function regcomp(3). The regcomp function interprets escape
sequences. Consequently some care should be exercised when defining patterns
containing escape sequences. Here are the rules:
- Special escape sequences (like \d) are converted to character
classes. E.g.,
---------------------------------------------------------
Specify:    Converts to:    regcomp uses:    Matches:
---------------------------------------------------------
\d          [[:digit:]]     [[:digit:]]      3
---------------------------------------------------------
- Ordinary escape sequences (like \x) are kept as-is. E.g.,
---------------------------------------------------------
Specify:    Converts to:    regcomp uses:    Matches:
---------------------------------------------------------
\x          \x              x                x
---------------------------------------------------------
- To specify literal escape sequences, Raw String Literals are advised,
as they don't require doubling escape sequences. E.g., the following
regular expression matches an (alpha-numeric) word, followed by optional
blanks, a colon, more optional blanks and a (decimal) number:
R"((\w+)\s*:\s*\d+)"
Furthermore, using (raw) string concatenation in source files greatly
improves the legibility of complex regular expressions. The above
regular expression could also be specified as
R"((\w+))" // a word
R"(\s*:)" // then ':', maybe first some space chars
R"(\s*\d+)" // then digits, maybe first some space chars
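That the concatenated pieces form the same expression as the single raw string literal can be verified with plain C++. In this sketch the variable names wordColonNumber and doubled are hypothetical; only standard string-literal concatenation is used.

```cpp
#include <string>

// Adjacent string literals are concatenated by the compiler, so the
// three commented pieces form one regular expression.
std::string const wordColonNumber =
    R"((\w+))"      // a word
    R"(\s*:)"       // then ':', maybe first some space chars
    R"(\s*\d+)";    // then digits, maybe first some space chars

// The same expression as an ordinary string literal: every backslash
// has to be doubled.
std::string const doubled = "(\\w+)\\s*:\\s*\\d+";
```

Both variables hold the text (\w+)\s*:\s*\d+, which is what would be passed on as the pattern.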
NAMESPACE
FBB
All constructors, members, operators and manipulators, mentioned in this
man-page, are defined in the namespace FBB.
INHERITS FROM
-
TYPEDEF
- Pattern::Position:
A nested type holding two offsets: that of the first character of the
matched text or indexed subexpression, and the offset just beyond its
last character. It is defined as std::pair<size_t, size_t>.
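Because the second offset lies beyond the last character, a Position describes a half-open range. The sketch below uses a stand-in alias with the same definition and a hypothetical helper slice to show how such a pair maps onto std::string::substr.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Stand-in for Pattern::Position, which is defined as
// std::pair<size_t, size_t>.
using Position = std::pair<std::size_t, std::size_t>;

// Hypothetical helper: returns the part of `text` described by `pos`,
// or an empty string when `pos` holds npos (nothing was matched).
std::string slice(std::string const &text, Position const &pos)
{
    if (pos.first == std::string::npos || pos.second == std::string::npos)
        return {};

    // second is the offset beyond the last character, so the length
    // is simply the difference of the two offsets
    return text.substr(pos.first, pos.second - pos.first);
}
```

For the text "key : 42", the pair (0, 3) selects "key" and the pair (6, 8) selects "42".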
OPTIONS
The following options can be specified with the non-default constructor
and the match member (see below):
- REG_EXTENDED:
Used by default: the regular expression is interpreted using POSIX
Extended Regular Expression syntax. If not set, POSIX Basic Regular
Expression syntax is used;
- REG_NOSUB:
Support for substring addressing of matches is not required. The
nmatch and pmatch parameters to regexec are ignored if the pattern
buffer supplied was compiled with this flag set;
- REG_NEWLINE:
Used by default: match-any-character operators don't match newlines.
A non-matching list ([^...]) not containing a newline does not match
a newline.
Match-beginning-of-line operator (^) matches the empty string
immediately after a newline, regardless of whether eflags, the
execution flags of regexec, contains REG_NOTBOL.
Match-end-of-line operator ($) matches the empty string immediately
before a newline, regardless of whether eflags contains REG_NOTEOL.
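The effect of REG_NEWLINE can be observed with regcomp(3) itself. This is a sketch under the assumption that the flags are passed to regcomp unchanged; the helper matchesWith is hypothetical and not part of Bobcat.

```cpp
#include <regex.h>
#include <string>

// Hypothetical helper (not part of Bobcat): compiles `re` using the
// given regcomp flags and reports whether it matches somewhere in `text`.
bool matchesWith(int cflags, std::string const &re, std::string const &text)
{
    regex_t compiled;

    if (regcomp(&compiled, re.c_str(), cflags | REG_NOSUB) != 0)
        return false;                   // the expression did not compile

    bool ok = regexec(&compiled, text.c_str(), 0, nullptr, 0) == 0;
    regfree(&compiled);
    return ok;
}
```

With the text "a\nb", the expression ^b only matches when REG_NEWLINE is specified (the b follows a newline), whereas a.b only matches when REG_NEWLINE is not specified (the dot then also matches the newline).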
CONSTRUCTORS
- Pattern():
The default constructor does not define a pattern, but can be used in
e.g., containers requiring default constructors;
- Pattern(std::string const &pattern,
bool caseSensitive = true,
size_t nSub = 10,
int options = REG_EXTENDED | REG_NEWLINE):
This constructor compiles pattern, preparing the Pattern object
for pattern matches. The second parameter determines whether case
sensitive matching will be used (the default) or not. Subexpressions
are defined by pairs of parentheses. Each matching pair defines a
subexpression, where the order-number of their opening parentheses
determines the subexpression's index. By default at most 10
subexpressions are recognized.
Copy and move constructors (and assignment operators) are available.
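How subexpressions are numbered by their opening parentheses can be illustrated with the underlying POSIX functions. The sketch below is not Bobcat's implementation: the function subexpressions is hypothetical, and mapping a false caseSensitive argument to REG_ICASE is an assumption.

```cpp
#include <regex.h>
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Bobcat): returns the fully matched
// text (index 0) followed by the text of each subexpression, using at
// most `nSub` slots. An empty vector means: no match, or the
// expression did not compile.
std::vector<std::string> subexpressions(std::string const &re,
                                        std::string const &text,
                                        bool caseSensitive = true,
                                        std::size_t nSub = 10)
{
    int cflags = REG_EXTENDED | REG_NEWLINE;
    if (!caseSensitive)
        cflags |= REG_ICASE;            // assumption: how case is ignored

    regex_t compiled;
    if (regcomp(&compiled, re.c_str(), cflags) != 0)
        return {};

    std::vector<regmatch_t> slots(nSub);
    std::vector<std::string> result;

    if (regexec(&compiled, text.c_str(), nSub, slots.data(), 0) == 0)
    {
        // re_nsub is the number of parenthesized subexpressions in `re`
        std::size_t used = std::min<std::size_t>(nSub, compiled.re_nsub + 1);

        for (std::size_t idx = 0; idx != used; ++idx)
        {
            regmatch_t const &slot = slots[idx];
            if (slot.rm_so == -1)       // subexpression did not take part
                result.emplace_back();
            else
                result.push_back(text.substr(slot.rm_so,
                                             slot.rm_eo - slot.rm_so));
        }
    }

    regfree(&compiled);
    return result;
}
```

For the expression ((a)(b))c and the text "xabcx" the result holds "abc", "ab", "a" and "b", in that order: index 1 belongs to the outer pair of parentheses because its opening parenthesis comes first.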
MEMBER FUNCTIONS
- std::string before() const:
Following a successful match, before() returns the text before the
matched text;
- std::string beyond() const:
Following a successful match, beyond() returns the text beyond the
matched text;
- size_t end() const:
Returns the number of matched elements (the full pattern and
subexpressions). When no match was obtained string::npos is
returned. For index values of at least end(),
position(idx) returns a Position holding two std::string::npos values
and operator[](idx) returns an empty string;
- void match(std::string const &text, int options = 0):
Match a string with a pattern. If the text could not be matched, an
Exception(3bobcat) exception is thrown, using
Pattern::match() as its prefix-text;
- std::string matched() const:
Following a successful match, this
function returns the (completely) matched text;
- std::string const &pattern() const:
This member function returns the pattern that was specified by the
constructor or by the setPattern member;
- Pattern::Position position(size_t index) const:
When specifying index 0 the begin and end index values of the fully
matched text are returned. Other index values return the begin and end
index values of the specified (parenthesized) sub-expression. When
index is at least end() a
Position{ string::npos, string::npos } value is returned;
- void setPattern(std::string const &pattern,
bool caseSensitive = true,
size_t nSub = 10,
int options = REG_EXTENDED | REG_NEWLINE):
This member function redefines pattern as the regular expression
used by subsequent match-member calls. An FBB::Exception
exception is thrown if the new pattern could not be compiled;
- void swap(Pattern &other):
The content of the current object and the other object are
swapped.
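The relation between before(), matched() and beyond() can be sketched in terms of the offsets reported by regexec(3). The struct Parts and the function split below are hypothetical stand-ins, not Bobcat code; std::runtime_error is used here where Pattern::match throws an FBB::Exception.

```cpp
#include <regex.h>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the results of before(), matched() and
// beyond().
struct Parts
{
    std::string before;
    std::string matched;
    std::string beyond;
};

// Splits `text` around the first match of `re`. Throws when the
// expression does not compile or when the text cannot be matched.
Parts split(std::string const &re, std::string const &text)
{
    regex_t compiled;
    if (regcomp(&compiled, re.c_str(), REG_EXTENDED | REG_NEWLINE) != 0)
        throw std::runtime_error{ "split: cannot compile " + re };

    regmatch_t whole;                   // offsets of the full match
    int rc = regexec(&compiled, text.c_str(), 1, &whole, 0);
    regfree(&compiled);

    if (rc != 0)
        throw std::runtime_error{ "split: no match" };

    auto begin = static_cast<std::size_t>(whole.rm_so);
    auto end   = static_cast<std::size_t>(whole.rm_eo);

    return { text.substr(0, begin),             // text before the match
             text.substr(begin, end - begin),   // the matched text
             text.substr(end) };                // text beyond the match
}
```

For the expression [[:digit:]]+ and the text "abc 123 def" the three parts are "abc ", "123" and " def"; concatenating them always reproduces the original text.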
OVERLOADED OPERATORS
- std::string operator[](size_t index) const:
Returns (for index value 0) the fully matched text or (for larger
index values up to the value returned by end()) the text of
sub-expression index. An empty string is returned for index values
at least equal to end();
- Pattern &operator<<(int matchOptions):
Sets match-options to be used (once) with the following overloaded
operator;
- bool operator<<(std::string const &text):
Performs a match(text, matchOptions) call, catching any exception
that might be thrown. If no matchOptions were set using
operator<<(int matchOptions) no match-options are used. The
options set this way are not `sticky': when necessary, they have to be
re-inserted before each new pattern matching. The function returns
true if the matching was successful, false otherwise.
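The interplay of the two insertion operators (options that apply once, followed by a match returning a bool) can be sketched with a small class built on regcomp(3) and regexec(3). MiniPattern and its data members are hypothetical and do not show how Bobcat implements Pattern.

```cpp
#include <regex.h>
#include <string>

// Hypothetical class (not Bobcat's implementation) illustrating
// one-shot match options and a bool-returning insertion operator.
class MiniPattern
{
    regex_t d_regex;
    bool d_compiled;
    int d_matchOptions = 0;

    public:
        explicit MiniPattern(std::string const &re)
        :
            d_compiled(regcomp(&d_regex, re.c_str(),
                               REG_EXTENDED | REG_NOSUB) == 0)
        {}

        ~MiniPattern()
        {
            if (d_compiled)
                regfree(&d_regex);
        }

        MiniPattern(MiniPattern const &) = delete;
        MiniPattern &operator=(MiniPattern const &) = delete;

        // Stores the options for the next match only.
        MiniPattern &operator<<(int matchOptions)
        {
            d_matchOptions = matchOptions;
            return *this;
        }

        // Matches `text`, then forgets the options (not sticky).
        bool operator<<(std::string const &text)
        {
            int options = d_matchOptions;
            d_matchOptions = 0;
            return d_compiled &&
                   regexec(&d_regex, text.c_str(), 0, nullptr, options) == 0;
        }
};
```

With MiniPattern mp{ "^abc" }, the expression mp << "abc" yields true, mp << REG_NOTBOL << "abc" yields false (the start of the text no longer counts as the beginning of a line), and a subsequent mp << "abc" yields true again because the option was used only once.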
EXAMPLE
#include <algorithm>
#include <cstring>
#include <exception>
#include <iostream>
#include <string>

#include <bobcat/pattern>

using namespace std;
using namespace FBB;

void showSubstr(string const &str)
{
    static int count = 0;
    cout << "String " << ++count << " is '" << str << "'\n";
}

    // match text against (a copy of) patt and show the results
void match(Pattern const &patt, string const &text)
try
{
    Pattern pattern{ patt };
    pattern.match(text);            // throws if text doesn't match

    Pattern p3(pattern);            // copies also hold the match results
    cout << "before:  " << p3.before() << "\n"
            "matched: " << p3.matched() << "\n"
            "beyond:  " << pattern.beyond() << "\n"
            "end() = " << pattern.end() << '\n';

    for (size_t idx = 0; idx != pattern.end(); ++idx)
    {
        string str = pattern[idx];

        if (str.empty())
            cout << "part " << idx << " not present\n";
        else
        {
            Pattern::Position pos = pattern.position(idx);
            cout << "part " << idx << ": '" << str << "' (" <<
                    pos.first << "-" << pos.second << ")\n";
        }
    }
}
catch (exception const &exc)
{
    cout << exc.what() << '\n';
}

int main(int argc, char **argv)
{
    string patStr = R"(\d+)";

    do
    {
        cout << "Pattern: '" << patStr << "'\n";
        try
        {
                // by default: case sensitive
                // use any args. for case insensitive
            Pattern patt(patStr, argc == 1);
            cout << "Compiled pattern: " << patt.pattern() << '\n';

            while (true)
            {
                cout << "string to match : ";
                string text;
                getline(cin, text);

                if (text.empty())   // an empty line ends this pattern
                    break;

                cout << "String: '" << text << "'\n";
                match(patt, text);
            }
        }
        catch (exception const &exc)
        {
            cout << exc.what() << ": compilation failed\n";
        }
        cout << "New pattern: ";
    }
    while (getline(cin, patStr) and not patStr.empty());
}
FILES
bobcat/pattern - defines the class interface
SEE ALSO
bobcat(7), regcomp(3), regex(3), regex(7)
BUGS
None reported.
BOBCAT PROJECT FILES
- https://fbb-git.gitlab.io/bobcat/: GitLab project page;
Debian Bobcat project files:
- libbobcat6: Debian package containing the shared library, changelog
and copyright note;
- libbobcat-dev: Debian package containing the
static library, headers, manual pages, and developer info.
BOBCAT
Bobcat is an acronym of `Brokken's Own Base Classes And Templates'.
COPYRIGHT
This is free software, distributed under the terms of the
GNU General Public License (GPL).
AUTHOR
Frank B. Brokken (f.b.brokken@rug.nl).