Hello world

There are three progressive examples here, the most noddy possible, something a little more interesting, and finally a template that doesn’t just assure a match but also captures some data.

Really noddy

Let’s take this text input; two lines with “hello” on the first line and “world” on the second:

hello
world

We can parse it like this:

>>> import sttp
>>> in_template = '''m> hello
... m> world
... '''
>>> in_text = '''hello
... world
... '''
>>> parser     = sttp.Parser(template = in_template)
>>> out_struct = parser.parse(in_text)
>>> assert out_struct == None

All it does in effect is validate that the input text is as stated, i.e. two lines with “hello” on the first line, “world” on the second and nothing else. If the input deviates then an exception will be raised.

More interesting

This example parses a text file with any number of lines, all beginning with “hello” but where the second word may vary. Below the sample input is:

hello world
hello galaxy
hello universe

The template used is a one liner:

m*> hello {{ word }}

Here’s the full working example:

>>> import sttp
>>> in_template = '''m*> hello {{ word }}'''
>>> in_text = '''hello world
... hello galaxy
... hello universe
... '''
>>> parser     = sttp.Parser(template = in_template)
>>> out_struct = parser.parse(in_text)
>>> assert out_struct == None

The m* instead of m means match zero or more times, so if you give it an empty input, the above will still parse successfully. If it was m+ then this would require at least one line that matched, just like the regex modifiers.

Capturing data

Let’s expand the previous example to capture the words:

>>> import sttp
>>> in_template = '''m+> hello {{ hello = word }}'''
>>> in_text = '''hello world
... hello galaxy
... hello universe
... '''
>>> parser     = sttp.Parser(template = in_template)
>>> out_struct = parser.parse(in_text)
>>> assert out_struct == [
...     {'hello': 'world'},
...     {'hello': 'galaxy'},
...     {'hello': 'universe'}
... ]

When a list result is implied (as with m+ and m*) the list elements will be a dict so that any number of named values can be derived per line. If your template was m+> {{ hello = word }} {{ world = word }} then “goodbye galaxy” would parse as {'hello': 'goodbye', 'world': 'galaxy'}:

>>> import sttp
>>> in_template = '''m+> {{ hello = word }} {{ world = word }}'''
>>> in_text     = '''goodbye galaxy'''
>>> parser     = sttp.Parser(template = in_template)
>>> out_struct = parser.parse(in_text)
>>> assert out_struct == [{'hello': 'goodbye', 'world': 'galaxy'}]