"""
    Parsing trees of nodes stored using a Python-like syntax.

    A file consists of a number of lines, and each line consists of indenting whitespace and data. Every line except
    the first has a parent, which is determined by its indentation.
"""


class TreeParseError(Exception):
    """
        Error parsing a tree file
    """

    pass


def _read_lines(path, stop_tokens=''):
    """
        Reads lines from the given path, ignoring empty lines, and yields (line_number, indent, line) tuples, where
        line_number is the 1-based line number, indent counts the leading spaces, and line is the actual line
        data with surrounding whitespace stripped.

        stop_tokens is a string of chars that stop the counting of indentation: if the first non-space char of a line
        is one of them, the line's indentation is taken as zero.
    """

    # open in text mode, so that iterating over a line yields one-character strings on Python 3 as well
    with open(path, 'r') as lines:
        for line_number, line in enumerate(lines):
            indent = 0

            # count indent
            for char in line:
                # tabs break things
                assert char != '\t', "tab in indentation on %s:%d" % (path, line_number + 1)

                # increment up to first non-space char
                if char == ' ':
                    indent += 1

                elif char in stop_tokens:
                    # consider line as not having any indentation at all
                    indent = 0
                    break

                else:
                    break

            # strip whitespace
            line = line.strip()

            # ignore empty lines
            if not line:
                continue

            # enumerate() counts from zero, line numbers count from one
            yield line_number + 1, indent, line


def parse(path, stop_tokens=''):
    """
        Reads and parses the file at the given path, returning the root item as a (line_number, line, children)
        tuple, where children is a list of such tuples. The first non-empty line is the root, and its own
        indentation is treated as zero. Returns None if the file contains no non-empty lines.

        Raises TreeParseError if a line unindents to a level that was never opened.
    """

    # stack of (indent, parent) items, innermost indent level last
    stack = []

    # the root item
    root = None

    # the previous item processed, None for first one
    prev = None

    # read lines
    for line_number, indent, line in _read_lines(path, stop_tokens):
        # create item
        item = (line_number, line, [])

        # are we the first item?
        if prev is None:
            # root node does not have a parent
            parent = None

            # set root
            root = item

            # initialize stack
            stack.append((0, root))

        else:
            # peek stack
            stack_indent, stack_parent = stack[-1]

            # new indent level?
            if indent > stack_indent:
                # the previous item becomes the parent of everything at this new level
                parent = prev

                # push new indent level + its parent
                stack.append((indent, parent))

            # same indent level as previous
            elif indent == stack_indent:
                # parent is the one of the current stack level, stack doesn't change
                parent = stack_parent

            # unravel stack
            else:
                while True:
                    # remove current stack level
                    stack.pop()

                    # peek next level
                    stack_indent, stack_parent = stack[-1]

                    # found the level to return to?
                    if stack_indent == indent:
                        # restore parent
                        parent = stack_parent

                        break

                    elif stack_indent < indent:
                        # the line sits between two open levels
                        raise TreeParseError("Bad unindent on %s:%d, %d < %d" % (
                            path, line_number, stack_indent, indent
                        ))

        # add to parent?
        if parent is not None:
            parent[2].append(item)

        # update prev
        prev = item

    # return the root
    return root
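
# A minimal sketch of the indent-stack walk that parse() performs, assuming the lines are already in memory
# rather than read from a path. The name build_tree, the sample data and the use of ValueError in place of
# TreeParseError are illustrative assumptions, not part of this module. Tab checking and stop tokens are left
# out for brevity.

```python
def build_tree(lines):
    """Hypothetical in-memory variant of parse(): takes a list of strings instead of a path."""
    root = None
    prev = None
    stack = []  # (indent, parent) pairs, innermost level last

    for line_number, raw in enumerate(lines, 1):
        data = raw.strip()
        if not data:
            continue  # empty lines are ignored

        # number of leading spaces
        indent = len(raw) - len(raw.lstrip(' '))
        item = (line_number, data, [])

        if prev is None:
            # the first line is the root, its indent is treated as zero
            root = item
            parent = None
            stack.append((0, root))
        else:
            stack_indent, stack_parent = stack[-1]
            if indent > stack_indent:
                # deeper than before: the previous item opens a new level
                parent = prev
                stack.append((indent, parent))
            else:
                # same level or shallower: drop every level deeper than this line
                while stack[-1][0] > indent:
                    stack.pop()
                stack_indent, parent = stack[-1]
                if stack_indent != indent:
                    raise ValueError("bad unindent on line %d" % line_number)

        if parent is not None:
            parent[2].append(item)
        prev = item

    return root


# illustrative sample data
sample = [
    "site",
    "    about",
    "    projects",
    "        parser",
    "    contact",
]

tree = build_tree(sample)
top_level = [child[1] for child in tree[2]]
nested = [child[1] for child in tree[2][1][2]]
```

# Here top_level holds the lines directly below "site", and nested holds the single line below "projects".
# A line such as "  stray" placed after "        parser" would raise, since no level with an indent of two
# was ever opened.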