We have talked about PLY several times [1]. In this post I want to explain how lexer states work in PLY and how to implement them. The PLY documentation [2] has a full explanation of lexer states; here I only intend to show a very simple example of lexer states and how to use them from the parser code. The example shows how to skip over the content between two tokens and return that content as a single token.
How to build states
Creating states is easy. PLY looks for a variable named states. In this variable we specify the name of each state and its type:
- Inclusive: the state extends the lexer's default rules, so the tokens defined outside the state are still recognized inside it.
- Exclusive: the state replaces the default rules entirely. In this case it is necessary to define an error rule and an ignore rule for the state, because nothing is inherited from the default rules.
For instance:
    states = (
        ('foo', 'exclusive'),
        ('bar', 'inclusive'),
    )
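Then we attach these states to the token rules by putting the state name in the rule name: t_foo_CONTENT, for example, defines the CONTENT token inside the foo state.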
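To make the naming convention concrete, here is a small sketch of how a rule name can be split into its states and its token name. It is a simplified illustration written for this post, not PLY's actual code, and split_rule_name is a hypothetical helper. It assumes the convention described in the PLY documentation: state names come right after the t_ prefix, and a rule with no state prefix belongs to the default INITIAL state.

```python
def split_rule_name(rule_name, state_names):
    """Split a PLY-style rule name into (states, token_name).

    Hypothetical helper: a simplified illustration of the naming
    convention, not the code PLY itself runs.
    """
    parts = rule_name.split('_')
    if len(parts) < 2 or parts[0] != 't':
        raise ValueError('rule names must look like t_NAME or t_state_NAME')
    i = 1
    # Consume leading parts that name a declared state (or the special ANY),
    # always leaving at least one part for the token name itself.
    while i < len(parts) - 1 and (parts[i] in state_names or parts[i] == 'ANY'):
        i += 1
    states = tuple(parts[1:i]) or ('INITIAL',)
    return states, '_'.join(parts[i:])


declared = {'foo', 'bar'}
for name in ('t_NUMBERS', 't_foo', 't_foo_CONTENT', 't_foo_error', 't_foo_bar_NUMBER'):
    print(name, '->', split_rule_name(name, declared))
```

Note that under this reading t_foo on its own is a rule for a token called foo in the INITIAL state, even when a state is also named foo, because a part is only treated as a state when a token name follows it.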
Example
I want to show a very simple example that skips a block of content and returns it as a single token. This is sometimes needed, for example to create a special ignored-content node in the AST. Note the regular expression of the rule t_foo_CONTENT: [^$]+ matches any run of characters other than $, newlines included, so the rule consumes everything up to and including the #end string.
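The behaviour of that regular expression can be checked with the standard re module alone. The sketch below compiles the pattern with re.VERBOSE because, as far as I know, that is the flag PLY uses by default, which is also why the # has to be escaped. The helper match_content and the sample strings are made up for the illustration.

```python
import re

# Same pattern as t_foo_CONTENT. In verbose mode an unescaped '#' would
# start a comment, so it is written as '\#'.
CONTENT = re.compile(r'[^$]+\#end', re.VERBOSE)


def match_content(text):
    """Return what the CONTENT pattern matches at the start of text, or None."""
    m = CONTENT.match(text)
    return m.group(0) if m else None


# Everything up to and including '#end' is consumed; 'foo' is left over.
print(repr(match_content(' 321 #endfoo')))
# The match is greedy: with two '#end' markers it runs up to the last one.
print(repr(match_content('a #end b #endfoo')))
# A '$' before '#end' stops [^$]+ early, so nothing matches.
print(repr(match_content('a $ b #endfoo')))
```

The second case is worth keeping in mind: because the match is greedy, two blocks in the same input would be swallowed as one CONTENT token.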
Lexer
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    import ply.lex as lex

    # 'foo' is an exclusive state: only the t_foo_* rules apply inside it.
    states = (
        ('foo', 'exclusive'),
    )

    tokens = ['foo', 'NUMBERS', 'END', 'CONTENT']


    def t_NUMBERS(t):
        r'\d+'
        print('General numbers')
        return t


    def t_foo(t):
        r'foo'
        # Remember where the content starts, then switch to the 'foo' state.
        t.lexer.code_start = t.lexer.lexpos
        print('I have detected foo token')
        t.lexer.begin('foo')
        return t


    def t_foo_END(t):
        r'foo'
        # The 'foo' that follows '#end' closes the state.
        print('End of state foo')
        t.lexer.begin('INITIAL')
        return t


    def t_foo_CONTENT(t):
        r'[^$]+\#end'
        return t


    t_ignore = ' \t'
    # Exclusive states do not inherit t_ignore. Nothing is skipped inside
    # 'foo', but declaring the rule avoids a warning from PLY.
    t_foo_ignore = ''


    def t_foo_error(t):
        print('Lexical error: "' + str(t.value[0]) + '" in line ' + str(t.lineno))
        t.lexer.skip(1)


    def t_error(t):
        print('Lexical error: "' + str(t.value[0]) + '" in line ' + str(t.lineno))
        t.lexer.skip(1)


    def test(data, lexer):
        lexer.input(data)
        while True:
            tok = lexer.token()
            if not tok:
                break
            print(tok)


    lexer = lex.lex()

    if __name__ == '__main__':
        data = ''' foo 123 123 esto bla bla bla #endfoo 123'''
        test(data, lexer)
Parser
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    import ply.yacc as yacc

    # The lexer module from the previous section, saved as lexer.py.
    from lexer import tokens


    def p_start(p):
        ''' start : NUMBERS foo prod END '''
        p[0] = p[1]


    def p_prod(p):
        ''' prod : CONTENT '''
        p[0] = p[1]


    def p_error(p):
        # p is None when the error is an unexpected end of input.
        if p is None:
            print('Syntax error at end of input')
        else:
            print('Syntax error at line ' + str(p.lineno))


    parser = yacc.yacc()

    if __name__ == '__main__':
        data = ''' 123 foo 321 #endfoo '''
        parser.parse(data)
        data = ''' 123 foo bla bla bla bla #endfoo '''
        parser.parse(data)
When foo is detected, the lexer enters a new state (the foo state) and stays in it until #endfoo is detected.
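The state switch can also be followed step by step with a hand-written scanner that uses only the re module. This is a rough sketch that mimics the rules of the example, not how PLY is implemented; RULES, IGNORE and tokenize are names invented for the illustration, and the foo state is assumed to ignore no characters.

```python
import re

# Rules per state, tried in order: (token type, pattern, state to enter or None).
RULES = {
    'INITIAL': [
        ('NUMBERS', re.compile(r'\d+'), None),
        ('foo', re.compile(r'foo'), 'foo'),
    ],
    'foo': [
        ('END', re.compile(r'foo'), 'INITIAL'),
        ('CONTENT', re.compile(r'[^$]+\#end'), None),
    ],
}

# Characters skipped in each state (assumption: none inside 'foo').
IGNORE = {'INITIAL': ' \t', 'foo': ''}


def tokenize(data):
    """Return the (type, value) pairs produced by the two-state scanner."""
    state, pos, out = 'INITIAL', 0, []
    while pos < len(data):
        if data[pos] in IGNORE[state]:
            pos += 1
            continue
        for name, pattern, next_state in RULES[state]:
            m = pattern.match(data, pos)
            if m:
                out.append((name, m.group(0)))
                pos = m.end()
                if next_state:
                    state = next_state
                break
        else:
            pos += 1  # no rule matched: skip one character, like the error rules
    return out


for token in tokenize('123 foo 321 #endfoo 123'):
    print(token)
```

For this input the scanner yields NUMBERS, foo, CONTENT, END and NUMBERS in that order: the 321 inside the block never becomes a NUMBERS token, because the NUMBERS rule does not exist in the foo state.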
References
[1] Previous entries for PLY
[2] PLY webpage: http://www.dabeaz.com/ply/
