SQL Lex Yacc Grammar

SQL lex yacc grammar

hi there is a solution in google projects yaxx:

yac file:
lex file

enjoy

previous token on yacc/lex

Don't read yytext in the parser -- as you've discovered it can be unpredictable which token it contains, as the parser may read ahead a token to decide when to shift or reduce.

Instead, you should read yytext in the lexer (and only in the lexer) and make a copy of it if you'll need the value in the parser. So you end up with a lexer rule like:

[a-zA-Z][a-zA-Z0-9]*    { yylval.str = strdup(yytext); return NAME }

and in your parser:

%union {
    char *str;
      :
}
%token <str> NAME
     :
%%
     :
table:  NAME { setGlobalTableName($1); }
     |  NAME '.' NAME  ...

How to implement the grammar for this exercise in lex & yacc

That grammar allows four syllables (ignoring the non-syntactic distinction between b and d):

ba
bab
ab
aba

A word is a sequence containing an odd number of syllables. Since syllable boundaries are not marked in any way, the division into syllables is ambiguous, which is why bison reports conflicts. For example, babab could be parsed as

ba-bab  (Explosión / Explosión Alto)
bab-ab  (Explosión Alto / "a" Alto)

In a few cases, the fact that words must consist of an odd number of syllables can disambiguate a parse. For example, bababa can only be decomposed as ba-ba-ba (Explosión / Explosión / Explosión), because bab-aba (Explosión Alto / "a" Explosión) has only two syllables, an even number. In particular, this means that the greedy solution -- always using the longest possible syllable -- will sometimes produce an incorrect parse. Moreover, you cannot guarantee the correct parse without arbitrary lookahead, making an LR parser inappropriate. Consider the following two words:

abababbabbabbabbabbabbab
abababbabbabbabbabbabbabbab

A double consonant is an unambiguous syllable break, so the only ambiguous break is in the first six characters, which -- as above -- could be two or three syllables. Looking at the entire word, it will be clear which of these choices leads to a word with an odd number of syllables. But until the entire word has been seen, the parser cannot know. In particular, after it sees the initial ab, it must decide whether to resolve that as a syllable of the form a Alto or shift the following a in order to resolve it as a Explosión. But any fixed lookahead limit won't be sufficient to make this decision in all cases.

I suppose that the exercise is intended to help you understand this ambiguity (unless there is more to the exercise than what you showed in the question). I'm not sure what else you're being asked to do. If you want to use bison in order to actually produce parse trees, you'll have to deal with the ambiguity in some way, which probably means using a GLR parser.

SQL Parser Syntax error

After some different ways of trying, the sugested solution (now deleted by the author) worked in some way.

What he suggested was updating the lex and yacc definition of comparision.

In the lex file change

<SQL>"="    | 
<SQL>"<>"   |
<SQL>"<"    |
<SQL>">"    |
<SQL>"<="   |
<SQL>">="       TOK(COMPARISON)

<SQL>"="    TOK(EQ)
<SQL>"<>"   TOK(NE)
<SQL>"<"    TOK(LT)
<SQL>">"    TOK(GT)
<SQL>"<="   TOK(LE)
<SQL>">="   TOK(GE)

In the yacc file add:

comparison: 
        EQ
    | NE
    | LT
    | GT
    | LE
    | GE
    ;

And change all references to = with EQ and the other symbols and COMPARISON with comparison:

%left COMPARISON /* = <> < > <= >= */

%left EQ NE LT GT LE GE /* = <> < > <= >= */

assignment:
        column = scalar_exp
    |   column = NULLX 
    ;

assignment:
        column EQ scalar_exp
    |   column EQ NULLX 
    ;

And

comparison_predicate:
        scalar_exp COMPARISON scalar_exp
    |   scalar_exp COMPARISON subquery
    ;

comparison_predicate:
        scalar_exp comparison scalar_exp
    |   scalar_exp comparison subquery
    ;

And it works!

Lexical & Grammar Analysis using Yacc and Lex

The problem is that yytext is a reference into lex's token scanning buffer, so it is only valid until the next time the parser calls yylex. You need to make a copy of the string in yytext if you want to return it. Something like:

{id}                   { yylval.str = strdup(yytext); return tID; }

will do the trick, though it also exposes you to the possibility of memory leaks.

Also, in general when writing lex/yacc parsers involving single character tokens, it is much clearer to use them directly as charcter constants (eg ',', ';', and '=') rather than defining named tokens (tVIR, tPV, and tEQ in your code).

LEX and YACC : grammar implementation in NLP

Your lexical analyzer is not retaining the spelling of the words — it is simply returning a number, without making sure that the word is available other than as yytext. Your grammar is not copying yytext as the tokens arrive. So, if you need to keep the strings (so you can distinguish between "reply" and "display", for example — two alternative spellings for token ASK), then you have to ensure that the information is saved, copied, released. By the time a grammar rule is operating, it may well have read more tokens — possibly even have encountered EOF. Consequently, you normally need a more complex structure for YYSTYPE (and %union) so that you can get hold of the information you need later.

Syntax error by Yacc even though the syntax is per grammar

You have a lexer fallback rule which silently discards unrecognised characters:

.               ;

That's really not a good idea, as you have just discovered. In this case, no other rule recognises ( or ), so they are ignored by the above rule. Your parser, however, is expecting a parenthesis. It doesn't get one, so it reports a syntax error.

Better is the following fallback rule, which could replace the preceding rule:

   /* [-+=;]          {return yytext[0];} */ /* now redundant*/
.               {return yytext[0];}

This accepts any character in the lexer. However, most characters are not used as character literals anywhere in the grammar, so they will be treated as invalid tokens by the parser, causing a syntax error.

You could get a more precise error message by writing the error in your lex fallback rule, but yhen you need to make sure that all vslid characters are recognised:

[-+=;(){}]      {return yytext[0];}
.               {return yytext[0];}

Personally, I'd add <> to that list rather than having dedicated rules (and unnecessary token names.)

SQL Lex Yacc Grammar