Equation (expression) parser with precedence

algorithmequationparsing

I've developed an equation parser using a simple stack algorithm that will handle binary (+, -, |, &, *, /, etc) operators, unary (!) operators, and parenthesis.

Using this method, however, leaves me with everything having the same precedence – it's evaluated left to right regardless of operator, although precedence can be enforced using parenthesis.

So right now "1+11*5" returns 60, not 56 as one might expect.

While this is suitable for the current project, I want to have a general purpose routine I can use for later projects.

Edited for clarity:

What is a good algorithm for parsing equations with precedence?

I'm interested in something simple to implement and understand that I can code myself to avoid licensing issues with available code.

Grammar:

I don't understand the grammar question – I've written this by hand. It's simple enough that I don't see the need for YACC or Bison. I merely need to calculate strings with equations such as "2+3 * (42/13)".

Language:

I'm doing this in C, but I'm interested in an algorithm, not a language specific solution. C is low level enough that it'll be easy to convert to another language should the need arise.

Code Example

I posted the test code for the simple expression parser I was talking about above. The project requirements altered and so I never needed to optimize the code for performance or space as it wasn't incorporated into the project. It's in the original verbose form, and should be readily understandable. If I do anything further with it in terms of operator precedence, I'll probably choose the macro hack because it matches the rest of the program in simplicity. If I ever use this in a real project, though, I'll be going for a more compact/speedy parser.

Related question

Smart design of a math parser?

-Adam

Best Answer

The shunting yard algorithm is the right tool for this. Wikipedia is really confusing about this, but basically the algorithm works like this:

Say, you want to evaluate 1 + 2 * 3 + 4. Intuitively, you "know" you have to do the 2 * 3 first, but how do you get this result? The key is to realize that when you're scanning the string from left to right, you will evaluate an operator when the operator that follows it has a lower (or equal to) precedence. In the context of the example, here's what you want to do:

  1. Look at: 1 + 2, don't do anything.
  2. Now look at 1 + 2 * 3, still don't do anything.
  3. Now look at 1 + 2 * 3 + 4, now you know that 2 * 3 has to to be evaluated because the next operator has lower precedence.

How do you implement this?

You want to have two stacks, one for numbers, and another for operators. You push numbers onto the stack all the time. You compare each new operator with the one at the top of the stack, if the one on top of the stack has higher priority, you pop it off the operator stack, pop the operands off the number stack, apply the operator and push the result onto the number stack. Now you repeat the comparison with the top of stack operator.

Coming back to the example, it works like this:

N = [ ] Ops = [ ]

  • Read 1. N = [1], Ops = [ ]
  • Read +. N = [1], Ops = [+]
  • Read 2. N = [1 2], Ops = [+]
  • Read *. N = [1 2], Ops = [+ *]
  • Read 3. N = [1 2 3], Ops = [+ *]
  • Read +. N = [1 2 3], Ops = [+ *]
    • Pop 3, 2 and execute 2*3, and push result onto N. N = [1 6], Ops = [+]
    • + is left associative, so you want to pop 1, 6 off as well and execute the +. N = [7], Ops = [].
    • Finally push the [+] onto the operator stack. N = [7], Ops = [+].
  • Read 4. N = [7 4]. Ops = [+].
  • You're run out off input, so you want to empty the stacks now. Upon which you will get the result 11.

There, that's not so difficult, is it? And it makes no invocations to any grammars or parser generators.

Related Topic