codEX Project
Normalizing as the first step to improve
Token Identification and Analysis

In the first phase, the original source code is compiled to the intermediate MetaCode™. This builds the foundation for all further analysis: dissecting the reduced, normalized and formal language is much easier and faster than performing the same analysis on the original source code.

The next phase of analysis dissects the operation codes and operands of the MetaCode™ construct. It distinguishes three data types, namely control structures, strings and variables, each with its own set of attributes:

Controls:
¦ Type (if | while | for | until)
¦ Condition (n > 1 | c == "a")
¦ Change (+ | - | *)
¦ Level (n)

Strings:
¦ Content (X)
¦ Type (int | char)
¦ Quote (single | double)
¦ Length (n bytes)

Variables:
¦ Name (X)
¦ Content (Y)
¦ Type (int | char | array)
¦ Length (n bytes)
¦ Source (local | get | post | cookie | server | global)
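The three attribute sets above could be modelled as plain record types. The following is only a minimal sketch in Python; the class names and the choice of optional fields are assumptions made for illustration and are not part of codEX itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlToken:
    type: str                     # if | while | for | until
    condition: str                # e.g. 'n > 1' or 'c == "a"'
    level: int                    # nesting level (n)
    change: Optional[str] = None  # + | - | * (assumed optional)


@dataclass
class StringToken:
    content: str                  # the string content (X)
    type: str                     # int | char
    quote: str                    # single | double
    length: int                   # length in bytes


@dataclass
class VariableToken:
    name: str                     # variable name (X)
    type: str                     # int | char | array
    source: str                   # local | get | post | cookie | server | global
    content: Optional[str] = None  # unknown until a later analysis pass
    length: Optional[int] = None   # unknown until a later analysis pass
```

Content and length of a variable default to None here because they may not be known at the time the token is first identified.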


A simple example illustrates this. Let's say the MetaCode™ generation has produced the following line:

004:009:043 string "Variable set to 1"


The dissection of this intermediate code creates the following local environment for this string:

004:009:043:
¦ Content: "Variable set to 1"
¦ Type: char
¦ Quote: double
¦ Length: 19
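The dissection of such a line can be sketched in a few lines of Python. This is a hedged illustration, not the actual codEX implementation: the function name dissect_string is hypothetical, the leading 004:009:043 field is treated as an opaque identifier, the rule for choosing between int and char is an assumption, and the length is counted including the two quote characters because that is what reproduces the value 19 from the example.

```python
import re

# Assumed line shape: <id> string <quoted literal>
STRING_LINE = re.compile(r'^(\d+:\d+:\d+)\s+string\s+(.+)$')


def dissect_string(line):
    """Split a MetaCode string line into its attributes (sketch)."""
    match = STRING_LINE.match(line.strip())
    if match is None:
        raise ValueError("not a MetaCode string line: %r" % line)
    identifier, literal = match.groups()

    # The literal must start and end with the same quote character.
    if len(literal) < 2 or literal[0] != literal[-1] or literal[0] not in "'\"":
        raise ValueError("string literal is not properly quoted: %r" % literal)

    body = literal[1:-1]
    return {
        "id": identifier,
        "content": literal,
        # Assumption: purely numeric content is classified as int.
        "type": "int" if body.isdigit() else "char",
        "quote": "double" if literal[0] == '"' else "single",
        # Byte length of the literal, quotes included.
        "length": len(literal.encode("utf-8")),
    }


attributes = dissect_string('004:009:043 string "Variable set to 1"')
```

For the example line, attributes holds the type char, the quote double and the length 19, matching the local environment shown above.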


Another example, which introduces an HTTP GET variable, is the following MetaCode™ line:

006:014:113 varget test


The dissection identifies this token as a variable with the following attributes:

006:014:113:
¦ Name: test
¦ Content: (?)
¦ Type: char
¦ Length: (?)
¦ Source: get


Because no data flow analysis has been performed yet, the content and length of the variable cannot be identified at this point. These values are calculated in the next pass of analysis.

Identifying the attributes of the tokens is not just nice to have; it is essential for further in-depth analysis. The definition of code coverage, the generation of the control flow diagram and the determination of vulnerabilities (e.g. a cross-site scripting vulnerability) all rely on this kind of data.
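As a rough illustration of how a later pass could use these attributes, the sketch below selects the variables whose Source attribute marks them as externally supplied, which is a typical starting point when looking for cross-site scripting candidates. The function name, the dictionary layout of the tokens and the choice of which sources count as external are assumptions for this example, not a description of how codEX determines vulnerabilities.

```python
# Assumption: these sources are treated as attacker-controllable input.
EXTERNAL_SOURCES = {"get", "post", "cookie"}


def externally_controlled(tokens):
    """Return the names of variables that originate from external input."""
    return [
        token["name"]
        for token in tokens
        if token.get("source") in EXTERNAL_SOURCES
    ]


tokens = [
    {"name": "test", "type": "char", "source": "get"},
    {"name": "counter", "type": "int", "source": "local"},
]
candidates = externally_controlled(tokens)
```

Here candidates contains only test, since counter is a local variable.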
