Code Formatting Register - Saturday, July 31, 2010
Learn to create a code-to-HTML formatting engine in C#

 

 

 

Introduction

This article walks you through building a class library that reads raw code text (C#, VB, Java, XML, etc.) and formats it as HTML, so that it can be displayed with the same look you use in your Visual Studio environment. Software developers use syntax coloring in their development environments because it makes code understandable at a quick glance. So when you are explaining code to another developer, it is better to show that code with syntax coloring. Coloring all the many parts of a large chunk of code by hand, however, would not seem worth the time to most developers. Even with tools that make HTML easy, like FrontPage, it would still be a tedious and error-prone task. The class library presented in this article is only one way to automate the syntax coloring of raw code text, but it is a way that works. All syntax-colored code on this website was generated from the raw code by a tool I built on this class library.

Background

Code, unlike conventional written text, is very well structured. This means that we can create rules that scan through code and find the beginning and end of any particular syntax token within the language. A syntax token is any part of a language that has individual meaning. For example, ++ in C-based languages means "add 1" and is what is called an "operator". When you turn on syntax coloring in Visual Studio, it lets you go through a long list of these named syntax entities and format them in various ways. This makes it quick to tell "strings" apart from "keywords", "numbers", and "operators".

However, the extreme flexibility of most programming languages makes their text much harder to parse than more rigidly structured formats like XML. This requires a technique that looks at the text character by character, sometimes "looking ahead" as you make your way through the code, to determine exactly what you have at any one point. It gets even hairier with HTML, because you are formatting HTML with HTML and you can have JavaScript (a completely different language) embedded in the middle.

Many code editors offer syntax coloring within their own environment. However, even Microsoft Visual Studio does not simply let you copy all that nice-looking code out and paste it into a web page. All you end up with is basic black-and-white text. HTML editors, in turn, let you format any text, but they know nothing about the rules that distinguish the different parts of all the different coding languages out there. This class library shows one way to format many different languages, and to expand the list by inheriting from the base classes.

Understanding the Model

The class model presented here consists of a hierarchy of classes that represent parsers for each language, a TokenFormatter class that wraps the individual language elements with HTML, a Token class that represents each language element found, some enumerations for organizing types, and a Formatter class that manages the whole process and acts as the interface to the developer.

The hierarchy of language parsers is divided into two main categories of languages, procedural and markup, represented by the ProceduralLanguage class and the MarkupLanguage class. Both of these classes inherit from the abstract Language class. This architecture allows functionality that is common to procedural or markup languages to be centralized in classes from which more specific language classes can inherit. For example, the markup languages are all expected to delimit individual language elements with characters that mark the beginning and end of each element, such as the < and > used in HTML and XML.

The TokenFormatter class takes a token (language element) that has been parsed out of the code by a language parser and wraps that token with HTML. Depending on how the TokenFormatter is written, the generated HTML can get really large. You can easily generate 10 or more HTML characters to format just a single ';' (semicolon). To help keep the size down, the HTML tags that wrap each token use CSS classes instead of repeating specific style commands in each tag. This means we include a section of CSS at the top of the generated HTML to define what formatting each CSS class represents.
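To make the size argument concrete, here is a minimal sketch that wraps the same token both ways. The SpanSizeDemo class and its two methods are hypothetical helpers written for this illustration and are not part of the class library; "KS" is the key symbol class name used later in this article.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the class library.
public static class SpanSizeDemo
{
    // Wrap a token using a short CSS class name.
    public static string WrapWithClass(string token, string cssClass)
    {
        return "<span class=\"" + cssClass + "\">" + token + "</span>";
    }

    // Wrap a token using an inline style attribute.
    public static string WrapWithInlineStyle(string token, string style)
    {
        return "<span style=\"" + style + "\">" + token + "</span>";
    }
}
```

Calling WrapWithClass(";", "KS") yields a shorter string than WrapWithInlineStyle(";", "color:blue;"), and the gap widens as the style gets more elaborate, because the class name stays the same length while the style text is repeated for every token.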

 

Using the code

Now let's go over the key areas in the code and see what is happening. Below is the constructor for the Formatter class, followed by the Format method, which is a member of the Formatter class. In the constructor we determine what kind of language formatting we are going to be doing. With this information we use a switch-case statement to create the correct parser object for the requested language.


           public Formatter(FormatterTypes Language)
          {
            switch(Language)
            {
                case FormatterTypes.CPP:
                   
                    //Initialize the parser object for this language
                    parser = new CPP();
                   
                    //Store the parser as a Language reference in a member field for use later
                    mFormat = (Language)parser;
                    break;

                case FormatterTypes.CSharp:
                   
                    parser = new CSharp();
                    
                    mFormat = (Language)parser;
                    break;

                case FormatterTypes.HTML:
                   
                    parser = new HTML();
                   
                    mFormat = (Language)parser;
                    break;

                case FormatterTypes.Java:
                  
                    parser = new Java();
                    
                    mFormat = (Language)parser;
                    break;

                case FormatterTypes.JavaScript:
                  
                    parser = new Javascript();
                    
                    mFormat = (Language)parser;
                    break;

                case FormatterTypes.VisualBasicNet:
                   
                    parser = new VB();
                    
                    mFormat = (Language)parser;
                    break;

                case FormatterTypes.VB6:
                   
                    parser = new VB6();
                    
                    mFormat = (Language)parser;
                    break;

                case FormatterTypes.XML:
                   
                    parser = new XML();
                    
                    mFormat = (Language)parser;
                    break;
            }

            
          }

 

Now we are ready to call the Format method. We pass in the string that holds the text we want to format, using a "ref string code" parameter. Note that string is a reference type in C#, so passing it by value would also copy only a reference and never the text itself; the ref modifier is therefore not needed for efficiency, and since we never assign to or change the original code it is harmless here. The next thing to notice is that we use a StringBuilder to hold the output string that will be passed back to the caller. Appending to a StringBuilder is far faster than concatenating strings, because concatenation allocates a new string and copies the entire output so far, plus the appended token, every time a token is added. Since the HTML output is expected to be several times larger than the input, we start the StringBuilder off with a capacity of five times the length of the original code; it still grows automatically if that estimate turns out to be too small. Next, the CSS class definition script is appended to the beginning of the output string.
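As a side note on parameter passing and buffer sizing, the following small sketch may help. It shows that a string received by value is still the same object as the caller's string (only the reference is copied), and how a StringBuilder can be pre-sized. PassingDemo and its methods are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text;

// Hypothetical demo class for illustration only; not part of the class library.
public static class PassingDemo
{
    // 'passed' is received by value: only the reference is copied,
    // so it still refers to the same string object as 'original'.
    public static bool IsSameObject(string original, string passed)
    {
        return object.ReferenceEquals(original, passed);
    }

    // Pre-size the output buffer at five times the input length,
    // mirroring the estimate used by the Format method.
    public static StringBuilder CreateOutputBuffer(string code)
    {
        return new StringBuilder(code.Length * 5);
    }
}
```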

The CSS definition script that is appended to the beginning of the output string might look something like this...

<style type=text/css>.C{color:green;}.K{color:blue;}.N{color:red;}.P{color:#cc6633;}.SL{background-color:yellow;}</style>

We then get to the processing loop. This while loop keeps scanning forward through the string for more tokens until it reaches the end. Each time a token is found, the current pointer into the string is advanced by the length of the token so that we are always scanning across new territory. Once a token is found, it is formatted with an HTML tag such as <span class="KS">.</span>, which formats the '.' (period) as a "KS", or KeySymbol. Whatever formatting is assigned to the KeySymbol CSS class is applied to the period character in the output string. The formatted token is then appended to the end of the output string. Finally, after the loop completes, the final output is returned to the caller.

 

         //Pass in the raw code text to be formatted
          public string Format(ref string code)
          {
            
                                   
            int CurrentLocation = 0;

            //Create a StringBuilder to hold the generated HTML...
            //start it off at 5 times the length of the code itself
            //HTML can be pretty bloated...
            StringBuilder sb = new StringBuilder(code.Length*5);
                        
            Token foundToken = null;

            //Add the CSS style code to the output string first
            sb.Append(parser.CSSFormatting.CSS());
            
            //Loop from the beginning of the code to the end
            while(CurrentLocation < code.Length)
            {                
                //Find a token
                foundToken = parser.ScanToNextToken(ref code, CurrentLocation);
                
                //We found the end or some problem
                if(foundToken == null)
                {
                    break;
                }
            
                //Set the next starting point to scan from the end of this token
                CurrentLocation += foundToken.Text.Length;

                //Format the found token then append it to the output string
                sb.Append(parser.FormatToken(foundToken, TransformTypes.HTML));
            }


            //Return the final output
            return sb.ToString();
            
          }

 

Now let's take a look at the ScanToNextToken method:

       public override Token ScanToNextToken(ref string Text, int Location)
        {
            //Initialize the NewToken object
            if(NewToken == null)
            {
                NewToken = new Token("", TokenTypes.NULL);
            }

            //Set current location
            CurrentLocation = Location;


            //Scan for each type of token...

            //If Whitespace was found
            if((FoundToken = FindWhiteSpace(ref Text)) != "")
            {
                //Set NewToken properties to match found token and then return it
                NewToken.Text = FoundToken;
                NewToken.TokenType = TokenTypes.WhiteSpace;
                return NewToken;
            }



            if((FoundToken = FindWhiteSpaceCRLF(ref Text)) != "")
            {
                NewToken.Text = FoundToken;
                NewToken.TokenType = TokenTypes.WhiteSpaceCRLF;
                return NewToken;
            }



            if((FoundToken = FindWhiteSpaceLF(ref Text)) != "")
            {
                NewToken.Text = FoundToken;
                NewToken.TokenType = TokenTypes.WhiteSpaceLF;
                return NewToken;
            }



            if((FoundToken = FindWhiteSpaceCR(ref Text)) != "")
            {
                NewToken.Text = FoundToken;
                NewToken.TokenType = TokenTypes.WhiteSpaceCR;
                return NewToken;
            }



            if((FoundToken = FindWhiteSpaceTab(ref Text)) != "")
            {
                NewToken.Text = FoundToken;
                NewToken.TokenType = TokenTypes.WhiteSpaceTab;
                return NewToken;
            }



            if((FoundToken = FindPreprocessor(ref Text)) != "")
            {
                NewToken.Text = FoundToken;
                NewToken.TokenType = TokenTypes.Preprocessor;
                return NewToken;
            }



            if((FoundToken = FindComment(ref Text)) != "")
            {
                NewToken.Text = FoundToken;
                NewToken.TokenType = TokenTypes.Comment;
                return NewToken;
            }



            if((FoundToken = FindStringLiteral(ref Text)) != "")
            {
                NewToken.Text = FoundToken;
                NewToken.TokenType = TokenTypes.StringLiteral;
                return NewToken;
            }



            if((FoundToken = FindKeyword(ref Text)) != "")
            {
                NewToken.Text = FoundToken;
                NewToken.TokenType = TokenTypes.Keyword;
                return NewToken;
            }



            if((FoundToken = FindNumber(ref Text)) != "")
            {
                NewToken.Text = FoundToken;
                NewToken.TokenType = TokenTypes.Number;
                return NewToken;
            }



            if((FoundToken = FindKeySymbol(ref Text)) != "")
            {
                NewToken.Text = FoundToken;
                NewToken.TokenType = TokenTypes.KeySymbol;
                return NewToken;
            }



            if((FoundToken = FindUnknownSegment(ref Text)) != "")
            {
                NewToken.Text = FoundToken;
                NewToken.TokenType = TokenTypes.UnknownSegment;
                return NewToken;
            }



            return null;

        }

The ScanToNextToken method simply runs through a list of if statements, each calling a function that looks for a specific type of token. When a token is found, a Token object is set up and returned to the caller. The order of the checks matters: comments and string literals are tested before keywords, so a keyword that appears inside a comment or a string is not colored as a keyword. If none of the specific token types match, the FindUnknownSegment function acts as a catch-all that grabs whatever text it finds until it runs into a "break-point" character such as a space or a language-specific key symbol like a period or a quote.
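The same "try each finder in priority order" idea can also be expressed with a list of delegates instead of a chain of if statements. The sketch below is a self-contained toy, not the library's code: FinderChainDemo, its three finders, and the "Type:text" result strings are all assumptions made for this illustration. It also shows the scanning loop advancing by the length of each token found.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; names are illustrative, not the library's API.
public static class FinderChainDemo
{
    // A finder returns the matched text at 'location', or "" when nothing matches.
    public delegate string Finder(string text, int location);

    public static string FindSpace(string text, int location)
    {
        return text[location] == ' ' ? " " : "";
    }

    public static string FindNumber(string text, int location)
    {
        int end = location;
        while (end < text.Length && char.IsDigit(text[end])) end++;
        return text.Substring(location, end - location);
    }

    // Catch-all: take a single character so the scan always advances.
    public static string FindUnknown(string text, int location)
    {
        return text.Substring(location, 1);
    }

    // Try each finder in priority order and report the first match
    // as a (token type name, token text) pair.
    public static KeyValuePair<string, string> ScanToNextToken(string text, int location)
    {
        var finders = new List<KeyValuePair<string, Finder>>
        {
            new KeyValuePair<string, Finder>("WhiteSpace", FindSpace),
            new KeyValuePair<string, Finder>("Number", FindNumber),
            new KeyValuePair<string, Finder>("UnknownSegment", FindUnknown)
        };

        foreach (var entry in finders)
        {
            string found = entry.Value(text, location);
            if (found != "") return new KeyValuePair<string, string>(entry.Key, found);
        }

        return new KeyValuePair<string, string>("NULL", "");
    }

    // Walk the whole text, advancing by the length of each token found.
    public static List<string> Tokenize(string text)
    {
        var result = new List<string>();
        int location = 0;

        while (location < text.Length)
        {
            var token = ScanToNextToken(text, location);
            if (token.Value == "") break;

            result.Add(token.Key + ":" + token.Value);
            location += token.Value.Length;
        }

        return result;
    }
}
```

With this layout, adding a token type or changing the priority means editing the list rather than the control flow; the chain of if statements in the article does the same job with less indirection.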

Now let's see some of the functions that look for tokens:

        protected virtual string FindWhiteSpace(ref string text)
        {
            //Check for a lone space
            if(text.Substring(CurrentLocation, 1) == " ")
            {                
                //Return the token text
                return text.Substring(CurrentLocation, 1);
            }

            return "";
        }

        protected virtual string FindWhiteSpaceTab(ref string text)
        {
            //Look for a lone tab
            if((testString = text.Substring(CurrentLocation, 1)) == "\t")
            {             
                return testString;
            }

            return "";
        }

        protected virtual string FindUnknownSegment(ref string text)
        {
            Match m = null;
            //Find the end of a segment of text using a regex
            m = EndOfWordRegex.Match(text, CurrentLocation);

            if(m.Success)
            {
                return text.Substring(CurrentLocation, m.Index - CurrentLocation);
            }

            return "";
        }

        protected virtual string FindKeyword(ref string text)
        {
            //If a keyword is found
            int EndOfMatch = ScanUntil(ref text, CurrentLocation, LongestKeywordLength, KeywordHash);

            //No match found
            if(EndOfMatch == -1)
            {
                return "";
            }

            //Was the match an actual keyword or was it in the middle of another word?
            //If the character after the found keyword is an EndOfWord match, or the
            //keyword ends the text, then an actual keyword was found
            int next = CurrentLocation + EndOfMatch;

            //The bounds check keeps Substring from throwing when the keyword is the last thing in the text
            if(next < text.Length && EndOfWordRegex.Match(text.Substring(next, 1)).Success == false)
            {
                return "";
            }


            //Return the token text
            return text.Substring(CurrentLocation, EndOfMatch);
        }

        protected virtual string FindPreprocessor(ref string text)
        {
            //If a preprocessor directive is found
            int EndOfMatch = ScanUntil(ref text, CurrentLocation, LongestPreprocessorLength, PreprocessorHash);

            //No match found
            if(EndOfMatch == -1)
            {
                return "";
            }

            //Return the token text
            return text.Substring(CurrentLocation, EndOfMatch);
        }

These functions look ahead in the string from the current position and test the text against their rules to decide whether they have found a certain kind of token. If a rule is violated or no match is found, an empty string is returned. If a match is found, the part of the string that matched is returned.

The FindKeyword function uses another function called ScanUntil and a regular expression object called EndOfWordRegex. ScanUntil takes a reference to the original code string, the location at which to begin searching, the maximum number of characters to search ahead, and a Hashtable keyed by the hash codes of all possible keywords for the language. It then loops, with the search length decreasing by one character each time through the loop, until a match is found or the search length reaches zero. Each time through the loop, the hash code of the candidate text of the current length is calculated and looked up in the Hashtable. If the lookup succeeds, we have a keyword candidate. Starting with the length of the longest keyword and working toward shorter ones means the longest possible keyword wins: for the text "double", the search finds "double" before it ever tries the shorter keyword "do".

Let's take a quick look at the ScanUntil function:

        protected int ScanUntil(ref string text, int Start, int MaxSearchLength, Hashtable MatchList)
        {           
            //Start with the longest search length and search shorter
            //and shorter strings until a match is found
            //or we get to zero length
            for(int x = MaxSearchLength; x > 0; x--)
            {
                //If we are not searching beyond the end of the string AND we have a match on our hash
                if(Start + x <= text.Length && MatchList[text.Substring(Start, x).GetHashCode()] != null)
                {
                    //Return the length of the keyword that was found
                    return x;
                }
            }           

            return -1;
        }
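To see why the longest-first order matters, here is a small self-contained variant of the same loop. It is a sketch under one stated assumption: it stores the keyword strings themselves in a HashSet<string> rather than their hash codes in a Hashtable, because two different strings can share a hash code and storing the strings rules out that rare false match. LongestMatchDemo is a hypothetical name, not part of the library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of longest-match-first keyword lookup.
public static class LongestMatchDemo
{
    // Returns the length of the longest keyword starting at 'start', or -1 if none matches.
    public static int ScanUntil(string text, int start, int maxSearchLength, HashSet<string> keywords)
    {
        // Try the longest candidate first, then shorter and shorter ones.
        for (int length = maxSearchLength; length > 0; length--)
        {
            // Skip candidates that would run past the end of the text.
            if (start + length > text.Length) continue;

            if (keywords.Contains(text.Substring(start, length)))
            {
                return length;
            }
        }

        return -1;
    }
}
```

With the keywords "do" and "double" and a maximum search length of 6, scanning "double x" from position 0 returns 6, while scanning "do x" returns 2. A shortest-first loop would stop at "do" in both cases.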

As mentioned earlier, the FindKeyword function also uses a regular expression object called EndOfWordRegex. The regex string used for C# in the EndOfWordRegex object looks like this:

[+,|,=,\\-,(,),!,/,{,},[,\\],:,;,>,<,\\,,.,\r\n,\n,\\s]

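This regex matches any character that is a key symbol in the C# language, or any character that indicates the end of a word, such as a space or a carriage return/line feed. Note that inside a character class the commas are literal members rather than separators, so they are not needed between the entries, and the \r\n entry simply adds \r and \n as individual characters.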
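The way such a character class is used to find the end of a segment can be sketched as follows. The pattern here is a simplified stand-in written for this example (it lists a similar set of break characters without separating commas), and EndOfWordDemo and ReadSegment are hypothetical names; the library's own regex is the one shown above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch; the pattern is a simplified assumption, not the library's regex.
public static class EndOfWordDemo
{
    // One character class listing key symbols and whitespace.
    public static readonly Regex EndOfWord = new Regex(@"[+|=\-(){}\[\]!/:;><,.\s]");

    // Returns the text from 'start' up to (not including) the next break character,
    // or "" when no break character follows.
    public static string ReadSegment(string text, int start)
    {
        // Match(text, start) begins searching at 'start'; m.Index is relative to the whole string.
        Match m = EndOfWord.Match(text, start);
        return m.Success ? text.Substring(start, m.Index - start) : "";
    }
}
```

For the input "total=5;" starting at position 0, the first break character is the = at index 5, so the segment returned is "total".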

I have found that working with complex regular expressions is greatly helped by a tool called RegexBuddy. It breaks a regular expression down and makes it easy to refine interactively: you change the expression and immediately see what effect it has on a sample string of text. It is like an interactive debugger for regular expressions.

The individual language classes, like CSharp, which inherits from ProceduralLanguage, add method overrides to customize how certain token searches occur. These language classes also set up all the initialization required for that language, such as the lists of keywords and key symbols. For example, here is the list of keywords for C#:

  Keywords = "abstract|as|base|bool|break|byte|case|catch|char|"
                + "checked|class|const|continue|decimal|default|delegate|do|double|else|"
                + "enum|event|explicit|extern|false|finally|fixed|float|for|foreach|get|goto|"
                + "if|implicit|in|int|interface|internal|is|lock|long|namespace|new|null|"
                + "object|operator|out|override|params|private|protected|public|readonly|"
                + "ref|return|sbyte|sealed|set|short|sizeof|stackalloc|static|string|struct|"
                + "switch|this|throw|true|try|typeof|uint|ulong|unchecked|unsafe|ushort|using|"
                + "virtual|void|while";

and here is an overridden method:

          protected override string FindStringLiteral(ref string text)
          {
            //Are we in quoted or @-quoted mode?
            //quoted:
            //escapes are processed ...any characters before a " except \ or " continues string
            //@-quoted:
            //escapes are NOT processed .. the first " encountered is the end of the string UNLESS
            //another " immediately follows.
            //ms-help://MS.VSCC.2003/MS.MSDNQTR.2003FEB.1033/csref/html/vclrfstring.htm

            if(text.Substring(CurrentLocation, 1) == "\"")
            {
                //Use rules to determine if the " is a valid end quote
                //If not move to next "
                int index = 1;
            ScanAgain:
                index = text.IndexOf("\"", CurrentLocation + index);

                if(index != -1)
                {
                    //Do tests
                    bool Fail = false;

                    //If there is a \ before the " AND not a \\ then it is not a valid end point
                    if(text.Substring(index - 1, 1) == "\\" && text.Substring(index - 2, 2) != "\\\\")
                    {
                        Fail = true;
                    }

                    //If there is a " after the " then it is not a valid end point
                    //(guarded so a quote at the very end of the text does not throw)
                    if(index + 1 < text.Length && text.Substring(index + 1, 1) == "\"")
                    {
                        Fail = true;
                    }

                    if(Fail == true)
                    {
                        index = index - CurrentLocation +1;
                        goto ScanAgain;
                    }
                }
                else
                {
                    //Searched entire string and didn't find end
                    return "";
                }

                return text.Substring(CurrentLocation, index - CurrentLocation +1);
            }

           
            //Verbatim (@-quoted) string literal
            if(text.Substring(CurrentLocation, 1) == "@" && EscapedStringLiteralRegex.IsMatch(text, CurrentLocation) == true)
            {
                //Scan until a " is found. Use rules to determine if the " is a valid end quote
                //If not move to next "
                int index = 2;

                ScanAgain2:
                index = text.IndexOf("\"", CurrentLocation + index);

                

                if(index != -1)
                {
                    //Do tests
                    bool Fail = false;
                   
                    //If there is a " after the " then it is not a valid end point
                    if(index + 1 < text.Length)
                    {
                        if(text.Substring(index + 1, 1) == "\"")
                        {
                            Fail = true;
                        } 
                    }
                    else
                    {
                        return "";
                    }

                    if(Fail == true)
                    {
                        index = index - CurrentLocation + 1;
                        goto ScanAgain2;
                    }
                }
                else
                {
                    //Searched entire string and didn't find end
                    return "";
                }

                return text.Substring(CurrentLocation, index - CurrentLocation + 1);
            }
                                       
               return "";
          }
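The two quoting rules described in the comments can also be written as a single forward loop without goto. The following is a minimal sketch of that alternative, not the library's implementation: StringLiteralDemo is a hypothetical name, and the method takes the start position as a parameter instead of reading a CurrentLocation field. In a regular string a backslash skips the next character; in an @-quoted string a doubled quote is an embedded quote.

```csharp
using System;

// Hypothetical sketch; an alternative to the goto-based scan, not the library's code.
public static class StringLiteralDemo
{
    // Returns the string literal starting at 'start', or "" if none is found there.
    public static string FindStringLiteral(string text, int start)
    {
        if (start >= text.Length) return "";

        bool verbatim = text[start] == '@'
            && start + 1 < text.Length && text[start + 1] == '"';

        if (!verbatim && text[start] != '"') return "";

        // Position of the first character after the opening quote.
        int i = start + (verbatim ? 2 : 1);

        while (i < text.Length)
        {
            char c = text[i];

            if (!verbatim && c == '\\')
            {
                // Regular string: a backslash escapes the next character.
                i += 2;
            }
            else if (verbatim && c == '"' && i + 1 < text.Length && text[i + 1] == '"')
            {
                // Verbatim string: a doubled quote is an embedded quote.
                i += 2;
            }
            else if (c == '"')
            {
                // Closing quote found: return the literal including both quotes.
                return text.Substring(start, i - start + 1);
            }
            else
            {
                i++;
            }
        }

        // Reached the end of the text without a closing quote.
        return "";
    }
}
```

Because the loop consumes escape sequences as it goes, it never has to look backwards to decide whether a quote is escaped, and a literal that ends exactly at the end of the text is still returned.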

The following class diagram shows the CSS class model. The CSSClassFormatting class is a container for lists of CSSClass objects and TokenFormatters. Each CSSClass holds the name of a CSS class and the formatting that will be applied to it, e.g. .C{color:green;}, where "C" is the class name and "color:green" is the style. The CSSClass collection is eventually used to build the string of CSS class definitions that gets added to the beginning of the output HTML string, for example:

<style type=text/css>.C{color:green;}.K{color:blue;}.N{color:red;}.P{color:#cc6633;}.SL{background-color:yellow;}</style>

The TokenFormatters contain properties that determine how to format a token of a certain type. A TokenFormatter can either replace a found token altogether or wrap the token with text such as HTML tags. The collection of TokenFormatters in the CSSClassFormatting class holds all the tokens to check when looking for global formatting overrides. Each language class has its own default TokenFormatters, but additional TokenFormatters can be created and added to the CSSClassFormatting class to format tokens differently from the default for that language.
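Since the members of CSSClass and TokenFormatter are not listed in this article, the following is only a rough sketch of how the two pieces could fit together. SimpleCssClass, SimpleTokenFormatter, and CssModelDemo are hypothetical stand-ins: one holds a class name and style, one either wraps or replaces a token, and a helper builds the style block from a collection of classes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for a CSS class entry (name plus style text).
public sealed class SimpleCssClass
{
    public readonly string Name;   // e.g. "K"
    public readonly string Style;  // e.g. "color:blue;"

    public SimpleCssClass(string name, string style)
    {
        Name = name;
        Style = style;
    }
}

// Hypothetical stand-in for a formatter that wraps or replaces a token.
public sealed class SimpleTokenFormatter
{
    private readonly string before;
    private readonly string after;
    private readonly string replacement; // null means "wrap, do not replace"

    private SimpleTokenFormatter(string before, string after, string replacement)
    {
        this.before = before;
        this.after = after;
        this.replacement = replacement;
    }

    // Formatter that wraps the token in a span carrying the given CSS class.
    public static SimpleTokenFormatter Wrap(string cssClass)
    {
        return new SimpleTokenFormatter("<span class=\"" + cssClass + "\">", "</span>", null);
    }

    // Formatter that replaces the token text altogether.
    public static SimpleTokenFormatter Replace(string replacement)
    {
        return new SimpleTokenFormatter("", "", replacement);
    }

    public string Format(string tokenText)
    {
        return replacement ?? (before + tokenText + after);
    }
}

public static class CssModelDemo
{
    // Builds a <style> block from the class definitions, in the order given.
    public static string BuildStyleBlock(IEnumerable<SimpleCssClass> classes)
    {
        var sb = new StringBuilder("<style type=\"text/css\">");

        foreach (SimpleCssClass c in classes)
        {
            sb.Append('.').Append(c.Name).Append('{').Append(c.Style).Append('}');
        }

        sb.Append("</style>");
        return sb.ToString();
    }
}
```

In this sketch, SimpleTokenFormatter.Wrap("K").Format("public") produces a span with class K around the keyword, and the K class itself is defined once in the style block rather than once per token.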

 

The following function is the ProceduralLanguage class's version of the FormatToken code called by the Formatter after a scan has found a token in the source text. It replaces the token text with a formatted HTML version of the same text. The CSS class model allows two types of overrides on how tokens get formatted: a global override, which can be applied to all tokens (CSSClassFormatting), and formatter overrides for various types of tokens, such as comments or number literals. Each token found during the token scan is sent through the FormatToken function before being appended to the output HTML string.

        public override string FormatToken(string Text, TokenTypes tokenType, TransformTypes transformType)
        {
            //Replace special characters with HTML entities (ampersand first,
            //so the & in the generated &lt; and &gt; is not escaped again)
            Text = AmpersandHTMLRegex.Replace(Text, "&amp;");
            Text = OpenHTMLTagRegex.Replace(Text, "&lt;");
            Text = CloseHTMLTagRegex.Replace(Text, "&gt;");


            switch(tokenType)
            {
                case TokenTypes.Comment:

                    //CSSFormatting TokenFormatters is a global override
                    if(CSSFormatting.TokenFormatters.Count > 0)
                    {
                        //Loop through each TokenFormatter to see if this current token
                        //has a matching override which will format the token differently
                        //from the default formatting for this token
                        foreach(TokenFormatter tf in CSSFormatting.TokenFormatters.Values)
                        {
                            //If there is a match then format the token text and return it
                            if(tf.Override == Text)
                            {
                                return tf.Format(Text);
                            }
                        }
                    }

                    //TokenType (Comment) override
                    if(FormatOverrides_Comment != null)
                    {
                        //Loop through all the formatters for this type of token to see
                        //if this current token has any overrides to format this token
                        //differently than the default formatting for this token type
                        foreach(TokenFormatter formatter in FormatOverrides_Comment)
                        {
                            //If there is a match then format the token text and return it
                            if(Text == formatter.Override)
                            {
                                return formatter.Format(Text);
                            }
                        }
                    }

                    //Default formatting
                    return TokenFormatters[(int)TokenTypes.Comment].Format(Text);

                    break;

                case TokenTypes.Keyword:

                    //CSSFormatting TokenFormatters is a global override
                    if(CSSFormatting.TokenFormatters.Count > 0)
                    {
                        foreach(TokenFormatter tf in CSSFormatting.TokenFormatters.Values)
                        {
                            if(tf.Override == Text)
                            {
                                return tf.Format(Text);
                            }
                        }
                    }

                    //TokenType override
                    if(FormatOverrides_Keyword != null)
                    {
                        foreach(TokenFormatter formatter in FormatOverrides_Keyword)
                        {
                            if(Text == formatter.Override)
                            {
                                return formatter.Format(Text);
                            }
                        }
                    }

                    //Default formatting
                    return TokenFormatters[(int)TokenTypes.Keyword].Format(Text);
                    break;

                case TokenTypes.Preprocessor:

                    //CSSFormatting TokenFormatters is a global override
                    if(CSSFormatting.TokenFormatters.Count > 0)
                    {
                        foreach(TokenFormatter tf in CSSFormatting.TokenFormatters.Values)
                        {
                            if(tf.Override == Text)
                            {
                                return tf.Format(Text);
                            }
                        }
                    }

                    //TokenType override
                    if(FormatOverrides_Preprocessor != null)
                    {
                        foreach(TokenFormatter formatter in FormatOverrides_Preprocessor)
                        {
                            if(Text == formatter.Override)
                            {
                                return formatter.Format(Text);
                            }
                        }
                    }

                    //Default formatting
                    return TokenFormatters[(int)TokenTypes.Preprocessor].Format(Text);
                    break;

                case TokenTypes.StringLiteral:

                    //CSSFormatting TokenFormatters is a global override
                    if(CSSFormatting.TokenFormatters.Count > 0)
                    {
                        foreach(TokenFormatter tf in CSSFormatting.TokenFormatters.Values)
                        {
                            if(tf.Override == Text)
                            {
                                return tf.Format(Text);
                            }
                        }
                    }

                    //TokenType override
                    if(FormatOverrides_StringLiteral != null)
                    {
                        foreach(TokenFormatter formatter in FormatOverrides_StringLiteral)
                        {
                            if(Text == formatter.Override)
                            {
                                return formatter.Format(Text);
                            }
                        }
                    }

                    //Default formatting
                    return TokenFormatters[(int)TokenTypes.StringLiteral].Format(Text);

                case TokenTypes.UnknownSegment:

                    //CSSFormatting TokenFormatters is a global override
                    if(CSSFormatting.TokenFormatters.Count > 0)
                    {
                        foreach(TokenFormatter tf in CSSFormatting.TokenFormatters.Values)
                        {
                            if(tf.Override == Text)
                            {
                                return tf.Format(Text);
                            }
                        }
                    }

                    //TokenType override
                    if(FormatOverrides_UnknownSegment != null)
                    {
                        foreach(TokenFormatter formatter in FormatOverrides_UnknownSegment)
                        {
                            if(Text == formatter.Override)
                            {
                                return formatter.Format(Text);
                            }
                        }
                    }

                    //Default formatting
                    return TokenFormatters[(int)TokenTypes.UnknownSegment].Format(Text);

                case TokenTypes.KeySymbol:

                    //CSSFormatting TokenFormatters is a global override
                    if(CSSFormatting.TokenFormatters.Count > 0)
                    {
                        foreach(TokenFormatter tf in CSSFormatting.TokenFormatters.Values)
                        {
                            if(tf.Override == Text)
                            {
                                return tf.Format(Text);
                            }
                        }
                    }

                    //TokenType override
                    if(FormatOverrides_KeySymbol != null)
                    {
                        foreach(TokenFormatter formatter in FormatOverrides_KeySymbol)
                        {
                            if(Text == formatter.Override)
                            {
                                return formatter.Format(Text);
                            }
                        }
                    }

                    //Default formatting
                    return TokenFormatters[(int)TokenTypes.KeySymbol].Format(Text);

                case TokenTypes.Number:

                    //CSSFormatting TokenFormatters is a global override
                    if(CSSFormatting.TokenFormatters.Count > 0)
                    {
                        foreach(TokenFormatter tf in CSSFormatting.TokenFormatters.Values)
                        {
                            if(tf.Override == Text)
                            {
                                return tf.Format(Text);
                            }
                        }
                    }

                    //TokenType override
                    if(FormatOverrides_Number != null)
                    {
                        foreach(TokenFormatter formatter in FormatOverrides_Number)
                        {
                            if(Text == formatter.Override)
                            {
                                return formatter.Format(Text);
                            }
                        }
                    }

                    //Default formatting
                    return TokenFormatters[(int)TokenTypes.Number].Format(Text);

                case TokenTypes.WhiteSpace:

                    //CSSFormatting TokenFormatters is a global override
                    if(CSSFormatting.TokenFormatters.Count > 0)
                    {
                        foreach(TokenFormatter tf in CSSFormatting.TokenFormatters.Values)
                        {
                            if(tf.Override == Text)
                            {
                                return tf.Format(Text);
                            }
                        }
                    }

                    //TokenType override
                    if(FormatOverrides_WhiteSpace != null)
                    {
                        foreach(TokenFormatter formatter in FormatOverrides_WhiteSpace)
                        {
                            if(Text == formatter.Override)
                            {
                                return formatter.Format(Text);
                            }
                        }
                    }

                    //Default formatting
                    return TokenFormatters[(int)TokenTypes.WhiteSpace].Format(Text);

                case TokenTypes.WhiteSpaceCRLF:

                    //CSSFormatting TokenFormatters is a global override
                    if(CSSFormatting.TokenFormatters.Count > 0)
                    {
                        foreach(TokenFormatter tf in CSSFormatting.TokenFormatters.Values)
                        {
                            if(tf.Override == Text)
                            {
                                return tf.Format(Text);
                            }
                        }
                    }

                    //TokenType override
                    if(FormatOverrides_WhiteSpaceCRLF != null)
                    {
                        foreach(TokenFormatter formatter in FormatOverrides_WhiteSpaceCRLF)
                        {
                            if(Text == formatter.Override)
                            {
                                return formatter.Format(Text);
                            }
                        }
                    }

                    //Default formatting
                    return TokenFormatters[(int)TokenTypes.WhiteSpaceCRLF].Format(Text);

                case TokenTypes.WhiteSpaceLF:

                    //CSSFormatting TokenFormatters is a global override
                    if(CSSFormatting.TokenFormatters.Count > 0)
                    {
                        foreach(TokenFormatter tf in CSSFormatting.TokenFormatters.Values)
                        {
                            if(tf.Override == Text)
                            {
                                return tf.Format(Text);
                            }
                        }
                    }

                    //TokenType override
                    if(FormatOverrides_WhiteSpaceLF != null)
                    {
                        foreach(TokenFormatter formatter in FormatOverrides_WhiteSpaceLF)
                        {
                            if(Text == formatter.Override)
                            {
                                return formatter.Format(Text);
                            }
                        }
                    }

                    //Default formatting
                    return TokenFormatters[(int)TokenTypes.WhiteSpaceLF].Format(Text);

                case TokenTypes.WhiteSpaceTab:

                    //CSSFormatting TokenFormatters is a global override
                    if(CSSFormatting.TokenFormatters.Count > 0)
                    {
                        foreach(TokenFormatter tf in CSSFormatting.TokenFormatters.Values)
                        {
                            if(tf.Override == Text)
                            {
                                return tf.Format(Text);
                            }
                        }
                    }

                    //TokenType override
                    if(FormatOverrides_WhiteSpaceTab != null)
                    {
                        foreach(TokenFormatter formatter in FormatOverrides_WhiteSpaceTab)
                        {
                            if(Text == formatter.Override)
                            {
                                return formatter.Format(Text);
                            }
                        }
                    }

                    //Default formatting
                    return TokenFormatters[(int)TokenTypes.WhiteSpaceTab].Format(Text);
            }


            return "";
        }
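
Every case in the switch above applies the same precedence: a global override from CSSFormatting is tried first, then an override registered for the token type, and finally the default formatter for that type. As a rough illustration only, that shared lookup could be written once as a helper. The names SimpleFormatter, OverrideResolver, and Resolve below are hypothetical and are not part of the class library; SimpleFormatter is a wrap-only stand-in for the real TokenFormatter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the article's TokenFormatter (wrap mode only).
public class SimpleFormatter
{
    public string Override;     // exact token text this formatter applies to (null for a default)
    public string WrapBefore;
    public string WrapAfter;

    public SimpleFormatter(string overrideText, string before, string after)
    {
        Override = overrideText;
        WrapBefore = before;
        WrapAfter = after;
    }

    public string Format(string text)
    {
        return WrapBefore + text + WrapAfter;
    }
}

// Hypothetical helper that expresses the three-level lookup once.
public static class OverrideResolver
{
    // Precedence: 1. global overrides, 2. token-type overrides, 3. default formatter.
    public static string Resolve(
        string text,
        IEnumerable<SimpleFormatter> globalOverrides,
        IEnumerable<SimpleFormatter> typeOverrides,
        SimpleFormatter defaultFormatter)
    {
        foreach (IEnumerable<SimpleFormatter> level in new[] { globalOverrides, typeOverrides })
        {
            if (level == null) continue;   // a level with no overrides registered is skipped

            foreach (SimpleFormatter f in level)
            {
                if (f.Override == text) return f.Format(text);
            }
        }

        // No override matched the token text, so fall back to the default.
        return defaultFormatter.Format(text);
    }
}
```

With a helper of this shape, each case would reduce to a single call that passes its own FormatOverrides_ list and default formatter. Whether that is worth the extra indirection compared with the explicit cases is a design choice.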

 

The Format method of the TokenFormatter class is very simple. It either replaces the token text outright or returns the token text wrapped in a before tag and an after tag:

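        public string Format(string Text)
        {
            switch(Type)
            {
                case TokenFormatterType.Replace:
                    //Discard the token text and emit the replacement
                    return Replace;

                case TokenFormatterType.Wrap:
                    //Keep the token text and surround it with the before and after tags
                    return WrapBefore + Text + WrapAfter;
            }

            //Unrecognized formatter type
            return "";
        }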
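
To make the two modes concrete, here is a small self-contained sketch. The member names (Type, Replace, WrapBefore, WrapAfter) and the TokenFormatterType values come from the listing above. The class name TokenFormatterSketch, the FormatDemo helper, and the sample HTML strings are assumptions made for illustration and may not match the real class.

```csharp
using System;

public enum TokenFormatterType { Replace, Wrap }

// Hypothetical reconstruction: only the members used by Format are shown.
public class TokenFormatterSketch
{
    public TokenFormatterType Type;
    public string Replace = "";
    public string WrapBefore = "";
    public string WrapAfter = "";

    public string Format(string text)
    {
        switch (Type)
        {
            case TokenFormatterType.Replace:
                return Replace;                         // token text is discarded
            case TokenFormatterType.Wrap:
                return WrapBefore + text + WrapAfter;   // token text is kept
            default:
                return "";
        }
    }
}

public static class FormatDemo
{
    // Replace mode: a tab token becomes four non-breaking spaces (sample value).
    public static string Tab()
    {
        var tab = new TokenFormatterSketch
        {
            Type = TokenFormatterType.Replace,
            Replace = "&nbsp;&nbsp;&nbsp;&nbsp;"
        };
        return tab.Format("\t");
    }

    // Wrap mode: a keyword token is surrounded by a span (sample CSS class name).
    public static string Keyword()
    {
        var keyword = new TokenFormatterSketch
        {
            Type = TokenFormatterType.Wrap,
            WrapBefore = "<span class=\"kw\">",
            WrapAfter = "</span>"
        };
        return keyword.Format("foreach");
    }
}
```

Here FormatDemo.Tab() returns &nbsp;&nbsp;&nbsp;&nbsp; and FormatDemo.Keyword() returns <span class="kw">foreach</span>. Replace mode suits tokens such as whitespace that have no direct HTML equivalent, while Wrap mode suits tokens that only need styling.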

Points of Interest

  • Formatting of code for publication as HTML
  • A well-organized class hierarchy supports code reuse and can be extended with token parsers for additional languages.
  • A formatting mechanism with multiple levels of overrides lets users control the formatting when the default formatting is not satisfactory.
  • Uses various techniques to speed up the formatting of large amounts of code and to conserve memory while doing so.


Copyright (c) 2010 GeraldGibson.Net - DotNetNuke