I just finished building a completely custom discussion board system. It's mostly done; the only part that is giving me problems is parsing the text that gets inserted into the database and then shown on the thread/post page.

Here is the process it goes through:

1) You enter your text on the post page. It should allow some basic HTML, some BBCode, smilies, etc.
2) When you click Submit, the text gets parsed: the HTML tags need to be checked for validity, the BBCode checked and converted to HTML tags, images converted to proper tags, etc.
3) When viewing the page, NO PROCESSING is done to the text. (This makes for faster loading times.)
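Steps 2 and 3 describe a parse-on-write design: all the work happens once at submit time, and the thread page only prints stored HTML. Below is a minimal sketch of that flow. `render_post()` is a hypothetical stand-in for the real parsehtml()/parsebbcode() chain (it only escapes the text and handles `[b]`), and storing the original source next to the rendered HTML under the hypothetical keys `source` and `html` is an assumption, not part of the original design.

```php
<?php
// Hypothetical stand-in for the real parsehtml()/parsebbcode() chain:
// escape everything, then turn [b]...[/b] into <strong>...</strong>.
function render_post($source)
{
    $html = htmlspecialchars($source, ENT_QUOTES, 'UTF-8');
    $html = preg_replace('#\[b\](.*?)\[/b\]#si', '<strong>$1</strong>', $html);
    return nl2br($html); // nl2br() emits XHTML-style <br /> by default
}

// Step 2: parse once, when the post is submitted. Keeping the source
// (assumed extra column) means an edit form can reload what the user typed.
function prepare_post_for_storage($source)
{
    return array(
        'source' => $source,              // what the user typed
        'html'   => render_post($source), // what the thread page prints
    );
}

// Step 3: viewing does no parsing at all, it just returns the stored HTML.
function show_post(array $row)
{
    return $row['html'];
}

$row = prepare_post_for_storage("Hello [b]world[/b] & friends");
echo show_post($row), "\n";
// prints: Hello <strong>world</strong> &amp; friends
```

Keeping the source around is one possible way to avoid needing an HTML-to-BBCode converter at all, since editing could start from the stored BBCode instead of the generated HTML.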

My problem is in step 2. I have code that does all of it, but it's hard to customize, and the regular expressions are what give me the most trouble. It would be great if you could either point me to a script that does this better or show me a way to fix this one.

This is what I need:

-I need to be able to specify which HTML tags are allowed.
-Ideally, the HTML generated from the BBCode/HTML input should be valid XHTML.
-A way to convert HTML back into BBCode would be completely amazing.
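For the first two requirements, one option is to replace the blacklist in clean_html() below (which only drops `style` and `on*` attributes) with a per-tag whitelist, and to rebuild every allowed tag in XHTML form: lowercase names, double-quoted escaped attribute values, and self-closing empty elements such as `<br />`. This is a minimal sketch under those assumptions; `$ALLOWED_HTML`, `$VOID_ELEMENTS`, `is_safe_url()` and `xhtml_tag()` are hypothetical names, and the tag name and attributes are assumed to have been extracted already (for example by a tokenizer like the one in parsehtml()).

```php
<?php
// Hypothetical whitelist: tag name => attributes allowed on that tag.
// Editing this array is how you choose which HTML is permitted.
$ALLOWED_HTML = array(
    'b' => array(), 'i' => array(), 'u' => array(),
    'em' => array(), 'strong' => array(),
    'a' => array('href', 'title'),
    'img' => array('src', 'alt'),
    'br' => array(),
    'ul' => array(), 'ol' => array(), 'li' => array(),
);
// Empty elements that XHTML writes in self-closing form.
$VOID_ELEMENTS = array('br', 'img');

// Accept only http(s)/mailto URLs, or relative URLs containing no colon,
// so schemes such as javascript: are rejected.
function is_safe_url($url)
{
    return preg_match('#^(https?://|mailto:)#i', $url) === 1
        || strpos($url, ':') === false;
}

// Rebuild one tag as XHTML, or return null if the tag is not allowed.
// $attrs maps attribute name => raw value as typed by the user.
function xhtml_tag($name, array $attrs, $is_end, array $allowed, array $void)
{
    $name = strtolower($name);               // XHTML names are lowercase
    if (!isset($allowed[$name])) {
        return null;
    }
    if ($is_end) {
        // Void elements have no end tag; anything else is closed normally
        return in_array($name, $void, true) ? '' : '</' . $name . '>';
    }
    $out = '<' . $name;
    foreach ($attrs as $attr => $value) {
        $attr = strtolower($attr);
        if (!in_array($attr, $allowed[$name], true)) {
            continue;                          // not on this tag's list: drop it
        }
        if (($attr === 'href' || $attr === 'src') && !is_safe_url($value)) {
            continue;                          // unsafe URL scheme: drop it
        }
        // Always double-quoted and escaped
        $out .= ' ' . $attr . '="' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '"';
    }
    return $out . (in_array($name, $void, true) ? ' />' : '>');
}

echo xhtml_tag('IMG', array('SRC' => 'a.png', 'onclick' => 'x()'), false, $ALLOWED_HTML, $VOID_ELEMENTS), "\n";
// <img src="a.png" />
echo xhtml_tag('BR', array(), false, $ALLOWED_HTML, $VOID_ELEMENTS), "\n";
// <br />
var_dump(xhtml_tag('script', array(), false, $ALLOWED_HTML, $VOID_ELEMENTS));
// NULL
```

Whitelisting attributes per tag is stricter than blacklisting them: an attribute nobody thought of is dropped by default instead of passed through.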

So, here is the code:

PHP Code:
/**
* Converts BBCode to HTML
*/
function parsebbcode($text) {
    // Built-in tags handled by helper functions. The /e modifier makes
    // preg_replace() evaluate the replacement string as PHP code; it was
    // deprecated in PHP 5.5 and removed in PHP 7.0.
    $searcharray = array(
        "/(\[)(list)(=)(['\"]?)([^\"']*)(\\4])(.*)(\[\/list)(((=)(\\4)([^\"']*)(\\4]))|(\]))/esiU",
        "/(\[)(list)(])(.*)(\[\/list\])/esiU",
        "/(\[)(url)(=)(['\"]?)([^\"']*)(\\4])(.*)(\[\/url\])/esiU",
        "/(\[)(url)(])(.*)(\[\/url\])/esiU",
        "/(\[)(code)(])(\r\n)*(.*)(\[\/code\])/esiU",
        "/(\[)(php)(])(\r\n)*(.*)(\[\/php\])/esiU"
    );

    $replacearray = array(
        "createlists('\\7', '\\5')",
        "createlists('\\4')",
        "checkurl('\\5', '\\7')",
        "checkurl('\\4')",
        "stripbrsfromcode('\\5')",
        "phphighlite('\\5')"
    );

    // Templates for the tags stored in the database.
    // $doubleRegex: \4 = optional quote, \5 = parameter, \7 = content.
    // $singleRegex: \4 = content.
    $doubleRegex = "/(\[)(%s)(=)(['\"]?)([^\"']*)(\\4])(.*)(\[\/%s\])/siU";
    $singleRegex = "/(\[)(%s)(])(.*)(\[\/%s\])/siU";

    $bbcodes_q = QUERY; // the query itself is omitted, see below
    /*
    Grabs the BBCode we allow from a database. These are two example rows returned by this query:

    Example 1:
    id = 1
    tag = b
    replacement = <strong>\4</strong>
    example = [b]Bold[/b]
    twoparams (flag) = 0

    Example 2:
    id = 2
    tag = link
    replacement = <a href="\5">\7</a>
    example = [link="www.google.com"]google[/link]
    twoparams (flag) = 1
    */

    // mysql_* was also removed in PHP 7.0 (mysqli or PDO replace it)
    while ($r = mysql_fetch_array($bbcodes_q)) {
        // Escape regex metacharacters in the tag name before inserting it
        $tagname = preg_quote($r['tag'], '/');
        if ($r['twoparams']) {
            $regex = sprintf($doubleRegex, $tagname, $tagname);
        } else {
            $regex = sprintf($singleRegex, $tagname, $tagname);
        }
        // Each pattern is added three times: preg_replace() applies the
        // array entries in order, so the repeated copies resolve nested
        // tags (up to three levels deep).
        $searcharray[] = $regex;
        $replacearray[] = $r['replacement'];
        $searcharray[] = $regex;
        $replacearray[] = $r['replacement'];
        $searcharray[] = $regex;
        $replacearray[] = $r['replacement'];
    }

    // Turn \' back into ' (e.g. left over from magic_quotes_gpc or addslashes())
    $text = str_replace("\\'", "'", $text);

    $text = preg_replace($searcharray, $replacearray, $text);
    return $text;
}

/**
* Clean included HTML tags (called from parsehtml)
* @param array $tag  array(full tag, tag name, attribute string)
* Original code from phpBB
*/
function clean_html($tag)
{
    if (empty($tag[0])) { return ''; }

    // Tags let through unescaped; edit this list to change the allowed HTML
    // (href and src are attribute names, not tags)
    $allowed_html_tags = preg_split('/, */', 'b,i,u,em,strong,span,href,src,a,img,center,br,div,li,ol,ul');
    // Attributes to drop: style and every on* event handler.
    // Note: attribute values are not checked, so href="javascript:..." still gets through.
    $disallowed_attributes = '/^(?:style|on)/i';

    // Check if this is an end tag
    preg_match('/<[^\w\/]*\/[\W]*(\w+)/', $tag[0], $matches);
    if (sizeof($matches)) {
        if (in_array(strtolower($matches[1]), $allowed_html_tags)) {
            return '</' . $matches[1] . '>';
        } else {
            return htmlspecialchars('</' . $matches[1] . '>');
        }
    }

    // Check if this is an allowed tag
    if (in_array(strtolower($tag[1]), $allowed_html_tags)) {
        $attributes = '';
        if (!empty($tag[2])) {
            // Collect name="value" and name='value' pairs
            preg_match_all('/[\W]*?(\w+)[\W]*?=[\W]*?(["\'])((?:(?!\2).)*)\2/', $tag[2], $test);
            for ($i = 0; $i < sizeof($test[0]); $i++) {
                if (preg_match($disallowed_attributes, $test[1][$i])) {
                    continue;
                }
                // Escape the value and encode [ and ] so no BBCode can be parsed inside attributes
                $attributes .= ' ' . $test[1][$i] . '=' . $test[2][$i] . str_replace(array('[', ']'), array('&#91;', '&#93;'), htmlspecialchars($test[3][$i])) . $test[2][$i];
            }
        }
        return '<' . $tag[1] . $attributes . '>';
    }
    // Finally, this is not an allowed tag, so strip all the attributes and escape it
    else {
        return htmlspecialchars('<' . $tag[1] . '>');
    }
}

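/**
* Parses HTML code & cleans it
* This approach is quite aggressive: anything that does not look like a valid tag
* is going to get converted to HTML entities
*/
function parsehtml($text) {
    $text = stripslashes($text);
    // Group 1 = tag name, group 2 = everything after it up to the closing >
    $html_match = '#<[^\w<]*(\w+)((?:"[^"]*"|\'[^\']*\'|[^<>\'"])+)?>#';
    $matches = array();

    // The text between tags, and the tags themselves
    $text_split = preg_split($html_match, $text);
    preg_match_all($html_match, $text, $matches);

    $text = '';

    foreach ($text_split as $part)
    {
        // Pair each text chunk with the tag that followed it
        $tag = array(array_shift($matches[0]), array_shift($matches[1]), array_shift($matches[2]));
        $text .= htmlspecialchars($part) . clean_html($tag);
    }

    // Presumably for the INSERT query; mysql_real_escape_string() or a
    // prepared statement is safer than addslashes() for that
    $text = addslashes($text);
    return $text;
}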
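The first six patterns in parsebbcode() rely on the `/e` modifier, which makes preg_replace() eval the replacement string; it was deprecated in PHP 5.5 and removed in PHP 7.0, and running user text through eval is risky anyway. preg_replace_callback() (and, from PHP 7.0, preg_replace_callback_array()) does the same job with ordinary functions. Here is a minimal sketch for the two `[url]` patterns, assuming PHP 7+. `checkurl()` is a hypothetical, simplified stand-in for the helper the post calls but does not show, and `parse_url_tags()` is a hypothetical name.

```php
<?php
// Hypothetical stand-in for the checkurl() helper that parsebbcode() calls.
// Assumptions: the text has not been HTML-escaped yet, and only http(s) links
// are wanted, so anything else (including javascript:...) gets http:// prepended.
function checkurl($url, $text = null)
{
    if ($text === null) {
        $text = $url;                        // [url]x[/url]: link text = URL
    }
    if (!preg_match('#^https?://#i', $url)) {
        $url = 'http://' . $url;
    }
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">'
        . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</a>';
}

// The two [url] patterns from parsebbcode() without the /e modifier.
// Each closure receives the match array, so $m[5] and $m[7] replace '\\5' and '\\7'.
function parse_url_tags($text)
{
    return preg_replace_callback_array(array(
        "/(\[)(url)(=)(['\"]?)([^\"']*)(\\4])(.*)(\[\/url\])/siU" => function ($m) {
            return checkurl($m[5], $m[7]);
        },
        "/(\[)(url)(])(.*)(\[\/url\])/siU" => function ($m) {
            return checkurl($m[4]);
        },
    ), $text);
}

echo parse_url_tags('See [url="example.com"]this site[/url] or [url]https://php.net[/url]'), "\n";
// See <a href="http://example.com">this site</a> or <a href="https://php.net">https://php.net</a>
```

The `[list]`, `[code]` and `[php]` patterns should convert the same way: drop the `e` flag and move the helper call into a closure that reads its arguments from `$m`. Since no string is eval'd, the `str_replace("\\'", "'", ...)` workaround for quote escaping is no longer needed for these tags.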

I know this is quite difficult, and more than just a quick fix, but any help is greatly appreciated.