PHP – Using regex in PHP to add a cell in a row


As usual I have trouble writing a good regex.

I am trying to make a plugin for Joomla that adds a button to the optional print, email and PDF buttons that the core produces to the right of article titles. If I succeed I will distribute it under the GPL. None of the examples I found seem to work, and I would like to create a PHP-only solution.

The idea is to use the unique pattern of the Joomla output for article titles and buttons in one or more regexes. One regex would find the right table by looking for a table with class "contentpaneopen" (there are several of these on a page) that contains a cell with class "contentheading". A second regex could check whether that table has a cell with class "buttonheading". There can be anywhere from zero to three of these cells, but I could use this check if the first regex returns more than one match. I would then replace the table with the same table plus an extra cell holding the button I want to add. I could do that by stripping the closing row and table tags, inserting my button cell, and then adding those closing tags back.

The normal Joomla output looks like this:

<table class="contentpaneopen">
            <td width="100%" class="contentheading">
                <a class="contentpagetitle" href="url">Title Here</a>
            <td width="100%" align="right" class="buttonheading">
                <a rel="nofollow" onclick="etc" title="PDF" href="url"><img alt="PDF" src="/templates/neutral/images/pdf_button.png"/></a>
            <td width="100%" align="right" class="buttonheading">
                <a rel="nofollow" onclick="etc" title="Print" href="url"><img alt="Print" src="/templates/neutral/images/printButton.png" ></a>

The code would very roughly be something like this:

$subject = $article;
$pattern1 = '[regex1]'; // matches <table class="contentpaneopen">etc</table>
preg_match($pattern1, $subject, $match);
$pattern2 = '[regex2]'; // matches </tr></tbody></table>
$replacement = '[mybutton]'; // the button cell followed by the closing tags
echo preg_replace($pattern2, $replacement, $match[0]); // $match[0] is the full matched table

Without a good regex there is little point doing the rest of the code, so I hope someone can help with that!

Best Solution

This is a common question on SO and the answer is always the same: regular expressions are a poor choice for parsing or processing HTML or XML. There are many ways they can break down. PHP comes with at least three built-in HTML parsers that will be far more robust.

Take a look at Parse HTML With PHP And DOM and use something like:

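$html = new DOMDocument();
$html->preserveWhiteSpace = false;
// Collect parser warnings instead of printing them; real-world HTML is rarely strictly valid.
libxml_use_internal_errors(true);
// $article is assumed to hold the article HTML, as in the question.
$html->loadHTML($article);
$tables = $html->getElementsByTagName('table');
foreach ($tables as $table) {
  if ($table->getAttribute('class') == 'contentpaneopen') {
    // replace it with something else
  }
}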
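
To make the DOM approach concrete, here is a minimal sketch of how the extra button cell from the question might be added without any regex. The function name `addButtonCell`, its parameters, and the `gr-wrapper` id are hypothetical and only for illustration; the class names come from the Joomla output shown in the question. It assumes the title cell sits inside a `<tr>` and that the input is UTF-8.

```php
<?php
// Hypothetical helper (not part of Joomla): appends one extra "buttonheading"
// cell to every row that holds a "contentheading" cell inside a
// "contentpaneopen" table, and returns the modified markup.
function addButtonCell(string $articleHtml, string $buttonUrl, string $buttonLabel): string
{
    $doc = new DOMDocument();

    // Collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a div so it can be extracted again afterwards;
    // the leading XML declaration is a common way to hint UTF-8 to loadHTML().
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="gr-wrapper">' . $articleHtml . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Only rows with a title cell qualify, which skips the other
    // "contentpaneopen" tables on the page.
    $rows = $xpath->query('//table[@class="contentpaneopen"]//tr[td[@class="contentheading"]]');
    foreach ($rows as $row) {
        $link = $doc->createElement('a');
        $link->setAttribute('href', $buttonUrl);
        $link->setAttribute('title', $buttonLabel);
        $link->appendChild($doc->createTextNode($buttonLabel));

        $cell = $doc->createElement('td');
        $cell->setAttribute('align', 'right');
        $cell->setAttribute('class', 'buttonheading');
        $cell->appendChild($link);

        // Appending puts the new cell after any existing button cells.
        $row->appendChild($cell);
    }

    // Serialize only the children of the wrapper, not the implied <html>/<body>.
    $wrapper = $xpath->query('//div[@id="gr-wrapper"]')->item(0);
    $result = '';
    foreach ($wrapper->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}
```

Calling `addButtonCell($article, 'url', 'My button')` would return the article markup with the new cell in place. Because the XPath query already requires a "contentheading" cell, no second check for existing "buttonheading" cells is needed, whether there are zero or three of them.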