Regular expression to find the start and end of a list in HTML


I have a TextBox in a webpage whose contents I parse and modify with JavaScript to format them as HTML. 90% of it works really well; the last main thing I'm trying to support is copying and pasting from a Word document. I have it mostly working, but I'm stuck on finding lists and wrapping them in a UL tag.

So, using regular expressions, I'd like to find the list in this text:

<p>paragraph goes here

<li>goes here<br/>
<li>list item 2<br/>
<li>list item 3<br/>

<p>another paragraph

and wrap the <li> section with a <ul> tag. My regex-fu isn't that good; can someone help?

While I appreciate all the feedback indicating that I need to start from scratch with this issue, I do not have the time to do that. I completely understand that regex is not the ideal way to handle HTML formatting, but the way I am using it now will handle most of what my users are looking to do. I only need a subset of HTML tags, not a full HTML editor.

The source of my content will be a user copying and pasting from a Word document about 99.9% of the time. I use regex to insert HTML tags into plain text. For the lists, I find the bullet character MS Word inserts into its copied text and replace that with the <LI> tag. I just want to make it more user-friendly by wrapping the <LI> tags with a <UL> tag.

I'll look into ending my tags properly. So, assuming they're properly ended, what would be the regex to wrap my list items with a <ul> tag?


Best Solution

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. -- Jamie Zawinski

  1. Regular expressions and HTML are a particularly bad fit.

  2. This is 2009; use closing tags in your HTML. (That alone will help you if you really want to regex your HTML.)

  3. If you've already got this page inside a browser, use the DOM! Let the browser parse the HTML for you (shove it into a hidden div if you must) and navigate the resulting DOM tree.
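
If you do stay with a regex, here is a minimal sketch of the wrapping step. It assumes every item is properly closed (<li>...</li>), that items in one list are separated only by whitespace, and that lists are not nested. The function name wrapListItems is made up for illustration, and unclosed items like the ones in the question's sample are left untouched.

```javascript
// Hypothetical helper: wraps each run of consecutive, properly closed
// <li>...</li> items in a single <ul>...</ul>.
// Assumptions: items are closed, not nested, and separated only by whitespace.
function wrapListItems(html) {
  // One <li>...</li>, followed by zero or more further items that are
  // separated from the previous one by whitespace only.
  // The "i" flag also accepts <LI>; \b keeps tags such as <link> from matching.
  const listRun = /<li\b[^>]*>[\s\S]*?<\/li>(?:\s*<li\b[^>]*>[\s\S]*?<\/li>)*/gi;

  // $& is the whole matched run, so each run gets exactly one <ul> wrapper.
  return html.replace(listRun, "<ul>$&</ul>");
}

// Usage: a paragraph, three list items, another paragraph.
const input =
  "<p>paragraph goes here</p>\n" +
  "<li>goes here</li>\n" +
  "<li>list item 2</li>\n" +
  "<li>list item 3</li>\n" +
  "<p>another paragraph</p>";

console.log(wrapListItems(input));
```

Anything other than whitespace between two items, such as a paragraph, ends the current run, so separate lists get separate <ul> wrappers. If the input can contain nested lists or unclosed tags, the DOM approach in point 3 is the safer route.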