Programmatically editing SharePoint Wiki content

Tags: sharepoint, sharepoint-wiki

I'd like to edit my SharePoint wiki content programmatically. One of the advantages would be the ability to add indexes to the wiki environment automatically.

Has anyone been able to do this? The language doesn't matter, but I'm looking for a scripting solution.

Best Solution

Yes. I have rolled my own MetaWeblog API that programmatically manages wiki pages in SharePoint 2010 and 2007.

My sources:

The service code for both SP 2010 and 2007 is pretty much identical, but there are a few caveats:

  • In 2010, you don't need to worry about managing wiki link markup (e.g. [[brackets]]).
  • In 2007, wiki markup is converted to HTML anchors when you request the page, so you have to convert it back to wiki markup before posting back. When posting back, you cannot use UpdateListItems; you must use the Copy service, because UpdateListItems escapes any wiki markup, effectively making your efforts useless.
  • In our environment, we require RecordType to be filled in before checking in. Maybe this is standard? If you don't set this field, your page will remain checked out to you, so I have a conditional that sets this field for SP2007.
  • In 2010, SP adds a bunch of markup to the raw WikiField value, and if it's missing it could mess up layouts. I just insert it around the value Windows Live Writer (WLW) is posting, then strip it out again when reading the page back. See below.

I use the Copy service as in the first link to create AND update the wiki pages. In 2010, you can use the Lists service to update, but not to add. I use the Imaging service to upload images automatically to a picture library.

Here is a function that converts the "ms-wikilink" anchors back to wiki markup:

Note: I use HtmlAgilityPack in case the markup returned is malformed. You could use Regex to do this too. I also use the Microsoft Anti-XSS 4.1 library to sanitize markup.

Note 2: My UrlDecode function, taken from here, does not depend on System.Web.

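// Requires: using System; using System.Linq; using HtmlAgilityPack;
// As an extension method, this must be declared inside a static class.

/// <summary>
/// SharePoint 2007 is mean and converts [[wiki links]] once the page is saved in the SharePoint editor.
/// Luckily, each link is decorated with class="ms-wikilink" and follows some conventions.
/// </summary>
/// <param name="html">HTML fragment containing ms-wikilink anchors.</param>
/// <returns>The fragment with each ms-wikilink anchor replaced by [[wiki link]] markup.</returns>
private static string ConvertAnchorsToWikiLinks(this string html)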
{
    HtmlDocument htmlDoc = new HtmlDocument();

    htmlDoc.LoadHtml(html);

    var anchorTags = (from d in htmlDoc.DocumentNode.Descendants()
                      where d.Attributes.Contains("class") && d.Attributes["class"].Value == "ms-wikilink"
                      select d).ToList();

    foreach (var anchor in anchorTags)
    {
        // Skip anchors without an href; there is no page name to recover.
        var href = anchor.GetAttributeValue("href", null);
        if (href == null)
            continue;

        // Two kinds of links
        // [[Direct Link]]
        // [[Wiki Page Name|Display Name]]
        var wikiPageFromLink = UrlDecode(href.Split('/').Last().Replace(".aspx", ""));
        var wikiPageFromText = anchor.InnerText;

        // Simple link when the page name matches the link text, substituted link otherwise
        var wikiMarkup = wikiPageFromLink == wikiPageFromText
            ? "[[" + wikiPageFromText + "]]"
            : String.Format("[[{0}|{1}]]", wikiPageFromLink, wikiPageFromText);

        anchor.ParentNode.ReplaceChild(htmlDoc.CreateTextNode(wikiMarkup), anchor);
    }

    return htmlDoc.DocumentNode.InnerHtml;
}
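
The note above says Regex could do the same job, and the UrlDecode helper itself is not shown. The block below is a rough sketch of both, not the code I actually run. The class name WikiLinkRegexSketch is made up, and UrlDecode here is a stand-in built on Uri.UnescapeDataString, which may behave differently from the original helper. Unlike the HtmlAgilityPack version, it keeps any inner tags in the link text and will not cope with badly malformed markup.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WikiLinkRegexSketch
{
    // Assumed stand-in for the UrlDecode helper (the original is not shown).
    // Uri lives in System, so there is no System.Web dependency.
    // Unlike HttpUtility.UrlDecode, this leaves '+' characters untouched.
    public static string UrlDecode(string value)
    {
        return Uri.UnescapeDataString(value);
    }

    // Matches <a ...>text</a>; the attributes are inspected in the evaluator.
    private static readonly Regex Anchor = new Regex(
        @"<a\b(?<attrs>[^>]*)>(?<text>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    private static readonly Regex ClassAttr = new Regex(
        @"\bclass\s*=\s*[""']ms-wikilink[""']", RegexOptions.IgnoreCase);

    private static readonly Regex HrefAttr = new Regex(
        @"\bhref\s*=\s*[""'](?<href>[^""']*)[""']", RegexOptions.IgnoreCase);

    public static string ConvertAnchorsToWikiLinks(string html)
    {
        return Anchor.Replace(html, m =>
        {
            string attrs = m.Groups["attrs"].Value;
            Match href = HrefAttr.Match(attrs);

            // Leave ordinary anchors (and wiki anchors without an href) as they are.
            if (!ClassAttr.IsMatch(attrs) || !href.Success)
                return m.Value;

            // The page name is the last URL segment minus the .aspx extension.
            string[] segments = href.Groups["href"].Value.Split('/');
            string page = UrlDecode(segments[segments.Length - 1].Replace(".aspx", ""));
            string text = m.Groups["text"].Value;

            return page == text
                ? "[[" + text + "]]"
                : "[[" + page + "|" + text + "]]";
        });
    }
}
```

Calling WikiLinkRegexSketch.ConvertAnchorsToWikiLinks on a fragment rewrites only the anchors marked with class="ms-wikilink" and passes every other link through unchanged.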

The function to strip SharePoint's HTML is:

/// <summary>
/// Gets editable HTML for a wiki page from a SharePoint HTML fragment.
/// </summary>
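/// <param name="html">Raw WikiField value as returned by SharePoint.</param>
/// <returns>The editable inner HTML, or null if no known wrapper element is found.</returns>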
public static string GetHtmlEditableContent(string html)
{
    HtmlDocument htmlDoc = new HtmlDocument();

    htmlDoc.LoadHtml(html);

    HtmlNode divNode = (from d in htmlDoc.DocumentNode.Descendants()
                        where d.Attributes.Contains("class") && d.Attributes["class"].Value == "ms-rte-layoutszone-inner"
                        select d).FirstOrDefault();
    HtmlNode divNode2 = (from d in htmlDoc.DocumentNode.Descendants()
                         where d.Attributes.Contains("class") && d.Attributes["class"].Value.StartsWith("ExternalClass")
                         select d).FirstOrDefault();

    if (divNode != null)
    {
        // SP 2010
        return divNode.InnerHtml;
    }
    else if (divNode2 != null)
    {
        // SP 2007 or something else
        return divNode2.InnerHtml.ConvertAnchorsToWikiLinks();
    }
    else
    {
        return null;
    }
}

And finally, the function that adds all that markup back:

/// <summary>
/// Inserts SharePoint's wrapping HTML around wiki page content. Stupid!
/// </summary>
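/// <param name="html">Editable wiki page content to wrap.</param>
/// <param name="spVersion">Target SharePoint version; SP2007 content is sanitized but not wrapped.</param>
/// <returns>Sanitized HTML, wrapped in the SP 2010 layout markup when needed.</returns>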
public static string InsertSharepointHtmlWrapper(string html, SharePointVersion spVersion)
{
    // No weird wrapper HTML for 2007
    if (spVersion == SharePointVersion.SP2007)
        return Microsoft.Security.Application.Sanitizer.GetSafeHtmlFragment(html);

    HtmlDocument htmlDoc = new HtmlDocument();

    htmlDoc.LoadHtml(@"<table id='layoutsTable' style='width:100%'>
                            <tbody>
                                <tr>
                                    <td>
                                        <div class='ms-rte-layoutszone-outer' style='width:99.9%'>
                                            <div class='ms-rte-layoutszone-inner' style='min-height:60px;word-wrap:break-word'>
                                            </div>
                                        </div>
                                    </td>
                                </tr>
                            </tbody>
                        </table>
                        <span id='layoutsData' style='display:none'>false,false,1</span>");

    HtmlNode divNode = (from d in htmlDoc.DocumentNode.Descendants()
                        where d.Attributes.Contains("class") && d.Attributes["class"].Value == "ms-rte-layoutszone-inner"
                        select d).FirstOrDefault();

    divNode.InnerHtml = Microsoft.Security.Application.Sanitizer.GetSafeHtmlFragment(html);

    return htmlDoc.DocumentNode.InnerHtml;
}
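
InsertSharepointHtmlWrapper takes a SharePointVersion value that is not defined in the snippets above. A minimal sketch of what it could look like follows. Only the SP2007 member appears in the code above; the SP2010 member, the WikiFieldVersionSketch class and its Detect helper are assumptions for illustration. Detect simply looks for the same marker classes that GetHtmlEditableContent checks.

```csharp
using System;

// Assumed shape of the enum; only SP2007 is confirmed by the code above.
public enum SharePointVersion
{
    SP2007,
    SP2010
}

public static class WikiFieldVersionSketch
{
    // Hypothetical helper: guesses the version from a raw WikiField value,
    // using the marker classes that GetHtmlEditableContent looks for.
    // Returns null when neither marker is present.
    public static SharePointVersion? Detect(string rawWikiField)
    {
        if (string.IsNullOrEmpty(rawWikiField))
            return null;

        // SP 2010 wraps the content in the rich text layout zone.
        if (rawWikiField.IndexOf("ms-rte-layoutszone-inner", StringComparison.Ordinal) >= 0)
            return SharePointVersion.SP2010;

        // SP 2007 (or something else) uses an ExternalClass... wrapper.
        if (rawWikiField.IndexOf("ExternalClass", StringComparison.Ordinal) >= 0)
            return SharePointVersion.SP2007;

        return null;
    }
}
```

With something like this in place, a round trip would read the page with GetHtmlEditableContent, edit the result, and pass it back through InsertSharepointHtmlWrapper with the matching version before posting.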

This works great.

  • Pages still retain last modified and correct user
  • Pages will retain all their history
  • Pages are easier to manage

I am thinking of publishing my API. It's not a lot of code, but I think it's super helpful for those of us who want to better manage our SharePoint wikis. With WLW I get auto-image upload, better HTML editing support, and support for plugins like PreCode Snippet. It's awesome!
