An XML parser for C? Using Expat


This might seem like a simple question, but I have been looking for an XML parser to use in one of my applications, which runs on Linux.

I am using Expat and have parsed an XML file that I read in from disk.

This is the file I am reading in:

<?xml version="1.0" encoding="utf-8"?>
<name>Hello, world!</name>

However, after parsing it, the output is exactly the same as the input. It makes me wonder what the parser is for.

One more thing: Expat seems quite difficult to use. My code is below. It reads from a file, but my application will have to parse a buffer received from a socket, not from a file. Does anyone have any samples of this?

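#include <stdio.h>
#include <expat.h>

/* Parse mybook.xml with Expat, reading it through the caller's buffer
   one chunk at a time. sizeof(buff) would only give the size of the
   pointer, so the buffer size is passed in explicitly.
   No handlers are registered, so this only checks that the file is
   well-formed; it produces no output of its own. */
int parse_xml(char *buff, size_t buff_size)
{
    FILE *fp = fopen("mybook.xml", "r");
    if (fp == NULL) {
        printf("Failed to open file\n");
        return 1;
    }

    XML_Parser parser = XML_ParserCreate(NULL);
    if (parser == NULL) {
        fclose(fp);
        return 1;
    }

    int done;
    do {
        size_t len = fread(buff, 1, buff_size, fp);
        done = len < buff_size;   /* short read: end of file (or read error) */

        if (XML_Parse(parser, buff, (int)len, done) == XML_STATUS_ERROR) {
            printf("%s at line %lu\n",
                   XML_ErrorString(XML_GetErrorCode(parser)),
                   (unsigned long)XML_GetCurrentLineNumber(parser));
            XML_ParserFree(parser);
            fclose(fp);
            return 1;
        }
    } while (!done);

    XML_ParserFree(parser);
    fclose(fp);
    return 0;
}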

Best Solution

It took a while to wrap my head around XML parsing (though I do it in Perl, not C). Basically, you register callback functions. The parser calls your callbacks as it meets each piece of the document and passes in the juicy bits; with Expat, the start-element callback receives the element name and its attributes, and text arrives through a separate character-data callback. You have to maintain some kind of state information yourself, such as a hash tree you plug things into, or a string that collects all the text but none of the XML.

Just remember that XML is not linear and it doesn't make much sense to parse it like a long hunk of text. Instead, you parse it like a tree. Good luck.