Javascript – Regex for parsing single key: values out of JSON in Javascript

javascriptjsonregex

I'm trying to see if it's possible to lookup individual keys out of a JSON string in Javascript and return it's Value with Regex. Sort of like building a JSON search tool.

Imagine the following JSON

"{
    "Name": "Humpty",
    "Age": "18",
    "Siblings" : ["Dracula", "Snow White", "Merlin"],
    "Posts": [
        {
            "Title": "How I fell",
            "Comments": [
                { 
                    "User":"Fairy God Mother",
                    "Comment": "Ha, can't say I didn't see it coming"
                }
            ]
        }
    ]
}"

I want to be able to search through the JSON string and only pull out individual properties.

lets assume it's a function already, it would look something like.

function getPropFromJSON(prop, JSONString){
    // Obviously this regex will only match Keys that have
    // String Values.
    var exp = new RegExp("\""+prop+"\"\:[^\,\}]*");
    return JSONString.match(exp)[0].replace("\""+prop+"\":","");    
}

It would return the substring of the Value for the Key.

e.g.

getPropFromJSON("Comments")

> "[
    { 
        "User":"Fairy God Mother",
        "Comment": "Ha, can't say I didn't see it coming"
    }
]"

If your wondering why I want to do this instead of using JSON.parse(), I'm building a JSON document store around localStorage. localStorage only supports key/value pairs, so I'm storing a JSON string of the entire Document in a unique Key. I want to be able to run a query on the documents, ideally without the overhead of JSON.parsing() the entire Collection of Documents then recursing over the Keys/nested Keys to find a match.

I'm not the best at regex so I don't know how to do this, or if it's even possible with regex alone. This is only an experiment to find out if it's possible. Any other ideas as a solution would be appreciated.

Best Solution

I would strongly discourage you from doing this. JSON is not a regular language as clearly stated here: https://cstheory.stackexchange.com/questions/3987/is-json-a-regular-language

To quote from the above post:

For example, consider an array of arrays of arrays:

[ [ [ 1, 2], [2, 3] ] , [ [ 3, 4], [ 4, 5] ] ] 

Clearly you couldn't parse that with true regular expressions.

I'd recommend converting your JSON to an object (JSON.parse) & implementing a find function to traverse the structure.

Other than that, you can take a look at guts of Douglas Crockford's json2.js parse method. Perhaps an altered version would allow you to search through the JSON string & just return the particular object you were looking for without converting the entire structure to an object. This is only useful if you never retrieve any other data from your JSON. If you do, you might as well have converted the whole thing to begin with.

EDIT

Just to further show how Regex breaks down, here's a regex that attempts to parse JSON

If you plug it into http://regexpal.com/ with "Dot Matches All" checked. You'll find that it can match some elements nicely like:

Regex

"Comments"[ :]+((?=\[)\[[^]]*\]|(?=\{)\{[^\}]*\}|\"[^"]*\") 

JSON Matched

"Comments": [
                { 
                    "User":"Fairy God Mother",
                    "Comment": "Ha, can't say I didn't see it coming"
                }
            ]

Regex

"Name"[ :]+((?=\[)\[[^]]*\]|(?=\{)\{[^\}]*\}|\"[^"]*\")

JSON Matched

"Name": "Humpty"

However as soon as you start querying for the higher structures like "Posts", which has nested arrays, you'll find that you cannot correctly return the structure since the regex does not have context of which "]" is the designated end of the structure.

Regex

"Posts"[ :]+((?=\[)\[[^]]*\]|(?=\{)\{[^\}]*\}|\"[^"]*\")

JSON Matched

"Posts": [
  {
      "Title": "How I fell",
      "Comments": [
          { 
              "User":"Fairy God Mother",
              "Comment": "Ha, can't say I didn't see it coming"
          }
      ]