The situation:
Let's say we are implementing a blog engine based on JCR with support for localization.
The content structure looks something like this: /blogname/content/[node name]
The problem:
What is the best way to name the content nodes (/blogname/content/[nodename]) to satisfy the following requirements:
- The node name must be usable in HTML to support REST-like URLs, e.g. blogname.com/content/nodename should point to a single content item.
- The above requirement must not produce ugly URLs, e.g. /content/node_name is good, /content/node%20name is bad.
- Programmatic retrieval should be easy given the node name, e.g. //content[@node_name=some-name]
- The naming scheme must guarantee node name uniqueness.
PS: The JCR implementation used is Jackrabbit.
Best Solution
For 1. to 3. the answer is simple: use only the characters you want to see in the node name, i.e. escape whatever input string you have (e.g. the blog post title) against a restricted character set such as the one for URIs.
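A minimal sketch of such an escaping step, assuming the input is a blog post title and that lowercase ASCII letters, digits and underscores are the only characters you want to keep. The class NodeNames, the method toNodeName and the "untitled" fallback are hypothetical names chosen for illustration, not part of JCR or Jackrabbit:

```java
import java.text.Normalizer;
import java.util.Locale;

final class NodeNames {
    private NodeNames() {}

    /** Turns a free-form title into a URL-friendly node name (hypothetical helper). */
    static String toNodeName(String title) {
        // Decompose accented characters, then drop the combining marks ("é" becomes "e").
        String ascii = Normalizer.normalize(title, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");
        // Replace every run of characters outside [a-z0-9] with a single underscore.
        String name = ascii.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "_");
        // Trim underscores left over at either end.
        name = name.replaceAll("^_+|_+$", "");
        // Assumption: fall back to a fixed name if nothing usable remains.
        return name.isEmpty() ? "untitled" : name;
    }
}
```

With this sketch, a title such as "My Great Post!" becomes my_great_post. Lowercasing is a design choice rather than a JCR requirement; drop it if you want case preserved in URLs.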
For example, do not allow spaces (which are allowed in JCR node names, but would produce the ugly %20 in URLs) or other characters that must be encoded in URLs. You can remove those characters or simply replace them with an underscore, because that looks good in most cases.
Regarding unique names (4.), you can either include the current time, including milliseconds, in the name or explicitly check for collisions. The first might look a bit ugly, but should probably never fail in a blog scenario. The latter can be done by reacting to the exception thrown if a node with that name already exists, appending e.g. an incrementing counter and trying again (e.g. my_great_post1, my_great_post2, etc.). You can also lock the parent node so that only one session can add a node at a time, which avoids a trial loop, but comes at the cost of blocking.
Note: //content[@node_name=some-name] is not a valid JCR XPath query. You probably want to use /jcr:root/content//some-name for that.
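The incrementing-counter idea for requirement 4 could be sketched roughly as follows. To keep the example self-contained, the existence check is passed in as a plain Predicate; UniqueNames and uniqueName are hypothetical names. Against a real repository the check would presumably wrap something like Node.hasNode on the parent node. Since another session can still add the same name between the check and the save, you would still need to retry on the exception or lock the parent as described above:

```java
import java.util.function.Predicate;

final class UniqueNames {
    private UniqueNames() {}

    /**
     * Returns base if it is free, otherwise base1, base2, ... (hypothetical helper).
     * exists reports whether a sibling with the given name is already present.
     */
    static String uniqueName(String base, Predicate<String> exists) {
        if (!exists.test(base)) {
            return base;
        }
        // Try increasing suffixes until an unused name is found.
        for (int i = 1; ; i++) {
            String candidate = base + i;
            if (!exists.test(candidate)) {
                return candidate;
            }
        }
    }
}
```

For instance, if my_great_post and my_great_post1 are already taken, this returns my_great_post2.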