The trouble with MarkusQ's solution is knowing which characters are special inside double quotes. There are several of them, including backticks, dollar-open-parenthesis, dollar-variable, backslash, etc.
I suggest enclosing the string in single quotes instead. Each single quote inside the string then has to be replaced by the sequence quote, backslash, quote, quote ('\''):
sqlite3.bin contacts.db 'select * from contacts
where source = "Nancy'\''s notes"'
The first quote in the replacement terminates the current single-quoted string, the backslash-quote represents a literal single quote, and the final quote starts a new single-quoted string. This works with the Bourne, Korn, Bash and POSIX shells in general. (The C shell and its derivatives have more complex rules, needing backslashes to escape newlines, and so on.)
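As a sketch of the replacement rule in pure POSIX sh (no sed needed), the hypothetical function shquote below walks the string with parameter expansion, replaces each single quote with '\'' and wraps the result in single quotes. It uses plain global variables (rest, out) because POSIX sh has no local; treat it as an illustration, not a hardened utility.

```sh
# shquote: hypothetical helper, not a standard command.
# Prints its argument as one single-quoted shell word, turning each ' into '\''.
shquote() {
    rest=$1   # input not yet processed
    out=      # quoted text built so far, without the outer quotes
    while :; do
        case $rest in
            *\'*)
                # Copy everything before the first ', then append '\''
                out=$out${rest%%\'*}"'\\''"
                # Drop everything up to and including that first '
                rest=${rest#*\'}
                ;;
            *)
                out=$out$rest
                break
                ;;
        esac
    done
    printf "'%s'\n" "$out"
}
```

For example, shquote "Nancy's notes" prints 'Nancy'\''s notes', the same form used in the sqlite3.bin command above, and eval "v=$(shquote "$s")" sets v back to the value of s.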
If you use an appropriate class or library, they will do the escaping for you. Many XML issues are caused by string concatenation.
XML escape characters
There are only five:
"   &quot;
'   &apos;
<   &lt;
>   &gt;
&   &amp;
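A minimal sketch of the "escape all five" approach in sh, assuming sed is available; the function name xml_escape is hypothetical. The & substitution has to run first, otherwise the & at the start of every entity produced by the later substitutions would be escaped a second time. A real XML library remains the safer choice when one is available.

```sh
# xml_escape: hypothetical helper that replaces all five XML special characters.
xml_escape() {
    # & must be handled first so the entities added below are not re-escaped.
    # In sed replacements & means "the matched text", hence the \& below.
    printf '%s\n' "$1" | sed -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g' \
        -e "s/'/\&apos;/g"
}
```

For example, xml_escape "Tom & Jerry's <show>" prints Tom &amp; Jerry&apos;s &lt;show&gt;.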
Which characters need escaping depends on where the special character is used.
The examples can be validated at the W3C Markup Validation Service.
Text
The safe way is to escape all five characters in text. However, the three characters ", ' and > needn't be escaped in text (except that > must be escaped when it is part of the sequence ]]>):
<?xml version="1.0"?>
<valid>"'></valid>
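For text content, a minimal escaper only has to handle & and <, plus > when it appears in ]]> (XML 1.0 requires that case to be escaped "for compatibility"). A sketch under that reading, with the hypothetical name xml_text_escape; quotes and lone > characters are left as they are.

```sh
# xml_text_escape: hypothetical helper for element text only (not attributes).
xml_text_escape() {
    # & first, then <, then the one > that text content must not contain: in ]]>
    printf '%s\n' "$1" | sed -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/]]>/]]\&gt;/g'
}
```

For example, xml_text_escape "a < b ]]> \"c\"" prints a &lt; b ]]&gt; "c".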
Attributes
The safe way is to escape all five characters in attributes. However, the > character needn't be escaped in attributes:
<?xml version="1.0"?>
<valid attribute=">"/>
The ' character needn't be escaped in attributes if the value is enclosed in double quotes ("):
<?xml version="1.0"?>
<valid attribute="'"/>
Likewise, the " character needn't be escaped in attributes if the value is enclosed in single quotes ('):
<?xml version="1.0"?>
<valid attribute='"'/>
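Following the attribute rules above, a sketch that produces a double-quoted attribute value: it escapes &, < and " but leaves ' and > alone. The name xml_attr_dq is hypothetical, and it assumes a single-line value, because XML parsers normalize a literal newline in an attribute value to a space.

```sh
# xml_attr_dq: hypothetical helper; prints the value as a "..." attribute value.
xml_attr_dq() {
    # Escape &, < and " (the delimiter), then wrap the result in double quotes.
    # The wrapping quotes are added last so they are not escaped themselves.
    printf '%s\n' "$1" | sed -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/"/\&quot;/g' \
        -e '1s/^/"/' \
        -e '$s/$/"/'
}
```

For example, printf '<valid attribute=%s/>\n' "$(xml_attr_dq "it's <here>")" prints <valid attribute="it's &lt;here>"/>.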
Comments
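None of the five special characters need to be escaped in comments (entity references are not recognized there, so &amp; would stay as the literal text "&amp;"):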
<?xml version="1.0"?>
<valid>
<!-- "'<>& -->
</valid>
CDATA
None of the five special characters need to be escaped in CDATA sections; their content is taken literally, and only ]]> ends the section:
<?xml version="1.0"?>
<valid>
<![CDATA["'<>&]]>
</valid>
Processing instructions
None of the five special characters need to be escaped in XML processing instructions; their content is taken literally up to the closing ?>:
<?xml version="1.0"?>
<?process <"'&> ?>
<valid/>
XML vs. HTML
HTML has its own set of escape codes which cover a lot more characters.
There are two easy and safe rules which work not only in sh but also in bash.
1. Put the whole string in single quotes
This works for all characters except the single quote itself. To get a single quote, close the quoting before it, insert an escaped single quote (\'), and re-open the quoting.
sed command:
sed -e "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
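A sketch of how the rule 1 sed command can be used from a script. The hypothetical wrapper quote_sq feeds the value through printf '%s\n', so sed always sees a complete last line; that extra newline lands after the closing quote, where command substitution strips it, and it also makes an empty value come out as ''. eval then turns the quoted word back into the original value, embedded newlines included.

```sh
# quote_sq: hypothetical wrapper around the rule 1 sed command.
quote_sq() {
    # printf's trailing newline terminates the last line for sed;
    # it ends up after the closing quote and is removed by $(...).
    printf '%s\n' "$1" | sed -e "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
}

# Round trip: quote a value containing ', $, ` and a newline, then eval it back.
value="it's \$HOME, \`date\`
and a second line"
eval "copy=$(quote_sq "$value")"
[ "$copy" = "$value" ] && echo "round trip ok"
```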
2. Escape every char with a backslash
This works for all characters except newline. For newline characters, use single or double quotes. Empty strings must still be handled: replace them with "".
sed command:
sed -e 's/./\\&/g; 1{$s/^$/""/;}; 1!s/^/"/; $!s/$/"/'
2b. More readable version of 2
There's an easy, safe set of characters, like [a-zA-Z0-9,._+:@%/-], which can be left unescaped to keep the output more readable.
sed command:
LC_ALL=C sed -e 's|[^a-zA-Z0-9,._+:@%/-]|\\&|g; 1{$s/^$/""/;}; 1!s/^/"/; $!s/$/"/'
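A sketch of the same round trip for rules 2 and 2b, with hypothetical wrapper names. It writes ;} rather than a bare } (POSIX expects a ; or newline before }, and BSD/macOS sed rejects a bare } after an s command), and for rule 2b it uses | as the s delimiter, since GNU sed, at least, ends the regex at the first unescaped / even inside a bracket expression. As with rule 1, printf '%s\n' supplies the complete last line that the sed scripts expect.

```sh
# quote_bs: hypothetical wrapper for rule 2 (backslash before every character).
quote_bs() {
    printf '%s\n' "$1" | sed -e 's/./\\&/g; 1{$s/^$/""/;}; 1!s/^/"/; $!s/$/"/'
}

# quote_bs_readable: hypothetical wrapper for rule 2b (safe set left unescaped).
quote_bs_readable() {
    printf '%s\n' "$1" |
        LC_ALL=C sed -e 's|[^a-zA-Z0-9,._+:@%/-]|\\&|g; 1{$s/^$/""/;}; 1!s/^/"/; $!s/$/"/'
}

# Both outputs are shell words that eval turns back into the original value.
value="a b'c\"d\$e
f"
eval "copy1=$(quote_bs "$value")"
eval "copy2=$(quote_bs_readable "$value")"
[ "$copy1" = "$value" ] && [ "$copy2" = "$value" ] && echo "round trip ok"
quote_bs_readable "/tmp/my file.txt"   # prints /tmp/my\ file.txt
```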
Note that in a sed program, one can't know whether the last line of input ends with a newline byte (except when it's empty). That's why all the sed commands above assume it does not. You can add a quoted newline manually.
Note that shell variables are only defined for text in the POSIX sense. Processing binary data is not defined. For the implementations that matter, binary works with the exception of NUL bytes (because variables are implemented with C strings, and meant to be used as C strings, namely program arguments), but you should switch to a "binary" locale such as latin1.
(You can easily validate the rules by reading the POSIX spec for sh. For bash, check the reference manual linked by @AustinPhillips.)