The correct way to avoid SQL injection attacks, no matter which database you use, is to separate the data from SQL, so that data stays data and will never be interpreted as commands by the SQL parser. It is possible to create SQL statement with correctly formatted data parts, but if you don't fully understand the details, you should always use prepared statements and parameterized queries. These are SQL statements that are sent to and parsed by the database server separately from any parameters. This way it is impossible for an attacker to inject malicious SQL.
You basically have two options to achieve this:
Using PDO (for any supported database driver):
$stmt = $pdo->prepare('SELECT * FROM employees WHERE name = :name');
$stmt->execute([ 'name' => $name ]);
foreach ($stmt as $row) {
// Do something with $row
}
Using MySQLi (for MySQL):
$stmt = $dbConnection->prepare('SELECT * FROM employees WHERE name = ?');
$stmt->bind_param('s', $name); // 's' specifies the variable type => 'string'
$stmt->execute();
$result = $stmt->get_result();
while ($row = $result->fetch_assoc()) {
// Do something with $row
}
If you're connecting to a database other than MySQL, there is a driver-specific second option that you can refer to (for example, pg_prepare()
and pg_execute()
for PostgreSQL). PDO is the universal option.
Correctly setting up the connection
Note that when using PDO to access a MySQL database real prepared statements are not used by default. To fix this you have to disable the emulation of prepared statements. An example of creating a connection using PDO is:
$dbConnection = new PDO('mysql:dbname=dbtest;host=127.0.0.1;charset=utf8', 'user', 'password');
$dbConnection->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$dbConnection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
In the above example the error mode isn't strictly necessary, but it is advised to add it. This way the script will not stop with a Fatal Error
when something goes wrong. And it gives the developer the chance to catch
any error(s) which are throw
n as PDOException
s.
What is mandatory, however, is the first setAttribute()
line, which tells PDO to disable emulated prepared statements and use real prepared statements. This makes sure the statement and the values aren't parsed by PHP before sending it to the MySQL server (giving a possible attacker no chance to inject malicious SQL).
Although you can set the charset
in the options of the constructor, it's important to note that 'older' versions of PHP (before 5.3.6) silently ignored the charset parameter in the DSN.
Explanation
The SQL statement you pass to prepare
is parsed and compiled by the database server. By specifying parameters (either a ?
or a named parameter like :name
in the example above) you tell the database engine where you want to filter on. Then when you call execute
, the prepared statement is combined with the parameter values you specify.
The important thing here is that the parameter values are combined with the compiled statement, not an SQL string. SQL injection works by tricking the script into including malicious strings when it creates SQL to send to the database. So by sending the actual SQL separately from the parameters, you limit the risk of ending up with something you didn't intend.
Any parameters you send when using a prepared statement will just be treated as strings (although the database engine may do some optimization so parameters may end up as numbers too, of course). In the example above, if the $name
variable contains 'Sarah'; DELETE FROM employees
the result would simply be a search for the string "'Sarah'; DELETE FROM employees"
, and you will not end up with an empty table.
Another benefit of using prepared statements is that if you execute the same statement many times in the same session it will only be parsed and compiled once, giving you some speed gains.
Oh, and since you asked about how to do it for an insert, here's an example (using PDO):
$preparedStatement = $db->prepare('INSERT INTO table (column) VALUES (:column)');
$preparedStatement->execute([ 'column' => $unsafeValue ]);
Can prepared statements be used for dynamic queries?
While you can still use prepared statements for the query parameters, the structure of the dynamic query itself cannot be parametrized and certain query features cannot be parametrized.
For these specific scenarios, the best thing to do is use a whitelist filter that restricts the possible values.
// Value whitelist
// $dir can only be 'DESC', otherwise it will be 'ASC'
if (empty($dir) || $dir !== 'DESC') {
$dir = 'ASC';
}
Now that MySQL 8.0 supports recursive queries, we can say that all popular SQL databases support recursive queries in standard syntax.
WITH RECURSIVE MyTree AS (
SELECT * FROM MyTable WHERE ParentId IS NULL
UNION ALL
SELECT m.* FROM MyTABLE AS m JOIN MyTree AS t ON m.ParentId = t.Id
)
SELECT * FROM MyTree;
I tested recursive queries in MySQL 8.0 in my presentation Recursive Query Throwdown in 2017.
Below is my original answer from 2008:
There are several ways to store tree-structured data in a relational database. What you show in your example uses two methods:
- Adjacency List (the "parent" column) and
- Path Enumeration (the dotted-numbers in your name column).
Another solution is called Nested Sets, and it can be stored in the same table too. Read "Trees and Hierarchies in SQL for Smarties" by Joe Celko for a lot more information on these designs.
I usually prefer a design called Closure Table (aka "Adjacency Relation") for storing tree-structured data. It requires another table, but then querying trees is pretty easy.
I cover Closure Table in my presentation Models for Hierarchical Data with SQL and PHP and in my book SQL Antipatterns: Avoiding the Pitfalls of Database Programming.
CREATE TABLE ClosureTable (
ancestor_id INT NOT NULL REFERENCES FlatTable(id),
descendant_id INT NOT NULL REFERENCES FlatTable(id),
PRIMARY KEY (ancestor_id, descendant_id)
);
Store all paths in the Closure Table, where there is a direct ancestry from one node to another. Include a row for each node to reference itself. For example, using the data set you showed in your question:
INSERT INTO ClosureTable (ancestor_id, descendant_id) VALUES
(1,1), (1,2), (1,4), (1,6),
(2,2), (2,4),
(3,3), (3,5),
(4,4),
(5,5),
(6,6);
Now you can get a tree starting at node 1 like this:
SELECT f.*
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1;
The output (in MySQL client) looks like the following:
+----+
| id |
+----+
| 1 |
| 2 |
| 4 |
| 6 |
+----+
In other words, nodes 3 and 5 are excluded, because they're part of a separate hierarchy, not descending from node 1.
Re: comment from e-satis about immediate children (or immediate parent). You can add a "path_length
" column to the ClosureTable
to make it easier to query specifically for an immediate child or parent (or any other distance).
INSERT INTO ClosureTable (ancestor_id, descendant_id, path_length) VALUES
(1,1,0), (1,2,1), (1,4,2), (1,6,1),
(2,2,0), (2,4,1),
(3,3,0), (3,5,1),
(4,4,0),
(5,5,0),
(6,6,0);
Then you can add a term in your search for querying the immediate children of a given node. These are descendants whose path_length
is 1.
SELECT f.*
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
AND path_length = 1;
+----+
| id |
+----+
| 2 |
| 6 |
+----+
Re comment from @ashraf: "How about sorting the whole tree [by name]?"
Here's an example query to return all nodes that are descendants of node 1, join them to the FlatTable that contains other node attributes such as name
, and sort by the name.
SELECT f.name
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
ORDER BY f.name;
Re comment from @Nate:
SELECT f.name, GROUP_CONCAT(b.ancestor_id order by b.path_length desc) AS breadcrumbs
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
JOIN ClosureTable b ON (b.descendant_id = a.descendant_id)
WHERE a.ancestor_id = 1
GROUP BY a.descendant_id
ORDER BY f.name
+------------+-------------+
| name | breadcrumbs |
+------------+-------------+
| Node 1 | 1 |
| Node 1.1 | 1,2 |
| Node 1.1.1 | 1,2,4 |
| Node 1.2 | 1,6 |
+------------+-------------+
A user suggested an edit today. SO moderators approved the edit, but I am reversing it.
The edit suggested that the ORDER BY in the last query above should be ORDER BY b.path_length, f.name
, presumably to make sure the ordering matches the hierarchy. But this doesn't work, because it would order "Node 1.1.1" after "Node 1.2".
If you want the ordering to match the hierarchy in a sensible way, that is possible, but not simply by ordering by the path length. For example, see my answer to MySQL Closure Table hierarchical database - How to pull information out in the correct order.
Best Solution
You can use a subselect