The correct way to avoid SQL injection attacks, no matter which database you use, is to separate the data from SQL, so that data stays data and will never be interpreted as commands by the SQL parser. It is possible to create SQL statement with correctly formatted data parts, but if you don't fully understand the details, you should always use prepared statements and parameterized queries. These are SQL statements that are sent to and parsed by the database server separately from any parameters. This way it is impossible for an attacker to inject malicious SQL.
You basically have two options to achieve this:
Using PDO (for any supported database driver):
$stmt = $pdo->prepare('SELECT * FROM employees WHERE name = :name');
$stmt->execute([ 'name' => $name ]);
foreach ($stmt as $row) {
// Do something with $row
}
Using MySQLi (for MySQL):
$stmt = $dbConnection->prepare('SELECT * FROM employees WHERE name = ?');
$stmt->bind_param('s', $name); // 's' specifies the variable type => 'string'
$stmt->execute();
$result = $stmt->get_result();
while ($row = $result->fetch_assoc()) {
// Do something with $row
}
If you're connecting to a database other than MySQL, there is a driver-specific second option that you can refer to (for example, pg_prepare()
and pg_execute()
for PostgreSQL). PDO is the universal option.
Correctly setting up the connection
Note that when using PDO to access a MySQL database real prepared statements are not used by default. To fix this you have to disable the emulation of prepared statements. An example of creating a connection using PDO is:
$dbConnection = new PDO('mysql:dbname=dbtest;host=127.0.0.1;charset=utf8', 'user', 'password');
$dbConnection->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$dbConnection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
In the above example the error mode isn't strictly necessary, but it is advised to add it. This way the script will not stop with a Fatal Error
when something goes wrong. And it gives the developer the chance to catch
any error(s) which are throw
n as PDOException
s.
What is mandatory, however, is the first setAttribute()
line, which tells PDO to disable emulated prepared statements and use real prepared statements. This makes sure the statement and the values aren't parsed by PHP before sending it to the MySQL server (giving a possible attacker no chance to inject malicious SQL).
Although you can set the charset
in the options of the constructor, it's important to note that 'older' versions of PHP (before 5.3.6) silently ignored the charset parameter in the DSN.
Explanation
The SQL statement you pass to prepare
is parsed and compiled by the database server. By specifying parameters (either a ?
or a named parameter like :name
in the example above) you tell the database engine where you want to filter on. Then when you call execute
, the prepared statement is combined with the parameter values you specify.
The important thing here is that the parameter values are combined with the compiled statement, not an SQL string. SQL injection works by tricking the script into including malicious strings when it creates SQL to send to the database. So by sending the actual SQL separately from the parameters, you limit the risk of ending up with something you didn't intend.
Any parameters you send when using a prepared statement will just be treated as strings (although the database engine may do some optimization so parameters may end up as numbers too, of course). In the example above, if the $name
variable contains 'Sarah'; DELETE FROM employees
the result would simply be a search for the string "'Sarah'; DELETE FROM employees"
, and you will not end up with an empty table.
Another benefit of using prepared statements is that if you execute the same statement many times in the same session it will only be parsed and compiled once, giving you some speed gains.
Oh, and since you asked about how to do it for an insert, here's an example (using PDO):
$preparedStatement = $db->prepare('INSERT INTO table (column) VALUES (:column)');
$preparedStatement->execute([ 'column' => $unsafeValue ]);
Can prepared statements be used for dynamic queries?
While you can still use prepared statements for the query parameters, the structure of the dynamic query itself cannot be parametrized and certain query features cannot be parametrized.
For these specific scenarios, the best thing to do is use a whitelist filter that restricts the possible values.
// Value whitelist
// $dir can only be 'DESC', otherwise it will be 'ASC'
if (empty($dir) || $dir !== 'DESC') {
$dir = 'ASC';
}
From Save MySQL query results into a text or CSV file:
SELECT order_id,product_name,qty
FROM orders
WHERE foo = 'bar'
INTO OUTFILE '/var/lib/mysql-files/orders.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';
Note: That syntax may need to be reordered to
SELECT order_id,product_name,qty
INTO OUTFILE '/var/lib/mysql-files/orders.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM orders
WHERE foo = 'bar';
in more recent versions of MySQL.
Using this command, columns names will not be exported.
Also note that /var/lib/mysql-files/orders.csv
will be on the server that is running MySQL. The user that the MySQL process is running under must have permissions to write to the directory chosen, or the command will fail.
If you want to write output to your local machine from a remote server (especially a hosted or virtualize machine such as Heroku or Amazon RDS), this solution is not suitable.
Best Solution
For MySQL 8+: use the recursive
with
syntax.For MySQL 5.x: use inline variables, path IDs, or self-joins.
MySQL 8+
The value specified in
parent_id = 19
should be set to theid
of the parent you want to select all the descendants of.MySQL 5.x
For MySQL versions that do not support Common Table Expressions (up to version 5.7), you would achieve this with the following query:
Here is a fiddle.
Here, the value specified in
@pv := '19'
should be set to theid
of the parent you want to select all the descendants of.This will work also if a parent has multiple children. However, it is required that each record fulfills the condition
parent_id < id
, otherwise the results will not be complete.Variable assignments inside a query
This query uses specific MySQL syntax: variables are assigned and modified during its execution. Some assumptions are made about the order of execution:
from
clause is evaluated first. So that is where@pv
gets initialised.where
clause is evaluated for each record in the order of retrieval from thefrom
aliases. So this is where a condition is put to only include records for which the parent was already identified as being in the descendant tree (all descendants of the primary parent are progressively added to@pv
).where
clause are evaluated in order, and the evaluation is interrupted once the total outcome is certain. Therefore the second condition must be in second place, as it adds theid
to the parent list, and this should only happen if theid
passes the first condition. Thelength
function is only called to make sure this condition is always true, even if thepv
string would for some reason yield a falsy value.All in all, one may find these assumptions too risky to rely on. The documentation warns:
So even though it works consistently with the above query, the evaluation order may still change, for instance when you add conditions or use this query as a view or sub-query in a larger query. It is a "feature" that will be removed in a future MySQL release:
As stated above, from MySQL 8.0 onward you should use the recursive
with
syntax.Efficiency
For very large data sets this solution might get slow, as the
find_in_set
operation is not the most ideal way to find a number in a list, certainly not in a list that reaches a size in the same order of magnitude as the number of records returned.Alternative 1:
with recursive
,connect by
More and more databases implement the SQL:1999 ISO standard
WITH [RECURSIVE]
syntax for recursive queries (e.g. Postgres 8.4+, SQL Server 2005+, DB2, Oracle 11gR2+, SQLite 3.8.4+, Firebird 2.1+, H2, HyperSQL 2.1.0+, Teradata, MariaDB 10.2.2+). And as of version 8.0, also MySQL supports it. See the top of this answer for the syntax to use.Some databases have an alternative, non-standard syntax for hierarchical look-ups, such as the
CONNECT BY
clause available on Oracle, DB2, Informix, CUBRID and other databases.MySQL version 5.7 does not offer such a feature. When your database engine provides this syntax or you can migrate to one that does, then that is certainly the best option to go for. If not, then also consider the following alternatives.
Alternative 2: Path-style Identifiers
Things become a lot easier if you would assign
id
values that contain the hierarchical information: a path. For example, in your case this could look like this:Then your
select
would look like this:Alternative 3: Repeated Self-joins
If you know an upper limit for how deep your hierarchy tree can become, you can use a standard
sql
query like this:See this fiddle
The
where
condition specifies which parent you want to retrieve the descendants of. You can extend this query with more levels as needed.