Php – How to build large MySQL INSERT query in PHP without wasting memory

mysqlphp

I have code looking something like this:

$data = file_get_contents($tempFile); // perhaps 30MB of file data, now in PHP's memory
$hash = md5($data);
$query = "INSERT INTO some_table
          SET BlobData = '" . mysql_real_escape_string($data) . "',
          BlobHash = '$hash'
          ";
mysql_query($query);

I know this isn't very efficient as each of the '.' operators will reallocate a bigger memory block and the 30MB string will be copied several times.

Is there anything more efficient than the following solution?

$data = file_get_contents($tempFile); // perhaps 30MB of file data, now in PHP's memory
$hash = md5($data);
$query = "INSERT INTO some_table SET BlobData = '%s', BlobHash = '$hash'";
mysql_query(sprintf($query, mysql_real_escape_string($data)));

Best Solution

You have two issues here:

#1, there are several different ways you can compute the MD5 hash:

  • Do as you do and load into PHP as a string and use PHP's md5()
  • Use PHP's md5_file()
  • As of PHP 5.1+ you can use PHP's streams API with either of md5 or md5_file to avoid loading entirely into memory
  • Use exec() to call the system's md5sum command
  • Use MySQL's MD5() function to compute the hash

Since these are all trivial to implement it would be easy for you to implement and benchmark them all for memory usage and speed. Here are some benchmarks showing system md5 via exec to be a lot faster than PHP's md5_file as file size increases. Doing it your way is definitely the worst way as far as memory usage is concerned.

#2, mysql_real_escape_string performs a database query, so you're actually transmitting your blob data to the database, getting it back as a string, and transmitting it again(!) with the INSERT query. So it's traveling to/from the DB server 3x instead of 1x and using 2x the memory in PHP.

It's going to be more efficient to use PHP5 prepared statements and only send this data to the database once. Read the linked article section, you'll see it mentions that when you are binding parameters, you can use the blob type to stream blob data to the DB in chunks. The PHP docs for mysqli_stmt::send_long_data have a great simple example of this that INSERTs a file into a blob column just like you are.

By doing that, and by using either the streams API, md5_file or exec with the system md5 command, you can do your entire INSERT without ever loading the entire file into memory, which means memory usage for your series of operations can be as low as you want!