I need to perform a query 2.5 million times. This query generates some rows which I need to
AVG(column) and then use this
AVG to filter the table from all values below average. I then need to
INSERT these filtered results into a table.
The only way to do such a thing with reasonable efficiency, seems to be by creating a
TEMPORARY TABLE for each query-postmaster python-thread. I am just hoping these
TEMPORARY TABLEs will not be persisted to hard drive (at all) and will remain in memory (RAM), unless they are out of working memory, of course.
I would like to know if a TEMPORARY TABLE will incur disk writes (which would interfere with the INSERTS, i.e. slow to whole process down)
Please note that, in Postgres, the default behaviour for temporary tables is that they are not automatically dropped, and data is persisted on commit. See
Temporary table are, however, dropped at the end of a database session:
There are multiple considerations you have to take into account:
DROPa temporary table at the end of a transaction, create it with the
CREATE TEMPORARY TABLE ... ON COMMIT DROPsyntax.
CREATE, you should drop your temporary tables -- either prior to returning a connection to the pool (e.g. by doing everything inside a transaction and using the
ON COMMIT DROPcreation syntax), or on an as-needed basis (by preceding any
CREATE TEMPORARY TABLEstatement with a corresponding
DROP TABLE IF EXISTS, which has the advantage of also working outside transactions e.g. if the connection is used in auto-commit mode.)
Also, unrelated to your question (but possibly related to your project): keep in mind that, if you have to run queries against a temp table after you have populated it, then it is a good idea to create appropriate indices and issue an
ANALYZEon the temp table in question after you're done inserting into it. By default, the cost based optimizer will assume that a newly created the temp table has ~1000 rows and this may result in poor performance should the temp table actually contain millions of rows.