Hive query – INSERT OVERWRITE LOCAL DIRECTORY creates multiple files for a single table


I run the following query against a Hive table, myTable:

INSERT OVERWRITE LOCAL DIRECTORY '/myDir/out' SELECT concat_ws('',NAME,PRODUCT,PRC,field1,field2,field3,field4,field5) FROM myTable;

This command generates two files, 000000_0 and 000001_0, inside the folder out/.

But I need the contents as a single file. What should I do?

Best Solution

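There are multiple files in the directory because every task that produces final output writes its own file: one per reducer or, in a map-only job such as a plain SELECT, one per mapper. If you really need the contents as a single file, run your MapReduce job with only one reducer, which will write a single file.

However, depending on your data size, running a single reducer might not be a good approach, since all of the output then has to pass through one task.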

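Edit: Instead of forcing Hive to run one reduce task and write a single output file, it would be better to use hadoop fs operations to merge the outputs into a single file. For example (hadoop fs resolves these paths against the default filesystem, which is normally HDFS rather than the local disk):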

hadoop fs -text /myDir/out/* | hadoop fs -put - /myDir/out.txt
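
Because the query in the question writes to a LOCAL DIRECTORY, the part files end up on the local disk of the machine running Hive, so a plain cat may be all that is needed. The following is a minimal sketch under that assumption; the temporary directory, the merged file name, and the sample rows are hypothetical stand-ins for /myDir/out and its contents, created here only so the script runs without Hive.

```shell
#!/bin/sh
# Sketch: merge Hive part files that were written to a LOCAL directory.
# Assumption: OUT_DIR stands in for /myDir/out. A temporary directory with
# two fake part files is used so the script can run without Hive installed.
set -eu

OUT_DIR=$(mktemp -d)
# Keep the merged file outside OUT_DIR so the glob below never picks it up.
MERGED="$OUT_DIR.txt"

# Simulate the two files Hive produced (hypothetical sample rows).
printf 'row1\nrow2\n' > "$OUT_DIR/000000_0"
printf 'row3\n' > "$OUT_DIR/000001_0"

# The shell expands the glob in sorted order, so 000000_0 is read before 000001_0.
cat "$OUT_DIR"/0* > "$MERGED"

# Show the merged result.
cat "$MERGED"
```

If the output had been written to HDFS instead (INSERT OVERWRITE DIRECTORY without LOCAL), hadoop fs -getmerge <hdfs-dir> <local-file> performs the same concatenation and leaves the result on the local disk.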