Bash – Use zcat and sed or awk to edit compressed .gz text file

awk, bash, sed

I am trying to edit compressed fastq.gz text files by removing the first six characters of lines 2, 6, 10, 14, … I currently have two ways of doing this, using either awk or sed, but they only seem to work if the files are unzipped. I would like to edit the files without unzipping them. I tried the following code but could not get it to work. Thanks.

Using sed:

zcat /dir/* | sed -i~ '2~4s/^.\{6\}//'

Using awk:

zcat /dir/* | awk 'NR%4==2 {gsub(/^....../,"")} 1'

Best Answer

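You can't bypass compression: a gzip file cannot be edited in place, and sed -i operates on named files, not on piped input. You can, however, chain the decompress/edit/recompress steps together in an automated fashion: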

for f in /dir/*; do
  # Keep a backup copy, then decompress it, edit, and recompress over the original
  cp "$f" "$f~" &&
  gzip -cd "$f~" | sed '2~4s/^.\{6\}//' | gzip > "$f"
done

If you're quite confident in the operation, you can remove the backup files by adding rm "$f~" to the end of the loop body. Note that the 2~4 address (every fourth line, starting at line 2) is a GNU sed extension.
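
For systems without GNU sed, the same decompress/edit/recompress chain can be sketched with the awk command from the question. This is only a minimal sketch under a few assumptions: the function name trim_fastq_gz and the ".tmp" suffix are made up for illustration, and bash is assumed (for local and pipefail). Unlike the loop above, it keeps no backup. Instead it writes to a temporary file and replaces the original only if every stage of the pipeline succeeded.

```bash
#!/usr/bin/env bash

# Hypothetical helper (name is illustrative): strip the first six characters
# from lines 2, 6, 10, ... of a gzip-compressed file, replacing it on success.
trim_fastq_gz() {
  local f=$1
  local tmp="$f.tmp"   # assumed temp-file name, next to the original

  # Run the pipeline in a subshell with pipefail so that a failure in
  # gzip -cd or awk (not only the final gzip) is detected.
  if ( set -o pipefail
       gzip -cd -- "$f" |
         awk 'NR % 4 == 2 { sub(/^....../, "") } 1' |
         gzip > "$tmp" ); then
    mv -- "$tmp" "$f"      # replace the original only after success
  else
    rm -f -- "$tmp"        # leave the original untouched on failure
    return 1
  fi
}

# Example use over a directory:
#   for f in /dir/*; do trim_fastq_gz "$f"; done
```

As with the sed version, lines shorter than six characters are left unchanged, because the pattern requires six characters to match.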