Java – Postgres encoding “UTF8” error whilst inserting images via Java

databaseimagejavapostgresqlsql

I am inserting jpeg images into my UTF-8 encoded Postgres database into bytea column/s. I'm using a prepared sql statement to insert the images. In the statement I create a file object, within Java, of the jpeg image and then pass it as a FileInputStream into the setBinaryStream method. However every now and again my Java app will throw an exception once the statement is executed, stating that:

" ERROR: invalid byte sequence for encoding "UTF8": 0x84 "

This happens for a select few images which is odd. These images are extracted from a previous set of images, all the previous images insert fine only a few extracted images seem to cause the error. So how do I solve such a problem? Can encode the byte steam to UTF-8 somehow? Or is it a problem with the database?

Btw, if I replace the extracted images with new ones and save them as jpegs the same error occurs. Thanks for your help!

The code is below as requested…

There is some code missing otherwise this would be very long but, basically I do a few checks on the path and directory names to make sure they adhere to the file system rules. This is in a loop which goes through all the sub-directories and adds all
the jpeg files in those sub-directories. I then go to the next directory with sub-directories with images till there are none there. I have yet to add try-catches and logging sections yet.

String imgStr = image.toString();
int age = getAgeFromDir(imgStr);
String gender = getSexFromDir(imgStr);
String table = "";
switch(validIdx){
    case 0: table = "carpals";
        break;
    case 1: table = "d_phalanges";
        break;
    case 2: table = "p_phalanges";
        break;
    case 3: table = "i_phalanges";
        break;
    case 4: table = "epiphyses";
        break;
    case 5: table = "sesamoids";
        break;
    case 6: table = "metacarpals ";
        break;
}

    PreparedStatement ps = con.prepareCall("INSERT INTO " + table +
            " VALUES( (SELECT hands.hand_id FROM hands WHERE hands.age = " + age + " AND hands.gender = '" + gender + "' AND hands.location = '" + path + directory + imageNames[i] + "' )," +
            " (SELECT COUNT(" + table + ".location) FROM " + table + " ), " +
            " ?, ? )"   );

        //go through each sub-directory which contains jpeg images and add them to
        //the database
        File sublist = new File(image + "\\" + subdir[j]);
        String[] files = sublist.list();
        String[] pics = sublist.list(new JpegFilter());

        if(files.length > pics.length){
            //WRITE TO LOG
            //WARNING UNEXPECTED FILES OR DIRECTORIES FOUND IN....
        }

            for(int r = 0; r < pics.length; r++ ){

                    String location = image + "\\" + subdir[j] + "\\" + pics[r];
                    System.out.println(i + "\t" + r + " location : " + location);

                    File f = new File(location);
                    FileInputStream pic = new FileInputStream(f);


                    if(f.isFile()){
                    ps.setString(2, location);
                    ps.setBinaryStream(1, pic, (int)f.length());
                    ps.execute();
                    pic.close();
                    }
            }
    ps.close();

}

The SQLException thrown is below, it is thrown at ps.execute():

Exception in thread "main" org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x84
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1608)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1343)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:194)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:451)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:350)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:343)
at nuffielddb.HandDB.addExtractedImages(HandDB.java:406)
at nuffielddb.Main.main(Main.java:37)
Java Result: 1

Best Solution

Well, 0x84 is not a valid utf8 character:

=> perl -e 'print "\x84"' | iconv -f utf8 -t utf8
iconv: illegal input sequence at position 0

Generally - bytea will work with any bytes, but INSERT statement is a text string, and as such has to conform to encoding of client!

Simple way to insert the data:

  1. Encode the data in your application to be in Base64 format (there are also other options, but this one is easiest for me to show)
  2. insert it with: INSERT INTO q (x) VALUES (decode(?, 'base64'))

Example in Perl (sorry, I don't write Java):

#!/usr/bin/perl
use MIME::Base64;
use DBI;

my $dbh = DBI->connect( "dbi:Pg:dbname=depesz;port=5840", "depesz" );
my $blob = "\x84";
my $encoded = encode_base64( $blob );
$dbh->do("INSERT INTO q (x) VALUES (decode(?, 'base64'))", undef, $encoded );

q table is:

      Table "public.q"
 Column | Type  | Modifiers
--------+-------+-----------
 x      | bytea |

Data (after insert) looks like this:

# select x, octet_length(x) from q;
  x   | octet_length
------+--------------
 \x84 |            1
(1 row)