Ruby-on-rails – Encoding::UndefinedConversionError: “\xE4” from ASCII-8BIT to UTF-8

encoding ruby ruby-on-rails

I tried to fetch this CSV file with Net::HTTP:

File.open(file, "w:UTF-8") do |f|
  content = Net::HTTP.get_response(URI.parse(url)).body
  f.write(content)
end

After reading my local CSV file back in, I got some weird output:

Nationalit\xE4t;Alter 0-5

I tried to encode it to UTF-8, but got the error Encoding::UndefinedConversionError: "\xE4" from ASCII-8BIT to UTF-8.
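The error can be reproduced without the download. This is a minimal sketch, assuming the response body arrives tagged as ASCII-8BIT (binary), which is what the error message suggests; the sample string is the header fragment shown above, and the variable names are only illustrative.

```ruby
# Assumption: the body is tagged ASCII-8BIT (binary), as the error message suggests.
# .b returns a copy of the string tagged as ASCII-8BIT.
raw = "Nationalit\xE4t".b

puts raw.encoding

error = nil
begin
  # No source encoding is given, so Ruby has to convert from "binary",
  # which has no mapping for bytes above 0x7F.
  raw.encode("UTF-8")
rescue Encoding::UndefinedConversionError => e
  error = e
end

puts error.class
puts error.message
```

The point is that the string carries no information about what 0xE4 means, so Ruby refuses to guess.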

The rchardet gem tells me the content is ISO-8859-2, but converting it to UTF-8 does not work.

When I open the file in a normal text editor, it is displayed with the correct encoding.

Best Solution

You can go with force_encoding:

require 'net/http'

url = "http://data.linz.gv.at/katalog/population/abstammung/2012/auslg_2012.csv"
File.open('output', "w:UTF-8") do |f|
  content = Net::HTTP.get_response(URI.parse(url)).body
  f.write(content.force_encoding("UTF-8"))
end

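But this will make you lose some accented characters in your .csv file: force_encoding only relabels the bytes as UTF-8 without converting them, so a byte such as \xE4 stays an invalid UTF-8 sequence.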
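To see why the accents get lost, here is a small offline sketch. It assumes the same sample bytes as in the question; the use of scrub is only there to illustrate what a later cleanup step would do to the broken character.

```ruby
# Same bytes as the fragment from the question, tagged as binary.
raw = "Nationalit\xE4t".b

# force_encoding changes the label only; the bytes stay exactly the same.
relabelled = raw.dup.force_encoding("UTF-8")
puts relabelled.bytes == raw.bytes
puts relabelled.valid_encoding?   # false: 0xE4 followed by "t" is not valid UTF-8

# Replacing invalid sequences is where the accented character disappears.
scrubbed = relabelled.scrub("?")
puts scrubbed                     # Nationalit?t
```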

If you are absolutely sure that you will always use this URL as input, and that the file will always keep this encoding, you can do:

# encoding: utf-8
require 'net/http'

url = "http://data.linz.gv.at/katalog/population/abstammung/2012/auslg_2012.csv"
File.open('output', "w:UTF-8") do |f|
  content = Net::HTTP.get_response(URI.parse(url)).body
  f.write(content.encode("UTF-8", "ISO-8859-15"))
end

But this will only work for files that are actually encoded as ISO-8859-15, such as this one.
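If the source encoding may differ between files, one option is to pass it in rather than hard-code it. The sketch below is only an illustration: to_utf8 is a hypothetical helper name, and it assumes you already know the real encoding of the bytes from somewhere else (documentation, a header, or a detection gem).

```ruby
# Hypothetical helper: interpret the raw bytes as source_encoding and
# transcode them to UTF-8.
def to_utf8(bytes, source_encoding)
  bytes.encode("UTF-8", source_encoding)
end

# Sample bytes from the question, tagged as binary like a downloaded body.
raw = "Nationalit\xE4t;Alter 0-5".b

converted = to_utf8(raw, "ISO-8859-15")
puts converted
puts converted.valid_encoding?
```

Usage would then be f.write(to_utf8(content, "ISO-8859-15")) inside the File.open block. Passing the wrong encoding usually raises no error and simply produces the wrong characters, so the value still has to be right for each file.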
