Ruby 1.9 Encodings

By Mobomo January 03, 2011

When I came to Ruby 1.9, the first problem I ran into was encodings. Gregory Brown said, in a training session at the Lone Star Ruby Conference, “Ruby 1.8 works in bytes. Ruby 1.9 works in characters.” In Ruby 1.8, you have to deal with raw bytes yourself, because strings carry no encoding information. In Ruby 1.9, every string knows its encoding, so you need to understand how encodings work to make your life easier. Let us walk through the four encodings in Ruby 1.9 with examples.

The Source File Encoding

The source file encoding is the character encoding of a given source file. It is US-ASCII by default. When you create a String literal in your code, it is assigned the encoding of your source file. So you have to change the source encoding when you want to place any non-ASCII content in a String literal.

  
  $ cat no_encoding.rb
  p "中文".encoding
  $ ruby no_encoding.rb
  no_encoding.rb:1: invalid multibyte char (US-ASCII)

  $ cat encoding.rb
  #!ruby19
  # encoding: utf-8
  p "中文".encoding
  $ ruby encoding.rb
  #<Encoding:UTF-8>

As you can see with no_encoding.rb, Ruby raises “invalid multibyte char (US-ASCII)” when the source file contains a Chinese string. That is because Ruby falls back to US-ASCII when no source encoding is specified. Once the encoding is declared with the encoding comment, as in encoding.rb, the script works.
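To check which source encoding Ruby picked for a file, a small sketch like the one below may help. It uses the `__ENCODING__` keyword, which reports the encoding of the current source file; the file name and variable names are made up for illustration. Note that Ruby 2.0 and later default to UTF-8 rather than US-ASCII, so the encoding comment matters most on 1.9.

```ruby
# encoding: utf-8
# Hypothetical file: source_encoding_check.rb
# The encoding comment must be on the first line of the file
# (or the second, after a shebang) to take effect.

source_encoding = __ENCODING__  # encoding of this source file
literal = "中文"                 # a literal with non-ASCII content

# A plain string literal is tagged with the source encoding.
puts "source file encoding: #{source_encoding}"
puts "literal encoding: #{literal.encoding}"
puts "same? #{literal.encoding == source_encoding}"
```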

The String Encoding

Each string has its own encoding, which you can access with the String#encoding method:


   ruby-1.9.2-head>string = "中文"
    => "中文"
   ruby-1.9.2-head>string.encoding
    => #<Encoding:UTF-8>

You can transcode the string into a different encoding by using String#encode:
  
    ruby-1.9.2-head>string_in_gb2312 = string.encode("GB2312")
     => "\x{D6D0}\x{CEC4}"
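
A quick way to see what transcoding does is to compare the bytes before and after. The sketch below is illustrative only (the variable names are not from the original session); it shows that the characters stay the same while the underlying bytes change, and that transcoding back recovers the original string.

```ruby
# encoding: utf-8

utf8 = "中文"
gb2312 = utf8.encode("GB2312")

# Same characters, different bytes.
puts utf8.bytes.map { |b| "%02X" % b }.join(" ")    # E4 B8 AD E6 96 87
puts gb2312.bytes.map { |b| "%02X" % b }.join(" ")  # D6 D0 CE C4

# Transcoding back recovers the original string.
round_trip = gb2312.encode("UTF-8")
puts round_trip == utf8  # true
```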
  

But the transcoding will fail if the encoding does not support all characters in your string:
 
    ruby-1.9.2-head>string_in_ascii = string.encode("us-ascii")
    Encoding::UndefinedConversionError: U+4E2D from UTF-8 to US-ASCII
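
If a failed conversion should not abort the program, there are two common ways to handle it, sketched below under the assumption that a lossy result is acceptable. The first rescues the error; the second uses the `undef:` and `replace:` options of String#encode to substitute a placeholder for characters the target encoding cannot represent.

```ruby
# encoding: utf-8

string = "中文"
error = nil

# Option 1: rescue the error and decide what to do with it.
begin
  string.encode("US-ASCII")
rescue Encoding::UndefinedConversionError => e
  error = e
  puts "cannot convert: #{e.message}"
end

# Option 2: substitute a replacement for unsupported characters.
replaced = string.encode("US-ASCII", undef: :replace, replace: "?")
puts replaced  # ??
```

The second option loses information, so it is best kept for output that only needs to be readable, such as log lines.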
 

The External Encoding

The encoding of the data in an IO stream is known by Ruby as the object's external encoding. The default external encoding is pulled from your environment.

   ruby-1.9.2-head>Encoding.default_external
    => #<Encoding:UTF-8>
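
Because the default depends on the environment, the same script can behave differently on two machines. The sketch below shows one way to inspect the default and, if needed, pin it for the rest of the process; whether pinning it is appropriate depends on your application, so treat this as an illustration rather than a recommendation.

```ruby
# Encoding Ruby assigns to data read from IO when none is given.
puts Encoding.default_external

# Character map reported by the operating system locale (a String).
puts Encoding.locale_charmap

# Pin the default for the rest of the process, e.g. at startup.
Encoding.default_external = Encoding::UTF_8
puts Encoding.default_external  # UTF-8
```

The same default can also be set from outside the script with the `-E` command-line switch.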



Here is how the external encoding works:

   ruby-1.9.2-head>f = File.open("example.txt")
    => #<File:example.txt>
   ruby-1.9.2-head>f.external_encoding
    => #<Encoding:UTF-8>
   ruby-1.9.2-head>content = f.read
    => "这是一些示范文本"
   ruby-1.9.2-head>content.encoding
    => #<Encoding:UTF-8>


If the file does not use the default external encoding, you can override it when opening the file:

   ruby-1.9.2-head>f = File.open("example.txt", "r:gb2312")
    => #<File:example.txt>
   ruby-1.9.2-head>f.external_encoding
    => #<Encoding:GB2312>
   ruby-1.9.2-head>content = f.read
    => "\x{E8BF}\x99\xE6\x98\x{AFE4}\xB8\x80\x{E4BA}\x9B\x{E7A4}\x{BAE8}\x8C\x83\xE6\x96\x87\xE6\x9C\xAC\n"
   ruby-1.9.2-head>content.encoding
    => #<Encoding:GB2312>
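
The session above reads a UTF-8 file as GB2312, which is why the result looks like escaped bytes. The external encoding only labels the data; it does not convert it. The following sketch, which uses a temporary file in place of example.txt (an assumption made so the example is self-contained), reads the same bytes under two labels to make that visible.

```ruby
# encoding: utf-8
require "tempfile"

as_utf8 = nil
as_gb2312 = nil

Tempfile.create("example") do |tmp|
  tmp.binmode
  tmp.write("中文\n")  # stored on disk as UTF-8 bytes
  tmp.flush

  # Same file, two different external encodings.
  as_utf8   = File.open(tmp.path, "r:utf-8")  { |f| f.read }
  as_gb2312 = File.open(tmp.path, "r:gb2312") { |f| f.read }
end

puts as_utf8.encoding                   # UTF-8
puts as_gb2312.encoding                 # GB2312
puts as_utf8.bytes == as_gb2312.bytes   # true, nothing was converted
```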


The Internal Encoding

The encoding that the programmer wishes to use with the data in a stream is the internal encoding. The default internal encoding is nil unless set explicitly.

   ruby-1.9.2-head>Encoding.default_internal
    => nil


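We can specify the internal encoding when opening the file if the external encoding does not match the encoding we want to use internally. In the mode string, the external encoding comes first and the internal encoding second, and Ruby transcodes the data from one to the other as it reads.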

   ruby-1.9.2-head>f = File.open("example.txt", "r:utf-8:gb2312")
    => #<File:example.txt>
   ruby-1.9.2-head>f.external_encoding
    => #<Encoding:UTF-8>
   ruby-1.9.2-head>content = f.read
    => "\x{D5E2}\x{CAC7}\x{D2BB}\x{D0A9}\x{CABE}\x{B7B6}\x{CEC4}\x{B1BE}\n"
   ruby-1.9.2-head>content.encoding
    => #<Encoding:GB2312>
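
Unlike the external encoding on its own, an internal encoding makes Ruby convert the data while reading. The sketch below again uses a temporary file instead of example.txt (an assumption for the sake of a self-contained example) and checks that the bytes really change from UTF-8 to GB2312.

```ruby
# encoding: utf-8
require "tempfile"

content = nil
external = nil
internal = nil

Tempfile.create("example") do |tmp|
  tmp.binmode
  tmp.write("中文\n")  # stored on disk as UTF-8 bytes
  tmp.flush

  # "r:utf-8:gb2312" = read mode, external UTF-8, internal GB2312
  File.open(tmp.path, "r:utf-8:gb2312") do |f|
    external = f.external_encoding
    internal = f.internal_encoding
    content  = f.read
  end
end

puts external, internal, content.encoding
puts content.bytes.map { |b| "%02X" % b }.join(" ")  # D6 D0 CE C4 0A
```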
