DBMS_LOB and Character Sets

Running the database with a multi byte database character set (fixed or varying), e.g. AL32UTF8, has some implications on default behaviour of certain parts of the database you just have to be aware of. Of what I can see those are mostly well documented in the Oracle documentation and is just another case that proves the importance of reading them ;-).

Here the relevant NLS parameter config for the case/scripts following:

SQL> select * from nls_session_parameters where parameter in ('NLS_CHARACTERSET', 'NLS_LENGTH_SEMANTICS');

PARAMETER                      VALUE
----------------------------------------------------------
NLS_CHARACTERSET               AL32UTF8
NLS_LENGTH_SEMANTICS           CHAR

Create the table and populate it:

SQL> create table t1 (id number(1), text clob);

SQL> declare
  cval clob;
begin
  dbms_lob.createtemporary(cval, false);
  for idx in 0 .. 9 loop
    dbms_lob.writeappend(cval, 4000, lpad(to_char(idx), 4000, to_char(idx)));
  end loop;
  insert into t1 values (1, cval);
  commit;
end;
/

Illustrate the behaviour of DBMS_LOB.SUBSTR:

SQL> declare
  cval clob;
begin
  select text into cval from t1 where id = 1;
  --
  dbms_output.put_line('Original length: '||dbms_lob.getlength(cval));
  dbms_output.put_line('Substr 10000 length: '||length(dbms_lob.substr(cval, 10000, 1)));
  dbms_output.put_line('Substr 1000 length: '||length(dbms_lob.substr(cval, 1000, 1)));
end;
/
Original length: 40000
Substr 10000 length: 8191
Substr 1000 length: 1000

Although we know the entire string is 40’000 characters long a the sub-string function supposed to return the first 10’000 characters only returns 8191. Getting the first 1’000 characters just work s as expected. Here’s why:
AL32UTF8 character set uses variable-width multi byte encoding scheme which means depending on the character it uses one to maximum four bytes to store it. As by definition DBMS_LOB.SUBSTR returns 32767/n characters whereas n = number of bytes used to store a character for fixed-width character sets and n = max number of bytes used to store a character for variable-width character sets. This means in our case 32767 / 4 = 8191.75.

But heck, I really need the first 10’000 characters…so I make use DBMS_LOB.READ:

SQL> declare
  cval clob;
  vval varchar2(10000);  -- session level lenght semantic is CHAR
  nof_chars integer := 10000;
begin
  select text into cval from t1 where id = 1;
  --
  dbms_lob.read(cval, nof_chars, 1, vval);
  dbms_output.put_line('Read 10000 length: '||length(vval));
end;
/

Read 10000 length: 10000

This rule also applies to DBMS_LOB.COMPARE function.
Now I just wanted to mention something else that I always miss when checking DBMS_LOB function/procedure definitions in the PL/SQL Packages and Types Reference documentation. Procedures CONVERTTOBLOB, CONVERTTOCLOB and LOADCLOBFROMFILE require a character set ID but nowhere it points out how you can figure out the ID for a given character set (or vice versa). The SQL Reference documentation gives you the clue when looking for the functions NLS_CHARSET_ID and NLS_CHARSET_NAME:

SQL> select nls_charset_id('AL32UTF8') cs_id, nls_charset_name(873) cs_name from dual;

CS_ID         CS_NAME
--------      --------------
873           AL32UTF8

Foot note: All tests have been run on a 10g 10.2.0.4 database.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.