Running the database with a multi byte database character set (fixed or varying), e.g. AL32UTF8, has some implications on default behaviour of certain parts of the database you just have to be aware of. Of what I can see those are mostly well documented in the Oracle documentation and is just another case that proves the importance of reading them ;-).
Here the relevant NLS parameter config for the case/scripts following:
SQL> select * from nls_session_parameters where parameter in ('NLS_CHARACTERSET', 'NLS_LENGTH_SEMANTICS');
PARAMETER VALUE
----------------------------------------------------------
NLS_CHARACTERSET AL32UTF8
NLS_LENGTH_SEMANTICS CHAR
Create the table and populate it:
SQL> create table t1 (id number(1), text clob);
SQL> declare
cval clob;
begin
dbms_lob.createtemporary(cval, false);
for idx in 0 .. 9 loop
dbms_lob.writeappend(cval, 4000, lpad(to_char(idx), 4000, to_char(idx)));
end loop;
insert into t1 values (1, cval);
commit;
end;
/
Illustrate the behaviour of DBMS_LOB.SUBSTR:
SQL> declare
cval clob;
begin
select text into cval from t1 where id = 1;
--
dbms_output.put_line('Original length: '||dbms_lob.getlength(cval));
dbms_output.put_line('Substr 10000 length: '||length(dbms_lob.substr(cval, 10000, 1)));
dbms_output.put_line('Substr 1000 length: '||length(dbms_lob.substr(cval, 1000, 1)));
end;
/
Original length: 40000
Substr 10000 length: 8191
Substr 1000 length: 1000
Although we know the entire string is 40’000 characters long a the sub-string function supposed to return the first 10’000 characters only returns 8191. Getting the first 1’000 characters just work s as expected. Here’s why:
AL32UTF8 character set uses variable-width multi byte encoding scheme which means depending on the character it uses one to maximum four bytes to store it. As by definition DBMS_LOB.SUBSTR returns 32767/n characters whereas n = number of bytes used to store a character for fixed-width character sets and n = max number of bytes used to store a character for variable-width character sets. This means in our case 32767 / 4 = 8191.75.
But heck, I really need the first 10’000 characters…so I make use DBMS_LOB.READ:
SQL> declare
cval clob;
vval varchar2(10000); -- session level lenght semantic is CHAR
nof_chars integer := 10000;
begin
select text into cval from t1 where id = 1;
--
dbms_lob.read(cval, nof_chars, 1, vval);
dbms_output.put_line('Read 10000 length: '||length(vval));
end;
/
Read 10000 length: 10000
This rule also applies to DBMS_LOB.COMPARE function.
Now I just wanted to mention something else that I always miss when checking DBMS_LOB function/procedure definitions in the PL/SQL Packages and Types Reference documentation. Procedures CONVERTTOBLOB, CONVERTTOCLOB and LOADCLOBFROMFILE require a character set ID but nowhere it points out how you can figure out the ID for a given character set (or vice versa). The SQL Reference documentation gives you the clue when looking for the functions NLS_CHARSET_ID and NLS_CHARSET_NAME:
SQL> select nls_charset_id('AL32UTF8') cs_id, nls_charset_name(873) cs_name from dual;
CS_ID CS_NAME
-------- --------------
873 AL32UTF8
Foot note: All tests have been run on a 10g 10.2.0.4 database.