The other day I tracked down a nasty problem one application was having. It was first noticed in the alert.log:
ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT ... PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT of 7000 MB Immediate Kill Session#: 686, Serial#: 26646 Immediate Kill Session: sess: 0x19e7c2a78 OS pid: 35173
Checking the PGA memory from gv$process revealed a shocking picture:
SQL> select prc.inst_id, ses.sid, ses.serial#, prc.pid, prc.spid, round(prc.pga_used_mem/1024/1024) used_mb, round(prc.pga_alloc_mem/1024/1024) alloc_mb, round((prc.pga_alloc_mem - prc.pga_used_mem)/1024/1024, 0) unused_mb, round(prc.pga_freeable_mem/1024/1024) freeable_mb, round(prc.pga_max_mem/1024/1024) max_mb from gv$process prc left outer join gv$session ses on (ses.inst_id = prc.inst_id and ses.paddr = prc.addr and ses.type = 'USER') order by used_mb desc fetch first 10 rows only; INST_ID SID SERIAL# PID SPID USED_MB ALLOC_MB UNUSED_MB FREEABLE_MB MAX_MB ---------- ---------- ---------- ---------- ------------------------ ---------- ---------- ---------- --------------------------------------- ---------- 1 655 50496 89 338934 227 284 57 0 499 1 675 45630 137 338938 223 280 57 0 296 1 416 7124 285 315339 222 281 58 1 298 1 290 45321 296 81710 184 233 49 0 280 2 296 52565 98 223329 163 207 45 0 270 2 710 8463 293 386687 162 207 45 0 270 1 67 15923 54 98311 147 187 40 0 264 2 295 25163 242 316189 143 185 42 0 266 1 32 55120 270 358206 137 178 41 0 260 2 309 13006 230 167307 135 175 39 0 258 10 rows selected
A lot of sessions used between 100MB and 230MB. Monitoring these sessions over some time showed that the memory was never released, something’s definitely wrong. PGA dumps of a couple of connections all showed the same pattern.
oradebug setmypid oradebug dump pga_detail_get <pid>
SQL> select category, name, heap_name, round(bytes/1024/1024, 0) size_mb, allocation_count from gv$process_memory_detail where pid = 89 order by pid, bytes desc fetch first 10 rows only; CATEGORY NAME HEAP_NAME SIZE_MB ALLOCATION_COUNT --------------- -------------------------- --------------- ---------- --------------------------------------- Other permanent memory kolarsCreateCt 214 10541 Other free memory top uga heap 57 1571 Other free memory kolarsCreateCt 3 4907 Other free memory session heap 2 1016 Other kollalos2 koh-kghu sessi 1 86 Other permanent memory pga heap 1 43 Other kxsFrame4kPage session heap 0 123 Other qmushtCreate qmtmInit 0 6 PL/SQL static frame of inst koh-kghu sessi 0 36 Other inode->map_kdlimem buckets_kdliug 0 9
The closest I come to kolarsCreateCt in the oradebug doc is this:
oradebug doc component ... Components in library GENERIC: -------------------------- ... LOB LOB (koll, kola) ...
This is the clue that there is apparently a LOB resource management problem. Luckily, the PL/SQL code base was not too big and only a couple of places where LOB handling was done. But, all LOBs seemed to be handled correctly.
One thing, though, I first didn’t pay much attention to was a few “XMLTYPE.getClobVal()” in SELECT statements. I’ve used that myself in the past a couple of times and knew this returned a persistent LOB handle and should not leak.
I took one of these SQL statements and ran them in my favorite tool, SQL Developer, and BOOOOOM! My session just consumed a couple of hundreds of megabytes of PGA memory and wouldn’t release it until I closed the connection.
So I crafted a little test case to check what’s going on.
SQL> create table xmltest(id number, data xmltype, primary key (id)) SQL> xmltype column data store as binary xml; SQL> insert into xmltest (id, data) values (1, '<someelement></someelement>'); SQL> commit; SQL> select t.id, dbms_lob.istemporary(t.data.getclobval()) is_temp from xmltest t; ID IS_TEMP ---------- ---------- 1 1
I don’t think this is what I knew from past experience so I did a quick MOS search and found following bug report:
Bug 23061709 : HIGH KOLARSCREATECT AND ORA-04036 WHEN XMLTYPE COLUMN IS STORED AS BINARY
OK, lets change the XMLType storage to CLOB and see what happens:
SQL> create table xmltest(id number, data xmltype, primary key (id)) SQL> xmltype column data store as securefile clob; SQL> insert into xmltest (id, data) values (1, '<someelement></someelement>'); SQL> commit; SQL> select t.id, dbms_lob.istemporary(t.data.getclobval()) is_temp from xmltest t; ID IS_TEMP ---------- ---------- 1 0
To be fair, the documentation for 11.2, 12.1, and 12.2 all state the following:
“This member function returns a CLOB containing the serialized XML representation. If the CLOB returned is temporary, it must be freed after use.”
Because I always run my tests in SQL*Plus additionally to SQL Developer I was puzzled to see that there’s no memory leak. Why does it not happen in SQL*Plus? I thought that maybe SQL*Plus is clever enough to always check for temporary LOBs and free them.
Firing up gdb with this script proved me right.
break OCILobFreeTemporary commands where continue end
This SQL fetches five rows with CLOB data.
SQL> select t.id, t.data.getclobval() from xmltest t where rownum <= 5;
Output from gdb:
Breakpoint 1, 0x00007f678cc3d250 in OCILobFreeTemporary () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libclntsh.so.12.1 #0 0x00007f678cc3d250 in OCILobFreeTemporary () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libclntsh.so.12.1 #1 0x00007f678eafb1ef in afioci () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so #2 0x00007f678eaed912 in afifpilobfreetemp () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so #3 0x00007f678eaed799 in afifpifretmp () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so #4 0x00007f678eae8f20 in afidet () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so #5 0x00007f678eb272f3 in safiqry () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so #6 0x00007f678eb096e3 in afiqry () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so #7 0x00007f678eb21bf2 in afixeqqry () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so #8 0x00007f678eb21521 in afixeqsql () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so #9 0x00007f678eb20ed5 in afixeqr () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so #10 0x00007f678eabfd74 in aficfd () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so #11 0x00007f678eabec49 in aficdr () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so #12 0x00007f678eaeacf2 in afidrv () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libsqlplus.so #13 0x00000000004007a1 in main () Breakpoint 1, 0x00007f678cc3d250 in OCILobFreeTemporary () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libclntsh.so.12.1 ... #13 0x00000000004007a1 in main () Breakpoint 1, 0x00007f678cc3d250 in OCILobFreeTemporary () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libclntsh.so.12.1 ... #13 0x00000000004007a1 in main () Breakpoint 1, 0x00007f678cc3d250 in OCILobFreeTemporary () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libclntsh.so.12.1 ... #13 0x00000000004007a1 in main () Breakpoint 1, 0x00007f678cc3d250 in OCILobFreeTemporary () from /u01/app/oracle/product/12.1.0.2.161018/RAC/lib/libclntsh.so.12.1 ... #13 0x00000000004007a1 in main ()
Five rows fetched and gdb shows five calls to OCILobFreeTemporary. The call to OCILobFreeTemporary originates from libsqlplus.so which tells me it’s not the OCI layer itself handling it, it’s the logic in SQL*Plus.
Next, I wanted to see myself what happens in a Java/JDBC application. Here’s the relevant snippet from my test case:
String sql = "select t.id, t.data.getclobval() from xmltest t where rownum <= 5"; stmt = conn.prepareStatement(sql); for (int i=0; i<3; i++) { stmt = conn.prepareStatement(sql); rset = stmt.executeQuery(); while (rset.next ()) { System.out.println (rset.getInt(1)); xmlclob = rset.getClob(2); xmlclob.free(); } rset.close(); c.format("\nPress ENTER 1 to proceed.\n"); c.readLine(); }
Like SQL*Plus being an application handling the LOB resources, you have to do it programmatically in Java as well. Line 11 frees the temporary LOB.
Without the free() call on the Clob you will see the memory leak when there XMLType storage clause is XML BINARY. It happens, even when do don’t reference the LOB column in Java:
while (rset.next ()) { System.out.println (rset.getInt(1)); }
Lesson learnt: never trust what you think you know! Sometimes things change in subtle ways.