Wednesday, August 20, 2008

Oracle : Resolving ORA-29275: partial multibyte character

Data might sometimes get corrupted and when accessing can result in error:

In case data volumes are huge, its difficult to pin point the row which is causing this. Though, there is strategy for doing this.

sql > set autotrace traceonly statistics
sql> execute query (select column from table)

This will show you how many rows are processed before the error occured.
e.g.

22787220 rows selected.
.
Statistics
----------------------------------------------------------
6295 recursive calls
0 db block gets
65041120 consistent gets
10050575 physical reads
0 redo size
4195673524 bytes sent via SQL*Net to client
10634511 bytes received via SQL*Net from client
1519151 SQL*Net roundtrips to/from client
108956 sorts (memory)
0 sorts (disk)
22787220 rows processed

So, you can create an duplicate table and delete these many rows ,i.e., 22787220 records using rownum.

then you again run the query and check how many further records are selected before corrupt data is encountered. This way you can single out the data which is causing issue. Generally this kind of data corruption occurs as a result of DB upgrades.

How to fix:

A simple straight way is to delete the record. but that means loosing information.

Following could be done to resolve this:

update table set column=column||'';

i.e., append empty string to the column and that is it...data corruption is fixed without any loss of it.

similar exercise can be done for number type columns. they generally result in end-of-file on communication channel.

you can check the corruption of it by dumping its canonical form.

sql >select column, dump(column) from table.
---- --------------------------------------------------
1 Typ=2 Len=21: 193,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1

actual dump should have been "193,2", no trailing "1"s.

these records also can be updated to get them return to normal canonical form.

update table set column=column + 1 - 1;

Friday, August 1, 2008

Oracle : storing strings in multibyte character format

Creating tables using varchar2 datatype by default allocates specified value in bytes. That means if one is storing multi-byte character then you may not be able to store all the characters as some character might be multi-byte i.e taking 2 bytes per character.

We can store multibyte charater strings with specified number of characters using CHAR along with the varchar2 datatype. e.g
Create table lingual ( text varchar2(2o) CHAR);

this way you can store up to 40 bytes or 20 multibyte characters in column text.

To find out how many characters a column of a row has, use length() function. This will return the number of character, multibyte or single byte.

Use lengthb() to determine the number of bytes the column value contains.

Wednesday, July 9, 2008

Oracle : Finding Non-numeric column record in a table

Find ALL rows where you have non-numeric characters only:

select ROW_SEQ#, STRING#
from TMP_DATA
WHERE length(STRING#)
- Length( TRANSLATE(STRING#, CHR(1)||TRANSLATE(String#,
CHR(1)||'1234567890', CHR(1) ), CHR(1) ) ) > 0
;

Find ALL rows where you have numeric characters only:

Find ALL rows where you have numeric characters only:

select ROW_SEQ#, STRING#
from TMP_DATA
WHERE length(STRING#)
- Length( TRANSLATE(STRING#, CHR(1)||TRANSLATE(String#,
CHR(1)||'1234567890', CHR(1) ), CHR(1) ) ) = 0
;

Friday, April 4, 2008

Oracle: Deleting duplicates from a huge table

delete from t
where rowid in
(select rid
from (select rowid rid,
row_number() over (partition by KEY_FIELDS
order by rowid) rn
from t )
where rn <> 1 );

reference : http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1224636375004

Tuesday, March 25, 2008

Oracle : ORA-4030 in AIX Environment

If a session tries to allocate pga memory more than the upper limit specified at OS level, it will error out with:
ORA-04030: out of process memory when trying to allocate 249880 bytes (QERHJ hash-joi,kllcqas:kllsltba)

checking the ulimit settings on AIX:
> ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) 256000
stack(kbytes) unlimited
memory(kbytes) 256000
coredump(blocks) 2097151
nofiles(descriptors) 2000

You can see that the maximum memory which can be allocated in chunk is 250M.

This needs to be set to unlimited.

edit /etc/security/limits file and set it as follows:

oracle:
fsize = -1
core = 2097151
cpu = -1
data = -1
rss = -1
stack = -1
stack_hard = -1
nofiles = 2000

here oracle is an OS user using which oracle database is installed. relogin to OS with this user and issue shall be fixed.

Wednesday, January 23, 2008

Oracle: HASH JOIN RIGHT OUTER

Queries with outer join would be executed by scanning the driving table first, which is ought to be the bigger table which is outer joined with the other table. This approach compromises with the performance as Larger table has to scanned and hashed first and then joined to the other tables, contrary to the very nature of Hash joins where smaller tables are hashed and stored in memory and then joined over to the bigger table.

In 10G, there is a new concept HASH JOIN (RIGHT OUTER) --could be seen in execution plan also--which scans the smaller tables first and then join them over to the bigger table. This surely is a welcome change as this plan is quite superior to the earlier execution plans in terms of performance.

But, beware, all your tables involved in joins should have statistics computed for 10g to pick up the new execution plan.

Monday, January 21, 2008

Informatica : Lookups Versus Outer Joins

A plain lookup over a dimension (fetching ROW_WID) can be replaced by an outer join over the dimension in the parent sql itself.

I have created a prototype to demonstrate this.

SILOS.TEST_BULK_SIL mapping is created as a copy of SILOS.SIL_PayrollFact (loads W_PAYROLL_F from W_PAYROLL_FS).

Original mapping had a mapplet mplt_SIL_PayrollFact. This mapplet has 12 lookups over various dimensions. It takes input (datasource_num_id, integration_id etc) from the parent sql, looks up the ROW_WID and loads into the fact table.

I removed this mapplet completely and incorporated all the 12 dimensions in the parent sql itself, Outer Joining them to W_PAYROLL_FS. All the expressions which were built in the mapplet were taken care in the main mapping itself (some of them may require more finishing).

Following are the results:

Mapping Records Loaded (million) Time Taken (hr.) RPS (Reader)
SIL_PayrollFact
(uses lookup)
183.3 16.3 3132
TEST_BULK_SIL
(uses Outer Join)
183.3 6.02 8429

Results show that Outer join based mapping ran approx 2.7 times faster than the one based on lookups.

Again, lookups which involve some complex calculations may not be replaced by outer join.