Monday, April 6, 2009

INFORMATICA : Truncate Logs

I ran into unusual behavior when I tried to truncate the existing session logs from Repository Manager: the "Truncate Log" option under the "Edit" menu was disabled, and selecting/expanding the required folder did not help either.

Since my Informatica server was on AIX with the client on Windows, I used pmrep on the server to truncate the logs:

$ pmrep
pmrep>connect -r Powercenter -d Domain_server1 -n Administrator -x Administrator

pmrep>truncatelog -t all
truncatelog completed successfully.

Sunday, April 5, 2009

DB2 : Establish remote connection on Unix/AIX

On the server hosting the DB2 database:
db2set DB2COMM=tcpip
db2 update database manager configuration using svcename 50000
db2stop
db2start

On the client server (with the DB2 client installed):
db2 UNCATALOG DATABASE U001DB
db2 UNCATALOG DCS DATABASE U001DB
db2 UNCATALOG NODE SERVER1

db2 CATALOG TCPIP NODE SERVER1 REMOTE SERVER1 SERVER 50000 REMOTE_INSTANCE DB2 SYSTEM SERVER1 OSTYPE AIX
db2 CATALOG DCS DATABASE U001DB AS U001DB
db2 CATALOG DATABASE U001DB AS U001DB AT NODE SERVER1 AUTHENTICATION DCS

Here, SERVER1 is the host where the DB2 server resides, and U001DB is the DB2 database name.
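Before testing the connection, the new catalog entries can be verified by listing the node and database directories on the client:

db2 list node directory
db2 list database directory
db2 list dcs directory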

Test connection:

On the client server:
db2 connect to U001DB user siebel using siebel

Database Connection Information

Database server = DB2/AIX64 9.1.6
SQL authorization ID = SIEBEL
Local database alias = U001DB

Tuesday, February 24, 2009

Oracle : Outer Join pitfalls

Starting with Oracle 10g, the optimizer processes outer joins much more efficiently. The execution plan can now use "HASH JOIN RIGHT OUTER", which means the table with fewer records is scanned first and the table it is outer joined to is scanned last. This is especially useful for data warehouse queries where a fact table is outer joined to many dimension tables: the database no longer has to read and write the fact table rows to temp space in order to join them to the dimension tables later.
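To confirm that a query is actually getting this plan shape, the plan can be checked with DBMS_XPLAN; the table names below are only illustrative:

EXPLAIN PLAN FOR
SELECT d.row_wid, f.ename
  FROM w_emp_fs f
  LEFT OUTER JOIN w_location_d d
    ON d.integration_id = f.integration_id;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- look for a "HASH JOIN RIGHT OUTER" step with the small dimension as the first (build) input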

But before concluding that this new method is a sure win, there are a few things to keep in mind.
1. Ensure NOPARALLEL query execution. The new execution plan for outer joins is severely affected by parallel query: if the dimension tables are scanned in parallel and then joined to the fact table, the query performs much worse. To avoid parallel execution, check that no participating table or index has a degree higher than 1 (see the sketch after this list). Alternatively, one can disable parallel query execution at the instance level by setting the init parameter parallel_max_servers to 0, but this is not advisable, since other queries might want to leverage parallel execution.

2. Keep huge dimension tables out of the outer joins. If a dimension table (the one being outer joined) is huge, say 1 million records (with a PGA of 4 GB), the outer join using the new execution plan may not help. The hash tables for all the joined dimensions can no longer be kept in memory, and the database has to fall back to a multipass hash join. The "HASH JOIN RIGHT OUTER" plan is effective only when the outer joined tables/dimensions are small.

3. Ensure that all the participating tables have statistics collected. Missing statistics would cause the optimizer to fall back to the plan it used before 10g, and you may not see the new execution plan for the outer joined query (see the sketch after this list).
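A minimal sketch of the checks from points 1 and 3, using illustrative table and index names (w_emp_f, w_location_d, w_emp_f_idx1):

-- check the degree on the participating tables and indexes (point 1)
SELECT table_name, degree FROM user_tables WHERE table_name IN ('W_EMP_F', 'W_LOCATION_D');
SELECT index_name, degree FROM user_indexes WHERE table_name IN ('W_EMP_F', 'W_LOCATION_D');

-- force serial execution where the degree is higher than 1
ALTER TABLE w_emp_f NOPARALLEL;
ALTER INDEX w_emp_f_idx1 NOPARALLEL;

-- collect missing statistics so the optimizer can pick the new plan (point 3)
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'W_LOCATION_D', cascade => TRUE);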

Thursday, January 15, 2009

Informatica 8.6 : issue with pmcmd

On AIX 5.3, the pmcmd command fails with the following error:

$ ./pmcmd

Could not load program ./pmcmd:
Could not load module /u01/Informatica/PowerCenter8.6/server/bin/libpmser.a.
Dependent module /lib/libz.a could not be loaded.
The module has an invalid magic number.
Could not load module pmcmd.
Dependent module /u01/Informatica/PowerCenter8.6/server/bin/libpmser.a could not be loaded.
Could not load module .

This issue has been fixed in one of the recent hotfixes.

To work around the issue, rename /lib/libz.a so that the libz.a shipped with Informatica is picked up instead.
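A minimal sketch of that workaround, assuming root access and an arbitrary backup name; be careful, since other programs on the box may link against /lib/libz.a:

$ mv /lib/libz.a /lib/libz.a.orig
$ ./pmcmd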

Sunday, January 11, 2009

Oracle: Correlated sub-queries effectiveness

During ETL extraction and loading into a star-schema warehouse, we are often faced with the choice of using an outer join or a correlated subquery to resolve the dimension table primary key into the fact table. To clarify, take the example below:

INSERT INTO w_emp_f
SELECT a.row_wid loc_wid,
       b.ename,
       b.address
  FROM w_emp_fs b
  LEFT OUTER JOIN w_location_d a
    ON a.datasource_num_id = b.datasource_num_id
   AND a.integration_id = b.integration_id;



In the example above, row_wid is the unique identifier of the row in the Location dimension, and it needs to be added to the Employee fact table. In reality there could be 30-40 dimensions that need to be joined to the fact table. Outer joining all of them is one way, but it becomes a big overhead when the dimension tables are very large (e.g. more than 2 million records), and the outer join may not perform.
This is where correlated subqueries come to the rescue. We can reframe the above load query as follows:
INSERT INTO w_emp_f
SELECT (SELECT a.row_wid
          FROM w_location_d a
         WHERE a.datasource_num_id = b.datasource_num_id
           AND a.integration_id = b.integration_id) loc_wid,
       b.ename,
       b.address
  FROM w_emp_fs b;


The correlated query only performs well if the dimension columns used in the join condition are indexed. That way the index can be cached in memory and the dimension row is returned quickly for each fact table record.
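A minimal sketch of such an index (the index name is illustrative); including row_wid lets the subquery be satisfied from the index alone:

CREATE INDEX w_location_d_n1
    ON w_location_d (datasource_num_id, integration_id, row_wid);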

We can also segregate the dimensions attached to the fact table based on their size. Depending on the memory configuration of the database, the size threshold can be found by trial and error: dimensions below the threshold are outer joined to the fact table, while those above it are pushed into correlated subqueries, as sketched below.
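A sketch of the hybrid form, keeping the small w_location_d as an outer join and pushing a hypothetical large dimension w_product_d into a correlated subquery (the product table and columns are illustrative):

INSERT INTO w_emp_f
SELECT a.row_wid loc_wid,
       (SELECT p.row_wid
          FROM w_product_d p
         WHERE p.datasource_num_id = b.datasource_num_id
           AND p.integration_id = b.prod_integration_id) prod_wid,
       b.ename,
       b.address
  FROM w_emp_fs b
  LEFT OUTER JOIN w_location_d a
    ON a.datasource_num_id = b.datasource_num_id
   AND a.integration_id = b.integration_id;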

Wednesday, August 20, 2008

Oracle : Resolving ORA-29275: partial multibyte character

Data can sometimes get corrupted, and accessing it then results in the error ORA-29275: partial multibyte character.

When data volumes are huge, it is difficult to pinpoint the row that is causing this, but there is a strategy for doing so.

SQL> set autotrace traceonly statistics
SQL> select column from table;

This shows how many rows were processed before the error occurred.
e.g.

22787220 rows selected.
Statistics
----------------------------------------------------------
6295 recursive calls
0 db block gets
65041120 consistent gets
10050575 physical reads
0 redo size
4195673524 bytes sent via SQL*Net to client
10634511 bytes received via SQL*Net from client
1519151 SQL*Net roundtrips to/from client
108956 sorts (memory)
0 sorts (disk)
22787220 rows processed

So you can create a duplicate table and delete that many rows, i.e., the first 22787220 records, using rownum.

Then run the query again against the duplicate table and check how many further records are selected before the corrupt data is encountered. Repeating this lets you single out the data causing the issue. Generally this kind of corruption occurs as a result of database upgrades.
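A minimal sketch of this narrowing-down step, assuming illustrative table names:

CREATE TABLE my_table_dup AS SELECT * FROM my_table;
DELETE FROM my_table_dup WHERE ROWNUM <= 22787220;
COMMIT;
-- re-run the failing select against my_table_dup and note the new row count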

How to fix:

A simple, straightforward way is to delete the record, but that means losing information.

The following can be done instead to resolve it:

update table set column=column||'';

That is, append an empty string to the column, and that is it: the data corruption is fixed without any loss of data.

A similar exercise can be done for NUMBER columns; when corrupted, they generally result in ORA-03113: end-of-file on communication channel.

You can check for corruption by dumping the column's canonical form:

SQL> select column, dump(column) from table;
---- --------------------------------------------------
1 Typ=2 Len=21: 193,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1

The actual dump should have been just "193,2", with no trailing "1"s.

These records can also be updated to return them to normal canonical form:

update table set column=column + 1 - 1;

Friday, August 1, 2008

Oracle : storing strings in multibyte character format

By default, a varchar2 column's specified length is allocated in bytes. That means that when storing multibyte data you may not be able to store as many characters as expected, since some characters take more than one byte each.

We can size the column in characters rather than bytes by adding the CHAR qualifier to the varchar2 length, e.g.
create table lingual ( text varchar2(20 CHAR) );

This way the text column can hold 20 characters even when the characters are multibyte (for example, 40 bytes for 20 two-byte characters).

To find out how many characters a column value contains, use the length() function; it returns the number of characters, whether multibyte or single byte.

Use lengthb() to determine the number of bytes the column value contains.
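A quick sketch against the lingual table above (the inserted string is just an example): length() returns the character count, while lengthb() returns the larger byte count.

insert into lingual values ('こんにちは');
select text, length(text) chars, lengthb(text) bytes from lingual;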