Monday, January 21, 2008

Informatica : Lookups Versus Outer Joins

A plain lookup over a dimension (fetching ROW_WID) can be replaced by an outer join over the dimension in the parent sql itself.

I have created a prototype to demonstrate this.

SILOS.TEST_BULK_SIL mapping is created as a copy of SILOS.SIL_PayrollFact (loads W_PAYROLL_F from W_PAYROLL_FS).

Original mapping had a mapplet mplt_SIL_PayrollFact. This mapplet has 12 lookups over various dimensions. It takes input (datasource_num_id, integration_id etc) from the parent sql, looks up the ROW_WID and loads into the fact table.

I removed this mapplet completely and incorporated all the 12 dimensions in the parent sql itself, Outer Joining them to W_PAYROLL_FS. All the expressions which were built in the mapplet were taken care in the main mapping itself (some of them may require more finishing).

Following are the results:

Mapping Records Loaded (million) Time Taken (hr.) RPS (Reader)
(uses lookup)
183.3 16.3 3132
(uses Outer Join)
183.3 6.02 8429

Results show that Outer join based mapping ran approx 2.7 times faster than the one based on lookups.

Again, lookups which involve some complex calculations may not be replaced by outer join.


King said...

Thank you this was very much Useful.

instead of lookups Left outer join is it the only way. or can we use other outer joins.

Nitin Aggarwal said...


we can use left/right outer join depending on how you join the parent table and dimension. I found left join more convenient to use. I do not think we can use full outer join as it will give the wrong results.
Also note that outer joins are more effective in 10G compared to 9G. This is because of the improved execution plan based on "HASH JOIN RIGHT OUTER". I have another post which explains this.

Anonymous said...


What type of join is performed in lookup transformation?


Kishore........ :-) said...

Hi Nitin,
Iam facing the same problem. unable to see russian characters after loading. Iam reading the data from flat file and trying to place in ware house tables.
The data is not in correct format.
When i seen the file in my system able to see characters perectly, but when i seen same characters in Unix server it is different. please help me to resolve..

Mesh said...

Hi Nitin,
I am running a full ETL for HR R12 execution plan. One of task SDE_ORA_PayrollFact failed because of index problem.
When i checked the underlying table w_payroll_fs, it had more than 1 lakh rows of duplicated records.
is it strange? Removing the duplicated records is a good move? or some other problem i have to look at for solving this issue?

Nitin Aggarwal said...

>>One of task SDE_ORA_PayrollFact failed because of index problem.

Are you saying that fact staging table has index defined over it. Normally we will not have indexes over FS tables.

You definitely need to look at the cause of duplicates. Removing duplicates is just a workaround and not a solution.