Queries with outer join would be executed by scanning the driving table first, which is ought to be the bigger table which is outer joined with the other table. This approach compromises with the performance as Larger table has to scanned and hashed first and then joined to the other tables, contrary to the very nature of Hash joins where smaller tables are hashed and stored in memory and then joined over to the bigger table.
In 10G, there is a new concept HASH JOIN (RIGHT OUTER) --could be seen in execution plan also--which scans the smaller tables first and then join them over to the bigger table. This surely is a welcome change as this plan is quite superior to the earlier execution plans in terms of performance.
But, beware, all your tables involved in joins should have statistics computed for 10g to pick up the new execution plan.
Wednesday, January 23, 2008
Monday, January 21, 2008
Informatica : Lookups Versus Outer Joins
A plain lookup over a dimension (fetching ROW_WID) can be replaced by an outer join over the dimension in the parent sql itself.
I have created a prototype to demonstrate this.
SILOS.TEST_BULK_SIL mapping is created as a copy of SILOS.SIL_PayrollFact (loads W_PAYROLL_F from W_PAYROLL_FS).
Original mapping had a mapplet mplt_SIL_PayrollFact. This mapplet has 12 lookups over various dimensions. It takes input (datasource_num_id, integration_id etc) from the parent sql, looks up the ROW_WID and loads into the fact table.
I removed this mapplet completely and incorporated all the 12 dimensions in the parent sql itself, Outer Joining them to W_PAYROLL_FS. All the expressions which were built in the mapplet were taken care in the main mapping itself (some of them may require more finishing).
Following are the results:
Results show that Outer join based mapping ran approx 2.7 times faster than the one based on lookups.
Again, lookups which involve some complex calculations may not be replaced by outer join.
I have created a prototype to demonstrate this.
SILOS.TEST_BULK_SIL mapping is created as a copy of SILOS.SIL_PayrollFact (loads W_PAYROLL_F from W_PAYROLL_FS).
Original mapping had a mapplet mplt_SIL_PayrollFact. This mapplet has 12 lookups over various dimensions. It takes input (datasource_num_id, integration_id etc) from the parent sql, looks up the ROW_WID and loads into the fact table.
I removed this mapplet completely and incorporated all the 12 dimensions in the parent sql itself, Outer Joining them to W_PAYROLL_FS. All the expressions which were built in the mapplet were taken care in the main mapping itself (some of them may require more finishing).
Following are the results:
Mapping | Records Loaded (million) | Time Taken (hr.) | RPS (Reader) |
SIL_PayrollFact (uses lookup) | 183.3 | 16.3 | 3132 |
TEST_BULK_SIL (uses Outer Join) | 183.3 | 6.02 | 8429 |
Results show that Outer join based mapping ran approx 2.7 times faster than the one based on lookups.
Again, lookups which involve some complex calculations may not be replaced by outer join.
Thursday, December 20, 2007
Oracle: hash_area_size and sort_area_size
higher value for hash_area_size, though is desirable, does not affect the hashing performance in a big way. Reason could be that memory is only used for creating hashes and then the records are flushed to the temp segments (disk).
Whereas in case of sorting, due to complex logic performed for sorting all the records, data moves back and forth multiple time between memory and disk, thus requiring higher sort_area_size for minimizing these iterations. Temp space requirement also falls to a good extent.
One needs to set workarea_size_policy to MANUAL and specify the desired sort and hash area sizes. if hash_area_size is not defined then it defaults to 2*sort_area_size.
Whereas in case of sorting, due to complex logic performed for sorting all the records, data moves back and forth multiple time between memory and disk, thus requiring higher sort_area_size for minimizing these iterations. Temp space requirement also falls to a good extent.
One needs to set workarea_size_policy to MANUAL and specify the desired sort and hash area sizes. if hash_area_size is not defined then it defaults to 2*sort_area_size.
Thursday, November 29, 2007
Solaris : Troubleshooting Memory & CPU Consumption
Use the following command to see the TOP 5 processes which are consuming system memory.
prstat -s size -n 5
output shows the resource statistics for each thread of a server application:
prstat -L -p 3295
Processes consuming the most CPU resource:
prstat -s cpu -a -n
prstat -s size -n 5
output shows the resource statistics for each thread of a server application:
prstat -L -p 3295
Processes consuming the most CPU resource:
prstat -s cpu -a -n
8
Wednesday, October 31, 2007
PERL : diff two files
use strict;
open TXT1, "1.txt" or die "$!";
open TXT2, "2.txt" or die "$!";
my %diff;
$diff{$_}=1 while ();
while(){
print unless $diff{$_};
}
close TXT2;
close TXT1;
Remove ^M from files transferred from Windows to Linux
Inside vi [in ESC mode] type:
:%s/^M//g (to make the ^M -> CTRL+V then CTRL+M)
:%s/^M//g (to make the ^M -> CTRL+V then CTRL+M)
Friday, October 19, 2007
Oracle ODBC : Driver's SQLAllocHandle on SQL_HANDLE_ENV failed
While configuring DSN using Oracle ODBC driver, one might encounter error "Driver's SQLAllocHandle on SQL_HANDLE_ENV failed". Cause being, Permissions of the remote desktop software did not allow the user access via the remote desktop software.
Do the following to resolve the issue:
1. Select Administration Tools , Local Security Setting and Local Policy.
2. Then select "User Rights Assignment"
3. Double click on "Create Global Objects"
4. Select Add User or Group.
5. Make sure Object Types Group Box is checked.
6. Select Locations and highlight the name of the server you are working on (Not the Domain).
7. Enter "Remote Desktop Users" or "Everyone" (without the quotes) into the Object Names Box.
8. Select OK.
9. Add the same users to the group "Power User".
10. If this does not work, after step 8 a reboot may be required.
Do the following to resolve the issue:
1. Select Administration Tools , Local Security Setting and Local Policy.
2. Then select "User Rights Assignment"
3. Double click on "Create Global Objects"
4. Select Add User or Group.
5. Make sure Object Types Group Box is checked.
6. Select Locations and highlight the name of the server you are working on (Not the Domain).
7. Enter "Remote Desktop Users" or "Everyone" (without the quotes) into the Object Names Box.
8. Select OK.
9. Add the same users to the group "Power User".
10. If this does not work, after step 8 a reboot may be required.
Subscribe to:
Posts (Atom)