Oracle database can store only 255 columns in a single row-piece. Anything beyond 255 columns will be stored as a separate row, chained to the main one. This kind of chaining is different from what we know as migrated/chained rows which happen either because of lack of space in current block which causes rows to migrate when updated leaving behind their new address or due to row size more than the block size itself. Both these types of chaining are inter-block row chaining.
intra-block row chaining (as i mentioned above happens due to number of columns exceeding 255) does not cause as much a performance issue as inter-block row chaining. This is because all the row contents could still be found within the same block. When there is a full table scan, DB can easily do scattered reads over this table with number of blocks spanning up to the value for parameter db_file_multiblock_read_count. For scanning rows which are chained to another block, as is the case with migrated/chained rows, DB has to do sequential reads with one block read at a time. This summarily degrades query performance.
Unlike Inter-block row chaining, Intra-block row chaining cannot be detected using ANALYZE TABLE LIST CHAINED ROWS command. However, there is a way to see to what extent the rows are chained. Look out for statistic " table fech continued row". As you access the table with 255+ columns, this value will keep increasing.
select name, value from v$sysstat where name like 'table fetch %';
Note that if all the columns beyond 255 columns have NULL values, the row will not be chained.
Thursday, October 7, 2010
Monday, May 10, 2010
BI Apps ETL: Handling statistics gathering over tables in parallel
For one of the BI Apps incremental ETL execution, it was observed that the major time is consumed during statistics gathering over some of the warehouse tables. There were three tables which took 1 hr each to execute. When run standalone, gathering statistics over each of these tables took only 3 minutes. So, what was so wrong during the ETL ??
Upon closely observing the DAC execution plan it was found that all these three tables were analyzed in parallel. It was not a coincidence. In fact, each time these three tables would land up being analyzed in parallel. So, the question is how come !!!
There was a task group defined which had collection of six tasks which where loading into these three tables. Truncate table option was enabled over this task group. This caused all query indexes to be created over these three tables once all the tasks in this task group finished loading these three tables. While query indexes were created in parallel and one table had all indexes created while the other tables still had them being created, statistics gathering over the table did not spawn. Only after all the three tables had all the indexes created, analyze command over the three table was fired. This caused all these three tables getting created in parallel. Is executing these three tables in parallel so much resource consuming that time taken jumped 20 times than when run one at a time ??
Lets see the command being used to analyze these tables:
exec DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'W_TAB_D', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO',cascade => false ,degree =>DBMS_STATS.AUTO_DEGREE );
When I executed all three tables in parallel, maximum time taken by any of analyze commands was 26 minutes (during etl there were some more Informatica jobs running which would have further slowed it down to take 1 hr). This is what I observed in v$sysstat view:
Before firing all three commands in parallel:
---------------------------------------------------
---------------------------------------------------
Parallel operations not downgraded 9
Parallel operations downgraded to serial 0
After firing all three commands in parallel:
----------------------------------------------------
Parallel operations not downgraded 10
Parallel operations downgraded to serial 2
Parallel operations downgraded to serial 0
After firing all three commands in parallel:
----------------------------------------------------
Parallel operations not downgraded 10
Parallel operations downgraded to serial 2
This shows that all the parallel servers were put to use by analyze over one of the table and rest of the two were then executed serially. This serial execution caused the other two analyze to take more time.
Now, the option was to either segregate the tasks with in the task group in three different task groups, or, to do something to speed up the analyze over these three tables in parallel.
First option was not so enticing given that this task group is shipped OOTB and any changes might impact the functionality. Also, all the tasks in this task groups were loading different tables in a given order, so changing the order by segregating them in different task groups was not so straight forward. So, the choice left was to speed up the statistics gathering over the table.
Since it was observed that parallel servers were hogged by analyze over one single table, It was decided to override the degree parameter in the analyze command and set it a value so that all the three execution will get enough parallel servers. We may not get the performance as that of standalone execution, but it might still be better than what was observed.
So, the revised syntax is:
exec DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'W_TAB_D', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO',cascade => false ,degree =>4);
When executed in parallel, the maximum time taken by any of these stands at 19 minutes. v$sysstat shows no jobs being downgraded to serial.
Parallel operations not downgraded 16
Parallel operations downgraded to serial 2
Parallel operations downgraded to serial 2
This brought about performance improvement of 27%. Obviously, it is yet to be seen how it fares when we incorporate these changes in DAC for etl purpose. Will post more on what further is found.
Thursday, March 25, 2010
Linux : Scanning for an occurance of string in all the files in folder and its sub-folders
find . -exec grep "Upper" {} /dev/null \;
Thursday, January 28, 2010
Upgrading AIX system to next Technology Level
Refer to below link for upgrading technology level: http://www.ibm.com/developerworks/aix/library/au-aixservicepacks/index.html As root user: # suma -c -a HTTP_PROXY=http://[proxy]:[port]/ # suma -c -a FTP_PROXY=http://[proxy]:[port]/ #smit suma -> select option "Download Updates Now" -> select "Download Maintenance Level or Technology Level -> Provide TL as 5300-07 for e.g. Once all files are downloaded. Start installation. # inutoc # smit update_all -> provide the directory for software as "/usr/sys/inst.images" -> toggle the "Accept new license agreement" to Yes. -> [Enter]
-> [Enter]. -> Upgrade starts...
Thursday, January 7, 2010
DAC: no db2jdbc in java.library.path
This error is generated when you try to establish connection to DB2 using DAC even if you have db2java.zip present in lib directory.
To resolve this issue, edit\dac_env.bat and add \SQLLIB\bin to path variable.
To resolve this issue, edit
Subscribe to:
Posts (Atom)