when to use_merge or use_hash for large data sets [message #224429] Wed, 14 March 2007 04:36
renjunatarajan
Messages: 5
Registered: January 2007
Junior Member
Hi,
I have a huge table (say table_source) with hundreds of millions of rows, from which I have to migrate data to a new table (say table_destination) based on some computations. That is, the process picks up rows from table_source, performs some grouping etc. (no arithmetic computations like SUM, only MIN and MAX) and inserts rows into table_destination.

Right now the migration strategy is to migrate chunks of data from table_source based on the created date. That is, it iteratively copies one year's worth of data to table_destination. A one-year iteration means processing as many as 14-17 million rows from table_source. The one-year period is configurable in the migration task.
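A simplified sketch of the iteration (all names here are made up for illustration):

BEGIN
  FOR yr IN 1995 .. 2006 LOOP  -- one configurable chunk per year
    INSERT INTO table_destination (key_col, min_val, max_val)
    SELECT s.key_col, MIN(s.val), MAX(s.val)
      FROM table_source s
     WHERE s.created_date >= TO_DATE(yr || '-01-01', 'YYYY-MM-DD')
       AND s.created_date <  TO_DATE((yr + 1) || '-01-01', 'YYYY-MM-DD')
     GROUP BY s.key_col;
    COMMIT;
  END LOOP;
END;
/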

When so much data is involved, what is the best index to use: use_hash or use_merge? Will use_hash eat up all the DB server's memory?

Currently the index is use_nl, which is not giving good performance for any iteration period (1 year, 6 months, 3 months or 1 month).
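The SELECT inside the loop actually joins table_source to a lookup table and currently carries the nested loops hint, roughly like this (ref_table is a stand-in name for the real lookup table):

SELECT /*+ use_nl(s r) */ s.key_col, MIN(s.val), MAX(s.val)
  FROM table_source s, ref_table r
 WHERE s.ref_id = r.id
   AND s.created_date >= :chunk_start
   AND s.created_date <  :chunk_end
 GROUP BY s.key_col;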

Thanks,
Renju
Re: when to use_merge or use_hash for large data sets [message #224461 is a reply to message #224429] Wed, 14 March 2007 06:41
gintsp
Messages: 118
Registered: February 2007
Senior Member
Hmm, some of your statements are a bit unclear.

use_hash and use_merge are hints that force a hash join and a merge join respectively. They generally have nothing to do with indexes. Of course, there is a third join mechanism, the nested loops join, which usually does use indexes.

Hash joins and merge joins are generally suitable for large data sets; nested loops are for smaller data sets.
The performance of both hash joins and merge joins is quite dependent on the available memory (hash joins even more so), and therefore you have to give them as much of it as possible.
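For example, to see whether a hash join helps, swap the hint and, if your instance uses manual PGA memory management, give the session a bigger hash area (the number here is just an illustration):

ALTER SESSION SET workarea_size_policy = MANUAL;
ALTER SESSION SET hash_area_size = 104857600;  -- 100 MB; pick what your server can afford

SELECT /*+ use_hash(s r) */ s.key_col, MIN(s.val), MAX(s.val)
  FROM table_source s, ref_table r
 WHERE s.ref_id = r.id
 GROUP BY s.key_col;

With automatic PGA management it is pga_aggregate_target that governs how much memory each workarea can get.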

Although, from your question it is not entirely clear whether the problem is in the joins, in a table scan, or in something else.

You can check two of my articles, which can be found at http://www.gplivna.eu/papers_e.htm
The first, Data migration from old to new application: an experience, is more or less directly relevant to your subject; the second, Overview of long running operations in Oracle (entries in v$session_longops), shows how hash joins depend on the available memory.
You can also check the Further reading chapter of the latter; it has links to papers that explain the hash join mechanism in great detail.
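For example, while the migration runs you can watch the long operations (full scans, hash joins and so on) with a query like this:

SELECT sid, opname, target, sofar, totalwork,
       ROUND(100 * sofar / totalwork, 1) AS pct_done
  FROM v$session_longops
 WHERE totalwork > 0
   AND sofar < totalwork;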

But as always, you first have to understand where the problem is: in scanning the tables, in joining the data sets, or somewhere else.
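A good first step is the execution plan. Using the simplified statement from your post, something like:

EXPLAIN PLAN FOR
SELECT /*+ use_hash(s r) */ s.key_col, MIN(s.val), MAX(s.val)
  FROM table_source s, ref_table r
 WHERE s.ref_id = r.id
 GROUP BY s.key_col;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

(DBMS_XPLAN.DISPLAY is available from 9.2 on; on older versions query PLAN_TABLE directly.)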

Gints Plivna
http://www.gplivna.eu
Re: when to use_merge or use_hash for large data sets [message #224734 is a reply to message #224461] Thu, 15 March 2007 04:57
renjunatarajan
Messages: 5
Registered: January 2007
Junior Member
Yes, I meant hints. Sorry.

Thanks for the articles.

For this problem, even a single day of data can run as high as 100,000 rows. They say nested loops are best for small data sets, but how do we quantify 'small'? I guess it depends on the type and configuration of the server. This is a 9i server.
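Rather than trying to quantify 'small' by hand, I suppose I can also gather fresh statistics and let the optimizer pick the join method itself, something like:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,
    tabname          => 'TABLE_SOURCE',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);  -- also gathers index statistics
END;
/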
