Thursday, August 09, 2012

How to Fix Data Skew

What Data Skew can cause?

  • Why I cannot insert 50 GB evenly distributed data in a database having 500 GB unused space?
  • Why we are not able to move only 10% of  the unused space from a database?
  • Why most of my requests are Spooling out?
  • Why most of the queries running on this database usually have high CPU SKEW?

Because

  • Most of the database space is being wasted
  • Almost all allocated space for a database on at least one of the AMPs has been occupied
  • The chunk of upcoming data that needs to go on the highly congested AMP, after distribution, do not finds a space for accommodation
  • If space needs to be moved, the chunk of space that is supposed to be fetched from the congested AMP would not be allowed. As, equal space will be fetched from each of the AMPs and the activity will fail if one of the AMPs does not have FREE SPACE=“Total Space to be moved/# of AMPs”  
  • During processing of any request on the database, there would be high chances of spooling out on the congested AMP as “MaxPerm – CurrPerm” will be the available Spool which will be less due to high Skew on the AMP
  • In some cases, the CPU Skew of the sql request on the skewed tables of the database will be more due to uneven distribution of data and different CPU required on different AMPs




No comments: