This is just a sanity check on my approach.

I've been asked to look at a site that has just carried out a DB upgrade. The feeling is that there has been some performance degradation since the upgrade. The performance hit has not been quantified yet but is 'felt throughout the day'.

This site employs a backup strategy that requires them to shut down the database nightly. The disks are then cloned and the clones backed up to tape. This obviously impacts DB performance due to the loss of library and buffer cache contents, and it is something I will recommend changing (but it does not address the upgrade performance issue).

My plan is to gather STATSPACK snapshot data every 30 minutes for one of these restart cycles and then investigate the performance stats. My approach (after gathering all the user info about the problem) is to discard the first 8 or so hours of data, as the restart will skew my analysis (lots of physical I/O while warming up the cache and lots of CPU usage due to parsing).

So, my question is: of the 16 hours of remaining data, what is the best approach to investigating high-level performance issues? Should I

1) generate a SNAP report for the whole 16-hour range and use this for all investigations?
2) start with a SNAP report for the 16-hour range and then drill into other sections of this range as a sanity check (binary-search the range to check for unusual behaviour)?
3) pick the last 30-minute interval and use this for investigations?

My intention is to carry out 2); the initial investigation will focus on the top wait events and from there drill into the areas of the report that might indicate what is causing those waits. The 24-hour snapshot will then be used as a baseline for any performance changes that may get implemented as a result of my investigation.

Can you highlight any potential issues with this approach, and anything else that I might be able to do during the investigation but may not have considered above?

Answer:

1) Nope, that would be meaningless - waits and everything would be averaged out over far too long a period of time.

Take a level 10 snapshot every 15 minutes for a day. Then pick a representative 15-minute window when performance was 'felt to be at its worst' and analyze that one.

But - since you don't have anything to compare it to, you are just 'tuning'. You will not be able to say "ah-hah, here is the cause from the upgrade", because you don't know whether that specific 'thing' happened under the old version as well.

"(but it does not address the upgrade performance issue)"

Well, nothing from this exercise will - you have no baseline from which to compare. So things like "don't do stupid things like cold backups" are just as important as "hey, tons of log file syncs, let's speed up those disks". You are just doing tuning here; you are not going to "find the cause induced by an upgrade" - that's just not possible in your case. (Frequent issue - you upgrade software but forget to look at RAM, which generally needs to increase as well.)

Follow-up:

I've already captured data with a vanilla STATSPACK configuration. Some of the SQL I am seeing pulled out seems to be from the STATSPACK snapshot queries themselves. As a result (see SNIP below), I am a little wary of increasing the snapshot level at this time, since the doco highlights that extra work will be carried out. I have some data now; before I take this next step I want to see whether I can make good use of what I have already. I'll provide some additional information, and then can you please confirm this approach of increasing the level to 10?
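For reference, the 30-minute collection described in the question is usually set up as a database job. This sketch is not from the original exchange; it assumes STATSPACK is installed under the usual PERFSTAT schema and follows Oracle's shipped spauto.sql, which does the same thing hourly:

    rem Run as PERFSTAT: schedule statspack.snap every 30 minutes.
    variable jobno number
    begin
       dbms_job.submit(:jobno,
                       'statspack.snap;',   -- default snapshot level
                       sysdate,             -- first snapshot: now
                       'sysdate + 1/48');   -- 1/48 of a day = 30 minutes
       commit;
    end;
    /
    print jobno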
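On "take a level 10 every 15 minutes": a level-10 snapshot can be requested as a one-off, or the stored default level can be raised so the scheduled job picks it up. Again a sketch rather than the poster's actual commands; level 10 adds parent and child latch statistics on top of the default level-5 SQL capture, which is the "extra work" the doco warns about:

    rem One-off snapshot at level 10:
    exec statspack.snap(i_snap_level => 10)

    rem Or persist level 10 as the default for future snapshots:
    exec statspack.modify_statspack_parameter(i_snap_level => 10)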
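To report over a chosen window - the whole 16 hours or a single 15-minute interval - pick a begin/end pair of snapshot IDs and feed them to spreport.sql. A sketch with placeholder IDs (pre-defining the variables suppresses the script's interactive prompts):

    rem List snapshots so a pair bracketing the window can be chosen:
    select snap_id, snap_time
      from perfstat.stats$snapshot
     order by snap_id;

    rem Placeholder IDs - substitute the real begin/end snapshots:
    define begin_snap=42
    define end_snap=43
    define report_name=sp_42_43.lst
    @?/rdbms/admin/spreport.sql

One caveat that matters for this nightly-shutdown site: STATSPACK statistics are cumulative from instance startup, so a report cannot span a restart - both snapshots must come from the same instance lifetime.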