Because of a bug in Oracle 188.8.131.52 we had to disable Veritas ODM, and since then we have suffered from poor IO performance.
We switched to Quick IO, which did not remedy the problem.
Now I am asking myself how and where to tune the system for optimal performance (we cannot use ODM in the near future).
First, I think we should stick with Quick IO: according to the documentation, the penalty compared to ODM should be in the range of 1-3%.
Now, our mount options are these:
I am wondering whether we should add mincache=direct, convosync=direct and nodatainlog, as suggested in many documents on the web.
The vxtunefs values for our volumes are all as follows:
root@NB25010 # vxtunefs /dev/vx/dsk/krwdata11dg/krwdata11vol
Filesystem i/o parameters for /oracle/KRW/sapdata11
read_pref_io = 65536
read_nstream = 1
read_unit_io = 65536
write_pref_io = 65536
write_nstream = 1
write_unit_io = 65536
pref_strength = 10
buf_breakup_size = 8388608
discovered_direct_iosz = 262144
max_direct_iosz = 1048576
default_indir_size = 8192
qio_cache_enable = 0
odm_cache_enable = 0
write_throttle = 0
max_diskq = 1048576
initial_extent_size = 8
max_seqio_extent_size = 2048
max_buf_data_size = 8192
hsm_write_prealloc = 0
read_ahead = 1
inode_aging_size = 0
inode_aging_count = 0
fcl_maxalloc = 39031525376
fcl_keeptime = 0
fcl_winterval = 3600
fcl_ointerval = 600
oltp_load = 0
So I am wondering about the following changes:
read_unit_io = 65536 => should be 4096, because NetApp stripe set in WAFL is 4K
write_unit_io = 65536 => should be 4096, because NetApp stripe set in WAFL is 4K
read_nstream = 1 => should be 14, because there are 14 data disks in one RAID-DP set
write_nstream = 1 => should be 14, because there are 14 data disks in one RAID-DP set
max_direct_iosz = 1048576 => should be 4194304, because of maxphys in /etc/system
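To put numbers on the read_nstream proposal: as far as I understand the vxtunefs(1M) description, VxFS multiplies read_pref_io by read_nstream to get its read-ahead size. The sketch below only does that arithmetic with the values from the listing above; the variable names are made up for illustration and nothing here touches the filesystem.

```shell
# Values taken from the vxtunefs output above.
read_pref_io=65536
current_nstream=1
proposed_nstream=14   # proposal from the post: one stream per data disk

# Assumption: read-ahead size = read_pref_io * read_nstream (per vxtunefs(1M)).
current_readahead=$((read_pref_io * current_nstream))
proposed_readahead=$((read_pref_io * proposed_nstream))

echo "current read-ahead:  ${current_readahead} bytes"
echo "proposed read-ahead: ${proposed_readahead} bytes"
```

So the proposal would raise the read-ahead from 64 KB to 896 KB per file, which is worth keeping in mind when judging whether it suits the workload.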
/etc/system is this:
root@NB25010 # fstyp -v /dev/vx/dsk/krwdata11dg/krwdata11vol
magic a501fcf5 version 7 ctime Tue Sep 18 15:18:35 2012
logstart 0 logend 0
bsize 1024 size 1257851904 dsize 0 ninode 1257851904 nau 0
defiextsize 0 ilbsize 0 immedlen 96 ndaddr 10
aufirst 0 emap 0 imap 0 iextop 0 istart 0
bstart 0 femap 0 fimap 0 fiextop 0 fistart 0 fbstart 0
nindir 2048 aulen 32768 auimlen 0 auemlen 8
auilen 0 aupad 0 aublocks 32768 maxtier 15
inopb 4 inopau 0 ndiripau 0 iaddrlen 8 bshift 10
inoshift 2 bmask fffffc00 boffmask 3ff checksum f568c441
oltext1 32 oltext2 1282 oltsize 1 checksum2 0
free 258380170 ifree 0
efree 0 1 0 35 33 35 34 33 33 32 34 35 33 33 7 3 7 8 5 12 3 9 2 6 2 0 0 1 0 0 0 0
I guess bsize should be 8192, because the Oracle block size is 8192; for the redo log I think we should stick with the current setting.
And what about set vxfs:vx_vmodsort=1?
Any hint is highly appreciated!
If you are not using ODM, we would suggest using mincache=direct (i.e. direct mounting, no cache).
You could also mount with noatime to speed things up.
Oracle does its own caching, in the SGA.
The problem is that the Oracle cache and a filesystem cache are not the same at all.
Filesystems have different block sizes and also have to update metadata (inode information and allocation information specific to the filesystem).
If you mount the Oracle database files with cache, you get a "clash" of the two caches, and will actually get a slowdown.
The mincache=direct mount option mounts the VxFS filesystem without filesystem cache. The drawback is that all other files on the same mount point and filesystem lose the cache as well, and unlike the database files (which Oracle caches itself) they would perform better with it.
The other option to mount with is noatime. This prevents VxFS from updating the access time of the DBF (I assume .dbf) files. It saves a little time, because the access time does not need to change every time a file is accessed. (I would not do the same with the modification time, mtime, because backups depend on it.)
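As a rough sketch, the two options could be applied with a remount like the one below. The device and mount point are the ones from the vxtunefs output in the question. The script only prints the command (a dry run), and it assumes that your VxFS version lets you change both options with remount, so please check mount_vxfs(1M) before running it for real.

```shell
# Dry run: build and print the remount command instead of executing it.
opts="mincache=direct,noatime"
dev=/dev/vx/dsk/krwdata11dg/krwdata11vol
mnt=/oracle/KRW/sapdata11

# Assumption: both options may be changed on a live filesystem via remount.
cmd="mount -F vxfs -o remount,${opts} ${dev} ${mnt}"
echo "$cmd"
```

To make the change permanent, the same option string would go into the mount options field of /etc/vfstab (or into the cluster mount resource, if the mount is cluster-managed).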
Now, modsort .....
Any filesystem caches metadata (regardless). So, when the filesystem reads an inode from disk, it keeps it in memory to access later (disk being orders of magnitude slower than memory).
The big problem is: how many are you going to keep in memory?
And when you release some of them, which inodes (or the memory associated with them) will you release?
The solution is obviously to find the inodes (in memory) that have been least recently accessed.
This means that you have to go through the inodes that you have in memory, sort them by last date/time accessed, and then release the "oldest" ones.
With any sorting (it does not matter if you use bubble sort or mod sort or ...) this takes time.
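Just to illustrate the idea (this is a toy, not how VxFS or the Solaris kernel implements it): take a list of cached inodes with their last access times, sort by access time, and pick the oldest ones as the ones to release. The inode numbers and timestamps are made up.

```shell
# Toy inode cache: "<inode number> <last access time in seconds>" (made-up data).
cache='101 1700000300
102 1700000100
103 1700000400
104 1700000200'

# Sort by access time (oldest first) and pick the two oldest for release.
victims=$(printf '%s\n' "$cache" | sort -k2,2n | head -n 2 | awk '{print $1}' | paste -sd' ' -)
echo "release inodes: $victims"
```

The sort has to look at every cached entry, which is exactly the cost described above once the cache holds a large number of inodes.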
Solaris has actually put all of this sorting in the kernel (so the filesystem does not have to do it). Well, at least certain versions have...
All you have to do is tell the filesystem to use it.
This is why you can enable vmodsort for VxFS. Just make sure that you have the correct versions for this (Solaris and VxFS).
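The setting itself is the line from the question, set vxfs:vx_vmodsort=1, placed in /etc/system, and it takes effect after a reboot. A small sketch for checking whether an active (uncommented) entry is present follows. The function name has_vmodsort is made up, and it assumes a POSIX grep (on Solaris that may mean /usr/xpg4/bin/grep).

```shell
# has_vmodsort FILE: succeeds if FILE has an active "set vxfs:vx_vmodsort=1" line.
# Comment lines in /etc/system start with "*", so they do not match the pattern.
has_vmodsort() {
  grep -Eq '^[[:space:]]*set[[:space:]]+vxfs:vx_vmodsort[[:space:]]*=[[:space:]]*1' "$1"
}

# Example usage against the real file (only meaningful on the Solaris host).
if [ -r /etc/system ] && has_vmodsort /etc/system; then
  echo "vx_vmodsort is enabled in /etc/system"
else
  echo "no active vx_vmodsort=1 entry found"
fi
```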
Last thing ...
The qio mount option does not actually enable Quick IO. All it does is check whether you have a license to use it.
To use Quick IO, you actually have to create Quick IO files (please see the VxFS / SFRAC documentation on how to do this, or some forums with a quick how-to, like this one: http://www.dba-resources.com/oracle/using-quick-io-files-with-oracle/)
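From memory, Quick IO files are created with qiomkfile (something like qiomkfile -s <size> <file>; please verify the exact options in the documentation), which leaves a hidden regular file plus a symbolic link whose target ends in ::cdev:vxfs:. Under that assumption, a quick way to see whether a datafile is really a Quick IO file is to look at the link target. The function name is_qio_file is made up for this sketch.

```shell
# is_qio_file PATH: succeeds if PATH is a symlink whose target ends in "::cdev:vxfs:".
# Assumption: this is the naming convention used by Quick IO links.
is_qio_file() {
  ls -l "$1" 2>/dev/null | grep -q '::cdev:vxfs:$'
}

# Example usage: check every file given on the command line.
for f in "$@"; do
  if is_qio_file "$f"; then
    echo "$f: Quick IO file"
  else
    echo "$f: regular file (not Quick IO)"
  fi
done
```

If your datafiles all show up as regular files, the database is not using Quick IO even though the filesystem is mounted with qio.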
Obviously, if you can upgrade (to get around the known issue), ODM is a lot more efficient.
Just a bit of background on Quick IO and ODM:
A long time ago, Veritas looked at IO to raw volumes and to filesystems to see what the IO looked like (size, order, ...).
Veritas then took what they learned and built Quick IO to allow VxFS to do IO in a way that is very close to raw performance.
ODM is just a formal specification (from Oracle) for this. So now any filesystem vendor (ZFS or UFS or ...) can write ODM routines (that Oracle will call to do IO), put them into a library and link it into Oracle.
Veritas ODM is just this: a library of functions that do IO the way Oracle expects (not going into too much of the specifications now, but there are a lot and they are public).
If you do not link the Veritas ODM library, Oracle uses the "old" way of doing IO (it still calls the ODM functions, but these map back to standard IO routines and system calls).
Sorry, very long answer, but I hope that helps.