This discussion thread is for anyone and everyone who uses Operations Manager (OM) \ DataFabric Manager (DFM) to manage their NetApp storage environment.
Please list out what you would like to see added to the product:
- What is missing?
- What is needed?
- What should it be doing that it isn't?
- What is it doing wrong?
- What enhancements should be added?
Thanks,
Bryan Bell
All,
The idea behind these suggestions is to get utilization numbers both by node and for the total install base, for the calendar year to date (Jan. 1st to Dec. 31st). So there are really two requests per item below.
Suggest the following:
1. Total Capacity installed RAW
2. Total Capacity in RAID & WAFL Overheads
3. Total Capacity installed useable
4. Total Capacity Currently used
5. Total Capacity Currently available
6. Total Capacity Currently Reserved – All SnapShots and other reserves in Aggregates and Volumes
7. Total current storage hosted, and available to end users
8. Total current storage utilized by end users
9. Total nodes requiring net-new storage based off of anything beyond 80-90% or tunable by customer
10. Current Spare Drive counts
11. Current Failed Drives
12. Current NFS Mounts
13. Current CIFS Shares
14. Current iSCSI Volumes\LUNs
15. Current FCP Volumes\LUNs
16. Average Top, heavy hitter hosts\users
17. Average Top, heavy hitters by volume and protocol
18. Average top heavy-hit “hot disks” by disk and node…
19. Current users in system by protocol: NFS\CIFS\iSCSI\FCP\FTP\HTTP
20. *Node Uptime - Based on Calendar year Jan.1st to Dec. 31st
21. *Cluster Uptime - Based on Calendar year Jan.1st to Dec. 31st
22. *Reboots to date per Node (User init. Maint)
23. *Reboots to date per Cluster (User init. Maint)
24. *Major Hardware failures to date – Drives, Shelves, nodes, ESH, Software based halts etc…
25. Total Failed Drives to date
26. Total replaced drives to date
27. Total growth trend by node and by over all install base
28. Total usage trend showing max throughput in Kb\Gb and MB and GB usage (peaks and valleys for networking)
29. Total sustained writes and reads for non-sequential and sequential I\O
30. Total IOPs sustained by both install base and node count, measured per second and graphed by month over a year…
31. A built-in calculator that would allow the customer to input their P.O. pricing of the frame and output the cost per Megabyte/Gigabyte, with or without support…
a. Cost of the Frame by node from P.O.
b. Cost of Support from P.O. Line item
c. Cost of installation if applicable from Line item on P.O.
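The calculator in item 31 amounts to simple arithmetic. A minimal sketch, assuming the three P.O. line items above are entered by hand and that usable capacity is known in GB; the class, function and field names are hypothetical and not part of any product:

```python
from dataclasses import dataclass


@dataclass
class PurchaseOrder:
    """Hypothetical P.O. line items for one node (names are illustrative)."""
    frame_cost: float           # a. cost of the frame from the P.O.
    support_cost: float         # b. cost of support from the P.O. line item
    install_cost: float = 0.0   # c. cost of installation, if applicable


def cost_per_gb(po: PurchaseOrder, usable_gb: float, include_support: bool = True) -> float:
    """Return the cost per usable GB, with or without the support line item."""
    if usable_gb <= 0:
        raise ValueError("usable_gb must be positive")
    total = po.frame_cost + po.install_cost
    if include_support:
        total += po.support_cost
    return total / usable_gb


# Made-up figures, purely for illustration.
po = PurchaseOrder(frame_cost=100_000.0, support_cost=20_000.0, install_cost=5_000.0)
with_support = cost_per_gb(po, usable_gb=50_000)
without_support = cost_per_gb(po, usable_gb=50_000, include_support=False)
```

Dividing the per-GB result by 1024 would give a per-MB figure, if that is the unit wanted.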
*I would like to see uptime of course!
But more importantly several of these items could potentially be enough to drive a good utilization number for our clients and customers. If you would like to discuss these utilization suggestions further, I would be happy to get on a call to go over them. Just send out an invite…
Bryan Bell
Central facility for managing all SnapManagers, with SNMP enabled for all SnapManager events. Consistent log, email and SNMP formats for all events produced by any SnapManager. A central catalog of all application-aware snapshots and snapmirrors, including coordination with job scheduling so you can see lag times and missed snaps or replication. I also agree with Dan's observation that Protection Manager's human interface is awkward and obscure and needs a complete redesign. NetApp data protection is serving essentially the same set of administrators as NetBackup (except with a D2D focus) but lacks tools that support the dedicated backup teams found in every medium-to-large data center.
Hi Robert,
What exactly do you feel is awkward and obscure about Protection Manager's GUI?
Please elaborate.
Is it the concepts of Data Sets, Policies and Resource Pools, or something more fundamental in the user interface design?
As the designer, I'd like to get more constructive feedback than blanket statements like "this sucks and needs a redesign".
Finally, you are wrong about the target user for Protection Manager.
It was never designed for a dedicated backup administrator who uses NetBackup. It was designed for a storage administrator using disk-to-disk backup on NetApp equipment only.
Reporting can be better: we do this now in SQL report server
I want to keep data for more than 1 year (minimum 3 years)
I want to have a separate DB (on another machine) of my choice: Oracle, MySQL or MS SQL Server
I want to change the severity of some events: the errors and criticals are forwarded to our incident system. The problem is that some of those events are less important or not important in our situation. It's now a lot of work to eliminate those alerts.
Reinoud
On the last point:
I want to change the severity of some events: the errors and criticals are forwarded to our incident system. The problem is that some of those events are less important or not important in our situation. It's now a lot of work to eliminate those alerts.
You can do this using the "dfm eventtype" command. For example, I have changed the severity of an event from error to normal below:
trinity:/root# dfm eventtype list | grep Error
...
user-files-quota-full Error userquota.files
volume-full Error df.kbytes
volume-quota-overcommitted Error volume.overcommit
volume-space-reserve-depleted Error volume.reserve-depleted
trinity:/root# dfm eventtype modify -v normal volume-space-reserve-depleted
Modified event "volume-space-reserve-depleted".
After the change:
trinity:/root# dfm eventtype list | grep volume-space-reserve-depleted
volume-space-reserve-depleted Normal volume.reserve-depleted
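To review many event types at once, the listing can be grouped by severity in a small script. This is only a sketch: it assumes the output has exactly three whitespace-separated columns (name, severity, class) as in the excerpt above, and it works on pasted sample text rather than calling dfm itself. The function name is hypothetical.

```python
# Sample text copied from the listing above (after the severity change).
SAMPLE_LISTING = """\
user-files-quota-full Error userquota.files
volume-full Error df.kbytes
volume-quota-overcommitted Error volume.overcommit
volume-space-reserve-depleted Normal volume.reserve-depleted
"""


def events_by_severity(listing: str) -> dict:
    """Group event names by severity; lines that are not three columns are skipped."""
    grouped = {}
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # assumption: headers, "..." and blank lines do not have 3 columns
        name, severity, _event_class = parts
        grouped.setdefault(severity, []).append(name)
    return grouped


grouped = events_by_severity(SAMPLE_LISTING)
```

Real output may contain headers or extra columns, so the column check would need adjusting against an actual listing.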
Great suggestion. I will try it immediately and remove my email filter.
Reinoud
I forgot another important one:
support for GX
It is no problem if this comes in several steps. It can't be very hard to implement aggregate and volume monitoring. All the other things can come in a second or a third step.
I want a Snapmanager Manager GUI. The current NOM basically just lists the jobs...
This may be too much to handle via HTML... But I still would like this feature someplace (NetApp Management Console maybe)
I would like to see a pretty GUI to help with the management of scheduling snapmirror jobs across the enterprise... When you have 100s of snapmirror jobs across many filers, jobs with multiple destinations, and jobs with multiple hops, keeping track of all the data, the traffic, schedules, etc. becomes a chore...
I'd like to see a simple "Explorer" style GUI. The left side shows all the source filers' volumes in an Explorer-style tree view, and the right side shows all the destination filers and target volumes.
This will allow for many snapmirror operations to be "drag and drop".... Operations like the following would then be possible
1. Drag a volume from the source filer to destination filer(s) (this would create the dst. volume, set up the snapmirror, set up the schedule, set up the destination exports and shares, etc...)
2. Right click on one or more destination volumes to define attributes like schedules, traffic rates, etc...
3. Right click on the dst volume to get reports/graphs like snapmirror history, traffic history, etc.....
4. etc, etc, etc. Lots of possibilities...
Also in this window you could show current snapmirror jobs with a line going from the src to dst filer/volume, traffic rates, total traffic, etc...
I'd also like a way to see a timeline view that shows all the snapmirror jobs on a timeline, which would better help in scheduling the jobs. A way to drag jobs across the timeline and stagger snapmirror syncs. Also, a way to see the bandwidth used across one or more filers is needed to manage bandwidth usage.
BTW: I've seen Protection Manager.... no thanks... nice attempt, but wrong direction... The idea is to make the job easier.
dp
THINK one GUI not 10...
I think the one single large impacting item we could ask for is bringing back all the applications to one GUI, not 2 or 4 +...
I would like to see DFM\OM host it all via HTTP, or convert all the protection products to web and add DFM to them if it is easier. The whole point is to have one single site you can go to, to manage your entire environment, preferably over the web.
Bryan Bell
Just so everyone is aware, the DFM\OM Product Manager is watching this Forum String, so put your ideas in!!!
Thanks,
Bryan Bell
Hi Bryan,
Last week we had a user group meeting with OM as one of the topics. Here are some extra suggestions from this group (the suggestions that I had already mentioned are not repeated: Re: What features do you need added to Operations Manager? and Re: What features do you need added to Operations Manager?)
It's a little bit confusing that you have two interfaces: the full client interface for performance monitor, Protection Manager and Provisioning Manager, and the web interface for all the general stuff.
speed/performance of the tool can always be better
a good online tutorial
for protection manager: the names it uses (e.g. for its volumes) are probably very logical, but a normal person can't understand them. Why not use names that everybody understands, so that you can very easily see what they are without OM/PM? A lot of users create the relationships by hand and then import those relationships into PM.
for protection manager: there are always resources that you normally don't want to protect. It must be very easy to say that it's OK that there is no protection for those resources.
I agree... with all of your suggestions! Most importantly, having multiple separate interfaces for each portion of the application is a sure step in the wrong direction.
Bryan Bell
Parag,
Here is another one emailed to me, about Onaro and DFM\OM...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Bryan,
Just discovered this thread in NetApp communities and I'm
not sure of your position in working with the OM product group but is there a
plan to integrate Onaro into OM? Currently, there is limited SAN topology
monitoring and alerts from end to end especially performance measurements.
Thanks
-dave
Dave Rubright drubright@partner.aligntech.com
Reinoud wrote:
"for protection manager: there are always resources where it's normal that you don't want to protect them. I must be very easy to say that's ok that there is no protection for those resources."
You can do this fairly easily in Protection Manager. ProtMgr's definition of 'unprotected' is 'not in a dataset with a protection policy assigned'. There is a policy named 'No protection' which is included for this purpose.
Simply create a dataset for the data you don't want to protect, then assign 'No protection' as the protection policy. This tells Protection Manager you know the data is there and have made the decision not to protect it.
I hope this helps.
Phil Bachman
Hi Philip,
You are completely right. We have indeed solved it that way.
TX
Reinoud
And what about Protection Manager makes the job harder?
Shelf firmware - can it report FW info? recommend where we should be?
It would be very good to have much better CIFS statistics -- items such as top CIFS users by ops, MB, etc.
Right now I don't think there's any good historical way to see this with any tool (as people see the Sun Amber Road UI and how its demo drills into CIFS users, this question comes up very quickly).
It would help if the reports could give a monthly average of total ops per second and latency.
Our monthly reports have total space and LUN latency. It would be nice if I could show the trend.
Thanks
Give me ONTAP upgrade functionality. I need a way to roll out the ONTAP package. DFM is a web server, so it should be obvious to roll out the ONTAP package via OpsMgr and the "software install" command on each controller.
It's only about distributing the package in the first place. Download and reboot should be performed individually and not centrally by OpsMgr (for now).
will add more topics as ideas fly by ;-)
regards, Niels
OM reports should be able to provide a field for Volume/Qtree security information.
Identifying items such as volume security style (Unix/NTFS) and NTFS ACLs would be very helpful.
I believe this information can be easily gathered with the fsecurity command but not available through OM.
Regards,
Robinson.
Hi,
Volume and qtree security styles will be available in custom reports and database views in the next release of Operations Manager.
Thanks,
Raj.
Here are a couple of minor items in Prov Mgr:
1. in the NMC gui, I'd like to have a 'refresh' button. Sometimes when I delete provisioned storage the gui doesn't refresh to show the storage gone. Usually adding storage works just fine but it seems like deletes don't.
2. In Provisioning Policies I'd like to see the ability to set a default qtree size. I can set the quotas but not the default size of the qtree to be created. We are looking at using this for a helpdesk so they can make user home dirs. The quota is 100MB but the default size for the qtree keeps coming up as 20MB every time they provision, so they need to change it manually. Would like to make that part of the policy template.
3. A post-delete script for provisioning. We can have a post-script for provisioning storage, but when they go to delete storage there's no ability to run a script.
Thanks!
Andy,
What kind of actions do you foresee as part of the post-delete script?
Thanks!
Would like to see latency graphs in reports. I can get the graphs in Perf Advisor and I can get the rolled up averages in regular reports, but I can't get a graph over time of latencies.
I'd like to put all of my luns for a particular database (or app) into a dataset (or group) and then just run the report against that group so I can have a nice graph of latency during the past week (or day or whatever time frame) for that app. Maybe a line for each lun and a 'average' line.
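The 'average' line described above is just a per-sample mean across the LUNs in the group. A minimal sketch, assuming the latency samples have already been exported and aligned to the same timestamps; the LUN names and values are made up:

```python
from statistics import mean


def average_line(series_by_lun: dict) -> list:
    """Return the per-sample mean latency across all LUNs in a group.

    Assumes every LUN has the same number of samples taken at the same times.
    """
    lengths = {len(samples) for samples in series_by_lun.values()}
    if len(lengths) != 1:
        raise ValueError("all LUN series must have the same number of samples")
    return [mean(values) for values in zip(*series_by_lun.values())]


# Hypothetical latency samples in milliseconds for one application's LUNs.
latency_ms = {
    "lun0": [2.0, 4.0, 6.0],
    "lun1": [4.0, 6.0, 8.0],
}
avg = average_line(latency_ms)
```

A plotting tool could then draw one line per LUN plus `avg` as the extra line.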
I would like to see:
1. Tighter integration or even a merge of SANscreen with Operations Manager. They both do pretty impressive stuff independently, but together could be a lot more effective at providing an end-to-end monitoring and management platform.
2. Host filesystem utilization monitoring and stronger correlation with all the "storage side" utilisation information we report on. FSRM seems to go some of the way there but what I am looking for is a way to help customers use Thin Provisioning to maximise efficiency. Some are wary of Thin Provisioning because they can't get a good view of how much capacity their applications/files are really using and are afraid they will "get a run on the bank", run out of storage without being prepared and have applications go down.
Thanks,
Aaron
I agree a tighter integration of the two products is needed. I don't think they need to be merged, as we often sell SANscreen into non-NetApp accounts, and Operations Manager is a pure NetApp tool.
What I would like to see is for Operations Manager and SANscreen to share a common GUI design - probably based on NWF (NetApp Web Framework) which is being developed for Ops Manager. I also think that Performance Advisor, and Protection & Provisioning Managers should be ported to this web framework.
Once this is done, you have a 'suite' which looks and feels like a suite, as well as keeping the option to install the components individually.
Under the covers, I think we need to decide on a single database to store the information - OpsMgr uses Sybase ASA and SANscreen uses MySQL - and I think we need to take the data warehouse that the SANscreen team built on Cognos and adapt it to handle Operations Manager and Performance Advisor data as well.
This gives the customer what they've asked for - the ability to build compelling reports based on their data.
Phil
I would like to see the data warehouse functionality that the SANscreen team built adapted to work with Operations Manager and Performance Advisor.
This would give the user the ability to produce compelling reports of their own design using the data the DFM server collects.
Phil
The data warehousing feature is available in DFM 3.7 onwards.
"Access to DFM database and Performance Advisor data"
This feature would allow database access to information provided by the custom report catalogs, history data collected by DataFabric Manager server. This feature would also allow us to export the counter information collected by the Performance Advisor and the data exposed by custom report catalogs to CSV formatted files.
There is a TR on the same. Below is the link
Small correction for the above post. The "Data Warehousing" feature is NOT available in Operations Manager 3.7; what is available is the "Data Export" feature, with which the user can export the Operations Manager server and Performance Advisor data. The user can also connect reporting tools like Crystal Reports to the Operations Manager database using JDBC/ODBC connections and create custom reports. The details of this feature are available in the TR mentioned in the above thread.
Thanks
Ravi
Ravi,
Thanks for clarifying this for the community. My point in making this post is that I believe we need to take the next step and provide the data warehouse. Providing the ability to export the data is a great step forward. The next step, in my view, involves offering the data warehouse to allow granular reporting on the data. Our SANscreen team has shown this can be done, and I would hope we can leverage their work to provide this solution for Operations Manager and Performance Advisor data.
Phil
Thanks to Andrew Miller for pointing this thread out to me.
I've been playing with Operations Manager quite a bit this week, sorting out some problems and reports for a customer. I've come across a few limitations and areas for improvement; just a couple of bullet points.
Within the chargeback setup I'd love to see the ability to set multiple price tiers. This would allow for accurate and complete chargeback reports to be generated.
e.g. FC disk might cost $1/GB, while SATA is $.50/GB, or 15K FC vs 10K FC and so on.
The chargeback rate can be specified at a resource group level. So if you put all the volumes on FC disks in one group and all the volumes on SATA disks in another (for example) you should be able to achieve what you need. A CLI-based workflow for what you need to do is presented below:
[root@trinity ~]# dfm group create fcvols
Created 1 group.
[root@trinity ~]# dfm group create satavols
Created 1 group.
[root@trinity ~]# dfm group set fcvols chargebackRate=1
Changed annual charge rate (per GB) for group fcvols (2048) to 1.
[root@trinity ~]# dfm group set satavols chargebackRate=.5
Changed annual charge rate (per GB) for group satavols (2049) to 0.5.
-- Move all filers/aggregates/volumes to the respective group using 'dfm group add <group-name> <object-name>' --
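To see what the tiered rates produce, the arithmetic can be checked outside the tool. A minimal sketch using the two per-GB annual rates set in the transcript above; the usage figures and the function name are hypothetical, and how Operations Manager itself measures the chargeable usage is not covered here:

```python
# Annual rate per GB for each group, as set with 'dfm group set' above.
RATES = {"fcvols": 1.0, "satavols": 0.5}


def annual_chargeback(usage_gb_by_group: dict, rates: dict) -> dict:
    """Multiply each group's usage in GB by that group's annual per-GB rate."""
    return {group: gb * rates[group] for group, gb in usage_gb_by_group.items()}


# Hypothetical usage per group in GB.
usage = {"fcvols": 2000.0, "satavols": 10000.0}
charges = annual_chargeback(usage, RATES)
total = sum(charges.values())
```

With these made-up numbers the FC tier is charged 2000 and the SATA tier 5000 per year.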
What about A-SIS?
Right now it is strictly command line. Not even in filer view (yet?). Besides, I don't think O/M reports space on deduped volumes correctly (or aggregates, etc.). I kept getting warnings after upgrading to 3.7
My 2 cents.
Bill
Your wish is our command.
Operations Manager 3.8, scheduled for release very soon, is deduplication aware and includes deduplication reporting and monitoring. Protection and Provisioning Managers include provisioning support for dedupe.
Phil
Philip Bachman
Hello,
We would like to trend dedupe numbers to see if we are getting better or worse over time for planning purposes. It would be good to correlate changes in dedupe percentages back to a new OS or some other event. After a quick look at 3.8 it seems to be point in time only.
We would also like to see reporting and trending on PAM card usage.
Email reports, as far as I can tell, are zipped and not very user friendly. It would be better to be able to publish these to an external web server or somewhere anyone can easily view them with the click of a link.
I do need to spend more time with the app so I apologize if I have missed something and what I mentioned is already available.
Thanks!
Andrew
+1 for this....would be very helpful to see historical trending on deduplication savings both per volume and aggregate/system.
Hi Andrew,
Operations Manager 3.8 includes a historical graph for deduplication space savings at the volume and storage system level.
I have attached a graph of "Volume Capacity Used vs Total" which provides dedupe space savings also.
If you would like to graph it in any other tool or reporting engine, you can use the "dfHistoryDayView" or the views for different periods.
# dfm database query run "select * from dfHistoryDayView"
"dfId","timestamp","sampleCount","dfKBytesUsedSum","dfKBytesDedupeSpaceSavingsSum","dfKBytesTotal"
...
Thanks,
Raj.
Hello,
Thanks for the response. I would like to see the value expressed as a percent so I can monitor the percentage over time. Is this available today? We want to see how our usage, new OSes, etc. are affecting deduplication numbers. I would also like to be able to see all volumes at once on one graph (this goes back to another issue with respect to more flexible and customizable built-in reporting).
Hi Andrew,
I would like to see the value expressed as a percent so I can monitor the percentage over time. Is this available today? We want to see how our usage, new OSes, etc. are affecting deduplication numbers.
[Raj] This is not available in 3.8.
I would also like to be able to see all volumes at once on one graph (this goes back to another issue with respect to more flexible and customizable built-in reporting).
[Raj] This is available in 3.8 in the "Volume Capacity Graph" report. The report shows the "Volume Used vs Total" graph I mentioned in the earlier post for every volume.
Thanks a lot for the other suggestions.
Raj.
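Until a percentage is available in the product, it could be derived outside it from the dfHistoryDayView columns shown earlier. The sketch below makes two assumptions that are not confirmed in this thread: that dfKBytesUsedSum is the space used after deduplication, and that the savings percentage is savings / (used + savings). The sample rows and the timestamp format are invented for illustration.

```python
import csv
import io

# Header copied from the query output above; the data rows are hypothetical.
SAMPLE_CSV = '''"dfId","timestamp","sampleCount","dfKBytesUsedSum","dfKBytesDedupeSpaceSavingsSum","dfKBytesTotal"
"101","2009-06-01 00:00:00","4","3000","1000","10000"
"101","2009-06-02 00:00:00","4","3000","3000","10000"
'''


def dedupe_percent(row: dict) -> float:
    """Savings as a percentage of (used + savings); both are sums over the same
    samples, so the sample count cancels out of the ratio."""
    used = float(row["dfKBytesUsedSum"])
    saved = float(row["dfKBytesDedupeSpaceSavingsSum"])
    logical = used + saved  # assumption: 'used' excludes the deduplicated blocks
    return 0.0 if logical == 0 else 100.0 * saved / logical


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
trend = [(row["timestamp"], dedupe_percent(row)) for row in rows]
```

Plotting `trend` per dfId over time would give the percentage trend asked about above.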
Just a side note -- dedup is in NetApp Systems Manager (can enable/disable as well as view some basic statistics).
Two things:
Ops Mgr needs to manage system-level flexvol counts. In our customer's SAN environment they are constantly battling staying within flexvol count limits to keep takeover/giveback times within host-timeout values. I need Ops Mgr to do the following:
- Report on the number of flexvols on a system. Distinguish between online and offline.
- Set a user-defined threshold at the group level for the number of flexvols on a system
- Alert on the defined threshold with a severity of "critical"
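As a stopgap, the three requests above could be approximated by a script run against an exported volume inventory. This is only a sketch: the input format, the function names and the choice to count online and offline flexvols together against the threshold are all assumptions.

```python
from collections import Counter


def flexvol_counts(volumes: list) -> dict:
    """Count flexvols per system, split by state.

    volumes: list of (system_name, state) tuples, state being 'online' or 'offline'.
    """
    counts = {}
    for system, state in volumes:
        counts.setdefault(system, Counter())[state] += 1
    return counts


def over_threshold(counts: dict, threshold: int) -> list:
    """Return systems whose total flexvol count (online + offline) exceeds the threshold."""
    return sorted(s for s, c in counts.items() if sum(c.values()) > threshold)


# Hypothetical inventory.
inventory = [
    ("filer1", "online"), ("filer1", "online"), ("filer1", "offline"),
    ("filer2", "online"),
]
counts = flexvol_counts(inventory)
critical = over_threshold(counts, threshold=2)
```

If only online volumes matter for takeover/giveback times, the comparison would use `c["online"]` instead of the total.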
Also, need to have Ops Mgr alert on an onboard FC port failure. Today's model using SNMP traps sent from the filer cannot be used in our customer's environment. We need a canned event built into the product.
Hi,
Maintenance mode for filers in DFM would be nice! When I am doing something with a filer that's planned I'd like to be able to tell DFM to not alert while that filer is down or whatever.
I would like to see tape drive performance statistics.
A feature that was present way back around v3.4 was that when you selected a filer, it showed several charts on one page that gave a good, quick view of overall system performance at a given point in time, not just a few like the current versions do. I would like to see that again.
Hi Bill,
Even in 3.8, you can go to Appliance Details for any system, and choose "All Graphs" from the "Graphs" drop-down menu.
This shows all the graphs available for that system. Is that what you are referring to?
Thanks,
Raj.
Dear Santa,
For my new Operations Manager I would like to see the following features (listed in no particular order of importance):
Too much?
Hi Richard,
That sounds about right ;-)
To this I want to add the following:
A trap that fires when a volume has hit its maximum size. This relates to the "vol autosize -m" option.
We had a situation once where our volumes could not autogrow anymore and the LUNs ran out of space = went offline.
Cheers,
Eric
Hi Richard,
Configuration Management
This is available in 3.8 (in bug 332666).
Thanks,
Raj.
Richard,
For Reporting
Ability to show Dropped vscan errors from /etc/messages for each filer/all filers
I used Perl to search for the keyword "vscan.dropped.connection" on any filer
Reporting (general)
Interesting... I have seen other customers who have built scripted log monitors that pull their messages and syslog files to a central location (the DFM server in this instance) and then use parsing to highlight items of interest. Your vscan.dropped.connection would be one example of the messages they parse for and alert on.
I was also asked about this precise functionality not 30 minutes ago...
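A scripted log monitor of the kind described above can be quite small. The sketch below assumes the /etc/messages content of each filer has already been collected centrally and loaded as lines of text; the sample log lines are invented, and only the keyword comes from this thread.

```python
KEYWORD = "vscan.dropped.connection"


def count_dropped(logs_by_filer: dict) -> dict:
    """Count the lines mentioning the vscan keyword for each filer."""
    return {
        filer: sum(1 for line in lines if KEYWORD in line)
        for filer, lines in logs_by_filer.items()
    }


# Hypothetical log lines; real /etc/messages entries will look different.
logs = {
    "filer1": [
        "Mon Nov  2 10:00:01 [vscan.dropped.connection:warning]: example text",
        "Mon Nov  2 10:05:00 [kern.uptime.filer:info]: example text",
        "Mon Nov  2 11:00:01 [vscan.dropped.connection:warning]: example text",
    ],
    "filer2": [
        "Mon Nov  2 10:05:00 [kern.uptime.filer:info]: example text",
    ],
}
dropped = count_dropped(logs)
```

The per-filer counts could feed a report for one filer or be summed for all filers.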
Cheers
Rich
Hi,
Due to the fact that we use snapshotting, deduplication, snapmirror and ndmp dump to tape, I'm missing a global overview of how long / when those operations are running and what impact they have on our filers.
The best practice is that those operations should not overlap, and a global overview per LUN / volume / aggregate / filer would be nice.
Greetings,
Kris Boeckx
Upload buttons to send performance (maybe perfstat?) and ASUP data to NetApp so it can be used for Support and account management stuff.
It would be great to see functionality developed that transferred DFM/OM settings (i.e., dfm eventtype, custom reports, etc.) between multiple instances. Large environments are difficult to standardize as there is no simple method to export/import OM-only objects (Performance Advisor in the NMC already supports this with thresholds and threshold templates).
Thanks,
sean.
I would like the ability to script custom events that self-heal, like the built-in ones do.
For example when an aggregate-full alarm is issued, the aggregate-space-normal event will cause the original to be removed and the status set to green.
Right now I am writing my scripts to re-test all original events and manually delete them if the situation no longer persists, but this is really cumbersome.
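The behaviour being asked for is a pairing between a "raise" event and the event that clears it. A minimal sketch of that bookkeeping, using the aggregate-full / aggregate-space-normal pair from the example above; the data structure and function name are hypothetical and say nothing about how DFM implements it internally:

```python
# Maps a clearing event to the event it resolves. Custom pairs would be added here.
CLEARS = {"aggregate-space-normal": "aggregate-full"}


def apply_event(open_events: set, source: str, event_name: str) -> set:
    """Track open events per source; a clearing event removes the one it resolves."""
    resolved = CLEARS.get(event_name)
    if resolved is not None:
        open_events.discard((source, resolved))  # self-heal: drop the original event
    else:
        open_events.add((source, event_name))
    return open_events


open_events = set()
apply_event(open_events, "aggr1", "aggregate-full")
after_raise = set(open_events)
apply_event(open_events, "aggr1", "aggregate-space-normal")
after_clear = set(open_events)
```

A status of green then simply means the source has no entries left in the open set.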
When implementing some monitoring I like my customers to keep a high level track of aggregate and volume/LUN performance - to be aware of spindle limits on an aggregate level combined with latency and CPU stats. Great job in being able to create combined stats alerting, this works wonders!
Something I would like to do but have not found an ideal way to do is to create a threshold for aggregate performance and when it triggers export an email report of the top 10 LUN/Volume IOPS/Latency creating that specific load. It would be nice to just add this into the NMC GUI to set up. Or really any way to easily find out which objects played a role in an aggregate peak performance at some point in time.
On a similar note (maybe we are blind), there does not seem to be a way to view perf data in the NMC with a custom zoom to a specific period; there are only buttons for 1-6 hours or max. I would like to be able to zoom to, say, 11-14 Nov 2009 and view that period.
Also when you create custom views and want to change them we can only remove charts instead of edit them which is kind of cumbersome.
For operations manager itself I always find myself creating custom views for LUN/snapreserve/fractional reserve space overviews, in line with the latest best practices on thin provisioning it would make sense to have some detailed standard dashboards to more easily find out about the myriad levels of space management involved.
Hi,
You'll see the Custom Zoom feature in the next release of PerfAdvisor.
Interesting points on the other requests
Cheers
Rich
Performance Advisor has some more features added in the upcoming release. You can try them out by participating in the OM 4.0 beta program - http://communities.netapp.com/docs/DOC-4597
Not sure whether this thread is still active or not.
But as I already stated earlier in some posts, the biggest pain point is still the snapshot naming.
I would like to have:
- the choice between a "generic"-naming (_recent) and "unique" (timestamp)
- snapshot names without any blanks/spaces, because they are a nightmare for every scripter
- no ":" in the timestamp, because this seems to cause problems for Windows previous versions (https://now.netapp.com/eservice/case/caseView.jsp?caseId=200121238)
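The three naming wishes above can be illustrated with a short sketch. The function name, the prefix argument and the exact timestamp layout are hypothetical; only the "_recent" suffix and the no-spaces, no-colons requirements come from the list.

```python
from datetime import datetime
from typing import Optional


def snapshot_name(prefix: str, when: Optional[datetime] = None, unique: bool = True) -> str:
    """Build a snapshot name containing no spaces and no ':' characters.

    unique=False gives the generic "<prefix>_recent" form;
    unique=True appends a timestamp such as 20091114_130509.
    """
    safe_prefix = prefix.replace(" ", "_").replace(":", "-")
    if not unique:
        return f"{safe_prefix}_recent"
    when = when or datetime.now()
    return f"{safe_prefix}_{when.strftime('%Y%m%d_%H%M%S')}"


generic = snapshot_name("sql snap", unique=False)
stamped = snapshot_name("sql snap", datetime(2009, 11, 14, 13, 5, 9))
```

Names built this way sort chronologically and need no quoting in scripts.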
Next I am still unhappy with the concept of choosing an "archive management group" and thus be obligated to create jobs for every retention I want.
For those interested, the posts:
http://communities.netapp.com/message/12645#12645
https://forums.netapp.com/message/9426#9426
Hi All,
I have connected to the DFM database using Crystal Reports; however, I can only run queries on the DFMGROUP DB and not the DFM DB, which seems to have the data I require for my reports.
Is this a permission issue, or does DFM by design restrict access to the DFMGROUP DB?
thanks
Hermon
By design, DFM restricts access to the DFMGROUP DB only.
You can view the data exposed by Database Views.