CURRENT_MEETING_REPORT_

Reported by Claudio Topolcic/CNRI and Bernhard Stockman/NORDUnet

Minutes of the Operational Statistics Working Group (OPSTAT)

Monday's Session

The purposes of this meeting were:

  1. Review the current status of the OPSTAT activities

      o Bernhard's papers
      o Other related efforts, specifically, Susan Estrada's BOF

  2. Decide what can be progressed now and progress it

      o Model
      o Set of metrics (simple SNMP only)
      o Display formats
      o Simple collection, storage, and exchange

  3. Define what is still left to do

      o MIB for new SNMP variables
      o Exchange protocol
      o More sophisticated storage formats
      o Develop publicly available collection tools
      o Display formats for weekly and instantaneous reports

  4. Take specific actions in this meeting

      o Decide polling period
      o Agree on what to progress
      o Edit Bernhard's papers, review on Thursday, submit as
        Internet Draft

The model was presented for people who were new to the group. A
fundamental part of this model is the agreement on a common minimal set
of metrics that will be collected. It was noted that some of these may
be difficult to obtain.

It had been proposed that three report formats would be produced: a
monthly report, a weekly report, and an instantaneous display. A format
for the monthly report had been agreed to. It was described as a
``McDonald's'' report because it would contain only total aggregates.
It was felt that this report would support management activities,
whereas the weekly report would support engineering planning, and the
instantaneous display would support problem resolution. However, it was
realized that the real distinction was not the time frame but the
degree of aggregation of the data. The data in the management reports
would be more aggregated than that in the engineering reports,
regardless of the time they covered.

Bernhard's documents described the data that would be collected from
each router, both for each of the router's interfaces and for the
router itself. These are all MIB variables. It was at first assumed
that the per-interface variables were specific to IP, but it was
pointed out that the loading data needs to be total, not IP specific,
or the link loading could not be determined. It was also pointed out
that the MIB interface variables are multi-protocol anyway, so there is
no problem. However, it was also pointed out that if the router
variables are IP only, then they do not give a measure of the router's
loading.

It was noted that the loading information that is important is not
related to any interface, but to the links. Links are occasionally
rehomed when interfaces fail. Currently, the data is processed by hand
to compensate for such rehoming. The documents do not make this
distinction and need to be clarified.

Dropping the ``storage requirements'' section of Bernhard's document
was considered, but it was decided to keep it, since dropping it would
give the false impression that the group had not thought about the
problem.

It had been proposed that the client-server model not be covered in the
current documents. The reason, in part, was that the original purpose
of the Working Group was to get the various network operators to
produce consistent reports that could be compared, not to exchange
information, and that exchanging information is not required very
often.

The data storage format was discussed. The format affects what will be
stored and what can be done with it. To reduce storage requirements,
several people proposed that raw data could be kept for some period of
time, then aggregated somewhat and kept for some other period of time,
and then further aggregated. The proposals differed in the time periods
and the form of aggregation.

However, it was pointed out that engineering requirements tend to be
common, so common non-aggregated data will be useful, whereas
management requirements tend to differ, so common aggregated data is
not useful. In the end, it was realized that how much data is retained,
and for how long, are local decisions that cannot be standardized.

The data format should support the process that the data will undergo.
The process was identified as:

  1. Collect status data about routers and interfaces.

  2. Collect ``resource'' data, for example, about the mapping of links
     to interfaces.

  3. Process the data to merge the results of steps 1 and 2, decreasing
     the quantity of data but without loss of information.

  4. Produce reports from the above reduced data.

It was understood that the processing in step 3 would not lead to
sufficient reduction in quantity to address long term data storage
problems. However, it was felt that this processing should not be
combined with the report generation.

Bernhard proposed a raw data format, which was discussed. He will
incorporate suggestions into his document.

It was suggested that the monthly reports be based on a matrix that
identified all the variables that would be collected and the processing
functions that could be applied to them. This would not only clearly
delimit the scope of the report generation process, but would also
allow new variables to be added easily. However, this approach would
not support functions that are based on multiple variables, and
although the matrix could be relatively full, any network operator
might select only a few possibilities, and worse, the different
operators might select different sets.

It was felt that the Working Group should recommend a specific polling
period. Two were on the table: 5 minutes and 15 minutes. Concern was
expressed that 5 minutes or less might result in excessive overhead or
be impossible to implement with a poller that polls one router at a
time.

For variables describing link loading, such as bytes transmitted, the
polling period is a function of the line speed. A one minute polling
period will miss the interesting peaks of a T1 line, but will show the
individual packets on a 1200 baud line. For variables not describing
link loading, such as packets dropped, the polling interval can
generally be very long until the value changes, at which time the
polling period should be shortened to help identify the problem. So it
may be that a 15 minute polling period is sufficient for anything other
than link utilization. This discussion was deferred until the next
meeting on Thursday.

Geoff Huston suggested a different approach. He proposed that the link
utilization parameter that is most closely correlated to the clients'
dissatisfaction is the mean standard deviation of inter-packet arrival
times of evenly spaced (when transmitted) TCP packets. He suggested
that this parameter explodes as soon as congestion appears.

Thursday's Session

During the second OPSTAT session the storage format and the polling
periods were discussed in more detail.

The Storage Format

It was suggested that the header section be placed within the log-file.
However, it might be useful to have both separate and in-band headers.
A need was expressed for multiple header sections within one log-file.
When closing and reopening the same log-file, close and start times
need to be specified. When changing the log-source, a new device needs
to be specified. Three delimiter pairs were suggested:

      BEGIN_TIME   - END_TIME
      BEGIN_DEVICE - END_DEVICE
      BEGIN_DATA   - END_DATA

There are currently two storage formats: the version presented by
Bernhard Stockman and an earlier version produced by Chris Myers. Chris
Myers volunteered to produce a second version of his storage format
strawman.

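As an illustration only, the following Python sketch shows one way a
log-file framed by the three suggested delimiter pairs could be split
into sections. The minutes name the delimiter pairs but do not define
their syntax, their nesting, or the contents of the header sections, so
the one-token-per-line layout, the non-nested sections, the function
name split_sections, and the sample values are all assumptions made for
this example.

```python
# Sketch under stated assumptions: each delimiter stands alone on a line,
# sections do not nest, and section bodies are treated as opaque text.
PAIRS = {
    "BEGIN_TIME": "END_TIME",
    "BEGIN_DEVICE": "END_DEVICE",
    "BEGIN_DATA": "END_DATA",
}
END_TOKENS = set(PAIRS.values())


def split_sections(lines):
    """Return a list of (section_name, body_lines) tuples in file order."""
    sections = []
    current = None  # the BEGIN_ token of the section being read, if any
    body = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line in PAIRS:
            if current is not None:
                raise ValueError(f"{line} inside unclosed {current}")
            current, body = line, []
        elif line in END_TOKENS:
            if current is None or PAIRS[current] != line:
                raise ValueError(f"unexpected {line}")
            sections.append((current[len("BEGIN_"):], body))
            current = None
        elif current is None:
            raise ValueError(f"line outside any section: {line!r}")
        else:
            body.append(line)
    if current is not None:
        raise ValueError(f"missing {PAIRS[current]}")
    return sections


# Hypothetical log-file: one device, then the file is closed and reopened,
# which produces a second TIME header within the same file.
sample = """\
BEGIN_DEVICE
router-a
END_DEVICE
BEGIN_TIME
start-time-1 close-time-1
END_TIME
BEGIN_DATA
record 1
record 2
END_DATA
BEGIN_TIME
start-time-2 close-time-2
END_TIME
BEGIN_DATA
record 3
END_DATA
"""

sections = split_sections(sample.splitlines())
for name, lines_in_section in sections:
    print(name, lines_in_section)
```

Keeping the sections flat and ordered lets a reader associate each DATA
section with the most recent DEVICE and TIME headers, which is one
possible way to support multiple header sections in a single log-file.
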
The generic log data format is:

      timestamp, tag, delta_sample_interval, data1, data2, data3, ...,
      dataN

where the tag defines the logged variables.

The Polling Period

The reason for the polling is to obtain statistics to serve as a base
for trend and capacity planning. From the operational data it shall be
possible to derive engineering and management data. A polling period of
15 minutes will not be sufficient to detect variations in peak
behavior. It was suggested that a period of at most 1 minute would be
needed.

Using such a tight polling period will create a need for aggregating
stored data. Aggregation here means that, over a period with logged
entries, a new aggregated entry is computed from the first and last of
the previously logged entries in that aggregation period.

A method of displaying both average and peak behaviors in the same
bar-diagram is to compute both the average value over some period and
the peak value during the same period. The average and peak values are
then displayed in the same bar. A problem here is how to aggregate peak
values. There is the possibility of creating a new peak value being the
peak of all the peaks, the average of all the peaks, etc.

Another reason for aggregation is that the needed polling period
differs depending on the reason for and source of the polling.

What is foreseen is that, over a relatively short period, polled data
will be logged at the tightest polling period (1 minute). These data
will regularly be pre-processed into the actual files being stored. The
pre-processing may include steps such as the computation of the percent
of samples above a certain limit, the average of all samples during the
aggregation period, and cumulative histograms. This pre-processing will
then not only serve as storage compacting but also provide some initial
statistical processing.

Recommendation on polling period:

      Basic polling period 1 minute (60 seconds).

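As a rough illustration of the generic log data format and of the
average and peak computation discussed above, the following Python
sketch parses records and aggregates 1 minute samples into 15 minute
entries. It is not taken from either storage format proposal. The
integer timestamps in seconds, the tag name "load", the use of only the
first data field, and the function names parse_record and aggregate are
assumptions made for this example.

```python
from collections import defaultdict


def parse_record(line):
    """Parse 'timestamp, tag, delta_sample_interval, data1, ..., dataN'."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) < 4:
        raise ValueError(f"record needs at least one data field: {line!r}")
    timestamp = int(fields[0])  # assumption: seconds from an arbitrary origin
    tag = fields[1]             # identifies the logged variables
    delta = int(fields[2])      # assumption: sample interval in seconds
    data = [float(x) for x in fields[3:]]
    return timestamp, tag, delta, data


def aggregate(records, period):
    """Return {(tag, period_start): (average, peak)} for the first data field."""
    buckets = defaultdict(list)
    for timestamp, tag, _delta, data in records:
        period_start = timestamp - timestamp % period
        buckets[(tag, period_start)].append(data[0])
    return {
        key: (sum(values) / len(values), max(values))
        for key, values in sorted(buckets.items())
    }


# Hypothetical 1 minute samples spanning two 15 minute aggregation periods.
log_lines = [
    "0, load, 60, 10",
    "60, load, 60, 20",
    "120, load, 60, 60",
    "900, load, 60, 5",
    "960, load, 60, 15",
]

records = [parse_record(line) for line in log_lines]
result = aggregate(records, period=15 * 60)
for (tag, start), (average, peak) in result.items():
    print(tag, start, average, peak)
```

The sketch takes the peak of an aggregation period to be the maximum
sample in it. When already aggregated entries are aggregated again, the
choice between the peak of the peaks and the average of the peaks
remains open, as noted above.
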
Recommendation on aggregation periods:

      Over a 24 hour period aggregate to 15 minutes.
      Over a 1 month period aggregate to 1 hour.
      Over a 1 year period aggregate to 1 day.

Aggregation is the computation of new average and maximum values for
the aggregation period based on the previous aggregation period data.

Recommendation for saving periods of logged and aggregated data:

      15 minute aggregation period saved 1 week.
      1 hour aggregation period saved 1 month.
      1 day aggregation period saved 1 year.

Finally, it was decided that, as the current document will not contain
the protocol specification of the client-server model, it will be
sufficient to put the coming RFC into the informational track.

Attendees

Vikas Aggarwal           aggarwal@jvnc.net
Miriam Amos Nihart       miriam@decwet.zso.dec.com
Jordan Becker            becker@nis.ans.net
Robert Blokzijl          K13@nikhef.nl
Steve Bostock            steveb@novell.com
Randy Butler             rbutler@ncsa.uiuc.edu
John Gong                jgong@us.oracle.com
Phillip Gross            pgross@nis.ans.net
Greg Hollingsworth       gregh@mailer.jhuapl.edu
Kathleen Huber           khuber@bbn.com
Geoff Huston             g.huston@aarnet.edu.au
Walter Lazear            lazear@gateway.mitre.org
April Marine             april@nisc.sri.com
Robert Morgan            morgan@jessica.stanford.edu
Dennis Morris            morrisd@imo-uvax.dca.mil
Chris Myers              chris@wugate.wustl.edu
Rebecca Nitzan           nitzan@es.net
Marsha Perrott           mlp+@andrew.cmu.edu
Ron Roberts              roberts@jessica.stanford.edu
Timothy Salo             tjs@msc.edu
Bernhard Stockman        boss@sunet.se
Joanie Thompson          joanie@nsipo.nasa.gov
Claudio Topolcic         topolcic@nri.reston.va.us
Andrew Veitch            aveitch@bbn.com
Wengyik Yeong            yeongw@psi.com
Osmund de Souza          osmund.desouza@att.com