Editor's note: These minutes have not been edited.

0. Agenda review/changes

The proposed agenda was accepted without changes.

1. Why two parallel CIP drafts?

Patrik explained that he and Roland shared the view that Chris Weider's draft didn't reflect the consensus the group reached at the LA meeting and also contained too much whois++-specific material. Therefore a second draft was produced by Jeff Allen and Patrik Faltstrom. The intended outcome is that the two drafts will be merged into one.

2. Charter of the FIND group

There was some discussion about which papers were going to be produced. The consensus was that there should be one document specifying the CIP, another specifying how to use centroids as one special case of indexes within the CIP, and, for each client-server protocol that is going to use the CIP, one paper describing the mapping between the data representations and one describing the access method.

3. LDAP/CIP work at Umea University

Roland Hedberg presented the work he has been doing to enable an X.500 DSA to work as an index server, and he also presented a WWW interface that can use this index server. The WWW interface can be reached at http://macavity.umdc.umu.se/~roland/query2.en.html and the index server it accesses contains all the information presently accessible in the Swedish branch of the X.500 DIT (~50,000 entries). For the time being the index only contains names of people. Roland will produce a draft describing the object class and attributes needed to accomplish this.

4. The new CIP draft

Jeff Allen presented the gist of the new draft. The discussion following the presentation led to a number of unresolved items:

- The use of MIME: should/can INDEX-CHANGED be structured as a MIME message?
- Aggregation a la CIDR: to facilitate query routing.
- Incremental updates: per application domain or general.
- Security: both regarding exporting indexes and data protection.
- Centroid scaling issues: certain datasets contain only unique items, which means that the resulting index is no smaller than the original dataset.
- Frontends to index servers might speak only one access protocol: clients speaking another access protocol cannot pass such a server while climbing the tree upwards or downwards, which means that parts of the mesh might be inaccessible to the client.

5. Workshop on Distributed Indexing and Searching

Erik Selberg presented some ideas, which came out of the workshop, on using query routing within the Web indexing sphere. It was felt that introducing query routing and distributed index servers is a necessary step in the development of the Web indexes, since the current centralized approach doesn't scale. More info on the workshop can be found at http://www.w3.org/pub/WWW/Search/9605-Indexing-Workshop/

It was agreed that follow-up work undertaken by the query routing contingent from the Distributed Indexing/Searching Workshop would be folded into the FIND working group.

6. The CIP and CCSO

Martin Hamilton presented his work on integrating CCSO nameservers with the CIP. His conclusion was that it was viable but that some items remain to be resolved. There is no standard URL format for a CIP referral to a CCSO nameserver. For the time being Martin proposed that one could use the gopher one (gopher://ccso.server.domain.name:105/2). Another question is whether the CCSO attribute names and types should be normalized to a common schema.

7. Scaling of the CIP

Patrik presented some graphs showing the relationship between the size of a centroid and the size of the actual dataset, both for information about people taken from a phonebook and for large document collections. The phonebook data revealed the not very astonishing fact that phone numbers are unique, which means that the centroid grew almost linearly with the dataset.
Removing phone numbers from the centroid gave much slower growth, which also appeared to be asymptotic. When indexing words out of documents, the curve didn't seem to level off as the dataset grew (max dataset size ~12,000,000 tokens). When a stop list was applied, weeding out very frequent and very unusual words, the curve became asymptotic, reaching 60,000 and levelling off at that value.
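
The growth pattern described in this item can be illustrated with a small sketch. It assumes that a centroid is simply the set of unique tokens seen for each attribute. The record layout, the attribute names (name, phone), the synthetic data and all function names are invented for illustration; none of them come from the drafts or from the measurements Patrik presented.

```python
from collections import Counter


def build_centroid(records, skip=()):
    """Collect the unique tokens seen for each attribute (assumed centroid model)."""
    centroid = {}
    for record in records:
        for attribute, value in record.items():
            if attribute in skip:
                continue
            centroid.setdefault(attribute, set()).update(value.lower().split())
    return centroid


def centroid_size(centroid):
    """Total number of unique tokens across all attributes."""
    return sum(len(tokens) for tokens in centroid.values())


def make_phonebook(n):
    """Synthetic records: name tokens come from a small pool, phone numbers never repeat."""
    return [
        {"name": f"given{i % 10} sur{i % 20}", "phone": f"555-{i:06d}"}
        for i in range(n)
    ]


def stop_listed_vocabulary(documents, min_docs, max_docs):
    """Keep tokens found in at least min_docs and at most max_docs documents."""
    doc_counts = Counter(
        token for text in documents for token in set(text.lower().split())
    )
    return {t for t, c in doc_counts.items() if min_docs <= c <= max_docs}


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        book = make_phonebook(n)
        with_phone = centroid_size(build_centroid(book))
        without_phone = centroid_size(build_centroid(book, skip=("phone",)))
        print(n, with_phone, without_phone)
```

With this synthetic data the centroid that includes phone numbers gains one entry per record, while the centroid without them stays at 30 tokens however many records are added. The stop_listed_vocabulary helper shows one possible form of stop list, based on how many documents a token occurs in; the thresholds are arbitrary and would need tuning against real data.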