Network Working Group                                         K. Kinnear
INTERNET DRAFT                             American Internet Corporation
                                                                 R. Cole
                                                                AT&T MNS
                                                                R. Droms
                                                     Bucknell University
                                                               July 1997
                                                    Expires January 1998


                   An Inter-server Protocol for DHCP
                  <draft-ietf-dhc-interserver-02.txt>

Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

   The DHCP protocol is designed to allow for multiple DHCP servers, so that reliability of DHCP service can be improved through the use of redundant servers. To provide redundant service, all of the DHCP servers must be configured with the same information about assigned IP addresses and parameters; i.e., all of the servers must be configured with the same bindings. Because DHCP servers may dynamically assign new addresses or configuration parameters, or extend the lease on an existing address assignment, the bindings on some servers may become out of date. The DHCP inter-server protocol provides an automatic mechanism for synchronization of the bindings stored on a set of cooperating DHCP servers.
   This draft is a direct extension of draft-ietf-dhc-interserver-00.txt, and represents the merging of ideas from both draft-ietf-dhc-interserver-alt-00.txt and draft-ietf-dhc-interserver-01.txt. The basic protocol semantics from draft-ietf-dhc-interserver-alt-00.txt were used with the underlying message mapping to SCSP from draft-ietf-dhc-interserver-01.txt. Considerable additional work has been included in this current draft in the area of protocol correctness, detailed work on mapping the protocol to SCSP, and organization of the draft itself.

Kinnear, Cole & Droms                                           [Page 1]
DRAFT                                                          July 1997

1. Introduction

   DHCP servers manage the assignment of IP addresses and configuration parameters to IP hosts. The DHCP protocol specification [1] refers to the collection of configuration information assigned to a client as a "binding". The DHCP protocol is designed to allow for multiple DHCP servers, so that reliability of DHCP service can be improved through the use of redundant servers. To provide redundant service, all of the DHCP servers must be configured with the same information about assigned IP addresses and parameters; i.e., all of the servers must be configured with the same bindings. Because DHCP servers may dynamically assign new addresses or configuration parameters, or extend the lease on an existing address assignment, the bindings on some servers may become out of date. The DHCP inter-server protocol provides an automatic mechanism for synchronization of the bindings stored on a set of cooperating DHCP servers.

   The remainder of this document is organized in the following sections:

   2. Goals and Requirements

      Defines the requirements and goals for the protocol. Discusses limitations of the protocol. Also contains a definition of several classes of failures as well as a list of specific failures (which provide a useful common ground for discussion).

   3.
   Overview

      Discusses in a general way the content of the information communicated between servers implementing this protocol as well as the way that information is communicated. Introduces the three aspects of the protocol: client binding management, address management, and group management.

      Defines some key concepts surrounding the allowable "states" of an IP address, including extensions critical to the operation of this protocol.

      Gives a brief sketch of the actions required by this protocol for each DHCP client request received by the server.

   4. Client Binding Management

      Discusses the fundamental messages used by this portion of the protocol, and the ways in which these messages are combined to form higher level operations. Required responses to incoming client binding management requests are explained in this section. The required responses to incoming DHCP client requests are explained in Section 6 below.

   5. Address Management

      The fundamental messages used by the address management portion of the protocol are explained, as well as how they are combined into higher level operations. The required responses to incoming address management requests are explained in this section, while the required responses to incoming DHCP client requests are explained in Section 6 below.

   6. Actions in Response to DHCP Client Messages and Events

      The required responses to incoming DHCP client messages and events are discussed in this section.

   7. Group Management

      The fundamental messages and their combination into higher level operations for the group management portion of the protocol are explained. The actions to take when receiving any of these messages as well as how to utilize them to join or leave a server group are explained.

   8. SCSP Message Mapping

      The messages described in sections 4, 5, and 7 are mapped into underlying SCSP messages in this section. This includes detailed information on the format of each SCSP message.
   9. IP Address State Transition

      This protocol expands the possible states for an IP address. The new states are described in Section 3.4. This section describes all of the transitions between states in detail.

   10. Security

      The security implications of this draft are discussed in this section.

   11. Open Questions

      Poses open questions about the protocol. Some questions from draft-ietf-dhc-interserver-00.txt are included verbatim with answers, and questions (with some answers) new to this draft are included as well.

   12. Acknowledgments

   13. References

   14. Author's Information

   A. Appendix A: An Overview of SCSP

1.1. The Language of Requirements

   Throughout this document, the words that are used to define the significance of particular requirements are capitalized. These words are:

   o "MUST"

      This word or the adjective "REQUIRED" means that the item is an absolute requirement of this specification.

   o "MUST NOT"

      This phrase means that the item is an absolute prohibition of this specification.

   o "SHOULD"

      This word or the adjective "RECOMMENDED" means that there may exist valid reasons in particular circumstances to ignore this item, but the full implications should be understood and the case carefully weighed before choosing a different course.

   o "SHOULD NOT"

      This phrase means that there may exist valid reasons in particular circumstances when the listed behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label.

   o "MAY"

      This word or the adjective "OPTIONAL" means that this item is truly optional. One vendor may choose to include the item because a particular marketplace requires it or because it enhances the product, for example; another vendor may omit the same item.

1.2.
Terminology

   This document uses the following terms:

   o "DHCP client"

      A DHCP client is an Internet host using DHCP to obtain configuration parameters such as a network address.

   o "client"

      Whenever the term client is used in this draft, it refers to a DHCP client (and not a server communicating with another server using this protocol).

   o "DHCP server"

      A DHCP server is an Internet host that returns configuration parameters to DHCP clients.

   o "binding"

      A binding is a collection of configuration parameters, including at least an IP address, associated with or "bound to" a DHCP client. Bindings are managed by DHCP servers.

   o "active server"

      An active server is one which is capable of offering IP addresses to clients.

   o "stable storage"

      Every DHCP server is assumed to have some form of what is called "stable storage". Stable storage is used to hold information concerning IP address bindings (among other things) so that this information is not lost in the event of a server failure which requires restart of the server.

2. Goals and Requirements

   There are several levels of goals for this protocol. There is a set of requirements with which it must comply, and there is a set of goals for the protocol and the way that it is used, listed in priority order.

2.1. Requirements on this Protocol

   The following list of requirements must be (and are) achieved by this protocol.

   1. Implementations of this protocol work with existing DHCP client implementations based on the DHCP protocol [1]. It must work with today's clients!

   2. Implementations work with existing BOOTP relay implementations.

   3. The protocol can be specified with sufficient clarity that independent implementations will work well together the first time (e.g., DHCP today largely meets this requirement).

   4. The protocol works well with a minimum of two and a maximum of 16 servers.

2.2. Goals of this Protocol

   The following are the goals of this protocol. These goals are listed in priority order.
   The protocol meets all of these goals.

   1. Avoid binding an IP address to a client while that binding is currently valid for another client. In other words, don't allocate the same IP address to two clients.

   2. Ensure that an existing client can keep its existing IP address binding if it can communicate with any DHCP server using this protocol -- not just the server that originally offered it the binding.

      DISCUSSION:

         There is a subtle but very important point here. For example, assume that there are five servers using this protocol. Everything is running fine, and then the network becomes partitioned, and three servers can communicate among themselves, and the other two can communicate among themselves -- but the set of three cannot communicate with the set of two. Each set, however, can communicate with some clients.

         In this situation, every client that can communicate with a DHCP server in either set should be able to continue to use its existing binding, even if the server that originally created the binding is not included in the set of servers with which it can communicate.

   3. Do not add any requirement for communication with another server to the processing between a DHCPDISCOVER and a DHCPOFFER or between a DHCPREQUEST and a DHCPACK.

      DISCUSSION:

         This is another subtle point. The implications of this goal are that "lazy" update of IP address binding information is required. In other words, because of this goal, the protocol cannot require one server to update another server with information concerning a new IP address binding prior to sending the DHCPACK to the DHCP client.

         As a result of this goal, a server may fail immediately after sending the DHCPACK to the client but prior to successfully sending a record of that information to any other server.
         Should this happen, the DHCP client is the only operational machine with a record of this binding -- and the protocol must be (and has been) designed to properly deal with this situation.

   4. Ensure that a new client can get an IP address from some server.

   5. If a server goes down, and an external agent determines that it is actually down as opposed to running but simply unable to communicate with other servers, then the addresses that it currently owns but that are not yet bound may be recovered for use by other servers.

   6. Ensure that in the face of partition, where servers continue to run but cannot communicate with each other, the above goals and requirements are met. In addition, when the partition condition is removed, allow graceful automatic re-integration without requiring human intervention.

2.3. Limitations of this Protocol

   The following are explicit limitations of this protocol. This is not to say that they are not useful capabilities to have (that's why they are explicitly listed, so that it will be clear that this protocol does not supply them).

   1. Determination of permanent server failure.

      The protocol provides a way to propagate information about the permanent failure of a server, but no way to detect a permanent failure. Transient failures are detected, but there is no mechanism in this protocol to determine when a transient failure is really a permanent failure. Some external agent must make this determination -- and must ensure that the server declared permanently failed is not simply partitioned from the other servers and unable to communicate with them.

      The server which has been declared permanently failed by the external agent MUST be informed of that declaration prior to restart.

      DISCUSSION:

         The existing configuration messages allow one server to declare another server as permanently failed and remove it from the group. That is not the issue.
         What makes fully automatic determination of permanent server failure impractical is distinguishing between permanent server failure (which is easily defined as transient server failure that has gone on too long) and partition of the group of servers.

         Once communication fails with a server, the other servers cannot know if it is still operating or not, and removing an operating server from the group is an activity fraught with peril.

         This protocol is designed so that a server which is partitioned from the group will re-integrate cleanly when it can communicate again with the rest of the group.

         Group membership protocols typically handle a partition situation (when they bother to handle it at all) by having the partitioned server determine that it has been partitioned and shut itself down. It detects a partition condition in one of two ways: either it can't communicate with the "master", or it can't communicate with the "majority" of the group. In either case, it shuts down.

         We believe that this is not an appropriate response for a DHCP server. If my DHCP client can talk to a DHCP server, I want my client to continue to operate -- I'm not interested in having the only DHCP server to which I can talk shut itself down!

   2. Some addresses are temporarily unavailable during transient server failure.

      The full range of existing IP addresses that are potentially available for allocation is reduced during the period of a transient server failure. The size of the pool of addresses that are available for allocation but not yet allocated SHOULD be configurable for each server. If the server is subsequently declared to have undergone a permanent failure, these addresses will be made available again.

      Note that it is only the addresses not yet allocated but available for allocation that are unusable during the period of a transient server failure.
      IP addresses that have been allocated to clients may continue to be used by those clients even during server failure. Indeed, allowing existing clients to renew their existing IP addresses even if the server that granted them the lease has failed is a primary reason why this protocol exists.

2.4. Failures

   This section defines classes of failures and lists specific failure scenarios in order to facilitate discussion of the capabilities of this protocol.

   o "transient server failure"

      A transient server failure is one where a server is unable to respond to requests, but later becomes operational and able to respond to requests. Its local stable storage (i.e., whatever mechanism it uses to preserve its binding information) is accurate as of the time that transient server failure began.

   o "permanent server failure"

      A permanent server failure is one where a server is unable to respond to requests -- probably for an extended period. While the protocol defined in this document supports declaration of a permanent server failure, the decision that a transient server failure is in reality a permanent server failure is beyond the scope of this protocol.

      This determination will likely be performed by some administrative entity, although in the future a group membership protocol could be integrated with the protocol defined in this document to make such determinations automatically.

   o "partition"

      A network partition is caused by a failure of the underlying communications substrate, such that two systems that could previously communicate cannot now do so. This may mimic transient server failure, but is not the same because in this case the server that appears to have failed may still be operational and interacting with clients.

      There is a form of partition known as "partial partition", where the transitivity of communication usually expected is not achieved.
      Imagine a set of servers organized (for the purposes of exposition only) as a ring in which each server can communicate with its neighbors but with no other server. When the number of servers is greater than three, a partial partition exists.

      This term may also be used as a noun, as in "each partition may communicate with ...", and in this case it refers to the group of servers which can communicate normally (as distinguished from those with which that group cannot communicate).

   o "communication failure"

      Communication failure describes the condition where communication between two servers becomes impossible. "Partial communication failure" describes the case where the normally bidirectional communications channel becomes unidirectional, so that one server can send to but not receive from another server.

   Some examples of the above failures are given below:

   1. A single server crashes and reboots. [transient failure]

   2. A single server crashes and stays down for a period of hours and then reboots (either automatically or through some external agent). [transient failure]

   3. A single server fails and never returns. No permanent failure is declared for this server. [transient failure]

   4. A single server fails. A permanent failure is declared for this server. [permanent failure]

   5. A group of two servers is partitioned so that they cannot communicate, but each can communicate with some clients. [partition]

   6. A group of five servers is partitioned so that three can communicate together and the remaining two can also communicate, but the two partitions cannot communicate. Each partition can communicate with a subset of the clients, and these subsets are disjoint. [partition]

   7. A group of five servers is partitioned so that three can communicate together and the remaining two can also communicate, but the two partitions cannot communicate.
      Each server continues to be able to communicate with all of the clients. [partition]

      DISCUSSION:

         This situation is unlikely to occur, but the protocol should be able to handle it.

   8. Server A can send packets to server B, but cannot receive packets from server B. [partial communication failure]

   9. There are four servers, A, B, C, and D. A cannot communicate with C, B cannot communicate with D. [partial partition]

   DISCUSSION:

      This section on failures may well not belong in the final document. For the purposes of review of the rest of the protocol, however, defining a common language to describe failures and giving specific examples of failures as an aid to discussion seemed useful.

3. Overview

   At the most basic level, the DHCP protocol specifies the behavior of DHCP servers which communicate with DHCP clients in order to allocate IP addresses to the clients and to provide a variety of configuration parameters to them.

   It is the allocation of IP addresses to clients by the server that creates a requirement to update what is known as "stable storage" -- typically held on disk. This information is used to "remember" the IP address bindings that have been made by the DHCP server in order to avoid allocating the same IP address to two clients.

   The key motivation for an inter-server protocol is the desire to allow a client to continue to use its IP address (i.e., be able to renew its lease on an IP address) even if the server that initially offered it the lease on its IP address is unavailable for some reason. In addition, no IP address should ever be bound to two clients simultaneously.

   Providing multiple DHCP servers with which each client can communicate is the first step in creating this reliable DHCP capability. In addition, these DHCP servers must communicate among themselves in order to provide this reliable DHCP capability.

3.1.
Information Communicated by the Protocol

   There are three types of information which must be communicated between servers implementing the inter-server protocol.

   o Client Binding Information

      This entire inter-server protocol exists in order to allow servers to share information about client bindings of IP addresses. Servers must be able to update other servers about client bindings that they have created, and must be able to receive similar updates from other servers about client bindings that the other servers have made or changed.

   o Address Management Information

      In order to implement an effective strategy for client binding information updates, this protocol defines some additional states for an IP address beyond those defined or implied by RFC 2131 [1] that are not directly connected with client binding information. The servers need to communicate among themselves concerning these states, and this communication is enabled by the address management information portion of the protocol.

   o Group Management Information

      While it is possible to conceive of a group of servers statically configured to be part of a server group, the operational characteristics of such an approach are far from pleasant. The group management portion of this protocol allows a server to determine the groups to which another server belongs; determine for each group the current membership in the group; determine for each group the subnets and IP addresses managed by that group; and join or leave a server group.

3.2. Server Groups

   Fundamental to this protocol is the "group" of servers which are communicating and with which the clients can communicate in order to provide a reliable DHCP service.

   Each server group (SG) to which a server belongs is associated with a particular set of address pools. These address pools are those which exist on a single network segment (sometimes called a single "wire").
   An active server can be (and typically would be) a member of several groups simultaneously.

   This protocol allows a server to join an existing SG. Which SGs a server would join is a configuration issue for a particular server, and outside of the realm of this protocol -- although considerable support is provided in order to make this a solvable problem.

   The membership of a particular SG will change over time, and in order to ensure that each server is made aware of any changes in group membership in a timely way, every protocol message which is sent in the inter-server protocol includes a group generation number (with a few exceptions).

   Whenever a message is received, the group management layer of the software MUST verify that the group generation number matches the current group generation number for that SG stored in the server. If there is a mismatch, the group management layer will discard the message. It will then attempt to update its knowledge of the current group (and incidentally bring its generation number up to date in the process).

   In this way, any changes in group membership spread throughout the group as fast as possible -- and no messages that are out of synchronization with the latest concept of group membership can be accepted.

   A server attempts to become a member of a particular group by using the configuration messages described in Section 7 below. In addition, a server can remove another server from the group using these messages -- but in this case an external agent must ensure that the server being removed is truly inactive and not just partitioned.

3.3. Messages and Operations Defined by the Protocol

   The protocol requires that servers that implement it can communicate, each with the other, in a point-to-point manner (when all are operating correctly).
   It allows for the possibility that they can fail entirely (i.e., crash) or be unable to communicate with each other for a variety of reasons.

   Each server will periodically need to communicate with other servers in the group. There are several recurring styles of communication that, if defined, will assist in explaining the major concepts of this protocol. These major styles of group communication are as follows:

   There are "messages", which for the purpose of this specification consist of a communication between two servers. Messages are gathered into higher level generic "operations", which describe the form of the operation, and are made up of messages communicated between more than one pair of servers. These generic operations are then instantiated into specific operations as part of the various portions of the protocol.

3.3.1. Generic Protocol Messages

   Messages are used to communicate between a pair of servers.

   o QUERY

      A QUERY message is sent when one server wishes to obtain knowledge about the server cache of another server.

   o UPDATE

      An UPDATE message is sent when one server wishes to update the information in the cache of another server.

3.3.2. Generic Protocol Operations

   These generic protocol operations are used when a server must communicate with more than one other server.

   o POLL

      A POLL operation is used when one server must contact every other server in the group using a QUERY message in order to request that they respond with some information (typically concerning an IP address). Usually, if the server executing the POLL cannot contact all of the other servers using the QUERY message, it will use whatever information it could glean from those it could contact.
   o COMPLETE POLL

      A COMPLETE POLL is like a POLL in that one server attempts to contact every other server using a QUERY message -- but in a COMPLETE POLL it must successfully complete a QUERY with each of them or the operation itself fails to complete.

   o PUSH

      A PUSH operation is used when one server wants to update all of the other servers using an UPDATE message. In a way similar to the POLL operation, a PUSH operation will succeed if the server employing it has managed to contact at least one other server in the group with a successful UPDATE.

   o COMPLETE PUSH

      A COMPLETE PUSH is analogous to a COMPLETE POLL -- the COMPLETE PUSH operation requires the server to attempt to UPDATE every other server in the group. If every server responds successfully to the UPDATE, the COMPLETE PUSH succeeds; otherwise the COMPLETE PUSH fails.

   Note that both PUSH and POLL address all of the servers in the group.

3.3.3. Specific Protocol Operations

   The generic forms of inter-server communication above are used in the following ways in the Client Binding Management and Address Management portions of the protocol.

   Client Binding Management:

   o CLIENT BINDING POLL (operation)

      This operation involves one server asking every other server using a QUERY for client binding information concerning a particular IP address. If not all of the other servers are operational, the requesting server will use any information it receives.

   o CLIENT BINDING COMPLETE PUSH (operation)

      This operation involves one server informing all of the other servers using an UPDATE about updated client binding information. While there is utility in reaching even one other server (in some cases), the operation is not deemed to have succeeded unless all of the other servers were successfully updated with the new information.
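   The success rules of the four generic operations can be sketched in a few lines of Python. This is only an illustration of the semantics described in Section 3.3.2, not a wire-level implementation: the function names and the `query` and `update` callbacks are hypothetical stand-ins for the QUERY and UPDATE messages, and a `None` reply or a `False` result is assumed to mean that the peer could not be reached or did not respond successfully.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical transport hooks (not defined by this draft):
#   query(peer, request)  -> reply string, or None if the peer did not answer
#   update(peer, record)  -> True if the peer acknowledged the UPDATE
Query = Callable[[str, str], Optional[str]]
Update = Callable[[str, str], bool]


def poll(peers: List[str], query: Query, request: str) -> Dict[str, str]:
    """POLL: QUERY every peer and keep whatever replies arrive."""
    replies = {}
    for peer in peers:
        reply = query(peer, request)
        if reply is not None:
            replies[peer] = reply
    return replies


def complete_poll(peers: List[str], query: Query,
                  request: str) -> Optional[Dict[str, str]]:
    """COMPLETE POLL: fails (returns None) unless every peer answered."""
    replies = poll(peers, query, request)
    if all(peer in replies for peer in peers):
        return replies
    return None


def push(peers: List[str], update: Update, record: str) -> bool:
    """PUSH: UPDATE every peer; succeeds if at least one accepted it."""
    results = [update(peer, record) for peer in peers]
    return any(results)


def complete_push(peers: List[str], update: Update, record: str) -> bool:
    """COMPLETE PUSH: UPDATE every peer; succeeds only if all accepted it."""
    results = [update(peer, record) for peer in peers]
    return all(results)
```

   In this sketch both PUSH variants attempt every peer before reporting a result, which mirrors the note that PUSH and POLL address all of the servers in the group; only the success criterion differs.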
   Address Management:

   o UNBINDABLE COMPLETE POLL (operation)

      In this operation, all of the other servers are contacted using a QUERY concerning one (or more) IP addresses, and they all report on whether each IP address is UNBINDABLE or not. This operation fails if any server fails to respond to the QUERY or if any server responds to the QUERY with a negative answer (i.e., the IP address is not currently UNBINDABLE). It succeeds only when all of the servers in the server group answer that the address is UNBINDABLE.

   o TRANSFER (message)

      This message is used to transfer BINDABLE IP addresses from one server to another (used when the SG is partitioned and the normal UNBINDABLE COMPLETE POLL cannot be used to make an IP address BINDABLE, but also when all of the UNBINDABLE IP addresses have already been made BINDABLE by some server).

      The information is sent from the initiating to the responding server as a QUERY and includes the subnet specification, the number of BINDABLE IP addresses the initiating server has available for that address pool, and the number of BINDABLE IP addresses it is requesting. The responding server is free to give the initiating server all, some, or none of the number of IP addresses the initiating server has requested.

3.4. IP Address State

   The concept of the state of an IP address is largely implicit in the DHCP specification [1]. However, in order to manage pools of IP addresses with multiple servers, the states and transitions between them must be made quite explicit.

3.4.1. IP Address State: Basic DHCP Protocol

   When an IP address is always controlled by a single DHCP server (implicit in the definition of DHCP in the DHCP specification [1]) the IP address is either in the BINDABLE state or the BOUND state.

   The following state diagram represents the states that an IP address may occupy based on the DHCP specification [1].
   (Note that these terms do not appear in [1], but are terms that describe concepts that are implicit in the RFC.)

      +-----------------+
      |                 |
      |    BINDABLE     |<--+
      |                 |   |
      +-----------------+   |
               |            |
               V            |
      +-----------------+   |
      |                 |   |
      |     BOUND       |---+
      |                 |
      +-----------------+

   Figure 3.4.1-1: Basic DHCP IP address state transition diagram

   When an IP address transitions from BINDABLE to BOUND, that transition must be recorded in the server's stable storage prior to the transition being "published" to any observer outside of the server.

3.4.2. IP Address State: Extensions to Support the Interserver Protocol

   The situation is more complex when multiple servers are managing the same set of IP addresses as required by this protocol. Four new states are defined for an IP address: UNBINDABLE, POLLING, PUSHED, and EXPIRED.

   This is the state diagram for IP address state required by this protocol:

          +-----------------+
          |                 |
          |   UNBINDABLE    |<--------+
          |                 |         |
          +-----------------+         |
                   |                  |
                   V                  |
          +-----------------+         |
          |                 |         |
          |     POLLING     |-------->|
          |                 |         |
          +-----------------+         |
                   |                  |
                   V                  |
          +-----------------+         |
          |                 |         |
          |    BINDABLE     |-------->|
          |                 |         |
          +-----------------+         |
                   |                  |
      -----------------------------   |
                   V                  |
          +-----------------+         |
          |                 |         |
      +-->|      BOUND      |-------->|
      |   |                 |         |
      |   +-----------------+         |
      |            |            |     |
      |            V            |     |
      |   +-----------------+   |     |
      |   |                 |   |     |
      |   |     PUSHED      |-->|     |
      |   |                 |   |     |
      |   +-----------------+   |     |
      |            |            |     |
      |            V            V     |
      |   +-----------------+         |
      |   |                 |         |
      +<--|     EXPIRED     |-------->+
          |                 |
          +-----------------+

   Figure 3.4.2-1: Extended DHCP IP address state transition diagram required for the Inter-server protocol.

   For every server which cooperates using this protocol, an IP address is in one of the following six states:

   o UNBINDABLE

      This state represents the default state for every IP address. Explicit action must be taken to move an IP address from this state into the BINDABLE state.
      An UNBINDABLE COMPLETE POLL must be performed and must complete
      successfully.

      Any IP address that has previously been BOUND must retain
      information concerning the server that PUSHED the binding
      information, the client to which it was bound, and the lease
      time for the binding. This information is used when a server is
      removed from the server group.

   o  POLLING

      While an UNBINDABLE COMPLETE POLL operation is being performed,
      an IP address is in the POLLING state. This ensures that if two
      servers simultaneously perform an UNBINDABLE COMPLETE POLL
      operation involving the same address, neither of them will
      succeed in making that address BINDABLE.

   o  BINDABLE

      In this state, the IP address is available to be offered to a
      DHCP client, and if the client accepts the offer, it may be
      bound to that client. An IP address is only BINDABLE by a
      single server at a time. A server must know precisely which IP
      addresses it has on its list of BINDABLE addresses. A server
      does not know about any other server's list of BINDABLE
      addresses. (Although performance optimizations are possible
      where a server may develop hints about this information, they
      are not required.)

      An IP address can move from the BINDABLE state into the BOUND
      state through the normal activity of the DHCP protocol where a
      server interacts with a client. When this happens, the Client
      Binding Management portion of the protocol is used to inform
      other servers of the change.

      A server can also transfer ownership of a BINDABLE IP address
      to another server upon request from that other server (and
      without any interaction beyond that with the other server).

   o  BOUND

      An address that is BOUND is associated with a particular DHCP
      client, and usually is in use by that client (although the
      client may have abandoned the lease on that IP address). It may
      be termed BOUND to that client.
      In the BOUND state the information about the client binding has
      not been propagated to all of the other servers in the server
      group.

   o  PUSHED

      An address that is PUSHED is associated with a client in the
      same way as a BOUND address. However, an address in the PUSHED
      state indicates that all of the other servers in the server
      group have been informed of the existence of the binding to
      this client.

      When a DHCP client releases a lease on an IP address it moves
      from either the BOUND or PUSHED state into the UNBINDABLE
      state, but no explicit PUSH operation is required.

      When the lease time and any grace period implemented by a
      server both expire, then an IP address moves into the EXPIRED
      state.

      Note that only a server that actually completes a CLIENT
      BINDING COMPLETE PUSH will place its IP address into the PUSHED
      state. The servers that receive the CLIENT BINDING COMPLETE
      PUSH will place their IP addresses into the BOUND state.

      DISCUSSION:

         Many DHCP servers implement something called a "grace
         period", which is a period after the lease on a binding
         expires during which an IP address will not be offered to
         another DHCP client. A lease which is in this "grace period"
         is still BOUND or PUSHED as far as the inter-server protocol
         is concerned.

   o  EXPIRED

      An IP address is EXPIRED when it was BOUND and the term of the
      lease (and any implemented grace period) has run out. It may be
      termed EXPIRED to that client.

      An EXPIRED IP address will transition to the UNBINDABLE state
      when the server that shows it as EXPIRED receives an UNBINDABLE
      COMPLETE POLL. The server will respond to the UNBINDABLE
      COMPLETE POLL after making the IP address UNBINDABLE.

      It may be moved back into the BOUND state by a
      REQUEST/INIT-REBOOT request from the previously bound client.

   Note that an IP address can never go from BOUND to one client to
   BOUND to another client without first passing through the
   UNBINDABLE state.
   The line across the middle of the state transition diagram helps
   to illustrate this. Further, note that the transition from POLLING
   to BINDABLE requires the successful completion of an UNBINDABLE
   COMPLETE POLL.

3.5. Overview of Server Operation

   This section gives a brief sketch of the core elements of the
   Client Binding Management and Address Management parts of the
   protocol (from the perspective of an already configured group of
   servers). Many of the possible cases are not described here, and
   this section is not to be considered definitive. The definitive
   description of this information is contained in Section 6 and in
   the case of conflicts with information found there, the
   information in Section 6 will govern.

3.5.1. DISCOVER

   Prior to the receipt of a DISCOVER message, each server should
   have built up a list of BINDABLE IP addresses -- for two reasons.
   First, because an UNBINDABLE COMPLETE POLL is required to move an
   IP address into the BINDABLE state, and an UNBINDABLE COMPLETE
   POLL may not be possible due to server failure at any given
   instant. Second, because even if an UNBINDABLE COMPLETE POLL were
   possible it would generally take too long to do between a DISCOVER
   and an OFFER message.

   A server should offer a BINDABLE address to a client upon receipt
   of a DISCOVER message.

   There are no inter-server protocol activities required when a
   DISCOVER is processed and an OFFER is returned to the client
   (assuming of course that a BINDABLE address was available to be
   offered).

3.5.2. REQUEST/SELECTING

   When a client accepts an offer by sending a SELECTING message,
   then the server updates its stable storage with the binding
   information and ACKs the client. It must then perform a CLIENT
   BINDING COMPLETE PUSH operation to push the binding information to
   all of the other servers (with which it can communicate at that
   time).
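
   As an illustration only, the ordering described above for
   REQUEST/SELECTING (commit to stable storage, ACK the client, then
   push to the other servers), together with the state transitions of
   Section 3.4.2, might be sketched in Python as follows. The names
   State, ALLOWED, Address, and handle_selecting, and the peer
   callables standing in for CLIENT BINDING UPDATE messages, are
   hypothetical and are not defined by this protocol. The transition
   table is one reading of Figure 3.4.2-1 and its accompanying text.

```python
from enum import Enum


class State(Enum):
    UNBINDABLE = "UNBINDABLE"
    POLLING = "POLLING"
    BINDABLE = "BINDABLE"
    BOUND = "BOUND"
    PUSHED = "PUSHED"
    EXPIRED = "EXPIRED"


# Assumed reading of the Section 3.4.2 state diagram (not normative).
ALLOWED = {
    State.UNBINDABLE: {State.POLLING},
    State.POLLING: {State.BINDABLE, State.UNBINDABLE},
    State.BINDABLE: {State.BOUND, State.UNBINDABLE},
    State.BOUND: {State.PUSHED, State.EXPIRED, State.UNBINDABLE},
    State.PUSHED: {State.EXPIRED, State.UNBINDABLE},
    State.EXPIRED: {State.BOUND, State.UNBINDABLE},
}


class Address:
    """Hypothetical per-server record for one IP address."""

    def __init__(self, ip):
        self.ip = ip
        self.state = State.UNBINDABLE  # default state for every address
        self.client = None

    def move(self, new_state):
        """Change state, rejecting transitions absent from ALLOWED."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.state.name} -> {new_state.name} not allowed")
        self.state = new_state


def handle_selecting(addr, client_id, stable_storage, send_ack, peers):
    """Bind a BINDABLE address, ACK the client, then push to all peers.

    `peers` is a list of callables, one per other server; each returns
    True if that server accepted the update.
    """
    addr.client = client_id
    addr.move(State.BOUND)
    # Record the binding in stable storage before it is published.
    stable_storage.append((addr.ip, client_id, addr.state.name))
    send_ack(client_id, addr.ip)
    # Contact every peer; the push is complete only if all were updated.
    results = [update(addr) for update in peers]
    if all(results):
        addr.move(State.PUSHED)
        stable_storage.append((addr.ip, client_id, addr.state.name))
    return addr.state
```

   In this sketch an address whose push does not reach every other
   server simply remains BOUND, which matches the meaning of the BOUND
   state given in Section 3.4.2.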
   There are some limitations on the lease time that can be offered
   to the client until at least one CLIENT BINDING COMPLETE PUSH has
   succeeded for the offering server. See Section 4.4.1 for
   additional details.

3.5.3. REQUEST/INIT-REBOOT

   In the usual case where the server that created the binding for
   the requesting client managed to PUSH that information to the
   other servers using a CLIENT BINDING COMPLETE PUSH, the receiving
   server will have the binding information for this client. If this
   information can be verified, then ACK the client -- else NAK it.
   If the IP address was in the EXPIRED state, then move the IP
   address to the PUSHED state.

3.5.4. REQUEST/RENEWING

   Upon receipt of a RENEWAL message (which is unicast from the
   client to the server), it is expected that the server will have
   accurate information concerning the binding of the client. If it
   does not, process the message like a REBINDING, below. Given that
   the server has information sufficient to extend the lease, it
   should update its stable storage with the lease extension, and
   then ACK the client with the extended time. Then it must perform a
   CLIENT BINDING COMPLETE PUSH operation to the other servers with
   the updated binding information.

3.5.5. REQUEST/REBINDING

   Upon receipt of a REBINDING message (which is broadcast from the
   client), the server will check to see if it has any information
   about the binding for this client. There are several possible
   cases:

   1. Current information shows that this client owns the IP address.
      Extend the lease, update stable storage, ACK the client, and
      perform a CLIENT BINDING COMPLETE PUSH with the information to
      the other servers.

   2. Current information shows that some other client is BOUND to
      this IP address. This is a problem. Make the IP address
      UNAVAILABLE (see Section 12 for details).

   3. Current information says this IP address is UNBINDABLE.
      In this case, a server has probably created a binding and then
      failed to propagate the information to this server. Perform a
      POLL operation to see if any communicating server has any
      better information. If information is returned, then move to
      the appropriate case in this list. If no information is
      returned, then extend the lease on the IP address, update
      stable storage, ACK the client, and PUSH the information to the
      other servers.

3.5.6. RELEASE

   When a release is received, if the client matches the binding
   information in the server, then update stable storage with the
   release, set the IP address UNBINDABLE, and perform a CLIENT
   BINDING COMPLETE PUSH to inform other servers. If the CLIENT
   BINDING COMPLETE PUSH operation fails because an UPDATE message to
   another server does not succeed, do nothing.

3.5.7. Expiration

   When a lease on an IP address expires, move the lease to the
   EXPIRED state and update stable storage with this information.
   From then on, if some server performs an UNBINDABLE COMPLETE POLL
   operation to gather information about this IP address, make the IP
   address UNBINDABLE, update stable storage, and respond with the
   state of the IP address as UNBINDABLE.

3.6. When a server is down or partitioned and can't be contacted

   When a server is down or partitioned (i.e., can't be reached),
   then some aspects of the normal DHCP client processing are
   different. This section summarizes those differences:

   o  Client lease times for new clients will never be greater than
      MAXIMUM-UNPUSHED-LEASE-TIME, since a CLIENT BINDING COMPLETE
      PUSH cannot succeed.

   o  No UNBINDABLE COMPLETE POLL will succeed, and thus no server
      will be able to transition an address from the UNBINDABLE state
      into the BINDABLE state. If a server runs low on addresses, it
      will have to use TRANSFER messages to acquire new addresses
      from other servers.

4. Client Binding Management

   Client binding management is the aspect of the protocol which is
   concerned with communicating information about client bindings
   from one server to another. It is the core of the inter-server
   protocol.

   The following messages and operations are used explicitly by a
   server participating in the interserver protocol when DHCP client
   requests and events require it, and are used implicitly by the
   SCSP cache alignment procedure whenever a server (re)establishes
   communication with another server.

4.1. Client Binding Messages

   o  CLIENT BINDING UPDATE

      Update a single server with client binding information. This
      operation will not complete successfully unless and until that
      server is updated with the information being sent.

   o  CLIENT BINDING QUERY

      Query a single server for its client binding information.

4.2. Client Binding Operations

   The operations defined for client binding management are:

   o  CLIENT BINDING COMPLETE PUSH

      This operation involves one server using the UPDATE message to
      inform all of the other servers about updated client binding
      information. While there is utility in reaching even one other
      server (in some cases) the operation is not deemed to have
      succeeded unless all of the other servers were successfully
      updated with the new information.

   o  CLIENT BINDING POLL

      This operation involves one server using the QUERY message to
      inquire of every other server about client binding information
      concerning a particular IP address. If not all of the other
      servers are operational, the requesting server will use any
      information it receives.

4.3. Client Binding Information

   When binding data is sent as part of a message concerned with
   client binding management it contains the following information:

   o  IP Address

   o  Expiration [expressed as a delta seconds from the current time]

   o  Client ID

   o  MAC Address [including the hardware type]

   o  Last Transaction [selected from the list below]

   o  Last Transaction Time [expressed as a delta seconds from the
      current time]

   o  Last Transaction Server [an IP address]

   Each server must maintain as part of the binding information the
   "last transaction time", the "last transaction", and the "last
   transaction server" associated with that binding. The last
   transaction time is the time at which the binding changed in
   response to a request (the last transaction) from the client. The
   last transaction time is returned in an address information
   message as a number of seconds from "now". The possible last
   transactions are listed below. This list is ordered by the
   precedence of the transactions and is used to help determine if a
   response to an address information message contains more recent
   information than that currently held by a server.

   The last transaction is one of the following:

   o  DHCPREQUEST/SELECTING

   o  DHCPREQUEST/REBINDING

   o  DHCPREQUEST/INIT-REBOOT

   o  DHCPREQUEST/RENEWING

   o  DHCPRELEASE

   o  EXPIRATION

   The IP address state information is transmitted as well, and it
   consists of one of the following states:

   o  UNBINDABLE

   o  POLLING

   o  BINDABLE

   o  BOUND

   o  PUSHED

   o  EXPIRED

4.4. Initiating Client Binding Operations and Messages

4.4.1. CLIENT BINDING COMPLETE PUSH

   The CLIENT BINDING COMPLETE PUSH operation is initiated whenever
   the state of a server's client binding cache is changed, typically
   by the receipt of a DHCP client request or expiration of a lease.
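
   As an illustration only, the binding information listed in
   Section 4.3 could be held in a record such as the one in the
   following Python sketch. The names Binding, to_wire, from_wire, and
   is_more_recent are hypothetical. Two points are assumptions made
   for the sketch rather than rules taken from this document: the sign
   of the time deltas (negative for a time in the past), and the
   comparison rule (the later last transaction time wins, and on a tie
   the transaction appearing later in the Section 4.3 list wins). The
   document says only that the list "is used to help determine" which
   information is more recent.

```python
from dataclasses import dataclass

# Last transactions in the order given in Section 4.3.
TRANSACTIONS = [
    "DHCPREQUEST/SELECTING",
    "DHCPREQUEST/REBINDING",
    "DHCPREQUEST/INIT-REBOOT",
    "DHCPREQUEST/RENEWING",
    "DHCPRELEASE",
    "EXPIRATION",
]


@dataclass
class Binding:
    ip_address: str
    expiration: int              # absolute time, in seconds
    client_id: bytes
    mac_address: tuple           # (hardware type, address bytes)
    last_transaction: str        # one of TRANSACTIONS
    last_transaction_time: int   # absolute time, in seconds
    last_transaction_server: str
    state: str = "BOUND"

    def to_wire(self, now):
        """Express both times as deltas in seconds from `now`."""
        msg = dict(vars(self))
        msg["expiration"] = self.expiration - now
        msg["last_transaction_time"] = self.last_transaction_time - now
        return msg

    @classmethod
    def from_wire(cls, msg, now):
        """Rebuild absolute times from the deltas in a received message."""
        fields = dict(msg)
        fields["expiration"] += now
        fields["last_transaction_time"] += now
        return cls(**fields)


def is_more_recent(a, b):
    """Assumed rule: later time wins; list position breaks ties."""
    def key(x):
        return (x.last_transaction_time,
                TRANSACTIONS.index(x.last_transaction))
    return key(a) > key(b)
```

   Storing absolute times locally and converting only when a message
   is built or parsed keeps the record independent of when it is
   sent; the sketch assumes the sender and receiver clocks agree.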
   The lease time that is offered to a DHCP client must not be
   greater than the MAXIMUM-UNPUSHED-LEASE-TIME for that SG until at
   least one CLIENT BINDING COMPLETE PUSH has succeeded for that
   client binding. Thus, as long as the state of the IP address is
   BOUND, then the client should be offered the
   MAXIMUM-UNPUSHED-LEASE-TIME.

   The lease time that is sent to the other servers in the CLIENT
   BINDING COMPLETE PUSH is the lease time that the server would like
   to give to the DHCP client, and once a CLIENT BINDING COMPLETE
   PUSH has succeeded with that lease time in it (and the IP address
   state is set to PUSHED), then the server is free to actually
   extend the client's lease on the IP address with that lease time.

   The servers which receive the CLIENT BINDING COMPLETE PUSH will
   place their IP addresses into the BOUND state, not the PUSHED
   state.

4.4.2. CLIENT BINDING POLL

   The CLIENT BINDING POLL is used when the server has received a
   DHCP client request but believes that it has insufficient or
   out-of-date information concerning this client's binding. Thus,
   the CLIENT BINDING POLL is an attempt to gather more recent
   information from the other servers in the SG.

   DISCUSSION:

      Is this really necessary? Given that SCSP will "align" the
      caches of the servers at every reconnect, then what is the
      value of asking "again"?

4.4.3. CLIENT BINDING UPDATE

   The CLIENT BINDING UPDATE is initiated in three ways.

   It is initiated at the client binding management level as the
   underlying operation in a CLIENT BINDING COMPLETE PUSH. It is
   initiated at the client binding management level when a server
   realizes that the server that returned information as a result of
   a CLIENT BINDING QUERY returned information which was less
   up-to-date than that available to the current server. It is
   initiated at the SCSP level as part of the cache state alignment
   process.

4.5. Responding to Client Binding Messages

   When a server receives the following client binding messages, it
   should respond as detailed below. Note that operations consist of
   multiple messages at the initiator, but that when processing
   incoming requests, only individual messages are evident.

4.5.1. CLIENT BINDING QUERY

   The proper response to a CLIENT BINDING QUERY is to respond with
   the current information in the client binding cache.

4.5.2. CLIENT BINDING UPDATE

   The proper response to a CLIENT BINDING UPDATE is to determine if
   the information received is more current than that available in
   the server's cache. If it is not, then respond negatively to this
   request. If it is, then update the client binding cache, ensure
   that the changes have been written to stable storage, and respond
   successfully.

   Note that no CLIENT BINDING UPDATE should generate additional
   client binding message activity (i.e., the CLIENT BINDING UPDATE
   should not generate a CLIENT BINDING COMPLETE PUSH).

   When a CLIENT BINDING UPDATE is received, the IP address should be
   placed into the BOUND state, not the PUSHED state. Only the actual
   server performing the CLIENT BINDING COMPLETE PUSH will place its
   IP address into the PUSHED state.

5. Address Management

   Address management is the aspect of the protocol concerned with
   managing the state of IP addresses that are not currently bound to
   any client. It is a necessary part of the protocol in order to
   support certain goals in the client binding management part of the
   protocol, principally that of allowing a server to continue to
   operate even though it was partitioned from other servers in the
   server group.

5.1. Address Management Operations

   o  UNBINDABLE COMPLETE POLL

      In this operation, all of the other servers are contacted using
      a QUERY operation concerning one (or more) IP addresses, and
      they all report on whether each IP address is UNBINDABLE or
      not.
      If they are UNBINDABLE, then the current information on that IP
      address is also reported (as in a CLIENT BINDING POLL). In
      contrast to a CLIENT BINDING POLL, this operation fails if any
      server cannot be contacted or if any server answers the QUERY
      with a negative answer (i.e., the IP address is not currently
      UNBINDABLE). It succeeds when all of the servers answer that
      the address is UNBINDABLE.

      There is a subtle interaction required with the group
      management layer of the protocol. A successful UNBINDABLE
      COMPLETE POLL must be inhibited in certain cases where a server
      has been removed from a server group.

      The case in question is that where a server is removed from a
      server group by a different server. Immediately after this
      happens, all UNBINDABLE COMPLETE POLLS must fail for a period
      equal to the MAXIMUM-UNPUSHED-LEASE-TIME. After that time
      passes, then UNBINDABLE COMPLETE POLLS may operate as they
      normally do.

      DISCUSSION:

         This covers the situation where a server gives a lease to a
         client while both the client and server are partitioned.
         Then, the server goes away completely. The client stays up,
         but remains partitioned. Then, the dead server is removed by
         another server from the server group. At this point,
         UNBINDABLE COMPLETE POLL operations could (except for the
         above restriction) begin to complete successfully. However,
         the client that was given a lease while partitioned along
         with the server that died certainly has an address, and when
         the partition is removed (just after the UNBINDABLE COMPLETE
         POLL operation which declared its IP address now BINDABLE
         for some server), a very dangerous situation would develop.

         The solution is to only offer leases to clients of the
         MAXIMUM-UNPUSHED-LEASE-TIME until the information concerning
         their client binding reaches all of the other servers in the
         group. Once that happens, then they can be offered the
         normal lease time.
         Thus, whenever any server is removed from the group (where
         it doesn't remove itself), then there is a possibility that
         it may have offered leases to clients about which no other
         server would have any record. In this case, the remaining
         servers must wait the MAXIMUM-UNPUSHED-LEASE-TIME before
         being able to complete an UNBINDABLE COMPLETE POLL and reuse
         the BINDABLE addresses that the removed server was using.

5.2. Address Management Messages

   The following messages are part of the address management portion
   of the protocol.

   o  TRANSFER

      This message is used to transfer BINDABLE IP addresses from one
      server to another (especially when the SG is partitioned and
      the normal UNBINDABLE COMPLETE POLL cannot be used to make an
      IP address BINDABLE, but also when all of the UNBINDABLE IP
      addresses have already been made BINDABLE by some server).

      The information sent from the initiating to the responding
      server includes the subnet specification, the number of
      BINDABLE IP addresses the initiating server has available for
      that address pool, and the number of BINDABLE IP addresses it
      is requesting.

      The responding server is free to give the initiating server
      all, some, or none of the number of IP addresses the initiating
      server has requested.

   o  UNBINDABLE QUERY

      The UNBINDABLE QUERY operation is the primitive query from
      which the UNBINDABLE COMPLETE POLL is constructed. It is
      identical to the CLIENT BINDING QUERY defined above in terms of
      the data returned, although the actions taken when it is
      received are slightly different.

5.3. Initiating Address Management Operations and Messages

   o  UNBINDABLE COMPLETE POLL (operation)

      This operation is initiated when the server detects that it
      needs to generate more BINDABLE IP addresses. It will initiate
      this operation whenever the number of BINDABLE IP addresses
      drops below a configurable threshold.
      Prior to initiating this operation, the server must change the
      state for each IP address that will be part of the UNBINDABLE
      COMPLETE POLL from UNBINDABLE to POLLING, and commit this state
      change to stable storage.

      DISCUSSION:

         Is the commit to stable storage really necessary? Given that
         we will abandon the POLL if we reboot (presumably), what is
         the value of remembering that we were doing it?

      For every IP address for which the UNBINDABLE COMPLETE POLL
      operation fails (i.e., some server responds in such a way that
      indicates that the IP address is not UNBINDABLE, or some server
      fails to respond at all), the IP address' state should be reset
      to UNBINDABLE.

   o  TRANSFER (message)

      The TRANSFER message, which attempts to transfer some IP
      addresses from some other server to the initiating server, is
      initiated whenever the number of BINDABLE IP addresses in an
      address pool falls below a configurable threshold.

5.4. Responding to Address Management Messages

   o  TRANSFER

      When receiving a TRANSFER message, the responding server
      inspects its list of BINDABLE addresses for the address pool to
      which the TRANSFER operation refers. It will attempt to offer
      the initiating server as many addresses as it requested, with
      the limitation that it will never give away more than half of
      its pool of BINDABLE addresses in any one request.

   o  UNBINDABLE QUERY

      The responding server will respond to this query just like it
      responds to a CLIENT BINDING QUERY as far as the information
      communicated to the initiating server is concerned.

      In addition, if the IP address mentioned in this query was in
      the EXPIRED state, prior to responding to this message, the
      responding server will move that IP address to the UNBINDABLE
      state, commit this change to stable storage, and then respond
      with information that indicates the IP address in question was
      UNBINDABLE.
      Note that an UNBINDABLE QUERY will not be generated to any
      server if at least one server in the SG is currently not able
      to be contacted, as known by the SCSP "Hello" subprotocol. This
      will prevent unnecessary transitions from the EXPIRED to the
      UNBINDABLE state when an UNBINDABLE COMPLETE POLL would not be
      able to complete in any case.

6. Actions in Response to DHCP Client Messages and Events

   This section defines the actions that should be taken in the
   client binding and address management portions of the protocol
   when incoming DHCP requests (messages) are received.

   DISCUSSION:

      There is considerable commonality in the sections that describe
      the various DHCP client messages below. Once the details have
      stabilized, it should be possible to compress the explanations.

6.1. DISCOVER

   Prior to the receipt of a DISCOVER message, each server should
   have built up a list of BINDABLE IP addresses -- for two reasons.
   First, because an UNBINDABLE COMPLETE POLL is required to get a
   BINDABLE IP address, and an UNBINDABLE COMPLETE POLL may not be
   possible due to server failure at any given instant. Second,
   because even if an UNBINDABLE COMPLETE POLL were possible, it
   would be unwise to require such an operation between the receipt
   of a DISCOVER message and the response of an OFFER to a client.

   There are several cases involved in processing a DISCOVER request,
   depending on the state of the requested IP address in the DISCOVER
   request:

   o  No specific IP address requested.

      Offer a BINDABLE address to the client. Record that this
      address was offered in the cache memory of the server, but
      there is no need to update the stable storage of the server
      with any information. The IP address continues to be BINDABLE
      as far as the inter-server protocol is concerned.

   o  Requested IP address is UNBINDABLE.
      If the IP address is UNBINDABLE, then perform an UNBINDABLE
      COMPLETE POLL operation in an attempt to make the IP address
      BINDABLE. If the operation is successful, then respond as
      though the IP address were BINDABLE, below. If the attempt to
      make the IP address BINDABLE resulted in a discovery that the
      IP address is now BOUND or PUSHED, then respond as for BOUND or
      PUSHED, below. Otherwise (i.e., the IP address is BINDABLE for
      some other server, or an UNBINDABLE COMPLETE POLL was not
      possible) then respond as above for "No specific IP address
      requested".

   o  Requested IP address is BINDABLE.

      Offer the IP address to the client. The IP address remains
      BINDABLE.

   o  Requested IP address is BOUND or EXPIRED.

      If the IP address is BOUND or EXPIRED to the requesting client,
      then set it to BOUND and offer it to the client -- with a lease
      time of MAXIMUM-UNPUSHED-LEASE-TIME. Otherwise (i.e., the IP
      address is BOUND or EXPIRED to some other client), respond as
      in "No specific IP address requested", above.

   o  Requested IP address is PUSHED.

      If the IP address is PUSHED to the requesting client, then
      offer it to the client -- with a normal lease time. Otherwise
      (i.e., the IP address is PUSHED to some other client), respond
      as in "No specific IP address requested", above.

6.2. REQUEST/SELECTING

   The client uses a REQUEST/SELECTING to accept the offer of a lease
   made by a server. When a server receives such a message, and where
   the server-id option reflects the IP address of that server, then
   if the IP address is in the following states the server should
   respond in the following way:

   o  UNBINDABLE

      If the IP address is UNBINDABLE, then perform an UNBINDABLE
      COMPLETE POLL operation in an attempt to make the IP address
      BINDABLE. If that operation is successful, then respond as
      though the IP address were BINDABLE, below.
      If the attempt to make the IP address BINDABLE resulted in a
      discovery that the IP address is now BOUND, then respond as for
      BOUND, below. Otherwise (i.e., the IP address is BINDABLE for
      some other server, or a complete POLL was not possible) NAK the
      REQUEST.

   o  BINDABLE

      If the IP address is BINDABLE and has been offered to the
      requester, then bind the IP address to the client, set the IP
      address BOUND, and update stable storage. Then, ACK the client,
      and finally perform a PUSH operation of the binding information
      to the other servers.

   o  BOUND or EXPIRED

      If the IP address is BOUND or EXPIRED to the requesting client,
      then set the state to BOUND, update the expiration time using
      the normal lease time, update stable storage, ACK the client
      with the MAXIMUM-UNPUSHED-LEASE-TIME, and perform a CLIENT
      BINDING COMPLETE PUSH with the normal lease time.

      If the IP address is BOUND or EXPIRED to a different client,
      then NAK this REQUEST.

   o  PUSHED

      If the IP address is PUSHED to the requesting client, set the
      IP address to be PUSHED, update the expiration time, update
      stable storage, and ACK the client. Finally, perform a CLIENT
      BINDING COMPLETE PUSH operation of the updated binding
      information to the other servers. Use the normal lease time in
      all of the above operations.

      If the IP address is PUSHED to some other client, then NAK the
      request.

6.3. REQUEST/INIT-REBOOT

   The client uses a REQUEST/INIT-REBOOT to query the server (as part
   of the client boot process) to determine if a "remembered" binding
   is still valid. The server should respond according to which of
   the following states the requested IP address is in:

   o  UNBINDABLE

      If the IP address is UNBINDABLE, then perform an UNBINDABLE
      COMPLETE POLL operation in an attempt to make the IP address
      BINDABLE. If the operation is successful, then respond as
      though the IP address were BINDABLE, below.
      If the attempt to make the IP address BINDABLE resulted in a
      discovery that the IP address is now BOUND, then respond as for
      BOUND, below. Otherwise (i.e., the IP address is BINDABLE for
      some other server, or a complete POLL was not possible) NAK the
      REQUEST.

      DISCUSSION:

         This means that if a server creates a binding for a client
         and fails to PUSH the information to any other server prior
         to undergoing a server failure, and if the client is powered
         off prior to the time when it will issue a REBINDING
         message, it will not get back the same lease when it is
         powered back on.

         The reasoning for this (and the difference from the
         REBINDING case below) is that in this case the server has no
         way to determine if the requested address in the INIT-REBOOT
         request is current or perhaps very old indeed. In the
         REBINDING case the client is currently using the address, so
         the client at least believes that it is current and not in
         use by some other client. In this case, however, no such
         assumption is possible.

         In the case where a server which creates a binding fails
         prior to PUSHing the information about a lease to some other
         server, and the client which receives that binding makes a
         REBINDING request prior to either failing or being shut
         down, it will get back the existing binding upon restart and
         INIT-REBOOT -- since the REBINDING will have caused a
         recovery of the binding information and that will have been
         distributed through a CLIENT BINDING COMPLETE PUSH.

   o  BINDABLE

      If the IP address is BINDABLE, then bind the IP address to the
      client, set the IP address BOUND, and update stable storage.
      Then, ACK the client, and finally perform a PUSH operation of
      the binding information to the other servers.
   o  BOUND or EXPIRED

      If the IP address is BOUND or EXPIRED to the requesting client,
      then set the state to BOUND, update the expiration time using
      the normal lease time, update stable storage, ACK the client
      with the MAXIMUM-UNPUSHED-LEASE-TIME, and perform a CLIENT
      BINDING COMPLETE PUSH with the normal lease time.

      If the IP address is BOUND or EXPIRED to a different client,
      then NAK this REQUEST.

   o  PUSHED

      If the IP address is PUSHED to the requesting client then set
      the IP address PUSHED, update the expiration time, update
      stable storage, and ACK the client. Finally, perform a CLIENT
      BINDING COMPLETE PUSH operation of the updated binding
      information to the other servers. Use the normal lease time for
      all of the above operations.

      If the IP address is PUSHED to some other client, then NAK the
      request.

6.4. REQUEST/RENEWING

   Upon receipt of a RENEWAL message (which is unicast from the
   client to the server), it is expected that the server will have
   accurate information concerning the binding of the client since
   this is the server that the client believes most recently sent an
   ACK to the client concerning this IP address binding.

   Perform the following actions if the IP address being renewed
   (i.e., the IP address in ciaddr) is in one of these states:

   o  UNBINDABLE

      If the IP address is UNBINDABLE, then perform an UNBINDABLE
      COMPLETE POLL operation in an attempt to make the IP address
      BINDABLE. If the operation is successful, then respond as
      though the IP address were BINDABLE, below. If the results of
      the attempt to make the IP address BINDABLE resulted in a
      discovery that the IP address is now BOUND, then respond as for
      BOUND, below.

      If the IP address is determined to be BINDABLE for some other
      server, then NAK the request, and set the IP address to be
      UNAVAILABLE since this likely represents a duplicate allocation
      of an IP address (see Section 11, Open Questions, for details).

      Otherwise NAK the request.
   o  BINDABLE

      If the IP address is BINDABLE, then bind the IP address to the client, set the IP address BOUND, and update stable storage. Then, ACK the client, and finally perform a PUSH operation of the binding information to the other servers.

   o  BOUND or EXPIRED

      If the IP address is BOUND or EXPIRED to the requesting client, then set the state to BOUND, update the expiration time using the normal lease time, update stable storage, ACK the client with the MAXIMUM-UNPUSHED-LEASE-TIME, and perform a CLIENT BINDING COMPLETE PUSH with the normal lease time.

      If the IP address is BOUND or EXPIRED to a different client, then NAK this REQUEST.

   o  PUSHED

      If the IP address is PUSHED to the requesting client, then set the IP address PUSHED, update the expiration time, update stable storage, and ACK the client. Finally, perform a CLIENT BINDING COMPLETE PUSH operation of the updated binding information to the other servers. Use the normal lease time for all of the above operations.

      If the IP address is PUSHED to some other client, then NAK the request and set the IP address to UNAVAILABLE (see Section 11, Open Questions, for details).

6.5.  REQUEST/REBINDING

   Upon receipt of a REBINDING message (which is broadcast from the client), the server will check the state of the address requested for rebinding (i.e., the ciaddr). There are several cases possible:

   o  UNBINDABLE

      If the IP address is UNBINDABLE, then perform an UNBINDABLE COMPLETE POLL operation in an attempt to make the IP address BINDABLE. If the operation is successful, then respond as though the IP address were BINDABLE, below.

      If the results of the attempt to make the IP address BINDABLE resulted in a discovery that the IP address is now BOUND, then respond as for BOUND, below.

      If the IP address is determined to be BINDABLE for some other server, then NAK the request.
      Set the IP address to be UNAVAILABLE since this likely represents a duplicate allocation of an IP address (see Section 11, Open Questions, for details).

      If no information is returned from any server that this IP address is anything but UNBINDABLE, then consider the address BOUND to this client, and proceed as in BOUND below.

      DISCUSSION:

         This is one of the key points of the inter-server protocol. In this case, a server has created a binding and then failed prior to telling any other server about that binding. Eventually, the client to whom that binding was made will attempt a REQUEST/REBINDING and contact a different server. That different server will be able to determine nothing about that IP address. As far as can be determined, it is not BOUND to any client, and it is not BINDABLE for any other server.

         In this restricted case, the server will renew the lease for the client and move the IP address into the BOUND state -- and PUSH this information to the rest of the servers.

         How can this be safe? Well, remember that the client is presently using the IP address to make this request. In this limited case where a server crashes before PUSHing information about a BOUND IP address to any other server, the client to whom the IP address is BOUND is the only running machine with any record of that binding. In this case, the DHCP servers will accept that client's information about the binding as correct.

   o  BINDABLE

      If the IP address is BINDABLE, then bind the IP address to the client, set the IP address BOUND, and update stable storage. Then, ACK the client, and finally perform a PUSH operation of the binding information to the other servers.
   o  BOUND or EXPIRED

      If the IP address is BOUND or EXPIRED to the requesting client, then set the state to BOUND, update the expiration time using the normal lease time, update stable storage, ACK the client with the MAXIMUM-UNPUSHED-LEASE-TIME, and perform a CLIENT BINDING COMPLETE PUSH with the normal lease time.

      If the IP address is BOUND or EXPIRED to a different client, then NAK this REQUEST.

   o  PUSHED

      If the IP address is PUSHED to the requesting client, then set the IP address PUSHED, update the expiration time, update stable storage, and ACK the client. Finally, perform a CLIENT BINDING COMPLETE PUSH operation of the updated binding information to the other servers. Use the normal lease time for all of the above operations.

      If the IP address is PUSHED to some other client, then NAK the request and set the IP address to UNAVAILABLE (see Section 11, Open Questions, for details).

6.6.  RELEASE

   When a RELEASE is received, an IP address will be in one of the following states:

   o  UNBINDABLE

      If the IP address is UNBINDABLE, then perform a CLIENT BINDING POLL operation in an attempt to determine if this IP address is BOUND to any client.

      If the results of the POLL operation indicate that the IP address is now BOUND, then respond as for BOUND, below.

      If the IP address is determined to be BINDABLE for some other server, then NAK the request. Set the IP address to be UNAVAILABLE since this likely represents a duplicate allocation of an IP address (see Section 11, Open Questions, for details).

      Otherwise, ignore the RELEASE.

   o  BINDABLE

      If the IP address is BINDABLE, ignore the RELEASE.

   o  BOUND, PUSHED, or EXPIRED

      If the IP address is BOUND, PUSHED, or EXPIRED to the requesting client, set the IP address to be UNBINDABLE, update stable storage, and perform a CLIENT BINDING COMPLETE PUSH to update the other servers with this information.

6.7.  Lease Period Expiration

   When the lease period on a BOUND or PUSHED IP address expires, set the IP address to be EXPIRED and update stable storage.

7.  Group Management

   The group management part of the protocol is concerned with configuring a server into or out of a server group (SG). It allows discovery of information concerning the configuration of an existing server group as well as the address pools that are managed by a server group.

   While it is possible to conceive of a statically defined server group, the operational characteristics (both for group startup as well as removal of a server from a group) are quite painful.

   Group management messages are used to add a server to a group as well as to remove a server from a group. A server must add itself to a group -- it cannot be added by another server. A server may be removed by any server in the group, including itself.

   In addition to changing the group membership, group management messages are used to keep the various servers up to date with respect to the current membership of the group.

   Once a server successfully becomes part of a group using the group management messages, it then enters the SCSP protocol. This protocol determines which servers in the SG are currently in communication with this server, and starts an automatic "cache alignment" process with each connected server.

7.1.  Group Management Operations

   o  SG CHANGE

      The SG CHANGE operation is a two-stage operation made up of a propose and then a commit phase. It uses the SG PROPOSE CHANGE and SG COMMIT CHANGE messages as part of this operation. It is used to change the membership of the group, either to add a server or to remove a server.

7.2.  Group Management Messages

   o  SG DISCOVERY QUERY

      The first stage of becoming a server participating in the inter-server protocol is to determine the existing SG ID for each SG for which participation in the inter-server protocol is desired.
      Assuming that a server has been provided with, or can discover, the IP address of a server that may be in a group which it wants to join, a server that wants to become a member of a group will send an SG DISCOVERY QUERY message to that server.

      The reply to the SG DISCOVERY QUERY message is a message which contains the list of SG identifiers for all of the groups to which the replying server belongs. These SG IDs can then be used in SG CONFIGURATION messages to determine more information about each SG.

      This operation is performed only upon one server at a time, since at this point there is no notion of a "current" server group.

   o  SG CONFIGURATION QUERY

      The SG CONFIGURATION QUERY operation has several suboperations, corresponding to the following types of configuration information: subnets, IP addresses, client configuration information, and vendor specific information.

      Each SG CONFIGURATION QUERY operation is read-only to the receiving server.

      The particular SG CONFIGURATION QUERY suboperations are:

      o  Subnets

         The specific subnets managed by this SG are returned as part of this operation.

      o  IP Addresses

         The IP addresses which are managed by this SG within this subnet are returned as the result of this operation.

      o  Client Configuration Information

         The client configuration information associated with this subnet is returned as the result of this operation.

      o  Vendor Specific Information

         Provision is made for vendor specific configuration information to be returned in the SG CONFIGURATION message. Its format is TBD, but should be regular even though vendor specific.

   o  SG PROPOSE CHANGE UPDATE

      The SG PROPOSE CHANGE UPDATE message is sent to all of the servers in an SG to propose a new membership in the server group. The information sent with this message is an updated list of the servers in the group. The servers to add to the group and servers to remove from the group are both listed in the same message.
   o  SG COMMIT CHANGE UPDATE

      The SG COMMIT CHANGE UPDATE message is sent to all of the servers in the SG to commit a change that was proposed in an SG PROPOSE CHANGE operation.

7.3.  Initiating Group Management Operations and Messages

7.3.1.  SG CHANGE (operation)

   The SG CHANGE operation consists of the following steps:

   o  Determine the group membership using an SG CONFIGURATION message.

      Find out to whom to send all of the SG CHANGE messages.

   o  Send an SG PROPOSE CHANGE message to every member of the SG.

      This message has the current group specifier in the message, along with the new group membership.

      As the joining server cycles through the existing members of the group, it will be rationalizing the group specifiers among the group and the entire group's picture of the membership of the group. If it encounters a server whose view of the group membership lags behind that of the server from which the joining server received its idea of group membership, then it will bring that server up to date. If, on the other hand, it encounters a server that has a more up to date version of the group membership than the one from which it is operating, it will have to update its idea of the group membership and then start the proposal sequence over. All of the servers with which it has created proposals will be forced to update their view of group membership as part of this process.

      At the end of this process of proposal generation, all of the servers in the group share a common picture of both the group membership and the current proposal.

   o  Reverify the group membership from at least one server using an SG CONFIGURATION message.

      This is to ensure that all of the members of the group have actually been sent an SG PROPOSE CHANGE message.

   o  Check the proposal timer.
      The initiating server must have started a timer when it sent out the first SG PROPOSE CHANGE message, and if that timer has less than half of its time remaining, the joining server SHOULD start the process over.

   o  Send an SG COMMIT CHANGE message to every member of the SG.

      As soon as this completes successfully with one server, the server has changed the membership of the group, but the initiating server MUST continue to try to update the other servers as long as they remain in the server group.

7.3.2.  SG DISCOVERY QUERY (message)

   This is sent when a server wishes to know the groups of which another server is a member. It is used primarily when starting up a server in the initial discovery of the server group configuration.

7.3.3.  SG CONFIGURATION QUERY (message)

   This message is sent to determine the details of the configuration of the server group. A server would typically initiate these messages as part of the process of confirming that it wished to be part of a particular server group.

   The SG CONFIGURATION QUERY operation has several suboperations, corresponding to the following types of configuration information:

   o  Subnets

      The specific subnets managed by this SG are returned as part of this operation.

   o  IP Addresses

      The IP addresses which are managed by this SG within this subnet are returned as the result of this operation.

   o  Client Configuration Information

      The client configuration information associated with this subnet is returned as the result of this operation.

   o  Vendor Specific Information

      Provision is made for vendor specific configuration information to be returned in the SG CONFIGURATION QUERY message. Its format is TBD, but should be regular even though vendor specific.

7.4.  Responding to Group Management Messages

7.4.1.  SG PROPOSE CHANGE UPDATE

   Upon receipt of an SG PROPOSE CHANGE UPDATE message, if no existing proposal exists that has not timed out, a server will create a single "proposed" group specifier from the current group specifier by incrementing the group sequence number by 1. The creation of this proposed group specifier will inhibit the creation of another proposed group specifier for 30 seconds.

   If an existing proposal exists that has not timed out, the responding server will respond negatively to the SG PROPOSE CHANGE UPDATE message.

   DISCUSSION:

      Clearly a deadlock situation can occur where two servers are trying to join a group at the same time, and each is working from "opposite ends" of the group. In this case, where the joining server gets a failure from an SG PROPOSE CHANGE UPDATE message due to the existence of a valid proposal that has not timed out, the joining server should back off for an amount of time that is based in part on its IP address before trying again. The exact algorithm is TBD.

   This proposed group specifier will not be used in any messages until it moves to the accepted stage and becomes the current group specifier (see below for how it does that).

   If a second SG PROPOSE CHANGE UPDATE request is received from a server, that message will supersede the existing proposal and the timer will be reset.

   DISCUSSION:

      Is there some possible attack here? Should we limit one server's proposals from tying up the "proposal" for more than 3 minutes at a time, for instance?

7.4.2.  SG COMMIT CHANGE UPDATE

   Upon receipt of an SG COMMIT CHANGE UPDATE message, the current proposal is compared with the data in the SG COMMIT CHANGE UPDATE message, and if it compares successfully, the proposed new group becomes the current group and the group specifier is changed.

   Once an SG COMMIT CHANGE UPDATE message is received, the receiving server MUST examine all of its IP addresses.
   For every IP address for which the "last transaction server" is a server which was previously in the group and is now not in the group, the following action should be taken:

      If the IP address is shown as ever having been BOUND to a client, and if that client does not now have a different IP address, then the IP address should be set to BOUND to that client, and the lease time should be restarted for the previously recorded lease time.

   DISCUSSION:

      This is a key aspect of the protocol in terms of safely removing possibly partitioned servers from the group. The specific case that this protects against is as follows.

      If a connected server creates a client binding, and successfully performs a CLIENT BINDING COMPLETE PUSH operation, and then renews its client's lease for the full lease time -- and then becomes partitioned, there can be problems if that server is ultimately removed from the group much later.

      Suppose the server is partitioned for longer than the client's lease time, all of the other servers move this IP address to EXPIRED, and some server then tries (unsuccessfully) to perform an UNBINDABLE COMPLETE POLL, which will move the EXPIRED addresses to UNBINDABLE.

      Now, the partitioned server has updated the client several times, and the other servers by this time all believe that the IP address is UNBINDABLE. If the partitioned server then fails and is removed from the SG -- the other servers could (in the absence of the above algorithm) believe that they only need wait the MAXIMUM-UNPUSHED-LEASE-TIME before they can make those UNBINDABLE addresses BINDABLE. But in this case that would cause a failure.

      Thus, when a server is removed from an SG, each remaining server must look around for any IP addresses that it previously PUSHED, and set them up with their previous maximum lease time in order to catch this case.

7.4.3.  SG DISCOVERY QUERY

   The server groups to which the current server belongs are returned as the response to an SG DISCOVERY QUERY message.

7.4.4.  SG CONFIGURATION QUERY

   The SG CONFIGURATION QUERY operation has several suboperations, corresponding to the following types of configuration information:

   o  Subnets

      The specific subnets managed by this SG are returned as part of this operation.

   o  IP Addresses

      The IP addresses which are managed by this SG within this subnet are returned as the result of this operation.

   o  Client Configuration Information

      The client configuration information associated with this subnet is returned as the result of this operation.

   o  Vendor Specific Information

      Provision is made for vendor specific configuration information to be returned in the SG CONFIGURATION QUERY message. Its format is TBD, but should be regular even though vendor specific.

8.  SCSP Message Mapping

   This section develops the SCSP capabilities supporting the DHCP interserver protocol. The Server Cache Synchronization Protocol (SCSP) is found in [2]. This section is organized as follows: 1) we present a brief overview of SCSP (and refer to appendices for a more detailed discussion), 2) we discuss the mapping of the DHCP interserver protocol onto SCSP and how the various DHCP interserver messages are mapped into SCSP messages, 3) we identify the modifications to the SCSP protocol as identified in [2] necessary for the mapping of the DHCP interserver protocol onto SCSP, 4) we present the specific formats of the DHCP protocol specific SCSP records, and 5) we present a list of the open issues with respect to the mapping onto SCSP.

8.1.  SCSP Overview

   The Server Cache Synchronization Protocol (SCSP) is a protocol which provides the generic functions necessary to provide loose synchronization between a set of distributed databases.
   The protocol, which is presented in [2], was developed specifically to address the issues associated with synchronizing the caches of redundant servers which provide the server functionality of a specific client-server protocol. SCSP was built based upon the extensive experience in developing and running link state routing protocols such as OSPF [3]. Client-server protocols for which a redundant server capability is being developed using SCSP are NHRP [4] and ATM ARP [5]. Here we present the use of SCSP to synchronize servers supporting the DHCPv4 client-server protocol.

   The SCSP protocol consists of three separate sub-protocols, i.e.,

   o  The "Hello" protocol: this protocol defines and maintains the status of the inter-server connection,

   o  The "Cache Alignment" protocol: this protocol defines the cache synchronization capability for new servers and servers that, for whatever reason, have lost synchronization, and

   o  The "Client State Update" protocol: this protocol provides the ongoing server cache synchronization through asynchronous client state updates.

   These sub-protocols define the semantics and high-level syntax of generic message sets and their exchanges in support of the capabilities provided. The SCSP associates replica databases into Server Groups (SG). The SCSP supports both point-to-point and point-to-multipoint connections between the local servers (LS) and the directly connected servers (DCSes). We discuss each of these sub-protocols in more detail in the appendices below.

   SCSP defines five message types in the operation of the above sub-protocols:

   o  Hello

   o  Cache Alignment (CA)

   o  Client State Update (CSU) Solicit (CSU_Sol)

   o  CSU Request (CSU_Req)

   o  CSU Reply (CSU_Rep).

   The Hello and the CA messages are used within the Hello and the Cache Alignment sub-protocols, respectively.
   The CSU_Sol, CSU_Req and CSU_Rep messages are used to distribute cache records between the distributed servers of a server group. Full records are called Client State Advertisement (CSA) records. Summary records, which are essentially pointers to the full records, are called Client State Advertisement Summary (CSAS) records. For a server to request a particular record, it can send a CSU_Sol message containing the CSAS to indicate the full record of interest. A server which receives a CSU_Sol is required to respond with a CSU_Req message containing the full CSA record associated with the CSAS of the CSU_Sol. The soliciting server follows the receipt of the CSU_Req with a CSU_Rep to acknowledge receipt. A server which wishes to communicate a full record to the rest of the SG would transmit a CSU_Req message containing the full CSA record. This is acknowledged with a CSU_Rep message.

   DISCUSSION:

      In some cases the CSU_Sol, CSU_Req, CSU_Rep sequence is overkill when one wants to perform a simple query operation. See the discussion at the end of Section 8.3 for more details.

   For now we accept that these capabilities are generically provided and discuss the DHCPv4 interserver protocol specific overlay on SCSP.

8.2.  Mapping DHCP interserver onto SCSP

   This section presents the relationship of SCSP to the DHCP interserver protocol, the assumptions made in developing this relationship and the specific mappings of DHCP interserver messages into SCSP.

   The assumptions made in defining the DHCP client/server protocol mapping onto SCSP are the following:

   o  On the Issue of Protocol Encapsulation:

      The assumption is that the SCSP messages, and in fact all interserver messages, are to be defined over UDP. Currently the SCSP messages within [2] are LLC/SNAP encapsulated.
   o  On the Interserver over SCSP Layering Model:

      The interserver group management protocol will initialize a server into the group upon initial join, re-booting or re-connecting. Once this is complete the interserver group management protocol will initialize the SCSP protocol to handle the ongoing operation of the interserver cache alignment and address management functions.

   o  On the DHCP Interserver Sub-Protocols:

      The current thinking goes as follows. The draft specification defines three DHCP interserver sub-protocols, i.e., the 'Client Binding Management' protocol (see Section 4), the 'Address Management' protocol (see Section 5), and the 'Group Management' protocol (see Section 7). The 'Client Binding Management' sub-protocol addresses the core of the interserver protocol in that it distributes and maintains the client binding records over the distributed SG. This sub-protocol is to be mapped onto SCSP and is assigned a unique SCSP 'Protocol ID' value, e.g., the SCSP ProtID = 4 assigned to DHCP. For this draft we assume that the Group Management sub-protocol is run on a separate UDP port from the SCSP UDP port. The Group Management sub-protocol will be assigned a unique UDP port number (TBD). We had no compelling reason to carry the Address Management sub-protocol on SCSP as we do the Client Binding Management sub-protocol; however, for this draft we maintain both of these sub-protocols within SCSP. If at a later date it is deemed useful to separate these two protocols, 1) we can define separate SCSP protocol types for the Cache Management and the Address Management protocols, yet support them with a common Hello protocol link via the Hello protocol Family type field, or 2) we can move the Address Management sub-protocol out of SCSP, as in the case of the Group Management sub-protocol.
      The mappings between the interserver messages and the SCSP messages will cover the interserver messages handling client binding and address management, but not the group management protocol functions of the interserver protocol. The group management messages are to be defined outside of SCSP; however, these messages will follow the syntax of the SCSP message sets to simplify the parsing of the total message sets required within the DHCP interserver protocol.

      The client binding management operations are CLIENT BINDING COMPLETE PUSH and CLIENT BINDING POLL. CLIENT BINDING COMPLETE PUSH is required to distribute binding information and to increase the initial lease period to the desirable lease period. The CLIENT BINDING POLL is required to solicit information on client bindings in the event that the specific server has no record of the client requested binding. The Interserver messages supporting these operations are the CLIENT BINDING UPDATE and the CLIENT BINDING QUERY messages, respectively. The SCSP records for these operations are 'Binding' records for the update and query messages.

      The Address Management operations are UNBINDABLE COMPLETE POLL and TRANSFER. The UNBINDABLE COMPLETE POLL initializes an address as bindable by the LS. The TRANSFER allows for the transfer of a block of bindable addresses between servers. The Interserver messages supporting these operations are the UNBINDABLE QUERY and the TRANSFER messages. The SCSP records for these operations are 'Address' records for the UNBINDABLE QUERY and 'Bindable Block Address' records for the TRANSFER messages.

      The Group Management messages are SG DISCOVERY QUERY, SG CONFIGURATION QUERY, SG PROPOSE CHANGE UPDATE and SG COMMIT CHANGE UPDATE.
      The SCSP records associated with these operations are 'SG Specifier' records for the SG DISCOVERY QUERY, 'SG Subnets' records for the SG CONFIGURATION QUERY, 'SG Members' records for the SG DISCOVERY QUERY, and 'SG Proposed Members' records for the SG PROPOSE CHANGE UPDATE and SG COMMIT CHANGE UPDATE messages.

   o  On DHCP Interserver Authentication:

      The interserver protocol will rely on the authentication extensions within SCSP for the SCSP message authentication between servers within a server group. The authentication of the interserver group management protocol messages is TBD.

   o  On the Notion of Server Ownership of Binding Records:

      It will be assumed that once the initial client binding record is generated by a particular server, that record will indicate that server as the originating server in the SCSP 'Originating Server ID' field. When any further change is made to that binding, whether by the originating server or by another server (e.g., the originating server is down and the client is rebinding and getting a lease extension from another server), the server making the change sets the Originating Server ID in the SCSP record field to indicate itself as the last transaction server.

   o  On a More Efficient Cache Alignment Process:

      The cache alignment process can be made more efficient if the servers time stamp their cache records. In the event that the connections between servers fail, the servers determine and record the failure time. Upon reconnecting and cache alignment, the SCSP CRL list can be limited to those records that are more 'recent' than the failure, which greatly reduces the time and the bandwidth required. The details are presented below.

      Also, it is not necessary to perform a cache alignment of the address records for the proper operation of the Interserver protocol. Therefore, we assume that the SCSP cache alignment process will not include these address records when building the SCSP CRL.
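
      As an illustration only, the time-stamp based limiting of the CRL described in the item above might be sketched as follows. The record layout, the field names and the function name build_crl are hypothetical and are not defined by this draft or by SCSP; passing None as the failure time is likewise an assumption of the sketch, standing for "no failure time recorded", in which case no record is filtered out by time.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CacheRecord:
    """Hypothetical cache entry; the field names are illustrative only."""
    key: str          # cache key, e.g. the IP address of the record
    kind: str         # "binding" or "address"
    timestamp: float  # when the entry was received, created or modified


def build_crl(records: Iterable[CacheRecord],
              failure_time: Optional[float]) -> List[CacheRecord]:
    """Select the records to offer during cache alignment.

    Address records are always skipped, since they are not aligned.
    Binding records are kept only if they are more recent than the
    recorded failure time; with no failure time, all are kept.
    """
    crl = []
    for rec in records:
        if rec.kind != "binding":
            continue  # address records are left out of the CRL
        if failure_time is None or rec.timestamp > failure_time:
            crl.append(rec)
    return crl


cache = [
    CacheRecord("10.0.0.1", "binding", 100.0),
    CacheRecord("10.0.0.2", "binding", 200.0),
    CacheRecord("10.0.0.3", "address", 300.0),
]
after_failure = build_crl(cache, failure_time=150.0)
no_failure_recorded = build_crl(cache, failure_time=None)
```

      In this sketch a connection that failed at time 150 leads to only the binding for 10.0.0.2 being offered, while the address record is omitted in both calls.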
   o  On the More Recent Record Determination:

      SCSP relies on the ability to identify which of two records is the more recent, based upon the CSA Sequence Number, when aligning and updating the cache. For binding records this implies that in situations where it is clear that a single server is updating the binding, e.g., extending the lease, it should increment the CSA Sequence Number by one. However, there are situations in DHCP where multiple servers can simultaneously update the client binding and it is not clear which of these updated bindings is accepted by the client, e.g., the client is in the rebinding state, the originating server is down, the other servers receive the client's broadcast request, and the client gets multiple DHCPACKs extending the lease. In these situations the servers are required to increment the CSA Sequence Numbers by one and indicate that they are the last transaction server. Then, when a server caches the record, if it already has a cache record for that binding (as indicated by the Cache Key) it should replace the existing record only if the new record indicates a lease period which is greater than that of the existing record.

   o  On Maximally Defined Binding Records (or the B. Hibbs Question):

      B. Hibbs posed the question regarding the nature of the configuration synchronization of the servers within the same SG: does the DHCP Interserver protocol require synchronization of all configuration parameters or a subset? We are assuming that there is a minimal set of configuration and client binding information to be synchronized across the members of the SG to ensure the correct operation of the DHCP Client/Server protocol. This information must be carried in the interserver messages to synchronize the members in the SG with respect to this information.
      Further, there may be other client binding information that the members want to communicate; we currently have this information encoded as optional in this draft.

      The parameters encoded into the 'Client Binding' records are those which are minimally required for the correct operation of the DHCP Client/Server protocol. The interserver protocol should allow for situations where the configurations of the servers of the same server group are not strictly aligned; their configurations are only required to be aligned in the specification of the subnets and masks that are covered within an SG and the list of assignable addresses within each of the subnets. However, because clients' DHCPDISCOVER messages can contain client specific requests for parameters, it may be desirable to embed a fuller set of parameters (committed to the client in the DHCPOFFER message) within the CSA record. This fuller set of parameters may be included in the initial CLIENT BINDING COMPLETE PUSH (encoded in the optional fields location in the record). The server in receipt of a CLIENT BINDING COMPLETE PUSH may choose not to cache or forward these optional parameters.

   o  On Knowledge Obtained Through the SCSP Hello Protocol:

      The SCSP Hello protocol maintains current status of the interserver connectivity through a polling mechanism. This status information can be used to influence the actions of the LS, e.g., in the event that the LS has lost connectivity to a DCS, it should not perform a COMPLETE POLL operation.

   o  On the SG Connectivity:

      It is likely that the servers of the SG are required to be fully interconnected, i.e., an LS is a DCS to all other servers of the SG. It was first thought that this would aid in determining the status of the SG, i.e., whether the SG was 'up' (fully functioning) or 'down' (not fully functioning).
      However, on further inspection this is not true, i.e., the loss of connectivity between a pair of servers in a fully connected SG does not imply that the remaining servers have lost connectivity to one another. Full mesh connectivity may still be required for the correct operation of the Address Management protocol. This is currently under study.

   When a new server wishes to join a server group, it must initialize itself to the other members of the server group through the interserver Group Management Protocol defined above. Once this has occurred, the local server must initiate SCSP, which then will align its client binding cache to that of the server group. It should then acquire Bindable addresses and fully participate in the on-going client binding update functions of the server group.

   This process is outlined in the state diagram below for the DHCP interserver protocol. The Group Management protocol handles the new server joining the group. Once this has occurred, the new server and all the other servers of the server group initiate the SCSP Hello Protocol on a pairwise basis. Per the discussion in the SCSP specification, once bi-directional connectivity is verified and is being monitored within the SCSP Hello protocol, the servers enter into the cache alignment and then the ongoing cache and address management functions. In the event that the servers transition to the 'DOWN' state, polling will continue until connectivity is re-established.

   The Group Management Protocol does not allow additions to the membership in the event that the SG is down. However, it does allow for the removal of a server from the SG while another server is re-booting or disconnected. Therefore a re-booting or re-connecting server cannot be assured that the SG generation has remained constant during the 'DOWN' period.
Therefore, in the event that the generation number of the SG has
changed, as indicated by the generation number contained within the
interserver messages, the server needs to update its notion of the
server group through the procedures identified in the group management
protocol prior to aligning its cache.

                +------------+
                |   Group    |
                | Management |
                |  Protocol  |
                +------------+
                      |
                      |
                      V
                +------------+
                |    SCSP    |
                |   Hello    |
                +------------+
                 /    ^     \
                /     |      \
               V      |       V
   +--------------+   |    +---------------+
   |'Binding Mgmt'|   |    |Null'Addr Mgmt'|
   |    Cache     |---+----|     Cache     |
   |  Alignment   |   |    |   Alignment   |
   +--------------+   |    +---------------+
          |           |            |
          |           |            |
          V           |            V
   +--------------+   |    +------------+
   |'Binding Mgmt'|   |    | 'Addr Mgmt'|
   | Cache Update |---+----|Cache Update|
   +--------------+        +------------+

        Figure 8.2-1 Interserver State Flow Diagram

For operational efficiency, the servers should implement a scheme to
limit the number of cache records to exchange during the cache
alignment process. For example, a SG could easily be managing 10,000
client records, and the bandwidth required to pass even the summary
records needed to build the CRL table can be quite large. Therefore,
for the 'Cache Management' sub-protocol, the servers should record the
times at which the cache entries were received, created, or modified.
When the CAFSM for a particular DCS transitions to the down state,
t(down) should be recorded. Then, when the CAFSM enters the cache
alignment state, the CRL list is to be built from only those records
with time stamps more recent than t(down) - F, where F is a factor set
to a multiple of HelloInterval x DeadFactor. We recommend that the
multiple be 10. In the event that the LS crashed (causing the
transition to the down state), t(down) should be set to the last
record time stamp when the LS reboots.
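
The time-stamp filter described above can be sketched as follows. This
is only an illustration under stated assumptions: the CacheRecord type,
the function names, and the sample HelloInterval and DeadFactor values
are hypothetical and are not defined by this draft or by SCSP. Only the
cutoff t(down) - F, with F = 10 x HelloInterval x DeadFactor, comes
from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CacheRecord:
    """Hypothetical cache entry; only the time stamp matters here."""
    key: bytes
    timestamp: float  # when the entry was received, created, or modified


def alignment_cutoff(t_down: float, hello_interval: float,
                     dead_factor: float, multiple: int = 10) -> float:
    """Return t(down) - F, where F = multiple * HelloInterval * DeadFactor."""
    return t_down - multiple * hello_interval * dead_factor


def records_for_crl(cache: List[CacheRecord], t_down: float,
                    hello_interval: float, dead_factor: float,
                    multiple: int = 10) -> List[CacheRecord]:
    """Select only the records strictly more recent than t(down) - F."""
    cutoff = alignment_cutoff(t_down, hello_interval, dead_factor, multiple)
    return [rec for rec in cache if rec.timestamp > cutoff]


# Assumed sample values: HelloInterval = 3 s, DeadFactor = 3, so F = 90 s.
cache = [CacheRecord(b"a", 900.0), CacheRecord(b"b", 910.0),
         CacheRecord(b"c", 911.0), CacheRecord(b"d", 1005.0)]
selected = records_for_crl(cache, t_down=1000.0,
                           hello_interval=3.0, dead_factor=3.0)
print([rec.key for rec in selected])
```

A server that has only just joined the SG would bypass this filter and
build the CRL from every record it holds.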
In the event that the server has just joined the SG, the CRL should be
built up from all of the current cache records.

The interserver messages associated with the Client Binding Management
are: CLIENT BINDING QUERY for the CLIENT BINDING POLL operation, and
CLIENT BINDING UPDATE for the CLIENT BINDING COMPLETE PUSH operation.
These are discussed in detail in the following list items:

o The CLIENT BINDING QUERY message queries another server regarding
  the status of a particular binding. Within the SCSP protocol, this
  exchange is accomplished by the LS sending a Client State
  Update_Solicit (CSUS) message with the Client State Advertisement
  Summary (CSAS) 'Address record' of the IP address in question. The
  DCS responds with the CSU_Request message with the Client State
  Update (CSU) record associated with the CSAS. The LS then replies
  with a CSU_Reply with the 'A-bit' set.

o The CLIENT BINDING UPDATE message updates another server with a new,
  or changed, client binding. Within the SCSP protocol, this exchange
  is accomplished with the CSU_Request message carrying the specific
  CSA 'Binding record' of the client binding in question. The DCS
  responds with the CSU_Reply with the 'A-bit' set.

The interserver messages associated with the Address Management are:
UNBINDABLE QUERY for the UNBINDABLE COMPLETE POLL operation, and
TRANSFER messages for the TRANSFER operation. These are discussed in
detail in the following list items:

o The UNBINDABLE QUERY message queries another server of the SG
  regarding the status of a particular address with the intent of
  making that address bindable to the LS. Within the SCSP protocol,
  this exchange is accomplished by the LS sending a CSU_Solicit with
  the CSAS 'Address' record of the IP address in question to all other
  servers of the SG. The DCSes respond with the CSU_Request message
  with the CSA 'Address' record indicating the status of the address
  within the DCS.
  The LS then replies with the CSU_Reply message to the DCS with the
  'A-bit' set.

o The 'TRANSFER' operation is initiated by the LS to request a
  transfer of bindable addresses from the DCS to the LS. Within the
  SCSP protocol, this exchange is accomplished by a two-step process.
  First, the LS sends a CSU_Request message with the CSA 'Subnet
  Bindable Addresses' record to the DCS, which then responds with a
  CSU_Reply. The CSA 'Subnet Bindable Addresses' record indicates the
  subnet in question, the number of BINDABLE addresses owned by the LS
  and the number of additional BINDABLE addresses the LS is
  requesting. Second, this is immediately followed by the DCS sending
  a CSU_Request message with a CSA 'Subnet Bindable Addresses' record
  for the given subnet in question. The DCS' CSA 'Subnet Bindable
  Addresses' record indicates the subnet in question and the number
  and values of the IP addresses that the DCS is transferring to the
  LS based upon its previous request. This is based upon the DCS'
  current understanding of the supply of bindable addresses within the
  LS and its local knowledge of its own set of bindable addresses for
  this subnet. This CSU_Request will generate a CSU_Reply from the
  originating LS. When sending the CSU_Request message, the DCS sets
  the addresses it is transferring to the LS as UNBINDABLE. The LS
  then moves these addresses to its list of BINDABLE addresses and
  sends a CSU_Reply to the DCS with the 'A-bit' set.

The interserver messages associated with the Group Management
operations are: SG DISCOVERY QUERY, SG CONFIGURATION QUERY, SG PROPOSE
CHANGE UPDATE, and SG COMMIT CHANGE UPDATE messages. These are
discussed in detail in the following list items:

o The SG DISCOVERY QUERY message queries the DCS for the list of SGs
  in which it is currently participating.
  Within the SCSP protocol, this exchange is accomplished by the LS
  sending a CSU_Solicit with the CSAS 'Server Groups' record, and the
  DCS replies with the CSU_Request message containing the CSA 'Server
  Groups' record. This record contains the list of SG specifiers,
  i.e., SG ID and SG Generation Number (GN) pairs. The LS replies with
  a CSU_Reply.

o The SG CONFIGURATION QUERY message queries the DCS for its
  configuration information. This information is passed within the
  'SG Subnets Configuration' record. The LS initiates this query by
  sending a CSU_Solicit containing the CSAS 'SG Subnets Configuration'
  summary record. The DCS responds with a CSU_Request containing the
  CSA 'SG Subnets Configuration' record. The LS replies with the
  CSU_Reply message.

o The SG PROPOSE CHANGE UPDATE message proposes the new member to the
  rest of the SG. This is accomplished with a SCSP CSU_Req message
  carrying the 'SG Proposed Members' record. The SG COMMIT CHANGE
  UPDATE message consummates the new server joining the SG. Once the
  joining member has received a positive CSU_Reply from all of the
  current members of the SG as part of the proposal phase, it then
  moves to the join commit phase. The new server now issues an SCSP
  CSU_Req message with the 'SG Members' record, which adds the newly
  joined member to the list of servers of the SG.

o The SG PROPOSE CHANGE UPDATE message may also be used to propose the
  removal of an existing server from the membership of the SG. This is
  accomplished with a SCSP CSU_Req message carrying the 'SG Proposed
  Members' record containing all of the existing members of the SG
  minus the server ID to be removed. The SG COMMIT CHANGE UPDATE
  message consummates the existing server leaving the SG.
  Once the removing member, i.e., the member who is actively removing
  the existing member from the group, has received a positive
  CSU_Reply from all of the current members of the SG (except for the
  member being removed) as part of the proposal phase, it then moves
  to the remove commit phase. The removing server now issues an SCSP
  CSU_Req message with the 'SG Members' record carrying the new
  membership minus the removed server.

8.3. Necessary Modifications to SCSP

The SCSP modifications required to support the DHCP interserver
protocol are as follows:

o The operation of the SCSP protocol in this application is initiated
  upon the successful completion of the interserver 'Group Management
  Protocol'.

o The SCSP messages, and in fact all of the DHCP interserver messages,
  are carried in UDP packets. Therefore a UDP port number needs to be
  defined for SCSP.

  DISCUSSION:
    Currently SCSP is defined only for NBMA networks. This manifests
    itself in two ways: a) the operation of the SCSP protocol is
    initiated upon the establishment of NBMA connectivity, i.e., a
    virtual circuit being established, and b) the SCSP messages are
    encapsulated into link level frames using the LLC/SNAP
    encapsulation method.

    Instead of relying upon the establishment of a virtual circuit
    connection, the interserver protocol will initiate the SCSP
    protocol based upon the results of the 'Group Management
    Protocol'. This divorces the operation of the interserver protocol
    from the specifics of the link layer. Also, by carrying the
    messages within UDP, the protocol achieves independence in the
    deployment and proximity of the servers which are members of the
    same server group, i.e., servers are not required to have an
    interface on a common subnet.

    Because SCSP provides a generic capability to synchronize caches
    in distributed servers, it is best to define a separate UDP port
    number for the 'generic' SCSP protocol and a separate UDP port for
    the DHCP interserver Group Management protocol.
    These UDP port numbers are TBD.

o A SG Generation Number SCSP extension field needs to be defined.

  DISCUSSION:
    We have defined the notion of a Server Group Generation Number to
    distinguish between the various instantiations of a particular SG.
    The membership of a particular SG will change over time. Because
    it is necessary for the correct operation of the DHCP interserver
    protocol for each server to know the current membership, it was
    deemed necessary to define a Generation Number which is
    incremented each time a new server joins the SG or an existing
    server is removed from the SG. This GN is to be carried in every
    interserver message. No obvious place existed within the SCSP
    message formats to carry such information. Therefore, we have
    chosen to define a new SCSP extension type and will carry the GN
    in this extension.

o Some modification to the Authentication extension in the SCSP
  protocol may be required.

  DISCUSSION:
    Currently SCSP states that the authentication extension covers the
    SCSP message other than the extensions. However, we have chosen to
    carry a new extension within the SCSP messages: the Generation
    Number. Ideally we would prefer that this extension be protected
    by the authentication extension. Because it is not, we will also
    include the Generation Number in the SG Specifier record. Through
    this record a server may reverify the current Generation Number
    through a protected channel.

o The three-step Solicit_Request_Reply seems excessive when one server
  wishes to simply query another server. Perhaps this could be
  simplified (when desirable) by adding a bit to the CSU_Solicit
  message indicating whether the soliciting server wishes the DCS to
  expect or not to expect a CSU_Rep from the soliciting server.

  DISCUSSION:
    Currently SCSP specifies a three-step process: a CSU_Sol, followed
    by a CSU_Req, which is then followed by a CSU_Rep. In certain
    situations this may be a desirable sequence.
    However, in other situations it may not be necessary. When the
    CSU_Sol is sent, a CSUSReXmtInterval timer is set which tracks the
    status of the receipt of the requested CSU_Req records. For simple
    queries, this re-transmit timer may be sufficient. Therefore, it
    seems reasonable that the DCS should expect a CSU_Rep from the LS
    which sent the CSU_Sol message.

8.4. DHCP Specific CSA and CSAS Records

This section presents the CSA and the CSAS records specific to the
DHCP inter-server protocol. The mappings of the interserver protocol
onto SCSP messages discussed in the previous section rely upon the
definition of a number of record types. These record types will be
distinguished within the CSAS-defined 'Cache Key', which for the
purpose of running the DHCP interserver protocol will consist of a
TYPE/Key pair. The following CSAS and CSA record types are required to
run the interserver protocol:

For Client Binding Management:

o Binding Record - contains the complete client binding information.

For Address Management:

o Address Record - contains the status of a specific IP address, e.g.,
  unbindable, bindable, bound, or expired.

o Subnet Bindable Record - contains information regarding the subnet
  addresses, e.g., number of bindable addresses.

For Group Management:

o SG Specifier Record - contains the current Server Group specifiers,
  i.e., the SG ID (which is fixed for the duration of the life of the
  SG) and the SG Generation Number, which is incremented each time a
  server is added or deleted.

o SG Members Record - contains the current list of member servers of
  the SG.

o SG Subnets Configuration Record - contains a list of all subnets,
  i.e., subnet address and mask, for all of the subnets served by the
  SG as well as the assignable addresses per subnet, and potentially
  other configuration parameters necessary for the proper operation of
  the DHCP interserver protocol.
o SG Proposed Members Record - contains a list of the proposed member
  servers of the SG used in the group join proposal process. This
  record has a finite duration associated with it and times out if the
  proposed join fails.

8.4.1. The SCSP CSAS Records for the Interserver Protocol

The CSAS record is completely specified in [2]. The format of the CSAS
record is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Hop Count           |         Record Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Cache Key Len |  Orig ID Len  |N|           unused            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      CSA Sequence Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Cache Key (variable)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Originator ID (variable)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 8.4.1-1 SCSP CSAS Record Format

where:

o Hop Count - this represents the number of hops that the record may
  take before being dropped.

o Record Length - this is the length in bytes of the CSAS record if
  stand-alone, otherwise it is the length in bytes of the CSAS record
  and the protocol specific part of the cache entry combined, i.e.,
  the length of the CSA record.

o Cache Key Length - this is the length of the Cache Key field in
  bytes.

o Originator ID Length - this is the length of the Originator ID field
  in bytes.

o N bit - this bit, when set, signifies a Null record. This may be the
  case when the LS receives a solicitation for a record that has been
  released by the DHCP client.

o CSA Sequence Number - this field contains the sequence number that
  identifies the 'newness' of a CSA record instance being summarized.
  This number is assigned by the originator of the CSA record, i.e.,
  the last transaction server.
o Cache Key - is an opaque string used by the receiving server to
  identify the cache entry referred to by the record. For the purposes
  of running the DHCP interserver protocol, the Cache Key will be
  encoded as a Type/Key pair, where the Type is an 8 bit field and the
  length of the Key is derived from the Cache Key Length field in the
  header. The Type indicates the type of record and equivalently the
  Interserver message type, e.g., Unbindable Address Query, SG
  Configuration Query, etc. The 8 bit type encodings are defined in
  the table below.

o Originator ID - this field contains an ID which is administratively
  assigned to the server which is the originator of the CSA record.
  For the DHCP interserver mapping, the Originating Server ID is
  chosen to be the IP address of the server. In the event that the
  server has multiple IP addresses assigned to it, then the
  Originating Server ID is set to the IP address with the highest
  value.

The CSAS record is specified by SCSP except for the specifics of the
Cache Key and the Originator ID. As stated above, for the DHCP
interserver specification the Originating Server ID is the IP address
of the server, or the highest-valued IP address if the server has
several.

The Cache Key used is dependent upon the specific CSAS record in
question. The table below identifies the specific Cache Keys for the
various CSAS records within the DHCP interserver protocol. These are
composed of a type and key field, both of which are identified in the
table.

   Table 8.4.1-1 Cache Keys for the various CSAS and CSA records

   Record Type            | Encoding | Key
   --------------------------------------------------
                          |          |
   Client Binding         |   0x00   | Client ID
                          |          | or hwaddr
   Address                |   0x10   | IP addr
                          |          |
   Subnet Bindable Addrs  |   0x11   | Subnet/Mask *
                          |          |
   SG Specifiers          |   0x20   | IP addr
                          |          |
   SG Subnet Configs      |   0x21   | SG ID
                          |          |
   SG Members             |   0x22   | SG ID/SG GN **
                          |          |
   SG Proposed Members    |   0x23   | SG ID/SG GN **

   *  The subnet address and the subnet mask will be encoded as 32 bit
      strings with the subnet address followed by the subnet mask.

   ** The SG ID and SG GN are encoded as 16 bit strings with the SG ID
      first, immediately followed by the SG GN.

8.4.2. The SCSP CSA Records for the Interserver Protocol

There are several types of DHCP specific CSA records defined,
corresponding to each of the CSAS record types discussed above and
found in Table 8.4.1-1. For many of these records, DHCP options appear
in the records in the same format as specified in [7]. The records
are:

o The Client Binding record carries the complete client binding
  information. The Key for this record is the chaddr or the 'client
  ID' from the optional DHCP extension. This is utilized in the Cache
  Mgmt sub-protocol in handling the COMPLETE PUSH, POLL and SCSP cache
  alignment operations.

o The Address record carries the information required to achieve the
  desired response from the CSU_Solicit message. The Key is the IP
  address. This is utilized in the Address Mgmt sub-protocol in
  handling the UNBINDABLE COMPLETE POLL operation.

o The Subnet Bindable Address record carries the information required
  to determine the status of the available IP addresses which are
  bindable to the DCS and which it is willing to transfer to the LS.
  The Key for this record is the subnet address and mask of the subnet
  in question. This is utilized in the Address Mgmt sub-protocol by
  the TRANSFER operation.
o The SG Specifier record contains the total list of SG specifiers,
  i.e., SG ID and SG GN pairs, of which the server in question is
  currently a member. This is utilized in the Group Mgmt sub-protocol
  by the DISCOVERY operation. The Key for this record is the Server
  ID, i.e., the IP address of the server.

o The SG Members record contains a list of the Server IDs which
  comprise the SG in question. This is utilized in the Group Mgmt
  sub-protocol by the DISCOVER MEMBERS operation. The Key for this
  record is the SG Specifier, i.e., the SG ID and SG GN pair.

o The SG Proposed Members record contains a list of the SG members,
  including the newly proposed member, of the server group. This is
  utilized in the Group Mgmt sub-protocol by the PROPOSE JOIN
  operation. The Key for this record is the SG Specifier, i.e., the SG
  ID and SG GN pair where the SG GN is one greater than the current GN
  of the SG.

8.4.2.1. Binding Records

The approach taken in defining the Client Binding record is as
follows. It is possible, while still maintaining the correct operation
of the DHCP client/server protocol, to have different server
configurations within the same server group with respect to certain
parameters. For these parameters we do not require synchronization of
the server configurations, and we make the passing of these parameters
optional. However, there are some configuration parameters and binding
information that are critical to the correct operation of the
protocol. For these client parameters we require that they be included
in the Client Binding records. The minimal, required set of parameters
to be sent in the Client Binding are the IP address (ciaddr), the
lease period, the last transaction type, the client hardware address,
the Client-Identifier and the Renewal (T1) and Rebinding (T2) Time
values (if present in the DHCP options extensions of the DHCPACK).
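
The required set listed above can be sketched as a small data structure
together with the encoding of its DHCP-option fields. This is a minimal
sketch, not a definitive implementation: the ClientBinding class and
its field names are hypothetical. The option codes (51, 58, 59, 61 and
255) are those of RFC 2132 [7], each option is encoded as tag, length,
value, and the fixed (non-option) part of the record is not encoded
here.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# DHCP option codes from RFC 2132.
OPT_LEASE_TIME = 51
OPT_RENEWAL_T1 = 58
OPT_REBINDING_T2 = 59
OPT_CLIENT_ID = 61
OPT_END = 255


def _option(tag: int, value: bytes) -> bytes:
    """Encode one DHCP option as tag, length, value."""
    return bytes([tag, len(value)]) + value


@dataclass
class ClientBinding:
    """Hypothetical container for the minimal binding parameters."""
    ciaddr: str                      # client IP address, dotted quad
    lease_time: int                  # seconds
    last_transaction_type: int       # LTT code
    chaddr: bytes                    # client hardware address
    client_id: Optional[bytes] = None
    renewal_t1: Optional[int] = None
    rebinding_t2: Optional[int] = None

    def encode_options(self) -> bytes:
        """Encode the lease time, then whichever optional values are
        present, then the End option."""
        out = _option(OPT_LEASE_TIME, struct.pack("!I", self.lease_time))
        if self.client_id is not None:
            out += _option(OPT_CLIENT_ID, self.client_id)
        if self.renewal_t1 is not None:
            out += _option(OPT_RENEWAL_T1, struct.pack("!I", self.renewal_t1))
        if self.rebinding_t2 is not None:
            out += _option(OPT_REBINDING_T2,
                           struct.pack("!I", self.rebinding_t2))
        return out + bytes([OPT_END])


binding = ClientBinding(ciaddr="135.16.114.7", lease_time=3600,
                        last_transaction_type=0,
                        chaddr=bytes.fromhex("00a0c9112233"),
                        client_id=b"\x01\xaa",
                        renewal_t1=1800, rebinding_t2=3150)
encoded = binding.encode_options()
print(encoded.hex())
```

Each 4-octet time value occupies 6 octets on the wire once its tag and
length octets are included.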
The format of the CSA Binding record for the DHCP inter-server
protocol is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    CSAS Record (variable)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  LTT  |resrv'd|     HTYPE     |     HLEN      |    resrv'd    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    CHADDR (HLEN in octets)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       CIADDR (4 octet)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Last Transaction Time (4 octet)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      IP Address Lease Time (encoded as tag=51) (6 octet)      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Optional ClientID (encoded as tag=61) (variable)        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Optional Renewal Time (encoded as tag=58) (6 octet)      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Optional Rebinding Time (encoded as tag=59) (6 octet)     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Other desirable DHCP extensions (variable)           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  End Option (encoded as in BOOTP options, tag=255) (1 octet)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 8.4.2.1-1 DHCP inter-server CSA Binding record format

where:

o CSAS Record - represents the full CSAS record as identified in
  Section 8.4.1.

o LTT - indicates the Last Transaction Type. The allowed LTTs are:
  DHCPREQUEST/SELECTING (0x0), DHCPREQUEST/INIT-REBOOT (0x1),
  DHCPREQUEST/RENEWING (0x2), DHCPREQUEST/REBINDING (0x3), DHCPRELEASE
  (0x4), and EXPIRATION (0x5).
o HTYPE - hardware address type (defined in [1])

o HLEN - hardware address length

o CHADDR - client hardware address

o CIADDR - client IP address (if assigned). If not assigned, this
  field is all 0s.

o Last Transaction Time - the time, in seconds relative to now, of the
  last transaction associated with the LTT as indicated in the
  message.

o IP Address Lease Time - the IP Address Lease Time encoded as in the
  DHCP options and BOOTP vendor extensions defined in [7]. This
  represents the time from now that the client lease is to expire.

o (Optional) Client ID - this field is the optional Client ID encoded
  as in the DHCP options and BOOTP vendor extensions defined in RFC
  2132 [7]. If present, the Client ID is the 'search string'.

o (Optional) Renewal Time - this field is the optional Client Renewal
  Time (T1) as encoded in the DHCP options and BOOTP vendor extensions
  defined in RFC 2132 [7].

o (Optional) Rebinding Time - this field is the optional Client
  Rebinding Time (T2) as encoded in the DHCP options and BOOTP vendor
  extensions defined in RFC 2132 [7].

o Remaining Options - any remaining options carried in the original
  DHCPOFFER message to the client, encoded as in the DHCP options and
  BOOTP vendor extensions defined in [7].

o End option - marks the end of the record.

  DISCUSSION:
    As discussed in the previous section on the CSAS record format,
    the format shown above is intended to be the Binding type CSA
    record. The binding record is used in the PUSH and COMPLETE PUSH
    operations to transfer to the DCSes the newly created or changed
    binding, and in the cache alignment procedures.

    The structure of the Client Binding is divided, for the purpose of
    the DHCP interserver protocol, into a mandatory part and an
    optional part. The mandatory part is everything up to and
    including the (Optional) Rebinding Time. The optional part is
    everything following the (Optional) Rebinding Time.
    The PUSHing server may include any additional parameters which
    were part of the DHCPACK message to the client within the Client
    Binding Record and encode them as defined in the DHCP options and
    BOOTP vendor extensions defined in RFC 2132 [7]. The server which
    is the recipient of the PUSH may choose to save and forward these
    optional parameters in the record, or may choose not to save and
    forward them.

8.4.2.2. Address Records

The format of the CSA Address record for the DHCP inter-server
protocol is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    CSAS Record (variable)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      ST       |                   reserved                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 8.4.2.2-1 DHCP inter-server CSA Address record format

where:

o CSAS Record - represents the full CSAS record as identified in
  Section 8.4.1.

o ST - represents the state of the (client) record, e.g., unbindable,
  bindable, bound, expired, polling, static.

  DISCUSSION:
    The Address record is used within the UNBINDABLE COMPLETE POLL
    operation to move an unbindable address to a bindable address. The
    POLLed server returns the Address record indicating the current
    status of the address within the server. If all of the servers
    indicate that the address is unbindable, then and only then will
    the LS move the address to its Bindable pool.

    The ST field indicates the server's view of the state of the
    address. The states (defined in Section 3.4.2) are: UNBINDABLE,
    POLLING, BINDABLE, BOUND, PUSHED, and EXPIRED.
    The IP address states are encoded in the following manner:

   Table 8.4.2.2-1 IP Address State Encodings

   IP Address State       | Encoding
   --------------------------------------------------
                          |
   UNBINDABLE             |   0x01
   POLLING                |   0x02
   BINDABLE               |   0x03
   BOUND                  |   0x04
   PUSHED                 |   0x05
   EXPIRED                |   0x06

8.4.2.3. Subnet Bindable Addresses Record

The CSA Subnet Bindable Addresses record indicates the set of
addresses that a server is willing to TRANSFER to a requesting server.
This record is used in the TRANSFER operation. The format of the CSA
Subnet Bindable Addresses record for the DHCP inter-server protocol
is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    CSAS Record (variable)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | No. Addresses |No. Addr.Ranges|R|  reserved   |No.Ownd|No.Reqd|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     List of IP Addresses                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 8.4.2.3-1 DHCP inter-server CSA Subnet Bindable
                         Addresses record format

where:

o CSAS Record - represents the full CSAS record as identified in
  Section 8.4.1.

o No. Addresses - indicates the number of IP addresses contained
  within the subnet record. These are the addresses that the DCS is
  transferring to the LS as part of the TRANSFER operation. This is
  set to 0 when the R-bit is set to 1 (see R-bit below).

o No. Addr. Ranges - indicates the number of IP address ranges of the
  form 135.16.114.5 to 135.16.114.235. These will immediately follow
  the listing of the individual addresses. This is set to 0 when the
  R-bit is set to 1 (see R-bit below).

o R - represents the request bit. When this bit is set to 1, it
  indicates that the LS is requesting BINDABLE addresses from the DCS
  as part of the TRANSFER operation.
  When it is set to 0, it indicates that the DCS is transferring these
  addresses to the LS.

o No. Ownd - indicates the current number of BINDABLE addresses owned
  by the LS when the R-bit is set to 1.

o No. Reqd - indicates the number of additional BINDABLE addresses
  requested by the LS when the R-bit is set to 1.

o List of IP Addresses - this is a consecutive list of IP addresses
  and address ranges.

  DISCUSSION:
    The Subnet record is used in the TRANSFER operation to indicate 1)
    the list of bindable IP addresses that the DCS is willing to
    transfer to the LS when the R bit is 0, and 2) the IP addresses
    that the LS is requesting when the R bit is 1.

    Further, it may be useful to develop similar records for Subnet
    UNBINDABLE, BOUND, PUSHED, and EXPIRED addresses. They can have an
    identical record format and be distinguished through the 8 bit
    type field encoded into the SCSP Cache Key. The utility of these
    record types is TBD.

8.4.2.4. SG Specifier Record

The CSA SG Specifier Record indicates the total list of DHCP
Interserver protocol Server Groups of which the DCS is currently a
member. This is used in the Group Management subprotocol during the
initial contact of a prospective new member with the Server Group. The
format of the CSA SG Specifier Record for the DHCP inter-server
protocol is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    CSAS Record (variable)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |No. Specifiers |                   reserved                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    List of Specifier Pairs                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 8.4.2.4-1 DHCP inter-server CSA SG Specifiers record
                         format

where:

o CSAS Record - represents the full CSAS record as identified in
  Section 8.4.1.

o No. Specifiers - is a count of the number of specifier pairs
  contained within this CSA record.

o List of Specifier Pairs - represents a consecutive listing of the
  specifier pairs of which the DCS is currently a member. The encoding
  of the specifier pairs is SG ID first, which is a 16 bit string,
  followed by the SG Generation Number, which is also a 16 bit string.

  DISCUSSION:
    This record is initially requested by a server which is interested
    in joining a DHCP Interserver Server Group and has been configured
    with the IP address of a server to first contact. The first
    contacted server then replies with the SG Specifier record. This
    record can also be solicited when a server, which is an existing
    member of a group, becomes uncertain regarding the current
    Generation Number of the group.

    The SG Generation Number, obtained from this record, is carried in
    every DHCP Interserver protocol message, encoded as an extension
    to the SCSP message extension fields. The extension encoding is
    TBD.

8.4.2.5. SG Subnets Configuration Record

The CSA SG Subnet Configuration Record carries SG configuration
information necessary to ensure the correct protocol operation of the
group. The encoding of this record is essentially the subnet address
and mask followed by the pool of addresses which are dynamically
managed by the Server Group for this subnet. The encoding of the
address pool will be consistent with the address pool encoding of the
Subnet Bindable Addresses Record discussed in Section 8.4.2.3 above.
Other configuration parameters may be included if deemed important to
the correct operation of the DHCP interserver protocol. Section 7.2
specifies that additional information (specifically client
configuration information and vendor specific configuration
information) will also be available. The precise details of how this
information is encoded are TBD.
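
As an aside on the SG Specifier record of Section 8.4.2.4, its
DHCP-specific part (a count followed by SG ID / SG GN pairs) could be
encoded as sketched below. This is a sketch under assumptions: the
function names are hypothetical, the count is taken to be 8 bits
followed by 24 reserved bits as suggested by Figure 8.4.2.4-1, and the
leading CSAS record is omitted. The 16 bit SG ID followed by the 16 bit
SG GN is taken from the text.

```python
import struct
from typing import List, Tuple


def encode_sg_specifiers(pairs: List[Tuple[int, int]]) -> bytes:
    """Encode (SG ID, SG GN) pairs: 1-octet count, 3 reserved octets,
    then each pair as two 16-bit big-endian values."""
    if len(pairs) > 255:
        raise ValueError("count does not fit in the assumed 8-bit field")
    body = struct.pack("!B3x", len(pairs))
    for sg_id, sg_gn in pairs:
        body += struct.pack("!HH", sg_id, sg_gn)
    return body


def decode_sg_specifiers(data: bytes) -> List[Tuple[int, int]]:
    """Inverse of encode_sg_specifiers."""
    (count,) = struct.unpack_from("!B", data, 0)
    return [struct.unpack_from("!HH", data, 4 + 4 * i) for i in range(count)]


wire = encode_sg_specifiers([(1, 7), (2, 3)])
print(wire.hex(), decode_sg_specifiers(wire))
```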
   The format of the CSA SG Subnets Configuration Record for the DHCP
   inter-server protocol is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    CSAS Record (variable)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  No. Subnets  |                   reserved                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Subnet Address                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Subnet Mask                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Address Pool of first subnet (variable)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Subnet Address                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                  ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Address Pool of last subnet (variable)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 8.4.2.5-1 DHCP inter-server CSA SG Subnets Configuration
                    record format

   where:

   o  CSAS Record - represents the full CSAS record as identified in
      Section 8.4.1.

   o  No. Subnets - indicates the number of subnet configurations
      contained in this record.

   o  Subnet Address - this is the address of the subnet to which the
      following address pool applies.

   o  Subnet Mask - this is the mask of the subnet in question.

   o  Address pool of subnet - this is the address pool from which this
      SG can allocate for this particular subnet.  The encoding will
      follow the address pool encoding for the Subnet Bindable
      Addresses record.  Therefore, the address pool should contain two
      count fields, the first indicating the number of individually
      listed addresses, followed by another field indicating the
      number of address ranges.  These are then followed by the list of
      individual IP addresses and then the list of address ranges.
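
   The address pool layout just described (two count fields, then the
   individual addresses, then the address ranges) can be sketched as
   follows.  This is only an illustration under stated assumptions, not
   the normative encoding: the 16-bit width of both count fields, the
   representation of a range as a (first, last) pair of IPv4 addresses,
   and the names encode_pool and decode_pool are all hypothetical,
   since the exact field widths belong to the Subnet Bindable Addresses
   record of Section 8.4.2.3.

```python
import ipaddress
import struct


def encode_pool(addresses, ranges):
    """Pack an address pool: two counts, then addresses, then ranges.

    ASSUMPTION (not from the draft): both counts are 16-bit values in
    network byte order, and each range is a (first, last) IPv4 pair.
    """
    out = struct.pack("!HH", len(addresses), len(ranges))
    for addr in addresses:
        out += ipaddress.IPv4Address(addr).packed
    for first, last in ranges:
        out += ipaddress.IPv4Address(first).packed
        out += ipaddress.IPv4Address(last).packed
    return out


def decode_pool(data):
    """Inverse of encode_pool; returns (addresses, ranges, bytes_used)."""
    n_addr, n_range = struct.unpack_from("!HH", data, 0)
    offset = 4
    addresses = []
    for _ in range(n_addr):
        addresses.append(str(ipaddress.IPv4Address(data[offset:offset + 4])))
        offset += 4
    ranges = []
    for _ in range(n_range):
        first = str(ipaddress.IPv4Address(data[offset:offset + 4]))
        last = str(ipaddress.IPv4Address(data[offset + 4:offset + 8]))
        ranges.append((first, last))
        offset += 8
    return addresses, ranges, offset
```

   Returning the number of bytes consumed lets a caller step through
   the per-subnet entries of the record one after another, since each
   address pool is of variable length.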
      DISCUSSION:

         The total list of configuration items to be incorporated into
         this record needs to be further fleshed out.  Currently this
         record is planned to contain a list of the subnets and, for
         each, the address pool from which this SG can allocate.  If
         other configuration parameters are deemed necessary for the
         proper operation of the DHCP Interserver protocol, then these
         need to be incorporated into this record.

8.4.2.6.  SG Members Record

   The CSA SG Members Record indicates the list of the current SG
   members, in the opinion of the sending server, including itself.

   The format of the CSA SG Members Record for the DHCP inter-server
   protocol is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    CSAS Record (variable)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | No. Server IDs|P|                  reserved                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      List of Server IDs                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 8.4.2.6-1 DHCP inter-server CSA SG Members record format

   where:

   o  CSAS Record - represents the full CSAS record as identified in
      Section 8.4.1.

   o  No. Server IDs - this is the number of Server IDs contained
      within this record.

   o  P bit - the Proposal bit indicates whether this record is a
      current group members record (here set to 0) or a proposed group
      members record (discussed in the next section).

   o  List of the Server IDs - this is a consecutive list of Server IDs
      which comprise this server's view of the current SG membership.
      The Server IDs are IP addresses associated with one of the
      server's interfaces.

8.4.2.7.  SG Proposed Members Record

   The CSA SG Proposed Members Record indicates the list of the current
   SG members, in the opinion of the sending server, with the sending
   server itself added.
   This is a temporary record, whose lifetime is the period within
   which a Group Management SG CHANGE operation has to complete.  Once
   the SG COMMIT CHANGE UPDATE is received, this record replaces the
   old SG Members record as the new members record containing the
   newly joined server.

   The format of the CSA SG Proposed Members Record for the DHCP
   inter-server protocol is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    CSAS Record (variable)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | No. Server IDs|P|                  reserved                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      List of Server IDs                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 8.4.2.7-1 DHCP inter-server CSA SG Proposed Members format

   where:

   o  CSAS Record - represents the full CSAS record as identified in
      Section 8.4.1.

   o  No. Server IDs - this is the number of Server IDs contained
      within this record.

   o  P bit - the Proposal bit indicates whether this record is a
      proposed group members record (here set to 1) or a current group
      members record (discussed in the previous section).

   o  List of the Server IDs - this is a consecutive list of Server IDs
      which comprise the sending server's view of the proposed SG
      membership.  The Server IDs are IP addresses associated with one
      of the server's interfaces.

      DISCUSSION:

         This record contains the proposed group membership from the
         view of the proposing server.  This record conceptually has a
         temporary lifetime associated with the period for which a
         group join proposal can live.  If a server receives an SG
         COMMIT CHANGE UPDATE message, then this record becomes the new
         SG Members record.  If an SG COMMIT CHANGE UPDATE message is
         not received within the appropriate period, then this record
         expires.
         If the server receives a second SG PROPOSE CHANGE UPDATE
         message while another Proposed Members record is active, it
         should NAK this second Proposed Members record.  Only one
         group join can be in process at any given time.

8.5.  Open Questions with the Mapping onto SCSP

   The following questions are identified as outstanding issues to be
   resolved for the CSAS and CSA record definitions to be considered
   complete:

   o  SCSP is currently LLC/SNAP encapsulated.  We are proposing that a
      UDP port be defined to carry SCSP messages for DHCP.  In fact we
      are proposing that the entire DHCP interserver protocol be run
      over UDP.

   o  SCSP has currently reserved its Protocol ID = 4 for DHCP.  This
      draft discusses the DHCPv4 Interserver protocol and therefore the
      SCSP Protocol ID reservation should reflect that fact.  If a
      DHCPv6 extension to this draft were developed it would require a
      separate SCSP Protocol ID.

   o  SCSP dropped support for message fragmentation.  We need to look
      into the size required for the various records defined in this
      draft and, if necessary, consider how to handle records larger
      than can fit into a single UDP packet.

   o  Need to give further thought to the partitioning of the DHCP
      interserver protocol into three separate but related
      subprotocols: the Group Management, the Binding Management and
      the Address Management subprotocols.  Currently this draft has
      these as separate subprotocols, with the Group Management
      subprotocol run separately from the SCSP protocol and in fact on
      a different UDP port than the SCSP protocol.  The Group
      Management subprotocol does however share common message
      semantics and syntax with the SCSP messages in order to simplify
      parsing the various messages associated with the DHCP
      interserver protocol.  The Binding Management and the Address
      Management subprotocols are run on top of SCSP with a single
      Protocol ID.

   o  We need to explicitly discuss the method used to authenticate the
      DHCP Interserver protocol messages.
      Current thinking is to use the SCSP authentication extensions.
      This should be investigated and should be consistent with the
      'Security Architecture for DHCP' draft [8].

9.  IP Address State Transitions

   The possible states of an IP address were defined in Section 3.2.2,
   and the state transition diagram appears there.  The state
   transitions through which an IP address can move were discussed
   implicitly in Section 6 in the context of the receipt of DHCP
   messages from DHCP clients.  However, an explicit examination of the
   processing required of a server by this protocol on each of the
   state transitions will serve to highlight some important aspects of
   this protocol.

   The IP address state transitions are handled in the following way:

   o  UNBINDABLE -> POLLING

      When a server attempts to make a particular IP address BINDABLE,
      it first moves that IP address into the POLLING state.  Once in
      this state, if queried about whether that IP address is
      UNBINDABLE, the server will reply negatively.

   o  UNBINDABLE -> BOUND

      When a server is removed from a server group, all of the IP
      addresses must be scanned to see if any of them show that server
      as the server who performed the last transaction (as set by that
      server successfully completing a CLIENT BINDING COMPLETE PUSH).
      For all of those IP addresses, if there is a client recorded in
      the IP address, and if that client does not currently have a
      different binding, then that IP address must be set to BOUND and
      the lease time must be reset to the value sent in the latest
      CLIENT BINDING COMPLETE PUSH.  The only states from which this
      transition will be made are UNBINDABLE and EXPIRED.
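
   The scan performed when a server is removed from a server group
   (the UNBINDABLE -> BOUND transition above) might be sketched as
   follows.  This is a minimal illustration under assumptions: the
   AddressRecord fields, the function name rebind_after_removal, and
   the current_binding_of callback (which returns the IP address a
   client currently holds elsewhere, or None) are hypothetical and are
   not defined by this draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AddressRecord:
    # All field names here are illustrative assumptions.
    state: str
    client: Optional[str] = None        # client recorded in the IP address
    last_server: Optional[str] = None   # server of the last transaction
    pushed_lease_secs: int = 0          # lease sent in the latest PUSH
    lease_secs: int = 0


def rebind_after_removal(records, removed_server, current_binding_of):
    """Return the IP addresses moved to BOUND after a server removal."""
    changed = []
    for ip, rec in records.items():
        if rec.last_server != removed_server:
            continue
        # Only UNBINDABLE and EXPIRED addresses may make this transition.
        if rec.state not in ("UNBINDABLE", "EXPIRED"):
            continue
        if rec.client is None:
            continue
        other = current_binding_of(rec.client)
        if other is not None and other != ip:
            continue  # the client currently has a different binding
        rec.state = "BOUND"
        rec.lease_secs = rec.pushed_lease_secs
        changed.append(ip)
    return changed
```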
   o  POLLING -> BINDABLE

      A fundamental point and guarantee of this state transition
      diagram is that for an IP address to move from the UNBINDABLE
      state (where it is not owned by any server) through the POLLING
      state and on to the BINDABLE state (where it is owned by a single
      server), the server seeking to own the IP address must contact
      all of the other servers in the group.  The transition requires
      an UNBINDABLE COMPLETE POLL to complete successfully.

      The server attempting to move an IP address from the UNBINDABLE
      state through the POLLING state and on to the BINDABLE state must
      ask every other server in the group if it believes that the IP
      address is currently UNBINDABLE using an UNBINDABLE COMPLETE
      POLL.  If any server says that the IP address is either BINDABLE
      (i.e., it currently owns the IP address) or BOUND (i.e., a client
      currently owns the IP address), then the server attempting to
      move the IP address from the UNBINDABLE to the BINDABLE state
      MUST abandon the attempt.  If any server fails to respond at all,
      the server MUST abandon the attempt as well.

      DISCUSSION:

         In addition (and this is important!) if the server attempting
         to move the IP address from the UNBINDABLE state through the
         POLLING state and on to the BINDABLE state fails to hear from
         some other server, then the attempt cannot complete.  This
         means that if a server cannot communicate with every other
         server (due to communications failure, transient server
         failure, or network partition) then this state transition
         cannot be made.  Thus, all addresses in the UNBINDABLE state
         will stay in that state while any server in the group is out
         of communication with the group for any reason at all.

         Of course, the detailed description of the protocol suggests
         that a server build up a supply of BINDABLE IP addresses so
         that in the event of server failure it has BINDABLE addresses
         that are available to offer to new DHCP clients.
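
   The UNBINDABLE COMPLETE POLL rule for the POLLING -> BINDABLE
   transition can be sketched as a simple all-or-nothing check.  This
   is an illustration only: the function name poll_allows_bindable and
   the representation of replies as a mapping from server ID to a
   boolean ("I believe the address is UNBINDABLE") are assumptions, and
   a missing entry stands for a server that failed to respond.

```python
def poll_allows_bindable(other_servers, replies):
    """Decide whether a polled IP address may become BINDABLE.

    other_servers: IDs of every other server in the group.
    replies: server ID -> True if that server believes the address is
             UNBINDABLE, False otherwise (e.g. it reports BINDABLE or
             BOUND).  Servers that did not answer are simply absent.
    """
    for server in other_servers:
        answer = replies.get(server)
        if answer is None:
            return False  # no response at all: MUST abandon the attempt
        if not answer:
            return False  # a negative reply: MUST abandon the attempt
    return True
```

   Note that a single unreachable server is enough to block the
   transition, which is why the draft suggests that each server keep a
   supply of addresses that are already BINDABLE.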
   o  BINDABLE -> BOUND

      Once an IP address is BINDABLE it may be BOUND to a client
      through the normal actions of the DHCP protocol.  Once a server
      has received a DHCPREQUEST/SELECTING message from a client it can
      move the IP address into the BOUND state, update its stable
      storage, and reply with a DHCPACK message to the client.

      After the DHCPACK has been sent, the DHCP server MUST also
      attempt to update all servers in the group with information
      indicating that the IP address is now BOUND to a particular
      client.  It must perform a CLIENT BINDING COMPLETE PUSH operation
      with this information.

      An IP address that is BOUND will always result in a lease time
      that is no greater than the MAXIMUM-UNPUSHED-LEASE-TIME when
      given to a client, although the normal lease time is used in all
      interactions with other servers.

      DISCUSSION:

         In an ideal world, the server who created the binding would
         always succeed in updating all other servers in the group with
         the binding information.  Then, in the event that the binding
         server failed at some later time, another server to whom the
         client could broadcast would receive a DHCPREQUEST/REBINDING
         request and could reply with updated binding information.

         However, there is obviously a window where a server can crash
         after sending a DHCPACK and prior to updating even one
         additional server.  This protocol has been designed so that
         the process of updating all of the servers in the group with
         information concerning a new binding is not only "lazy" (i.e.,
         performed after the actual binding is made), but also
         unnecessary for correct operation.  The protocol only requires
         that a server try to update the other servers -- not that it
         succeed at updating even one server.

         The protocol accomplishes this by allowing a server to respond
         to a DHCPREQUEST/REBINDING message from a client without any
         information having been propagated from the server who created
         the binding.
         Thus, a server who receives a rebinding request for an IP
         address about which it has no information must check with all
         available servers in the group, but in the absence of
         information to the contrary arriving within a relatively short
         timeout period, the server should respond to the rebinding
         request with an extension of the existing lease on the IP
         address.

   o  BINDABLE -> UNBINDABLE

      A server can relinquish an IP address in the BINDABLE state that
      it owns simply by responding to requests for information about
      the IP address as if it were UNBINDABLE.  No explicit action need
      be taken other than to respond correctly to POLL operations from
      other servers.

   o  BOUND -> PUSHED

      Once a CLIENT BINDING COMPLETE PUSH has succeeded for an IP
      address that is BOUND to a client (and that means succeeded to
      all of the servers), the IP address moves from the BOUND to the
      PUSHED state.  At this point, the normal lease time may be
      returned to the client on the next renewal, discover, or
      rebinding.

      Note that only the server which executes the CLIENT BINDING
      COMPLETE PUSH will set its IP address into the PUSHED state.  The
      state that it PUSHes to the other servers is BOUND.

   o  BOUND -> UNBINDABLE

      In order for an IP address to move from the BOUND to the
      UNBINDABLE state, the client that owns the IP address (i.e., to
      which it is BOUND) must send a DHCPRELEASE message.  In this
      case, the receiving server (which may or may not be the server
      who created the original binding) will update its stable storage
      with information that the IP address is not currently BOUND by
      any client.  It should then transmit this information to all
      other servers with which it can communicate at that time by
      performing a CLIENT BINDING COMPLETE PUSH operation.
      In the event that the server fails to update any other server
      with the new information about the IP address prior to undergoing
      some failure, then the worst that will happen is that the other
      servers will believe that an IP address is in the BOUND state
      when it need not be.  Ultimately the lease on the IP address will
      expire.

   o  BOUND -> EXPIRED

      Any server which has information concerning a BOUND IP address
      may determine that the lease on the IP address has expired, and
      after an appropriate grace period has elapsed, that the IP
      address should be moved to the EXPIRED state.  A record of the
      client to which the IP address was BOUND must be kept.

   o  PUSHED -> UNBINDABLE

      In order for an IP address to move from the PUSHED to the
      UNBINDABLE state, the client that owns the IP address (i.e., to
      which it is BOUND) must send a DHCPRELEASE message.  In this
      case, the receiving server (which may or may not be the server
      who created the original binding) will update its stable storage
      with information that the IP address is not currently BOUND by
      any client.  It should then transmit this information to all
      other servers with which it can communicate at that time by
      performing a CLIENT BINDING COMPLETE PUSH operation.

      In the event that the server fails to update any other server
      with the new information about the IP address prior to undergoing
      some failure, then the worst that will happen is that the other
      servers will believe that an IP address is in the PUSHED state
      when it need not be.  Ultimately the lease on the IP address will
      expire.

   o  PUSHED -> EXPIRED

      Any server which has information concerning a PUSHED IP address
      may determine that the lease on the IP address has expired, and
      after an appropriate grace period has elapsed, that the IP
      address should be moved to the EXPIRED state.  A record of the
      client to which the IP address was PUSHED must be kept.
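
   The BOUND -> EXPIRED and PUSHED -> EXPIRED transitions above share
   the same rule, which can be sketched as follows.  This is a minimal
   illustration, assuming a record held as a dictionary with the
   hypothetical keys "state", "client" and "lease_end" (a local
   timestamp in seconds), and a grace period supplied by the caller;
   the draft does not specify the length of the grace period.

```python
def expire_if_due(record, now, grace_secs):
    """Move a BOUND or PUSHED address to EXPIRED once its lease plus
    the grace period has elapsed.  Returns True if the state changed.
    """
    if record["state"] not in ("BOUND", "PUSHED"):
        return False
    if now < record["lease_end"] + grace_secs:
        return False
    record["state"] = "EXPIRED"
    # The client entry is deliberately left in place: a record of the
    # client to which the address was bound must be kept.
    return True
```

   Keeping the client entry is what later allows the EXPIRED -> BOUND
   transition for the same client.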
   o  EXPIRED -> UNBINDABLE

      If any server asks for information concerning this IP address,
      then the receiving server should set the IP address to be
      UNBINDABLE, update its stable storage, and respond to the
      requesting server.

   o  EXPIRED -> BOUND

      If a server receives a message from a client and the IP address
      is EXPIRED, but was last BOUND or PUSHED to that client, then the
      IP address can be moved back into the BOUND state.  This is
      possible because no other server can have attempted to make this
      IP address BINDABLE.  If it had, the IP address would no longer
      be in the EXPIRED state, but in the UNBINDABLE state (see the
      EXPIRED -> UNBINDABLE transition above).

      Another reason this transition can occur is as follows.  When a
      server is removed from a server group, all of the IP addresses
      must be scanned to see if any of them show that server as the
      server who performed the last transaction (as set by that server
      successfully completing a CLIENT BINDING COMPLETE PUSH).  For all
      of those IP addresses, if there is a client recorded in the IP
      address, and if that client does not currently have a different
      binding, then that IP address must be set to BOUND and the lease
      time must be reset to the value sent in the latest CLIENT BINDING
      COMPLETE PUSH.  The only states from which this transition will
      be made are UNBINDABLE and EXPIRED.

10.  Security Considerations

   Minimal security would be provided by configuring every server in a
   group with the IP addresses of the allowable servers that could ever
   join that group.

   Some additional security is created by using the SCSP security
   mechanism, although it has limitations outside the client binding
   management part of the protocol.

   Other, more powerful security approaches are needed and must be
   addressed prior to further progress on this protocol.

11.  Open Questions

   The following open questions set off by the "*" character remain
   from Ralph Droms' original draft: draft-ietf-dhc-interserver-00.txt.
   Comments have been added in square brackets [].  Additional open
   questions new to this draft are listed with the "o" character.

   *  Each server must know all other servers.

      Requiring each server to know about every other server imposes
      additional administrative overhead in the configuration of DHCP
      servers.  However, this configuration overhead is probably
      minimal relative to any other configuration required for DHCP
      servers.

      [The group management messages in Section 7 provide a step
      towards an answer here.  A server needs to know only one other
      server.]

   *  Each server must contact all other servers before reassigning an
      address.

      [This is fundamental if we wish to use the "lazy synchronization"
      mode -- you can't get one without the other.]

      There is a potential issue here in which no new DHCP clients can
      be configured if any of the DHCP servers cannot be contacted.
      Servers can mitigate this problem by maintaining a list of
      prechecked addresses that can be allocated without contacting all
      other servers at the time of address allocation.

      The protocol may need additional definition of specific actions
      on the part of DHCP servers in response to situations in which a
      server cannot contact all other servers.

      [Added a lot of these in this draft.]

   *  Servers cooperating to achieve "fair" distribution of available
      addresses.

      The protocol may need additional mechanisms or definition of
      default behavior through which servers cooperate among themselves
      to ensure that each has a sufficient pool of prechecked addresses
      on each network.

      [Not yet addressed, and needs work.  Initial thinking is that all
      addresses should be allocated to some server, so that in the
      event that one member of an SG can't be contacted, the maximum
      number of addresses is available for TRANSFER operations as
      necessary.]
   *  User intervention in case of database incoherency.

      Fixing the collective database on the DHCP servers in case of a
      problem could be a *real* nightmare.

   *  Potential deadlock in checking address - suppose two servers
      check the same address for reassignment simultaneously?

      [Solved with the introduction of the POLLING state.]

   *  Potential configuration for new server?

      One ancillary use of the inter-server protocol might be in
      configuring new DHCP servers.  Suppose the inter-server protocol
      were extended to allow download of a server's configuration file
      and to allow addition of a new server to the list of DHCP
      servers.  A new server might be configured by simply giving it
      the address of an existing server.  The new server could then
      download a list of all other known servers, the pool of candidate
      addresses, any special configuration information (e.g., vendor
      class information) and the existing bindings.  The new server
      could also announce itself to all of the other existing servers.

      [Much of this is in the current draft, principally in the group
      management configuration messages.  At this stage, a server can
      figure out which groups correspond with which subnets, which
      addresses that group manages on that subnet, and some additional
      configuration information.  This goes a considerable distance
      towards both ensuring that all servers in the SG have compatible
      configurations and allowing one server to download configuration
      data from another server.  Downloading configuration files would
      not be a great idea for servers which don't use configuration
      files.]
   *  DHCP server maintenance

      There is likely an opportunity for the development of a server
      management tool that would download the database information from
      all servers and check for conflicts/inconsistencies such as
      assignment of an IP address to multiple clients, bindings that
      are not replicated across all servers, bindings that have
      inconsistent lease expiration times, etc.

   o  Group-id selection.

      The group-ids for the various groups need to be sufficiently
      unique that no server will ever be a member of two groups with
      the same group-id.  No mechanism is provided yet in this protocol
      to generate group-ids which conform to this requirement.
      Possibly a group-id can be synthesized in some manner that
      ensures it conforms to this requirement.

   o  The original draft discussed the requirement for each server to
      have a synchronized clock using available time synchronization
      protocols.  That requirement has been removed in this draft, and
      in its place all times are sent in "seconds from now" as a signed
      32-bit number.  There is clearly a bit of additional complexity
      required to do this, but we have been so impressed at how well
      DHCP works with "relative" instead of "absolute" time that we
      felt the complexity of using relative time was worth it (since
      using synchronized time is not without its own complexities).

   o  UNAVAILABLE IP addresses

      There are several cases where a server can determine that some
      sort of serious error has occurred, and apparently an IP address
      is in an inconsistent state.  In these cases, the server should
      make the IP address UNAVAILABLE -- i.e., no other server should
      be able to operate on it.  Just what is necessary to make this
      happen?  Could it be a passive response to address information
      messages, or must it involve a complete push to all of the other
      servers, and a new IP address state?

12.  Acknowledgments

   Many of the ideas in this proposal are due to Jeff Mogul, Greg
   Minshall, Rob Stevens, Walt Wimer, Ted Lemon and the DHC working
   group.  Thanks to all who have contributed their ideas and
   participated in the discussion of the inter-server protocol.

   At American Internet, Brad Parker and Mark Stapp have been key
   contributors to the design discussions that have resulted in our
   contributions to this draft.  They have each invested many hours of
   work in this protocol.

13.  References

   [1] Droms, R., "Dynamic Host Configuration Protocol", RFC 2131,
       March 1997.

   [2] Luciani, J., Armitage, G., Halpern, J., "Server Cache
       Synchronization Protocol (SCSP)", draft-ietf-ion-scsp-01.txt.

   [3] Moy, J., "OSPF Version 2", RFC 1247, July 1991.

   [4] Luciani, J., "A Distributed NHRP Service Using SCSP",
       draft-ietf-ion-scsp-nhrp-00.txt.

   [5] Luciani, J., Fox, B., "A Distributed ATMARP Service Using SCSP",
       draft-ietf-ion-scsp-atmarp-00.txt.

   [6] Reynolds, J., Postel, J., "Assigned Numbers", STD 2, RFC 1340,
       USC/Information Sciences Institute, July 1992.

   [7] Alexander, S., Droms, R., "DHCP Options and BOOTP Vendor
       Extensions", RFC 2132, March 1997.

   [8] Gudmundsson, O., "Security Architecture for DHCP",
       draft-ietf-dhc-security-arch-00.txt.

14.  Authors' Information

   Kim Kinnear
   American Internet Corporation
   4 Preston Ct.
   Bedford, MA 01730-2334

   Phone: (617) 276-4587
   EMail: kinnear@american.com

   Robert G. Cole
   AT&T Laboratories
   Managed Network Solutions Division
   Rm. 3L-533
   101 Crawfords Corner Road
   Holmdel, NJ 07733

   Phone: (908) 949-1950
   EMail: rgc@qsun.att.com

   Ralph Droms
   Computer Science Department
   323 Dana Engineering
   Bucknell University
   Lewisburg, PA 17837

   Phone: (717) 524-1145
   EMail: droms@bucknell.edu

Appendix A: An Overview of SCSP

   This appendix presents an overview of the SCSP protocol and
   supplements Section 8.2 in the main text of this specification.  For
   a complete discussion of the SCSP protocol see [2].

   The following three sections cover the SCSP Hello, Cache Alignment
   and Client State Update subprotocols, respectively.  The last
   section of this appendix presents a summary of the SCSP message
   sets.

A.1 The SCSP "Hello" Sub-protocol Overview

   The function of the SCSP "Hello" protocol is to monitor the status
   of the LS to DCS connection.  The LS must be configured with the
   addresses of its DCSs.  The protocol contains a 'Family ID' which
   allows multiple protocol-specific SCSP implementations to be
   multiplexed over a single Hello mechanism between each server pair.
   For each DCS (whether the low level connection is point-to-point or
   point-to-multipoint), the LS maintains a Hello Finite State Machine
   (HFSM).  The HFSM is shown in the figure below.
                  +---------------+
                  |               |
         +------->|     DOWN      |<-------+
         |        |               |        |
         |        +---------------+        |
         |            |       ^            |
         |            |       |            |
         |            |       |            |
         |            |       |            |
         |            V       |            |
         |        +---------------+        |
         |        |               |        |
         |        |    WAITING    |        |
         |     +--|               |--+     |
         |     |  +---------------+  |     |
         |     |    ^           ^    |     |
         |     |    |           |    |     |
         |     V    |           |    V     |
       +---------------+     +---------------+
       |  BIDIRECTION  |---->| UNIDIRECTION  |
       |               |     |               |
       |  CONNECTION   |<----|  CONNECTION   |
       +---------------+     +---------------+

   Figure A.1-1 The Hello Finite State Machine

   Key:

   1: Link layer connection is established

   2: Transition based upon the receipt of a Hello message (and
      whether the LS ID is found in the Rec ID portion of the message)

   3: Hello Interval * Dead Factor exceeded

   4: Loss of link layer connectivity

   The LS to DCS connections are initialized into the Down state.  The
   numbers in the Key refer to the actions that cause a transition in
   the HFSM (Note: These numbers didn't appear in the original figure
   in [2], and are TBD).

   The Hello protocol employs poll messages to monitor the status of
   the LS to DCS connections.  The Hello messages contain the IDs of
   the DCSs from which the LS has received a Hello message.  The LS'
   HFSM uses these IDs to determine the status of the HFSM for each of
   the DCSs.  Multiple DCS IDs are present in order to support
   point-to-multipoint connections.

   The messages also contain two fields: the Polling Interval and the
   Dead Factor.  The product of the Polling Interval and the Dead
   Factor determines the length of time that the HFSM will hold open a
   connection without receiving a Hello from a peer DCS before
   transitioning the HFSM for that DCS to the Waiting state.

A.2 The SCSP "Cache Alignment" Sub-protocol

   The Cache Alignment protocol supports the initial server cache
   synchronization process of an LS with its DCSs.  This process may
   occur at initial boot time of the server, at reconnect time of the
   server to the network, or in other possible initialization or
   failure recovery scenarios.
   Like the Hello protocol, the Cache Alignment (CA) protocol
   maintains a Cache Alignment Finite State Machine (CAFSM) for each of
   its DCSs to monitor the status of its cache alignment.  The figure
   below shows the CAFSM and indicates some of the triggers that would
   cause the state transitions to occur.

                 +------------+
                 |            |
            +--->|    DOWN    |
            |    |            |
            |    +------------+
            |          |
            |          |
            |          V
            |    +------------+
            |    |Master/Slave|
            |----|            |<---+
            |    |Negotiation |    |
            |    +------------+    |
            |          |           |
            |          |           |
            |          V           |
            |    +------------+    |
            |    |   Cache    |    |
            |----|            |----|
            |    | Summarize  |    |
            |    +------------+    |
            |          |           |
            |          |           |
            |          V           |
            |    +------------+    |
            |    |   Update   |    |
            |----|            |----|
            |    |   Cache    |    |
            |    +------------+    |
            |          |           |
            |          |           |
            |          V           |
            |    +------------+    |
            |    |            |    |
            +----|  Aligned   |----+
                 |            |
                 +------------+

   Figure A.2-1 Cache Alignment Finite State Machine

   Key:

   1: When HFSM reaches Bi-directional state

   2: HFSM transitions out of Bi-directional state

   3: Master/Slave relationship is established

   4: Once both LS and DCS exchange CA messages, both with O-bit set
      to 0, then CRL is complete

   5: E.g., Errored sequence number

   6: Full cache update achieved

   (Note: The key numbers don't appear in the figure in [2], and are
   TBD.)

   Each of the CAFSMs is coupled with the respective HFSM in the LS.
   The CAFSM is initialized in the Down state.  It transitions to the
   Master/Slave Negotiation state when the corresponding HFSM
   transitions to the Bi-directional state.  The CAFSM transitions
   back to the Down state in the event that the corresponding HFSM
   transitions out of the Bi-directional state.

   In the Master/Slave Negotiation state the LS/DCS pair negotiate who
   is to be the master of the connection during the cache alignment
   process.  In the Cache Summarize state the LS/DCS pair exchange
   Client State Advertisement Summary (CSAS) records within the CA
   messages.  The servers use these message exchanges to build a Client
   State Advertisement Request List (CRL).
   The CRL indicates the portions of the respective server caches that
   are out of alignment.  The cache misalignment (as indicated in the
   local CRL) is resolved in the Update Cache state, where the servers
   exchange full client state information in CSA records within the
   CSU messages, only where misalignment occurs.  Once the CRL is
   resolved, the LS/DCS caches are aligned and the CAFSM transitions to
   the Aligned state.

   The protocol further defines the high-level syntax of a generic CA
   message as discussed in a later section of this appendix.

A.3 The SCSP "Client State Update" Sub-protocol Overview

   The purpose of the Client State Update (CSU) protocol is to provide
   a capability to constantly update the server caches through
   asynchronous CSU message exchanges.  These updates are necessary
   because the status of the clients is in constant flux.  Unlike the
   other two sub-protocols, the Client State Update protocol does not
   maintain a separate finite state machine.  Instead, the activity of
   this protocol is tied to the CAFSM.  Each CSU can contain zero or
   more Client State Advertisement records.

   The LS may send and receive CSUs when the corresponding CAFSM is in
   either the Aligned or the Update Cache state.  The CSU protocol
   defines both CSU request and reply messages.  Consistent with the
   rest of the definition of SCSP, the CSU protocol supports both
   point-to-point and point-to-multipoint connections.

A.4 The SCSP Message Set Overview

   The structure of the SCSP messages is a) a fixed-length generic
   header, b) an SCSP message specific part header of variable length,
   c) a fixed-length message field, and d) zero or more SCSP message
   specific records.  This is shown in the following figure.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |     type      |          Packet Size          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          IP Checksum          |      Start of Extensions      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             SCSP Message Specific part (variable)             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Protocol ID          |             SG ID             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            unused             |             Flags             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Sender ID Len | Recvr ID Len  |        No. of Records         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Sender ID (variable)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Receiver ID (variable)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           SCSP Message Specific Records (variable)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure A.4-1 SCSP Message Format

where

   o Version - is the version of the SCSP protocol defined in [2]

   o type - represents the SCSP message type, i.e., CA, Hello,
     CSU_Req, CSU_Reply, and CSU_Solicit

   o Packet Size -

The SCSP messages have identical syntax except for 1) the SCSP
message specific part header and 2) the SCSP message specific part
record.  The following table summarizes the content of these specific
parts:

              Table A.4-1 SCSP Message Specific Parts

               | Hello      | CA          | CSUS      | CSU_Req  | CSU_Reply
   -------------------------------------------------------------------------
               |            |             |           |          |
   SCSP mesg   | hello int, | CSA Seq.No. | null      | null     | null
   spec header | dead fac., |             |           |          |
               | Family ID  |             |           |          |
   -------------------------------------------------------------------------
               |            |             |           |          |
   SCSP mesg   | Additional | CSAS Rec.   | CSAS Rec. | CSA Rec. | CSAS Rec.
   spec record | Recvr ID   |             |           |          |
               | records    |             |           |          |

The detailed formats of the various SCSP messages are given in [2].
However, two SCSP message specific records are of particular interest
to the development of the DHCP interserver specification.  These are:
1) the CSAS record and 2) the CSA record.

The CSAS record is defined within the SCSP specification as:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Hop Count           |         Record Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Cache Key Len |  Orig ID Len  |N|           unused            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      CSA Sequence Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Cache Key (variable)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Originator ID (variable)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure A.4-2 SCSP CSAS Record Format

See Section 8.4.1 for details.

The CSA record is defined within the SCSP specification as:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          CSAS Record                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Client/Server Protocol Specific Part Cache Entry        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure A.4-3 SCSP CSA Record Format

The CSA records for the DHCP interserver mapping to SCSP are defined
in Section 8.4.2.

[end of document <draft-ietf-dhc-interserver-02.txt>]
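
Illustrative sketch (not part of the protocol definition): the CSAS
record layout of Figure A.4-2 could be encoded and decoded roughly as
shown below.  The class and field names are hypothetical.  The field
widths are assumptions read off the figure (16-bit Hop Count and
Record Length, 8-bit key and originator lengths, the N bit as the most
significant bit of a 16-bit word, a 32-bit sequence number treated
here as unsigned), as is the choice to express Record Length in bytes.
The normative definition is in [2].

```python
import struct
from dataclasses import dataclass

# Fixed part of the CSAS record in network byte order.  Assumed widths:
# Hop Count (16), Record Length (16), Cache Key Len (8), Orig ID Len (8),
# N bit plus unused bits (16), CSA Sequence Number (32).
_FIXED = struct.Struct("!HHBBHI")
_N_BIT = 0x8000  # assumption: N is the top bit of its 16-bit word


@dataclass
class CsasRecord:  # hypothetical name, for illustration only
    hop_count: int
    n_bit: bool              # the "N" flag shown in Figure A.4-2
    sequence_number: int
    cache_key: bytes
    originator_id: bytes

    def pack(self) -> bytes:
        """Serialize the record: fixed part, then cache key, then originator ID."""
        body = self.cache_key + self.originator_id
        record_length = _FIXED.size + len(body)  # assumption: length in bytes
        fixed = _FIXED.pack(
            self.hop_count,
            record_length,
            len(self.cache_key),
            len(self.originator_id),
            _N_BIT if self.n_bit else 0,
            self.sequence_number,
        )
        return fixed + body

    @classmethod
    def unpack(cls, data: bytes) -> "CsasRecord":
        """Parse one record from the start of data, checking its lengths."""
        hop, length, key_len, orig_len, flags, seq = _FIXED.unpack_from(data)
        if length != _FIXED.size + key_len + orig_len or len(data) < length:
            raise ValueError("inconsistent CSAS record length")
        key_end = _FIXED.size + key_len
        return cls(
            hop_count=hop,
            n_bit=bool(flags & _N_BIT),
            sequence_number=seq,
            cache_key=data[_FIXED.size:key_end],
            originator_id=data[key_end:length],
        )
```

Under these assumptions a CSA record would simply be the packed CSAS
record followed by the client/server protocol specific cache entry, as
in Figure A.4-3.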