Hi,

I added a couple of paragraphs to my notes.  Please disregard the
previous email.  If you started reading it (vain hope, I know ;-),
the new paragraphs are inserted just below the paragraph that ends
'used to pump message words out faster' around line 82.

Here are the updated notes:

* Introduction

There has been some discussion in the group about variable length
messaging and streaming recently.  Russ Vreeland sent out an attempt
at defining these two topics.  I'd like to bring some more thoughts
to the table.

* Variable Length Messages

One can get a reasonable definition of variable length messages
(VLMs) by viewing them as an extension of the fixed length/width
messages provided in SCE-MI 1.x.

SCE-MI 1.x messages are defined by one parameter, the port width.
Even though SCE-MI 1.x does not limit the port width, in practice
the available resources will limit it.  Many transactions can be
much larger in terms of number of bits than what can reasonably be
supported by an individual SCE-MI 1.x port given the resource
constraints.  This leads to the need for segmenting transactions.
Transactions are segmented into segments that are small enough to
be handled by SCE-MI 1.x ports without unreasonable resource
consumption.  Given the fixed width nature of SCE-MI 1.x ports, it
is natural to segment the transactions into a series of equal sized
segments.  Obviously, the last segment will in general be a partial
segment and some mechanism for dealing with this must be
implemented.

Segmented transactions can be divided into two categories:

- Fixed size transactions: transactions are always the same size,
  i.e., the number of segments is constant.  Obviously, it is
  assumed that the transactions are too large to map to a fixed
  width port without segmentation.

- Variable size transactions: the name says it all.  The number of
  segments varies.

Given that variable size transactions constitute a superset of
fixed size transactions, it is not worth considering fixed size
transactions any further.

In SCE-MI 1.x this type of transaction segmentation and reassembly
must be implemented at the application layer.  This implies the
creation and management of multiple SCE-MI messages per transaction
as well as including the appropriate infrastructure on the hardware
side to understand the relationship between a sequence of SCE-MI
messages and the stream of transactions, most notably the
delineation of a batch of SCE-MI messages.  This implies the
introduction of either in-band data, i.e., extra bits or fields in
the individual messages, or out-of-band data, i.e., one or more
auxiliary ports.  (A sketch of the in-band approach is given below.)

The goal of variable length messaging is to encapsulate the
segmentation and reassembly in the SCE-MI layer.  The transaction
can now be mapped to a single VLM message, which simplifies the
handling of transactions on both the software and hardware sides.
VLM benefits include:

- No need to reinvent the wheel: segmentation/reassembly is
  provided by the infrastructure and does not have to be
  implemented at the application layer for each transactor
  requiring it.

- Simpler application code: less code to write, less code to debug,
  less code to maintain.

- Potential for improved performance: the implementation will have
  new opportunities for improving the message transfer performance
  as it now knows that the segments of a transaction are part of
  the same unit, i.e., the VLM.  The performance gain possible is
  obviously highly dependent on the implementation, but could be
  quite significant.  To understand why this claim can be made,
  consider single-word PCI accesses versus burst PCI accesses.
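To make the contrast concrete, here is a minimal C++ sketch of the
application-layer segmentation and reassembly that SCE-MI 1.x forces
on the user, using the in-band approach (a length word carried in
the first segment).  It is only an illustration of the bookkeeping
involved: the port width, the helper names, and the send/receive
callbacks (send_fixed_width_message, next_fixed_width_message) are
hypothetical stand-ins and are not part of the SCE-MI API.

  // Sketch of application-layer segmentation/reassembly on top of
  // fixed length/width messages.  An in-band length word in the
  // first segment lets the other side delineate the transaction.
  #include <cstddef>
  #include <cstdint>
  #include <functional>
  #include <vector>

  using Word = std::uint32_t;
  static const std::size_t kPortWords = 4;  // example port width in words

  // Input direction: segment one variable size transaction into a
  // series of equal sized, fixed width messages (padding the last).
  void send_transaction(
      const std::vector<Word>& payload,
      const std::function<void(const std::vector<Word>&)>& send_fixed_width_message)
  {
      std::vector<Word> msg;
      msg.reserve(kPortWords);
      msg.push_back(static_cast<Word>(payload.size()));  // in-band length
      for (Word w : payload) {
          msg.push_back(w);
          if (msg.size() == kPortWords) {   // segment full: ship it
              send_fixed_width_message(msg);
              msg.clear();
          }
      }
      if (!msg.empty()) {                   // final, possibly partial segment
          msg.resize(kPortWords, 0);        // pad to the fixed port width
          send_fixed_width_message(msg);
      }
  }

  // Output direction: reassemble a transaction from a series of
  // fixed width messages using the in-band length word.
  std::vector<Word> receive_transaction(
      const std::function<std::vector<Word>()>& next_fixed_width_message)
  {
      std::vector<Word> msg = next_fixed_width_message();
      std::size_t remaining = msg[0];       // in-band length word
      std::vector<Word> payload;
      payload.reserve(remaining);
      std::size_t i = 1;
      while (remaining > 0) {
          if (i == msg.size()) {            // fetch the next segment
              msg = next_fixed_width_message();
              i = 0;
          }
          payload.push_back(msg[i++]);
          --remaining;
      }
      return payload;
  }

With VLMs, both directions collapse to a single message per
transaction and this delineation bookkeeping moves into the SCE-MI
infrastructure.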
Variable length messaging involves VLMs and ports that can accept
VLMs.  Ports will still have a port width, and this port width will
be the message word width of the VLM.  Hence, a VLM can be defined
as a sequence of one or more message words where each word has the
same fixed width.  The SCE-MI 1.x messages are thus a degenerate
case of VLM, i.e., a VLM that is one message word long.

It is reasonable to assume that at most one message word is
accessed per clock cycle.  If the transactor needs to access more
than one message word per clock cycle, the port can be widened, or
a faster clock can be used to pump message words out faster.

There is another use model of VLMs that has not been considered
above.  Transactions that map to multiple sequential words on the
signal level interface, such as PCI burst transactions, can still
be mapped to a single SCE-MI 1.x fixed length/width message if the
maximum size of the transactions is small enough.  In this case,
individual signal level words are mapped to different fields of the
message.  The transaction can be fixed or variable size up to a
maximum size.  The transactor needs to implement muxing and
counting logic to extract the individual signal level words from
the message.  Similar considerations exist for the opposite
direction.  In either case, a single SCE-MI 1.x message conveys the
transaction.

VLMs and VLM ports are a natural fit for this kind of scenario.
They allow the elimination of the muxing/counting logic from the
transactor by narrowing the port width to one that matches the
signal level interface.  In many cases, the signal level interface
can now be connected directly to the message port.

For simplicity, the message words will be assumed to be accessed
sequentially in the following discussion.  See the random access
requirement note for other possibilities.

There are two distinct modes in which variable length messaging
could be used:

- Uclock driven: individual message words are transferred on uclock
  edges.

- Cclock driven: individual message words are transferred on a
  selected cclock's edges.

These are described in the following sections.

** Uclock Driven

In this mode, message words are transferred on the rising edge of
uclock.  It does not appear reasonable to consider falling edge
transfers as almost everything in SCE-MI is defined relative to the
rising edge of uclock.

Each message word is transferred using the dual ready protocol.
This means that the transfer may stall at any time due to data not
being available.  Since the uclock does not stop, such stalls are
visible at the port.  This is a direct extension of SCE-MI 1.x
fixed length/width messaging, so it is not surprising.

In the uclock driven mode, the complete message may be transferred
in zero time (cclocks are stopped during the whole transfer) or
non-zero time (cclocks are running during the transfer) depending
on how the transactor controls the cclocks.
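To make the word-level behavior concrete, here is a toy, cycle-based
C++ model of the uclock driven mode: one word of a VLM moves on a
rising uclock edge only when both sides are ready, and the transfer
stalls otherwise while uclock keeps running.  This is purely
illustrative; the flags and the stall pattern are made up and do not
correspond to actual SCE-MI macros or signals.

  // Toy cycle-based model of an input port delivering one VLM word
  // by word under a dual ready handshake in the uclock driven mode.
  #include <cstddef>
  #include <cstdint>
  #include <cstdio>
  #include <vector>

  int main() {
      std::vector<std::uint32_t> vlm = {0xA0, 0xA1, 0xA2, 0xA3, 0xA4};
      std::size_t next_word = 0;   // next word offered to the transactor
      int uclock_cycle = 0;

      while (next_word < vlm.size()) {
          ++uclock_cycle;          // one rising edge of uclock per iteration

          // Sender side: data is not always available; model every
          // third cycle as a "no data yet" cycle.
          bool transmit_ready = (uclock_cycle % 3) != 0;
          // Receiver side: the transactor happens to always be ready here.
          bool receive_ready = true;

          if (transmit_ready && receive_ready) {
              std::printf("uclock %2d: word %u = 0x%X transferred\n",
                          uclock_cycle, (unsigned)next_word,
                          (unsigned)vlm[next_word]);
              ++next_word;
          } else {
              std::printf("uclock %2d: stall (uclock keeps running)\n",
                          uclock_cycle);
          }
      }
      return 0;
  }

Whether those stall cycles amount to zero time or non-zero time in
the cclock domain depends on how the transactor controls the cclocks
during the transfer, as noted above.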
** Cclock Driven

In the cclock driven mode, it makes sense to take advantage of
clock control such that the stalls that can occur in the uclock
driven mode can be hidden from the user of the port.  For an input
port, this means that the next message word of a message is always
available on the next rising edge of the cclock, unless the current
message word is the last word of the message, of course.  The
SCE-MI infrastructure will automatically issue clock control when
the port is starving for data.

Similarly, an output port will always be able to accept the next
message word on the next rising edge of the cclock.  The SCE-MI
infrastructure will automatically issue clock control when the port
gets behind in transferring the data to the software side.

In the cclock driven mode, the complete transfer always happens in
non-zero time.

Russ' definition of VLM corresponds to the uclock driven mode with
the message transferred in zero time.  Russ' definition of
streaming applies both to the uclock driven mode with the message
transferred in non-zero time and to the cclock driven mode.
Streaming will be discussed in the next section.

* Streaming

Typically, streaming is defined as a situation where a producer and
a consumer run concurrently as much as possible.  In the
non-streaming case, the producer and consumer take turns running:
the producer produces one or more items, hands them over to the
consumer, and then waits for the consumer to finish consuming
before it produces another batch of items.  In the streaming
scenario, the producer runs continuously and hands off items to the
consumer as they are produced.  If the producer and consumer are
well balanced and there is sufficient buffering in the channel
between them, the times when either of them is stalled waiting for
the other will be minimized and the overall run time will be
reduced.

In the context of SCE-MI 1.x, streaming applies to sequences of
messages, i.e., the items are messages.  The cclock is allowed to
run while messages are transferred; otherwise there is no benefit
to streaming.  Note that this represents a different clock control
paradigm than `sane' clock control.  It seems reasonable that a
streaming clock control paradigm can be defined as an alternative
to `sane' clock control.

Russ pointed out the risks of non-deterministic behavior when
streaming SCE-MI ports are connected directly to the DUT without
intervening buffers.  If buffers/FIFOs are included and clock
control is used based on FIFO full/empty flags, then streaming can
provide repeatable results without reducing the benefit of
streaming too much.  Note that the reduction in streaming
performance is due to the occasional clock control activity.  This
activity can be reduced by adjusting the FIFO sizes.  However,
adjusting the FIFO sizes could lead to a change in the overall
behavior of the test as message arrival times can change.  This is
another source of non-determinism that is introduced by streaming.

In the context of VLMs as defined above, streaming also applies
across the individual words of a message.  This only makes sense in
the two non-zero time cases, i.e., uclock driven non-zero time and
cclock driven.  Assuming sequential access, the implementation is
free to transfer the words of the messages in non-zero time as the
later words of the message are not needed right away.  In both
cases, clock control is used to guarantee repeatable results.
Comparing the VLM streaming case to the SCE-MI 1.x streaming case,
it should be clear that the former has pushed the FIFOs and
associated clock control into the SCE-MI infrastructure.

In terms of how the software side handles the streams, it appears
reasonable to delegate this to the SCE-MI service loop.  The actual
production of input messages and consumption of output messages are
done by the test, of course, but the service loop handles the
actual transfer of messages and will be responsible for maintaining
streaming.  This is probably more effective if the test is
multi-threaded and the service loop is running in its own thread,
but nothing precludes the user from implementing this in a
single-threaded test, although it may be tricky to obtain high
streaming efficiency.
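As a rough illustration of the multi-threaded arrangement just
described (and of the producer/consumer view of streaming above),
here is a small C++ sketch in which the test produces input messages
in one thread while a service-loop-like thread drains a bounded
queue and pushes the messages toward the hardware side.  The bounded
queue plays the role of the buffering discussed above;
transfer_to_hardware() is a hypothetical stand-in for the actual
message transfer and is not part of the SCE-MI API.

  // Sketch: producer thread (the test) and a service-loop-like
  // consumer thread running concurrently with bounded buffering.
  #include <condition_variable>
  #include <cstddef>
  #include <cstdint>
  #include <cstdio>
  #include <deque>
  #include <mutex>
  #include <thread>
  #include <utility>
  #include <vector>

  using Message = std::vector<std::uint32_t>;

  // Hypothetical transport stand-in, not the SCE-MI API.
  static void transfer_to_hardware(const Message& m) {
      std::printf("transferred message of %zu words\n", m.size());
  }

  int main() {
      std::deque<Message> queue;        // bounded buffer between threads
      const std::size_t kMaxDepth = 8;  // cf. the FIFO sizing discussion
      std::mutex mtx;
      std::condition_variable cv;
      bool done = false;

      // Producer: the test generating variable size input messages.
      std::thread producer([&] {
          for (int i = 0; i < 32; ++i) {
              Message m(1 + (i % 5), 0);
              std::unique_lock<std::mutex> lock(mtx);
              cv.wait(lock, [&] { return queue.size() < kMaxDepth; });
              queue.push_back(std::move(m));
              cv.notify_all();
          }
          std::lock_guard<std::mutex> lock(mtx);
          done = true;
          cv.notify_all();
      });

      // Consumer: a service-loop-like thread keeping the stream moving.
      std::thread service_loop([&] {
          for (;;) {
              Message m;
              {
                  std::unique_lock<std::mutex> lock(mtx);
                  cv.wait(lock, [&] { return !queue.empty() || done; });
                  if (queue.empty()) break;   // done and fully drained
                  m = std::move(queue.front());
                  queue.pop_front();
                  cv.notify_all();
              }
              transfer_to_hardware(m);  // runs concurrently with producer
          }
      });

      producer.join();
      service_loop.join();
      return 0;
  }

A single-threaded test would instead interleave message production
with periodic calls that perform the transfers, which is workable
but, as noted, may make it harder to keep the stream full.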
Per
--
Per Bojsen                            Email: <bojsen@zaiqtech.com>
Zaiq Technologies, Inc.               WWW:   http://www.zaiqtech.com
78 Dragon Ct.                         Tel:   781 721 8229
Woburn, MA 01801                      Fax:   781 932 7488