* Introduction

There has been some discussion in the group recently about variable length messaging and streaming. Russ Vreeland sent out an attempt at defining these two topics. I'd like to bring some more thoughts to the table.

* Variable Length Messages

One can get a reasonable definition of variable length messages (VLMs) by viewing them as an extension of the fixed length/width messages provided in SCE-MI 1.x. SCE-MI 1.x messages are defined by one parameter, the port width. Even though SCE-MI 1.x does not limit the port width, in practice the available resources will. Many transactions are much larger, in terms of number of bits, than what can reasonably be supported by an individual SCE-MI 1.x port given the resource constraints. This leads to the need for segmenting transactions: transactions are split into segments small enough to be handled by SCE-MI 1.x ports without unreasonable resource consumption. Given the fixed width nature of SCE-MI 1.x ports, it is natural to segment the transactions into a series of equal sized segments. Obviously, the last segment will in general be a partial segment, and some mechanism for dealing with this must be implemented.

Segmented transactions can be divided into two categories:

- Fixed size transactions: Transactions are always the same size, i.e., the number of segments is constant. Obviously, it is assumed that the transactions are too large to map to a fixed width port without segmentation.

- Variable size transactions: The name says it all. The number of segments varies.

Given that variable size transactions constitute a superset of fixed size transactions, it is not worth considering fixed size transactions any further.

In SCE-MI 1.x this type of transaction segmentation and reassembly must be implemented at the application layer. This implies the creation and management of multiple SCE-MI messages per transaction, as well as the appropriate infrastructure on the hardware side to understand the relationship between a sequence of SCE-MI messages and the stream of transactions, most notably the delineation of a batch of SCE-MI messages. This in turn implies the introduction of either in-band data, i.e., extra bits or fields in the individual messages, or out-of-band data, i.e., one or more auxiliary ports.

The goal of variable length messaging is to encapsulate the segmentation and reassembly in the SCE-MI layer. The transaction can then be mapped to a single VLM message, which simplifies the handling of transactions on both the software and hardware sides. VLM benefits include:

- No need to reinvent the wheel: segmentation/reassembly is provided by the infrastructure and does not have to be implemented at the application layer for each transactor requiring it (a sketch of the application-layer approach appears after this list).

- Simpler application code: less code to write, less code to debug, less code to maintain.

- Potential for improved performance: the implementation will have new opportunities for improving the message transfer performance, as it now knows that the segments of a transaction are part of the same unit, i.e., the VLM. The possible performance gain is obviously highly dependent on the implementation, but could be quite significant. To understand why this claim can be made, consider single-word PCI accesses versus burst PCI accesses.
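To make the contrast concrete, the following is a minimal sketch of the kind of application-layer segmentation code SCE-MI 1.x requires today. It assumes the SCE-MI 1.x C++ proxy classes (SceMiMessageInPortProxy, SceMiMessageData and their Set/Send calls), a port organized as eight 32-bit words, and an in-band header word carrying a last-segment flag and a valid-word count. The port organization, header layout, and function name are illustrative only, and the matching reassembly logic on the hardware side is not shown.

    // Sketch: application-layer segmentation of a variable-size transaction into
    // fixed-width SCE-MI 1.x messages.  Assumed (not standard): a 256-bit port seen
    // as eight 32-bit words, with word 0 used as an in-band header that carries a
    // last-segment flag and the number of valid payload words in this segment.
    #include <cstddef>
    #include <cstdint>
    #include <vector>
    #include "scemi.h"                              // SCE-MI C++ API header (name may vary by vendor)

    static const unsigned kWordsPerMessage = 8;     // assumed port width: 8 x 32 bits
    static const unsigned kPayloadWords = kWordsPerMessage - 1;

    void SendTransaction(SceMiMessageInPortProxy& port,
                         const std::vector<uint32_t>& txn)   // transaction payload words
    {
        for (std::size_t base = 0; base < txn.size(); base += kPayloadWords) {
            SceMiMessageData msg(port);             // one fixed-width SCE-MI message
            bool last = (base + kPayloadWords >= txn.size());
            uint32_t valid = static_cast<uint32_t>(last ? txn.size() - base
                                                        : kPayloadWords);
            // In-band header word: bit 31 = last-segment flag, low bits = valid count.
            msg.Set(0, (last ? 0x80000000u : 0u) | valid);
            for (unsigned i = 0; i < kPayloadWords; ++i)
                msg.Set(i + 1, base + i < txn.size() ? txn[base + i] : 0u);
            port.Send(msg);                         // one segment per SCE-MI message
        }
        // The hardware side must implement the matching reassembly (not shown).
    }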
Variable length messaging involves VLMs and ports that can accept VLMs. Ports will still have a port width, and this port width will be the message word width of the VLM. Hence, a VLM can be defined as a sequence of one or more message words, where each word has the same fixed width. The SCE-MI 1.x messages are thus a degenerate case of VLM, i.e., a VLM that is one message word long. It is reasonable to assume that at most one message word is accessed per clock cycle. If it turns out that the transactor needs to access more than one message word per clock cycle, the port can be widened, or a faster clock can be used to pump message words out faster.

There is another use model of VLMs that has not been considered above. Transactions that map to multiple sequential words on the signal level interface, such as PCI burst transactions, can still be mapped to a single SCE-MI 1.x fixed length/width message if the maximum size of the transactions is small enough. In this case, individual signal level words are mapped to different fields of the message. The transaction can be fixed or variable size up to a maximum size. The transactor needs to implement muxing and counting logic to extract the individual signal level words from the message. Similar considerations exist for the opposite direction. In either case, a single SCE-MI 1.x message conveys the transaction. VLMs and VLM ports are a natural fit for this kind of scenario. They allow the elimination of the muxing/counting logic from the transactor by narrowing the port width to one that matches the signal level interface. In many cases, the signal level interface can then be connected directly to the message port.

For simplicity, the message words will be assumed to be accessed sequentially in the following discussion. See the random access requirement note for other possibilities. There are two distinct modes in which variable length messaging could be used:

- Zero time: the whole message is transferred in zero simulation time, i.e., all the individual words of the message are transferred in zero time.

- Non-zero time: the message is transferred over some span of simulation time.

These are described in the following sections.

** Zero Time

In this mode, all message words are transferred at the same simulation time. It can be assumed that message words are transferred sequentially (not necessarily in order, if random access is supported), because otherwise this would be a traditional SCE-MI 1.x fixed-length message. Processing of the words is also done, at least in part, using zero-time sequential operations.

** Non-Zero Time

In the non-zero time mode, the message transfer consumes simulation time. There are at least two possibilities that fit within this category:

- Clocked: individual message words are transferred on the edge of a user clock.

- Unclocked: message words are transferred using a mechanism similar to the zero time mode, except that the message is not transferred completely at a given simulation time.

In the clocked scenario, for simplicity it can be assumed that words clock out on the rising edge of the user clock, because one can always construct a user clock that allows this given the constraints of the application. For example, if the application requires words to be clocked out on the falling edge of some user clock, one can construct a clock that is the inverse of this clock and use that to clock the words out. Further, message words do not have to transfer on every rising edge of the user clock; using appropriate control mechanisms, the message word transfer can be delayed to some future time (see the toy model below).
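The clocked scenario can be pinned down with a small toy model; it uses no SCE-MI API at all and exists only to state the intended semantics: at most one message word is handed over per rising edge of the user clock, and a transfer can be held off on edges where the receiving side is not ready. The `ready' pattern below is an arbitrary stand-in for whatever control mechanism the transactor actually uses.

    // Toy model (no SCE-MI API): words of one VLM are delivered across successive
    // rising edges of a user clock; a 'ready' gate may delay a transfer to a later edge.
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<uint32_t> vlm = {0xA0, 0xA1, 0xA2, 0xA3};   // one variable length message
        std::size_t next = 0;                                   // next word to transfer

        for (unsigned edge = 0; next < vlm.size(); ++edge) {    // successive rising edges
            bool ready = (edge % 3 != 0);                       // arbitrary stand-in for flow control
            if (ready) {
                std::printf("edge %u: word 0x%X\n", edge, static_cast<unsigned>(vlm[next]));
                ++next;                                         // at most one word per edge
            } else {
                std::printf("edge %u: stalled\n", edge);        // transfer delayed to a later edge
            }
        }
        return 0;
    }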
The unclocked scenario can be seen as a generalization of the zero time mode. In this scenario, the number of message words transferred at any given simulation time is variable, i.e., it could be 0, 1, or more than 1. In other words, this is like the zero time mode, except time is allowed to pass after only some of the words have transferred. As a special case of the unclocked scenario, one can consider a sub-scenario where the times at which message word transfers happen coincide with the rising edges of a user clock. In other words, message word transfers do not happen at completely arbitrary times: at any given rising edge of the user clock 0, 1, or more message words may be transferred, but no words transfer at any other time.

Russ defined one type of VLMs as large messages transferred in zero simulation time. The above extends this concept to the non-zero time case as well. Russ' definition of streaming applies to the non-zero time case, but is not identical to it. Streaming is discussed in the next section.

* Streaming

Typically, streaming is defined as a situation where a producer and a consumer are running concurrently as much as possible. In the non-streaming case, the producer and consumer take turns running: the producer produces one or more items and hands them over to the consumer, then waits for the consumer to finish consuming before it produces another batch of items. In the streaming scenario, the producer runs continuously and hands off items to the consumer as they are produced. If the producer and consumer are well balanced and there is sufficient buffering in the channel between them, the times when either of them is stalled waiting for the other will be minimized and the overall run time will be reduced.

In the context of SCE-MI 1.x, streaming applies to sequences of messages, i.e., the items are messages. The cclock is allowed to run while messages are transferred; otherwise there is no benefit to streaming. Note that this represents a different clock control paradigm than `sane' clock control. It seems reasonable that a streaming clock control paradigm can be defined as an alternative to `sane' clock control.

Russ pointed out the risks of non-deterministic behavior when streaming SCE-MI ports are connected directly to the DUT without intervening buffers. If buffers/FIFOs are included and clock control is used based on the FIFO full/empty flags, then streaming can provide repeatable results without reducing the benefit of streaming too much (a toy model of this appears below). Note that the reduction in streaming performance is due to the occasional clock control activity. This activity can be reduced by adjusting the FIFO sizes. However, adjusting the FIFO sizes could change the overall behavior of the test, as message arrival times can change. This is another source of non-determinism introduced by streaming.

In the context of VLMs as defined above, streaming also applies across the individual words of a message. This only makes sense in the two non-zero time cases, i.e., the unclocked (uclock driven) case and the clocked (cclock driven) case. Assuming sequential access, the implementation is free to transfer the words of the messages in non-zero time, as the later words of the message are not needed right away. In both cases, clock control is used to guarantee repeatable results. Comparing the VLM streaming case to the SCE-MI 1.x streaming case, it should be clear that the former has pushed the FIFOs and the associated clock control into the SCE-MI infrastructure.
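As a concrete illustration of the FIFO plus clock control idea mentioned above, here is another toy model with no SCE-MI API in it: a FIFO decouples irregular message arrival from the DUT, and the controlled clock is advanced only when the FIFO has data, so the DUT consumes the same word sequence regardless of exactly when the messages arrive. Only the empty flag is modeled here; a real transactor would use the full flag in the same way for the opposite direction.

    // Toy model (no SCE-MI API): a FIFO between the message port and the DUT,
    // with the controlled clock advanced only while the FIFO has data, so the
    // DUT consumes the same word sequence regardless of when messages arrive.
    #include <cstdint>
    #include <cstdio>
    #include <deque>

    int main() {
        std::deque<uint32_t> fifo;                   // buffer between port and DUT
        unsigned cclockCycles = 0;

        for (unsigned t = 0; t < 12; ++t) {          // coarse simulation time steps
            if (t % 3 == 0)                          // arbitrary, irregular message arrival
                fifo.push_back(0x100 + t);
            if (!fifo.empty()) {                     // clock control on the empty flag:
                ++cclockCycles;                      // advance cclock only when data exists
                std::printf("cclock %u: DUT consumes 0x%X\n",
                            cclockCycles, static_cast<unsigned>(fifo.front()));
                fifo.pop_front();
            }                                        // otherwise the cclock is held
        }
        return 0;
    }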
In terms of how the software side handles the streams, it appears reasonable to delegate this to the SCE-MI service loop. The actual production of input messages and consumption of output messages are done by the test, of course, but the service loop handles the actual transfer of messages and will be responsible for maintaining streaming. This is probably more effective if the test is multi-threaded and the service loop is running in its own thread, but nothing precludes the user from implementing this in a single-threaded test, although it may be tricky to obtain high streaming efficiency.
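As a rough sketch of the multi-threaded arrangement, here is a skeleton that assumes only the standard SCE-MI C++ service loop entry point (SceMi::ServiceLoop); session setup, port binding, and all locking between the test thread and the service loop thread are deliberately left out, and the names and structure are illustrative rather than prescriptive.

    // Sketch: SCE-MI service loop running in a dedicated thread while the
    // test thread produces input messages and consumes output messages.
    // Only ServiceLoop() is standard SCE-MI; names and structure are illustrative.
    #include <atomic>
    #include <thread>
    #include "scemi.h"                     // SCE-MI C++ API header (name may vary by vendor)

    static std::atomic<bool> testDone(false);

    static void ServiceLoopThread(SceMi* sceMi) {
        while (!testDone.load())
            sceMi->ServiceLoop();          // dispatches receive callbacks, keeps messages moving
    }

    int main() {
        SceMi* sceMi = 0;                  // assume obtained via SceMi::Init(...); setup omitted
        if (!sceMi) return 0;              // skeleton only: bail out without a real session

        std::thread svc(ServiceLoopThread, sceMi);

        // Test thread: build and send input messages via the input port proxies,
        // and consume output messages delivered by the receive callbacks.
        // Synchronization between this thread and the service loop thread is
        // the user's responsibility.

        testDone.store(true);              // signal the service loop thread to stop
        svc.join();
        return 0;
    }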