* Introduction

There has been some discussion in the group recently about variable length messaging and streaming. Russ Vreeland sent out an attempt at defining these two topics. I'd like to bring some more thoughts to the table.

* Variable Length Messages

One can get a reasonable definition of variable length messages (VLMs) by viewing them as an extension of the fixed length/width messages provided in SCE-MI 1.x. SCE-MI 1.x messages are defined by one parameter, the port width. Even though SCE-MI 1.x does not limit the port width, in practice the available resources will. Many transactions are much larger, in terms of number of bits, than what can reasonably be supported by an individual SCE-MI 1.x port given the resource constraints. This leads to the need for segmenting transactions: transactions are split into segments small enough to be handled by SCE-MI 1.x ports without unreasonable resource consumption. Given the fixed width nature of SCE-MI 1.x ports, it is natural to segment the transactions into a series of equal sized segments. Obviously, the last segment will in general be a partial segment, and some mechanism for dealing with this must be implemented.

Segmented transactions can be divided into two categories:

- Fixed size transactions: Transactions are always the same size, i.e., the number of segments is constant. Obviously, it is assumed that the transactions are too large to map to a fixed width port without segmentation.

- Variable size transactions: The name says it all. The number of segments varies.

Given that variable size transactions constitute a superset of fixed size transactions, it is not worth considering fixed size transactions any further.

In SCE-MI 1.x this type of transaction segmentation and reassembly must be implemented at the application layer. This implies the creation and management of multiple SCE-MI messages per transaction, as well as the appropriate infrastructure on the hardware side to understand the relationship between a sequence of SCE-MI messages and the stream of transactions, most notably the delineation of a batch of SCE-MI messages. This in turn implies the introduction of either in-band data, i.e., extra bits or fields in the individual messages, or out-of-band data, i.e., one or more auxiliary ports.

The goal of variable length messaging is to encapsulate the segmentation and reassembly in the SCE-MI layer. The transaction can then be mapped to a single VLM message, which simplifies the handling of transactions on both the software and hardware sides. VLM benefits include:

- No need to reinvent the wheel: segmentation/reassembly is provided by the infrastructure and does not have to be implemented at the application layer for each transactor requiring it (a sketch of the application-layer approach appears after this list).

- Simpler application code: less code to write, less code to debug, less code to maintain.

- Potential for improved performance: the implementation will have new opportunities for improving the message transfer performance, as it now knows that the segments of a transaction are part of the same unit, i.e., the VLM. The possible performance gain is obviously highly dependent on the implementation, but could be quite significant. To understand why this claim can be made, consider single-word PCI accesses versus burst PCI accesses.
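To make the contrast concrete, the following is a minimal sketch of the kind of application-layer segmentation code SCE-MI 1.x requires today. It assumes the SCE-MI 1.x C++ proxy classes (SceMiMessageInPortProxy, SceMiMessageData and their Set/Send calls), a port organized as eight 32-bit words, and an in-band header word carrying a last-segment flag and a valid-word count. The port organization, header layout, and function name are illustrative only, and the matching reassembly logic on the hardware side is not shown.

    // Sketch: application-layer segmentation of a variable-size transaction into
    // fixed-width SCE-MI 1.x messages.  Assumed (not standard): a 256-bit port seen
    // as eight 32-bit words, with word 0 used as an in-band header that carries a
    // last-segment flag and the number of valid payload words in this segment.
    #include <cstddef>
    #include <cstdint>
    #include <vector>
    #include "scemi.h"                              // SCE-MI C++ API header (name may vary by vendor)

    static const unsigned kWordsPerMessage = 8;     // assumed port width: 8 x 32 bits
    static const unsigned kPayloadWords = kWordsPerMessage - 1;

    void SendTransaction(SceMiMessageInPortProxy& port,
                         const std::vector<uint32_t>& txn)   // transaction payload words
    {
        for (std::size_t base = 0; base < txn.size(); base += kPayloadWords) {
            SceMiMessageData msg(port);             // one fixed-width SCE-MI message
            bool last = (base + kPayloadWords >= txn.size());
            uint32_t valid = static_cast<uint32_t>(last ? txn.size() - base
                                                        : kPayloadWords);
            // In-band header word: bit 31 = last-segment flag, low bits = valid count.
            msg.Set(0, (last ? 0x80000000u : 0u) | valid);
            for (unsigned i = 0; i < kPayloadWords; ++i)
                msg.Set(i + 1, base + i < txn.size() ? txn[base + i] : 0u);
            port.Send(msg);                         // one segment per SCE-MI message
        }
        // The hardware side must implement the matching reassembly (not shown).
    }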
Variable length messaging involves VLMs and ports that can accept VLMs. Ports will still have a port width, and this port width will be the message word width of the VLM. Hence, a VLM can be defined as a sequence of one or more message words, where each word has the same fixed width. The SCE-MI 1.x messages are thus a degenerate case of VLM, i.e., a VLM that is one message word long. It is reasonable to assume that at most one message word is accessed per clock cycle. If it turns out that the transactor needs to access more than one message word per clock cycle, the port can be widened, or a faster clock can be used to pump message words out faster.

There is another use model of VLMs that has not been considered above. Transactions that map to multiple sequential words on the signal level interface, such as PCI burst transactions, can still be mapped to a single SCE-MI 1.x fixed length/width message if the maximum size of the transactions is small enough. In this case, individual signal level words are mapped to different fields of the message. The transaction can be fixed or variable size up to a maximum size. The transactor needs to implement muxing and counting logic to extract the individual signal level words from the message. Similar considerations exist for the opposite direction. In either case, a single SCE-MI 1.x message conveys the transaction. VLMs and VLM ports are a natural fit for this kind of scenario. They allow the elimination of the muxing/counting logic from the transactor by narrowing the port width to one that matches the signal level interface. In many cases, the signal level interface can then be connected directly to the message port.

For simplicity, the message words will be assumed to be accessed sequentially in the following discussion. See the random access requirement note for other possibilities. There are two distinct modes in which variable length messaging could be used:

- Zero time: the whole message is transferred in zero simulation time, i.e., all the individual words of the message are transferred in zero time.

- Non-zero time: the message is transferred over some span of simulation time.

These are described in the following sections.

** Zero Time

In this mode, all message words are transferred at the same simulation time. It can be assumed that message words are transferred sequentially (not necessarily in order, if random access is supported), because otherwise this would be a traditional SCE-MI 1.x fixed-length message. Processing of the words is also done, at least in part, using zero-time sequential operations.

** Non-Zero Time

In the non-zero time mode, the message transfer consumes simulation time. There are at least two possibilities that fit within this category:

- Clocked: individual message words are transferred on the edge of a user clock.

- Unclocked: message words are transferred using a mechanism similar to the zero time mode, except that the message is not transferred completely at a given simulation time.

In the clocked scenario, for simplicity it can be assumed that words clock out on the rising edge of the user clock, because one can always construct a user clock that allows this given the constraints of the application. For example, if the application requires words to be clocked out on the falling edge of some user clock, one can construct a clock that is the inverse of this clock and use that to clock the words out. Further, message words do not have to transfer on every rising edge of the user clock; using appropriate control mechanisms, the message word transfer can be delayed to some future time (see the toy model below).
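The clocked scenario can be pinned down with a small toy model; it uses no SCE-MI API at all and exists only to state the intended semantics: at most one message word is handed over per rising edge of the user clock, and a transfer can be held off on edges where the receiving side is not ready. The `ready' pattern below is an arbitrary stand-in for whatever control mechanism the transactor actually uses.

    // Toy model (no SCE-MI API): words of one VLM are delivered across successive
    // rising edges of a user clock; a 'ready' gate may delay a transfer to a later edge.
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<uint32_t> vlm = {0xA0, 0xA1, 0xA2, 0xA3};   // one variable length message
        std::size_t next = 0;                                   // next word to transfer

        for (unsigned edge = 0; next < vlm.size(); ++edge) {    // successive rising edges
            bool ready = (edge % 3 != 0);                       // arbitrary stand-in for flow control
            if (ready) {
                std::printf("edge %u: word 0x%X\n", edge, static_cast<unsigned>(vlm[next]));
                ++next;                                         // at most one word per edge
            } else {
                std::printf("edge %u: stalled\n", edge);        // transfer delayed to a later edge
            }
        }
        return 0;
    }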
The unclocked scenario can be seen as a generalization of the zero time mode. In this scenario, the number of message words transferred at any given simulation time is variable, i.e., it could be 0, 1, or more than 1. In other words, this is like the zero time mode, except time is allowed to pass after only some of the words have transferred. As a special case of the unclocked scenario, one can consider a sub-scenario where the times at which message word transfers happen coincide with the rising edges of a user clock. In other words, message word transfers do not happen at completely arbitrary times: at any given rising edge of the user clock 0, 1, or more message words may be transferred, but no words transfer at any other time.

Russ defined one type of VLMs as large messages transferred in zero simulation time. The above extends this concept to the non-zero time case as well. Russ' definition of streaming applies to the non-zero time case, but is not identical to it. Streaming is discussed in the next section.

* Streaming

Typically, streaming is defined as a situation where a producer and a consumer are running concurrently as much as possible. In the non-streaming case, the producer and consumer take turns running: the producer produces one or more items and hands them over to the consumer, then waits for the consumer to finish consuming before it produces another batch of items. In the streaming scenario, the producer runs continuously and hands off items to the consumer as they are produced. If the producer and consumer are well balanced and there is sufficient buffering in the channel between them, the times when either of them is stalled waiting for the other will be minimized and the overall run time will be reduced.

In the context of SCE-MI 1.x, streaming applies to sequences of messages, i.e., the items are messages. The cclock is allowed to run while messages are transferred; otherwise there is no benefit to streaming. Note that this represents a different clock control paradigm than `sane' clock control. It seems reasonable that a streaming clock control paradigm can be defined as an alternative to `sane' clock control.

Russ pointed out the risks of non-deterministic behavior when streaming SCE-MI ports are connected directly to the DUT without intervening buffers. If buffers/FIFOs are included and clock control is used based on the FIFO full/empty flags, then streaming can provide repeatable results without reducing the benefit of streaming too much (a toy model of this appears below). Note that the reduction in streaming performance is due to the occasional clock control activity. This activity can be reduced by adjusting the FIFO sizes. However, adjusting the FIFO sizes could change the overall behavior of the test, as message arrival times can change. This is another source of non-determinism introduced by streaming.

In the context of VLMs as defined above, streaming also applies across the individual words of a message. This only makes sense in the two non-zero time cases, i.e., the unclocked (uclock driven) case and the clocked (cclock driven) case. Assuming sequential access, the implementation is free to transfer the words of the messages in non-zero time, as the later words of the message are not needed right away. In both cases, clock control is used to guarantee repeatable results. Comparing the VLM streaming case to the SCE-MI 1.x streaming case, it should be clear that the former has pushed the FIFOs and the associated clock control into the SCE-MI infrastructure.
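As a concrete illustration of the FIFO plus clock control idea mentioned above, here is another toy model with no SCE-MI API in it: a FIFO decouples irregular message arrival from the DUT, and the controlled clock is advanced only when the FIFO has data, so the DUT consumes the same word sequence regardless of exactly when the messages arrive. Only the empty flag is modeled here; a real transactor would use the full flag in the same way for the opposite direction.

    // Toy model (no SCE-MI API): a FIFO between the message port and the DUT,
    // with the controlled clock advanced only while the FIFO has data, so the
    // DUT consumes the same word sequence regardless of when messages arrive.
    #include <cstdint>
    #include <cstdio>
    #include <deque>

    int main() {
        std::deque<uint32_t> fifo;                   // buffer between port and DUT
        unsigned cclockCycles = 0;

        for (unsigned t = 0; t < 12; ++t) {          // coarse simulation time steps
            if (t % 3 == 0)                          // arbitrary, irregular message arrival
                fifo.push_back(0x100 + t);
            if (!fifo.empty()) {                     // clock control on the empty flag:
                ++cclockCycles;                      // advance cclock only when data exists
                std::printf("cclock %u: DUT consumes 0x%X\n",
                            cclockCycles, static_cast<unsigned>(fifo.front()));
                fifo.pop_front();
            }                                        // otherwise the cclock is held
        }
        return 0;
    }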
In terms of how the software side handles the streams, it appears reasonable to delegate this to the SCE-MI service loop. The actual production of input messages and consumption of output messages are done by the test, of course, but the service loop handles the actual transfer of messages and will be responsible for maintaining streaming. This is probably more effective if the test is multi-threaded and the service loop is running in its own thread, but nothing precludes the user from implementing this in a single-threaded test, although it may be tricky to obtain high streaming efficiency.
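As a rough sketch of the multi-threaded arrangement, here is a skeleton that assumes only the standard SCE-MI C++ service loop entry point (SceMi::ServiceLoop); session setup, port binding, and all locking between the test thread and the service loop thread are deliberately left out, and the names and structure are illustrative rather than prescriptive.

    // Sketch: SCE-MI service loop running in a dedicated thread while the
    // test thread produces input messages and consumes output messages.
    // Only ServiceLoop() is standard SCE-MI; names and structure are illustrative.
    #include <atomic>
    #include <thread>
    #include "scemi.h"                     // SCE-MI C++ API header (name may vary by vendor)

    static std::atomic<bool> testDone(false);

    static void ServiceLoopThread(SceMi* sceMi) {
        while (!testDone.load())
            sceMi->ServiceLoop();          // dispatches receive callbacks, keeps messages moving
    }

    int main() {
        SceMi* sceMi = 0;                  // assume obtained via SceMi::Init(...); setup omitted
        if (!sceMi) return 0;              // skeleton only: bail out without a real session

        std::thread svc(ServiceLoopThread, sceMi);

        // Test thread: build and send input messages via the input port proxies,
        // and consume output messages delivered by the receive callbacks.
        // Synchronization between this thread and the service loop thread is
        // the user's responsibility.

        testDone.store(true);              // signal the service loop thread to stop
        svc.join();
        return 0;
    }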