RE: Some thoughts on variable length messages and streaming

From: Bojsen, Per <bojsen_at_.....> Date: Wed Mar 09 2005 - 14:00:46 PST

Hi,

I added a couple of paragraphs to my notes.  Please disregard the previous
email.  If you started reading it (vain hope, I know ;-) the new paragraphs
are inserted just below the paragraph that ends 'used to pump message
words out faster' around line 82.

Here are the updated notes:

* Introduction

There has been some discussion in the group about variable length
messaging and streaming recently.  Russ Vreeland sent out an attempt
at defining these two topics.  I'd like to bring some more thoughts to
the table.

* Variable Length Messages

One can get a reasonable definition of variable length messages (VLMs)
by viewing them as an extension of the fixed length/width messages
provided in SCE-MI 1.x.  SCE-MI 1.x messages are defined by one
parameter, the port width.  Even though SCE-MI 1.x does not limit the
port width, in practice the available resources will limit the port
width.  Many transactions can be much larger in terms of number of
bits than what can reasonably be supported by an individual SCE-MI 1.x
port given the resource constraints.  This leads to the need for
segmenting transactions.

Transactions are divided into segments that are small enough to be
handled by SCE-MI 1.x ports without unreasonable resource consumption.
Given the fixed width nature of SCE-MI 1.x ports it is natural to
segment the transactions into a series of equal sized segments.
Obviously, the last segment will in general be a partial segment and
some mechanism for dealing with this must be implemented.
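
To illustrate, here is a minimal Python sketch of such a segmentation.
It is not part of SCE-MI; the function name, the bit string
representation, and the zero padding of the partial last segment are
all assumptions made for the example.  Returning the number of valid
bits in the last segment is just one possible mechanism for dealing
with the partial segment.

```python
def segment(bits, port_width, pad="0"):
    """Split a transaction (a string of '0'/'1') into equal sized segments.

    Returns (segments, valid_in_last), where valid_in_last is the number
    of meaningful bits in the final, possibly partial, segment.
    """
    if port_width <= 0:
        raise ValueError("port_width must be positive")
    if not bits:
        raise ValueError("empty transaction")
    # Cut the transaction into port_width sized pieces.
    segments = [bits[i:i + port_width]
                for i in range(0, len(bits), port_width)]
    valid_in_last = len(segments[-1])
    # Pad the partial last segment so every segment matches the port width.
    segments[-1] = segments[-1].ljust(port_width, pad)
    return segments, valid_in_last
```

For a 10 bit transaction and a 4 bit port, this yields three segments
with only two valid bits in the last one.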

Segmented transactions can be divided into two categories:

  Fixed size transactions: Transactions are always the same size,
    i.e., the number of segments is constant.  Obviously, it is
    assumed that the transactions are too large to map to a fixed
    width port without segmentation.

  Variable size transactions: The name says it all.  The number of
    segments varies.

Given that variable size transactions constitute a superset of fixed
size transactions, it is not worth considering fixed size transactions
any further.

In SCE-MI 1.x this type of transaction segmentation and reassembly
must be implemented at the application layer.  This implies the
creation and management of multiple SCE-MI messages per transaction as
well as including the appropriate infrastructure on the hardware side
to understand the relationship between a sequence of SCE-MI messages
and the stream of transactions, most notably the delineation of a
batch of SCE-MI messages.  This implies the introduction of either
in-band data, i.e., extra bits or fields in the individual messages,
or out-of-band data, i.e., one or more auxiliary ports.
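
As a hedged sketch of the in-band option, the Python below tags every
segment with a `last' flag and uses that flag to delineate
transactions on the receiving side.  The tuple layout and the function
names are hypothetical; SCE-MI 1.x does not define them, which is
exactly why each application has to invent something like this.

```python
def to_messages(segments):
    """Tag each segment with an in-band 'last' flag (hypothetical framing)."""
    return [(i == len(segments) - 1, seg) for i, seg in enumerate(segments)]


def reassemble(messages):
    """Group a flat stream of (last, payload) messages into transactions."""
    transactions, current = [], []
    for last, payload in messages:
        current.append(payload)
        if last:
            # The flag marks the end of the current batch of messages.
            transactions.append(current)
            current = []
    if current:
        raise ValueError("stream ended in the middle of a transaction")
    return transactions
```

Two back-to-back transactions of three and one segments come out of
reassemble() as two separate lists, in order.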

The goal of variable length messaging is to encapsulate the
segmentation and reassembly in the SCE-MI layer.  The transaction can
now be mapped to a single VLM message which simplifies the handling of
transactions on both the software and hardware sides.  VLM benefits
include:

  No need to reinvent the wheel: segmentation/reassembly is provided
    by the infrastructure and does not have to be implemented at the
    application layer for each transactor requiring it.

  Simpler application code: less code to write, less code to debug,
    less code to maintain.

  Potential for improved performance: the implementation will have new
    opportunities for improving the message transfer performance as it
    now knows that the segments of a transaction are part of the same
    unit, i.e., the VLM.  The performance gain possible is obviously
    highly dependent on the implementation, but could be quite
    significant.  To understand why this claim can be made, consider
    single-word PCI accesses versus burst PCI accesses.

Variable length messaging involves VLMs and ports that can accept
VLMs.  Ports will still have a port width, and this port width will be
the message word width of the VLM.  Hence, a VLM can be defined as a
sequence of one or more message words where each word has the same
fixed width.  The SCE-MI 1.x messages are thus a degenerate case of
VLM, i.e., a VLM that is one message word long.  It is reasonable to
assume that at most one message word is accessed per clock cycle.
If it turns out that the transactor needs to access more than one message
word per clock cycle, the port can be widened, or a faster clock could
be used to pump message words out faster.

There is another use model of VLMs that has not been considered above.
Transactions that map to multiple sequential words on the signal level
interface such as PCI burst transactions can still be mapped to a
single SCE-MI 1.x fixed length/width message if the maximum size of
the transactions is small enough.  In this case, individual signal
level words are mapped to different fields of the message.  The
transaction can be fixed or variable size up to a maximum size.  The
transactor needs to implement muxing and counting logic to extract the
individual signal level words from the message.  Similar
considerations exist for the opposite direction.  In either case, a
single SCE-MI 1.x message conveys the transaction.

VLMs and VLM ports are a natural fit for this kind of scenario.  They
allow the elimination of the muxing/counting logic from the transactor
by narrowing the port width to one that matches the signal level
interface.  In many cases, the signal level interface can now be
connected directly to the message port.
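
The muxing/counting logic can be pictured with the following Python
model.  The field layout (a 3 bit count followed by up to four 32 bit
words) is purely an assumption for the example, as are the names.

```python
WORD_BITS = 32    # assumed width of one signal level word
MAX_WORDS = 4     # assumed maximum burst length
COUNT_BITS = 3    # assumed width of the word count field
WORD_MASK = (1 << WORD_BITS) - 1


def pack_wide(words):
    """Pack a burst into one wide fixed width message (count + word fields)."""
    if not 1 <= len(words) <= MAX_WORDS:
        raise ValueError("burst length out of range")
    msg = len(words)
    for i, word in enumerate(words):
        msg |= (word & WORD_MASK) << (COUNT_BITS + i * WORD_BITS)
    return msg


def unpack_wide(msg):
    """Model of the transactor side: read the count, then mux out each word."""
    count = msg & ((1 << COUNT_BITS) - 1)
    return [(msg >> (COUNT_BITS + i * WORD_BITS)) & WORD_MASK
            for i in range(count)]
```

With a VLM port of width WORD_BITS, the burst would simply be the
sequence of words itself, and unpack_wide() would have no counterpart
in the transactor.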

For simplicity, the message words will be assumed to be accessed
sequentially in the following discussion.  See the random access
requirement note for other possibilities.

There are two distinct modes in which variable length messaging could
be used:

  Uclock driven: individual message words are transferred on uclock
    edges.

  Cclock driven: individual message words are transferred on a
    selected cclock's edges.

These are described below:

** Uclock Driven

In this mode, message words are transferred on the rising edge of
uclock.  It does not appear reasonable to consider falling edge
transfers as most everything in SCE-MI is defined relative to the
rising edge of uclock.  Each message word is transferred using the
dual ready protocol.  This means that the transfer may stall at any
time due to data not being available.  Since the uclock does not stop,
the transfer must stall.  This is a direct extension of SCE-MI 1.x
fixed length/width messaging, so it is not surprising.

In the uclock driven mode, the complete message may be transferred in
zero time (cclocks are stopped during the whole transfer) or non-zero
time (cclocks are running during the transfer) depending on how the
transactor controls the cclocks.
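
A small behavioral model of the per-word dual ready handshake may
help.  This is only a sketch in Python, not RTL and not SCE-MI API;
the ready sequences are made-up inputs with one entry per rising
uclock edge.

```python
def uclock_transfer(tx_ready, rx_ready, words):
    """Simulate a dual ready transfer of message words.

    tx_ready[i] / rx_ready[i] give each side's ready flag at rising
    uclock edge i.  A word moves only on edges where both are ready;
    on any other edge the transfer stalls.  Returns (edge, word) pairs.
    """
    pending = list(words)
    transferred = []
    for edge, (tx, rx) in enumerate(zip(tx_ready, rx_ready)):
        if not pending:
            break
        if tx and rx:
            transferred.append((edge, pending.pop(0)))
    return transferred
```

With tx_ready = [1, 1, 0, 1, 1] and rx_ready = [1, 0, 1, 1, 1], a
three word message is transferred on edges 0, 3 and 4; edges 1 and 2
are stalls.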

** Cclock Driven

In the cclock driven mode, it makes sense to take advantage of clock
control such that the stalls that can occur in the uclock driven mode
can be hidden from the user of the port.  For an input port, this
means that the next message word of a message is always available on the
next rising edge of the cclock, unless the current message word is the
last word of the message, of course.  The SCE-MI infrastructure will
automatically issue clock control when the port is starving for data.

Similarly, an output port will always be able to accept the next
message word on the next rising edge of the cclock.  The SCE-MI
infrastructure will automatically issue clock control when the port
gets behind in transferring the data to the software side.

In the cclock driven mode, the complete transfer always happens in
non-zero time.
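
The effect of the automatic clock control on an input port can be
sketched as follows.  The model assumes, purely for illustration, that
the cclock advances once per uclock whenever it is not stopped, and
that arrivals[i] is the uclock cycle at which word i reaches the
hardware side.

```python
def cclock_driven_input(arrivals):
    """Model an input port in cclock driven mode.

    Returns (deliveries, stopped): deliveries is a list of
    (cclock_edge, uclock_cycle, word_index) and stopped is the number
    of uclock cycles during which the cclock was held.
    """
    deliveries = []
    stopped = 0
    uclock = 0
    cclock = 0
    for word, arrival in enumerate(arrivals):
        # Port is starving: the infrastructure holds the cclock.
        while uclock < arrival:
            stopped += 1
            uclock += 1
        # Data is present, so this cclock edge delivers the word.
        deliveries.append((cclock, uclock, word))
        cclock += 1
        uclock += 1
    return deliveries, stopped
```

For arrivals = [0, 1, 4, 5] the words are seen on cclock edges 0, 1,
2 and 3 with no gap, while the cclock was held for two uclock cycles
waiting for the third word.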

Russ' definition of VLM corresponds to the uclock driven mode with the
message transferred in zero time.  Russ' definition of streaming
applies to the uclock driven mode with the message transferred in
non-zero time and the cclock driven mode.  Streaming will be discussed
in the next section.

* Streaming

Typically streaming is defined as a situation where a producer and
consumer are running concurrently as much as possible.  In the
non-streaming case, the producer and consumer take turns running.
The producer produces one or more items and hands them over to the
consumer and then waits for the consumer to finish consuming before it
produces another batch of items.  In the streaming scenario, the
producer runs continuously and hands off items to the consumer as they
are produced.  If the producer and consumer are well balanced and
there is sufficient buffering in the channel between them, the times
when either of them is stalled waiting for the other will be
minimized and the overall run time will be reduced.
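
The difference in run time can be estimated with a back-of-the-envelope
model.  The sketch below assumes fixed per-item production and
consumption times, a batch size of one in the turn-taking case, and
unlimited buffering in the streaming case; none of these assumptions
come from SCE-MI.

```python
def turn_taking_time(n, produce, consume):
    """Producer and consumer alternate: one item produced, then consumed."""
    return n * (produce + consume)


def streaming_time(n, produce, consume):
    """Producer runs continuously; consumer starts each item as soon as
    the item exists and the previous item has been consumed."""
    producer_done = 0
    consumer_done = 0
    for _ in range(n):
        producer_done += produce
        consumer_done = max(consumer_done, producer_done) + consume
    return consumer_done
```

For 10 items with produce = 3 and consume = 2 time units, the
turn-taking run takes 50 units and the streaming run 32.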

In the context of SCE-MI 1.x, streaming applies to sequences of
messages, i.e., the items are messages.  The cclock is allowed to run
while messages are transferred, otherwise there is no benefit to
streaming.  Note, this represents a different clock control paradigm
than `sane' clock control.  It seems reasonable that a streaming clock
control paradigm can be defined as an alternative to `sane' clock
control.

Russ pointed out the risks of non-deterministic behavior when
streaming SCE-MI ports are connected directly to the DUT without
intervening buffers.  If buffers/FIFOs are included and clock control
is used based on FIFO full/empty flags, then streaming can provide
repeatable results without reducing the benefit of streaming too
much.  Note, the reduction in streaming performance is due to the
occasional clock control activity.  This activity can be reduced by
adjusting the FIFO sizes.  However, adjusting the FIFO sizes could lead
to a change in the overall behavior of the test as message arrival
times can change.  This is another source of non-determinism that is
introduced by streaming.
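
The relationship between FIFO size and clock control activity can be
explored with a toy model of an output port.  Everything here is an
assumption for illustration: the hardware side produces one word per
cycle unless the FIFO is full (in which case the cclock is stopped),
and the software side drains at most one word per cycle according to
a repeating pattern.

```python
from collections import deque


def run_output_port(drain_pattern, depth, words_to_send):
    """Return (stopped_cycles, received_words) for a given FIFO depth."""
    if depth < 1 or not any(drain_pattern):
        raise ValueError("need depth >= 1 and a pattern that drains")
    fifo = deque()
    stopped = 0
    sent = 0
    received = []
    cycle = 0
    while len(received) < words_to_send:
        # Software side: take one word if allowed and available.
        if fifo and drain_pattern[cycle % len(drain_pattern)]:
            received.append(fifo.popleft())
        # Hardware side: push a word, or stop the cclock if the FIFO is full.
        if sent < words_to_send:
            if len(fifo) < depth:
                fifo.append(sent)
                sent += 1
            else:
                stopped += 1
        cycle += 1
    return stopped, received
```

With drain_pattern = [False, False, True, True] and four words, a
depth of 1 stops the cclock for three cycles and a depth of 4 never
stops it, yet both runs deliver the same words in the same order.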

In the context of VLMs as defined above, streaming also applies across
individual words of a message.  This only makes sense in the two
non-zero time cases, i.e., uclock driven non-zero time, and cclock
driven.  Assuming sequential access, the implementation is free to
transfer the words of the messages in non-zero time as the later words
of the message are not needed right away.  In both cases, clock
control is used to guarantee repeatable results.

Comparing the VLM streaming case to the SCE-MI 1.x streaming case, it
should be clear that the former has pushed the FIFOs and associated
clock control into the SCE-MI infrastructure.

In terms of how the software side handles the streams, it appears
reasonable to delegate this to the SCE-MI service loop.  The actual
production of input messages and consumption of output messages are
done by the test, of course, but the service loop handles the actual
transfer of messages and will be responsible for maintaining
streaming.  This is probably more effective if the test is
multi-threaded and the service loop is running in its own thread, but
nothing precludes the user from implementing this in a single-threaded
test, although it may be tricky to obtain high streaming efficiency.
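
The multi-threaded arrangement could look roughly like the Python
sketch below.  It uses only the standard threading and queue modules;
service_loop() is a stand-in for the real service loop and simply
appends to a list instead of talking to hardware, and the buffer depth
is an arbitrary choice.

```python
import queue
import threading


def service_loop(inbox, delivered, stop):
    """Stand-in service loop: move messages from the test to the 'port'."""
    while True:
        msg = inbox.get()
        if msg is stop:
            break
        delivered.append(msg)


def run_test(messages, buffer_depth=4):
    """Test thread produces messages; the service loop thread transfers them."""
    inbox = queue.Queue(maxsize=buffer_depth)
    delivered = []
    stop = object()  # unique sentinel that ends the service loop
    worker = threading.Thread(target=service_loop,
                              args=(inbox, delivered, stop))
    worker.start()
    for msg in messages:
        inbox.put(msg)  # blocks only while the buffer is full
    inbox.put(stop)
    worker.join()
    return delivered
```

The bounded queue plays the role of the channel buffering: the test
keeps producing while the service loop transfers, and either side
waits only when the buffer is full or empty.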

Per

-- 
Per Bojsen                                Email: <bojsen@zaiqtech.com> 
Zaiq Technologies, Inc.                   WWW:   http://www.zaiqtech.com 
78 Dragon Ct.                             Tel:   781 721 8229 
Woburn, MA 01801                          Fax:   781 932 7488