Shabtay’s take 04/28 - This is mostly addressed with automatic flush-on-eom mode and is quite high level.

Committee agrees.

 

Streaming/Reactivity

      a) Is it a goal to create a standard that allows the IP provider to create transactors that do not need to know whether the system or the particular channel serving the transactor is streaming/reactive?

      b) How will the user control reactivity/streaming?

 

** This is a consolidated proposal from JohnS dealing with IMs 208-211

As per my action item (AI) this week, I would like to make the following proposal

to try to solidify the semantics of pipes.

 

As I said in last week's meeting, this is largely a consolidation

of some of the clarifications in Per's e-mails to questions

raised by Shabtay concerning pipes.

 

Let me try to lay out all the issues here and if we can

all agree to these, perhaps we can close out IMs 208-211

and go a long way toward coming to agreement on the

transaction pipe proposal.

 

Requirements for transaction pipes:

 

1. [This addresses IM 210]

    Determinism is a "must" requirement

    - Consumption of data from a receive pipe on the H/W side

      or production of data to a send pipe will always occur

      on the same clock cycles from one simulation to another

 

2. [This partially addresses IM 208 - user vs. vendor control

     of optimization]

 

    It is possible to implement pipes as a reference model

    of source code built over basic DPI function calls.

    As such, they can be made to run on any DPI-compliant

    software simulator. Such a reference model would provide

    a reactive implementation of pipes which could be used

    as the basis for more optimized "built-in" implementations

    that might deploy batching, streaming, and concurrency

    optimizations.

 

    It is an absolute requirement, however, that such optimizations

    do not change the functional and deterministic behavior of a

    design that runs on the basic reactive reference model

    implementation of pipes as described above.

 

    In other words, code using a pipe interface must behave

    identically whether running over the reactive "reference"

    implementation or running over an optimized custom

    implementation.

 

    Within this constraint, vendors are free to perform

    any optimizations of pipes that are appropriate

    to their platform.
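
    To make this concrete, here is a minimal C sketch of what a
    reference-model send pipe might look like when built over nothing
    but ordinary DPI calls. All names and the buffer depth here are
    hypothetical and purely illustrative (they are not part of the
    proposal), and a real reference model would block, i.e. yield to
    the simulator, rather than return a "full" status:

        /* Sketch of a reference-model send pipe over plain DPI (hypothetical).
         * ref_pipe_send() would be called from the C (HVL) side; the HDL side
         * would poll ref_pipe_try_receive() through an ordinary DPI import,
         * e.g. once per clock. */
        #include <stdint.h>

        #define REF_PIPE_DEPTH 64   /* arbitrary; chosen by the implementation */

        static uint32_t buf[REF_PIPE_DEPTH];
        static int head, tail, count;

        /* Returns 1 if the element was accepted, 0 if the buffer is full.
         * A real reference model would block here instead of returning 0. */
        int ref_pipe_send(uint32_t element)
        {
            if (count == REF_PIPE_DEPTH)
                return 0;
            buf[tail] = element;
            tail = (tail + 1) % REF_PIPE_DEPTH;
            count++;
            return 1;
        }

        /* Called from the HDL side via a plain DPI import, e.g. once per clock.
         * Returns 1 and fills *element if data was available, 0 otherwise. */
        int ref_pipe_try_receive(uint32_t *element)
        {
            if (count == 0)
                return 0;
            *element = buf[head];
            head = (head + 1) % REF_PIPE_DEPTH;
            count--;
            return 1;
        }

    Because the HDL side pulls data on clock edges that it controls,
    such a reference model naturally gives deterministic behavior in
    the sense of point 1; an optimized built-in implementation only
    has to reproduce that observable behavior.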

 

3. [This addresses IMs 208, 209]

    Buffer depth is implementation-specified. This allows

    vendors to choose a buffer depth that is optimal for

    their platform. The flush mechanism is what gives

    the user the chance to specify a "synchronization point"

    to the infrastructure indicating that an HVL thread is

    switching from a "streaming mode" where it is doing pipe

    operations to a "reactive mode" where it is doing conventional

    reactive DPI function call interactions.

 

    As we said in the meeting last week, it is at this point

    that queries of the H/W simulator time will make

    sense as well.
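
    As a usage sketch of this synchronization point (hypothetical
    names throughout: pipe_send, pipe_flush, dut_set_mode,
    query_hw_time and next_stimulus are illustrative only, not part
    of the proposal), the ordering is: stream, then flush, then
    interact reactively and/or query H/W time:

        #include <stdint.h>
        #include <stdio.h>

        /* Hypothetical handles and functions -- illustration only. */
        typedef void *pipe_handle;
        extern pipe_handle out_pipe;
        extern void pipe_send(pipe_handle p, const uint32_t *data,
                              int num_elements, int eom);   /* blocking */
        extern void pipe_flush(pipe_handle p);   /* blocks until drained */
        extern uint32_t next_stimulus(int i);
        extern void dut_set_mode(int mode);   /* ordinary reactive DPI call */
        extern unsigned long long query_hw_time(void);

        void producer_thread(void)
        {
            /* "Streaming mode": jam transactions into the pipe; the
             * infrastructure may buffer/batch them however it likes. */
            for (int i = 0; i < 10000; i++) {
                uint32_t txn = next_stimulus(i);
                pipe_send(out_pipe, &txn, 1, 0);
            }

            /* Synchronization point: block until everything sent above
             * has actually been consumed on the H/W side. */
            pipe_flush(out_pipe);

            /* Producer and consumer now share a common present, so
             * reactive DPI interactions and H/W time queries make sense. */
            dut_set_mode(1);
            printf("flush complete at H/W time %llu\n", query_hw_time());
        }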

 

4. [This addresses IM 211]

    Operation of pipes is identical whether successive

    access ops (sends or receives) are done in 0-time

    or over user clock time, i.e., one access per clock.

 

    It is strictly a function of the modeling subset as to

    whether 0-time ops are supported or not. But the

    pipe interface itself does not preclude this support.
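
    As a rough illustration of this point (following the hypothetical
    naming style of the sketch under point 3; pipe_receive is likewise
    illustrative, and wait_one_clock is assumed to be an exported,
    time-consuming HDL task), the sequence of elements observed
    through the pipe is the same either way:

        #include <stdint.h>

        typedef void *pipe_handle;                 /* hypothetical, as above */
        extern void pipe_receive(pipe_handle p, uint32_t *data,
                                 int max_elements, int *num_valid,
                                 int *eom);        /* blocking receive */
        extern void wait_one_clock(void);          /* assumed exported HDL task */

        /* Back-to-back receives, no simulation time between them. */
        void drain_zero_time(pipe_handle p, uint32_t *dst, int n)
        {
            int valid, eom;
            for (int i = 0; i < n; i++)
                pipe_receive(p, &dst[i], 1, &valid, &eom);
        }

        /* One receive per clock; both functions see the same element stream. */
        void drain_one_per_clock(pipe_handle p, uint32_t *dst, int n)
        {
            int valid, eom;
            for (int i = 0; i < n; i++) {
                pipe_receive(p, &dst[i], 1, &valid, &eom);
                wait_one_clock();
            }
        }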

 

It is useful to compare and contrast the semantics

of pipes to those of fifos. I think the reason that

we often stumble when discussing issues like

user- vs. implementation-specified buffer depth, its

effect on determinism, etc., is that people are

thinking of a fifo model rather than a pipe model.

 

Both pipes and fifos are deterministic and have similar

functions in terms of providing buffered data throughput

capability. But they have different basic semantics.

 

Here is a small listing that tries to compare and

contrast the semantics of fifos vs. pipes:

 

Fifos

- Follows the classical OSCI-TLM-like FIFO model

- User-specified, fixed-size buffer depth

- Automatic synchronization

- Supports blocking and non-blocking put/get ops

- "Under the hood" optimizations possible - batching

- No notion of a flush

 

Pipes

- Follows the Unix stream model (future/past/present semantics)

- Implementation-specified buffer depth

- User-controlled synchronization

   - Makes concurrency optimization more straightforward

- Supports only blocking ops (for determinism)

- "Under the hood" optimizations possible - batching, concurrency

- More naturally supports data shaping, variable-length messaging (vlm), end-of-message (eom), and flushing
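
To make the contrast concrete, here is a rough sketch of how the two
might look from the C side. All names are hypothetical; only the shape
of the calls matters - who picks the depth, blocking vs. non-blocking
ops, and the presence or absence of a flush:

    #include <stdint.h>

    /* Fifo style: user-specified depth, blocking and non-blocking ops,
     * automatic synchronization, no flush. */
    typedef void *fifo_handle;
    extern fifo_handle fifo_create(int depth);            /* user picks depth      */
    extern void fifo_put(fifo_handle f, uint32_t v);      /* blocking              */
    extern int  fifo_try_put(fifo_handle f, uint32_t v);  /* non-blocking, 0=full  */
    extern void fifo_get(fifo_handle f, uint32_t *v);     /* blocking              */
    extern int  fifo_try_get(fifo_handle f, uint32_t *v); /* non-blocking, 0=empty */

    /* Pipe style: implementation-specified depth, blocking ops only
     * (for determinism), variable-length/eom support, and an explicit
     * flush as the user-controlled synchronization point. */
    typedef void *pipe_handle;
    extern pipe_handle pipe_open(const char *name);       /* depth chosen by impl. */
    extern void pipe_send(pipe_handle p, const uint32_t *data,
                          int num_elements, int eom);     /* blocking, vlm/eom     */
    extern void pipe_receive(pipe_handle p, uint32_t *data, int max_elements,
                             int *num_valid, int *eom);   /* blocking, vlm/eom     */
    extern void pipe_flush(pipe_handle p);                /* user-controlled sync  */

The lack of try_put/try_get on the pipe side is deliberate:
blocking-only access is what keeps the observed behavior independent
of the implementation-chosen buffer depth.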

 

One could argue that we may wish to entertain the notion

of a "dpi_fifo" reference library to augment the "dpi_pipe"

reference library currently proposed and thus provide two

alternative DPI extension libraries as part of the

SCE-MI II proposal that address different sets of user needs.

But it is useful to make a clear distinction between

fifos and pipes and, for now, at least converge on the semantics

of the proposed pipes and make sure they address the original

requirements of variable-length messaging.

 

Just to augment what I've said above, I would like to recap

some of Per's and others' earlier comments regarding pipes. I've

just re-stated these so it is all in one convenient place.

The following text is verbose so read it as desired. The main

part of my proposal is the text above. This is just supportive

text reiterating its main points.

 

------------------------------------------

This ties in and clarifies #2 above:

 

Per Bojsen wrote:

 > Note that the text said that concurrency could be introduced by the

 > implementation as long as it does not alter behavior.  So we've

 > established and all agreed upon that the new DPI/function based subset

 > of SCE-MI 2.0 is a system that uses alternating execution.  This

 > follows directly from the DPI definition.  However, this applies

 > only to the behavior of the system, not necessarily to what is actually

 > going on under the hood.  There are plenty of opportunities to

 > optimize the transport and execution that does not change the behavior.

 > This includes introducing some degree of concurrent operation.  Do

 > you agree that it does not matter that there is some degree of

 > concurrent operation as long as it behaves exactly like a purely

 > alternating system would?  SCE-MI 2.0 will describe the semantics

 > of the DPI/function based interface in terms of alternating execution.

 > The implementation is compliant as long as it preserves this semantics.

 > It does not matter one bit how the implementation achieves this

 > under the hood, agreed?

 

------------------------------------------

This ties in and clarifies #3 above:

 

Per Bojsen wrote:

 >  > IM209 - We had some discussion about setting buffer depth for pipes. It

 >  > was my understanding that Mentor proposes setting the buffer depth by

 >  > the infrastructure and not by the user. Is this correct?

 >

 > This is my understanding as well.  If the system is deterministic and

 > observes alternating semantics, then I do not see any need for a user

 > setting of buffer depth.  This is because the buffer depth setting

 > would have no observable impact on the behavior of the system.  There

 > are other problems with a user settable buffer depth: it is unlikely

 > that a given buffer depth setting would achieve optimum performance

 > in all implementations.  Note, I am saying that I do not think a user

 > setting for buffer depth should be in the SCE-MI 2.0 standard.  However,

 > any implementation is free to provide its own performance optimization

 > knobs outside of the standard which could include buffer depth setting.

 > I do not see such features as leading to a non-compliant implementation

 > (necessarily).

 

------------------------------------------

This ties in with the comments about pipe vs. fifo semantics, or

even vs. the use of plain DPI calls, and when a user would want one

over another:

 

Per Bojsen wrote:

 > It is my understanding that pipes are intended for streaming, batching,

 > variable length messages, and potentially can be used even for more

 > exotic purposes if the modeling subset allows it.  Given that pipes

 > can be implemented at the application layer, the choice between using

 > pipes and DPI is one of convenience in many cases.  However, since an

 > implementation can choose to provide an optimized version of the pipes,

 > this would be a factor as well in the choice to use them.

 

------------------------------------------

This ties in with #3 above:

 

John Stickley wrote:

 > johnS:

 > I think the main point here is that it does not matter who

 > sets the delay [buffer depth] or what the delay is so long as

 > there is a mechanism

 > to re-synchronize the times of the pipe producer and pipe consumer

 > if it becomes necessary to enter back into a mode of reactive

 > (alternating) interactions.  This is the purpose of the

 > flush call - to provide this re-synchronization.

 >

 > For example, a pipe producer thread can be sitting there jamming

 > transactions into a pipe to its heart's content. The consumer

 > meanwhile is only consuming transactions which the producer

 > had written well into the past.

 >

 > So in this scenario, at any given point in the consumer's time,

 > the producer is well into the future - how far into it,

 > we don't care.

 >

 > Or, put differently, at any given point in the producer's time,

 > the consumer is well into the past. How far into it,

 > we don't care.

 >

 > But suppose producer and consumer now want to interact reactively,

 > say with plain DPI function call interactions. They must synchronize.

 > i.e. the producer's present must become one and the same as

 > the consumer's present. To do this, producer issues a flush.

 > This guarantees that the producer thread blocks until all the

 > future transactions have dissipated to the consumer and now the

 > two are synchronized in time. At this point in time, the two have a

 > common present and are free to communicate reactively. And

 > all this can be done deterministically where interactions

 > take place on the same clocks on timed side regardless of

 > how much buffering an implementation provided or how much

 > concurrency it chose to use.

 

** End of JohnS proposal

 

Pure DPI is alternating. SCE-MI 1.1 has no preference and could be either alternating or concurrent. Reactive === alternating; these terms are being used to mean the same thing.

 

Function calls are blocking. The call occurs in zero time, even though time may be consumed in the function called.
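
As a hedged illustration (names and simplified prototypes are
hypothetical; real prototypes would come from the tool-generated DPI
header): the call into an imported C function takes no simulation time
by itself, but time can pass inside it if it calls back into an
exported, time-consuming HDL task:

    /* send_word is assumed to be imported on the HDL side as a DPI task so
     * that it is allowed to consume time; run_bus_cycle is assumed to be an
     * exported HDL task that waits for one clock. Prototypes simplified. */
    extern void run_bus_cycle(void);                 /* exported, consumes time */
    extern void drive_word_onto_bus(unsigned word);  /* hypothetical helper     */

    void send_word(unsigned word)                    /* imported DPI task body  */
    {
        drive_word_onto_bus(word);   /* the call itself is zero-time ...        */
        run_bus_cycle();             /* ... but time advances inside this call  */
    }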

 

A 2.0 implementation must support concurrency if 1.1 models require it.

 

Streaming is only supported by models that are pure sources or pure sinks. Any concurrency can be added to an alternating system so long as it does not alter behavior. In these cases, alternating behavior is the benchmark. These are all viewed as implementation optimizations and should not be specified or mentioned in the specification.

 

Shabtay> We need to evaluate whether concurrency introduces a lack of compliance among various simulation and emulation environments. I don't see how the rate at which messages are sourced or sunk could be the same when concurrency is used. Does it?

 

Per> Note that the text said that concurrency could be introduced by the implementation as long as it does not alter behavior.  So we've established and all agreed upon that the new DPI/function based subset of SCE-MI 2.0 is a system that uses alternating execution.  This follows directly from the DPI definition.  However, this applies only to the behavior of the system, not necessarily to what is actually going on under the hood.  There are plenty of opportunities to optimize the transport and execution that does not change the behavior. This includes introducing some degree of concurrent operation.  Do you agree that it does not matter that there is some degree of concurrent operation as long as it behaves exactly like a purely alternating system would?  SCE-MI 2.0 will describe the semantics of the DPI/function based interface in terms of alternating execution. The implementation is compliant as long as it preserves this semantics. It does not matter one bit how the implementation achieves this under the hood, agreed?

Shabtay>> What is done under the hood is left to the implementers. What I care about is maintaining determinism when using all engines including simulation. Let's simply table that.

JohnS>> Yes. The idea is to maintain determinism, and therefore consistent behavior regardless of optimizations put "under the hood".

 

 

Batching is the aggregation of messages to improve communications.
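
As a rough sketch of what that can mean inside an implementation
(hypothetical names; link_transfer stands in for whatever bulk
transport the implementation uses), messages are accumulated locally
and shipped in one crossing instead of one crossing per message:

    #include <stdint.h>

    #define BATCH_MAX 256                      /* arbitrary batch size */

    static uint32_t batch[BATCH_MAX];
    static int      batch_len;

    extern void link_transfer(const uint32_t *msgs, int n);  /* one bulk transfer */

    /* Queue one message; ship the whole batch when it fills up. */
    void batched_send(uint32_t msg)
    {
        batch[batch_len++] = msg;
        if (batch_len == BATCH_MAX) {
            link_transfer(batch, batch_len);   /* one crossing for 256 messages */
            batch_len = 0;
        }
    }

    /* Ship whatever is pending, e.g. at a flush/synchronization point. */
    void batched_drain(void)
    {
        if (batch_len > 0) {
            link_transfer(batch, batch_len);
            batch_len = 0;
        }
    }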

 

Streaming will not be required by the spec.