Re: Clarification Questions, RE: Mentor-Proposed SCE-MI 2.0

From: John Stickley <John_Stickley_at_.....>
Date: Wed Jun 22 2005 - 14:34:46 PDT

Matt,

Sorry I missed your request to have these 16 points answered
in the huge stream of e-mail I had last Friday (left over from
DAC). Your e-mail of today pointed it out again - this time I saw it!

None of these appear to be big issues in the Mentor proposal.
I've embedded short answers although I think there will be ample
time to develop these concepts in more detail once a proposal is
decided as a starting point.

Matt Kopser wrote:
> 
>      Clarification questions in the Mentor proposal for SCE-MI 2
>         (questions refer to version 1.1 of Mentor's proposal)
> 
> Key Questions:
> =============
> 
> 1) Compliance, IP Verification
> 
> Mentor's SCE-MI 2.0 proposal is based on a subset of DPI.  How will an IP
> vendor verify that their IP complies with the recommended DPI subset, on
> a simulator that provides full DPI support?
> 
> Does Mentor propose implementation of a reference DPI implementation
> (that restricts usage to the suggested subset) for verifying interface
> compliance?
> 
> Some aspects of the restricted subset can be verified statically (for
> example, adherence to restricted type set), but recommended limitations
> on function call stack, for example, can only be verified dynamically.
> How are these limitations to be verified by IP providers?

johnS:
As with the implementation of any standard, I think it is up to
the vendor to ensure compliance to some degree. Some of the compliance
can be checked at compile time, some at run-time. We can probably
get into more specifics once a proposal has been selected as a starting
point.

SCE-MI 1.1 has many of the same issues.

> 
> 2) Standard DPI
> 
> DPI being an Accellera approved SystemVerilog standard implies that it
> should be attempted in its entirety on all simulation/acceleration
> platforms.  Does the Mentor proposal entail that vendors should support
> two implementations on acceleration/emulation (one based on the Mentor
> proposal and one based on complete Accellera standard), or that the DPI
> implementation on acceleration/emulation will exclusively be based on
> the Mentor proposal?

johnS:
I don't buy this argument. That is like saying all synthesis tools
must be required to synthesize all of Verilog rather than just
the RTL subset.

I would hope that, over time, the SCE-MI could perhaps move toward full
compliance with DPI, but if this is not practical, I certainly don't
see it as a limitation.

Furthermore, the nice thing about a subset of a standard rather than
a completely different standard is that the models done in a subset
will always run on an implementation that supports the full standard.
Just as Verilog RTL models all run on full-blown Verilog simulators,
so too will SCE-MI 2.0 models run on a DPI-compliant S/W simulator.

> 
> 3) DPI Types, VHDL and pure Verilog hardware side
> 
> The proposal specifies a suggested subset of SystemVerilog types and
> their associated mapping to 'C' types on the software side.  What is
> suggested for type support in pure Verilog and VHDL?

johnS:
I think we make it clear in our proposal the types we recommend.
Basically integer based scalar types, bits, and bit vectors. Again,
we can work out details on this in the forum of the committee.
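
To make the bit-vector case concrete: on the C side, DPI carries a packed
bit vector as an array of 32-bit words (svBitVecVal in the IEEE 1800
svdpi.h). The following is only a minimal sketch of that layout. The
bitvec_* names are hypothetical stand-ins defined locally so the code
compiles without a simulator; they are not part of the proposal or of
svdpi.h.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for DPI's 32-bit bit-vector word (assumption: same
   layout as svBitVecVal, bit 0 of word 0 is the least significant bit). */
typedef uint32_t bitvec_word_t;

/* Number of 32-bit words needed to hold a vector of `width` bits. */
int bitvec_words(int width)
{
    assert(width > 0);
    return (width + 31) / 32;
}

/* Set or clear bit `i` of a packed bit vector. */
void bitvec_put_bit(bitvec_word_t *v, int i, int bit)
{
    bitvec_word_t mask = (bitvec_word_t)1u << (i % 32);
    if (bit)
        v[i / 32] |= mask;
    else
        v[i / 32] &= ~mask;
}

/* Read bit `i` of a packed bit vector (returns 0 or 1). */
int bitvec_get_bit(const bitvec_word_t *v, int i)
{
    return (int)((v[i / 32] >> (i % 32)) & 1u);
}
```

A 40-bit vector, for example, occupies two words, and bit 35 lives in
bit 3 of the second word.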

> 
> How would transactors that are written in VHDL for example be tested in
> simulation mode given that the Mentor-proposed extensions are not
> approved for VHDL simulation?

johnS:
Every implementation of a standard must have compliance checking of some 
sort. In general, the infrastructure linker should be able to
check for compliant data types.

> 
> 4) Streaming/Reactivity
> 
> What will happen if an IP provider creates an IP for simulation that is
> limited to the Mentor data types but does not use transaction pipes?
> Can such transactors be used in streaming and reactive mode at user's
> control?

johnS:
Using the reactive (function call) interface will always be in a
reactive mode.

If the performance potentially offered by a streaming interface
is desired then pipes should be used.

For example, an MPEG transactor implementor may wish to write it
to use pipes so MPEG frames can be streamed from the TB to the
transactor.

If that model is then used with an implementation that provides the
streaming optimization in its pipes, the model would benefit
automatically from streaming with no changes.

Additionally, we see a possibly useful extension: providing a
reactivity control (a knob?) for transaction pipes, such that
the user stipulates that a given pipe should run reactively even if
the implementation is capable of streaming or if other pipes
are streaming.

This is a possibility that can be discussed in the forum of
the committee.

> 
> Alternatively, can an IP provider create IP that will work in both a
> streaming and non-streaming (reactive) use model, using the proposed
> pipe mechanism at user's control?

johnS:
See above.

> 
> 5) Threading Requirements
> 
> In an environment utilizing the proposed pipe mechanism, it appears that
> the proposal requires that the user's (test) C code be threaded.  Is
> this the case?
> 
> If not, then how is the pipe flush mechanism supposed to 'yield' control
> to the C side?  (In pure DPI, if the user's C code is running -- thus
> giving the user the opportunity to call the pipe flush -- the
> SystemVerilog side of the system has already yielded control to the C
> side.  The only means for any other C code to run is through the use of
> threads.)

johnS:
We're moving the concern of this from the user level to the vendor level.
Transaction pipes can be implemented as

     "vendor writes reusable pipe code once for a given threading system,
      user reuses many times"

rather than,

     "User writes many times and never reuses."

That is, the user no longer writes distinct dedicated proxies for each
transactor or BFM type, each of which has to deal explicitly with
threads, as is required by the callback model.

We're willing to offer a reference model for how pipes can be
implemented as a fixed, "write once" implementation over straight
compliant DPI.

This model could be adapted by vendors to provide "builtin" versions
that are optimized for different vendors' architectures and offer the
enhanced performance of streaming and concurrent execution where
relevant.

> 
> 6) Time, Cycle Stamping
> 
> How is the passage of 'time' to be tracked on the hardware side?  Since
> there is no explicit mention of time in the proposal, does this mean
> that there must be an implicit 1/1 (SCE-MI 1.1) controlled clock, that
> increments the cycle count?
> 
> How are messages/transactions stamped with cycle count?  Is this left
> up to each transactor?  Or, is there a proposed standard way of doing
> so?

johnS:
This is a problem that can be easily solved. We have done this
in our implementation but not in our proposal. We're open to adding
this or discussing it further.

> 
> 
> Detailed Questions:
> ==================
> 
> 7) Context Handling
> 
> How is context handling performed in a pure Verilog or VHDL environment?
> Does the user have to purchase and/or license a SystemVerilog compliant
> system in order to utilize the svGetScope and svGetUserData capability?

johnS:
Any implementation can implement the DPI standard - I'm not sure what
you mean by "licensing".

Context handling is strictly done on the C side in a manner conceptually
very similar to what SCE-MI 1.1 binding does today.

Just as a vendor must implement ::SceMiBindMessagePort() to implement
the standard, so too must they implement svGetScopeFromName(),
svPutUserData(), etc.
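
As a rough illustration of this C-side binding, the sketch below mocks
the scope calls so it can compile without a simulator. In a real flow
the corresponding DPI functions (svGetScopeFromName(), svPutUserData(),
svGetUserData() from svdpi.h) are supplied by the vendor's
infrastructure. Everything named mock_*, proxy_t, bind_proxy and
lookup_proxy, as well as the instance paths, is hypothetical and only
stands in for those calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mock scope record: one entry per HDL instance that imports functions. */
typedef struct {
    const char *name;       /* hierarchical instance path (made up here) */
    void       *user_key;
    void       *user_data;
} mock_scope_t;

mock_scope_t mock_scopes[] = {
    { "top.bfm0", NULL, NULL },
    { "top.bfm1", NULL, NULL },
};

/* Plays the role of svGetScopeFromName(): path -> scope handle or NULL. */
mock_scope_t *mock_get_scope_from_name(const char *name)
{
    size_t n = sizeof mock_scopes / sizeof mock_scopes[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(mock_scopes[i].name, name) == 0)
            return &mock_scopes[i];
    return NULL;
}

/* Plays the role of svPutUserData(); this mock returns 0 on success
   and -1 on error. */
int mock_put_user_data(mock_scope_t *scope, void *key, void *data)
{
    if (scope == NULL || key == NULL)
        return -1;
    scope->user_key = key;
    scope->user_data = data;
    return 0;
}

/* Plays the role of svGetUserData(): NULL if nothing stored under key. */
void *mock_get_user_data(const mock_scope_t *scope, void *key)
{
    if (scope == NULL || scope->user_key != key)
        return NULL;
    return scope->user_data;
}

/* Hypothetical C-side proxy object, one per transactor instance. */
typedef struct { int frames_sent; } proxy_t;

char proxy_key;   /* its address serves as the user-data key */

/* Done once at startup: attach a proxy to an HDL instance path. */
int bind_proxy(const char *path, proxy_t *proxy)
{
    assert(proxy != NULL);
    return mock_put_user_data(mock_get_scope_from_name(path),
                              &proxy_key, proxy);
}

/* Done inside an imported function: recover the proxy for the caller's
   scope (which real code would obtain from svGetScope()). */
proxy_t *lookup_proxy(const mock_scope_t *scope)
{
    return (proxy_t *)mock_get_user_data(scope, &proxy_key);
}
```

Nothing in this binding depends on the HDL language of the instance
behind the scope, which is the point made above.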

> Does the proposal need to be enhanced to add a SCE-MI 2.0 API to provide
> hardware-side-neutral calls to context functions?  Or, will the user
> need to call hardware-side-specific (and in some cases, vendor-specific)
> functions to access context information?

johnS:
Just as with SceMiBindMessagePort, this is strictly a C-side issue;
it does not matter what language is on the other side. None of
the context handling is done on the HDL side.

To the extent that the implementor follows the standard, context
handling is fully addressed. And, conceptually it is not much different
than SCE-MI 1.1.

> 
> Is the proposal recommending (requiring?) that IP be written with the
> use of the 'context' specification for all imported functions, or is the
> choice left to the IP provider?

johnS:
We recommend it and would work with the committee to decide
if it is a requirement.

> 
> 8) Exported Tasks
> 
> The proposal does not clearly indicate whether exported tasks should be
> supported or not.  What is the recommendation for exported task support
> and why?

johnS:
We recommend it and would work with the committee to decide
if it should be part of the initial standard.

> 
> 9) Multiple function calls in zero time
> 
> Is this a usable mechanism?  Doesn't this encourage a low-performance
> use model ('ping-pong'-ing between the software side and the accelerator
> for each function call)?  In the SCE-MI 1.1 use model, and Cadence's
> proposal, multiple message transfers are achieved by simply instantiating
> multiple message port macros in parallel -- this approach does not
> require multiple function calls to the software side.

johnS:
There are times, such as initialization of memories, where performance
is not critical and multiple messages in 0-time might be useful,
assuming a modeling subset supports it.

Again, our point is that the interface API should not preclude this.

I agree that it can always be done in SCE-MI 1.1 by stopping
the clock.

But for a use model that does not want to deal with controlled clocks,
your suggestion of multiple message port macros would be difficult for 
something like a broadside memory load that is done in 0-time.

If the modeling subset allows for it, the DPI interface proposal
itself would not prevent such an application.

For cases with lesser requirements, data shaping (nozzle) might
still be a lot easier than requiring the user to instantiate
multiple message port macros in parallel.

> 10) Pipe IDs
> 
> Unique pipe IDs are the Mentor-proposal equivalent of message port macro
> instance names, correct?  (The instance names of message port macros
> implicitly differentiate message 'channels', but in this proposal,
> integer IDs are necessary to 'uniquify'.)

johnS:
Yes, this is correct. The IDs only need to differentiate pipes within
a module instance context; in this sense it is simpler than the use of
integer IDs to differentiate clock port macros globally and bind them
to clock control macros.

> 
> How is determinism ensured when using the pipe mechanism -- if the
> depth of pipe fifos is left up to the vendor to decide?

johnS:
How determinism is ensured by the infrastructure has been
answered in considerable detail in my previous e-mails.

Depth of pipe should be determined by the vendor and optimized to
their architecture.

The user would not have to worry about it.

See previous e-mails for more details.

> 
> 11) Transaction Pipes
> 
> What is a 'full pipe'?  The prototypes for the pipe mechanism functions
> imply infinitely-long pipes.  Why would a pipe ever be 'full'?

johnS:
Threads writing to a pipe would automatically suspend if the pipe
fills - again, transparently to the user (see previous e-mails).
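
A minimal sketch of what "full" means for a bounded pipe follows. It
assumes a byte-wide pipe and an arbitrary depth of 4; the names
(byte_pipe_t, pipe_try_send, pipe_try_receive, PIPE_DEPTH) are
hypothetical and not taken from the proposal. The sketch is
non-blocking: where pipe_try_send() returns 0, a blocking pipe call
would instead suspend the writing thread until the reader drains an
element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIPE_DEPTH 4   /* hypothetical; the real depth is the vendor's choice */

typedef struct {
    uint8_t buf[PIPE_DEPTH];
    size_t  head;    /* index of the next element to read  */
    size_t  count;   /* elements currently in the pipe     */
} byte_pipe_t;

/* Returns 1 if the byte was queued, 0 if the pipe is full. */
int pipe_try_send(byte_pipe_t *p, uint8_t b)
{
    assert(p != NULL);
    if (p->count == PIPE_DEPTH)
        return 0;                      /* full: a blocking send waits here */
    p->buf[(p->head + p->count) % PIPE_DEPTH] = b;
    p->count++;
    return 1;
}

/* Returns 1 and stores the oldest byte in *out, or 0 if the pipe is empty. */
int pipe_try_receive(byte_pipe_t *p, uint8_t *out)
{
    assert(p != NULL && out != NULL);
    if (p->count == 0)
        return 0;                      /* empty: a blocking receive waits here */
    *out = p->buf[p->head];
    p->head = (p->head + 1) % PIPE_DEPTH;
    p->count--;
    return 1;
}
```

Because the writer simply waits rather than dropping or reordering
data, the sequence of elements seen by the reader does not depend on
the depth chosen.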

> 
> 12) Dynamic / Variable pipe data sizes
> 
> The header files supplied in the examples indicate a static value for
> the DPI_PIPE_MAX_BITS value.  This value is used to size the target data
> locations in the example BFMs.  How does the IP provider size the HDL
> data locations to the appropriate size for the application?
> 
> Does it limit the size of arguments defined by the modeler when calling
> the send and receive methods?

johnS:
This is an existing, proven piece of code we're donating that specifies
a static value for a useful maximum.

We don't think this is an issue - that is, there is a maximum
useful value that can probably be pre-determined.

However this issue is irrelevant to the suitability of the Mentor
proposal as a starting point. Mentor's proposal can accommodate a
refined version of this as decided by the committee. Any capability
we decide on can be based on a compliant DPI function interfacing
standard.

> 
> What happens if DPI_PIPE_MAX_BITS is less than the value of
> 8 * bytes_per_element * num_elements?
> 
> The underlying atomic data size is one byte.  How is data whose size is
> not a multiple of 8 bits handled? 

johnS:
I've chosen an atomic size of 8 bits for ease of implementation.
It can go down to a single bit if necessary, although I personally
think that is overkill.
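
The size arithmetic under discussion can be sketched as follows. With a
byte as the atomic unit, a field whose width is not a multiple of 8 is
rounded up to whole bytes, and a transfer occupies
8 * bytes_per_element * num_elements bits. EXAMPLE_PIPE_MAX_BITS is a
made-up value (the actual DPI_PIPE_MAX_BITS in the donated header is
not given in this thread), and the function names are hypothetical.
What an implementation does when a transfer does not fit is not
specified here, so the sketch only reports it.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_PIPE_MAX_BITS 1024   /* hypothetical static maximum */

/* Bytes needed for a field of `width_bits` bits when the atomic element
   is one byte: round up to the next whole byte. */
size_t bytes_for_bits(size_t width_bits)
{
    assert(width_bits > 0);
    return (width_bits + 7) / 8;
}

/* Bits occupied by one transfer, per the expression in the question. */
size_t transfer_bits(size_t bytes_per_element, size_t num_elements)
{
    return 8 * bytes_per_element * num_elements;
}

/* 1 if the transfer fits within the static maximum, 0 otherwise. */
int transfer_fits(size_t bytes_per_element, size_t num_elements)
{
    return transfer_bits(bytes_per_element, num_elements)
           <= EXAMPLE_PIPE_MAX_BITS;
}
```

For instance, a 12-bit field needs 2 bytes, and 100 one-byte elements
occupy 800 bits.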

> 13) Data Shaping
> 
> Is the example in section 4.4.2.1 correct?  In particular, it specifies
> that the reader of the frame could specify a bytes_per_element of 1, and
> a num_elements of 100.  In this case, couldn't a (blocking) read result
> in a transfer of less than 100 elements?  For example, if there were 3
> 1-byte elements in the pipe when a read was performed, would there be
> any need to block (wait) until there were 100 in the pipe?

johnS:
Yes, the reader would block until there are 100 in the pipe or
until an eom is specified, whichever comes first.

In the case of eom, "num_elements_read" would stipulate the residual
element count. (Kind of like how Unix fread() works.)

> 
> It appears from the proposal that the read could (should?) return, with
> 3 elements of data in the data block, and a num_elements_read of 3.

johnS:
Yes, but only if eom was specified by the writer.

This retains determinism whether or not data shaping is used and
whether or not streaming is deployed.
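
The read rule described above can be sketched in C as follows. This is
a simplified single-threaded model with one-byte elements, not the
proposal's actual pipe API: shaped_pipe_t, shaped_write,
shaped_try_read and SHAPED_PIPE_CAPACITY are hypothetical names. Where
shaped_try_read() returns 0, a blocking read would keep waiting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHAPED_PIPE_CAPACITY 256   /* hypothetical, for the sketch only */

typedef struct {
    uint8_t data[SHAPED_PIPE_CAPACITY];
    size_t  count;   /* bytes written and not yet read                 */
    int     eom;     /* 1 once the writer marked the last byte as eom  */
} shaped_pipe_t;

/* Writer: append n bytes, optionally marking end-of-message on the last.
   Returns 0 if they do not fit (a real pipe would block instead). */
int shaped_write(shaped_pipe_t *p, const uint8_t *src, size_t n, int eom)
{
    assert(p != NULL && src != NULL);
    assert(!p->eom);                   /* one message at a time in this sketch */
    if (n > SHAPED_PIPE_CAPACITY - p->count)
        return 0;
    memcpy(p->data + p->count, src, n);
    p->count += n;
    p->eom = eom;
    return 1;
}

/* Reader: completes (returns 1) only when num_elements bytes are
   available or the writer has signalled eom, whichever comes first. */
int shaped_try_read(shaped_pipe_t *p, uint8_t *dst, size_t num_elements,
                    size_t *num_elements_read, int *eom)
{
    size_t n;
    assert(p != NULL && dst != NULL);
    assert(num_elements_read != NULL && eom != NULL);
    if (p->count < num_elements && !p->eom)
        return 0;                      /* too little data and no eom yet */
    n = p->count < num_elements ? p->count : num_elements;
    memcpy(dst, p->data, n);
    memmove(p->data, p->data + n, p->count - n);
    p->count -= n;
    *num_elements_read = n;            /* residual count when cut short by eom */
    *eom = (p->eom && p->count == 0);  /* reported with the final byte */
    if (*eom)
        p->eom = 0;                    /* pipe is ready for the next message */
    return 1;
}
```

With 3 bytes in the pipe and no eom, a request for 100 elements does
not complete. Once the writer adds 2 more bytes and marks eom, the
same request completes with num_elements_read equal to 5.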

> 
> 14) Clock Port usage, initialization, Creset
> 
> The proposal suggests that SCE-MI 2.0 models should still use SCE-MI
> 1.1 clock ports, but relegates clock control to legacy/1.1 models.  How
> does Mentor propose that 'pure' SCE-MI 2.0 models deal with
> initialization (and Creset, etc.)?

johnS:
I'm not sure what the question is here. Creset is generated from a
clock port and therefore is available for use even if clock controls
are not used.

And if clock controls are used by SCE-MI 1.1 models, SCE-MI 2.0
models may have their clock stopped by 1.1 models but won't
know or care, just as the DUT does not know or care that its
clock is stopped in a SCE-MI 1.1 only environment.

> 
> 15) Appendix A
> 
> The side-by-side comparison of the proposed DPI approach and the SCE-MI
> 1.1 approach seems to indicate that the DPI approach can utilize the
> implicit state machine modeling style, while the SCE-MI 1.1 approach
> must adhere to the explicit state machine approach.
> 
> Is the proposal calling for implicit state machine modeling support
> as an adjunct to the message interface capability?  Does Mentor believe
> that the implicit state machine modeling style is incompatible with
> the use of SCE-MI 1.1 macros?

johnS:
This has already been answered in my previous e-mail. Both proposals
can conceptually do this example although more handshaking is required
in the macro based approach so more logic would be required to manage
that.

> 16) Appendix C
> 
> Please clarify exactly which function calls (related to the pipe
> mechanism) are blocking and which are non-blocking?

johnS:
All pipe calls are blocking.

> 
> Is there any need to use SCE-MI parameters in a pure SCE-MI 2.0
> environment? The example illustrates the use of the parameters object in
> conjunction with the SceMi::Init method call.  What is needed, required,
> suggested in a pure SCE-MI 2.0 environment?

johnS:
Not that I can think of, unless possibly we wished to expand the
information put in there by the infrastructure. But I would suggest
that we don't.

Much of the simplicity of the DPI/simple function call approach comes
from the fact that most of this is not necessary.

-- johnS

______________________________/\/            \     \
John Stickley                   \             \     \
Mgr., Acceleration Methodologies \             \________________
Mentor Graphics - MED             \_
17 E. Cedar Place                   \   john_stickley@mentor.com
Ramsey, NJ  07446                    \     Phone: (201) 818-2585
________________________________________________________________