Subject: RE: ISSUE: DirectC: DirectC i/f should support mechanism for calling Verilog task/function from a DirectC application
From: Stuart Swan (stuart@cadence.com)
Date: Wed Oct 23 2002 - 16:37:48 PDT


 
It's nice to see some "out of the box" technical discussion related to
the C/C++ integration issues, though it concerns me that there seems
to be a recurring warning from some that "We can't think about more
general approaches, since that will put our tight schedule at risk."
It seems to me that this kind of approach will only result in a lousy
standard.
 
I'm in general agreement with the comments from John Stickley about what
C/C++ modeling is good for, and what sort of performance can be obtained.
Certainly at levels of abstraction above the RTL level (but oftentimes
still at the cycle-accurate level) very high performance can be achieved.
For example, a number of companies today are performing cycle-accurate
modeling of platforms consisting of processors (ISS models), busses,
peripherals, etc., in SystemC and getting 50K - 100K cycles per second
on typical PCs.
 
Don't believe me? See:
 

        -----Original Message-----
        From: Stickley, John [mailto:john_stickley@mentorg.com]
        Sent: Wednesday, October 23, 2002 12:35 PM
        To: Kevin Cameron x3251
        Cc: Stickley, John; sv-cc@eda.org
        Subject: Re: ISSUE: DirectC: DirectC i/f should support mechanism for calling Verilog task/function from a DirectC application
        
        
        Kevin,

        Interesting discussion. See embedded comments.

        Kevin Cameron x3251 wrote:

> From john_stickley@mentorg.com Wed Oct 23 10:37:06 2002
>
> Kevin,
>
> Kevin Cameron x3251 wrote:
>
> > > From: "Swapnajit Mittra" <mittra@juno.com>
                ...
> > That's why it's better to write your functions in SV rather than
> > calling external C routines :-)
> >
>
> johnS:
>
> Kevin,
>
> I think the issue is more than just context switching.
> In fact, that's not the difficulty. That by itself is a pretty routine
> operation in all the new C++ kernels - TestBuilder, SystemC
> and even non-C++ testbench modeling environments such as Verisity.
>
> In fact, from what I've seen, for pure untimed multi-threaded
> testbenches, these environments are pretty efficient (have
> a look at systemc.org posting on some benchmarking I did:
> http://www.systemc.org/hypermail/systemc-forum/1282.html)

                Throughput depends on the ratio of context switching time to actual
                model evaluation. Most of the C/C++ activity is currently at the
                testbench level (and cycle based); there aren't that many threads
                and the models are complex. As a general approach to simulation it
                doesn't work, because a lot of the code is too short (from @/# -> @/#).
                Also, if you don't know the call depth for the C/C++ then you have to
                allocate a "big enough" stack for each thread, which is inefficient
                in memory (you'd blow your 2G Linux limit fairly easily).
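
To put rough numbers on the stack concern above, here is a minimal sketch. It assumes one fixed-size stack per thread and a 2 GiB user address space with nothing else in it; the stack sizes and the name max_threads are illustrative assumptions, not figures from this thread.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def max_threads(address_space_bytes, stack_bytes_per_thread):
    """Upper bound on thread count if thread stacks were the only memory user."""
    if stack_bytes_per_thread <= 0:
        raise ValueError("stack size must be positive")
    return address_space_bytes // stack_bytes_per_thread


if __name__ == "__main__":
    # Hypothetical per-thread stack sizes, chosen only for illustration.
    for stack in (64 * KIB, 256 * KIB, 1 * MIB):
        limit = max_threads(2 * GIB, stack)
        print(f"{stack // KIB:5d} KiB per stack -> at most {limit} threads")
```

Under these assumptions, 1 MiB stacks cap the design at about two thousand concurrent threads before any model data is allocated, which is small for a design made of many tiny processes.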

        johnS: I guess it comes down to whether your design
        is composed mainly of a small number of long algorithmic sequential
        blocks (which tend to use native machine data types) vs a large number
        of tiny concurrent blocks (which tend to use bit_vector data types).
        SystemC is real good at the former and real bad at the latter.

        We've never seen any problems with large call depth but we have
        seen C++ simulators bog down when there's lots of fine grained
        concurrency. However our position is that this has never really been
        SystemC's sweet spot anyway. It is much better at algorithmic/
        behavioral system level modeling (which, regrettably, HDL environments,
        at least those prior to SystemVerilog, are not).

        What's interesting about the case I cited above is that it is a pipeline
        of large algorithmic cores surrounded by thin timing shells (characterized
        by cycle accurate activity). The computations
        in the cores dwarf the cycle based activity in the shells, which is
        why it performs so much better in the SystemC behavioral version
        than the VHDL RTL version (8 1/2 minutes vs 9 weeks !). In the
        RTL version the entire design is cycle based activity.
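
For scale, the two run times quoted above can be turned into a speedup factor. This is a rough sketch that assumes "9 weeks" means continuous wall-clock time (24 hours a day, 7 days a week), which the message does not state; the variable names are illustrative.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10080 minutes in a week


def speedup(slow_minutes, fast_minutes):
    """How many times faster the fast run is than the slow run."""
    return slow_minutes / fast_minutes


# Figures quoted in the message: 9 weeks (VHDL RTL) vs 8 1/2 minutes (SystemC).
rtl_minutes = 9 * MINUTES_PER_WEEK
systemc_minutes = 8.5
ratio = speedup(rtl_minutes, systemc_minutes)

if __name__ == "__main__":
    print(f"RTL run: {rtl_minutes} minutes, speedup about {ratio:.0f}x")
```

Under that assumption the behavioral version is roughly four orders of magnitude faster than the RTL version.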

        But generally from my point of view, we see the whole spectrum of
        testbenches. We see testbenches with lots of tiny concurrent processes, we
        see testbenches with few highly sequential processes. In the latter case,
        thread context switch overhead does not amount to much - in the former
        case it does.
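
The trade-off between the two kinds of testbench can be sketched with a simple model: each process activation costs one context switch plus some evaluation time. The microsecond figures below are hypothetical values chosen only to illustrate the two extremes; they are not measurements from any of the kernels mentioned.

```python
def switch_overhead_fraction(eval_us, switch_us):
    """Fraction of run time spent context switching, at one switch per activation."""
    if eval_us < 0 or switch_us < 0 or eval_us + switch_us == 0:
        raise ValueError("times must be non-negative and not both zero")
    return switch_us / (eval_us + switch_us)


# Hypothetical cost of one thread context switch, in microseconds.
SWITCH_US = 1.0

# Few, highly sequential processes: long evaluation between switches.
few_long = switch_overhead_fraction(eval_us=1000.0, switch_us=SWITCH_US)

# Lots of tiny concurrent processes: very short evaluation between switches.
many_tiny = switch_overhead_fraction(eval_us=0.1, switch_us=SWITCH_US)

if __name__ == "__main__":
    print(f"few long processes:  {few_long:.2%} of time spent switching")
    print(f"many tiny processes: {many_tiny:.2%} of time spent switching")
```

With these assumed numbers the switch cost is about 0.1% of run time in the sequential case and about 91% in the fine-grained case, which is the ratio effect described above.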
          

> They all seem to use a package called QuickThreads, which is a very
> efficient non-preemptive threading package. The place where
> SystemC performance breaks down is when processing
> bit_vectors rather than native machine types such as int or double.
>
> But the 0-time issue is more than just context switching. It tends
> to make it difficult to keep coherency between the time state
> of the C++ kernel and that of the HDL kernel. By establishing
> 0-time "synchronization points" you can easily avoid this inconsistency.

                I don't think the DirectC approach requires a separate C++ kernel, so
                it's less of a problem.

        johnS:
        True, it doesn't require it, but it should probably also accommodate it.
        TestBuilder and SystemC are good examples of independent kernels,
        though they could be tightly bundled into a given vendor's HDL kernel
        at the discretion of that vendor - or not.

        I would like to make sure we design the interface in such a way that it
        does not prevent (or at least discourage) either approach.

                I'd prefer to leave the topic of handling genuine parallelism and multiple
                stacks etc. to the Enhancement Committee for a later rev of SV.

        johnS:
        But at least if we can provide some simple hooks into the interface
        to accommodate these environments that would be nice. I'm in the process
        of putting forth a simple proposal for how we might do that. I'll include
        an example with it also.

        -- johnS

> > Kev.
> >
> > > Doug,
> > >
> > > I understand your concern. But, cannot these C models
> > > be written as cmodules (or be embedded within cmodules) ?
> > >
> > > I think (Joao, correct me if I am wrong) a DirectC
> > > external C function and a cmodule in some sense
> > > are analogous to a Verilog function and task respectively.
> > > External C functions are 0-time activities whereas
> > > cmodules are not.
> > >
> > > I would not mind if we change that proposition (that DirectC
> > > external C functions should be 0-time activity), but my
> > > concerns there are:
> > >
> > > o Whether that will break any other basic structure of the
> > > whole scheme.
> > >
> > > o Whether we have sufficient time to undertake this type of
> > > fundamental change. Personally I think we should rather be
> > > late than produce things that are half cooked, but I know
> > > we are working under some tight schedule here.
> > > --
> > > Swapnajit Mittra
> > > Project VeriPage ::: http://www.angelfire.com/ca/verilog

        
        John Stickley
        Principal Engineer
        Verification Solutions Group
        Mentor Graphics Corp. - MED
        17 E. Cedar Place
        Ramsey, NJ 07446
        mailto:John_Stickley@mentor.com
        Phone: (201)818-2585


