Subject: Re: ISSUE:DirectC:DirectC i/f should support mechanism for calling Verilog task/function from a DirectC application
From: Stickley, John (john_stickley@mentorg.com)
Date: Wed Oct 23 2002 - 12:34:32 PDT
Kevin,
Interesting discussion. See embedded comments.
Kevin Cameron x3251 wrote:
> From john_stickley@mentorg.com Wed Oct 23 10:37:06 2002
>
> Kevin,
>
> Kevin Cameron x3251 wrote:
>
> > > From: "Swapnajit Mittra" <mittra@juno.com>
...
> > That's why it's better to write your functions in SV rather than
> > calling external C routines :-)
> >
>
> johnS:
>
> Kevin,
>
> I think the issue is more than just context switching.
> In fact, that's not the difficulty. That by itself is a pretty routine
> operation in all the new C++ kernels - TestBuilder, SystemC
> and even non-C++ testbench modeling environments such as Verisity.
>
> In fact, from what I've seen, for pure untimed multi-threaded
> testbenches, these environments are pretty efficient (have
> a look at systemc.org posting on some benchmarking I did:
> http://www.systemc.org/hypermail/systemc-forum/1282.html )
Throughput depends on the ratio of context switching time to actual
model evaluation. Most of the C/C++ activity is currently at the
testbench level (and cycle based), where there aren't that many threads
and the models are complex. As a general approach to simulation it
doesn't work, because a lot of the code is too short (from @/# -> @/#).
Also, if you don't know the call depth for the C/C++, then you have to
allocate a "big enough" stack for each thread, which is inefficient
in memory (you'd blow your 2G Linux limit fairly easily).
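To put rough numbers on the stack-allocation point above, here is a minimal sketch. The 2G figure is taken from the message (read here as 2 GiB of address space); the per-thread stack sizes are assumed values chosen only for illustration, and the function name is hypothetical.

```python
GIB = 1024 ** 3
KIB = 1024


def max_threads(address_space_bytes, stack_bytes_per_thread):
    """Upper bound on thread count when every thread gets a fixed-size stack.

    Ignores heap, code, and kernel usage, so the real limit is lower.
    """
    return address_space_bytes // stack_bytes_per_thread


if __name__ == "__main__":
    # Assumed per-thread stack sizes, not figures from this thread.
    for stack_kib in (16, 64, 1024):
        n = max_threads(2 * GIB, stack_kib * KIB)
        print(f"{stack_kib:5d} KiB stacks -> at most {n} threads")
```

Under these assumptions, a 1 MiB "big enough" stack per thread caps a 2 GiB process at roughly two thousand threads, before any memory is spent on the models themselves.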
johnS: I guess it comes down to whether your design
is composed mainly of a small number of long algorithmic sequential
blocks (which tend to use native machine data types) vs a large number
of tiny concurrent blocks (which tend to use bit_vector data types).
SystemC is real good at the former and real bad at the latter.
We've never seen any problems with large call depth, but we have
seen C++ simulators bog down when there's lots of fine grained
concurrency. However, our position is that this has never really been
SystemC's sweet spot anyway. It is much better at algorithmic/behavioral
system level modeling (which, regrettably, HDL environments, at least
those prior to SystemVerilog, are not).
What's interesting about the case I cited above is that it is a pipeline
of large algorithmic cores surrounded by thin timing shells
(characterized by cycle accurate activity). The computations
in the cores dwarf the cycle based activity in the shells, which is
why it performs so much better in the SystemC behavioral version
than the VHDL RTL version (8 1/2 minutes vs 9 weeks!). In the
RTL version the entire design is cycle based activity.
But generally from my point of view, we see the whole spectrum of
testbenches. We see testbenches with lots of tiny concurrent processes,
we see testbenches with few highly sequential processes. In the latter
case, thread context switch overhead does not amount to much - in the
former case it does.
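The trade-off described above can be sketched as a simple ratio. The timings below are made-up illustrative values in arbitrary units, not measurements from any simulator, and the function name is hypothetical.

```python
def switch_overhead_fraction(switch_time, eval_time):
    """Fraction of run time spent context switching, assuming each
    process activation costs one switch plus eval_time of useful work."""
    return switch_time / (switch_time + eval_time)


if __name__ == "__main__":
    switch = 1.0  # assumed cost of one context switch (arbitrary units)

    # Few highly sequential processes: lots of work per activation.
    print(switch_overhead_fraction(switch, 999.0))  # 0.001

    # Many tiny concurrent processes: little work per activation.
    print(switch_overhead_fraction(switch, 1.0))  # 0.5
```

With these assumed numbers the switch cost is negligible for the long sequential case and consumes half the run time for the fine grained case.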
> They all seem to use a package called QuickThreads which is a very
> efficient non-preemptive threading package. The place where
> SystemC performance breaks down is when processing
> bit_vectors rather than native machine types such as int or double.
>
> But the 0-time issue is more than just context switching. It tends
> to make it difficult to keep coherency between the time state
> of the C++ kernel and that of the HDL kernel. By establishing
> 0-time "synchronization points" you can easily avoid this
> inconsistency.
I don't think the DirectC approach requires a separate C++ kernel, so
it's less of a problem.
johnS:
True, it doesn't require it, but it should probably also accommodate it.
TestBuilder and SystemC are good examples of independent kernels,
though they could be tightly bundled into a given vendor's HDL kernel
at the discretion of that vendor - or not.
I would like to make sure we design the interface in such a way that it
does not prevent (or at least discourage) either approach.
I'd prefer to leave the topic of handling genuine parallelism and multiple
stacks etc. to the Enhancement Committee for a later rev of SV.
johnS:
But at least if we can provide some simple hooks into the interface
to accommodate these environments that would be nice. I'm in the process
of putting forth a simple proposal for how we might do that. I'll
include an example with it also.
-- johnS
> > Kev.
> >
> > > Doug,
> > >
> > > I understand your concern. But, can not these C models
> > > be written as cmodules (or be embedded within cmodules) ?
> > >
> > > I think (Joao, correct me if I am wrong) a DirectC
> > > external C function and a cmodule in some sense
> > > are analogous to a Verilog function and task respectively.
> > > External C functions are 0-time activities whereas
> > > cmodules are not.
> > >
> > > I would not mind if we change that proposition (that DirectC
> > > external C functions should be 0-time activity), but my
> > > concerns there are:
> > >
> > > o Whether that will break any other basic structure of the
> > > whole scheme.
> > >
> > > o Whether we have sufficient time to undertake this type of
> > > fundamental change. Personally I think we should rather be
> > > late than producing things that are half cooked, but I know
> > > we are working under some tight schedule here.
> > > --
> > > Swapnajit Mittra
> > > Project VeriPage ::: http://www.angelfire.com/ca/verilog
John Stickley
Principal Engineer
Verification Solutions Group
Mentor Graphics Corp. - MED
17 E. Cedar Place
Ramsey, NJ 07446
mailto:John_Stickley@mentor.com
Phone: (201)818-2585
This archive was generated by hypermail 2b28 : Wed Oct 23 2002 - 12:41:13 PDT