SystemC P1666 list for Technical Review: RE: [tlmwg] Revisit of

From: Aldis, James <j-aldis2@ti.com>
Date: Fri Jan 07 2011 - 02:21:53 PST

Yossi,

firstly what you say is not even true, because as we have discussed
the things you thought were rules are in fact not obligatory. you can get
a good approximation for AXI using BP if you interpret the phases slightly
differently.

secondly, why on earth would you find it surprising that a single TLM
protocol is unable to maintain the same timing fidelity for all hardware
protocols?

thirdly, why would you _always_ advise your clients not to use BP, if it
is pretty damn good in many cases (eg AHB)?

fourthly, how do you imagine any kind of definition of BP/AT accuracy
for a given set of rules, given that accuracy depends rather more on the
models than the bus API, and that accuracy for AXI, APB, OCPa, OCPb,
STBus, AHB, DDR, DDR2, SRAM, PCI, Wishbone, Unipro/Pie, PCIe, etc, etc,
etc are all different and there are zillions of these things?

fifthly, I'll bet you any money you like that in the 5% accurate AHB
simulation you mention there are many many individual transactions which
are modelled 100% inaccurately. it is completely unfair to compare an
overall average with one specific directed test case.
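
To illustrate the fifth point, here is a minimal sketch in plain C++ (no SystemC needed) that compares the error of a whole run with the error of its worst single transaction. The function names and any durations fed to them are hypothetical; nothing here is taken from the AHB simulation under discussion.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Relative error of one modelled duration against its reference, in percent.
double percent_error(double reference_ns, double modelled_ns) {
    assert(reference_ns > 0.0);
    return 100.0 * std::fabs(modelled_ns - reference_ns) / reference_ns;
}

// Error of the summed run time: the kind of figure an "overall average" reports.
double aggregate_percent_error(const std::vector<double>& reference_ns,
                               const std::vector<double>& modelled_ns) {
    assert(reference_ns.size() == modelled_ns.size());
    const double ref_total =
        std::accumulate(reference_ns.begin(), reference_ns.end(), 0.0);
    const double mod_total =
        std::accumulate(modelled_ns.begin(), modelled_ns.end(), 0.0);
    return percent_error(ref_total, mod_total);
}

// Largest error of any single transaction in the same run.
double worst_percent_error(const std::vector<double>& reference_ns,
                           const std::vector<double>& modelled_ns) {
    assert(reference_ns.size() == modelled_ns.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < reference_ns.size(); ++i) {
        worst = std::max(worst, percent_error(reference_ns[i], modelled_ns[i]));
    }
    return worst;
}
```

With made-up inputs of four transactions that should each take 320 ns but are modelled as 640, 32, 320 and 320 ns, the aggregate error is 2.5% while the worst single transaction is off by 100%.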

the BP is, as Robert said, designed to cover as many bus interfaces as
possible "reasonably well". high-end bus interfaces with parallel rd and
wr, interruptible or interleaved transactions, etc attract lots of attention
but are not dominant. on an SoC with a couple of hundred components the
vast majority aren't going to have parallel rd and wr busses. those that
do are usually the components where a little extra work is invested
anyway. therefore the BP design makes perfect sense. but wait, it's
better than that - the BP can even do your example! (note that there
_are_ bus interfaces more sophisticated than AXI out there, which the BP
really cannot do well. this was known when the BP was designed.)

i really do not understand why you think you have a problem.

James

Texas Instruments France SA, 821 Avenue Jack Kilby, 06270 Villeneuve Loubet. 036 420 040 R.C.S Antibes. Capital de EUR 753.920

-----Original Message-----

> From: tlmwg@lists.systemc.org
> [mailto:tlmwg@lists.systemc.org] On Behalf Of Veller, Yossi
> Sent: Friday, January 07, 2011 10:58 AM
> To: Robert Guenzel
> Cc: Marcelo Montoreano; Bart Vanthournout;
> john.aynsley@doulos.com; P1666 Technical WG; tlmwg@lists.systemc.org
> Subject: RE: [tlmwg] Revisit of the TLM2.0 phases rules
>
> Hi Robert,
>
> The expectations are not mine but of my customers.
>
> One of them got, using AHB (I believe), to an impressive
> accuracy of below 5% timing difference from the actual real
> board timing. Others are working with AXI and, believe me,
> their questions and requests are much more demanding than my example.
>
> The customers ask us all the time: what is the merit of AT?
> The standard answer that I have is that, apart from modeling
> each word transfer and arbitration (which you can but it will
> take much more time to simulate) and exceptions during the
> communication, you can get close to cycle accurate. And from
> what I've said above, these are not just empty words.
>
> If they had heard your claim they would have used
> cycle accurate simulations though they are about ten times slower.
> The same will happen if I tell them that because of an arbitrary
> choice of rules the timing error can get to 100% as
> the starting line. They will use LT: aggressive use of DMI
> can make it run 100 times faster and its accuracy is no more
> questionable. Locking and exclusive accesses do not happen at
> each transaction, but imagine what it will do to the timing
> accuracy (even statistically) if the timing of each of the
> transactions is longer by 100%.
>
> So what I will have to say to my customers is: the tool that I
> give you does not use BP, in order to guarantee some level of
> accuracy. It can automatically generate adaptors to GP, but
> know that if you use them the timing accuracy can't be
> guaranteed to have less than 100% error even though each of
> the models has inner 100% timing accuracy.
>
>
> You were right when you said "maybe it is important to define
> what exactly they are not good enough for".
> In the distributed computing area you have to define a protocol
> and the properties that it keeps. If there are no claims about
> the properties the protocol is worthless; any passing of
> messages can be viewed as a protocol.
>
> Like I've said in a previous mail: in TLM2.0 we are not on
> solid ground because no claims are specified about the GP
> timing accuracy. Hence I used the admittedly heuristic rule:
> if a choice of protocol rules can get better accuracy than
> another set without sacrificing the important properties of
> the protocol, the first set should be taken.
>
>
> Can I assume that apart from the expectations you found my
> other arguments OK?
>
> Regards
> Yossi
>
> -----Original Message-----
> From: Robert Guenzel [mailto:robert.guenzel@greensocs.com]
> Sent: Friday, January 07, 2011 10:36 AM
> To: Veller, Yossi
> Cc: Marcelo Montoreano; Bart Vanthournout;
> john.aynsley@doulos.com; P1666 Technical WG; tlmwg@lists.systemc.org
> Subject: Re: [tlmwg] Revisit of the TLM2.0 phases rules
>
> Hi Yossi,
>
> I have the impression that you have an unreasonable
> expectation of the AT modeling style.
> You say "... and declare that modern busses are excluded...".
> That is not correct.
> They are not excluded, you simply cannot reach the same
> timing fidelity you can reach for simpler busses, but that is
> all. Still you can model the busses.
> Remember that this is about approximate timing.
>
> Now you will say: an approximation that cannot express
> independent rd/wr channels is useless for your use case, but
> a similar argumentation can be made on bus locking and similar things.
> An approximation that ignores locks or exclusive accesses is
> certainly worthless for a number of use cases.
> Now should a locking feature be added to the BP? I don't
> think so, because then the BP will quickly be a super set of
> modern bus features and the WG said from the start it ought
> to be a common subset.
>
> The BP aims at functional interoperability. It does not aim
> at any level of timing accuracy.
> It allows arbitration and delays to be modelled more accurately
> than LT. And frankly, in my opinion, a transaction lasting
> 320 ns or 640 ns when my whole simulation runs a simulated 5
> minutes is acceptable at AT.
>
> But you seem to have a much higher expectation of the timing
> accuracy "guaranteed" by AT.
> There is none. At least not in my opinion. But James and John
> already said that.
> So before continuing to blame the rules for not being
> good enough, maybe it is important to define what exactly they
> are not good enough for.
>
> best regards
> Robert
>
>
> Veller, Yossi wrote:
> >
> > Hi all,
> >
> >
> >
> > I think that it is time to wrap up.
> >
> >
> >
> > The problem is that some TLM simulations do not provide the modeling
> > of the throughput and latency that the user would expect from
> > out-of-order protocols that were designed to maximize these
> > properties (so the OCP configuration does not count). A bus with
> > separate write and read data channels should be able to finish both
> > transactions in 320 NS and not 640 NS. On a bus with shared write and
> > read data channels, one of the transactions would finish in 320 NS.
> > I contend that this has the potential to surprise, perplex and annoy
> > users.
> >
> >
> >
> > You cannot blame the models because, apart from complying with the
> > TLM2 rules, they have to adhere to applicative timing requirements.
> >
> > - The model writer should not change the defined timing of the
> > target in order to overcome the problem that was demonstrated (like
> > Marcelo wrote). Moreover, why use an out-of-order protocol if you
> > have to reply in order?
> >
> > - The bus has an arbitration policy that is defined regardless
> > of the TLM rules. This arbitration may dictate sometimes forwarding
> > one initiator and sometimes the other, e.g. in order to prevent
> > starvation. So the bus is even deterministic and smart but the
> > result will be timing that is sometimes long and sometimes short.
> >
> >
> >
> > What exacerbates the problem is that models followed the TLM rules
> > and recommendations to the letter. If you think that rules or
> > recommendations will lead the users to problems, don't give them.
> >
> >
> >
> > If your standard is designed to support timing at a certain level of
> > accuracy in an interoperable way it has to have rules that ensure it.
> > Leeway and recommendations will not get you there (and Marcelo agrees
> > with me).
> >
> >
> >
> > Don't blame the example: it did not use timing annotation, it
> > considers the TLM rules only on the connection between the bus and the
> > target etc. I didn't find any problems with the logic there.
> >
> >
> >
> > Don't accuse me of just wanting the BP to be an AXI; out-of-order
> > protocols are not just AXI.
> >
> >
> >
> > Hence "It is the rules!"
> >
> >
> >
> > So I can think of the following options for us to do:
> >
> >
> >
> > 1. Declare that the BP disallows out-of-order protocols and incur the
> > overhead of reordering everywhere and that modern buses are excluded.
> >
> > 2. Declare that there is no problem, but then I would recommend that
> > the example is added to the LRM in order to warn the users so that
> > they expect the behaviors demonstrated above and explain why, with the
> > existence of viable alternatives, we have chosen to stick to the rules.
> >
> > 3. Change the rules.
> >
> >
> >
> > Regards
> >
> > Yossi
> >
> >
> >
> > *From:* Marcelo Montoreano [mailto:Marcelo.Montoreano@synopsys.com]
> > *Sent:* Thursday, January 06, 2011 9:42 PM
> > *To:* Veller, Yossi; Bart Vanthournout; john.aynsley@doulos.com;
> > robert.guenzel@greensocs.com
> > *Cc:* P1666 Technical WG; tlmwg@lists.systemc.org
> > *Subject:* RE: [tlmwg] Revisit of the TLM2.0 phases rules
> >
> >
> >
> > Hi Yossi,
> >
> >
> >
> > The rules spell out how to account for the socket utilization such
> > that flow control is possible and meaningful. I1 should not send
> > another request to B between t1 and t1+310, as the socket is being
> > used to transfer the write data. The rules don't say that activity
> > between I1-B should affect I2 - B.
> >
> >
> >
> > B could have been smarter and, given that there were 2 requests at t1,
> > prioritized I2, as it is a read.
> >
> >
> >
> > If T was a good peripheral model, it would have responded at 319 with
> > GP1(BEGIN_RESP), as it is an older transaction, before doing
> > GP2(BEGIN_RESP), but it is totally ok for T to do it the way you
> > describe. Maybe it takes some time to write the actual data to storage
> > and the peripheral you are using reflects that.
> >
> >
> >
> > Either of those would allow the initiators to finish their transaction
> > earlier.
> >
> >
> >
> > I don't see anything wrong with the rules, and I don't think they are
> > superfluous. Without them, it is unclear what component takes care of
> > what part of the timing, and although I'm sure your models will be
> > consistent among them, you could have chosen differently than I did,
> > so our models would not have consistent timing (reads would take
> > zero-time while writes 2x what they should). Notice that I say
> > "consistent" timing and not "accurate", as accurate implies a
> > reference that we don't have and we consciously decided not to have.
> >
> >
> >
> > Blindly coding to follow the rules will not get you a decent component
> > or system. There is too much leeway in there. Think of what you want
> > to achieve with the model, then code it without breaking BP rules. You
> > probably will have to compromise and stay with BP, or create your own,
> > incompatible protocol.
> >
> >
> >
> > Regards,
> >
> >
> >
> > Marcelo.-
> >
> >
> >
> > *From:* tlmwg@lists.systemc.org [mailto:tlmwg@lists.systemc.org] *On
> > Behalf Of *Veller, Yossi
> > *Sent:* Thursday, January 06, 2011 8:36 AM
> > *To:* Bart Vanthournout; john.aynsley@doulos.com;
> > robert.guenzel@greensocs.com
> > *Cc:* P1666 Technical WG; tlmwg@lists.systemc.org
> > *Subject:* RE: [tlmwg] Revisit of the TLM2.0 phases rules
> >
> >
> >
> > Hi Bart,
> >
> >
> >
> > I changed the example to reflect this chain of mails.
> >
> >
> >
> > t= t1         I1 sends GP1(BEGIN_REQ) to B
> >               B passes the GP1(BEGIN_REQ) to T
> >               T computes that the written data takes 310 NS
> >               (because of the recommendation of rule 16.2.6 b)
> >               and schedules an inner event notification to t1+310 NS.
> >               I2 sends GP2(BEGIN_REQ) to B, B queues it in a PEQ
> >               (because of the BEGIN_REQ rule 16.2.6 e).
> >
> > t= t1+310 NS  T sends GP1(END_REQ) and B passes it to I1 then B
> >               takes GP2(BEGIN_REQ) from the PEQ and calls T.
> >               T returns TLM_UPDATED and changes the phase
> >               to END_REQ and B sends GP2(END_REQ) to I2.
> >               T schedules an inner event notification to t1+319 NS.
> >
> > t= t1+319 NS  T sends GP2(BEGIN_RESP) and B passes it to I2.
> >               I2 computes that the read data takes 311 NS
> >               (because of the recommendation of rule 16.2.6 c)
> >               and schedules an inner event notification to t1+640 NS.
> >
> > t= t1+640 NS  I2 sends GP2(END_RESP) and B passes it to T (and the
> >               read finishes)
> >               T sends GP1(BEGIN_RESP) to I1 which replies
> >               with TLM_COMPLETED (and the write finishes)
> >
> >
> >
> > The outcome is that in a perfectly good TLM2.0 system two transactions,
> > each of which should have taken 320 NS, BOTH finish after 640 NS. This
> > seems to me a distortion of the timing, and the removal of the response
> > exclusion rule will fix this scenario.
> >
> >
> >
> > Regards
> >
> > Yossi
> >
> >
> >
> > *From:* Bart Vanthournout [mailto:Bart.Vanthournout@synopsys.com]
> > *Sent:* Thursday, January 06, 2011 4:35 PM
> > *To:* Veller, Yossi; john.aynsley@doulos.com;
> > robert.guenzel@greensocs.com
> > *Cc:* P1666 Technical WG; tlmwg@lists.systemc.org
> > *Subject:* RE: [tlmwg] Revisit of the TLM2.0 phases rules
> >
> >
> >
> >
> >
> > Yossi,
> >
> >
> >
> > I just started reading through this chain of mails, so sorry, but I want
> > to get back to the example. I think you treat the protocol as an
> > end-to-end protocol while the rules only apply per socket.
> >
> >
> >
> > t= t1         I1 sends GP1(BEGIN_REQ) to B
> >               B passes the GP1(BEGIN_REQ) to T
> >               T computes that the written data takes
> >               310 NS (because of rule 16.2.6 b) and waits.
> >               I2 sends GP2(BEGIN_REQ) to B, B queues
> >               it in a PEQ (because of the BEGIN_REQ rule 16.2.6 e).
> >
> > t= t1+310 NS  T sends GP1(END_REQ) and B passes it to I1 then B
> >               takes GP2(BEGIN_REQ) from the PEQ and calls T.
> >               T returns TLM_UPDATED and changes the
> >               phase to END_REQ and B sends GP2(END_REQ) to I2.
> >
> > t= t1+319 NS  T sends GP2(BEGIN_RESP) and B passes it to I2.
> >               I2 computes that the read data takes
> >               311 NS (because of rule 16.2.6 c) and waits.
> >
> > t= t1+320 NS  T sends GP1(BEGIN_RESP) and B pushes it into the PEQ
> >               (because of the BEGIN_RESP rule 16.2.6 f).
> >
> > t= t1+640 NS  I2 sends GP2(END_RESP) and B passes it to T (and the
> >               read finishes)
> >               B sends the GP1(BEGIN_RESP) to I1 which
> >               replies with TLM_COMPLETED
> >               B sends the GP1(END_RESP) to T (and
> >               the write finishes)
> >
> >
> >
> > Rule 16.2.6.f) says: For the base protocol, a target or interconnect
> > component shall not respond to a new transaction
> > through a given socket with phase BEGIN_RESP until it has received
> > END_RESP from the upstream component for the immediately preceding
> > transaction or until a component has completed the previous
> > transaction over that hop by returning TLM_COMPLETED. This is known as
> > the response exclusion rule.
> >
> >
> >
> >
> >
> > To me that means that the example is wrong at t = t1+320 NS: the
> > target cannot send GP2(BEGIN_RESP) over its TLM2 socket since it did
> > not receive an END_RESP for GP1.
> >
> > In order to accomplish what you are looking for (I think), the bus has
> > to respond with an END_RESP for GP1 at time t = t1+319 NS and pass the
> > BEGIN_RESP to I1. This allows the target to continue with a
> > BEGIN_RESP for GP2 and the bus can also forward to initiator I2 since
> > the response exclusion rule applies per socket.
> >
> >
> >
> > So I see the following happening:
> >
> >
> >
> > t= t1         I1 sends GP1(BEGIN_REQ) to B
> >               B passes the GP1(BEGIN_REQ) to T
> >               T computes that the written data takes
> >               310 NS (because of rule 16.2.6 b) and waits.
> >               I2 sends GP2(BEGIN_REQ) to B, B queues
> >               it in a PEQ (because of the BEGIN_REQ rule 16.2.6 e).
> >
> > t= t1+310 NS  T sends GP1(END_REQ) and B passes it to I1 then B
> >               takes GP2(BEGIN_REQ) from the PEQ and calls T.
> >               T returns TLM_UPDATED and changes the
> >               phase to END_REQ and B sends GP2(END_REQ) to I2.
> >
> > t= t1+319 NS  T sends GP2(BEGIN_RESP) and B returns GP2(END_RESP)
> >               to T, to allow it to continue
> >               B passes GP2(BEGIN_RESP) to I2.
> >               I2 computes that the read data takes
> >               311 NS (because of rule 16.2.6 c) and waits.
> >
> > t= t1+320 NS  T sends GP1(BEGIN_RESP) and B returns GP2(END_RESP)
> >               to T, to allow it to continue
> >               B passes GP1(BEGIN_RESP) to I1 which
> >               replies with TLM_COMPLETED
> >
> > t= t1+640 NS  I2 sends GP2(END_RESP) to B (and the read finishes)
> >
> >
> >
> >
> >
> > At least this is my reading of the standard.
> >
> >
> >
> > Bart
> >
>
>
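
The response exclusion rule quoted in Bart's mail (rule 16.2.6 f) is stated per socket. A minimal sketch of that bookkeeping follows, in plain C++ without the SystemC/TLM library. The class `ResponseChannel` and its methods are hypothetical names used only for illustration; they are not part of the TLM-2.0 API, and the sketch ignores timing entirely.

```cpp
#include <cassert>

// Hypothetical per-hop state for the response exclusion rule.
// Create one instance per socket (hop); each tracks only its own hop.
class ResponseChannel {
public:
    // True when no response is outstanding on this hop.
    bool can_begin_resp() const { return !busy_; }

    // Try to send BEGIN_RESP for transaction `id` on this hop.
    // Returns false (and changes nothing) if a previous response is still open.
    bool begin_resp(int id) {
        if (busy_) {
            return false;
        }
        busy_ = true;
        open_id_ = id;
        return true;
    }

    // END_RESP was received, or TLM_COMPLETED was returned, for `id`.
    void end_resp(int id) {
        assert(busy_ && open_id_ == id);
        busy_ = false;
    }

private:
    bool busy_ = false;  // a response is open on this hop
    int open_id_ = 0;    // id of the transaction whose response is open
};
```

With one instance per hop, a BEGIN_RESP for GP1 on the T-to-B hop is refused while GP2's response is still open there, yet a separate instance for the B-to-I1 hop is unaffected. That is the per-socket reading of the rule given in the thread above.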


Received on Fri Jan 7 02:22:29 2011
