SystemC P1666 list for Technical Review: RE: [tlmwg] Revisit of the TLM2.0 phases rules

From: <john.aynsley@doulos.com>
Date: Mon Jan 10 2011 - 01:25:47 PST

Yossi,

I would not describe the situation that way! You sometimes seem to use the terms BP and AT as if they were interchangeable. IMHO the limitations you have raised with respect to timing accuracy apply to the base protocol, not to the approximately-timed modeling style per se.

For what it is worth, many of the users I encounter want near-cycle-accurate simulations (nearer 99% than 95%-accurate) using specific protocols. Hence they are obliged to use TLM-2.0 custom protocol models rather than attempting to map actual protocols onto the base protocol. Such models would be TLM-2.0-custom-protocol-compliant (they CAN use the generic payload and the AT modeling style), but would not be base-protocol-compliant.

Also, given the above, I still disagree with the spin you are putting on the BP. Other people keep making the same point: interpreting the phases the way you do is your choice, not a BP rule. You are choosing to push aspects of the timing (the channel occupancy during a burst) onto the interfaces, instead of bringing it inside your components. You could get closer to the protocol you are trying to mimic while staying within the BP rules.
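To make the suggestion of keeping channel occupancy inside the component concrete, here is a minimal plain-C++ sketch (not SystemC) of a target that tracks the occupancy of its own data channels as internal state, so that the BP phases only need to carry the request/response handshake. The names `ChannelModel`, `Dir` and `transfer` are hypothetical, times are plain nanosecond integers rather than `sc_time`, and the 310 ns and 311 ns durations are borrowed from the example further down in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Direction of a data transfer inside the (hypothetical) target.
enum class Dir { Read, Write };

// Target-internal bookkeeping of data-channel occupancy. Times are plain
// nanosecond counts, not sc_core::sc_time.
class ChannelModel {
public:
    // shared == true: one data channel carries both read and write data.
    // shared == false: independent read and write data channels.
    explicit ChannelModel(bool shared) : shared_(shared) {}

    // Reserves the channel used by 'dir' for a transfer that may start no
    // earlier than 'earliest' and lasts 'duration' ns. Returns the time at
    // which the transfer completes.
    std::uint64_t transfer(Dir dir, std::uint64_t earliest, std::uint64_t duration) {
        assert(duration > 0 && "a data transfer occupies its channel for some time");
        std::uint64_t& busy_until = channel(dir);
        const std::uint64_t start = std::max(earliest, busy_until);
        busy_until = start + duration;
        return busy_until;
    }

private:
    // With a shared channel, reads and writes contend for the same slot.
    std::uint64_t& channel(Dir dir) {
        return (shared_ || dir == Dir::Write) ? write_busy_until_ : read_busy_until_;
    }

    bool shared_;
    std::uint64_t write_busy_until_ = 0;  // doubles as the shared channel
    std::uint64_t read_busy_until_ = 0;
};
```

With separate channels, a 310 ns write and a 311 ns read that both start at time 0 complete at 310 and 311; with a shared channel the read has to wait for the write and completes at 621. In both cases the difference lives inside the target and does not depend on how the phases are sequenced on the socket.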

Cheers,

John A

-----"Veller, Yossi" <Yossi_Veller@mentor.com> wrote: -----
To: "Robert Guenzel" <robert.guenzel@greensocs.com>
From: "Veller, Yossi" <Yossi_Veller@mentor.com>
Date: 01/07/2011 09:58AM
Cc: "Marcelo Montoreano" <Marcelo.Montoreano@synopsys.com>, "Bart Vanthournout" <Bart.Vanthournout@synopsys.COM>, <john.aynsley@doulos.com>, "P1666 Technical WG" <systemc-p1666-technical@eda.org>, <tlmwg@lists.systemc.org>
Subject: RE: [tlmwg] Revisit of the TLM2.0 phases rules

Hi Robert,

The expectations are not mine but of my customers.

One of them, using AHB (I believe), achieved an impressive accuracy: below 5% timing difference from the timing of the real board. Others are working with AXI and, believe me, their questions and requests are much more demanding than my example.

The customers ask us all the time: what is the merit of AT?
The standard answer that I have is that, apart from modeling each word transfer and arbitration (which you can do, but it will take much more time to simulate) and exceptions during the communication, you can get close to cycle accurate. And, as I said above, these are not just empty words.

Had they heard your claim, they would have used cycle-accurate simulations, even though those are about ten times slower.
The same will happen if I tell them that, because of an arbitrary choice of rules, the timing error can reach 100% right from the start. They will use LT instead: aggressive use of DMI can make it run 100 times faster, and its accuracy is no more questionable. Locking and exclusive accesses do not happen on every transaction, but imagine what it would do to the timing accuracy (even statistically) if each of the transactions took 100% longer.
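The 100% figure can be checked against the example further down in the thread, in which two transactions that should each take 320 ns both finish after 640 ns. Taking 320 ns as the reference, the relative timing error is:

```latex
\varepsilon = \frac{t_{\mathrm{sim}} - t_{\mathrm{ref}}}{t_{\mathrm{ref}}}
            = \frac{640\,\mathrm{ns} - 320\,\mathrm{ns}}{320\,\mathrm{ns}}
            = 1 = 100\%
```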

So what will I have to say to my customers? "The tool that I give you does not use the BP, in order to guarantee some level of accuracy. It can automatically generate adaptors to the GP, but be aware that if you use them, the timing error cannot be guaranteed to stay below 100%, even though each of the models has 100% internal timing accuracy."

You are right when you say "maybe it is important to define what exactly they are not good enough for".
In the distributed computing area, you have to define a protocol and the properties that it keeps. If no claims are made about its properties, the protocol is worthless; any passing of messages can be viewed as a protocol.

As I said in a previous mail: in TLM2.0 we are not on solid ground, because no claims are specified about GP timing accuracy. Hence I used the admittedly heuristic rule that if one set of protocol rules can achieve better accuracy than another without sacrificing the important properties of the protocol, the first set should be chosen.

Can I assume that, apart from the expectations, you found my other arguments OK?

Regards
Yossi

-----Original Message-----
From: Robert Guenzel [mailto:robert.guenzel@greensocs.com]
Sent: Friday, January 07, 2011 10:36 AM
To: Veller, Yossi
Cc: Marcelo Montoreano; Bart Vanthournout; john.aynsley@doulos.com; P1666 Technical WG; tlmwg@lists.systemc.org
Subject: Re: [tlmwg] Revisit of the TLM2.0 phases rules

Hi Yossi,

I have the impression that you have unreasonable expectations of the AT modeling style. You say "... and declare that modern busses are excluded...". That is not correct. They are not excluded; you simply cannot reach the same timing fidelity that you can reach for simpler busses, but that is all. You can still model the busses. Remember that this is about approximate timing.

Now you will say: an approximation that cannot express independent rd/wr channels is useless for your use case. But a similar argument can be made about bus locking and similar things. An approximation that ignores locks or exclusive accesses is certainly worthless for a number of use cases. Should a locking feature now be added to the BP? I don't think so, because then the BP would quickly become a superset of modern bus features, and the WG said from the start that it ought to be a common subset.

The BP aims at functional interoperability. It does not aim at any level of timing accuracy. It allows arbitration and delays to be modeled more accurately than with LT. And frankly, in my opinion, a transaction lasting 640 ns instead of 320 ns is acceptable at AT when my whole simulation covers 5 minutes of simulated time.

But you seem to expect a much higher level of timing accuracy to be "guaranteed" by AT. There is no such guarantee, at least not in my opinion; but James and John have already said that. So before continuing to blame the rules for not being good enough, maybe it is important to define what exactly they are not good enough for.

best regards
Robert

Veller, Yossi wrote:
>
> Hi all,
>
> I think that it is time to wrap up.
>
> The problem is that some TLM simulations do not model the throughput and latency that users would expect from out-of-order protocols that were designed to maximize these properties (so the OCP configuration does not count). A bus with separate write and read data channels should be able to finish both transactions in 320 ns, not 640 ns. On a bus with shared write and read data channels, one of the transactions would finish in 320 ns. I contend that this has the potential to surprise, perplex and annoy users.
>
> You cannot blame the models because, apart from complying with the TLM2 rules, they have to adhere to application-level timing requirements:
>
> - The model writer should not change the defined timing of the target in order to overcome the problem that was demonstrated (as Marcelo wrote). Moreover, why use an out-of-order protocol if you have to reply in order?
>
> - The bus has an arbitration policy that is defined regardless of the TLM rules. This arbitration may sometimes dictate forwarding one initiator and sometimes the other, e.g. in order to prevent starvation. So the bus is deterministic, and even smart, but the resulting timing will sometimes be long and sometimes short.
>
> What exacerbates the problem is that the models followed the TLM rules and recommendations to the letter. If you think that rules or recommendations will lead users into problems, don't give them.
>
> If your standard is designed to support timing at a certain level of accuracy in an interoperable way, it has to have rules that ensure it. Leeway and recommendations will not get you there (and Marcelo agrees with me).
>
> Don't blame the example: it did not use timing annotation, it considers the TLM rules only on the connection between the bus and the target, etc. I didn't find any problems with its logic.
>
> Don't accuse me of just wanting the BP to be AXI; out-of-order protocols are not just AXI.
>
> Hence: "It is the rules!"
>
> So I can think of the following options:
>
> 1. Declare that the BP disallows out-of-order protocols, incurring the overhead of reordering everywhere, and that modern buses are excluded.
>
> 2. Declare that there is no problem; but then I would recommend adding the example to the LRM to warn users, so that they expect the behaviors demonstrated above, and explaining why, with viable alternatives available, we have chosen to stick to the rules.
>
> 3. Change the rules.
>
> Regards
>
> Yossi
>
> *From:* Marcelo Montoreano [mailto:Marcelo.Montoreano@synopsys.com]
> *Sent:* Thursday, January 06, 2011 9:42 PM
> *To:* Veller, Yossi; Bart Vanthournout; john.aynsley@doulos.com;
> robert.guenzel@greensocs.com
> *Cc:* P1666 Technical WG; tlmwg@lists.systemc.org
> *Subject:* RE: [tlmwg] Revisit of the TLM2.0 phases rules
>
> Hi Yossi,
>
> The rules spell out how to account for socket utilization such that flow control is possible and meaningful. I1 should not send another request to B between t1 and t1+310 ns, as the socket is being used to transfer the write data. The rules don't say that activity on I1-B should affect I2-B.
>
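The per-socket accounting described above can be sketched in a few lines of plain C++. This is an illustration, not TLM-2.0 code: the class `RequestExclusion` and the hop names "I1-B", "I2-B" and "B-T" are hypothetical, and the sketch ignores the base-protocol cases in which END_REQ is implied rather than sent explicitly (for example by an early BEGIN_RESP or by TLM_COMPLETED).

```cpp
#include <cassert>
#include <map>
#include <string>

// Request exclusion, tracked separately for every hop: a new BEGIN_REQ may
// not be sent over a hop until the previous request on that hop has seen
// END_REQ. (Simplified: END_REQ implied by BEGIN_RESP or TLM_COMPLETED is
// not modeled.)
class RequestExclusion {
public:
    bool can_begin_req(const std::string& hop) const {
        auto it = outstanding_.find(hop);
        return it == outstanding_.end() || !it->second;
    }

    void begin_req(const std::string& hop) {
        assert(can_begin_req(hop) && "BEGIN_REQ before END_REQ on this hop");
        outstanding_[hop] = true;
    }

    void end_req(const std::string& hop) {
        assert(!can_begin_req(hop) && "END_REQ with no request outstanding");
        outstanding_[hop] = false;
    }

private:
    std::map<std::string, bool> outstanding_;  // hop -> request awaiting END_REQ
};
```

Replaying the start of the example: GP1 occupies I1-B until END_REQ at t1+310 ns, but I2 is still free to send BEGIN_REQ on I2-B at t1. Only B's forwarding of GP2 over the shared B-T hop has to wait.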
> B could have been smarter and, given that there were two requests at t1, prioritized I2, as it is a read.
>
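The arbitration suggested above could look roughly like this: when the B-T hop becomes free, B forwards the oldest pending read if there is one, and otherwise the oldest pending request. `PendingReq`, `Cmd` and `pick_next` are hypothetical names, and the queue is a plain `std::deque` rather than a `tlm_utils` payload event queue. Pure read priority can starve writes; the starvation concern raised in Yossi's wrap-up would call for an aging or round-robin element on top of this.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <optional>

enum class Cmd { Read, Write };

// A request waiting in the bus for the B-T hop to become free.
struct PendingReq {
    int initiator;  // which initiator issued the request (e.g. 1 for I1)
    Cmd cmd;
};

// Removes and returns the request to forward next: the oldest pending read
// if there is one, otherwise the oldest pending request of any kind.
std::optional<PendingReq> pick_next(std::deque<PendingReq>& queue) {
    if (queue.empty()) return std::nullopt;
    auto it = std::find_if(queue.begin(), queue.end(),
                           [](const PendingReq& r) { return r.cmd == Cmd::Read; });
    if (it == queue.end()) it = queue.begin();
    const PendingReq chosen = *it;
    queue.erase(it);
    return chosen;
}
```

With GP1 (a write from I1) and GP2 (a read from I2) both pending at t1, `pick_next` returns I2's read first and I1's write second.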
> If T were a good peripheral model, it would have responded with GP1(BEGIN_RESP) at t1+319 ns, as GP1 is the older transaction, before sending GP2(BEGIN_RESP); but it is totally OK for T to do it the way you describe. Maybe it takes some time to write the actual data to storage, and the peripheral you are using reflects that.
>
> Either of those would allow the initiators to finish their transactions earlier.
>
> I don't see anything wrong with the rules, and I don't think they are superfluous. Without them, it is unclear which component takes care of which part of the timing, and although I'm sure your models will be consistent among themselves, you could have chosen differently than I did, so our models would not have consistent timing (reads would take zero time while writes would take twice as long as they should). Notice that I say "consistent" timing and not "accurate", as accurate implies a reference that we don't have and consciously decided not to have.
>
> Blindly coding to follow the rules will not get you a decent component or system. There is too much leeway in there. Think about what you want to achieve with the model, then code it without breaking the BP rules. You will probably have to compromise and stay with the BP, or create your own, incompatible protocol.
>
> Regards,
>
> Marcelo.-
>
> *From:* tlmwg@lists.systemc.org [mailto:tlmwg@lists.systemc.org] *On
> Behalf Of *Veller, Yossi
> *Sent:* Thursday, January 06, 2011 8:36 AM
> *To:* Bart Vanthournout; john.aynsley@doulos.com;
> robert.guenzel@greensocs.com
> *Cc:* P1666 Technical WG; tlmwg@lists.systemc.org
> *Subject:* RE: [tlmwg] Revisit of the TLM2.0 phases rules
>
> Hi Bart,
>
> I changed the example to reflect this chain of mails.
>
> t = t1          I1 sends GP1(BEGIN_REQ) to B.
>                 B passes GP1(BEGIN_REQ) to T.
>                 T computes that the write data takes 310 ns (because of the recommendation
>                 of rule 16.2.6 b) and schedules an internal event notification at t1+310 ns.
>                 I2 sends GP2(BEGIN_REQ) to B; B queues it in a PEQ (because of the
>                 BEGIN_REQ rule 16.2.6 e).
>
> t = t1+310 ns   T sends GP1(END_REQ) and B passes it to I1; then B takes GP2(BEGIN_REQ)
>                 from the PEQ and calls T.
>                 T returns TLM_UPDATED, changing the phase to END_REQ, and B sends
>                 GP2(END_REQ) to I2.
>                 T schedules an internal event notification at t1+319 ns.
>
> t = t1+319 ns   T sends GP2(BEGIN_RESP) and B passes it to I2.
>                 I2 computes that the read data takes 311 ns (because of the recommendation
>                 of rule 16.2.6 c) and schedules an internal event notification at t1+640 ns.
>
> t = t1+640 ns   I2 sends GP2(END_RESP) and B passes it to T (and the read finishes).
>                 T sends GP1(BEGIN_RESP) to I1, which replies with TLM_COMPLETED (and the
>                 write finishes).
>
> The outcome is that in a perfectly good TLM2.0 system, two transactions, each of which should have taken 320 ns, BOTH finish after 640 ns. This seems to me a distortion of the timing, and removing the response exclusion rule would fix this scenario.
>
> Regards
>
> Yossi
>
> *From:* Bart Vanthournout [mailto:Bart.Vanthournout@synopsys.com]
> *Sent:* Thursday, January 06, 2011 4:35 PM
> *To:* Veller, Yossi; john.aynsley@doulos.com; robert.guenzel@greensocs.com
> *Cc:* P1666 Technical WG; tlmwg@lists.systemc.org
> *Subject:* RE: [tlmwg] Revisit of the TLM2.0 phases rules
>
> Yossi,
>
> I just started reading through this chain of mails, so sorry, but I want to get back to the example. I think you treat the protocol as an end-to-end protocol, while the rules only apply per socket.
>
> t = t1          I1 sends GP1(BEGIN_REQ) to B.
>                 B passes GP1(BEGIN_REQ) to T.
>                 T computes that the write data takes 310 ns (because of rule 16.2.6 b)
>                 and waits.
>                 I2 sends GP2(BEGIN_REQ) to B; B queues it in a PEQ (because of the
>                 BEGIN_REQ rule 16.2.6 e).
>
> t = t1+310 ns   T sends GP1(END_REQ) and B passes it to I1; then B takes GP2(BEGIN_REQ)
>                 from the PEQ and calls T.
>                 T returns TLM_UPDATED, changing the phase to END_REQ, and B sends
>                 GP2(END_REQ) to I2.
>
> t = t1+319 ns   T sends GP2(BEGIN_RESP) and B passes it to I2.
>                 I2 computes that the read data takes 311 ns (because of rule 16.2.6 c)
>                 and waits.
>
> t = t1+320 ns   T sends GP1(BEGIN_RESP) and B pushes it into the PEQ (because of the
>                 BEGIN_RESP rule 16.2.6 f).
>
> t = t1+640 ns   I2 sends GP2(END_RESP) and B passes it to T (and the read finishes).
>                 B sends GP1(BEGIN_RESP) to I1, which replies with TLM_COMPLETED.
>                 B sends GP1(END_RESP) to T (and the write finishes).
>
> Rule 16.2.6 f) says: "For the base protocol, a target or interconnect component shall not respond to a new transaction through a given socket with phase BEGIN_RESP until it has received END_RESP from the upstream component for the immediately preceding transaction or until a component has completed the previous transaction over that hop by returning TLM_COMPLETED. This is known as the response exclusion rule."
>
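Read at the level of a single socket, the rule quoted above amounts to a one-slot gate with a queue behind it. The following plain-C++ sketch is an illustration under stated assumptions, not TLM-2.0 code: `ResponseGate` is a hypothetical name, transactions are identified by integers, and completion of the previous transaction through TLM_COMPLETED (which the rule also accepts) is treated the same as receiving END_RESP.

```cpp
#include <cassert>
#include <deque>
#include <optional>

// Response exclusion on one socket: at most one response may be between
// BEGIN_RESP and END_RESP at any time. Transactions are identified by int
// (e.g. 1 for GP1, 2 for GP2).
class ResponseGate {
public:
    // The target is ready to send BEGIN_RESP for 'txn'. Returns true if it
    // may be sent now; otherwise 'txn' is queued until the socket is free.
    bool try_begin_resp(int txn) {
        if (in_flight_) {
            pending_.push_back(txn);
            return false;
        }
        in_flight_ = txn;
        return true;
    }

    // END_RESP for 'txn' has arrived. Returns the queued transaction, if any,
    // whose BEGIN_RESP may now be sent (at the time END_RESP arrived).
    std::optional<int> on_end_resp(int txn) {
        assert(in_flight_ == txn && "END_RESP for a response that is not in flight");
        in_flight_.reset();
        if (pending_.empty()) return std::nullopt;
        in_flight_ = pending_.front();
        pending_.pop_front();
        return in_flight_;
    }

private:
    std::optional<int> in_flight_;  // response awaiting END_RESP, if any
    std::deque<int> pending_;       // responses ready but held back, oldest first
};
```

Applied to the B-T socket in Yossi's timeline: GP2's BEGIN_RESP goes out at t1+319 ns, GP1's response is held back, and it is released only when END_RESP for GP2 arrives at t1+640 ns.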
> To me that means that the example is wrong at t = t1+320 ns: the target cannot send GP2(BEGIN_RESP) over its TLM2 socket, since it did not receive an END_RESP for GP1.
>
> In order to accomplish what you are looking for (I think), the bus should respond with an END_RESP for GP1 at time t = t1+319 ns and pass the BEGIN_RESP to I1. This allows the target to continue with a BEGIN_RESP for GP2, and the bus can also forward it to initiator I2, since the response exclusion rule applies per socket.
>
> So I see the following happening:
>
> t = t1          I1 sends GP1(BEGIN_REQ) to B.
>                 B passes GP1(BEGIN_REQ) to T.
>                 T computes that the write data takes 310 ns (because of rule 16.2.6 b)
>                 and waits.
>                 I2 sends GP2(BEGIN_REQ) to B; B queues it in a PEQ (because of the
>                 BEGIN_REQ rule 16.2.6 e).
>
> t = t1+310 ns   T sends GP1(END_REQ) and B passes it to I1; then B takes GP2(BEGIN_REQ)
>                 from the PEQ and calls T.
>                 T returns TLM_UPDATED, changing the phase to END_REQ, and B sends
>                 GP2(END_REQ) to I2.
>
> t = t1+319 ns   T sends GP2(BEGIN_RESP) and B returns GP2(END_RESP) to T, to allow it
>                 to continue.
>                 B passes GP2(BEGIN_RESP) to I2.
>                 I2 computes that the read data takes 311 ns (because of rule 16.2.6 c)
>                 and waits.
>
> t = t1+320 ns   T sends GP1(BEGIN_RESP) and B returns GP1(END_RESP) to T, to allow it
>                 to continue.
>                 B passes GP1(BEGIN_RESP) to I1, which replies with TLM_COMPLETED.
>
> t = t1+640 ns   I2 sends GP2(END_RESP) to B (and the read finishes).
>
> At least, this is my reading of the standard.
>
> Bart
>
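Bart's alternative, in which the exclusion is enforced separately on each socket and B completes the target-side hop immediately, can be sketched as a bus that keeps one response slot per initiator-side socket. This is a hypothetical plain-C++ illustration (`PerSocketResponses`, `from_target` and `end_resp_from` are invented names); it assumes that B answers every BEGIN_RESP from T with END_RESP at once, so that only the initiator-side sockets can hold a response back.

```cpp
#include <cassert>
#include <map>

// Interconnect-side response exclusion, one slot per initiator socket.
// The target-side hop is assumed to be completed immediately by B.
class PerSocketResponses {
public:
    // BEGIN_RESP arrives from the target for a transaction of 'initiator'.
    // Returns true if B forwards BEGIN_RESP to that initiator now, false if
    // it must wait behind a response already in flight on that socket.
    bool from_target(int initiator) {
        Socket& s = sockets_[initiator];
        if (s.awaiting_end_resp) {
            ++s.queued;
            return false;
        }
        s.awaiting_end_resp = true;
        return true;
    }

    // END_RESP (or TLM_COMPLETED) arrives from 'initiator'. Returns true if a
    // queued response for the same initiator is forwarded as a result.
    bool end_resp_from(int initiator) {
        Socket& s = sockets_[initiator];
        assert(s.awaiting_end_resp && "END_RESP with no response in flight");
        if (s.queued > 0) {
            --s.queued;  // next response goes out; socket stays occupied
            return true;
        }
        s.awaiting_end_resp = false;
        return false;
    }

private:
    struct Socket {
        bool awaiting_end_resp = false;
        int queued = 0;
    };
    std::map<int, Socket> sockets_;  // initiator index -> socket state
};
```

Replaying the timeline above: GP2's response is forwarded to I2 at t1+319 ns, and GP1's response is forwarded to I1 at t1+320 ns without waiting for I2's END_RESP at t1+640 ns, because the two responses travel over different sockets.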


Received on Mon Jan 10 01:26:36 2011

This archive was generated by hypermail 2.1.8 : Mon Jan 10 2011 - 01:26:45 PST