
    From: Mark McDougall<markm@v...>
    Date: Tue Feb 19 01:52:33 CET 2008
    Subject: [oc] PCI core question
    Howard Harte wrote:

    > One issue I'd like some advice on is improving read latency. For single
    > 32-bit writes, they complete in about 300ns, which is fine. For 32-bit
    > reads, they complete in 2.5uS, which is a really long time. I'm reading
    > and writing to a FIFO, which occupies a single address on the wishbone
    > backplane.

    It's been a while since I used the core, but we did some profiling and
    analysis because we had similar issues.

    By far the most significant factor in read performance is the fact that
    all reads are posted (strictly speaking, handled as PCI delayed reads),
    which means the core will immediately disconnect after latching the
    address/command. It is then up to the chipset on the motherboard to
    decide when to retry the read in order to collect the result. We were
    seeing gaps of 8 or more PCI clocks, for example, between retries.
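    To make the effect of the retry gap concrete, here is a rough model of
    a delayed read, not the core's actual timing. It assumes the first
    attempt at clock 0 is retried, the wishbone fetch finishes
    `wb_latency` clocks later, and the chipset re-attempts every
    `retry_gap` clocks. The function name and all numbers are hypothetical.

```python
import math

def delayed_read_completion(wb_latency, retry_gap):
    """Approximate PCI clock at which a delayed read completes.

    Simplified model (an assumption, not the core's exact timing):
      - the host's first attempt at clock 0 latches address/command and
        is terminated with a retry,
      - the core's wishbone fetch finishes `wb_latency` clocks later,
      - the chipset re-attempts the read every `retry_gap` clocks,
      - the read completes on the first re-attempt at or after the data
        is ready.
    """
    attempts_needed = max(1, math.ceil(wb_latency / retry_gap))
    return attempts_needed * retry_gap

# Illustrative numbers only: even a fast slave waits a full retry gap.
for wb in (2, 8, 12):
    print(f"wishbone latency {wb:2d} clk -> read completes at clk "
          f"{delayed_read_completion(wb, retry_gap=8)}")
```

    With an 8-clock retry gap, a 2-clock and an 8-clock wishbone slave
    finish at the same time in this model, which is why speeding up the
    wishbone side alone helps so little.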

    IIRC the *absolute* best performance you can hope to get on reads on an
    Intel mobo is around 3MHz (one transfer every ~300ns, or about 10 PCI
    clocks at 33MHz), due to this disconnect. For 32-bit reads, that of
    course equates to roughly 12MBps. The best we actually measured on our
    platform was around 9MBps for internal register accesses, and ~4.4MBps
    when accessing external SRAM, because each access required 2 retries.
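    The arithmetic behind these figures is easy to sketch. Assuming the
    standard 33MHz PCI clock, 10 clocks per transfer gives about 3.3
    million 32-bit transfers per second, i.e. ~13MB/s; the 12MBps figure
    is the same estimate rounded down to 3MHz. The function name is
    hypothetical and MB here means 10^6 bytes.

```python
PCI_CLOCK_HZ = 33_000_000  # conventional 33MHz PCI clock (assumed)

def read_throughput_mb_s(clocks_per_transfer, bytes_per_transfer=4,
                         clock_hz=PCI_CLOCK_HZ):
    """Sustained throughput in MB/s (10**6 bytes/s) for single, non-burst reads."""
    transfers_per_second = clock_hz / clocks_per_transfer
    return transfers_per_second * bytes_per_transfer / 1e6

# ~10 clocks per 32-bit read: about 3.3M transfers/s, about 13.2MB/s
print(read_throughput_mb_s(10))
# If each access needs an extra retry cycle, the cost per transfer
# roughly doubles and the throughput roughly halves.
print(read_throughput_mb_s(20))
```

    The measured drop from ~9MBps to ~4.4MBps when a second retry was
    needed is roughly consistent with the per-transfer cost doubling.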

    You're seeing the above-mentioned 300ns per write because that likely
    corresponds to the gaps between successive commands on the PCI bus as
    well. That's pretty much exactly what we saw too.

    > Some things I've thought about are mapping the FIFO to a separate BAR,
    > ignoring the lower address bits, and enabling read prefetching, but
    > figured this might be dangerous since it's a FIFO.

    Mapping to a separate BAR and ignoring the lower address bits likely
    won't make any difference at all.

    And you definitely *don't* want to enable prefetching: your performance
    will actually drop, because the core fetches an entire cache line from
    wishbone space before letting the retried read complete... we tried
    that! ;) Besides, Intel mobo chipsets won't burst on memory reads, so
    the rest of the cache line will *always* be discarded! All that aside,
    you have a FIFO, so it wouldn't help you at all unless you did something
    really dangerous like aliasing the read port to multiple consecutive
    addresses... but then you have issues when the FIFO is empty... don't
    go there...
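    Why prefetching plus an aliased FIFO port loses data can be shown with
    a toy model. It assumes a hypothetical 8-word (32-byte) cache line,
    that each prefetching read pops a whole line from the FIFO, and that
    the chipset consumes only the first word because it does not burst.
    The names are illustrative only.

```python
from collections import deque

CACHE_LINE_WORDS = 8  # hypothetical 32-byte cache line of 32-bit words

def host_single_reads(fifo, n_reads, prefetch):
    """Model n_reads single 32-bit host reads of a FIFO read port that is
    aliased across a whole cache line (the risky setup to avoid).

    With prefetch enabled, each read is assumed to pop a full cache line
    from the FIFO on the wishbone side; because the chipset does not burst,
    only the first word reaches the host and the rest is discarded.
    """
    received = []
    for _ in range(n_reads):
        words_to_fetch = CACHE_LINE_WORDS if prefetch else 1
        # Pop what the FIFO can supply; an empty FIFO yields nothing here,
        # which a real design would have to handle somehow.
        line = [fifo.popleft() for _ in range(min(words_to_fetch, len(fifo)))]
        if line:
            received.append(line[0])  # remaining words in `line` are lost
    return received

print(host_single_reads(deque(range(16)), 2, prefetch=False))
print(host_single_reads(deque(range(16)), 2, prefetch=True))
```

    Without prefetch the host sees consecutive words; with prefetch it
    skips seven words per read and drains the FIFO twice as fast as it
    can use the data.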

    > Another thing I considered is doing bus mastering to empty the FIFO into
    > main memory, but this is a large change to my design.

    We used DMA bus mastering for memory access and IDE. Throughput was
    limited by the memory/IDE side, so I can't recall what the bus
    utilisation was during the actual block transfers, but we did see
    upwards of 16MB/s overall.
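    The reason bus mastering can do better is that a burst pays the
    per-transaction overhead once for many words, instead of once (plus
    retry gaps) per word. A rough sketch, assuming zero wait states and a
    fixed, illustrative overhead per burst; the overhead of 9 clocks is
    chosen only so that a 1-word transfer costs the ~10 clocks quoted
    earlier, not from any measurement.

```python
def burst_throughput_mb_s(burst_words, overhead_clocks,
                          clock_hz=33_000_000, bytes_per_word=4):
    """Rough 32-bit PCI throughput in MB/s for repeated bursts.

    Assumes one data phase per word with zero wait states, plus a fixed
    number of overhead clocks per burst (arbitration, address phase,
    turnaround); the overhead value is an illustrative guess.
    """
    clocks_per_burst = burst_words + overhead_clocks
    bytes_per_second = clock_hz * burst_words * bytes_per_word / clocks_per_burst
    return bytes_per_second / 1e6

for n in (1, 8, 64):
    print(n, round(burst_throughput_mb_s(n, overhead_clocks=9), 1))
```

    In practice, as noted above, the memory/IDE side rather than the PCI
    bus may end up being the bottleneck.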

    > Any other thoughts on how to proceed?

    Simulate your design and you'll see exactly where your latencies are. Use
    a PCI bus analyser to look at how often the mobo is attempting to connect
    to the core and issue retries etc.

    Regards,

    --
    Mark McDougall, Engineer
    Virtual Logic Pty Ltd, <http://www.vl.com.au>
    21-25 King St, Rockdale, 2216
    Ph: +612-9599-3255 Fax: +612-9599-3266

     