Performance
The best performance is achieved in 2eSST mode. A small utility, vmedma,
was written to measure DMA speed:
>sudo ./vmedma -a 0x70c000 -m 0x20 -e 2eSST-267 -n 0x10000 --pattern -r 10
Rate: 75.817035 MB/sec
Rate: 86.509022 MB/sec
Rate: 83.637215 MB/sec
Rate: 87.407086 MB/sec
Rate: 87.375438 MB/sec
Rate: 87.495800 MB/sec
Rate: 87.588622 MB/sec
Rate: 87.618829 MB/sec
Rate: 87.348328 MB/sec
Rate: 87.589972 MB/sec
Notes:
- With the --pattern option, the utility checks that the data read are correct (the board generates a different pattern according to the address). This also verifies that the data are not corrupted during the transfer.
- The first transfer is slower because the memory was just allocated but never touched, so the kernel has to initialize the pages before the first transfer. If the pages are initialized (using memset) before the first DMA transfer, its rate is similar to that of the following ones.
- There are some variations in the transfer rate.
- All these measurements were made on cdv28 (ELMA crate) using a MEN A20 master (which uses the TSI148 bridge).
- The current vmebridge driver (PCI-VME bridge: V1.7 (Feb, 04 2019)) manually splits large transfers into blocks of 2048 bytes, and thus creates many descriptors. This is not needed, as the TSI148 chip also splits large transfers, and it slows transfers down because the TSI148 has to fetch a new descriptor every 2048 bytes. The measurements were made with a modified driver to improve performance.
- The current vmebridge driver doesn't support 2eSST mode. The measurements were made with a modified driver.
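To give an idea of how many descriptors the unmodified driver creates, here is a small sketch that counts the 2048-byte blocks for the transfer sizes used on this page. The function name and constant are only illustrative; they are not taken from the vmebridge driver.

```python
# Sketch only: counts the descriptors created when a transfer is cut
# into fixed-size blocks. BLOCK_SIZE comes from the note above; the
# names are hypothetical and not part of the vmebridge driver.
BLOCK_SIZE = 2048  # bytes per descriptor


def descriptors_needed(length, block_size=BLOCK_SIZE):
    """Return the number of block_size chunks needed to cover length bytes."""
    if length <= 0:
        return 0
    # Integer ceiling division, so a partial last block still counts.
    return -(-length // block_size)


if __name__ == "__main__":
    for length in (0x1000, 0x10000, 0x40000):
        print(f"{length:#x} bytes: {descriptors_needed(length)} descriptors")
```

For the 64KB transfers used above (-n 0x10000), this gives 32 descriptors, each of which has to be fetched by the TSI148.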
Hints to achieve better performance:
- Use a higher clock rate (this design uses a 125 MHz clock).
- Use pipelined Wishbone mode.
Comparisons
These figures can be compared with an MBLT transfer:
>sudo ./vmedma -a 0x70c000 -m 0x38 -e 2eSST-267 -n 0x10000 --pattern -r 10
Rate: 39.922531 MB/sec
Rate: 43.815127 MB/sec
Rate: 43.422740 MB/sec
Rate: 43.100387 MB/sec
Rate: 43.105737 MB/sec
Rate: 44.066281 MB/sec
Rate: 43.373380 MB/sec
Rate: 44.051932 MB/sec
Rate: 44.038523 MB/sec
Rate: 44.023478 MB/sec
So MBLT is about 2x slower than 2eSST.
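The ratio can be checked from the vmedma output itself. The sketch below parses the "Rate:" lines shown above and compares the mean rates, ignoring the first transfer of each run because it includes the page initialization. The helper names are made up for this example and are not part of vmedma.

```python
import re
from statistics import mean

# Matches the "Rate: <value> MB/sec" lines printed by vmedma.
RATE_RE = re.compile(r"Rate:\s*([0-9.]+)\s*MB/sec")

# Output copied from the 2eSST and MBLT runs above.
SST_OUTPUT = """
Rate: 75.817035 MB/sec
Rate: 86.509022 MB/sec
Rate: 83.637215 MB/sec
Rate: 87.407086 MB/sec
Rate: 87.375438 MB/sec
Rate: 87.495800 MB/sec
Rate: 87.588622 MB/sec
Rate: 87.618829 MB/sec
Rate: 87.348328 MB/sec
Rate: 87.589972 MB/sec
"""

MBLT_OUTPUT = """
Rate: 39.922531 MB/sec
Rate: 43.815127 MB/sec
Rate: 43.422740 MB/sec
Rate: 43.100387 MB/sec
Rate: 43.105737 MB/sec
Rate: 44.066281 MB/sec
Rate: 43.373380 MB/sec
Rate: 44.051932 MB/sec
Rate: 44.038523 MB/sec
Rate: 44.023478 MB/sec
"""


def parse_rates(output):
    """Extract all rates (in MB/sec) from vmedma output text."""
    return [float(value) for value in RATE_RE.findall(output)]


def steady_mean(rates):
    """Mean rate without the first transfer (slower: pages not yet touched)."""
    return mean(rates[1:])


if __name__ == "__main__":
    sst = steady_mean(parse_rates(SST_OUTPUT))
    mblt = steady_mean(parse_rates(MBLT_OUTPUT))
    print(f"2eSST: {sst:.2f} MB/sec, MBLT: {mblt:.2f} MB/sec, "
          f"ratio: {sst / mblt:.2f}")
```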
The prefetch feature is not available for single transfers, which are much slower:
>sudo ./vmedma -a 0x70c000 -m 0x39 -e 2eSST-267 -n 0x10000 --pattern -r 10
Rate: 15.332980 MB/sec
Rate: 15.714858 MB/sec
Rate: 15.746667 MB/sec
Rate: 15.749694 MB/sec
Rate: 15.751242 MB/sec
Rate: 15.754887 MB/sec
Rate: 15.641741 MB/sec
Rate: 15.746072 MB/sec
Rate: 15.747095 MB/sec
Rate: 15.756532 MB/sec
Finally, the BLT transfer (which also doesn't prefetch) brings no gain over single transfers:
>sudo ./vmedma -a 0x70c000 -m 0x3b -e 2eSST-267 -n 0x10000 --pattern -r 10
Rate: 14.526958 MB/sec
Rate: 14.743219 MB/sec
Rate: 14.808746 MB/sec
Rate: 14.815191 MB/sec
Rate: 14.806283 MB/sec
Rate: 14.816940 MB/sec
Rate: 14.815928 MB/sec
Rate: 14.816476 MB/sec
Rate: 14.815570 MB/sec
Rate: 14.820186 MB/sec
Influence of transfer length
With a larger transfer (256KB):
>sudo ./vmedma -a 0x70c000 -m 0x20 -e 2eSST-267 -n 0x40000 --pattern -r 10
Rate: 85.419768 MB/sec
Rate: 95.507590 MB/sec
Rate: 94.650075 MB/sec
Rate: 95.816425 MB/sec
Rate: 95.896916 MB/sec
Rate: 95.903538 MB/sec
Rate: 95.845078 MB/sec
Rate: 95.918551 MB/sec
Rate: 94.879865 MB/sec
Rate: 95.867571 MB/sec
With a small transfer (4KB):
>sudo ./vmedma -a 0x70c000 -m 0x20 -e 2eSST-267 -n 0x1000 --pattern -r 10
Rate: 18.724774 MB/sec
Rate: 26.765587 MB/sec
Rate: 29.100086 MB/sec
Rate: 30.661062 MB/sec
Rate: 26.153779 MB/sec
Rate: 29.114400 MB/sec
Rate: 32.230849 MB/sec
Rate: 25.023863 MB/sec
Rate: 23.884425 MB/sec
Rate: 31.247000 MB/sec
Because the transfer rate varies so much with the transfer size, it is worth looking at the bus utilisation.
>sudo ./vmespy -s 12 -a start 0
INFO: Map VME 0x00600000 AM 0x39 (base=00600000 mask=fff80000)
Start trigger after 0 transfers
>sudo ./vmedma -a 0x70c000 -m 0x20 -e 2eSST-267 -n 0x1000 --pattern -r 1
Rate: 22.074322 MB/sec
>sudo ./vmespy -s 12 -a dump
INFO: Map VME 0x00600000 AM 0x39 (base=00600000 mask=fff80000)
cycl: data addr am as wr ds dtack
0000: 00000000 0070c011 20 1 0 0 0
0001: 00000000 0070c011 20 1 0 1 0
0011: 0000007f 0070c011 20 1 0 1 0
0012: 00000000 0070c011 20 1 0 1 0
0015: 00000000 00018000 20 1 0 1 0
0016: 00000001 00018000 20 1 0 0 0
0027: 00000000 00000000 20 1 0 0 0
0028: 00000000 00000000 20 1 0 1 0
0041: 00000000 00000000 20 1 0 3 0
0042: 00003f00 ffffffff 20 1 0 3 0
0043: ffffffff ffffffff 20 1 0 3 0
0046: fef1ffff ff00ff00 20 1 0 3 0
0047: fe01fe01 ff00ff00 20 1 0 3 0
0052: fc03fe01 fd02fd02 20 1 0 3 0
0053: fc03fc03 fd02fd02 20 1 0 3 0
0058: fa05fa07 fb04fb04 20 1 0 3 0
0059: fa05fa05 fb04fb04 20 1 0 3 0
0064: f807fa07 f906f906 20 1 0 3 0
0065: f807f807 f906f906 20 1 0 3 0
[...]
(Each clock cycle is 8 ns; dtack is always 0 because it cannot be read by an SVEC board.)
The "setup" cost is significant: the transfer needs 46 cycles (46 × 8 ns = 368 ns) between the rise of AS and the first data transfer. After that, 8 bytes are transferred every 6 cycles (48 ns), which is slightly above 160 MB/s.
A transfer is at most 2 KB.
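The timing figures read from the trace can be reproduced with a few lines of arithmetic. This is only a sketch: the constant names are illustrative, and 1 MB is assumed to be 10^6 bytes (the unit used by vmedma is not stated here).

```python
# Values read from the vmespy trace above; names are illustrative.
CLOCK_NS = 8        # duration of one spy clock cycle, in ns
SETUP_CYCLES = 46   # cycles between AS and the first data beat
BEAT_CYCLES = 6     # cycles between two consecutive data beats
BEAT_BYTES = 8      # bytes transferred per data beat

setup_ns = SETUP_CYCLES * CLOCK_NS
beat_ns = BEAT_CYCLES * CLOCK_NS

# bytes per ns is the same as GB/s; multiply by 1000 to get MB/s
# (assuming 1 MB = 10^6 bytes).
peak_mb_s = BEAT_BYTES / beat_ns * 1000

if __name__ == "__main__":
    print(f"setup: {setup_ns} ns, beat: {beat_ns} ns, "
          f"peak rate during data phase: {peak_mb_s:.1f} MB/s")
```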
[...]
1570: 02fd02fd 03fc03fc 20 1 0 3 0
1576: 00ff00ff 01fe01fe 20 1 0 3 0
1582: 00ff00ff 7ffe01fe 20 1 0 3 0
1583: 00ff0cff ffffffff 20 1 0 3 0
1584: ffffffff ffffffff 20 1 0 3 0
1585: ffffffff ffffffff 20 1 0 0 0
1596: ffffffff ffffffff 20 0 0 0 0
1598: ffffffff ffffffff 3f 0 0 0 0
1614: 00000000 ffffffff 3f 0 0 0 0
1618: 00000000 0070c811 20 0 0 0 0
1624: 00000000 0070c811 20 1 0 1 0
1639: 00000001 00018000 20 1 0 1 0
1640: 00000001 00018000 20 1 0 0 0
1650: 00000001 00000000 20 1 0 0 0
1651: 00000000 00000000 20 1 0 1 0
1664: 00000000 00000000 20 1 0 3 0
1666: 3fffff7a ffffffff 20 1 0 3 0
1667: ffffffff ffffffff 20 1 0 3 0
1670: fe01feff ff00ff00 20 1 0 3 0
1671: fe01fe01 ff00ff00 20 1 0 3 0
1676: fc03fc03 fd02fd02 20 1 0 3 0
1682: fa05fa05 fb04fb04 20 1 0 3 0
[...]
Between two blocks, about 100 cycles (so 800 ns) are needed to release the bus and start a new transfer. The back-off timer of the TSI148 was not changed (default is 0).
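As a rough cross-check, the bus-level rate for back-to-back 2KB blocks can be estimated from the figures above (6 cycles per 8 bytes, about 100 cycles between blocks, 8 ns per cycle). This is a simplified model with made-up names, and it assumes 1 MB = 10^6 bytes. It only covers what is visible on the VME bus, so it does not account for descriptor fetches or any software overhead.

```python
def bus_rate_mb_s(block_bytes=2048, beat_bytes=8, beat_cycles=6,
                  gap_cycles=100, clock_ns=8):
    """Estimate the VME bus rate (MB/s) for back-to-back blocks.

    Simplified model: each block costs its data beats plus a fixed
    gap before the next block starts. Defaults are the values read
    from the trace on this page.
    """
    data_cycles = (block_bytes // beat_bytes) * beat_cycles
    total_ns = (data_cycles + gap_cycles) * clock_ns
    # bytes/ns -> MB/s, assuming 1 MB = 10^6 bytes
    return block_bytes / total_ns * 1000


if __name__ == "__main__":
    print(f"with inter-block gap: {bus_rate_mb_s():.1f} MB/s")
    print(f"data phase only:      {bus_rate_mb_s(gap_cycles=0):.1f} MB/s")
```

With these numbers the model gives roughly 156 MB/s on the bus, well above the rates reported by vmedma, which suggests that a good part of the remaining time is spent outside the bus cycles shown in the trace.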