Commit 4177c4b6 authored by Wesley W. Terpstra's avatar Wesley W. Terpstra

wr_endpoint: fix (one cause of) packet filter misclassification

The symptom of this bug is that about 3% of the time a WR endpoint will
power-up such that it always fails to reach track phase state.  This is
caused by the endpoint dropping the first PTP packet after calibration.

The packet is dropped, because it is misclassified. This happens because
it is possible for the U_match_buffer and fab_pipe in the RX path to
become desynchronized. When this happens, packets receive the classification
of the previous packet. Since calibration is slow, it is virtually assured
that a BOOTP request is seen, leading to the misclassification of the
following PTP packet.

The U_match_buffer can become desynchronized multiple ways, but the one we
saw "in the wild" is due to the lowering of PFCR0 in wrpc-sw during packet
filter configuration.  Due to an unsafe transfer from clk_sys to clk_rx in
ep_packet_filter:p_gen_status, it is possible for the transition of PFCR0 to
cause a glitch that sets done_int high, even though there is no packet being
processed.  This puts an excess class tag into U_match_buffer, which leads
to the mismatch between packets and classes.  This patch fixes the transfer.

Unfortunately, even after this patch, it is my opinion that this code
remains completely unsafe.  The core problem is that desynchronization of
U_match_buffer and fab_pipe is possible at all.  This is a very brittle
design.  One can imagine many scenarios that can lead to this state, after
which point the WR endpoint will never recover.  A simple example: consider
a packet arriving while PFCR0 is switched.  ep_rx_path:mbuf_we can then
pulse twice, once, or never for the packet depending on the race condition
between ematch_done and pfilter_done.  If this happens, the RX path will
remain permanently desynchronized.
parent 1098119a
......@@ -162,8 +162,21 @@ architecture behavioral of ep_packet_filter is
signal stage1, stage2 : std_logic;
signal r_pfcr1_mm_data_lsb : std_logic_vector(11 downto 0);
signal pfcr0_enable_rxclk : std_logic;
begin -- behavioral
U_sync_pfcr0_enable : gc_sync_ffs
generic map (
g_sync_edge => "positive")
port map (
clk_i => clk_rx_i,
rst_n_i => '1',
data_i => regs_i.pfcr0_enable_o,
synced_o => pfcr0_enable_rxclk,
npulse_o => open,
ppulse_o => open);
process(clk_sys_i)
begin
if rising_edge(clk_sys_i) then
......@@ -314,7 +327,7 @@ begin -- behavioral
done_int <= '0';
drop_o <= '0';
else
if(regs_i.pfcr0_enable_o = '0') then
if(pfcr0_enable_rxclk = '0') then
done_int <= '0';
drop_o <= '0';
pclass_o <= (others => '0');
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment