Patchwork [BUG] moving fq back to clock monotonic breaks my setup

Submitter Eric Dumazet
Date Jan. 10, 2019, 5:53 a.m.
Message ID <CANn89iLkciYGn7GyKc2QitSjjTms58_GKFqjOJ=6G8wEFi=SJA@mail.gmail.com>
Permalink /patch/696365/
State New

Comments

Eric Dumazet - Jan. 10, 2019, 5:53 a.m.
On Wed, Jan 9, 2019 at 4:48 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> Hi,
>
> Just been through ~5+ hours of bisecting and eventually found
> the culprit =)
>
> commit fb420d5d91c1274d5966917725e71f27ed092a85 (refs/bisect/bad)
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Fri Sep 28 10:28:44 2018 -0700
>
>     tcp/fq: move back to CLOCK_MONOTONIC
>
> [--8<--]
>
> This might be because my setup is "odd".
>
> Basically, I have a firewall with four NICs: two handle my normal
> internet connection (firewall/MASQ/NAT) and the other two are assigned
> to one bridge each.
>
> The firewall is also my local caching DNS server and DHCP server,
> which is also used by the VMs.
> But with 4.20, DHCP replies disappeared before entering the bridge; I
> couldn't even see them in tcpdump! (All NICs are ixgbe on an Atom SoC.)
>
> I'm currently running a kernel with that patch reverted, but I'm also
> wondering about possible ways forward, since I'm reverting a fix from
> someone else...

I suggest you use the netdev@ mailing list instead of lkml.

Then, we probably need to clear skb->tstamp in more paths (you
mention a bridge ...)

See commit 8203e2d844d34af247a151d8ebd68553a6e91785 for reference.
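
For context, here is a minimal userspace sketch of why a stale timestamp could matter. It rests on an assumption that the thread does not spell out: the receive path can stamp skb->tstamp with wall-clock time, while fq, after fb420d5d91c1, reads the same field as a CLOCK_MONOTONIC departure time. The names clock_ns, hold_time_ns and stale_rx_stamp_hold_ns are hypothetical stand-ins, not kernel functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Read a POSIX clock as nanoseconds. */
int64_t clock_ns(clockid_t id)
{
    struct timespec ts;
    int rc = clock_gettime(id, &ts);

    assert(rc == 0);
    (void)rc; /* keeps the build quiet when NDEBUG is defined */
    return (int64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

/*
 * Hypothetical model (not kernel code) of a pacing qdisc that treats
 * tstamp as the earliest departure time on the monotonic clock.
 * Returns how long the packet would be held; 0 means "send now".
 */
int64_t hold_time_ns(int64_t tstamp, int64_t now_mono)
{
    if (tstamp == 0 || tstamp <= now_mono)
        return 0;
    return tstamp - now_mono;
}

/*
 * Hold time for a packet that was stamped with wall-clock time on
 * receive and then forwarded without clearing the stamp (assumed
 * scenario, see the lead-in).
 */
int64_t stale_rx_stamp_hold_ns(void)
{
    return hold_time_ns(clock_ns(CLOCK_REALTIME),
                        clock_ns(CLOCK_MONOTONIC));
}
```

On a typical machine stale_rx_stamp_hold_ns() returns a very large value, because CLOCK_REALTIME counts from 1970 while CLOCK_MONOTONIC usually starts near boot. Under the stated assumption, that would be consistent with packets never showing up on the wire, and a cleared stamp (tstamp == 0) avoids the hold entirely.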

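Can you try:

diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 5372e2042adfe20d3cd039c29057535b2413be61..bd4fa141420c92a44716bd93fcd8aa3d3310203a 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -53,6 +53,7 @@ int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb
                skb_set_network_header(skb, depth);
        }

+       skb->tstamp = 0;
        dev_queue_xmit(skb);

        return 0;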

Thanks.
Ian Kumlien - Jan. 10, 2019, 8:25 a.m.
On Thu, Jan 10, 2019 at 6:53 AM Eric Dumazet <edumazet@google.com> wrote:
> I suggest you use the netdev@ mailing list instead of lkml.
>
> Then, we probably need to clear skb->tstamp in more paths (you
> mention a bridge ...)
>
> See commit 8203e2d844d34af247a151d8ebd68553a6e91785 for reference.
>
> Can you try :
>
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index 5372e2042adfe20d3cd039c29057535b2413be61..bd4fa141420c92a44716bd93fcd8aa3d3310203a 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -53,6 +53,7 @@ int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb
>                 skb_set_network_header(skb, depth);
>         }
>
> +       skb->tstamp = 0;
>         dev_queue_xmit(skb);
>
>         return 0;

This works, and so does: https://marc.info/?l=linux-netdev&m=154696956604748&w=2

Pointed out by Paolo (tested both separately)
Paolo Abeni - Jan. 10, 2019, 8:55 a.m.
On Thu, 2019-01-10 at 09:25 +0100, Ian Kumlien wrote:
> This works, and so does: https://marc.info/?l=linux-netdev&m=154696956604748&w=2
> 
> Pointed out by Paolo (tested both separately)

Note: I cleared the tstamp in br_forward_finish() instead of
br_dev_queue_push_xmit() because I think the latter could also be
called in the local xmit path, via br_nf_post_routing.

We must preserve the tstamp in the output path, right?

Thanks,

Paolo
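
The placement argument above can be illustrated with a small userspace model. This is only a sketch under assumptions: struct pkt and the model_* functions are hypothetical stand-ins for sk_buff, br_dev_queue_push_xmit(), br_forward_finish() and a local transmit path, and the real call chains (including the netfilter hook mentioned in the note) are not reproduced.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct sk_buff; only the timestamp matters. */
struct pkt {
    int64_t tstamp;
};

/*
 * Stand-in for br_dev_queue_push_xmit(): shared by both paths.
 * Returns the timestamp the qdisc would see.
 */
int64_t model_queue_push_xmit(struct pkt *p)
{
    return p->tstamp;
}

/*
 * Stand-in for br_forward_finish(): used only when forwarding, so a
 * stamp left over from receive is cleared here.
 */
int64_t model_forward_finish(struct pkt *p)
{
    p->tstamp = 0;
    return model_queue_push_xmit(p);
}

/*
 * Stand-in for a locally generated packet that reaches the shared
 * transmit function without going through the forwarding step.
 * Its timestamp is left untouched.
 */
int64_t model_local_xmit(struct pkt *p)
{
    return model_queue_push_xmit(p);
}
```

In this model, clearing the stamp inside model_queue_push_xmit() would also wipe the timestamp of locally generated packets, which is the concern raised in the note; clearing it in model_forward_finish() affects forwarded packets only.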
Eric Dumazet - Jan. 11, 2019, 9:34 a.m.
On Thu, Jan 10, 2019 at 12:55 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Thu, 2019-01-10 at 09:25 +0100, Ian Kumlien wrote:


> > This works, and so does: https://marc.info/?l=linux-netdev&m=154696956604748&w=2
> >
> > Pointed out by Paolo (tested both separately)
>
> Note: I cleared the tstamp in br_forward_finish() instead of
> br_dev_queue_push_xmit() because I think the latter could be called
> also in the local xmit path, via br_nf_post_routing.
>
> We must preserve the tstamp in output path, right?
>

I was not aware of your patch. SGTM, thanks.
Ian Kumlien - Jan. 11, 2019, 9:51 a.m.
On Fri, Jan 11, 2019 at 10:35 AM Eric Dumazet <edumazet@google.com> wrote:
>  I was not aware of your patch, SGTM, thanks.

And you can add Tested-by: ian.kumlien@gmail.com

Patch

diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 5372e2042adfe20d3cd039c29057535b2413be61..bd4fa141420c92a44716bd93fcd8aa3d3310203a 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -53,6 +53,7 @@ int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb
                skb_set_network_header(skb, depth);
        }

+       skb->tstamp = 0;
        dev_queue_xmit(skb);

        return 0;