Bug #14767

closed

Massive packet loss to/from gw.sepia.ceph.com

Added by Zack Cerza about 8 years ago. Updated about 8 years ago.

Status:
Resolved
Priority:
Immediate
Category:
DC ops
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Related issues 2 (0 open, 2 closed)

Related to sepia - Bug #15199: git.ceph.com[0: 67.205.20.229]: errno=Connection timed out (Resolved, David Galloway, 03/18/2016)
Is duplicate of sepia - Bug #14763: teuthology.front.sepia.ceph.com: unable to connect to git.ceph.com (Resolved, David Galloway, 02/15/2016)

Actions #1

Updated by Zack Cerza about 8 years ago

From my laptop in CO to gw:

                                                  My traceroute  [v0.86]
zwork.local (0.0.0.0)                                                                            Mon Feb 15 10:35:20 2016
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                                                                 Packets               Pings
 Host                                                                          Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. 192.168.1.1                                                                 0.0%   241    1.5   1.0   0.7   4.3   0.3
 2. c-98-245-112-1.hsd1.co.comcast.net                                          0.0%   241    9.7  11.3   8.7  30.2   2.4
 3. xe-9-1-3-sur02.boulder.co.denver.comcast.net                                0.0%   241    9.7  11.3   8.4  28.8   3.0
 4. ae-10-sur03.boulder.co.denver.comcast.net                                   0.0%   241    9.1  10.8   8.6  22.4   2.0
 5. ae-29-ar01.denver.co.denver.comcast.net                                     0.0%   240   10.3  12.1   9.7  28.7   2.8
 6. 4.68.63.165                                                                 0.0%   240   13.3  14.8   9.7  81.3   8.9
 7. ae-2-3602.ear3.Washington1.Level3.net                                      92.9%   240   58.2 1175.  52.1 10029 2504.
 8. 4.16.240.122                                                                0.0%   240   59.4  60.6  58.1  76.5   2.8
 9. 8.43.84.1                                                                  25.8%   240  40532 33568 25428 46669 4768.
10. 8.43.84.3                                                                  53.6%   240   58.2  59.8  57.7  69.4   2.2
11. 8.43.84.4                                                                  64.4%   240  28595 27510 25672 28642 828.9
12. 8.43.84.190                                                                62.8%   240   71.1  68.5  59.8 152.3  11.8
13. 8.43.84.129                                                                67.9%   240   59.6  60.2  57.7  73.3   2.8
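
A note on reading the table above: loss at an intermediate hop that does not carry through to later hops (hop 7 here, 92.9% followed by 0.0% at hop 8) usually reflects a router deprioritizing ICMP replies rather than dropping transit traffic. Loss that persists all the way to the final hop is the more meaningful signal. Below is a minimal sketch of that check, assuming the mtr text layout pasted above. `parse_hops` and `sustained_loss_from` are hypothetical helper names, not part of mtr, and the 5% threshold is an arbitrary assumption.

```python
import re

# Matches mtr rows such as " 7. some.host.net    92.9%   240 ..."
# Rows without a loss column (for example "13. ???") are skipped.
HOP_RE = re.compile(r"^\s*(\d+)\.\s+(\S+)\s+(\d+(?:\.\d+)?)%")


def parse_hops(text):
    """Return (hop_number, host, loss_percent) for each parsable row."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append((int(m.group(1)), m.group(2), float(m.group(3))))
    return hops


def sustained_loss_from(hops, threshold=5.0):
    """Return the first hop of the run of lossy hops that reaches the
    final hop, or None if the final hop is below the threshold."""
    start = None
    for number, _host, loss in hops:
        if loss >= threshold:
            if start is None:
                start = number
        else:
            start = None  # loss did not persist, so reset the run
    return start


# Host, Loss% and Snt columns copied from hops 6-13 of the table above.
sample = """\
 6. 4.68.63.165                             0.0%   240
 7. ae-2-3602.ear3.Washington1.Level3.net  92.9%   240
 8. 4.16.240.122                            0.0%   240
 9. 8.43.84.1                              25.8%   240
10. 8.43.84.3                              53.6%   240
11. 8.43.84.4                              64.4%   240
12. 8.43.84.190                            62.8%   240
13. 8.43.84.129                            67.9%   240
"""

hops = parse_hops(sample)
print(sustained_loss_from(hops))  # prints 9
```

On this data the sustained loss begins at hop 9 (8.43.84.1), which points at the equipment in front of the lab rather than at the Level3 hop with the highest individual loss figure.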

Actions #2

Updated by Zack Cerza about 8 years ago

From gw to google.com:

                                                  My traceroute  [v0.85]
gw (0.0.0.0)                                                                                     Mon Feb 15 17:34:42 2016
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                                                                 Packets               Pings
 Host                                                                          Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. 8.43.84.190                                                                 0.0%   117   11.7  18.1   6.3  84.8  10.1
 2. 8.43.84.4                                                                  71.6%   116  28477 28368 28139 28551 100.0
 3. 8.43.84.5                                                                  14.7%   116    0.5   0.6   0.4   0.8   0.0
 4. 8.43.84.2                                                                  70.4%   116  28479 28400 28131 28564  91.0
 5. 8.43.84.0                                                                   0.0%   116    0.8   0.9   0.7   1.2   0.0
 6. 6-1-23-101.ear3.Washington1.Level3.net                                      0.0%   116    7.4   8.1   7.3  24.1   2.2
 7. ae-1-3501.ear1.Washington12.Level3.net                                     90.5%   116    7.9 2776.   7.7 12923 4306.
 8. Google-level3-60G.WashingtonDC.Level3.net                                   0.0%   116    7.9   8.6   7.9  43.6   4.0
 9. 209.85.242.142                                                              0.0%   116    8.3   9.1   8.2  89.2   7.5
10. 209.85.143.212                                                              0.0%   116    9.9   9.6   9.3  10.9   0.1
11. 216.239.48.160                                                              0.0%   116   16.1  16.7  16.1  24.7   1.3
12. 216.239.49.147                                                              0.0%   116   32.9  16.6  16.0  38.3   2.5
13. ???
14. qb-in-f102.1e100.net                                                        0.0%   116   15.5  15.6  15.4  15.8   0.0

Actions #3

Updated by Zack Cerza about 8 years ago

Yet, it's not entirely broken:

$ wget -O /dev/null http://speedtest.wdc01.softlayer.com/downloads/test100.zip
--2016-02-15 17:36:32--  http://speedtest.wdc01.softlayer.com/downloads/test100.zip
Resolving speedtest.wdc01.softlayer.com (speedtest.wdc01.softlayer.com)... 2607:f0d0:3001:78::2, 208.43.102.250
Connecting to speedtest.wdc01.softlayer.com (speedtest.wdc01.softlayer.com)|2607:f0d0:3001:78::2|:80... failed: No route to host.
Connecting to speedtest.wdc01.softlayer.com (speedtest.wdc01.softlayer.com)|208.43.102.250|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 104874307 (100M) [application/zip]
Saving to: ‘/dev/null’

100%[================================================================================>] 104,874,307 74.3MB/s   in 1.3s

2016-02-15 17:36:34 (74.3 MB/s) - ‘/dev/null’ saved [104874307/104874307]
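
For comparison with link speeds, the wget figures above can be converted to bits per second. This is a rough sketch that assumes wget's "M" and "MB/s" are 1024-based, which matches 104874307 bytes being shown as "100M". The function names are made up for illustration.

```python
BYTES = 104_874_307   # "Length" reported by wget above
RATE_MIB_S = 74.3     # wget's "MB/s" figure, assumed to be MiB/s


def to_mib(n_bytes):
    """Convert a byte count to MiB (1024 * 1024 bytes)."""
    return n_bytes / 1024**2


def mib_s_to_mbit_s(rate_mib_s):
    """Convert MiB/s to decimal megabits per second."""
    return rate_mib_s * 1024**2 * 8 / 1e6


size_mib = to_mib(BYTES)
elapsed_s = size_mib / RATE_MIB_S      # unrounded version of wget's "1.3s"
mbit_s = mib_s_to_mbit_s(RATE_MIB_S)

print(f"{size_mib:.2f} MiB in {elapsed_s:.2f} s = {mbit_s:.0f} Mbit/s")
```

That works out to roughly 623 Mbit/s over IPv4 for a single bulk transfer, despite the loss shown by mtr.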

Actions #4

Updated by David Galloway about 8 years ago

  • Status changed from New to In Progress
  • Assignee set to David Galloway

gw to google now

gw (0.0.0.0)                                                                                                                                                                                                         Mon Feb 15 23:41:16 2016
Resolver error: No error returned but no answers given. of fields   quit
                                                                                                                                                                                                     Packets               Pings
 Host                                                                                                                                                                                              Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. 8.43.84.190                                                                                                                                                                                     0.0%    10   16.5  16.4  11.3  21.9   3.1
 2. 8.43.84.4                                                                                                                                                                                       0.0%    10    4.6   5.8   1.6  10.5   3.0
 3. 8.43.84.5                                                                                                                                                                                      40.0%    10    0.5   0.5   0.5   0.5   0.0
 4. 8.43.84.2                                                                                                                                                                                       0.0%    10   39.2  31.1  16.6  42.1   8.3
 5. 8.43.84.0                                                                                                                                                                                       0.0%    10    0.7   0.7   0.7   0.8   0.0
 6. 6-1-23-101.ear3.Washington1.Level3.net                                                                                                                                                          0.0%    10    7.3   8.3   7.3  16.4   2.8
 7. ???
 8. Google-level3-60G.WashingtonDC.Level3.net                                                                                                                                                       0.0%    10    7.9   7.9   7.8   8.1   0.0
 9. 209.85.242.142                                                                                                                                                                                  0.0%    10    8.3  11.2   8.1  37.3   9.1
10. 216.239.43.67                                                                                                                                                                                   0.0%    10    8.8   8.8   8.7   8.9   0.0
11. 209.85.250.70                                                                                                                                                                                   0.0%    10   27.2  18.5  16.6  27.2   3.9
12. 209.85.252.69                                                                                                                                                                                   0.0%    10   17.0  16.9  16.8  17.0   0.0
13. ???
14. qb-in-f102.1e100.net                                                                                                                                                                            0.0%    10   17.5  16.8  16.7  17.5   0.0

home to gw

dgallowa (0.0.0.0)                                                                                                                                                                                                   Mon Feb 15 18:43:14 2016
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                                                                                                                                                                                     Packets               Pings
 Host                                                                                                                                                                                              Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. router.asus.com                                                                                                                                                                                 0.0%    11    0.4   0.4   0.3   0.4   0.0
 2. mta-107-13-32-1.nc.rr.com                                                                                                                                                                       0.0%    10   17.5  20.5   9.1  84.3  22.6
 3. mta-107-13-32-1.nc.rr.com                                                                                                                                                                       0.0%    10   69.0  19.3   9.2  69.0  17.7
 4. cpe-174-111-117-141.triad.res.rr.com                                                                                                                                                            0.0%    10   22.9  42.1  22.3 171.6  45.7
 5. cpe-024-025-062-106.ec.res.rr.com                                                                                                                                                               0.0%    10   13.4  17.1  13.2  21.6   3.3
 6. 24.93.67.202                                                                                                                                                                                    0.0%    10   23.8  25.6  22.5  31.7   2.8
 7. bu-ether14.atlngamq46w-bcr00.tbone.rr.com                                                                                                                                                       0.0%    10   30.7  29.7  25.5  33.8   2.2
 8. 0.ae5.pr0.atl20.tbone.rr.com                                                                                                                                                                    0.0%    10   23.8  25.3  22.4  27.9   1.9
 9. ???
10. ae-1-3502.ear3.Washington1.Level3.net                                                                                                                                                          77.8%    10  556.9 996.4 556.9 1436. 621.6
11. 4.16.240.122                                                                                                                                                                                    0.0%    10   43.8  45.3  41.6  56.7   4.3
12. 8.43.84.1                                                                                                                                                                                       0.0%    10   63.4  62.6  58.9  67.7   2.7
13. 8.43.84.3                                                                                                                                                                                      30.0%    10   46.4  46.7  44.4  49.8   1.6
14. 8.43.84.4                                                                                                                                                                                       0.0%    10   74.9  79.2  73.1  89.4   6.1
15. 8.43.84.190                                                                                                                                                                                     0.0%    10   52.0  52.6  42.1  59.4   5.3
16. 8.43.84.129                                                                                                                                                                                     0.0%    10   49.7  47.1  41.4  56.6   4.3

There are still a few unknowns but here's what we do know.

The Sepia lab is in a shared Community space in the RDU2 lab. On Friday
afternoon, an additional tenant was brought online so they could start
building out their infrastructure.

At that time, severe packet loss, intermittent downtime, and network
degradation were observed. This manifested itself in yum and apt update
failures on test nodes and 404 timeouts to external sites like git.ceph.com.

As of about 0300UTC today, the packet loss to the Sepia VPN server was
resolved so we're able to reliably get to gw.sepia.ceph.com and
gitbuilder.ceph.com. There's still some packet loss on the network
equipment in front of the lab which IT is reaching out to Level3 to
diagnose.

We're still not seeing the network speeds we're expecting, however.

We've run a few smoke tests to verify stability despite the network
speed drops.

I've forced rebuilds of a few critical branches in the lab so tests can
resume.

Actions #5

Updated by David Galloway about 8 years ago

  • Is duplicate of Bug #14763: teuthology.front.sepia.ceph.com: unable to connect to git.ceph.com added
Actions #6

Updated by David Galloway about 8 years ago

  • Status changed from In Progress to Resolved

Timeouts are still intermittent but mostly resolved. Lingering network issues are still being investigated by IT.

No update as of 16:51UTC 16FEB

Actions #7

Updated by Nathan Cutler about 8 years ago

  • Related to Bug #15199: git.ceph.com[0: 67.205.20.229]: errno=Connection timed out added