Bug #2764
closedxfstest hang; osd socket closed messages
0%
Description
[223564.772925] libceph: osd1 10.214.132.12:6803 socket closed [223565.551727] libceph: osd4 10.214.132.13:6803 socket closed [224463.542820] libceph: osd1 10.214.132.12:6803 socket closed [224464.256620] libceph: osd4 10.214.132.13:6803 socket closed [225362.312671] libceph: osd1 10.214.132.12:6803 socket closed [225363.040916] libceph: osd4 10.214.132.13:6803 socket closed [226261.082746] libceph: osd1 10.214.132.12:6803 socket closed [226261.818194] libceph: osd4 10.214.132.13:6803 socket closed [227159.849469] libceph: osd1 10.214.132.12:6803 socket closed [227160.591953] libceph: osd4 10.214.132.13:6803 socket closed [228058.616188] libceph: osd1 10.214.132.12:6803 socket closed [228059.366723] libceph: osd4 10.214.132.13:6803 socket closed [228957.385373] libceph: osd1 10.214.132.12:6803 socket closed [228958.145662] libceph: osd4 10.214.132.13:6803 socket closed [229856.156362] libceph: osd1 10.214.132.12:6803 socket closed [229856.925149] libceph: osd4 10.214.132.13:6803 socket closed [230754.927726] libceph: osd1 10.214.132.12:6803 socket closed [230755.705742] libceph: osd4 10.214.132.13:6803 socket closed [231653.698026] libceph: osd1 10.214.132.12:6803 socket closed [231654.486141] libceph: osd4 10.214.132.13:6803 socket closed [232552.458092] libceph: osd1 10.214.132.12:6803 socket closed [232552.475194] libceph: osd1 10.214.132.12:6803 connect authorization failure [232553.247748] libceph: osd4 10.214.132.13:6803 socket closed [232553.265260] libceph: osd4 10.214.132.13:6803 connect authorization failure [233452.098267] libceph: osd1 10.214.132.12:6803 socket closed [233453.105762] libceph: osd4 10.214.132.13:6803 socket closed [234350.858738] libceph: osd1 10.214.132.12:6803 socket closed [234350.875846] libceph: osd1 10.214.132.12:6803 connect authorization failure [234351.867907] libceph: osd4 10.214.132.13:6803 socket closed [234351.885196] libceph: osd4 10.214.132.13:6803 connect authorization failure [235250.857388] libceph: osd1 10.214.132.12:6803 socket closed [235250.987315] libceph: osd4 10.214.132.13:6803 socket closed [235251.004955] libceph: osd4 10.214.132.13:6803 connect authorization failure [236149.618319] libceph: osd1 10.214.132.12:6803 socket closed [236149.636781] libceph: osd1 10.214.132.12:6803 connect authorization failure [236150.276918] libceph: osd4 10.214.132.13:6803 socket closed [237048.639043] libceph: osd1 10.214.132.12:6803 socket closed [237049.047246] libceph: osd4 10.214.132.13:6803 socket closed [237049.066697] libceph: osd4 10.214.132.13:6803 connect authorization failure [237947.408077] libceph: osd1 10.214.132.12:6803 socket closed [237947.428231] libceph: osd1 10.214.132.12:6803 connect authorization failure [237948.126775] libceph: osd4 10.214.132.13:6803 socket closed [238846.736883] libceph: osd1 10.214.132.12:6803 socket closed [238846.758182] libceph: osd1 10.214.132.12:6803 connect authorization failure [238846.897194] libceph: osd4 10.214.132.13:6803 socket closed [238846.919355] libceph: osd4 10.214.132.13:6803 connect authorization failure [239745.976770] libceph: osd4 10.214.132.13:6803 socket closed [239746.175294] libceph: osd1 10.214.132.12:6803 socket closed [240644.745103] libceph: osd4 10.214.132.13:6803 socket closed [240644.768313] libceph: osd4 10.214.132.13:6803 connect authorization failure [240644.944420] libceph: osd1 10.214.132.12:6803 socket closed [240644.968030] libceph: osd1 10.214.132.12:6803 connect authorization failure [241544.061204] libceph: osd4 10.214.132.13:6803 socket closed [241544.901360] libceph: osd1 10.214.132.12:6803 socket closed [242442.729929] libceph: osd4 10.214.132.13:6803 socket closed [242442.752860] libceph: osd4 10.214.132.13:6803 connect authorization failure [242443.670571] libceph: osd1 10.214.132.12:6803 socket closed [242443.694228] libceph: osd1 10.214.132.12:6803 connect authorization failure [243341.938215] libceph: osd4 10.214.132.13:6803 socket closed [243342.878488] libceph: osd1 10.214.132.12:6803 socket closed [243342.901684] libceph: osd1 10.214.132.12:6803 connect authorization failure [244240.700553] libceph: osd4 10.214.132.13:6803 socket closed [244240.724281] libceph: osd4 10.214.132.13:6803 connect authorization failure [244242.211031] libceph: osd1 10.214.132.12:6803 socket closed [245140.688321] libceph: osd4 10.214.132.13:6803 socket closed [245140.977989] libceph: osd1 10.214.132.12:6803 socket closed [245141.001021] libceph: osd1 10.214.132.12:6803 connect authorization failure [246039.444867] libceph: osd4 10.214.132.13:6803 socket closed [246039.468376] libceph: osd4 10.214.132.13:6803 connect authorization failure [246040.083501] libceph: osd1 10.214.132.12:6803 socket closed [246040.107216] libceph: osd1 10.214.132.12:6803 connect authorization failure [246938.680424] libceph: osd4 10.214.132.13:6803 socket closed [246939.668001] libceph: osd1 10.214.132.12:6803 socket closed [247837.374400] libceph: osd4 10.214.132.13:6803 socket closed [247837.397713] libceph: osd4 10.214.132.13:6803 connect authorization failure [247838.436308] libceph: osd1 10.214.132.12:6803 socket closed [247838.459458] libceph: osd1 10.214.132.12:6803 connect authorization failure [248736.555714] libceph: osd4 10.214.132.13:6803 socket closed [248738.422406] libceph: osd1 10.214.132.12:6803 socket closed [248738.446330] libceph: osd1 10.214.132.12:6803 connect authorization failure [249635.323764] libceph: osd4 10.214.132.13:6803 socket closed [249635.347633] libceph: osd4 10.214.132.13:6803 connect authorization failure [249637.724051] libceph: osd1 10.214.132.12:6803 socket closed [250534.432012] libceph: osd4 10.214.132.13:6803 socket closed [250536.404949] libceph: osd1 10.214.132.12:6803 socket closed [250536.428861] libceph: osd1 10.214.132.12:6803 connect authorization failure [251433.201815] libceph: osd4 10.214.132.13:6803 socket closed [251433.225705] libceph: osd4 10.214.132.13:6803 connect authorization failure [251435.737624] libceph: osd1 10.214.132.12:6803 socket closed [252333.099897] libceph: osd4 10.214.132.13:6803 socket closed [252334.472220] libceph: osd1 10.214.132.12:6803 socket closed [252334.495534] libceph: osd1 10.214.132.12:6803 connect authorization failure [253231.869504] libceph: osd4 10.214.132.13:6803 socket closed [253231.892891] libceph: osd4 10.214.132.13:6803 connect authorization failure [253233.735924] libceph: osd1 10.214.132.12:6803 socket closed [254131.068997] libceph: osd4 10.214.132.13:6803 socket closed [254132.507521] libceph: osd1 10.214.132.12:6803 socket closed [254132.530810] libceph: osd1 10.214.132.12:6803 connect authorization failure [255029.816652] libceph: osd4 10.214.132.13:6803 socket closed [255029.839967] libceph: osd4 10.214.132.13:6803 connect authorization failure [255031.587655] libceph: osd1 10.214.132.12:6803 socket closed [255031.611062] libceph: osd1 10.214.132.12:6803 connect authorization failure [255928.922133] libceph: osd4 10.214.132.13:6803 socket closed [255930.938476] libceph: osd1 10.214.132.12:6803 socket closed [256827.694937] libceph: osd4 10.214.132.13:6803 socket closed [256827.718337] libceph: osd4 10.214.132.13:6803 connect authorization failure [256829.710442] libceph: osd1 10.214.132.12:6803 socket closed [256829.733738] libceph: osd1 10.214.132.12:6803 connect authorization failure [257726.868955] libceph: osd4 10.214.132.13:6803 socket closed [257728.761629] libceph: osd1 10.214.132.12:6803 socket closed [258625.638183] libceph: osd4 10.214.132.13:6803 socket closed [258625.661165] libceph: osd4 10.214.132.13:6803 connect authorization failure [258627.533556] libceph: osd1 10.214.132.12:6803 socket closed [258627.556778] libceph: osd1 10.214.132.12:6803 connect authorization failure [259524.716029] libceph: osd4 10.214.132.13:6803 socket closed [259526.704617] libceph: osd1 10.214.132.12:6803 socket closed [259526.727868] libceph: osd1 10.214.132.12:6803 connect authorization failure [260423.490645] libceph: osd4 10.214.132.13:6803 socket closed [260423.514092] libceph: osd4 10.214.132.13:6803 connect authorization failure [260426.265991] libceph: osd1 10.214.132.12:6803 socket closed [261323.448994] libceph: osd4 10.214.132.13:6803 socket closed [261325.035092] libceph: osd1 10.214.132.12:6803 socket closed [261325.058539] libceph: osd1 10.214.132.12:6803 connect authorization failure [262222.145504] libceph: osd4 10.214.132.13:6803 socket closed [262222.168860] libceph: osd4 10.214.132.13:6803 connect authorization failure [262224.134678] libceph: osd1 10.214.132.12:6803 socket closed [262224.158992] libceph: osd1 10.214.132.12:6803 connect authorization failure [263121.470566] libceph: osd4 10.214.132.13:6803 socket closed [263123.572666] libceph: osd1 10.214.132.12:6803 socket closed [264020.241985] libceph: osd4 10.214.132.13:6803 socket closed [264020.265299] libceph: osd4 10.214.132.13:6803 connect authorization failure [264022.271957] libceph: osd1 10.214.132.12:6803 socket closed [264022.295175] libceph: osd1 10.214.132.12:6803 connect authorization failure [264919.316845] libceph: osd4 10.214.132.13:6803 socket closed [264922.243590] libceph: osd1 10.214.132.12:6803 socket closed [265818.091136] libceph: osd4 10.214.132.13:6803 socket closed [265818.114437] libceph: osd4 10.214.132.13:6803 connect authorization failure [265820.955322] libceph: osd1 10.214.132.12:6803 socket closed [265820.978595] libceph: osd1 10.214.132.12:6803 connect authorization failure [266717.171049] libceph: osd4 10.214.132.13:6803 socket closed [266720.125106] libceph: osd1 10.214.132.12:6803 socket closed [266720.148977] libceph: osd1 10.214.132.12:6803 connect authorization failure [267615.901722] libceph: osd4 10.214.132.13:6803 socket closed [267615.925581] libceph: osd4 10.214.132.13:6803 connect authorization failure [267619.683623] libceph: osd1 10.214.132.12:6803 socket closed [267619.708236] libceph: osd1 10.214.132.12:6803 connect authorization failure [268515.231851] libceph: osd4 10.214.132.13:6803 socket closed [268519.215217] libceph: osd1 10.214.132.12:6803 socket closed ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2012-07-09_00:00:07-regression-next-testing-basic/7880$ cat config.yaml kernel: &id001 kdb: true sha1: f26d9cb6552416dad978ca1f84bdcf3a0166b75e nuke-on-error: true overrides: ceph: fs: btrfs log-whitelist: - slow request sha1: 97c1562dda2b0f3370bb186e7acbcd9c273210bc roles: - - mon.a - mon.c - osd.0 - osd.1 - osd.2 - - mon.b - mds.a - osd.3 - osd.4 - osd.5 - - client.0 targets: ubuntu@plana65.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDTwfkF9asvpySXF/DOk10UkRDNtRwgGgLww/I/3E2r+JpsfYtW62TA1HMXjtB1g7SrcIolqCiiMd+5MIURIND94n76JiZ2o4DplLKIqUB6ys46gro7mwoeFnZNOuwdAA5bO4dfgeQ3yPtfIqpWTejkCB7ai/kG04C4ekz6EgplwtqWIfvXnij4fNaqvm3s/IxGhnO40DOGNwsAldEJo2fuJN8KHnYzsU/Dx5kJ85jQl2eQJI74VpMoh2Ge7+n9Q8rJhegfcHYPLJsX/Uyrf7Rtk1RfeTyZbIOSJIQDbQNepu278kvc9IEnFg3WfvWespfrUExVgnXq53xd1RIFcx6L ubuntu@plana66.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC9QpKzRhLR+H4oCXhZtMxFDAC7E9cmtqcEPmQiW1ycxHNpk+Vy/uNNtlSh8Ljz7/R595beZe4JKgAkxZCCDmmWIEcm01bJgxVq0cbFLz+9dvyiirmr+RxbWqayu2VDC6uuiVAuxz5RdAw6+5Y/4gCrSdQKfQ8dUJZb4e/4Kz/TLr/+R5z+WqCGeutwb9QvK2anvhPJy+wa/JWHwtpTBjZVa5RFOkz9lfNDYayw3j1rylKk0d39J4VB/ch/qIBqUfxD7Hc4exu9sbG9bt5VKdomqdQvTjQXRw083+Nlj6RqJxYZfdJTWG/gYV3MXC4pwqP/ovcjoqZ9cvrsicdSezKp ubuntu@plana67.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDqiZsiT7h3fNR9yzwK2WToaotO4olIxmVdh+aSf3ILwEpHjFYbWXymL0C77hn0MdGRbaOzWOSMMng3MAHKy9xR3/CGNXXqO7iEK1fJOvSfmypkvJDyrMY/RuSvdifcXJyREvFsSK6cdmRpO235ODhfui4FC5BLmgv/VvasH/1Ur4ALfe7UE9L+cU4VeoJdl082oYeo1nn1beERgaypX67MXepG2NKbEY77jG5FXbGVpKWmsgIEWiiX8p6+afTOP+8cGsM3vsAG7nTJeFVKkEHc7A8cPkT4l/iXKjSiwWAtU5NV0QmRC/1ad78+xTOWNzJaTrIxoKuuGpB+DjdvrJgN tasks: - internal.lock_machines: 3 - internal.save_config: null - internal.check_lock: null - internal.connect: null - internal.check_conflict: null - kernel: *id001 - internal.base: null - internal.archive: null - internal.coredump: null - internal.syslog: null - internal.timer: null - chef: null - clock: null - ceph: null - rbd.xfstests: null ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2012-07-09_00:00:07-regression-next-testing-basic/7880$ less teuthology.log ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2012-07-09_00:00:07-regression-next-testing-basic/7880$ grep role teuthology.log 2012-07-09T00:19:40.934 INFO:teuthology.task.internal:roles: ubuntu@plana66.front.sepia.ceph.com - ['mon.a', 'mon.c', 'osd.0', 'osd.1', 'osd.2'] 2012-07-09T00:19:40.934 INFO:teuthology.task.internal:roles: ubuntu@plana65.front.sepia.ceph.com - ['mon.b', 'mds.a', 'osd.3', 'osd.4', 'osd.5'] 2012-07-09T00:19:40.935 INFO:teuthology.task.internal:roles: ubuntu@plana67.front.sepia.ceph.com - ['client.0'] ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2012-07-09_00:00:07-regression-next-testing-basic/7880$ tail teuthology.log 2012-07-09T01:00:51.060 INFO:teuthology.orchestra.run.out:137 17s 2012-07-09T01:01:48.502 INFO:teuthology.orchestra.run.out:138 56s 2012-07-09T01:02:50.892 INFO:teuthology.orchestra.run.out:139 60s 2012-07-09T01:03:40.139 INFO:teuthology.orchestra.run.out:140 48s 2012-07-09T01:03:45.886 INFO:teuthology.orchestra.run.out:141 4s 2012-07-09T01:03:51.603 INFO:teuthology.orchestra.run.out:166 3s 2012-07-09T01:03:59.320 INFO:teuthology.orchestra.run.out:169 6s 2012-07-09T01:04:58.436 INFO:teuthology.orchestra.run.out:179 58s 2012-07-09T01:05:25.113 INFO:teuthology.orchestra.run.out:182 24s 2012-07-09T01:05:33.988 INFO:teuthology.orchestra.run.out:183 6s
Updated by Alex Elder over 11 years ago
I'm pretty sure the "socket closed" messages are fairly
harmless, and the cod that issues them should be changed
to reflect that.
I'm pretty sure the cause of them is in ceph_fault(),
first line of code:
pr_err("%s%lld %s %s\n", ENTITY_NAME(con->peer_name),
ceph_pr_addr(&con->peer_addr.in_addr), con->error_msg);
The con->error_msg value gets (or got) set to "socket closed" if
the connection's socket got closed while the connection itself was
in an open state.
This condition is worth reporting, but generally speaking it's
not very serious for a ceph connection, since a new socket
will be re-established in its place if necessary. So I think
that--independent of the cause of these socket closures--we
should change the code so it's not treated as an error but
maybe a warning. I.e., make it call pr_warning() rather than
pr_error().
At the time this bug was first filed, I'm pretty sure there were
a number of bugs that have since been fixed that may have led to
the socket closures, it's hard to know at this point.
(And the hang is a separate issue, probably not something we
can reproduce any more--at least we won't be sure a hang will
be due to the same cause.)
Updated by Alex Elder over 11 years ago
I have posted a fix for the "socket closed" messages, and it has
been reviewed and will fairly soon be pushed to the master branch.
I'll update this bug when that happens (I hope I don't forget...).
As I mentioned before, I don't think we can reproduce the hang (or
if we did, we couldn't reasonably conclude it's due to the same
problem), so once the "socket closed" messages problem is committed
I think we should just close this bug.
Updated by Alex Elder over 11 years ago
- Status changed from New to Closed
The fix for the warning messages is:
28362986f8743124b3a0fda20a8ed3e80309cce1
libceph: report connection fault with warning
I am closing this bug because this is fixed and I assert the
6-month-old hang is not reproducible.