Project

General

Profile

Actions

Bug #2764

closed

xfstest hang; osd socket closed messages

Added by Sage Weil almost 12 years ago. Updated over 11 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
libceph
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description


[223564.772925] libceph: osd1 10.214.132.12:6803 socket closed
[223565.551727] libceph: osd4 10.214.132.13:6803 socket closed
[224463.542820] libceph: osd1 10.214.132.12:6803 socket closed
[224464.256620] libceph: osd4 10.214.132.13:6803 socket closed
[225362.312671] libceph: osd1 10.214.132.12:6803 socket closed
[225363.040916] libceph: osd4 10.214.132.13:6803 socket closed
[226261.082746] libceph: osd1 10.214.132.12:6803 socket closed
[226261.818194] libceph: osd4 10.214.132.13:6803 socket closed
[227159.849469] libceph: osd1 10.214.132.12:6803 socket closed
[227160.591953] libceph: osd4 10.214.132.13:6803 socket closed
[228058.616188] libceph: osd1 10.214.132.12:6803 socket closed
[228059.366723] libceph: osd4 10.214.132.13:6803 socket closed
[228957.385373] libceph: osd1 10.214.132.12:6803 socket closed
[228958.145662] libceph: osd4 10.214.132.13:6803 socket closed
[229856.156362] libceph: osd1 10.214.132.12:6803 socket closed
[229856.925149] libceph: osd4 10.214.132.13:6803 socket closed
[230754.927726] libceph: osd1 10.214.132.12:6803 socket closed
[230755.705742] libceph: osd4 10.214.132.13:6803 socket closed
[231653.698026] libceph: osd1 10.214.132.12:6803 socket closed
[231654.486141] libceph: osd4 10.214.132.13:6803 socket closed
[232552.458092] libceph: osd1 10.214.132.12:6803 socket closed
[232552.475194] libceph: osd1 10.214.132.12:6803 connect authorization failure
[232553.247748] libceph: osd4 10.214.132.13:6803 socket closed
[232553.265260] libceph: osd4 10.214.132.13:6803 connect authorization failure
[233452.098267] libceph: osd1 10.214.132.12:6803 socket closed
[233453.105762] libceph: osd4 10.214.132.13:6803 socket closed
[234350.858738] libceph: osd1 10.214.132.12:6803 socket closed
[234350.875846] libceph: osd1 10.214.132.12:6803 connect authorization failure
[234351.867907] libceph: osd4 10.214.132.13:6803 socket closed
[234351.885196] libceph: osd4 10.214.132.13:6803 connect authorization failure
[235250.857388] libceph: osd1 10.214.132.12:6803 socket closed
[235250.987315] libceph: osd4 10.214.132.13:6803 socket closed
[235251.004955] libceph: osd4 10.214.132.13:6803 connect authorization failure
[236149.618319] libceph: osd1 10.214.132.12:6803 socket closed
[236149.636781] libceph: osd1 10.214.132.12:6803 connect authorization failure
[236150.276918] libceph: osd4 10.214.132.13:6803 socket closed
[237048.639043] libceph: osd1 10.214.132.12:6803 socket closed
[237049.047246] libceph: osd4 10.214.132.13:6803 socket closed
[237049.066697] libceph: osd4 10.214.132.13:6803 connect authorization failure
[237947.408077] libceph: osd1 10.214.132.12:6803 socket closed
[237947.428231] libceph: osd1 10.214.132.12:6803 connect authorization failure
[237948.126775] libceph: osd4 10.214.132.13:6803 socket closed
[238846.736883] libceph: osd1 10.214.132.12:6803 socket closed
[238846.758182] libceph: osd1 10.214.132.12:6803 connect authorization failure
[238846.897194] libceph: osd4 10.214.132.13:6803 socket closed
[238846.919355] libceph: osd4 10.214.132.13:6803 connect authorization failure
[239745.976770] libceph: osd4 10.214.132.13:6803 socket closed
[239746.175294] libceph: osd1 10.214.132.12:6803 socket closed
[240644.745103] libceph: osd4 10.214.132.13:6803 socket closed
[240644.768313] libceph: osd4 10.214.132.13:6803 connect authorization failure
[240644.944420] libceph: osd1 10.214.132.12:6803 socket closed
[240644.968030] libceph: osd1 10.214.132.12:6803 connect authorization failure
[241544.061204] libceph: osd4 10.214.132.13:6803 socket closed
[241544.901360] libceph: osd1 10.214.132.12:6803 socket closed
[242442.729929] libceph: osd4 10.214.132.13:6803 socket closed
[242442.752860] libceph: osd4 10.214.132.13:6803 connect authorization failure
[242443.670571] libceph: osd1 10.214.132.12:6803 socket closed
[242443.694228] libceph: osd1 10.214.132.12:6803 connect authorization failure
[243341.938215] libceph: osd4 10.214.132.13:6803 socket closed
[243342.878488] libceph: osd1 10.214.132.12:6803 socket closed
[243342.901684] libceph: osd1 10.214.132.12:6803 connect authorization failure
[244240.700553] libceph: osd4 10.214.132.13:6803 socket closed
[244240.724281] libceph: osd4 10.214.132.13:6803 connect authorization failure
[244242.211031] libceph: osd1 10.214.132.12:6803 socket closed
[245140.688321] libceph: osd4 10.214.132.13:6803 socket closed
[245140.977989] libceph: osd1 10.214.132.12:6803 socket closed
[245141.001021] libceph: osd1 10.214.132.12:6803 connect authorization failure
[246039.444867] libceph: osd4 10.214.132.13:6803 socket closed
[246039.468376] libceph: osd4 10.214.132.13:6803 connect authorization failure
[246040.083501] libceph: osd1 10.214.132.12:6803 socket closed
[246040.107216] libceph: osd1 10.214.132.12:6803 connect authorization failure
[246938.680424] libceph: osd4 10.214.132.13:6803 socket closed
[246939.668001] libceph: osd1 10.214.132.12:6803 socket closed
[247837.374400] libceph: osd4 10.214.132.13:6803 socket closed
[247837.397713] libceph: osd4 10.214.132.13:6803 connect authorization failure
[247838.436308] libceph: osd1 10.214.132.12:6803 socket closed
[247838.459458] libceph: osd1 10.214.132.12:6803 connect authorization failure
[248736.555714] libceph: osd4 10.214.132.13:6803 socket closed
[248738.422406] libceph: osd1 10.214.132.12:6803 socket closed
[248738.446330] libceph: osd1 10.214.132.12:6803 connect authorization failure
[249635.323764] libceph: osd4 10.214.132.13:6803 socket closed
[249635.347633] libceph: osd4 10.214.132.13:6803 connect authorization failure
[249637.724051] libceph: osd1 10.214.132.12:6803 socket closed
[250534.432012] libceph: osd4 10.214.132.13:6803 socket closed
[250536.404949] libceph: osd1 10.214.132.12:6803 socket closed
[250536.428861] libceph: osd1 10.214.132.12:6803 connect authorization failure
[251433.201815] libceph: osd4 10.214.132.13:6803 socket closed
[251433.225705] libceph: osd4 10.214.132.13:6803 connect authorization failure
[251435.737624] libceph: osd1 10.214.132.12:6803 socket closed
[252333.099897] libceph: osd4 10.214.132.13:6803 socket closed
[252334.472220] libceph: osd1 10.214.132.12:6803 socket closed
[252334.495534] libceph: osd1 10.214.132.12:6803 connect authorization failure
[253231.869504] libceph: osd4 10.214.132.13:6803 socket closed
[253231.892891] libceph: osd4 10.214.132.13:6803 connect authorization failure
[253233.735924] libceph: osd1 10.214.132.12:6803 socket closed
[254131.068997] libceph: osd4 10.214.132.13:6803 socket closed
[254132.507521] libceph: osd1 10.214.132.12:6803 socket closed
[254132.530810] libceph: osd1 10.214.132.12:6803 connect authorization failure
[255029.816652] libceph: osd4 10.214.132.13:6803 socket closed
[255029.839967] libceph: osd4 10.214.132.13:6803 connect authorization failure
[255031.587655] libceph: osd1 10.214.132.12:6803 socket closed
[255031.611062] libceph: osd1 10.214.132.12:6803 connect authorization failure
[255928.922133] libceph: osd4 10.214.132.13:6803 socket closed
[255930.938476] libceph: osd1 10.214.132.12:6803 socket closed
[256827.694937] libceph: osd4 10.214.132.13:6803 socket closed
[256827.718337] libceph: osd4 10.214.132.13:6803 connect authorization failure
[256829.710442] libceph: osd1 10.214.132.12:6803 socket closed
[256829.733738] libceph: osd1 10.214.132.12:6803 connect authorization failure
[257726.868955] libceph: osd4 10.214.132.13:6803 socket closed
[257728.761629] libceph: osd1 10.214.132.12:6803 socket closed
[258625.638183] libceph: osd4 10.214.132.13:6803 socket closed
[258625.661165] libceph: osd4 10.214.132.13:6803 connect authorization failure
[258627.533556] libceph: osd1 10.214.132.12:6803 socket closed
[258627.556778] libceph: osd1 10.214.132.12:6803 connect authorization failure
[259524.716029] libceph: osd4 10.214.132.13:6803 socket closed
[259526.704617] libceph: osd1 10.214.132.12:6803 socket closed
[259526.727868] libceph: osd1 10.214.132.12:6803 connect authorization failure
[260423.490645] libceph: osd4 10.214.132.13:6803 socket closed
[260423.514092] libceph: osd4 10.214.132.13:6803 connect authorization failure
[260426.265991] libceph: osd1 10.214.132.12:6803 socket closed
[261323.448994] libceph: osd4 10.214.132.13:6803 socket closed
[261325.035092] libceph: osd1 10.214.132.12:6803 socket closed
[261325.058539] libceph: osd1 10.214.132.12:6803 connect authorization failure
[262222.145504] libceph: osd4 10.214.132.13:6803 socket closed
[262222.168860] libceph: osd4 10.214.132.13:6803 connect authorization failure
[262224.134678] libceph: osd1 10.214.132.12:6803 socket closed
[262224.158992] libceph: osd1 10.214.132.12:6803 connect authorization failure
[263121.470566] libceph: osd4 10.214.132.13:6803 socket closed
[263123.572666] libceph: osd1 10.214.132.12:6803 socket closed
[264020.241985] libceph: osd4 10.214.132.13:6803 socket closed
[264020.265299] libceph: osd4 10.214.132.13:6803 connect authorization failure
[264022.271957] libceph: osd1 10.214.132.12:6803 socket closed
[264022.295175] libceph: osd1 10.214.132.12:6803 connect authorization failure
[264919.316845] libceph: osd4 10.214.132.13:6803 socket closed
[264922.243590] libceph: osd1 10.214.132.12:6803 socket closed
[265818.091136] libceph: osd4 10.214.132.13:6803 socket closed
[265818.114437] libceph: osd4 10.214.132.13:6803 connect authorization failure
[265820.955322] libceph: osd1 10.214.132.12:6803 socket closed
[265820.978595] libceph: osd1 10.214.132.12:6803 connect authorization failure
[266717.171049] libceph: osd4 10.214.132.13:6803 socket closed
[266720.125106] libceph: osd1 10.214.132.12:6803 socket closed
[266720.148977] libceph: osd1 10.214.132.12:6803 connect authorization failure
[267615.901722] libceph: osd4 10.214.132.13:6803 socket closed
[267615.925581] libceph: osd4 10.214.132.13:6803 connect authorization failure
[267619.683623] libceph: osd1 10.214.132.12:6803 socket closed
[267619.708236] libceph: osd1 10.214.132.12:6803 connect authorization failure
[268515.231851] libceph: osd4 10.214.132.13:6803 socket closed
[268519.215217] libceph: osd1 10.214.132.12:6803 socket closed

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2012-07-09_00:00:07-regression-next-testing-basic/7880$ cat config.yaml 
kernel: &id001
  kdb: true
  sha1: f26d9cb6552416dad978ca1f84bdcf3a0166b75e
nuke-on-error: true
overrides:
  ceph:
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 97c1562dda2b0f3370bb186e7acbcd9c273210bc
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana65.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDTwfkF9asvpySXF/DOk10UkRDNtRwgGgLww/I/3E2r+JpsfYtW62TA1HMXjtB1g7SrcIolqCiiMd+5MIURIND94n76JiZ2o4DplLKIqUB6ys46gro7mwoeFnZNOuwdAA5bO4dfgeQ3yPtfIqpWTejkCB7ai/kG04C4ekz6EgplwtqWIfvXnij4fNaqvm3s/IxGhnO40DOGNwsAldEJo2fuJN8KHnYzsU/Dx5kJ85jQl2eQJI74VpMoh2Ge7+n9Q8rJhegfcHYPLJsX/Uyrf7Rtk1RfeTyZbIOSJIQDbQNepu278kvc9IEnFg3WfvWespfrUExVgnXq53xd1RIFcx6L
  ubuntu@plana66.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC9QpKzRhLR+H4oCXhZtMxFDAC7E9cmtqcEPmQiW1ycxHNpk+Vy/uNNtlSh8Ljz7/R595beZe4JKgAkxZCCDmmWIEcm01bJgxVq0cbFLz+9dvyiirmr+RxbWqayu2VDC6uuiVAuxz5RdAw6+5Y/4gCrSdQKfQ8dUJZb4e/4Kz/TLr/+R5z+WqCGeutwb9QvK2anvhPJy+wa/JWHwtpTBjZVa5RFOkz9lfNDYayw3j1rylKk0d39J4VB/ch/qIBqUfxD7Hc4exu9sbG9bt5VKdomqdQvTjQXRw083+Nlj6RqJxYZfdJTWG/gYV3MXC4pwqP/ovcjoqZ9cvrsicdSezKp
  ubuntu@plana67.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDqiZsiT7h3fNR9yzwK2WToaotO4olIxmVdh+aSf3ILwEpHjFYbWXymL0C77hn0MdGRbaOzWOSMMng3MAHKy9xR3/CGNXXqO7iEK1fJOvSfmypkvJDyrMY/RuSvdifcXJyREvFsSK6cdmRpO235ODhfui4FC5BLmgv/VvasH/1Ur4ALfe7UE9L+cU4VeoJdl082oYeo1nn1beERgaypX67MXepG2NKbEY77jG5FXbGVpKWmsgIEWiiX8p6+afTOP+8cGsM3vsAG7nTJeFVKkEHc7A8cPkT4l/iXKjSiwWAtU5NV0QmRC/1ad78+xTOWNzJaTrIxoKuuGpB+DjdvrJgN
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- rbd.xfstests: null
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2012-07-09_00:00:07-regression-next-testing-basic/7880$ less teuthology.log 
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2012-07-09_00:00:07-regression-next-testing-basic/7880$ grep role teuthology.log 
2012-07-09T00:19:40.934 INFO:teuthology.task.internal:roles: ubuntu@plana66.front.sepia.ceph.com - ['mon.a', 'mon.c', 'osd.0', 'osd.1', 'osd.2']
2012-07-09T00:19:40.934 INFO:teuthology.task.internal:roles: ubuntu@plana65.front.sepia.ceph.com - ['mon.b', 'mds.a', 'osd.3', 'osd.4', 'osd.5']
2012-07-09T00:19:40.935 INFO:teuthology.task.internal:roles: ubuntu@plana67.front.sepia.ceph.com - ['client.0']

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2012-07-09_00:00:07-regression-next-testing-basic/7880$ tail teuthology.log 
2012-07-09T01:00:51.060 INFO:teuthology.orchestra.run.out:137    17s
2012-07-09T01:01:48.502 INFO:teuthology.orchestra.run.out:138    56s
2012-07-09T01:02:50.892 INFO:teuthology.orchestra.run.out:139    60s
2012-07-09T01:03:40.139 INFO:teuthology.orchestra.run.out:140    48s
2012-07-09T01:03:45.886 INFO:teuthology.orchestra.run.out:141    4s
2012-07-09T01:03:51.603 INFO:teuthology.orchestra.run.out:166    3s
2012-07-09T01:03:59.320 INFO:teuthology.orchestra.run.out:169    6s
2012-07-09T01:04:58.436 INFO:teuthology.orchestra.run.out:179    58s
2012-07-09T01:05:25.113 INFO:teuthology.orchestra.run.out:182    24s
2012-07-09T01:05:33.988 INFO:teuthology.orchestra.run.out:183    6s
Actions #1

Updated by Alex Elder over 11 years ago

I'm pretty sure the "socket closed" messages are fairly
harmless, and the cod that issues them should be changed
to reflect that.

I'm pretty sure the cause of them is in ceph_fault(),
first line of code:
pr_err("%s%lld %s %s\n", ENTITY_NAME(con->peer_name),
ceph_pr_addr(&con->peer_addr.in_addr), con->error_msg);

The con->error_msg value gets (or got) set to "socket closed" if
the connection's socket got closed while the connection itself was
in an open state.

This condition is worth reporting, but generally speaking it's
not very serious for a ceph connection, since a new socket
will be re-established in its place if necessary. So I think
that--independent of the cause of these socket closures--we
should change the code so it's not treated as an error but
maybe a warning. I.e., make it call pr_warning() rather than
pr_error().

At the time this bug was first filed, I'm pretty sure there were
a number of bugs that have since been fixed that may have led to
the socket closures, it's hard to know at this point.

(And the hang is a separate issue, probably not something we
can reproduce any more--at least we won't be sure a hang will
be due to the same cause.)

Actions #2

Updated by Alex Elder over 11 years ago

I have posted a fix for the "socket closed" messages, and it has
been reviewed and will fairly soon be pushed to the master branch.
I'll update this bug when that happens (I hope I don't forget...).

As I mentioned before, I don't think we can reproduce the hang (or
if we did, we couldn't reasonably conclude it's due to the same
problem), so once the "socket closed" messages problem is committed
I think we should just close this bug.

Actions #3

Updated by Alex Elder over 11 years ago

  • Status changed from New to Closed

The fix for the warning messages is:

28362986f8743124b3a0fda20a8ed3e80309cce1
libceph: report connection fault with warning

I am closing this bug because this is fixed and I assert the
6-month-old hang is not reproducible.

Actions

Also available in: Atom PDF