Bug #12696


ceph-osd starts before network, fails

Added by huanwen ren over 8 years ago. Updated over 8 years ago.

Status: Can't reproduce
Priority: Urgent
Assignee: Sage Weil
% Done: 0%
Source: Community (user)
Regression: No
Severity: 3 - minor

Description

At startup in our environment, we found that some of the OSDs failed to boot. The logs are as follows:
OSD log:

2015-08-11 17:32:15.443843 7fe538fad880  0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process ceph-osd, pid 7898
2015-08-11 17:32:15.447847 7fe538fad880 -1 unable to find any IP address in networks: 111.111.111.0/24

System messages log:

Aug 11 17:32:15 ceph2 avahi-daemon[1779]: Registering new address record for fe80::92e2:baff:fe57:ec05 on enp7s0f1.*.
Aug 11 17:32:15 ceph2 systemd[1]: Starting /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 56 --pid-file /var/run/ceph/osd.56.pid -c /etc/ceph/ceph.conf --cluster ceph -f...
Aug 11 17:32:15 ceph2 systemd[1]: Started /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 56 --pid-file /var/run/ceph/osd.56.pid -c /etc/ceph/ceph.conf --cluster ceph -f.
Aug 11 17:32:15 ceph2 ceph[2350]: WARNING:ceph-disk:mount_activate: activate /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.0ebfabba-ae88-4ec8-befa-af12354da21e
Aug 11 17:32:15 ceph2 ceph[2350]: WARNING:ceph-disk:Mounting /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.0ebfabba-ae88-4ec8-befa-af12354da21e on /var/lib/ceph/tmp/mnt.G_UXjH with options rw,noexec,nodev,noatime,nodiratime,nobarrier
Aug 11 17:32:15 ceph2 bash[7894]: 2015-08-11 17:32:15.447847 7fe538fad880 -1 unable to find any IP address in networks: 111.111.111.0/24
Aug 11 17:32:15 ceph2 systemd[1]: run-7893.service: main process exited, code=exited, status=1/FAILURE
Aug 11 17:32:15 ceph2 ceph[2350]: WARNING:ceph-disk:clean_mpoint: begin cleaning up mpoint for osd.51
Aug 11 17:32:15 ceph2 ceph[2350]: WARNING:ceph-disk:clean_mpoint: disk path has no changed
Aug 11 17:32:15 ceph2 ceph[2350]: WARNING:ceph-disk:clean mount point finished
Aug 11 17:32:15 ceph2 ceph[2350]: WARNING:ceph-disk:ceph osd.51 already mounted in position; unmounting ours.
Aug 11 17:32:15 ceph2 ceph[2350]: WARNING:ceph-disk:Unmounting /var/lib/ceph/tmp/mnt.G_UXjH
Aug 11 17:32:15 ceph2 ceph[2350]: WARNING:ceph-disk:Starting ceph osd.51...
Aug 11 17:32:15 ceph2 ceph[2350]: === osd.51 ===
Aug 11 17:32:15 ceph2 avahi-daemon[1779]: Joining mDNS multicast group on interface enp7s0f1.IPv4 with address 111.111.111.242.
Aug 11 17:32:15 ceph2 avahi-daemon[1779]: New relevant interface enp7s0f1.IPv4 for mDNS.
Aug 11 17:32:15 ceph2 avahi-daemon[1779]: Registering new address record for 111.111.111.242 on enp7s0f1.IPv4.
Aug 11 17:32:16 ceph2 network[6084]: Bringing up interface enp7s0f1:  [  OK  ]

Problem cause analysis:
When the server node boots, the cluster network configuration and the OSD processes start simultaneously, which produces the messages shown above.

Proposed solution:
Retry the cluster network lookup in the function "fill_in_one_address", as sketched below.
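
For illustration, a minimal C++ sketch of that retry idea. The names (find_address_in_network, pick_address_with_retry) and the retry policy are hypothetical, not the actual pick_address.cc code; it simply enumerates local interfaces with getifaddrs() and retries until an address inside the configured network appears:

// Hypothetical sketch of retrying the address lookup that
// fill_in_one_address performs; not the actual pick_address.cc code.
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <unistd.h>
#include <cstdint>
#include <string>

// Return true if some local interface has an IPv4 address inside the
// given network (e.g. net = "111.111.111.0", prefix_len = 24).
static bool find_address_in_network(const std::string& net, int prefix_len,
                                    std::string* out) {
  in_addr net_addr{};
  if (inet_pton(AF_INET, net.c_str(), &net_addr) != 1)
    return false;
  // Netmask in network byte order, matching s_addr below.
  uint32_t mask = prefix_len ? htonl(~0u << (32 - prefix_len)) : 0;

  ifaddrs* ifa_list = nullptr;
  if (getifaddrs(&ifa_list) < 0)
    return false;
  bool found = false;
  for (ifaddrs* ifa = ifa_list; ifa; ifa = ifa->ifa_next) {
    if (!ifa->ifa_addr || ifa->ifa_addr->sa_family != AF_INET)
      continue;
    auto* sin = reinterpret_cast<sockaddr_in*>(ifa->ifa_addr);
    if ((sin->sin_addr.s_addr & mask) == (net_addr.s_addr & mask)) {
      char buf[INET_ADDRSTRLEN];
      inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf));
      *out = buf;
      found = true;
      break;
    }
  }
  freeifaddrs(ifa_list);
  return found;
}

// Retry for up to `retries` attempts, sleeping between them, so an OSD
// started in parallel with network setup can still find its address.
bool pick_address_with_retry(const std::string& net, int prefix_len,
                             std::string* out, int retries = 30) {
  for (int i = 0; i < retries; ++i) {
    if (find_address_in_network(net, prefix_len, out))
      return true;
    sleep(1);  // give the network scripts time to bring the interface up
  }
  return false;
}

A bounded retry like this would cover the window where ceph-osd starts while the network scripts are still bringing the interface up, at the cost of delaying the failure when the network is genuinely misconfigured.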


Files

messages_error (553 KB), huanwen ren, 08/28/2015 06:09 AM
#2

Updated by Sage Weil over 8 years ago

  • Subject changed from fill_in_one_address's not retry on pick_address.cc to ceph-osd starts before network, fails
  • Status changed from New to Need More Info
  • Assignee set to Sage Weil
  • Priority changed from Normal to Urgent
  • Source changed from other to Community (user)
#3

Updated by Sage Weil over 8 years ago

- what OS/distro is this?

- is the ceph service enabled? the final start should have done an activate-all which would have caught this case. Do you have the full startup log so we can see what happens the second time osd.51 is started?
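
For reference, a common mitigation for services racing network bring-up is to order them after network-online.target. A hypothetical systemd drop-in for the generated ceph service (a sketch only, not what ceph 0.87 shipped):

# /etc/systemd/system/ceph.service.d/wait-for-network.conf (hypothetical path)
[Unit]
Wants=network-online.target
After=network-online.target

After creating the drop-in, systemctl daemon-reload makes systemd pick it up; note that network-online.target only guarantees configured addresses when the distribution's wait-online service is enabled.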

#4

Updated by Samuel Just over 8 years ago

  • Status changed from Need More Info to Can't reproduce
#5

Updated by huanwen ren over 8 years ago

Sage Weil wrote:

- what OS/distro is this?

- is the ceph service enabled? the final start should have done an activate-all which would have caught this case. Do you have the full startup log so we can see what happens the second time osd.51 is started?

Sorry for the late reply.
My OS info:

Kernel Version: Linux version 3.10.0-123.el7.x86_64
Red Hat Enterprise Linux Server release 7.0 (Maipo)

#6

Updated by huanwen ren over 8 years ago

Full startup log:
Please see the attached file.

