Bug #4069
OSD crashed and failed to start again
Status:
Resolved
Priority:
High
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%
Source:
Development
Tags:
Backport:
Regression:
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
I was using 0.56 and found that all the OSDs on one host were down. When I tried to restart them, none came back up, but since I was out of the office I was unable to investigate. Now I'm back, and I have upgraded to 0.56.2. When trying to start an OSD I get the following:
cat ceph-osd.2.log
2013-02-10 17:14:22.227945 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount FIEMAP ioctl is supported and appears to work
2013-02-10 17:14:22.227953 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-02-10 17:14:22.228122 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount did NOT detect btrfs
2013-02-10 17:14:22.228126 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount syncfs(2) syscall not supported
2013-02-10 17:14:22.228127 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount no syncfs(2), must use sync(2).
2013-02-10 17:14:22.228128 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount WARNING: multiple ceph-osd daemons on the same host will be slow
2013-02-10 17:14:22.228161 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount found snaps <>
2013-02-10 17:14:22.230501 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-02-10 17:14:22.236526 7fd50a6de780 1 journal _open /srv/ceph/osd2.journal/journal fd 22: 1043333120 bytes, block size 4096 bytes, directio = 0, aio = 0
2013-02-10 17:14:22.284953 7fd50a6de780 1 journal _open /srv/ceph/osd2.journal/journal fd 22: 1043333120 bytes, block size 4096 bytes, directio = 0, aio = 0
2013-02-10 17:14:22.534816 7fd50a6de780 1 journal close /srv/ceph/osd2.journal/journal
2013-02-10 17:14:22.537643 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount FIEMAP ioctl is supported and appears to work
2013-02-10 17:14:22.537649 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-02-10 17:14:22.537811 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount did NOT detect btrfs
2013-02-10 17:14:22.537814 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount syncfs(2) syscall not supported
2013-02-10 17:14:22.537815 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount no syncfs(2), must use sync(2).
2013-02-10 17:14:22.537816 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount WARNING: multiple ceph-osd daemons on the same host will be slow
2013-02-10 17:14:22.538080 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount found snaps <>
2013-02-10 17:14:22.539339 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-02-10 17:14:22.542178 7fd50a6de780 1 journal _open /srv/ceph/osd2.journal/journal fd 30: 1043333120 bytes, block size 4096 bytes, directio = 0, aio = 0
2013-02-10 17:14:22.550250 7fd50a6de780 1 journal _open /srv/ceph/osd2.journal/journal fd 30: 1043333120 bytes, block size 4096 bytes, directio = 0, aio = 0
2013-02-10 17:14:22.550568 7fd50a6de780 0 osd.2 6793 crush map has features 262144, adjusting msgr requires for clients
2013-02-10 17:14:22.550573 7fd50a6de780 0 osd.2 6793 crush map has features 262144, adjusting msgr requires for osds
2013-02-10 17:14:26.032345 7fd4f8e57700 -1 osd/OSD.cc: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fd4f8e57700 time 2013-02-10 17:14:26.031303
osd/OSD.cc: 4477: FAILED assert(_get_map_bl(epoch, bl))
ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061)
 1: (OSDService::get_map(unsigned int)+0x918) [0x6059d8]
 2: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x185) [0x61b3b5]
 3: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x269) [0x62c269]
 4: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x10) [0x659ab0]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0x879) [0x831a49]
 6: (ThreadPool::WorkThread::entry()+0x10) [0x834250]
 7: (()+0x68ca) [0x7fd509b7a8ca]
 8: (clone()+0x6d) [0x7fd5086a9b6d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events --- -199> 2013-02-10 17:14:22.221847 7fd50a6de780 5 asok(0x2a98000) register_command perfcounters_dump hook 0x2a8d010 -198> 2013-02-10 17:14:22.221864 7fd50a6de780 5 asok(0x2a98000) register_command 1 hook 0x2a8d010 -197> 2013-02-10 17:14:22.221867 7fd50a6de780 5 asok(0x2a98000) register_command perf dump hook 0x2a8d010 -196> 2013-02-10 17:14:22.221872 7fd50a6de780 5 asok(0x2a98000) register_command perfcounters_schema hook 0x2a8d010 -195> 2013-02-10 17:14:22.221875 7fd50a6de780 5 asok(0x2a98000) register_command 2 hook 0x2a8d010 -194> 2013-02-10 17:14:22.221876 7fd50a6de780 5 asok(0x2a98000) register_command perf schema hook 0x2a8d010 -193> 2013-02-10 17:14:22.221879 7fd50a6de780 5 asok(0x2a98000) register_command config show hook 0x2a8d010 -192> 2013-02-10 17:14:22.221881 7fd50a6de780 5 asok(0x2a98000) register_command config set hook 0x2a8d010 -191> 2013-02-10 17:14:22.221883 7fd50a6de780 5 asok(0x2a98000) register_command log flush hook 0x2a8d010 -190> 2013-02-10 17:14:22.221886 7fd50a6de780 5 asok(0x2a98000) register_command log dump hook 0x2a8d010 -189> 2013-02-10 17:14:22.221887 7fd50a6de780 5 asok(0x2a98000) register_command log reopen hook 0x2a8d010 -188> 2013-02-10 17:14:22.222967 7fd50a6de780 0 ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061), process ceph-osd, pid 17429 -187> 2013-02-10 17:14:22.223551 7fd50a6de780 1 -- 10.251.128.6:0/0 learned my addr 10.251.128.6:0/0 -186> 2013-02-10 17:14:22.223560 7fd50a6de780 1 accepter.accepter.bind my_inst.addr is 10.251.128.6:6800/17429 need_addr=0 -185> 2013-02-10 17:14:22.223567 7fd50a6de780 1 -- 10.251.128.6:0/0 learned my addr 10.251.128.6:0/0 -184> 2013-02-10 17:14:22.223570 7fd50a6de780 1 accepter.accepter.bind my_inst.addr is 10.251.128.6:6801/17429 need_addr=0 -183> 2013-02-10 17:14:22.223576 7fd50a6de780 1 -- 10.251.128.6:0/0 learned my addr 10.251.128.6:0/0 -182> 2013-02-10 17:14:22.223578 7fd50a6de780 1 accepter.accepter.bind my_inst.addr is 
10.251.128.6:6802/17429 need_addr=0 -181> 2013-02-10 17:14:22.223849 7fd50a6de780 1 finished global_init_daemonize -180> 2013-02-10 17:14:22.225364 7fd50a6de780 5 asok(0x2a98000) init /var/run/ceph/ceph-osd.2.asok -179> 2013-02-10 17:14:22.225376 7fd50a6de780 5 asok(0x2a98000) bind_and_listen /var/run/ceph/ceph-osd.2.asok -178> 2013-02-10 17:14:22.225400 7fd50a6de780 5 asok(0x2a98000) register_command 0 hook 0x2a8c0b0 -177> 2013-02-10 17:14:22.225406 7fd50a6de780 5 asok(0x2a98000) register_command version hook 0x2a8c0b0 -176> 2013-02-10 17:14:22.225409 7fd50a6de780 5 asok(0x2a98000) register_command git_version hook 0x2a8c0b0 -175> 2013-02-10 17:14:22.225412 7fd50a6de780 5 asok(0x2a98000) register_command help hook 0x2a8d0c0 -174> 2013-02-10 17:14:22.225447 7fd506672700 5 asok(0x2a98000) entry start -173> 2013-02-10 17:14:22.227945 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount FIEMAP ioctl is supported and appears to work -172> 2013-02-10 17:14:22.227953 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option -171> 2013-02-10 17:14:22.228122 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount did NOT detect btrfs -170> 2013-02-10 17:14:22.228126 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount syncfs(2) syscall not supported -169> 2013-02-10 17:14:22.228127 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount no syncfs(2), must use sync(2). 
-168> 2013-02-10 17:14:22.228128 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount WARNING: multiple ceph-osd daemons on the same host will be slow -167> 2013-02-10 17:14:22.228161 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount found snaps <> -166> 2013-02-10 17:14:22.230501 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount: enabling WRITEAHEAD journal mode: btrfs not detected -165> 2013-02-10 17:14:22.236504 7fd50a6de780 2 journal open /srv/ceph/osd2.journal/journal fsid d3da7cd7-134f-4246-a1ee-da4c7820fe27 fs_op_seq 8086570 -164> 2013-02-10 17:14:22.236526 7fd50a6de780 1 journal _open /srv/ceph/osd2.journal/journal fd 22: 1043333120 bytes, block size 4096 bytes, directio = 0, aio = 0 -163> 2013-02-10 17:14:22.239662 7fd50a6de780 2 journal read_entry 4096 : seq 8086568 3596962 bytes -162> 2013-02-10 17:14:22.242023 7fd50a6de780 2 journal read_entry 3604480 : seq 8086569 3619288 bytes -161> 2013-02-10 17:14:22.244057 7fd50a6de780 2 journal read_entry 7225344 : seq 8086570 3188127 bytes -160> 2013-02-10 17:14:22.245349 7fd50a6de780 2 journal read_entry 10416128 : seq 8086571 2003739 bytes -159> 2013-02-10 17:14:22.246644 7fd50a6de780 2 journal read_entry 10416128 : seq 8086571 2003739 bytes -158> 2013-02-10 17:14:22.246648 7fd50a6de780 3 journal journal_replay: applying op seq 8086571 -157> 2013-02-10 17:14:22.284927 7fd50a6de780 3 journal journal_replay: r = 0, op_seq now 8086571 -156> 2013-02-10 17:14:22.284938 7fd50a6de780 2 journal read_entry 12423168 : bad header magic, end of journal -155> 2013-02-10 17:14:22.284941 7fd50a6de780 3 journal journal_replay: end of journal, done. 
-154> 2013-02-10 17:14:22.284953 7fd50a6de780 1 journal _open /srv/ceph/osd2.journal/journal fd 22: 1043333120 bytes, block size 4096 bytes, directio = 0, aio = 0 -153> 2013-02-10 17:14:22.534647 7fd503e6d700 1 FileStore::op_tp worker finish -152> 2013-02-10 17:14:22.534694 7fd50366c700 1 FileStore::op_tp worker finish -151> 2013-02-10 17:14:22.534816 7fd50a6de780 1 journal close /srv/ceph/osd2.journal/journal -150> 2013-02-10 17:14:22.535214 7fd50a6de780 10 monclient(hunting): build_initial_monmap -149> 2013-02-10 17:14:22.535270 7fd50a6de780 5 adding auth protocol: cephx -148> 2013-02-10 17:14:22.535274 7fd50a6de780 5 adding auth protocol: cephx -147> 2013-02-10 17:14:22.535339 7fd50a6de780 1 -- 10.251.128.6:6800/17429 messenger.start -146> 2013-02-10 17:14:22.535363 7fd50a6de780 1 -- :/0 messenger.start -145> 2013-02-10 17:14:22.535368 7fd50a6de780 1 -- 10.251.128.6:6802/17429 messenger.start -144> 2013-02-10 17:14:22.535375 7fd50a6de780 1 -- 10.251.128.6:6801/17429 messenger.start -143> 2013-02-10 17:14:22.535431 7fd50a6de780 2 osd.2 0 mounting /srv/ceph/osd2 /srv/ceph/osd2.journal/journal -142> 2013-02-10 17:14:22.537643 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount FIEMAP ioctl is supported and appears to work -141> 2013-02-10 17:14:22.537649 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option -140> 2013-02-10 17:14:22.537811 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount did NOT detect btrfs -139> 2013-02-10 17:14:22.537814 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount syncfs(2) syscall not supported -138> 2013-02-10 17:14:22.537815 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount no syncfs(2), must use sync(2). 
-137> 2013-02-10 17:14:22.537816 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount WARNING: multiple ceph-osd daemons on the same host will be slow -136> 2013-02-10 17:14:22.538080 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount found snaps <> -135> 2013-02-10 17:14:22.539339 7fd50a6de780 0 filestore(/srv/ceph/osd2) mount: enabling WRITEAHEAD journal mode: btrfs not detected -134> 2013-02-10 17:14:22.542164 7fd50a6de780 2 journal open /srv/ceph/osd2.journal/journal fsid d3da7cd7-134f-4246-a1ee-da4c7820fe27 fs_op_seq 8086571 -133> 2013-02-10 17:14:22.542178 7fd50a6de780 1 journal _open /srv/ceph/osd2.journal/journal fd 30: 1043333120 bytes, block size 4096 bytes, directio = 0, aio = 0 -132> 2013-02-10 17:14:22.544610 7fd50a6de780 2 journal read_entry 4096 : seq 8086568 3596962 bytes -131> 2013-02-10 17:14:22.546910 7fd50a6de780 2 journal read_entry 3604480 : seq 8086569 3619288 bytes -130> 2013-02-10 17:14:22.548943 7fd50a6de780 2 journal read_entry 7225344 : seq 8086570 3188127 bytes -129> 2013-02-10 17:14:22.550228 7fd50a6de780 2 journal read_entry 10416128 : seq 8086571 2003739 bytes -128> 2013-02-10 17:14:22.550234 7fd50a6de780 2 journal read_entry 12423168 : bad header magic, end of journal -127> 2013-02-10 17:14:22.550236 7fd50a6de780 2 journal read_entry 12423168 : bad header magic, end of journal -126> 2013-02-10 17:14:22.550237 7fd50a6de780 3 journal journal_replay: end of journal, done. 
-125> 2013-02-10 17:14:22.550250 7fd50a6de780 1 journal _open /srv/ceph/osd2.journal/journal fd 30: 1043333120 bytes, block size 4096 bytes, directio = 0, aio = 0 -124> 2013-02-10 17:14:22.550362 7fd50a6de780 2 osd.2 0 boot -123> 2013-02-10 17:14:22.550568 7fd50a6de780 0 osd.2 6793 crush map has features 262144, adjusting msgr requires for clients -122> 2013-02-10 17:14:22.550573 7fd50a6de780 0 osd.2 6793 crush map has features 262144, adjusting msgr requires for osds -121> 2013-02-10 17:14:22.997282 7fd50a6de780 2 osd.2 6793 superblock: i am osd.2 -120> 2013-02-10 17:14:22.997323 7fd50a6de780 1 accepter.accepter.start -119> 2013-02-10 17:14:22.997349 7fd50a6de780 1 accepter.accepter.start -118> 2013-02-10 17:14:22.997375 7fd50a6de780 1 accepter.accepter.start -117> 2013-02-10 17:14:22.997390 7fd50a6de780 10 monclient(hunting): init -116> 2013-02-10 17:14:22.997395 7fd50a6de780 5 adding auth protocol: cephx -115> 2013-02-10 17:14:22.997396 7fd50a6de780 10 monclient(hunting): auth_supported 2 method cephx -114> 2013-02-10 17:14:22.997506 7fd50a6de780 2 auth: KeyRing::load: loaded key file /etc/ceph/keyring.osd.2 -113> 2013-02-10 17:14:22.997621 7fd50a6de780 5 asok(0x2a98000) register_command dump_ops_in_flight hook 0xc265bc0 -112> 2013-02-10 17:14:22.997627 7fd50a6de780 5 asok(0x2a98000) register_command dump_historic_ops hook 0xc265bc0 -111> 2013-02-10 17:14:22.997629 7fd50a6de780 5 asok(0x2a98000) register_command dump_op_pq_state hook 0xc265bc0 -110> 2013-02-10 17:14:22.997842 7fd50a6de780 10 monclient(hunting): _reopen_session -109> 2013-02-10 17:14:22.997857 7fd4f5650700 5 osd.2 6793 heartbeat: osd_stat(175 GB used, 289 GB avail, 464 GB total, peers []/[]) -108> 2013-02-10 17:14:22.997874 7fd50a6de780 10 monclient(hunting): _pick_new_mon picked mon.1 con 0x2b0a9a0 addr 10.252.0.4:6789/0 -107> 2013-02-10 17:14:22.997885 7fd50a6de780 10 monclient(hunting): _send_mon_message to mon.1 at 10.252.0.4:6789/0 -106> 2013-02-10 17:14:22.997887 7fd50a6de780 1 -- 
10.251.128.6:6800/17429 --> 10.252.0.4:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x2b03c00 con 0x2b0a9a0 -105> 2013-02-10 17:14:22.997894 7fd50a6de780 10 monclient(hunting): renew_subs -104> 2013-02-10 17:14:22.998576 7fd4fd660700 10 monclient(hunting): renew_subs -103> 2013-02-10 17:14:25.997601 7fd4f9e59700 10 monclient(hunting): tick -102> 2013-02-10 17:14:25.997621 7fd4f9e59700 1 monclient(hunting): continuing hunt -101> 2013-02-10 17:14:25.997627 7fd4f9e59700 10 monclient(hunting): _reopen_session -100> 2013-02-10 17:14:25.997640 7fd4f9e59700 1 -- 10.251.128.6:6800/17429 mark_down 0x2b0a9a0 -- 0x2ad7500 -99> 2013-02-10 17:14:25.997689 7fd4f3e4d700 2 -- 10.251.128.6:6800/17429 >> 10.252.0.4:6789/0 pipe(0x2ad7500 sd=31 :56582 s=4 pgs=18594 cs=1 l=1).reader couldn't read tag, Success -98> 2013-02-10 17:14:25.997716 7fd4f9e59700 10 monclient(hunting): _pick_new_mon picked mon.0 con 0x2b0ac60 addr 10.252.0.3:6789/0 -97> 2013-02-10 17:14:25.997722 7fd4f3e4d700 2 -- 10.251.128.6:6800/17429 >> 10.252.0.4:6789/0 pipe(0x2ad7500 sd=31 :56582 s=4 pgs=18594 cs=1 l=1).fault 0: Success -96> 2013-02-10 17:14:25.997730 7fd4f9e59700 10 monclient(hunting): _send_mon_message to mon.0 at 10.252.0.3:6789/0 -95> 2013-02-10 17:14:25.997733 7fd4f9e59700 1 -- 10.251.128.6:6800/17429 --> 10.252.0.3:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0xc329000 con 0x2b0ac60 -94> 2013-02-10 17:14:25.997742 7fd4f9e59700 10 monclient(hunting): renew_subs -93> 2013-02-10 17:14:25.998853 7fd4fd660700 10 monclient(hunting): renew_subs -92> 2013-02-10 17:14:25.999424 7fd4fd660700 1 -- 10.251.128.6:6800/17429 <== mon.1 10.252.0.3:6789/0 1 ==== mon_map v1 ==== 473+0+0 (2573709153 0 0) 0xc329400 con 0x2b0ac60 -91> 2013-02-10 17:14:25.999445 7fd4fd660700 10 monclient(hunting): handle_monmap mon_map v1 -90> 2013-02-10 17:14:25.999457 7fd4fd660700 10 monclient(hunting): got monmap 1, mon.0 is now rank 1 -89> 2013-02-10 17:14:25.999459 7fd4fd660700 10 monclient(hunting): dump: epoch 1 fsid 
e74c9e0b-40ce-45e4-904a-676cbaaf042d last_changed 2013-01-17 19:06:13.835967 created 2013-01-17 19:06:13.835967 0: 10.252.0.2:6789/0 mon.2 1: 10.252.0.3:6789/0 mon.0 2: 10.252.0.4:6789/0 mon.1 -88> 2013-02-10 17:14:25.999485 7fd4fd660700 1 monclient(hunting): found mon.0 -87> 2013-02-10 17:14:25.999491 7fd4fd660700 1 -- 10.251.128.6:6800/17429 <== mon.1 10.252.0.3:6789/0 2 ==== auth_reply(proto 2 0 Success) v1 ==== 33+0+0 (1069541921 0 0) 0xc329200 con 0x2b0ac60 -86> 2013-02-10 17:14:25.999503 7fd4fd660700 10 monclient: my global_id is 10369 -85> 2013-02-10 17:14:25.999598 7fd4fd660700 10 monclient: _send_mon_message to mon.0 at 10.252.0.3:6789/0 -84> 2013-02-10 17:14:25.999602 7fd4fd660700 1 -- 10.251.128.6:6800/17429 --> 10.252.0.3:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0xc329400 con 0x2b0ac60 -83> 2013-02-10 17:14:26.000703 7fd4fd660700 1 -- 10.251.128.6:6800/17429 <== mon.1 10.252.0.3:6789/0 3 ==== auth_reply(proto 2 0 Success) v1 ==== 206+0+0 (3401751066 0 0) 0xc329600 con 0x2b0ac60 -82> 2013-02-10 17:14:26.000768 7fd4fd660700 10 monclient: _send_mon_message to mon.0 at 10.252.0.3:6789/0 -81> 2013-02-10 17:14:26.000773 7fd4fd660700 1 -- 10.251.128.6:6800/17429 --> 10.252.0.3:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0xc329200 con 0x2b0ac60 -80> 2013-02-10 17:14:26.001956 7fd4fd660700 1 -- 10.251.128.6:6800/17429 <== mon.1 10.252.0.3:6789/0 4 ==== auth_reply(proto 2 0 Success) v1 ==== 393+0+0 (2420265323 0 0) 0xc329400 con 0x2b0ac60 -79> 2013-02-10 17:14:26.002007 7fd4fd660700 10 monclient: _send_mon_message to mon.0 at 10.252.0.3:6789/0 -78> 2013-02-10 17:14:26.002012 7fd4fd660700 1 -- 10.251.128.6:6800/17429 --> 10.252.0.3:6789/0 -- mon_subscribe({monmap=0+,osd_pg_creates=0}) v2 -- ?+0 0x2abae00 con 0x2b0ac60 -77> 2013-02-10 17:14:26.002026 7fd4fd660700 10 monclient: _send_mon_message to mon.0 at 10.252.0.3:6789/0 -76> 2013-02-10 17:14:26.002030 7fd4fd660700 1 -- 10.251.128.6:6800/17429 --> 10.252.0.3:6789/0 -- 
mon_subscribe({monmap=0+,osd_pg_creates=0}) v2 -- ?+0 0xc32c1c0 con 0x2b0ac60 -75> 2013-02-10 17:14:26.002042 7fd4fd660700 10 monclient: _check_auth_rotating renewing rotating keys (they expired before 2013-02-10 17:13:56.002041) -74> 2013-02-10 17:14:26.002055 7fd4fd660700 10 monclient: _send_mon_message to mon.0 at 10.252.0.3:6789/0 -73> 2013-02-10 17:14:26.002057 7fd4fd660700 1 -- 10.251.128.6:6800/17429 --> 10.252.0.3:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- ?+0 0xc329600 con 0x2b0ac60 -72> 2013-02-10 17:14:26.002091 7fd50a6de780 5 monclient: authenticate success, global_id 10369 -71> 2013-02-10 17:14:26.002101 7fd50a6de780 10 monclient: wait_auth_rotating waiting (until 2013-02-10 17:14:56.002100) -70> 2013-02-10 17:14:26.002634 7fd4fd660700 1 -- 10.251.128.6:6800/17429 <== mon.1 10.252.0.3:6789/0 5 ==== mon_map v1 ==== 473+0+0 (2573709153 0 0) 0xc329e00 con 0x2b0ac60 -69> 2013-02-10 17:14:26.002649 7fd4fd660700 10 monclient: handle_monmap mon_map v1 -68> 2013-02-10 17:14:26.002659 7fd4fd660700 10 monclient: got monmap 1, mon.0 is now rank 1 -67> 2013-02-10 17:14:26.002662 7fd4fd660700 10 monclient: dump: epoch 1 fsid e74c9e0b-40ce-45e4-904a-676cbaaf042d last_changed 2013-01-17 19:06:13.835967 created 2013-01-17 19:06:13.835967 0: 10.252.0.2:6789/0 mon.2 1: 10.252.0.3:6789/0 mon.0 2: 10.252.0.4:6789/0 mon.1 -66> 2013-02-10 17:14:26.002689 7fd4fd660700 1 -- 10.251.128.6:6800/17429 <== mon.1 10.252.0.3:6789/0 6 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (2023876600 0 0) 0xc32c1c0 con 0x2b0ac60 -65> 2013-02-10 17:14:26.002698 7fd4fd660700 10 monclient: handle_subscribe_ack sent 2013-02-10 17:14:22.997895 renew after 2013-02-10 17:16:52.997895 -64> 2013-02-10 17:14:26.002711 7fd4fd660700 1 -- 10.251.128.6:6800/17429 <== mon.1 10.252.0.3:6789/0 7 ==== mon_map v1 ==== 473+0+0 (2573709153 0 0) 0xc329a00 con 0x2b0ac60 -63> 2013-02-10 17:14:26.002716 7fd4fd660700 10 monclient: handle_monmap mon_map v1 -62> 2013-02-10 17:14:26.002721 7fd4fd660700 10 monclient: got 
monmap 1, mon.0 is now rank 1 -61> 2013-02-10 17:14:26.002723 7fd4fd660700 10 monclient: dump: epoch 1 fsid e74c9e0b-40ce-45e4-904a-676cbaaf042d last_changed 2013-01-17 19:06:13.835967 created 2013-01-17 19:06:13.835967 0: 10.252.0.2:6789/0 mon.2 1: 10.252.0.3:6789/0 mon.0 2: 10.252.0.4:6789/0 mon.1 -60> 2013-02-10 17:14:26.002734 7fd4fd660700 1 -- 10.251.128.6:6800/17429 <== mon.1 10.252.0.3:6789/0 8 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (2023876600 0 0) 0x2abae00 con 0x2b0ac60 -59> 2013-02-10 17:14:26.002738 7fd4fd660700 10 monclient: handle_subscribe_ack sent 0.000000, ignoring -58> 2013-02-10 17:14:26.002991 7fd4fd660700 1 -- 10.251.128.6:6800/17429 <== mon.1 10.252.0.3:6789/0 9 ==== auth_reply(proto 2 0 Success) v1 ==== 194+0+0 (2726131633 0 0) 0xc329800 con 0x2b0ac60 -57> 2013-02-10 17:14:26.003033 7fd4fd660700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2013-02-10 17:13:56.003033) -56> 2013-02-10 17:14:26.003074 7fd50a6de780 10 monclient: wait_auth_rotating done -55> 2013-02-10 17:14:26.003080 7fd50a6de780 10 monclient: get_version osdmap -54> 2013-02-10 17:14:26.003082 7fd50a6de780 10 monclient: _send_mon_message to mon.0 at 10.252.0.3:6789/0 -53> 2013-02-10 17:14:26.003086 7fd50a6de780 1 -- 10.251.128.6:6800/17429 --> 10.252.0.3:6789/0 -- mon_get_version(what=osdmap handle=1) v1 -- ?+0 0x2abac40 con 0x2b0ac60 -52> 2013-02-10 17:14:26.003591 7fd4fd660700 1 -- 10.251.128.6:6800/17429 <== mon.1 10.252.0.3:6789/0 10 ==== mon_check_map_ack(handle=1 version=6964) v2 ==== 24+0+0 (402810245 0 0) 0xc32c8c0 con 0x2b0ac60 -51> 2013-02-10 17:14:26.003657 7fd4f9658700 10 monclient: renew_subs -50> 2013-02-10 17:14:26.003664 7fd4f9658700 10 monclient: _send_mon_message to mon.0 at 10.252.0.3:6789/0 -49> 2013-02-10 17:14:26.003670 7fd4f9658700 1 -- 10.251.128.6:6800/17429 --> 10.252.0.3:6789/0 -- mon_subscribe({monmap=2+,osd_pg_creates=0,osdmap=6793}) v2 -- ?+0 0xc32ca80 con 0x2b0ac60 -48> 2013-02-10 17:14:26.010079 7fd4fd660700 1 -- 
10.251.128.6:6800/17429 <== mon.1 10.252.0.3:6789/0 11 ==== osd_map(6793..6893 src has 6464..6964) v3 ==== 101611+0+0 (3841590153 0 0) 0xc329e00 con 0x2b0ac60 -47> 2013-02-10 17:14:26.010098 7fd4fd660700 3 osd.2 6793 handle_osd_map epochs [6793,6893], i have 6793, src has [6464,6964] -46> 2013-02-10 17:14:26.030444 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6816/1928 -- pipe dne -45> 2013-02-10 17:14:26.030452 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6807/13510 -- pipe dne -44> 2013-02-10 17:14:26.030455 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6801/2026 -- pipe dne -43> 2013-02-10 17:14:26.030457 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6819/2124 -- pipe dne -42> 2013-02-10 17:14:26.030459 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6805/14176 -- pipe dne -41> 2013-02-10 17:14:26.030461 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6811/14453 -- pipe dne -40> 2013-02-10 17:14:26.030463 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6803/14710 -- pipe dne -39> 2013-02-10 17:14:26.030465 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6808/2222 -- pipe dne -38> 2013-02-10 17:14:26.030467 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6801/13792 -- pipe dne -37> 2013-02-10 17:14:26.030469 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6813/2320 -- pipe dne -36> 2013-02-10 17:14:26.030497 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6811/1928 -- pipe dne -35> 2013-02-10 17:14:26.030500 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6815/13510 -- pipe dne -34> 2013-02-10 17:14:26.030503 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6807/2026 -- pipe dne -33> 2013-02-10 17:14:26.030505 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6822/2124 -- pipe dne -32> 2013-02-10 17:14:26.030507 7fd4fd660700 1 -- 
10.251.128.6:6801/17429 mark_down 10.251.0.5:6808/14176 -- pipe dne -31> 2013-02-10 17:14:26.030509 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6801/14453 -- pipe dne -30> 2013-02-10 17:14:26.030511 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6807/14710 -- pipe dne -29> 2013-02-10 17:14:26.030513 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6817/2222 -- pipe dne -28> 2013-02-10 17:14:26.030517 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6800/13792 -- pipe dne -27> 2013-02-10 17:14:26.030519 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6802/2320 -- pipe dne -26> 2013-02-10 17:14:26.030555 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6813/1928 -- pipe dne -25> 2013-02-10 17:14:26.030558 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6811/2026 -- pipe dne -24> 2013-02-10 17:14:26.030560 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6801/2124 -- pipe dne -23> 2013-02-10 17:14:26.030562 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6802/14710 -- pipe dne -22> 2013-02-10 17:14:26.030564 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6819/2222 -- pipe dne -21> 2013-02-10 17:14:26.030566 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6808/2320 -- pipe dne -20> 2013-02-10 17:14:26.030583 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6807/1928 -- pipe dne -19> 2013-02-10 17:14:26.030586 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6810/13510 -- pipe dne -18> 2013-02-10 17:14:26.030589 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6802/2026 -- pipe dne -17> 2013-02-10 17:14:26.030592 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6816/2124 -- pipe dne -16> 2013-02-10 17:14:26.030594 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6819/14176 -- pipe dne -15> 2013-02-10 17:14:26.030596 7fd4fd660700 1 -- 
10.251.128.6:6801/17429 mark_down 10.251.0.5:6814/14453 -- pipe dne -14> 2013-02-10 17:14:26.030598 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6800/14710 -- pipe dne -13> 2013-02-10 17:14:26.030601 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6823/2222 -- pipe dne -12> 2013-02-10 17:14:26.030604 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6805/13792 -- pipe dne -11> 2013-02-10 17:14:26.030606 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6821/2320 -- pipe dne -10> 2013-02-10 17:14:26.030643 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6808/1928 -- pipe dne -9> 2013-02-10 17:14:26.030647 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6800/13510 -- pipe dne -8> 2013-02-10 17:14:26.030650 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6816/2026 -- pipe dne -7> 2013-02-10 17:14:26.030652 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6807/2124 -- pipe dne -6> 2013-02-10 17:14:26.030654 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6810/14176 -- pipe dne -5> 2013-02-10 17:14:26.030656 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6802/14453 -- pipe dne -4> 2013-02-10 17:14:26.030658 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6804/14710 -- pipe dne -3> 2013-02-10 17:14:26.030660 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6813/2222 -- pipe dne -2> 2013-02-10 17:14:26.030662 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.251.0.5:6812/13792 -- pipe dne -1> 2013-02-10 17:14:26.030664 7fd4fd660700 1 -- 10.251.128.6:6801/17429 mark_down 10.252.0.7:6801/2320 -- pipe dne 0> 2013-02-10 17:14:26.032345 7fd4f8e57700 -1 osd/OSD.cc: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fd4f8e57700 time 2013-02-10 17:14:26.031303 osd/OSD.cc: 4477: FAILED assert(_get_map_bl(epoch, bl)) ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061) 1: 
(OSDService::get_map(unsigned int)+0x918) [0x6059d8] 2: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x185) [0x61b3b5] 3: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x269) [0x62c269] 4: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x10) [0x659ab0] 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0x879) [0x831a49] 6: (ThreadPool::WorkThread::entry()+0x10) [0x834250] 7: (()+0x68ca) [0x7fd509b7a8ca] 8: (clone()+0x6d) [0x7fd5086a9b6d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. --- logging levels --- 0/ 5 none 0/ 1 lockdep 0/ 1 context 1/ 1 crush 1/ 5 mds 1/ 5 mds_balancer 1/ 5 mds_locker 1/ 5 mds_log 1/ 5 mds_log_expire 1/ 5 mds_migrator 0/ 1 buffer 0/ 1 timer 0/ 1 filer 0/ 1 striper 0/ 1 objecter 0/ 5 rados 0/ 5 rbd 0/ 5 journaler 0/ 5 objectcacher 0/ 5 client 0/ 5 osd 0/ 5 optracker 0/ 5 objclass 1/ 3 filestore 1/ 3 journal 0/ 5 ms 1/ 5 mon 0/10 monc 0/ 5 paxos 0/ 5 tp 1/ 5 auth 1/ 5 crypto 1/ 1 finisher 1/ 5 heartbeatmap 1/ 5 perfcounter 1/ 5 rgw 1/ 5 hadoop 1/ 5 javaclient 1/ 5 asok 1/ 1 throttle -2/-2 (syslog threshold) -1/-1 (stderr threshold) max_recent 100000 max_new 1000 log_file /var/log/ceph/ceph-osd.2.log --- end dump of recent events --- 2013-02-10 17:14:26.034483 7fd4f8e57700 -1 *** Caught signal (Aborted) ** in thread 7fd4f8e57700 ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061) 1: /usr/bin/ceph-osd() [0x7a1b99] 2: (()+0xeff0) [0x7fd509b82ff0] 3: (gsignal()+0x35) [0x7fd50860c1b5] 4: (abort()+0x180) [0x7fd50860efc0] 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fd508ea0dc5] 6: (()+0xcb166) [0x7fd508e9f166] 7: (()+0xcb193) [0x7fd508e9f193] 8: (()+0xcb28e) [0x7fd508e9f28e] 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x7c9) [0x7f93c9] 10: (OSDService::get_map(unsigned int)+0x918) [0x6059d8] 11: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x185) [0x61b3b5] 12: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x269) [0x62c269] 13: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x10) [0x659ab0] 14: (ThreadPool::worker(ThreadPool::WorkThread*)+0x879) [0x831a49] 15: (ThreadPool::WorkThread::entry()+0x10) [0x834250] 16: (()+0x68ca) [0x7fd509b7a8ca] 17: (clone()+0x6d) [0x7fd5086a9b6d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. --- begin dump of recent events --- 0> 2013-02-10 17:14:26.034483 7fd4f8e57700 -1 *** Caught signal (Aborted) ** in thread 7fd4f8e57700 ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061) 1: /usr/bin/ceph-osd() [0x7a1b99] 2: (()+0xeff0) [0x7fd509b82ff0] 3: (gsignal()+0x35) [0x7fd50860c1b5] 4: (abort()+0x180) [0x7fd50860efc0] 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fd508ea0dc5] 6: (()+0xcb166) [0x7fd508e9f166] 7: (()+0xcb193) [0x7fd508e9f193] 8: (()+0xcb28e) [0x7fd508e9f28e] 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7c9) [0x7f93c9] 10: (OSDService::get_map(unsigned int)+0x918) [0x6059d8] 11: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x185) [0x61b3b5] 12: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x269) [0x62c269] 13: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x10) [0x659ab0] 14: (ThreadPool::worker(ThreadPool::WorkThread*)+0x879) [0x831a49] 15: 
(ThreadPool::WorkThread::entry()+0x10) [0x834250] 16: (()+0x68ca) [0x7fd509b7a8ca] 17: (clone()+0x6d) [0x7fd5086a9b6d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. --- logging levels --- 0/ 5 none 0/ 1 lockdep 0/ 1 context 1/ 1 crush 1/ 5 mds 1/ 5 mds_balancer 1/ 5 mds_locker 1/ 5 mds_log 1/ 5 mds_log_expire 1/ 5 mds_migrator 0/ 1 buffer 0/ 1 timer 0/ 1 filer 0/ 1 striper 0/ 1 objecter 0/ 5 rados 0/ 5 rbd 0/ 5 journaler 0/ 5 objectcacher 0/ 5 client 0/ 5 osd 0/ 5 optracker 0/ 5 objclass 1/ 3 filestore 1/ 3 journal 0/ 5 ms 1/ 5 mon 0/10 monc 0/ 5 paxos 0/ 5 tp 1/ 5 auth 1/ 5 crypto 1/ 1 finisher 1/ 5 heartbeatmap 1/ 5 perfcounter 1/ 5 rgw 1/ 5 hadoop 1/ 5 javaclient 1/ 5 asok 1/ 1 throttle -2/-2 (syslog threshold) -1/-1 (stderr threshold) max_recent 100000 max_new 1000 log_file /var/log/ceph/ceph-osd.2.log --- end dump of recent events ---
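The crash signature worth searching a log for is the failed assert together with the osdmap epoch range the OSD was trying to catch up through (epochs [6793,6893] in the event dump). A minimal grep sketch: the sample file below is built from two lines copied out of this report, and on the affected host you would point the same grep at /var/log/ceph/ceph-osd.2.log instead.

```shell
# Build a small sample log from two lines quoted in this report
# (stand-in for /var/log/ceph/ceph-osd.2.log on the affected host).
cat > /tmp/ceph-osd.2.sample.log <<'EOF'
2013-02-10 17:14:26.010098 7fd4fd660700 3 osd.2 6793 handle_osd_map epochs [6793,6893], i have 6793, src has [6464,6964]
osd/OSD.cc: 4477: FAILED assert(_get_map_bl(epoch, bl))
EOF

# Pull out the assertion and the map-epoch negotiation in one pass.
grep -E 'FAILED assert|handle_osd_map' /tmp/ceph-osd.2.sample.log
```

Both lines matching means the daemon died while advancing PGs across that epoch range, i.e. this crash rather than some other startup failure.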
Updated by Ivan Kudryavtsev about 11 years ago
I removed the OSD's data, reformatted the disk as XFS, and it started fine, but a lot of data now has to be replicated from the other OSDs.
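For reference, that recovery matches the manual OSD replacement procedure of that era. A rough sketch only, assuming osd.2 with its data in /srv/ceph/osd2 as in the log; the device name, CRUSH weight/location, and hostname are placeholders, and the cluster must be healthy enough to backfill the rebuilt OSD:

```shell
# Remove the dead OSD from the cluster maps (its data is abandoned).
ceph osd out 2
ceph osd crush remove osd.2
ceph auth del osd.2
ceph osd rm 2

# Rebuild the backing store on XFS (/dev/sdX1 is a placeholder device).
mkfs.xfs -f /dev/sdX1
mount /dev/sdX1 /srv/ceph/osd2

# Re-create the OSD id (assumed here to hand back the freed id 2),
# re-initialize its data dir and key, and put it back in the CRUSH map.
ceph osd create
ceph-osd -i 2 --mkfs --mkkey
ceph auth add osd.2 osd 'allow *' mon 'allow rwx' -i /srv/ceph/osd2/keyring
ceph osd crush set 2 osd.2 1.0 root=default host=HOSTNAME
service ceph start osd.2
```

Once the daemon boots, the cluster re-replicates the PGs onto the empty OSD, which is the replication traffic described above.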
Updated by Sage Weil about 11 years ago
- Status changed from New to Resolved
This issue was fixed in 3293b31b44c9adad2b5e37da9d5342a6e4b72ade, released in v0.56.2. There were some other critical issues with v0.56, so I suggest upgrading all of your other nodes as well!