Bug #4855

closed

peek map assert

Added by Samuel Just almost 11 years ago. Updated almost 11 years ago.

Status: Can't reproduce
Priority: High
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From list:

Hey folks,

I'm helping put together a new test/experimental cluster, and hit this today when bringing the cluster up for the first time (using mkcephfs).

After doing the normal "service ceph -a start", I noticed one OSD was down, and a lot of PGs were stuck creating. I tried restarting the down OSD, but it would not come up. It always had this error:

-1> 2013-04-27 18:11:56.179804 b6fcd000  2 osd.1 0 boot
0> 2013-04-27 18:11:56.402161 b6fcd000 -1 osd/PG.cc: In function 'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::bufferlist*)' thread b6fcd000 time 2013-04-27 18:11:56.399089
osd/PG.cc: 2556: FAILED assert(values.size() == 1)
ceph version 0.60-401-g17a3859 (17a38593d60f5f29b9b66c13c0aaa759762c6d04)
1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::buffer::list*)+0x1ad) [0x2c3c0a]
2: (OSD::load_pgs()+0x357) [0x28cba0]
3: (OSD::init()+0x741) [0x290a16]
4: (main()+0x1427) [0x2155c0]
5: (__libc_start_main()+0x99) [0xb69bcf42]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I then did a full cluster restart, and now I have ten OSDs down -- each showing the same exception/failed assert.
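
When several OSDs fail at once, it can help to confirm that every log really shows the same assertion. The following is a minimal sketch, not part of Ceph: the function name `find_failed_asserts` and the regular expression are assumptions, based only on the `FAILED assert(...)` line format quoted above.

```python
import re

# Matches assertion lines in the format seen in the log above, e.g.
# "osd/PG.cc: 2556: FAILED assert(values.size() == 1)".
ASSERT_RE = re.compile(
    r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<cond>.*)\)\s*$"
)


def find_failed_asserts(log_text):
    """Return a (file, line, condition) tuple for each FAILED assert line."""
    hits = []
    for raw in log_text.splitlines():
        match = ASSERT_RE.match(raw.strip())
        if match:
            hits.append(
                (match.group("file"), int(match.group("line")), match.group("cond"))
            )
    return hits


# Sample taken from the log excerpt in this report.
sample = """\
-1> 2013-04-27 18:11:56.179804 b6fcd000  2 osd.1 0 boot
osd/PG.cc: 2556: FAILED assert(values.size() == 1)
ceph version 0.60-401-g17a3859 (17a38593d60f5f29b9b66c13c0aaa759762c6d04)
"""

if __name__ == "__main__":
    print(find_failed_asserts(sample))
```

Running it over each OSD's log text and comparing the returned tuples shows whether all down OSDs hit the same source file and condition. The line number can differ between builds, as it does between the two reports in this ticket.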


Files

dmesg.txt (60 KB): dmesg output for node (Nigel Williams, 05/31/2013 03:55 PM)
Actions #1

Updated by Samuel Just almost 11 years ago

  • Priority changed from Urgent to High
Actions #2

Updated by Samuel Just almost 11 years ago

  • Priority changed from High to Urgent
Actions #3

Updated by Samuel Just almost 11 years ago

  • Priority changed from Urgent to High
Actions #4

Updated by Samuel Just almost 11 years ago

  • Status changed from New to Can't reproduce
Actions #5

Updated by Nigel Williams almost 11 years ago

root@ceph2:/var/log/ceph# ceph -v
ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60)

Hit this reported bug: http://tracker.ceph.com/issues/4855

osd/PG.cc: 2676: FAILED assert(values.size() == 1)

Sequence of events:
1. Knocked power plug out of storage server (contains 4 x OSDs using XFS).
2. Cluster went into recovery.
3. Re-powered and booted storage server; 3 of the 4 OSDs came back OK, the 4th OSD needed XFS repair.
4. Had to throw away the XFS log to get the OSD to mount, re-mounted, tried to start the OSD, and got this:

2013-05-31 17:32:07.788645 7f2c56900780  0 ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60), process ceph-osd, pid 1491
2013-05-31 17:32:07.788715 7f2c56900780 -1 * ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-10: (2) No such file or directory
2013-05-31 17:34:46.570916 7f50165ba780 0 ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60), process ceph-osd, pid 2639
2013-05-31 17:34:46.570986 7f50165ba780 -1 * ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-10: (2) No such file or directory
2013-05-31 17:43:03.941892 7fbb531e9780 0 ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60), process ceph-osd, pid 2873
2013-05-31 17:43:04.137430 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount FIEMAP ioctl is supported and appears to work
2013-05-31 17:43:04.137503 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-05-31 17:43:04.138431 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount did NOT detect btrfs
2013-05-31 17:43:04.170762 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount syncfs(2) syscall fully supported (by glibc and kernel)
2013-05-31 17:43:04.171450 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount found snaps <>
2013-05-31 17:43:04.532579 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-05-31 17:43:04.852883 7fbb531e9780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2013-05-31 17:43:04.852998 7fbb531e9780 1 journal _open /var/lib/ceph/osd/ceph-10/journal fd 20: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-05-31 17:43:04.868170 7fbb531e9780 1 journal _open /var/lib/ceph/osd/ceph-10/journal fd 20: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-05-31 17:43:04.869359 7fbb531e9780 1 journal close /var/lib/ceph/osd/ceph-10/journal
2013-05-31 17:43:04.930436 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount FIEMAP ioctl is supported and appears to work
2013-05-31 17:43:04.930501 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-05-31 17:43:04.931433 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount did NOT detect btrfs
2013-05-31 17:43:04.996709 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount syncfs(2) syscall fully supported (by glibc and kernel)
2013-05-31 17:43:04.996874 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount found snaps <>
2013-05-31 17:43:05.065377 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-05-31 17:43:05.074489 7fbb531e9780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2013-05-31 17:43:05.074527 7fbb531e9780 1 journal _open /var/lib/ceph/osd/ceph-10/journal fd 28: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-05-31 17:43:05.074679 7fbb531e9780 1 journal _open /var/lib/ceph/osd/ceph-10/journal fd 28: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-05-31 17:43:05.211360 7fbb531e9780 -1 osd/PG.cc: In function 'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::bufferlist*)' thread 7fbb531e9780 time 2013-05-31 17:43:05.177589
osd/PG.cc: 2676: FAILED assert(values.size() == 1)
ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60)
1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::buffer::list*)+0x4d7) [0x6b6627]
2: (OSD::load_pgs()+0x17f3) [0x63f803]
3: (OSD::init()+0xf9d) [0x641e4d]
4: (main()+0x2351) [0x574831]
5: (__libc_start_main()+0xed) [0x7fbb50e2676d]
6: /usr/bin/ceph-osd() [0x576edd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
-63> 2013-05-31 17:43:03.936433 7fbb531e9780 5 asok(0x2d43000) register_command perfcounters_dump hook 0x2d38010
-62> 2013-05-31 17:43:03.936522 7fbb531e9780 5 asok(0x2d43000) register_command 1 hook 0x2d38010
-61> 2013-05-31 17:43:03.936537 7fbb531e9780 5 asok(0x2d43000) register_command perf dump hook 0x2d38010
-60> 2013-05-31 17:43:03.936558 7fbb531e9780 5 asok(0x2d43000) register_command perfcounters_schema hook 0x2d38010
-59> 2013-05-31 17:43:03.936574 7fbb531e9780 5 asok(0x2d43000) register_command 2 hook 0x2d38010
-58> 2013-05-31 17:43:03.936586 7fbb531e9780 5 asok(0x2d43000) register_command perf schema hook 0x2d38010
-57> 2013-05-31 17:43:03.936599 7fbb531e9780 5 asok(0x2d43000) register_command config show hook 0x2d38010
-56> 2013-05-31 17:43:03.936612 7fbb531e9780 5 asok(0x2d43000) register_command config set hook 0x2d38010
-55> 2013-05-31 17:43:03.936627 7fbb531e9780 5 asok(0x2d43000) register_command log flush hook 0x2d38010
-54> 2013-05-31 17:43:03.936641 7fbb531e9780 5 asok(0x2d43000) register_command log dump hook 0x2d38010
-53> 2013-05-31 17:43:03.936655 7fbb531e9780 5 asok(0x2d43000) register_command log reopen hook 0x2d38010
-52> 2013-05-31 17:43:03.941892 7fbb531e9780 0 ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60), process ceph-osd, pid 2873
-51> 2013-05-31 17:43:03.958270 7fbb531e9780 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6810/2873 need_addr=1
-50> 2013-05-31 17:43:03.958353 7fbb531e9780 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6811/2873 need_addr=1
-49> 2013-05-31 17:43:03.958418 7fbb531e9780 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6812/2873 need_addr=1
-48> 2013-05-31 17:43:03.959668 7fbb531e9780 1 finished global_init_daemonize
-47> 2013-05-31 17:43:03.966276 7fbb531e9780 5 asok(0x2d43000) init /var/run/ceph/ceph-osd.10.asok
-46> 2013-05-31 17:43:03.966348 7fbb531e9780 5 asok(0x2d43000) bind_and_listen /var/run/ceph/ceph-osd.10.asok
-45> 2013-05-31 17:43:03.966417 7fbb531e9780 5 asok(0x2d43000) register_command 0 hook 0x2d370b0
-44> 2013-05-31 17:43:03.966466 7fbb531e9780 5 asok(0x2d43000) register_command version hook 0x2d370b0
-43> 2013-05-31 17:43:03.966497 7fbb531e9780 5 asok(0x2d43000) register_command git_version hook 0x2d370b0
-42> 2013-05-31 17:43:03.966523 7fbb531e9780 5 asok(0x2d43000) register_command help hook 0x2d380d0
-41> 2013-05-31 17:43:03.966928 7fbb4ee70700 5 asok(0x2d43000) entry start
-40> 2013-05-31 17:43:04.137430 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount FIEMAP ioctl is supported and appears to work
-39> 2013-05-31 17:43:04.137503 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
-38> 2013-05-31 17:43:04.138431 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount did NOT detect btrfs
-37> 2013-05-31 17:43:04.170762 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount syncfs(2) syscall fully supported (by glibc and kernel)
-36> 2013-05-31 17:43:04.171450 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount found snaps <>
-35> 2013-05-31 17:43:04.532579 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount: enabling WRITEAHEAD journal mode: btrfs not detected
-34> 2013-05-31 17:43:04.852621 7fbb531e9780 2 journal open /var/lib/ceph/osd/ceph-10/journal fsid 75b4925e-219b-4f8a-972e-099fce91d856 fs_op_seq 1291294
-33> 2013-05-31 17:43:04.852883 7fbb531e9780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
-32> 2013-05-31 17:43:04.852998 7fbb531e9780 1 journal _open /var/lib/ceph/osd/ceph-10/journal fd 20: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
-31> 2013-05-31 17:43:04.867490 7fbb531e9780 2 journal read_entry 590299136 : seq 1291294 1245 bytes
-30> 2013-05-31 17:43:04.868006 7fbb531e9780 2 journal No further valid entries found, journal is most likely valid
-29> 2013-05-31 17:43:04.868067 7fbb531e9780 2 journal No further valid entries found, journal is most likely valid
-28> 2013-05-31 17:43:04.868081 7fbb531e9780 3 journal journal_replay: end of journal, done.
-27> 2013-05-31 17:43:04.868170 7fbb531e9780 1 journal _open /var/lib/ceph/osd/ceph-10/journal fd 20: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
-26> 2013-05-31 17:43:04.869050 7fbb4c66b700 1 FileStore::op_tp worker finish
-25> 2013-05-31 17:43:04.869145 7fbb4be6a700 1 FileStore::op_tp worker finish
-24> 2013-05-31 17:43:04.869359 7fbb531e9780 1 journal close /var/lib/ceph/osd/ceph-10/journal
-23> 2013-05-31 17:43:04.870835 7fbb531e9780 10 monclient(hunting): build_initial_monmap
-22> 2013-05-31 17:43:04.871016 7fbb531e9780 5 adding auth protocol: none
-21> 2013-05-31 17:43:04.871035 7fbb531e9780 5 adding auth protocol: none
-20> 2013-05-31 17:43:04.871279 7fbb531e9780 1 - 0.0.0.0:6810/2873 messenger.start
-19> 2013-05-31 17:43:04.871358 7fbb531e9780 1 - :/0 messenger.start
-18> 2013-05-31 17:43:04.871388 7fbb531e9780 1 - 0.0.0.0:6812/2873 messenger.start
-17> 2013-05-31 17:43:04.871419 7fbb531e9780 1 - 0.0.0.0:6811/2873 messenger.start
-16> 2013-05-31 17:43:04.871756 7fbb531e9780 2 osd.10 0 mounting /var/lib/ceph/osd/ceph-10 /var/lib/ceph/osd/ceph-10/journal
-15> 2013-05-31 17:43:04.930436 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount FIEMAP ioctl is supported and appears to work
-14> 2013-05-31 17:43:04.930501 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
-13> 2013-05-31 17:43:04.931433 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount did NOT detect btrfs
-12> 2013-05-31 17:43:04.996709 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount syncfs(2) syscall fully supported (by glibc and kernel)
-11> 2013-05-31 17:43:04.996874 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount found snaps <>
-10> 2013-05-31 17:43:05.065377 7fbb531e9780 0 filestore(/var/lib/ceph/osd/ceph-10) mount: enabling WRITEAHEAD journal mode: btrfs not detected
-9> 2013-05-31 17:43:05.074420 7fbb531e9780 2 journal open /var/lib/ceph/osd/ceph-10/journal fsid 75b4925e-219b-4f8a-972e-099fce91d856 fs_op_seq 1291294
-8> 2013-05-31 17:43:05.074489 7fbb531e9780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
-7> 2013-05-31 17:43:05.074527 7fbb531e9780 1 journal _open /var/lib/ceph/osd/ceph-10/journal fd 28: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
-6> 2013-05-31 17:43:05.074594 7fbb531e9780 2 journal read_entry 590299136 : seq 1291294 1245 bytes
-5> 2013-05-31 17:43:05.074618 7fbb531e9780 2 journal No further valid entries found, journal is most likely valid
-4> 2013-05-31 17:43:05.074631 7fbb531e9780 2 journal No further valid entries found, journal is most likely valid
-3> 2013-05-31 17:43:05.074636 7fbb531e9780 3 journal journal_replay: end of journal, done.
-2> 2013-05-31 17:43:05.074679 7fbb531e9780 1 journal _open /var/lib/ceph/osd/ceph-10/journal fd 28: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
-1> 2013-05-31 17:43:05.075197 7fbb531e9780 2 osd.10 0 boot
0> 2013-05-31 17:43:05.211360 7fbb531e9780 -1 osd/PG.cc: In function 'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::bufferlist*)' thread 7fbb531e9780 time 2013-05-31 17:43:05.177589
osd/PG.cc: 2676: FAILED assert(values.size() == 1)
ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60)
1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::buffer::list*)+0x4d7) [0x6b6627]
2: (OSD::load_pgs()+0x17f3) [0x63f803]
3: (OSD::init()+0xf9d) [0x641e4d]
4: (main()+0x2351) [0x574831]
5: (__libc_start_main()+0xed) [0x7fbb50e2676d]
6: /usr/bin/ceph-osd() [0x576edd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
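
To compare this backtrace with the one in the original description, the numbered frames can be pulled out of the log text. This is only a sketch under assumptions: `parse_frames` is a hypothetical helper, and the pattern covers just the two frame shapes seen above, `N: (symbol+0xOFFSET) [0xADDR]` and `N: /path() [0xADDR]`.

```python
import re

# One backtrace frame: index, body, and return address in brackets.
FRAME_RE = re.compile(r"^(?P<idx>\d+): (?P<body>.+) \[(?P<addr>0x[0-9a-f]+)\]$")
# Frames with a resolved symbol wrap it as "(symbol+0xOFFSET)".
SYMBOL_RE = re.compile(r"^\((?P<sym>.+)\+(?P<off>0x[0-9a-f]+)\)$")


def parse_frames(text):
    """Return (index, symbol_or_path, address) for each backtrace frame line."""
    frames = []
    for raw in text.splitlines():
        frame = FRAME_RE.match(raw.strip())
        if not frame:
            continue  # not a frame line (log entry, assert line, etc.)
        body = frame.group("body")
        symbol = SYMBOL_RE.match(body)
        name = symbol.group("sym") if symbol else body
        frames.append((int(frame.group("idx")), name, frame.group("addr")))
    return frames


# Sample lines copied from the backtrace in this comment.
sample = """\
osd/PG.cc: 2676: FAILED assert(values.size() == 1)
1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::buffer::list*)+0x4d7) [0x6b6627]
2: (OSD::load_pgs()+0x17f3) [0x63f803]
6: /usr/bin/ceph-osd() [0x576edd]
"""

if __name__ == "__main__":
    for frame in parse_frames(sample):
        print(frame)
```

Both reports in this ticket reach the assert through the same symbols (`OSD::init()`, then `OSD::load_pgs()`, then `PG::peek_map_epoch(...)`). Only the offsets and addresses differ, so comparing the symbol names alone is enough to match them.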