Bug #2691
closed
osd/ReplicatedPG.cc: 5888: FAILED assert(latest->is_update())
Description
-1> 2012-07-02 14:19:01.529985 7fba53ac2700 -1 osd/ReplicatedPG.cc: In function 'int ReplicatedPG::recover_primary(int)' thread 7fba53ac2700 time 2012-07-02 14:19:01.528553
osd/ReplicatedPG.cc: 5888: FAILED assert(latest->is_update())
 ceph version 0.47.3-547-g2472034 (commit:2472034c4fcd53575734ac8ac8e877687c9ca910)
 1: (ReplicatedPG::recover_primary(int)+0x4d3) [0x553093]
 2: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*)+0x7b) [0x55ef4b]
 3: (OSD::do_recovery(PG*)+0x464) [0x5cb644]
 4: (ThreadPool::worker()+0x605) [0x7a08d5]
 5: (ThreadPool::WorkThread::entry()+0xd) [0x5e09dd]
 6: (()+0x7e9a) [0x7fba65139e9a]
 7: (clone()+0x6d) [0x7fba636ee4bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

0> 2012-07-02 14:19:01.530957 7fba54ac4700 0 log [INF] : 2.6 scrub ok

ubuntu@teuthology:/a/sage-2012-07-02_11:21:38-regression-next-testing-basic/4596$ cat config.yaml
kernel: &id001
  branch: testing
  kdb: true
nuke-on-error: true
overrides:
  ceph:
    branch: next
    conf:
      client:
        rbd cache: true
    fs: btrfs
    log-whitelist:
    - slow request
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana50.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDVJ+lkgUdkr27WFzrmwSQU22m+pFIiqzhfcO4Hinu8A8uyP4FIephrEcq4Rrt4hp14Syb1pxXisV6UKwAZKikDoD1Wl0LSro4TzOs6HuMEhfvzdnISvyzE3f2w0cj1zE61rHFYfPNF14b9fkE3wBf2Vb4i6ReaN2/Yd12J/xO52tJH1lPxgsFoAIRMjdQMbfVwPU6kK9SY4ngt9iLjge6gZ0O9Jwe2vrgD6+LNoMY9qvNjgRvQdCTi85OQwitU0ZMZdGC0cQ/oNbKd+yW92rW9Wu6dcyKSisesRcm7lbtS6X2uUup+u3vWze7coT+Py3TdNW6nGpIg4muyvqHfSinz
  ubuntu@plana67.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDqiZsiT7h3fNR9yzwK2WToaotO4olIxmVdh+aSf3ILwEpHjFYbWXymL0C77hn0MdGRbaOzWOSMMng3MAHKy9xR3/CGNXXqO7iEK1fJOvSfmypkvJDyrMY/RuSvdifcXJyREvFsSK6cdmRpO235ODhfui4FC5BLmgv/VvasH/1Ur4ALfe7UE9L+cU4VeoJdl082oYeo1nn1beERgaypX67MXepG2NKbEY77jG5FXbGVpKWmsgIEWiiX8p6+afTOP+8cGsM3vsAG7nTJeFVKkEHc7A8cPkT4l/iXKjSiwWAtU5NV0QmRC/1ad78+xTOWNzJaTrIxoKuuGpB+DjdvrJgN
  ubuntu@plana69.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDE4lTHELBl2d+BXuaxMKtJX4r3GN2qSaxHTfewZxutqEC+rSNbD5Otiqwm16GtmCYklbYJL0yr6mizCZ0KzL3lVwNSczb/CxfvLTLegmBv9YzPRvoL1ymxdHggPIw47JTNRg6+UO9EConQpp7LOktT7fYf3AFx/BS9Ux6UeKyxifkb/1GUlzVT2rk1D39frBhI7AcPHovvbd/6uHvamrpUS3wSDWdz6BNRTQDaK307TJS9LPva6Cj6WAgmTZZesjWB4mOpWywrA9sva7GqSrnTMGKnIHPcsI6U57CALwHS1g/zIz73QuolZA2NkbhA7jjrfsUaWYJorE9dynIga9pf
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down or wrong addr
    - objects unfound and apparently lost
- thrashosds: null
- rbd_fsx:
    clients:
    - client.0
    ops: 20000
Files
Updated by Sage Weil almost 12 years ago
Took down osd.2 and osd.3 with the same crash. Coredumps are on the hosts.
Updated by Tamilarasi muthamizhan over 11 years ago
- Status changed from 12 to In Progress
Recent log: ubuntu@teuthology:/a/teuthology-2012-08-20_00:00:04-regression-next-testing-basic/4822
-10> 2012-08-20 02:05:18.505644 7fe225181700 -1 osd/ReplicatedPG.cc: In function 'int ReplicatedPG::recover_primary(int)' thread 7fe225181700 time 2012-08-20 02:05:18.499420
osd/ReplicatedPG.cc: 6004: FAILED assert(latest->is_update())
 ceph version 0.50-109-gda210be (commit:da210bee091705fc488cc7c839c32d46280c1719)
 1: (ReplicatedPG::recover_primary(int)+0x4f1) [0x56bac1]
 2: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*)+0x7b) [0x58b8db]
 3: (OSD::do_recovery(PG*)+0x361) [0x5e13a1]
 4: (OSD::RecoveryWQ::_process(PG*)+0x15) [0x615835]
 5: (ThreadPool::worker()+0x523) [0x7e3533]
 6: (ThreadPool::WorkThread::entry()+0xd) [0x5f90fd]
 7: (()+0x7e9a) [0x7fe2367f8e9a]
 8: (clone()+0x6d) [0x7fe234dad4bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

2012-08-20 02:05:18.727855 7fe225181700 -1 *** Caught signal (Aborted) **
 in thread 7fe225181700
 ceph version 0.50-109-gda210be (commit:da210bee091705fc488cc7c839c32d46280c1719)
 1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x72afa1]
 2: (()+0xfcb0) [0x7fe236800cb0]
 3: (gsignal()+0x35) [0x7fe234cf1445]
 4: (abort()+0x17b) [0x7fe234cf4bab]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fe23563f69d]
 6: (()+0xb5846) [0x7fe23563d846]
 7: (()+0xb5873) [0x7fe23563d873]
 8: (()+0xb596e) [0x7fe23563d96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ea) [0x7ed1da]
 10: (ReplicatedPG::recover_primary(int)+0x4f1) [0x56bac1]
 11: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*)+0x7b) [0x58b8db]
 12: (OSD::do_recovery(PG*)+0x361) [0x5e13a1]
 13: (OSD::RecoveryWQ::_process(PG*)+0x15) [0x615835]
 14: (ThreadPool::worker()+0x523) [0x7e3533]
 15: (ThreadPool::WorkThread::entry()+0xd) [0x5f90fd]
 16: (()+0x7e9a) [0x7fe2367f8e9a]
 17: (clone()+0x6d) [0x7fe234dad4bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
0> 2012-08-20 02:05:18.727855 7fe225181700 -1 *** Caught signal (Aborted) **
 in thread 7fe225181700
 ceph version 0.50-109-gda210be (commit:da210bee091705fc488cc7c839c32d46280c1719)
 1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x72afa1]
 2: (()+0xfcb0) [0x7fe236800cb0]
 3: (gsignal()+0x35) [0x7fe234cf1445]
 4: (abort()+0x17b) [0x7fe234cf4bab]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fe23563f69d]
 6: (()+0xb5846) [0x7fe23563d846]
 7: (()+0xb5873) [0x7fe23563d873]
 8: (()+0xb596e) [0x7fe23563d96e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ea) [0x7ed1da]
 10: (ReplicatedPG::recover_primary(int)+0x4f1) [0x56bac1]
 11: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*)+0x7b) [0x58b8db]
 12: (OSD::do_recovery(PG*)+0x361) [0x5e13a1]
 13: (OSD::RecoveryWQ::_process(PG*)+0x15) [0x615835]
 14: (ThreadPool::worker()+0x523) [0x7e3533]
 15: (ThreadPool::WorkThread::entry()+0xd) [0x5f90fd]
 16: (()+0x7e9a) [0x7fe2367f8e9a]
 17: (clone()+0x6d) [0x7fe234dad4bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
ubuntu@teuthology:/a/teuthology-2012-08-20_00:00:04-regression-next-testing-basic/4822$ cat config.yaml
kernel: &id001
  kdb: true
  sha1: 1fe5e9932156f6122c3b1ff6ba7541c27c86718c
nuke-on-error: true
overrides:
  ceph:
    conf:
      client:
        rbd cache: false
    fs: ext4
    log-whitelist:
    - slow request
    sha1: da210bee091705fc488cc7c839c32d46280c1719
  workunit:
    sha1: da210bee091705fc488cc7c839c32d46280c1719
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana02.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCtjMpSkaJhFqFtpo5AEe3KHygR+ueaWU+gYrrRzPa8YvmR0TCapw0kz77y1Fjcfh8rkTapnevpaYgQSMrMs0Yc34kF5XtNRuQXkpTwrhS8isZJBeNSc1W5XeKjj4KB/UuzBywJq0h/0KbH1DrMy72cGISOzdiP9CMA5KUvJo0m31wv1+MPcPn/5AhZgoWPStfaZdb4TaJUrNLrws0oRXa0yQbUa6WmUBsYhHsw4K1ukJAcJwVjcgAAv1N+GnyuWLVs+pvknBO3Whv1RhjY6EDGjun1MDPw+OE3wJsJX7BRr8eZv2Avi7pRlseWeWJwgsHMJ/j0yhf+SCy1+oSPrD2b
  ubuntu@plana62.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC+nI5/l38Kdw2W/qbEKrVMcnVdIxJG7hNnD7nnS3+Zx/uPiWrds26ZPrM5IY7D8Mf7sjBzUYbqsX9xGYMLLTQaeDwsZn/7RjjSg8zOS1aMP5F/AJzSQx4Nt37eLUsRHX3yA30/OQcl6sBgDjHyhSPcSuHWSnMmoy4pkDo3xpQMQMtxDG8gWq+to1hZwJbsiK9FdutEgPJg3inWM1WVc5L6NmRN2WQNEGT8HvtlBCWqX6/H/hLujQlbgyJAbeG4BriMV3gCIccJE833f/fN9KIzaMlD7qHTgWcaGk+LY84nUdNlTkNoX+L4m6WRY8/Pt9om2dOocsXyCwYLIS4heIDT
  ubuntu@plana63.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDDy2BKPe+fe5jK0ziU8aKM0DzODSTaKWecQRwLjnZLbjDvTyHm8x8xX/JCts3bFfrc2ozFz7ILBIWU96JRZiFF2TtFZjtf1H19kyvR8PWCxiZ/lld+C7B6U8iiPSNiSlgo7mwkpk1JoSpHe4rK/Z7WQRWBMsCC7XJETu6rRX3i0ZYaKh8BoWWhpsBs1quSNxRXNUqJ6OKnDbB5Vuan1TK9b49RXmibx+oapXm8V0sHEVLYa+NTUs+wAEHAnjFgRe75Cik/rmgeE0m2Cff1rp9tFhEEDwZ5PUdnscOTY78BxImMRdkbZ8lJXOGcOOsD3Dj1jOr4pVrgxZqUdtWfJGkj
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rbd_fsx:
    clients:
    - client.0
    ops: 20000
Updated by Tamilarasi muthamizhan over 11 years ago
Recent logs: ubuntu@teuthology:/a/teuthology-2012-09-11_02:00:03-regression-testing-testing-basic/20743
Updated by Tamilarasi muthamizhan over 11 years ago
ubuntu@teuthology:/a/teuthology-2012-09-12_02:00:04-regression-testing-testing-basic/21369
Updated by Tamilarasi muthamizhan over 11 years ago
Recent logs: ubuntu@teuthology:/a/teuthology-2012-09-13_04:00:05-regression-stable-master-basic/22002
Updated by Samuel Just over 11 years ago
- Status changed from In Progress to Resolved
Updated by Samuel Just over 11 years ago
- Status changed from Resolved to 12
- Priority changed from Urgent to Normal
- Target version deleted (v0.52a)
- Backport set to argonaut
This has shown up once in argonaut. It is probably not worth backporting unless it becomes more of a problem.
Updated by Tamilarasi muthamizhan over 11 years ago
for reference, ubuntu@teuthology:/a/teuthology-2013-01-10_07:00:03-regression-argonaut-master-basic/38145
Updated by Artem Grinblat about 11 years ago
- File ceph.tar.xz added
I might have a similar assertion here on bobtail (ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)):
2013-02-15 03:46:57.148126 7f5c25686700 0 log [ERR] : 3.35 osd.0: soid 7da92cf5/LHN1Xt3V8IMebWbOB1ZRapRnDn4nMb3/head//3 size 2119242 != known size 0, digest 1367424093 != known d
2013-02-15 03:46:57.148190 7f5c25686700 0 log [ERR] : repair 3.35 7da92cf5/LHN1Xt3V8IMebWbOB1ZRapRnDn4nMb3/head//3 on disk size (0) does not match object info size (2119242)
2013-02-15 03:49:05.151320 7f5c25686700 0 log [ERR] : 3.35 repair stat mismatch, got 51056/51056 objects, 0/0 clones, 7195419192/7197538434 bytes.
2013-02-15 03:49:05.151417 7f5c25686700 0 log [ERR] : 3.35 repair 0 missing, 1 inconsistent objects
2013-02-15 03:49:05.151450 7f5c25686700 0 log [ERR] : 3.35 repair 3 errors, 2 fixed
2013-02-15 03:49:05.156288 7f5c25e87700 0 log [ERR] : 3.35 missing primary copy of 7da92cf5/LHN1Xt3V8IMebWbOB1ZRapRnDn4nMb3/head//3, unfound
2013-02-15 03:49:05.389117 7f5c25e87700 -1 osd/ReplicatedPG.cc: In function 'int ReplicatedPG::recover_primary(int)' thread 7f5c25e87700 time 2013-02-15 03:49:05.311197
osd/ReplicatedPG.cc: 6537: FAILED assert(latest->is_update())
 ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
 1: (ReplicatedPG::recover_primary(int)+0x5a1) [0x670571]
 2: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*)+0x112) [0x672e92]
 3: (OSD::do_recovery(PG*)+0x277) [0x6e48b7]
 4: (OSD::RecoveryWQ::_process(PG*)+0xd) [0x721b1d]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0x992) [0x8e9ce2]
 6: (ThreadPool::WorkThread::entry()+0x10) [0x8eac70]
 7: (()+0x6b50) [0x7f5c38fc6b50]
 8: (clone()+0x6d) [0x7f5c37752a7d]
I have been running "ceph pg repair 3.35", and I removed the inconsistent object with "rados -p ... rm LHN1Xt3V8IMebWbOB1ZRapRnDn4nMb3".
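For triage across runs: the line number of this assert differs between builds (5888, 6004 and 6537 in the traces in this ticket), so matching on the asserted expression is more reliable than matching on file and line. Below is a minimal sketch in Python, assuming the OSD log is available as plain text with one log line per text line. The helper names `find_failed_asserts` and `group_by_expression` are hypothetical and not part of Ceph or teuthology.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   osd/ReplicatedPG.cc: 5888: FAILED assert(latest->is_update())
ASSERT_RE = re.compile(
    r"(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)[ \t]*$",
    re.MULTILINE,
)

# Matches the version banner printed after the assert, e.g.
#   ceph version 0.56.3 (6eb7e15a...)
VERSION_RE = re.compile(r"ceph version (?P<version>\S+)")


def find_failed_asserts(log_text):
    """Return (file, line, expr, version) for every FAILED assert in log_text.

    version is taken from the first "ceph version" banner found after the
    assert line, or None if no banner follows.
    """
    hits = []
    for m in ASSERT_RE.finditer(log_text):
        v = VERSION_RE.search(log_text, m.end())
        hits.append((
            m.group("file"),
            int(m.group("line")),
            m.group("expr"),
            v.group("version") if v else None,
        ))
    return hits


def group_by_expression(hits):
    """Group hits by asserted expression, ignoring the shifting line number."""
    grouped = defaultdict(list)
    for _file, line, expr, version in hits:
        grouped[expr].append((line, version))
    return dict(grouped)
```

Calling `group_by_expression(find_failed_asserts(text))` on the concatenated logs of several runs yields one entry per asserted expression, each listing the line numbers and versions where it fired.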