Bug #36606
closedosd: checksum failure during upgrade test
0%
Description
2018-10-27T10:21:17.633 INFO:tasks.ceph.osd.0.smithi184.stderr:2018-10-27 10:21:17.630 7f608492e600 -1 bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xda1180d2, expected 0x3e37199, device location [0x4000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0# 2018-10-27T10:21:17.633 INFO:tasks.ceph.osd.0.smithi184.stderr:2018-10-27 10:21:17.630 7f608492e600 -1 bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xda1180d2, expected 0x3e37199, device location [0x4000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0# 2018-10-27T10:21:17.633 INFO:tasks.ceph.osd.0.smithi184.stderr:2018-10-27 10:21:17.630 7f608492e600 -1 bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xda1180d2, expected 0x3e37199, device location [0x4000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0# 2018-10-27T10:21:17.633 INFO:tasks.ceph.osd.0.smithi184.stderr:2018-10-27 10:21:17.630 7f608492e600 -1 bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xda1180d2, expected 0x3e37199, device location [0x4000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0# 2018-10-27T10:21:17.633 INFO:tasks.ceph.osd.0.smithi184.stderr:2018-10-27 10:21:17.630 7f608492e600 -1 bluestore(/var/lib/ceph/osd/ceph-0) fsck error: #-1:7b3f43c4:::osd_superblock:0# error during read: (5) Input/output error
From: /ceph/teuthology-archive/pdonnell-2018-10-27_02:23:12-fs-wip-pdonnell-testing-20181026.235100-distro-basic-smithi/3189896/teuthology.log
Recent run that did not fail: /ceph/teuthology-archive/pdonnell-2018-10-17_19:40:05-fs-wip-pdonnell-testing-20181017.175152-distro-basic-smithi/3152440/teuthology.log
Branch: https://github.com/ceph/ceph-ci/tree/wip-pdonnell-testing-20181026.235100
Updated by Patrick Donnelly over 5 years ago
- Project changed from CephFS to RADOS
- Component(RADOS) OSD added
Updated by Patrick Donnelly over 5 years ago
Note that this could be caused by a recent merge into luminous.
Updated by Patrick Donnelly over 5 years ago
Does not affect filestore. Only upgrade tests (fs:upgrade) with bluestore (replicated or EC).
Updated by Igor Fedotov over 5 years ago
I think mkfs doesn't run properly for bluestore since --no-mon-config param isn't recognized for unknown reason):
2018-10-27T10:17:32.370 INFO:teuthology.orchestra.run.smithi184:Running: 'sudo MALLOC_CHECK_=3 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-osd --no-mon-config --cluster ceph --mkfs --mkkey -i 1 --monmap /home/ubuntu/cephtest/ceph.monmap' 2018-10-27T10:17:32.483 INFO:teuthology.orchestra.run.smithi184.stdout:usage: ceph-osd -i <ID> [flags] 2018-10-27T10:17:32.484 INFO:teuthology.orchestra.run.smithi184.stdout: --osd-data PATH data directory 2018-10-27T10:17:32.484 INFO:teuthology.orchestra.run.smithi184.stdout: --osd-journal PATH 2018-10-27T10:17:32.484 INFO:teuthology.orchestra.run.smithi184.stdout: journal file or block device 2018-10-27T10:17:32.484 INFO:teuthology.orchestra.run.smithi184.stdout: --mkfs create a [new] data directory 2018-10-27T10:17:32.484 INFO:teuthology.orchestra.run.smithi184.stdout: --mkkey generate a new secret key. This is normally used in combination with --mkfs 2018-10-27T10:17:32.484 INFO:teuthology.orchestra.run.smithi184.stdout: --convert-filestore 2018-10-27T10:17:32.485 INFO:teuthology.orchestra.run.smithi184.stdout: run any pending upgrade operations 2018-10-27T10:17:32.485 INFO:teuthology.orchestra.run.smithi184.stdout: --flush-journal flush all data out of journal 2018-10-27T10:17:32.485 INFO:teuthology.orchestra.run.smithi184.stdout: --mkjournal initialize a new journal 2018-10-27T10:17:32.485 INFO:teuthology.orchestra.run.smithi184.stdout: --check-wants-journal 2018-10-27T10:17:32.485 INFO:teuthology.orchestra.run.smithi184.stdout: check whether a journal is desired 2018-10-27T10:17:32.485 INFO:teuthology.orchestra.run.smithi184.stdout: --check-allows-journal 2018-10-27T10:17:32.485 INFO:teuthology.orchestra.run.smithi184.stdout: check whether a journal is allowed 2018-10-27T10:17:32.486 INFO:teuthology.orchestra.run.smithi184.stdout: --check-needs-journal 2018-10-27T10:17:32.486 INFO:teuthology.orchestra.run.smithi184.stdout: check whether a journal is required 2018-10-27T10:17:32.486 INFO:teuthology.orchestra.run.smithi184.stdout: --debug_osd <N> set debug level (e.g. 10) 2018-10-27T10:17:32.486 INFO:teuthology.orchestra.run.smithi184.stdout: --get-device-fsid PATH 2018-10-27T10:17:32.486 INFO:teuthology.orchestra.run.smithi184.stdout: get OSD fsid for the given block device 2018-10-27T10:17:32.486 INFO:teuthology.orchestra.run.smithi184.stdout: 2018-10-27T10:17:32.486 INFO:teuthology.orchestra.run.smithi184.stdout: --conf/-c FILE read configuration from the given configuration file 2018-10-27T10:17:32.487 INFO:teuthology.orchestra.run.smithi184.stdout: --id/-i ID set ID portion of my name 2018-10-27T10:17:32.487 INFO:teuthology.orchestra.run.smithi184.stdout: --name/-n TYPE.ID set name 2018-10-27T10:17:32.487 INFO:teuthology.orchestra.run.smithi184.stdout: --cluster NAME set cluster name (default: ceph) 2018-10-27T10:17:32.487 INFO:teuthology.orchestra.run.smithi184.stdout: --setuser USER set uid to user or uid (and gid to user's gid) 2018-10-27T10:17:32.487 INFO:teuthology.orchestra.run.smithi184.stdout: --setgroup GROUP set gid to group or gid 2018-10-27T10:17:32.487 INFO:teuthology.orchestra.run.smithi184.stdout: --version show version and quit 2018-10-27T10:17:32.487 INFO:teuthology.orchestra.run.smithi184.stdout: 2018-10-27T10:17:32.488 INFO:teuthology.orchestra.run.smithi184.stdout: -d run in foreground, log to stderr 2018-10-27T10:17:32.488 INFO:teuthology.orchestra.run.smithi184.stdout: -f run in foreground, log to usual location 2018-10-27T10:17:32.488 INFO:teuthology.orchestra.run.smithi184.stdout: 2018-10-27T10:17:32.488 INFO:teuthology.orchestra.run.smithi184.stdout: --debug_ms N set message debug level (e.g. 1) 2018-10-27T10:17:32.488 INFO:teuthology.orchestra.run.smithi184.stderr:2018-10-27 10:17:32.448617 7fe5e5076e00 -1 unrecognized arg --no-mon-config
Updated by Patrick Donnelly over 5 years ago
Igor Fedotov wrote:
I think mkfs doesn't run properly for bluestore since --no-mon-config param isn't recognized for unknown reason):
[...]
From looking at `git log -p`, it seems that option is deliberately not in luminous. So perhaps some recent QA change is incompatible with luminous and must be fixed.
Updated by Neha Ojha over 5 years ago
- Status changed from New to 12
/a/nojha-2018-10-29_19:19:04-fs:upgrade-master-distro-basic-smithi/3201377/
Updated by Sage Weil over 5 years ago
The --no-mon-config splats or normal.. qa/tasks/ceph.py tries first with --no-mon-config and, if it fails, does the mkfs command again without it. the log shows the second time succeeded.
Updated by Neha Ojha over 5 years ago
Yes, the mkfs suceeds. That part of the logs is also present in the successful run of this test.
2018-10-18T14:24:22.754 INFO:teuthology.orchestra.run.smithi176:Running: 'sudo MALLOC_CHECK_=3 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-osd --no-mon-config --cluster ceph --mkfs --mkkey -i 0 --monmap /home/ubuntu/cephtest/ceph.monmap' 2018-10-18T14:24:22.892 INFO:teuthology.orchestra.run.smithi176.stdout:usage: ceph-osd -i <ID> [flags] 2018-10-18T14:24:22.892 INFO:teuthology.orchestra.run.smithi176.stdout: --osd-data PATH data directory 2018-10-18T14:24:22.893 INFO:teuthology.orchestra.run.smithi176.stdout: --osd-journal PATH 2018-10-18T14:24:22.893 INFO:teuthology.orchestra.run.smithi176.stdout: journal file or block device 2018-10-18T14:24:22.893 INFO:teuthology.orchestra.run.smithi176.stdout: --mkfs create a [new] data directory 2018-10-18T14:24:22.893 INFO:teuthology.orchestra.run.smithi176.stdout: --mkkey generate a new secret key. This is normally used in combination with --mkfs 2018-10-18T14:24:22.893 INFO:teuthology.orchestra.run.smithi176.stdout: --convert-filestore 2018-10-18T14:24:22.893 INFO:teuthology.orchestra.run.smithi176.stdout: run any pending upgrade operations 2018-10-18T14:24:22.893 INFO:teuthology.orchestra.run.smithi176.stdout: --flush-journal flush all data out of journal 2018-10-18T14:24:22.894 INFO:teuthology.orchestra.run.smithi176.stdout: --mkjournal initialize a new journal 2018-10-18T14:24:22.894 INFO:teuthology.orchestra.run.smithi176.stdout: --check-wants-journal 2018-10-18T14:24:22.894 INFO:teuthology.orchestra.run.smithi176.stdout: check whether a journal is desired 2018-10-18T14:24:22.894 INFO:teuthology.orchestra.run.smithi176.stdout: --check-allows-journal 2018-10-18T14:24:22.894 INFO:teuthology.orchestra.run.smithi176.stdout: check whether a journal is allowed 2018-10-18T14:24:22.894 INFO:teuthology.orchestra.run.smithi176.stdout: --check-needs-journal 2018-10-18T14:24:22.894 INFO:teuthology.orchestra.run.smithi176.stdout: check whether a journal is required 2018-10-18T14:24:22.895 INFO:teuthology.orchestra.run.smithi176.stdout: --debug_osd <N> set debug level (e.g. 10) 2018-10-18T14:24:22.895 INFO:teuthology.orchestra.run.smithi176.stdout: --get-device-fsid PATH 2018-10-18T14:24:22.895 INFO:teuthology.orchestra.run.smithi176.stdout: get OSD fsid for the given block device 2018-10-18T14:24:22.895 INFO:teuthology.orchestra.run.smithi176.stdout: 2018-10-18T14:24:22.895 INFO:teuthology.orchestra.run.smithi176.stdout: --conf/-c FILE read configuration from the given configuration file 2018-10-18T14:24:22.895 INFO:teuthology.orchestra.run.smithi176.stdout: --id/-i ID set ID portion of my name 2018-10-18T14:24:22.896 INFO:teuthology.orchestra.run.smithi176.stdout: --name/-n TYPE.ID set name 2018-10-18T14:24:22.896 INFO:teuthology.orchestra.run.smithi176.stdout: --cluster NAME set cluster name (default: ceph) 2018-10-18T14:24:22.896 INFO:teuthology.orchestra.run.smithi176.stdout: --setuser USER set uid to user or uid (and gid to user's gid) 2018-10-18T14:24:22.896 INFO:teuthology.orchestra.run.smithi176.stdout: --setgroup GROUP set gid to group or gid 2018-10-18T14:24:22.896 INFO:teuthology.orchestra.run.smithi176.stdout: --version show version and quit 2018-10-18T14:24:22.897 INFO:teuthology.orchestra.run.smithi176.stdout: 2018-10-18T14:24:22.897 INFO:teuthology.orchestra.run.smithi176.stdout: -d run in foreground, log to stderr 2018-10-18T14:24:22.897 INFO:teuthology.orchestra.run.smithi176.stdout: -f run in foreground, log to usual location 2018-10-18T14:24:22.897 INFO:teuthology.orchestra.run.smithi176.stdout: 2018-10-18T14:24:22.897 INFO:teuthology.orchestra.run.smithi176.stdout: --debug_ms N set message debug level (e.g. 1) 2018-10-18T14:24:22.897 INFO:teuthology.orchestra.run.smithi176.stderr:2018-10-18 14:24:22.891515 7f1ad96d7e00 -1 unrecognized arg --no-mon-config 2018-10-18T14:24:22.898 DEBUG:teuthology.orchestra.run:got remote process result: 1
Successful test: /a/pdonnell-2018-10-17_19:40:05-fs-wip-pdonnell-testing-20181017.175152-distro-basic-smithi/3152440/
Updated by Neha Ojha over 5 years ago
The following seem to be the relevant pieces for one osd leading to the failure:
2018-10-27T10:18:03.564 INFO:tasks.ceph.osd.0.smithi184.stdout:starting osd.0 at - osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal 2018-10-27T10:18:04.413 INFO:tasks.ceph.osd.0.smithi184.stderr:2018-10-27 10:18:04.395137 7f0c2e88ee00 -1 osd.0 0 log_to_monitors {default=true} 2018-10-27T10:18:05.818 INFO:tasks.ceph.osd.0.smithi184.stderr:2018-10-27 10:18:05.794096 7f0c16a28700 -1 osd.0 0 waiting for initial osdmap 2018-10-27T10:21:17.330 INFO:tasks.ceph.osd.0.smithi184.stderr:2018-10-27 10:21:17.326 7f608492e600 -1 Falling back to public interface 2018-10-27T10:21:17.633 INFO:tasks.ceph.osd.0.smithi184.stderr:2018-10-27 10:21:17.630 7f608492e600 -1 bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xda1180d2, expected 0x3e37199, device location [0x4000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0# 2018-10-27T10:21:17.633 INFO:tasks.ceph.osd.0.smithi184.stderr:2018-10-27 10:21:17.630 7f608492e600 -1 bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xda1180d2, expected 0x3e37199, device location [0x4000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0# 2018-10-27T10:21:17.633 INFO:tasks.ceph.osd.0.smithi184.stderr:2018-10-27 10:21:17.630 7f608492e600 -1 bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xda1180d2, expected 0x3e37199, device location [0x4000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0# 2018-10-27T10:21:17.633 INFO:tasks.ceph.osd.0.smithi184.stderr:2018-10-27 10:21:17.630 7f608492e600 -1 bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xda1180d2, expected 0x3e37199, device location [0x4000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0# 2018-10-27T10:21:17.633 INFO:tasks.ceph.osd.0.smithi184.stderr:2018-10-27 10:21:17.630 7f608492e600 -1 bluestore(/var/lib/ceph/osd/ceph-0) fsck error: #-1:7b3f43c4:::osd_superblock:0# error during read: (5) Input/output error 2018-10-27T10:21:18.157 INFO:tasks.ceph.osd.0.smithi184.stderr:2018-10-27 10:21:18.150 7f608492e600 -1 bluestore(/var/lib/ceph/osd/ceph-0) _mount fsck found 1 errors 2018-10-27T10:21:18.157 INFO:tasks.ceph.osd.0.smithi184.stderr:2018-10-27 10:21:18.150 7f608492e600 -1 osd.0 0 OSD:init: unable to mount object store 2018-10-27T10:21:18.157 INFO:tasks.ceph.osd.0.smithi184.stderr:2018-10-27 10:21:18.150 7f608492e600 -1 ** ERROR: osd init failed: (5) Input/output error 2018-10-27T10:21:18.312 INFO:tasks.ceph.osd.0.smithi184.stderr:daemon-helper: command failed with exit status 1
Reproducer arguments for teuthology: -s fs:upgrade --filter 'snaps/{clusters/3-mds.yaml conf/{client.yaml mds.yaml mon.yaml osd.yaml} objectstore-ec/bluestore-ec-root.yaml overrides/{frag_enable.yaml multimds/yes.yaml whitelist_health.yaml whitelist_rstat.yaml whitelist_wrongly_marked_down.yaml} tasks/{0-luminous.yaml 1-client.yaml 2-upgrade.yaml 3-sanity.yaml 4-client-upgrade/no.yaml 5-client-sanity.yaml 6-snap-upgrade.yaml 7-client-sanity.yaml'
Updated by Neha Ojha over 5 years ago
- Project changed from RADOS to bluestore
Here's my analysis:
reproducer: https://tracker.ceph.com/issues/36606#note-9
commit before https://github.com/ceph/ceph/pull/23103 merged:
https://github.com/ceph/ceph-ci/commits/wip-36606
pass: http://pulpito.ceph.com/nojha-2018-11-01_22:44:10-fs:upgrade-wip-36606-distro-basic-smithi/
commit with https://github.com/ceph/ceph/pull/23103 merged:
https://github.com/ceph/ceph-ci/commits/wip-fix-36606
fail: http://pulpito.ceph.com/nojha-2018-11-02_01:58:29-fs:upgrade-wip-fix-36606-distro-basic-smithi/
This indicates that the above PR either introduced or exposed this bug.
Updated by Igor Fedotov over 5 years ago
- Status changed from In Progress to Fix Under Review
Updated by Igor Fedotov over 5 years ago
Updated by Sage Weil over 5 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to mimic,luminous
Igor, we should backport this, right?
Updated by Igor Fedotov over 5 years ago
Sage, no, it's specific to Nautilus for now. We need it when/if we backport BlueFS migrate stuff.
Updated by Igor Fedotov over 5 years ago
- Status changed from Pending Backport to Resolved