Bug #41030

open

[bug] KRBD fails when Ceph is set to a wrong compression algorithm

Added by Alibek Amaev over 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: librbd
Target version: -
% Done: 0%
Source:
Tags: krbd cli
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite: rbd
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

[Tue Jul 30 14:48:35 2019] rbd: rbd4: capacity 85899345920 features 0x1
[Tue Jul 30 14:48:36 2019] rbd: rbd5: capacity 85899345920 features 0x1
[Tue Jul 30 14:48:39 2019] EXT4-fs (rbd4): write access unavailable, skipping orphan cleanup
[Tue Jul 30 14:48:39 2019] EXT4-fs (rbd4): mounted filesystem without journal. Opts: noload
[Tue Jul 30 14:48:39 2019] EXT4-fs (rbd5): mounted filesystem with ordered data mode. Opts: (null)
[Tue Jul 30 15:08:57 2019] audit: type=1400 audit(1564488540.553:3047): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-285_</var/lib/lxc>" name="/" pid=2885736 comm="(ionclean)" flags="rw, rslave" 
[Tue Jul 30 15:33:46 2019] rbd: rbd6: capacity 85899345920 features 0x1
[Tue Jul 30 15:33:47 2019] rbd: rbd7: capacity 85899345920 features 0x1
[Tue Jul 30 15:33:50 2019] EXT4-fs (rbd6): write access unavailable, skipping orphan cleanup
[Tue Jul 30 15:33:50 2019] EXT4-fs (rbd6): mounted filesystem without journal. Opts: noload
[Tue Jul 30 15:33:50 2019] EXT4-fs (rbd7): mounted filesystem with ordered data mode. Opts: (null)
[Tue Jul 30 15:38:57 2019] audit: type=1400 audit(1564490340.755:3048): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-285_</var/lib/lxc>" name="/" pid=3411740 comm="(ionclean)" flags="rw, rslave" 
[Tue Jul 30 15:44:25 2019] EXT4-fs error (device rbd6): ext4_lookup:1575: inode #2232031: comm rsync: deleted inode referenced: 2231808
[Tue Jul 30 15:44:25 2019] 
                           Assertion failure in rbd_queue_workfn() at line 4035:

                               rbd_assert(op_type == OBJ_OP_READ || rbd_dev->spec->snap_id == CEPH_NOSNAP);

[Tue Jul 30 15:44:25 2019] ------------[ cut here ]------------
[Tue Jul 30 15:44:25 2019] kernel BUG at drivers/block/rbd.c:4035!
[Tue Jul 30 15:44:25 2019] invalid opcode: 0000 [#1] SMP NOPTI
[Tue Jul 30 15:44:25 2019] Modules linked in: ceph xt_tcpmss nf_conntrack_netlink xt_ipvs xt_REDIRECT nf_nat_redirect xt_nat vxlan ip6_udp_tunnel udp_tunnel binfmt_misc tcp_diag inet_diag rbd libceph act_police cls_basic sch_ingress sch_htb veth nfsv3 nfs fscache ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_mac ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment xt_addrtype xt_multiport xt_set xt_mark ip_set_hash_net ip_set xfs iptable_filter xt_CHECKSUM xt_tcpudp iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 8021q garp mrp softdog nfnetlink_log nfnetlink zfs(PO) zunicode(PO) zavl(PO) icp(PO) nls_iso8859_1 ipmi_ssif zcommon(PO) znvpair(PO) spl(O) vhost_net vhost tap ib_iser rdma_cm iw_cm
[Tue Jul 30 15:44:25 2019]  ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi xt_conntrack xfrm_user xfrm_algo ip_vs_wrr ip_vs_wlc ip_vs_sh ip_vs_sed ip_vs_rr ip_vs_nq ip_vs_lc ip_vs_lblcr ip_vs_lblc ip_vs_ftp nf_nat ip_vs_dh amd64_edac_mod edac_mce_amd kvm_amd ast kvm irqbypass ttm crct10dif_pclmul drm_kms_helper crc32_pclmul ghash_clmulni_intel pcbc drm aesni_intel snd_pcm aes_x86_64 i2c_algo_bit fb_sys_fops snd_timer crypto_simd snd glue_helper syscopyarea soundcore joydev input_leds sysfillrect cryptd pcspkr sysimgblt ccp shpchp k10temp ipmi_si ipmi_devintf mac_hid ipmi_msghandler 8250_dw ip_vs nfsd nf_conntrack libcrc32c auth_rpcgss overlay nfs_acl lockd bonding grace sunrpc ip_tables x_tables autofs4 hid_generic usbkbd usbmouse usbhid hid ixgbe dca qla2xxx ptp mpt3sas nvme_fc raid_class pps_core
[Tue Jul 30 15:44:25 2019]  nvme_fabrics scsi_transport_sas mdio scsi_transport_fc ahci libahci i2c_piix4
[Tue Jul 30 15:44:25 2019] CPU: 89 PID: 2547266 Comm: kworker/89:1 Tainted: P           O     4.15.18-9-pve #1
[Tue Jul 30 15:44:25 2019] Hardware name: Supermicro Super Server/H11DST-B, BIOS 1.1a 10/04/2018
[Tue Jul 30 15:44:25 2019] Workqueue: rbd rbd_queue_workfn [rbd]
[Tue Jul 30 15:44:25 2019] RIP: 0010:rbd_queue_workfn+0x462/0x4f0 [rbd]
[Tue Jul 30 15:44:25 2019] RSP: 0018:ffffbac02e65fe18 EFLAGS: 00010286
[Tue Jul 30 15:44:25 2019] RAX: 0000000000000086 RBX: ffff9f81d60d2000 RCX: 0000000000000006
[Tue Jul 30 15:44:25 2019] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff9fc6dfc56490
[Tue Jul 30 15:44:25 2019] RBP: ffffbac02e65fe60 R08: 0000000000000000 R09: 0000000000068c16
[Tue Jul 30 15:44:25 2019] R10: 0000000000000198 R11: 00000000ffffffff R12: ffff9fbce2785ec0
[Tue Jul 30 15:44:25 2019] R13: ffff9ee2bce62e80 R14: 0000000000000000 R15: 0000000000001000
[Tue Jul 30 15:44:25 2019] FS:  0000000000000000(0000) GS:ffff9fc6dfc40000(0000) knlGS:0000000000000000
[Tue Jul 30 15:44:25 2019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Tue Jul 30 15:44:25 2019] CR2: 00007ffc87f17000 CR3: 0000007579bf4000 CR4: 00000000003406e0
[Tue Jul 30 15:44:25 2019] Call Trace:
[Tue Jul 30 15:44:25 2019]  ? __schedule+0x3e8/0x870
[Tue Jul 30 15:44:25 2019]  process_one_work+0x1e0/0x400
[Tue Jul 30 15:44:25 2019]  worker_thread+0x4b/0x420
[Tue Jul 30 15:44:25 2019]  kthread+0x105/0x140
[Tue Jul 30 15:44:25 2019]  ? process_one_work+0x400/0x400
[Tue Jul 30 15:44:25 2019]  ? kthread_create_worker_on_cpu+0x70/0x70
[Tue Jul 30 15:44:25 2019]  ret_from_fork+0x22/0x40
[Tue Jul 30 15:44:25 2019] Code: 00 48 83 78 20 fe 0f 84 6a fc ff ff 48 c7 c1 a8 68 93 c0 ba c3 0f 00 00 48 c7 c6 b0 7c 93 c0 48 c7 c7 90 5d 93 c0 e8 2e d7 3b d5 <0f> 0b 48 8b 75 d0 4d 89 d0 44 89 f1 4c 89 fa 48 89 df 4c 89 55 
[Tue Jul 30 15:44:25 2019] RIP: rbd_queue_workfn+0x462/0x4f0 [rbd] RSP: ffffbac02e65fe18
[Tue Jul 30 15:44:25 2019] ---[ end trace abac896a7f25ee00 ]---
[Tue Jul 30 16:02:41 2019] rbd: rbd8: capacity 85899345920 features 0x1
[Tue Jul 30 16:02:41 2019] EXT4-fs (rbd8): write access unavailable, skipping orphan cleanup
[Tue Jul 30 16:02:41 2019] EXT4-fs (rbd8): mounted filesystem without journal. Opts: noload
[Tue Jul 30 16:08:57 2019] audit: type=1400 audit(1564492140.992:3049): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-285_</var/lib/lxc>" name="/" pid=3994573 comm="(ionclean)" flags="rw, rslave" 
[Tue Jul 30 16:13:57 2019] EXT4-fs error (device rbd8): ext4_lookup:1575: inode #2232031: comm tar: deleted inode referenced: 2231884
[Tue Jul 30 16:13:57 2019] 
                           Assertion failure in rbd_queue_workfn() at line 4035:

                               rbd_assert(op_type == OBJ_OP_READ || rbd_dev->spec->snap_id == CEPH_NOSNAP);

[Tue Jul 30 16:13:57 2019] ------------[ cut here ]------------
[Tue Jul 30 16:13:57 2019] kernel BUG at drivers/block/rbd.c:4035!
[Tue Jul 30 16:13:57 2019] invalid opcode: 0000 [#2] SMP NOPTI
[Tue Jul 30 16:13:57 2019] Modules linked in: ceph xt_tcpmss nf_conntrack_netlink xt_ipvs xt_REDIRECT nf_nat_redirect xt_nat vxlan ip6_udp_tunnel udp_tunnel binfmt_misc tcp_diag inet_diag rbd libceph act_police cls_basic sch_ingress sch_htb veth nfsv3 nfs fscache ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_mac ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment xt_addrtype xt_multiport xt_set xt_mark ip_set_hash_net ip_set xfs iptable_filter xt_CHECKSUM xt_tcpudp iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 8021q garp mrp softdog nfnetlink_log nfnetlink zfs(PO) zunicode(PO) zavl(PO) icp(PO) nls_iso8859_1 ipmi_ssif zcommon(PO) znvpair(PO) spl(O) vhost_net vhost tap ib_iser rdma_cm iw_cm
[Tue Jul 30 16:13:57 2019]  ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi xt_conntrack xfrm_user xfrm_algo ip_vs_wrr ip_vs_wlc ip_vs_sh ip_vs_sed ip_vs_rr ip_vs_nq ip_vs_lc ip_vs_lblcr ip_vs_lblc ip_vs_ftp nf_nat ip_vs_dh amd64_edac_mod edac_mce_amd kvm_amd ast kvm irqbypass ttm crct10dif_pclmul drm_kms_helper crc32_pclmul ghash_clmulni_intel pcbc drm aesni_intel snd_pcm aes_x86_64 i2c_algo_bit fb_sys_fops snd_timer crypto_simd snd glue_helper syscopyarea soundcore joydev input_leds sysfillrect cryptd pcspkr sysimgblt ccp shpchp k10temp ipmi_si ipmi_devintf mac_hid ipmi_msghandler 8250_dw ip_vs nfsd nf_conntrack libcrc32c auth_rpcgss overlay nfs_acl lockd bonding grace sunrpc ip_tables x_tables autofs4 hid_generic usbkbd usbmouse usbhid hid ixgbe dca qla2xxx ptp mpt3sas nvme_fc raid_class pps_core
[Tue Jul 30 16:13:57 2019]  nvme_fabrics scsi_transport_sas mdio scsi_transport_fc ahci libahci i2c_piix4
[Tue Jul 30 16:13:57 2019] CPU: 104 PID: 1135782 Comm: kworker/104:2 Tainted: P      D    O     4.15.18-9-pve #1
[Tue Jul 30 16:13:57 2019] Hardware name: Supermicro Super Server/H11DST-B, BIOS 1.1a 10/04/2018
[Tue Jul 30 16:13:57 2019] Workqueue: rbd rbd_queue_workfn [rbd]
[Tue Jul 30 16:13:57 2019] RIP: 0010:rbd_queue_workfn+0x462/0x4f0 [rbd]
[Tue Jul 30 16:13:57 2019] RSP: 0018:ffffbac1cb29be18 EFLAGS: 00010286
[Tue Jul 30 16:13:57 2019] RAX: 0000000000000086 RBX: ffff9f03f5145000 RCX: 0000000000000000
[Tue Jul 30 16:13:57 2019] RDX: 0000000000000000 RSI: ffffa045dfc16498 RDI: ffffa045dfc16498
[Tue Jul 30 16:13:57 2019] RBP: ffffbac1cb29be60 R08: 0000000000000101 R09: 0000000000068c3a
[Tue Jul 30 16:13:57 2019] R10: 00000000000000e8 R11: 00000000ffffffff R12: ffffa04360da3a40
[Tue Jul 30 16:13:57 2019] R13: ffff9eea21925100 R14: 0000000000000000 R15: 0000000000001000
[Tue Jul 30 16:13:57 2019] FS:  0000000000000000(0000) GS:ffffa045dfc00000(0000) knlGS:0000000000000000
[Tue Jul 30 16:13:57 2019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Tue Jul 30 16:13:57 2019] CR2: 000055e545d29108 CR3: 0000006872f78000 CR4: 00000000003406e0
[Tue Jul 30 16:13:57 2019] Call Trace:
[Tue Jul 30 16:13:57 2019]  ? __schedule+0x3e8/0x870
[Tue Jul 30 16:13:57 2019]  process_one_work+0x1e0/0x400
[Tue Jul 30 16:13:57 2019]  worker_thread+0x4b/0x420
[Tue Jul 30 16:13:57 2019]  kthread+0x105/0x140
[Tue Jul 30 16:13:57 2019]  ? process_one_work+0x400/0x400
[Tue Jul 30 16:13:57 2019]  ? kthread_create_worker_on_cpu+0x70/0x70
[Tue Jul 30 16:13:57 2019]  ret_from_fork+0x22/0x40
[Tue Jul 30 16:13:57 2019] Code: 00 48 83 78 20 fe 0f 84 6a fc ff ff 48 c7 c1 a8 68 93 c0 ba c3 0f 00 00 48 c7 c6 b0 7c 93 c0 48 c7 c7 90 5d 93 c0 e8 2e d7 3b d5 <0f> 0b 48 8b 75 d0 4d 89 d0 44 89 f1 4c 89 fa 48 89 df 4c 89 55 
[Tue Jul 30 16:13:57 2019] RIP: rbd_queue_workfn+0x462/0x4f0 [rbd] RSP: ffffbac1cb29be18
[Tue Jul 30 16:13:57 2019] ---[ end trace abac896a7f25ee01 ]---

The root of the problem appears to be:

Jul 23 06:29:42 lpr11a ceph-osd[6716]: 2019-07-23 06:29:42.101815 7fa7ab8fd700 -1 load failed dlopen(): "/usr/lib/ceph/compressor/libceph_none.so: cannot open shared object file: No such file or directory" or "/usr/lib/ceph/libceph_none.so: cannot open shared object file: No such file or directory" 
Jul 23 06:29:42 lpr11a ceph-osd[6716]: 2019-07-23 06:29:42.101883 7fa7ab8fd700 -1 create cannot load compressor of type none

ceph osd pool set <POOL> compression_algorithm <ALGO>

If <ALGO> is set to none, KRBD fails. Example:
ceph osd pool set bigpool compression_algorithm none

Of course, this value is wrong, because no such compressor plugin exists:

# ls -1 /usr/lib/ceph/compressor/
libceph_lz4.so
libceph_lz4.so.2
libceph_lz4.so.2.0.0
libceph_snappy.so
libceph_snappy.so.2
libceph_snappy.so.2.0.0
libceph_zlib.so
libceph_zlib.so.2
libceph_zlib.so.2.0.0
libceph_zstd.so
libceph_zstd.so.2
libceph_zstd.so.2.0.0

But the CLI (or librbd, or the pool API) should prevent setting a wrong parameter.
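
To illustrate the kind of check being asked for, here is a minimal sketch in Python. It is not how Ceph validates this setting. The helper names `available_compressors` and `check_algorithm` are hypothetical, and the sketch assumes that the plugin directory listed above is readable on the host where the check runs and that plugin files follow the `libceph_<name>.so[.N...]` naming seen in that listing.

```python
import re
from pathlib import Path

# Matches plugin file names such as libceph_zlib.so or libceph_zlib.so.2.0.0
# (naming pattern assumed from the directory listing in this report).
_PLUGIN_RE = re.compile(r"libceph_([A-Za-z0-9]+)\.so(?:\.\d+)*")


def available_compressors(plugin_dir="/usr/lib/ceph/compressor"):
    """Return the set of compressor names that have a plugin file in plugin_dir."""
    directory = Path(plugin_dir)
    if not directory.is_dir():
        return set()
    names = set()
    for entry in directory.iterdir():
        match = _PLUGIN_RE.fullmatch(entry.name)
        if match:
            names.add(match.group(1))
    return names


def check_algorithm(algo, plugin_dir="/usr/lib/ceph/compressor"):
    """Hypothetical pre-flight check: reject names without a matching plugin."""
    known = available_compressors(plugin_dir)
    if algo not in known:
        raise ValueError(
            f"no compressor plugin for {algo!r}; found: {sorted(known)}"
        )
    return algo
```

A wrapper script could call `check_algorithm("none")` before running `ceph osd pool set bigpool compression_algorithm none` and stop on the `ValueError`. Note that the plugins actually matter on the OSD hosts, so a check run on a client host is only an approximation.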

Currently, when rbd fails, the whole system hangs in I/O wait (sync), and only a reset (hard or via magic SysRq) resolves this state.
