Project

General

Profile

Bug #25050

osd: OSD Failed to Start In function 'int BlueStore::_do_alloc_write

Added by Yohay Azulay over 5 years ago. Updated about 5 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

OSD Crashed and it fails to start,

Debug Log can be download here: http://77.247.180.45/download/ceph-osd.22.debug.log.gz

-11> 2018-07-21 09:05:45.607 7f2bcae17700  0 osd.22 pg_epoch: 10660 pg[1.27as0( v 8376'14122 (7183'11117,8376'14122] local-lis/les=10659/10660 n=7102 ec=119/119 lis/c 10659/8374 les/c/f 10660/8375/0 10659/10659/10659) [22,14,11,3]p22
-10> 2018-07-21 09:05:45.607 7f2bcae17700 0 osd.22 pg_epoch: 10660 pg[1.27as0( v 8376'14122 (7183'11117,8376'14122] local-lis/les=10659/10660 n=7102 ec=119/119 lis/c 10659/8374 les/c/f 10660/8375/0 10659/10659/10659) [22,14,11,3]p22
-9> 2018-07-21 09:05:45.607 7f2bcae17700 0 osd.22 pg_epoch: 10660 pg[1.27as0( v 8376'14122 (7183'11117,8376'14122] local-lis/les=10659/10660 n=7102 ec=119/119 lis/c 10659/8374 les/c/f 10660/8375/0 10659/10659/10659) [22,14,11,3]p22
-8> 2018-07-21 09:05:45.607 7f2bcae17700 0 osd.22 pg_epoch: 10660 pg[1.27as0( v 8376'14122 (7183'11117,8376'14122] local-lis/les=10659/10660 n=7102 ec=119/119 lis/c 10659/8374 les/c/f 10660/8375/0 10659/10659/10659) [22,14,11,3]p22
-7> 2018-07-21 09:05:45.607 7f2bcae17700 5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 4294967295'18446744073709551615, trimmed: , trimmed_dups: , clear_divergent_priors: 0
-6> 2018-07-21 09:05:45.607 7f2bcae17700 1 get compressor lz4 = 0
-5> 2018-07-21 09:05:45.607 7f2bcae17700 1 load compressor lz4
-4> 2018-07-21 09:05:45.607 7f2bcae17700 1 add compressor lz4 0x563414ef0210
-3> 2018-07-21 09:05:45.607 7f2bcae17700 1 get compressor lz4 = 0x563414ef0210
-2> 2018-07-21 09:05:45.607 7f2bcae17700 1 load: compressor lz4 loaded and registered
-1> 2018-07-21 09:05:45.607 7f2bcae17700 1 get compressor lz4 = 0x563414ef0210
0> 2018-07-21 09:05:45.615 7f2bcae17700 -1 /build/ceph-13.2.0/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_alloc_write(BlueStore::TransContext*, BlueStore::CollectionRef, BlueStore::OnodeRef, BlueStore::WriteConte
/build/ceph-13.2.0/src/os/bluestore/BlueStore.cc: 10515: FAILED assert(r == 0)
ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7f2bef7acb82]
2: (()+0x2e1d47) [0x7f2bef7acd47]
3: (BlueStore::_do_alloc_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>, boost::intrusive_ptr<BlueStore::Onode>, BlueStore::WriteContext*)+0x3146) [0x5633d1668646]
4: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x2ae) [0x5633d167351e]
5: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0xc6) [0x5633d16740d6]
6: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0xcc8) [0x5633d1676a78]
7: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x619) [0x
8: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x82) [0x5633d126d4a2]
9: (non-virtual thunk to PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&, boost::intrusive_ptr<OpRequest>)+0x4f) [0x5633d13e24af]
10: (ECBackend::dispatch_recovery_messages(RecoveryMessages&, int)+0x93e) [0x5633d151f76e]
11: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x14c) [0x5633d1526c4c]
12: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x97) [0x5633d1411417]
13: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x6ed) [0x5633d13bfdbd]
14: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1b8) [0x5633d121add8]
15: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x5633d1493fb2]
16: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x590) [0x5633d123a550]
17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x46e) [0x7f2bef7b1abe]
18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f2bef7b3b40]
19: (()+0x76ba) [0x7f2bede386ba]
20: (clone()+0x6d) [0x7f2bed44741d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

History

#1 Updated by Yohay Azulay over 5 years ago

I disabled the LZ4 compression on both cephfs data and metadata pools and the osd is not up.

Similar symptom we had here: https://tracker.ceph.com/issues/23540, different error message.

Yohay Azulay wrote:

OSD Crashed and it fails to start,

Debug Log can be download here: http://77.247.180.45/download/ceph-osd.22.debug.log.gz

-11> 2018-07-21 09:05:45.607 7f2bcae17700 0 osd.22 pg_epoch: 10660 pg[1.27as0( v 8376'14122 (7183'11117,8376'14122] local-lis/les=10659/10660 n=7102 ec=119/119 lis/c 10659/8374 les/c/f 10660/8375/0 10659/10659/10659) [22,14,11,3]p22
-10> 2018-07-21 09:05:45.607 7f2bcae17700 0 osd.22 pg_epoch: 10660 pg[1.27as0( v 8376'14122 (7183'11117,8376'14122] local-lis/les=10659/10660 n=7102 ec=119/119 lis/c 10659/8374 les/c/f 10660/8375/0 10659/10659/10659) [22,14,11,3]p22
-9> 2018-07-21 09:05:45.607 7f2bcae17700 0 osd.22 pg_epoch: 10660 pg[1.27as0( v 8376'14122 (7183'11117,8376'14122] local-lis/les=10659/10660 n=7102 ec=119/119 lis/c 10659/8374 les/c/f 10660/8375/0 10659/10659/10659) [22,14,11,3]p22
-8> 2018-07-21 09:05:45.607 7f2bcae17700 0 osd.22 pg_epoch: 10660 pg[1.27as0( v 8376'14122 (7183'11117,8376'14122] local-lis/les=10659/10660 n=7102 ec=119/119 lis/c 10659/8374 les/c/f 10660/8375/0 10659/10659/10659) [22,14,11,3]p22
-7> 2018-07-21 09:05:45.607 7f2bcae17700 5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 4294967295'18446744073709551615, trimmed: , trimmed_dups: , clear_divergent_priors: 0
-6> 2018-07-21 09:05:45.607 7f2bcae17700 1 get compressor lz4 = 0
-5> 2018-07-21 09:05:45.607 7f2bcae17700 1 load compressor lz4
-4> 2018-07-21 09:05:45.607 7f2bcae17700 1 add compressor lz4 0x563414ef0210
-3> 2018-07-21 09:05:45.607 7f2bcae17700 1 get compressor lz4 = 0x563414ef0210
-2> 2018-07-21 09:05:45.607 7f2bcae17700 1 load: compressor lz4 loaded and registered
-1> 2018-07-21 09:05:45.607 7f2bcae17700 1 get compressor lz4 = 0x563414ef0210
0> 2018-07-21 09:05:45.615 7f2bcae17700 -1 /build/ceph-13.2.0/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_alloc_write(BlueStore::TransContext*, BlueStore::CollectionRef, BlueStore::OnodeRef, BlueStore::WriteConte
/build/ceph-13.2.0/src/os/bluestore/BlueStore.cc: 10515: FAILED assert(r == 0)

ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7f2bef7acb82]
2: (()+0x2e1d47) [0x7f2bef7acd47]
3: (BlueStore::_do_alloc_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>, boost::intrusive_ptr<BlueStore::Onode>, BlueStore::WriteContext*)+0x3146) [0x5633d1668646]
4: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x2ae) [0x5633d167351e]
5: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0xc6) [0x5633d16740d6]
6: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0xcc8) [0x5633d1676a78]
7: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x619) [0x
8: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x82) [0x5633d126d4a2]
9: (non-virtual thunk to PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&, boost::intrusive_ptr<OpRequest>)+0x4f) [0x5633d13e24af]
10: (ECBackend::dispatch_recovery_messages(RecoveryMessages&, int)+0x93e) [0x5633d151f76e]
11: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x14c) [0x5633d1526c4c]
12: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x97) [0x5633d1411417]
13: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x6ed) [0x5633d13bfdbd]
14: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1b8) [0x5633d121add8]
15: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x5633d1493fb2]
16: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x590) [0x5633d123a550]
17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x46e) [0x7f2bef7b1abe]
18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f2bef7b3b40]
19: (()+0x76ba) [0x7f2bede386ba]
20: (clone()+0x6d) [0x7f2bed44741d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

#2 Updated by Yohay Azulay over 5 years ago

I meant and now OSD is now up.

#3 Updated by Patrick Donnelly over 5 years ago

  • Project changed from Ceph to RADOS
  • Subject changed from OSD Failed to Start In function 'int BlueStore::_do_alloc_write to osd: OSD Failed to Start In function 'int BlueStore::_do_alloc_write
  • Category deleted (OSD)
  • Priority changed from Normal to High
  • Component(RADOS) BlueStore, OSD added

#4 Updated by Igor Fedotov over 5 years ago

  • Project changed from RADOS to bluestore

#5 Updated by Igor Fedotov over 5 years ago

Yohay,

it looks like LZ4 compressor is still enabled and failing. Haven't you enabled compression on per-OSD basis?

#6 Updated by Yohay Azulay over 5 years ago

Correct. The log is with LZ4 compression is enabled and the OSD fails to start.

When I disabled the lz4 it went up and stable, cluster is now running with pools compression = none for a couple of days now without any issues since compression was disabled.

Igor Fedotov wrote:

Yohay,

it looks like LZ4 compressor is still enabled and failing. Haven't you enabled compression on per-OSD basis?

#7 Updated by Igor Fedotov over 5 years ago

Mind switching to a different compression method and try again?

#8 Updated by Yohay Azulay over 5 years ago

It is already back to production without compression, I am not sure it is related to the compression type, we had similar problem with Snappy compression too.

Igor Fedotov wrote:

Mind switching to a different compression method and try again?

#9 Updated by Yohay Azulay over 5 years ago

Issue is still unresolved, OSDs keep crushing when compression is enabled, and it is not coming, only when we disable the compression and restart the osds also with Mimic 13.2.1,

We had this first with Snappy compression in Luminous 12.2.5 and when we upgraded to 13.2.0 and then to 13.2.1 and used lz4, issue was not resolved, seems like something in the compression feature is buggy unrelated to which compression is used.

This is what we were able to catch today that might help figuring out where this is coming from,

this is the message that keeps returning:
2018-09-20 02:09:49.288 7f289526e700 -1 /build/ceph-13.2.1/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_alloc_write(BlueStore::TransContext*, BlueStore::CollectionRef, BlueStore::OnodeRef, BlueStore::WriteContext*)' thread 7f289526e700 time 2018-09-20 02:09:49.263904
0> 2018-09-20 02:09:49.288 7f289526e700 -1 /build/ceph-13.2.1/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_alloc_write(BlueStore::TransContext*, BlueStore::CollectionRef, BlueStore::OnodeRef, BlueStore::WriteContext*)' thread 7f289526e700 time 2018-09-20 02:09:49.263904

and here's the full log:

2018-09-20 02:09:49.160 7f28b1caa700 1 -- 192.168.31.115:6809/1731 <== osd.1 192.168.31.111:6801/1853 137528 ==== MOSDECSubOpWrite(6.107s2 1002/980 ECSubWrite(tid=135878, reqid=client.5246
4828.1:1451103, at_version=1002'211422, trim_to=918'208389, roll_forward_to=1002'211420)) v2 ==== 1402736+0+0 (1916543509 0 0) 0x5607398b0800 con 0x560738096e00
158> 2018-09-20 02:09:49.160 7f289426c700 5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 1002'211422, trimmed: , trimmed_dups: , clear_
divergent_priors: 0
157> 2018-09-20 02:09:49.160 7f28b2cac700 5 - 192.168.31.115:6809/1731 >> 192.168.31.113:6805/1853 conn(0x560738097500 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=383 cs=37 l=0). rx
osd.16 seq 118788 0x56073e2ed800 MOSDECSubOpWrite(6.24es4 1002/987 ECSubWrite(tid=174695, reqid=client.52464828.1:1451058, at_version=0'0, trim_to=907'100836, roll_forward_to=1002'103931)) v2
156> 2018-09-20 02:09:49.160 7f28b2cac700 1 - 192.168.31.115:6809/1731 <== osd.16 192.168.31.113:6805/1853 118788 ==== MOSDECSubOpWrite(6.24es4 1002/987 ECSubWrite(tid=174695, reqid=client.524
64828.1:1451058, at_version=0'0, trim_to=907'100836, roll_forward_to=1002'103931)) v2 ==== 873+0+0 (3852236216 0 0) 0x56073e2ed800 con 0x560738097500
155> 2018-09-20 02:09:49.160 7f289fa83700 1 - 192.168.31.115:6809/1731 --> 192.168.31.113:6805/1853 -
MOSDECSubOpWriteReply(6.24es0 1002/987 ECSubWriteReply(tid=174695, last_complete=1002'103
931, committed=1, applied=1)) v2 -- 0x56071f3ab440 con 0
154> 2018-09-20 02:09:49.164 7f289426c700 1 get compressor lz4 = 0x560738442780
153> 2018-09-20 02:09:49.164 7f289fa83700 1 - 192.168.31.115:6809/1731 --> 192.168.31.111:6801/1853 - MOSDECSubOpWriteReply(6.107s0 1002/980 ECSubWriteReply(tid=135878, last_complete=1002'211
422, committed=1, applied=1)) v2 -- 0x560723047200 con 0
152> 2018-09-20 02:09:49.164 7f28b24ab700 5 - 192.168.31.115:6808/1731 >> 192.168.31.11:0/1009857432 conn(0x56073a606000 :6808 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=151 cs=1 l=1).
rx client.52464828 seq 43192 0x56073dfe5040 osd_op(client.52464828.1:1451054 6.c5eabd3b 6:dcbd57a3:::10000243572.000026a1:head [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e1002) v4
151> 2018-09-20 02:09:49.164 7f28b24ab700 1 - 192.168.31.115:6808/1731 <== client.52464828 192.168.31.11:0/1009857432 43192 ==== osd_op(client.52464828.1:1451054 6.c5eabd3b 6:dcbd57a3:::100002
43572.000026a1:head [write 0~4194304 [1@-1]] snapc 1=[] ondisk+write e1002) v4 ==== 159+0+4194304 (780947299 0 4035575411) 0x56073dfe5040 con 0x56073a606000
150> 2018-09-20 02:09:49.184 7f28b2cac700 5 - 192.168.31.115:6809/1731 >> 192.168.31.113:6805/1853 conn(0x560738097500 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=383 cs=37 l=0). rx
osd.16 seq 118789 0x56073b63d800 MOSDECSubOpWrite(6.129s3 1002/980 ECSubWrite(tid=174696, reqid=client.52464828.1:1451063, at_version=1002'210278, trim_to=918'207202, roll_forward_to=1002'210277))
v2
149> 2018-09-20 02:09:49.184 7f28b2cac700 1 - 192.168.31.115:6809/1731 <== osd.16 192.168.31.113:6805/1853 118789 ==== MOSDECSubOpWrite(6.129s3 1002/980 ECSubWrite(tid=174696, reqid=client.524
64828.1:1451063, at_version=1002'210278, trim_to=918'207202, roll_forward_to=1002'210277)) v2 ==== 1402736+0+0 (2212463013 0 0) 0x56073b63d800 con 0x560738097500
148> 2018-09-20 02:09:49.184 7f2894a6d700 5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 1002'210278, trimmed: , trimmed_dups: , clear_
divergent_priors: 0
147> 2018-09-20 02:09:49.184 7f2895a6f700 1 - 192.168.31.115:6809/1731 --> 192.168.31.111:6801/1853 - MOSDECSubOpWrite(6.13bs4 1002/987 ECSubWrite(tid=196644, reqid=client.52464828.1:1451054,
at_version=1002'213984, trim_to=992'210963, roll_forward_to=1002'213983)) v2 -- 0x560709cab800 con 0
146> 2018-09-20 02:09:49.184 7f2895a6f700 1 - 192.168.31.115:6809/1731 --> 192.168.31.112:6809/3750 -- MOSDECSubOpWrite(6.13bs2 1002/987 ECSubWrite(tid=196644, reqid=client.52464828.1:1451054,
at_version=1002'213984, trim_to=992'210963, roll_forward_to=1002'213983)) v2 -- 0x56074152d000 con 0
145> 2018-09-20 02:09:49.184 7f2895a6f700 1 - 192.168.31.115:6809/1731 --> 192.168.31.114:6807/1001865 -- MOSDECSubOpWrite(6.13bs3 1002/987 ECSubWrite(tid=196644, reqid=client.52464828.1:14510
54, at_version=1002'213984, trim_to=992'210963, roll_forward_to=1002'213983)) v2 -- 0x56073b7dc000 con 0
144> 2018-09-20 02:09:49.184 7f2895a6f700 1 - 192.168.31.115:6809/1731 --> 192.168.31.113:6804/1845 -- MOSDECSubOpWrite(6.13bs1 1002/987 ECSubWrite(tid=196644, reqid=client.52464828.1:1451054,
at_version=1002'213984, trim_to=992'210963, roll_forward_to=1002'213983)) v2 -- 0x560709d16800 con 0
143> 2018-09-20 02:09:49.188 7f2895a6f700 5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 1002'213984, trimmed: , trimmed_dups: , clear_
divergent_priors: 0
142> 2018-09-20 02:09:49.188 7f2894a6d700 1 get compressor lz4 = 0x560738442780
-141> 2018-09-20 02:09:49.188 7f28b1caa700 5 - 192.168.31.115:6809/1731 >> 192.168.31.111:6801/1853 conn(0x560738096e00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=337 cs=44 l=0). rx
osd.1 seq 137529 0x56073b237000 MOSDECSubOpWrite(6.107s2 1002/980 ECSubWrite(tid=135880, reqid=client.52464828.1:1451103, at_version=0'0, trim_to=918'208389, roll_forward_to=1002'211422)) v2
140> 2018-09-20 02:09:49.188 7f28b1caa700 1 - 192.168.31.115:6809/1731 <== osd.1 192.168.31.111:6801/1853 137529 ==== MOSDECSubOpWrite(6.107s2 1002/980 ECSubWrite(tid=135880, reqid=client.5246
4828.1:1451103, at_version=0'0, trim_to=918'208389, roll_forward_to=1002'211422)) v2 ==== 873+0+0 (2773962764 0 0) 0x56073b237000 con 0x560738096e00
139> 2018-09-20 02:09:49.188 7f28b2cac700 5 - 192.168.31.115:6809/1731 >> 192.168.31.114:6807/1001865 conn(0x560738ce1c00 :6809 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2816 cs=34 l=0
). rx osd.15 seq 126801 0x560743144800 MOSDECSubOpWrite(6.c2s4 1002/980 ECSubWrite(tid=6776444, reqid=client.52464828.1:1451116, at_version=1002'210831, trim_to=918'207828, roll_forward_to=1002'210
830)) v2
138> 2018-09-20 02:09:49.188 7f28b2cac700 1 - 192.168.31.115:6809/1731 <== osd.15 192.168.31.114:6807/1001865 126801 ==== MOSDECSubOpWrite(6.c2s4 1002/980 ECSubWrite(tid=6776444, reqid=client.
52464828.1:1451116, at_version=1002'210831, trim_to=918'207828, roll_forward_to=1002'210830)) v2 ==== 6464+0+0 (791773677 0 0) 0x560743144800 con 0x560738ce1c00
137> 2018-09-20 02:09:49.192 7f2893a6b700 5 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 1002'210831, trimmed: , trimmed_dups: , clear_
divergent_priors: 0
-136> 2018-09-20 02:09:49.192 7f289fa83700 1 - 192.168.31.115:6809/1731 --> 192.168.31.111:6801/1853 -
MOSDECSubOpWriteReply(6.107s0 1002/980 ECSubWriteReply(tid=135880, last_complete=1002'211
422, committed=1, applied=1)) v2 -- 0x5607404e7cc0 con 0
135> 2018-09-20 02:09:49.192 7f2893a6b700 1 get compressor lz4 = 0x560738442780
134> 2018-09-20 02:09:49.192 7f2895a6f700 1 get compressor lz4 = 0x560738442780
-133> 2018-09-20 02:09:49.196 7f289fa83700 1 - 192.168.31.115:6809/1731 --> 192.168.31.114:6807/1001865 -
MOSDECSubOpWriteReply(6.c2s0 1002/980 ECSubWriteReply(tid=6776444, last_complete=1002'
210831, committed=1, applied=1)) v2 -- 0x56073a125600 con 0
132> 2018-09-20 02:09:49.200 7f289fa83700 1 - 192.168.31.115:6809/1731 --> 192.168.31.113:6805/1853 -- MOSDECSubOpWriteReply(6.129s0 1002/980 ECSubWriteReply(tid=174696, last_complete=1002'210
278, committed=1, applied=1)) v2 -- 0x5607211be900 con 0
131> 2018-09-20 02:09:49.208 7f28b24ab700 5 - 192.168.31.115:6809/1731 >> 192.168.31.113:6804/1845 conn(0x560737f6a000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=322 cs=31 l=0). rx
osd.17 seq 117211 0x560743c32900 MOSDECSubOpWriteReply(6.13bs0 1002/987 ECSubWriteReply(tid=196644, last_complete=1002'213984, committed=1, applied=1)) v2
130> 2018-09-20 02:09:49.208 7f28b24ab700 1 - 192.168.31.115:6809/1731 <== osd.17 192.168.31.113:6804/1845 117211 ==== MOSDECSubOpWriteReply(6.13bs0 1002/987 ECSubWriteReply(tid=196644, last_c
omplete=1002'213984, committed=1, applied=1)) v2 ==== 95+0+0 (3771332997 0 0) 0x560743c32900 con 0x560737f6a000
129> 2018-09-20 02:09:49.208 7f28b1caa700 5 - 192.168.31.115:6809/1731 >> 192.168.31.111:6801/1853 conn(0x560738096e00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=337 cs=44 l=0). rx
osd.1 seq 137530 0x5607404e7cc0 MOSDECSubOpWriteReply(6.13bs0 1002/987 ECSubWriteReply(tid=196644, last_complete=1002'213984, committed=1, applied=1)) v2
128> 2018-09-20 02:09:49.208 7f28b1caa700 1 - 192.168.31.115:6809/1731 <== osd.1 192.168.31.111:6801/1853 1
2018-09-20 02:09:49.288 7f289526e700 -1 /build/ceph-13.2.1/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_alloc_write(BlueStore::TransContext*, BlueStore::CollectionRef, BlueStore::OnodeRef, BlueStore::WriteContext*)' thread 7f289526e700 time 2018-09-20 02:09:49.263904
0> 2018-09-20 02:09:49.288 7f289526e700 -1 /build/ceph-13.2.1/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_alloc_write(BlueStore::TransContext*, BlueStore::CollectionRef, BlueStore::OnodeRef, BlueStore::WriteContext*)' thread 7f289526e700 time 2018-09-20 02:09:49.263904

Yohay Azulay wrote:

It is already back to production without compression, I am not sure it is related to the compression type, we had similar problem with Snappy compression too.

Igor Fedotov wrote:

Mind switching to a different compression method and try again?

#10 Updated by Josh Durgin over 5 years ago

  • Status changed from New to Need More Info
  • Priority changed from High to Normal

#11 Updated by Igor Fedotov about 5 years ago

  • Status changed from Need More Info to Duplicate

Also available in: Atom PDF