Bug #10262

osd/osd_types.h: 2944: FAILED assert(rwstate.empty())

Added by Daniel Schneller over 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: firefly,giant,dumpling
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

During the night of 2014-12-06 our cluster (4 nodes with 12x4TB spinning disks each, Firefly 0.80.7.1 on Ubuntu 14.04.1) suffered a heavy series of OSD crashes, leading to a temporary service outage.

We had seen 6 OSDs crash two days earlier (during the daytime), which triggered a service outage because the resulting backfilling overloaded our resources.
Once the backfilling completed about 15 minutes later, the system returned to normal.

As a reaction, we configured the cluster as follows (all values had previously been at their defaults) to dampen the effect of backfilling on clients:

"osd_max_backfills": "1",
"osd_recovery_threads": "1",
"osd_recovery_max_active": "1",
"osd_client_op_priority": "63",
"osd_recovery_op_priority": "1",
"mon_osd_down_out_interval": "3600"

Moreover, we had disabled scrubbing and deep-scrubbing a few weeks earlier, because they also caused resource overload, leading to service disruptions.

As a test we manually stopped a single OSD to verify that backfilling would kick in without maxing out I/O during backfill/recovery.
As this seemed to be OK, we re-added the OSD and left the system to itself, with no apparent issues.
This was on 2014-12-05 around 22:30.

Later that same night apparently all OSDs but one crashed, most of them multiple times in rather short succession; only a single OSD survived the night.

The monitor log shows the problem from about 2014-12-06 01:50 onwards.
Until roughly 03:00 you can see the OSDs go down and come back, several times each. A few (3 or 4) crashed just once, while OSD 31 crashed 29 times, sometimes several times per minute.
The OSD logs report the process crashes with an assertion failure.

Here is a snippet from a few of the OSD logs showing the timestamps of the assertion failures.

? ceph-crash ack ".*0>.* osd/osd_types.h: In function"
...
ceph-osd.47.log.1
11066: 0> 2014-12-06 02:25:42.033932 7f224b3c6700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f224b3c6700 time 2014-12-06 02:25:42.029427
21615: 0> 2014-12-06 02:26:26.214972 7f2d6d5a1700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f2d6d5a1700 time 2014-12-06 02:26:26.210515
33087: 0> 2014-12-06 02:27:21.782378 7f44ec1f3700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f44ec1f3700 time 2014-12-06 02:27:21.777532
43550: 0> 2014-12-06 02:27:57.453031 7f147b5d8700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f147b5d8700 time 2014-12-06 02:27:57.447857
54389: 0> 2014-12-06 02:28:29.833710 7f9e9ec54700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f9e9ec54700 time 2014-12-06 02:28:29.829346
66085: 0> 2014-12-06 03:05:37.169162 7f6e56870700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f6e56870700 time 2014-12-06 03:05:37.165169

ceph-osd.5.log.1
12714: 0> 2014-12-06 03:04:09.834480 7f0974642700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f0974642700 time 2014-12-06 03:04:09.829061
23071: 0> 2014-12-06 03:07:04.519364 7f1cd63d8700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f1cd63d8700 time 2014-12-06 03:07:04.514771

ceph-osd.6.log.1
11475: 0> 2014-12-06 02:32:56.936653 7f639d1e0700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f639d1e0700 time 2014-12-06 02:32:56.911329
21785: 0> 2014-12-06 02:34:02.206510 7fb59bb79700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7fb59bb79700 time 2014-12-06 02:34:02.202372
32128: 0> 2014-12-06 02:35:27.505358 7f95251df700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f95251df700 time 2014-12-06 02:35:27.500700
53112: 0> 2014-12-06 02:43:32.662823 7f243ff7c700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f243ff7c700 time 2014-12-06 02:43:32.658366
63832: 0> 2014-12-06 03:04:09.816319 7fb41ad8e700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7fb41ad8e700 time 2014-12-06 03:04:09.811033
74201: 0> 2014-12-06 03:04:35.684241 7ff246094700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7ff246094700 time 2014-12-06 03:04:35.680116
84478: 0> 2014-12-06 03:05:37.196785 7fcc48a19700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7fcc48a19700 time 2014-12-06 03:05:37.192426
...

The assertion looks like this:

0> 2014-12-06 01:49:51.750163 7f08d6ade700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f08d6ade700 time 2014-12-06 01:49:51.605411
osd/osd_types.h: 2944: FAILED assert(rwstate.empty())

ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
1: (std::tr1::_Sp_counted_base_impl<ObjectContext*, SharedPtrRegistry<hobject_t, ObjectContext>::OnRemoval, (__gnu_cxx::_Lock_policy)2>::_M_dispose()+0x207) [0x82cf37]
2: (std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()+0x39) [0x664449]
3: (ReplicatedPG::OpContext::~OpContext()+0x111) [0x824501]
4: (intrusive_ptr_release(ReplicatedPG::RepGather*)+0x2b) [0x7a490b]
5: (C_OSD_RepopApplied::~C_OSD_RepopApplied()+0x25) [0x824c25]
6: (ReplicatedBackend::op_applied(ReplicatedBackend::InProgressOp*)+0x1ec) [0x909e5c]
7: (Context::complete(int)+0x9) [0x654799]
8: (ReplicatedPG::BlessedContext::finish(int)+0x94) [0x818cb4]
9: (Context::complete(int)+0x9) [0x654799]
10: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x94) [0x65ca04]
11: (Context::complete(int)+0x9) [0x654799]
12: (Finisher::finisher_thread_entry()+0x1b8) [0x9a3018]
13: (()+0x8182) [0x7f08e3686182]
14: (clone()+0x6d) [0x7f08e1df9fbd]

2014-12-06 01:49:51.950615 7f08d6ade700 -1 *** Caught signal (Aborted) **
in thread 7f08d6ade700

ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
1: /usr/bin/ceph-osd() [0x980a3f]
2: (()+0x10340) [0x7f08e368e340]
3: (gsignal()+0x39) [0x7f08e1d35bb9]
4: (abort()+0x148) [0x7f08e1d38fc8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f08e26416b5]
6: (()+0x5e836) [0x7f08e263f836]
7: (()+0x5e863) [0x7f08e263f863]
8: (()+0x5eaa2) [0x7f08e263faa2]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1f2) [0xa63382]
10: (std::tr1::_Sp_counted_base_impl<ObjectContext*, SharedPtrRegistry<hobject_t, ObjectContext>::OnRemoval, (__gnu_cxx::_Lock_policy)2>::_M_dispose()+0x207) [0x82cf37]
11: (std::tr1::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()+0x39) [0x664449]
12: (ReplicatedPG::OpContext::~OpContext()+0x111) [0x824501]
13: (intrusive_ptr_release(ReplicatedPG::RepGather*)+0x2b) [0x7a490b]
14: (C_OSD_RepopApplied::~C_OSD_RepopApplied()+0x25) [0x824c25]
15: (ReplicatedBackend::op_applied(ReplicatedBackend::InProgressOp*)+0x1ec) [0x909e5c]
16: (Context::complete(int)+0x9) [0x654799]
17: (ReplicatedPG::BlessedContext::finish(int)+0x94) [0x818cb4]
18: (Context::complete(int)+0x9) [0x654799]
19: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x94) [0x65ca04]
20: (Context::complete(int)+0x9) [0x654799]
21: (Finisher::finisher_thread_entry()+0x1b8) [0x9a3018]
22: (()+0x8182) [0x7f08e3686182]
23: (clone()+0x6d) [0x7f08e1df9fbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
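
For context on what the assert is guarding: each ObjectContext tracks outstanding read/write lock state (rwstate), and its destructor asserts that this state is empty, i.e. that every lock taken on the object context was released before the last reference was dropped. Below is a heavily simplified, hypothetical C++ sketch of that pattern; it is not the actual osd/osd_types.h code, and the names RWStateSketch and ObjectContextSketch are invented for illustration only.

// Heavily simplified, hypothetical sketch -- NOT the real Ceph ObjectContext.
// It only illustrates why a leaked lock makes the destructor assert fire.
#include <cassert>
#include <list>

struct RWStateSketch {                  // invented stand-in for ObjectContext::RWState
  int count = 0;                        // outstanding lock holders
  std::list<void*> waiters;             // ops queued waiting for the lock
  bool empty() const { return count == 0 && waiters.empty(); }
};

struct ObjectContextSketch {            // invented stand-in for ObjectContext
  RWStateSketch rwstate;

  void get_write_lock() { ++rwstate.count; }
  void put_write_lock() { --rwstate.count; }

  ~ObjectContextSketch() {
    // Corresponds to "osd/osd_types.h: 2944: FAILED assert(rwstate.empty())":
    // if some code path took a lock and never released it, this aborts.
    assert(rwstate.empty());
  }
};

int main() {
  ObjectContextSketch obc;
  obc.get_write_lock();
  // Omitting the matching put_write_lock() here would abort the program in
  // the destructor, analogous to the OSD crash shown above.
  obc.put_write_lock();
  return 0;
}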

This is a list of all OSDs, sorted by their process launch time (2nd column), taken after the problem had sorted itself out.

node03: 02:30 27:52 /usr/bin/ceph-osd --cluster=ceph -i 44 -f
node02: 02:30 33:56 /usr/bin/ceph-osd --cluster=ceph -i 20 -f
node03: 02:31 27:28 /usr/bin/ceph-osd --cluster=ceph -i 46 -f
node04: 02:32 30:51 /usr/bin/ceph-osd --cluster=ceph -i 33 -f
node02: 02:38 33:03 /usr/bin/ceph-osd --cluster=ceph -i 16 -f
node02: 02:39 26:37 /usr/bin/ceph-osd --cluster=ceph -i 12 -f
node04: 02:40 26:18 /usr/bin/ceph-osd --cluster=ceph -i 35 -f
node02: 02:45 32:30 /usr/bin/ceph-osd --cluster=ceph -i 22 -f
node03: 02:45 33:49 /usr/bin/ceph-osd --cluster=ceph -i 45 -f
node01: 02:49 26:50 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
node02: 02:49 27:43 /usr/bin/ceph-osd --cluster=ceph -i 19 -f
node01: 03:01 22:59 /usr/bin/ceph-osd --cluster=ceph -i 8 -f
node03: 03:01 31:53 /usr/bin/ceph-osd --cluster=ceph -i 38 -f
node02: 03:01 33:21 /usr/bin/ceph-osd --cluster=ceph -i 21 -f
node01: 03:02 26:54 /usr/bin/ceph-osd --cluster=ceph -i 10 -f
node01: 03:03 31:10 /usr/bin/ceph-osd --cluster=ceph -i 11 -f
node01: 03:04 25:25 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
node01: 03:04 26:02 /usr/bin/ceph-osd --cluster=ceph -i 2 -f
node03: 03:05 23:18 /usr/bin/ceph-osd --cluster=ceph -i 47 -f
node03: 03:05 26:15 /usr/bin/ceph-osd --cluster=ceph -i 43 -f
node02: 03:05 27:15 /usr/bin/ceph-osd --cluster=ceph -i 18 -f
node01: 03:05 29:25 /usr/bin/ceph-osd --cluster=ceph -i 6 -f
node04: 03:06 27:00 /usr/bin/ceph-osd --cluster=ceph -i 29 -f
node02: 03:07 22:16 /usr/bin/ceph-osd --cluster=ceph -i 23 -f
node03: 03:07 23:11 /usr/bin/ceph-osd --cluster=ceph -i 39 -f
node04: 03:07 24:01 /usr/bin/ceph-osd --cluster=ceph -i 28 -f
node03: 03:07 24:16 /usr/bin/ceph-osd --cluster=ceph -i 37 -f
node04: 03:07 24:23 /usr/bin/ceph-osd --cluster=ceph -i 32 -f
node01: 03:07 25:57 /usr/bin/ceph-osd --cluster=ceph -i 9 -f
node03: 03:07 26:09 /usr/bin/ceph-osd --cluster=ceph -i 42 -f
node01: 03:07 26:25 /usr/bin/ceph-osd --cluster=ceph -i 3 -f
node04: 03:07 26:33 /usr/bin/ceph-osd --cluster=ceph -i 24 -f
node02: 03:07 27:13 /usr/bin/ceph-osd --cluster=ceph -i 13 -f
node04: 03:07 27:41 /usr/bin/ceph-osd --cluster=ceph -i 26 -f
node01: 03:07 27:59 /usr/bin/ceph-osd --cluster=ceph -i 7 -f
node01: 03:07 28:20 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
node04: 03:07 28:52 /usr/bin/ceph-osd --cluster=ceph -i 27 -f
node02: 03:07 29:31 /usr/bin/ceph-osd --cluster=ceph -i 14 -f
node03: 03:07 29:50 /usr/bin/ceph-osd --cluster=ceph -i 40 -f
node04: 03:07 30:12 /usr/bin/ceph-osd --cluster=ceph -i 31 -f
node04: 03:07 30:52 /usr/bin/ceph-osd --cluster=ceph -i 34 -f
node04: 03:07 30:55 /usr/bin/ceph-osd --cluster=ceph -i 30 -f
node03: 03:07 34:13 /usr/bin/ceph-osd --cluster=ceph -i 36 -f
node01: 03:08 21:27 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
node02: 03:08 22:10 /usr/bin/ceph-osd --cluster=ceph -i 17 -f
node03: 03:08 23:41 /usr/bin/ceph-osd --cluster=ceph -i 41 -f
node04: 03:08 25:50 /usr/bin/ceph-osd --cluster=ceph -i 25 -f
node02: Nov08 1726:42 /usr/bin/ceph-osd --cluster=ceph -i 15 -f

All logs (OSDs and Monitor) are available as a ZIP here: https://public.centerdevice.de/f42bbc5d-f5e8-45f6-ae50-a56e52974aaa
Should you not be able to get it, the bug might have hit again, because the system serving that link is itself hosted on this Ceph cluster ;-)

We have found a bug report with a similar assertion failure, but it may or may not be related: http://tracker.ceph.com/issues/8361

Luckily this happened during the night with low load on the system; with clients actively working on it, this could have been a total outage (as with the 6 OSDs before).

Is there any configuration error etc. that we could have made that would cause cluster-wide crashes like this?
Do the logs show any kind of “trigger” for this cascade of crashes? Something we did or that “just happened” at that time?
As far as I can tell we did not make any adjustments or changes apart from the reduced backfilling priorities mentioned above. Could this have triggered this problem? I don’t immediately see how, because we only made these changes after 6 OSDs crashed nearly simultaneously.

Any clues would be very much appreciated, as this is a production system that had been running fine for a month and all of a sudden started to exhibit this instability.

2014-12-14_osd.1.assert.zip (2.16 MB) Craig Lewis, 12/14/2014 10:53 AM


Related issues

Duplicated by Ceph - Bug #10697: dumpling: valgrind leak ReplicatedPG::get_object_context Duplicate 01/30/2015

Associated revisions

Revision 02fae9fc (diff)
Added by Sage Weil over 9 years ago

osd: handle no-op write with snapshot case

If we have a transaction that does something to the object but it !exists
both before and after, we will continue through the write path. If the
snapdir object already exists, and we try to create it again, we will
leak a snapdir obc and lock and later crash on an assert when the obc
is destroyed:

0> 2014-12-06 01:49:51.750163 7f08d6ade700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f08d6ade700 time 2014-12-06 01:49:51.605411
osd/osd_types.h: 2944: FAILED assert(rwstate.empty())

Fix is to not recreate the snapdir if it already exists.

Fixes: #10262
Signed-off-by: Sage Weil <>
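
To make the described fix concrete, here is a minimal, hypothetical C++ sketch of the guard it adds; this is not the actual ReplicatedPG code, and the names PGSketch, snapdir_exists() and create_snapdir() are invented for illustration.

// Hypothetical sketch of the fix -- NOT the real ReplicatedPG implementation.
#include <memory>

struct SnapdirObc {};                        // invented stand-in for the snapdir object context

struct PGSketch {
  std::shared_ptr<SnapdirObc> snapdir_obc;   // non-null once the snapdir exists

  bool snapdir_exists() const { return snapdir_obc != nullptr; }

  void create_snapdir() {
    // In the real code, creating the snapdir also grabs a reference and a
    // write lock on its object context, which must be released later.
    snapdir_obc = std::make_shared<SnapdirObc>();
  }

  void handle_noop_write_with_snapshot() {
    // Before the fix: the snapdir was created unconditionally, so an
    // already-existing snapdir picked up an extra obc reference/lock that
    // was never released and tripped assert(rwstate.empty()) on teardown.
    // After the fix: only create the snapdir if it does not already exist.
    if (!snapdir_exists()) {
      create_snapdir();
    }
  }
};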

Revision 8d29a4a2 (diff)
Added by Sage Weil about 9 years ago

osd: handle no-op write with snapshot case

If we have a transaction that does something to the object but it !exists
both before and after, we will continue through the write path. If the
snapdir object already exists, and we try to create it again, we will
leak a snapdir obc and lock and later crash on an assert when the obc
is destroyed:

0> 2014-12-06 01:49:51.750163 7f08d6ade700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f08d6ade700 time 2014-12-06 01:49:51.605411
osd/osd_types.h: 2944: FAILED assert(rwstate.empty())

Fix is to not recreate the snapdir if it already exists.

Fixes: #10262
Signed-off-by: Sage Weil <>
(cherry picked from commit 02fae9fc54c10b5a932102bac43f32199d4cb612)

Revision aa038684 (diff)
Added by Sage Weil about 9 years ago

osd: handle no-op write with snapshot case

If we have a transaction that does something to the object but it !exists
both before and after, we will continue through the write path. If the
snapdir object already exists, and we try to create it again, we will
leak a snapdir obc and lock and later crash on an assert when the obc
is destroyed:

0> 2014-12-06 01:49:51.750163 7f08d6ade700 -1 osd/osd_types.h: In function 'ObjectContext::~ObjectContext()' thread 7f08d6ade700 time 2014-12-06 01:49:51.605411
osd/osd_types.h: 2944: FAILED assert(rwstate.empty())

Fix is to not recreate the snapdir if it already exists.

Fixes: #10262
Signed-off-by: Sage Weil <>
(cherry picked from commit 02fae9fc54c10b5a932102bac43f32199d4cb612)

Revision a0eff325 (diff)
Added by Loic Dachary about 9 years ago

osd: handle no-op write with snapshot case

If we have a transaction that does something to the object but it !exists
both before and after, we will continue through the write path. If the
snapdir object already exists, and we try to create it again, we will
leak a snapdir obc.

Fix is to not recreate the snapdir if it already exists.

Fixes: #10262
Signed-off-by: Sage Weil <>
Signed-off-by: Loic Dachary <>

(cherry picked from commit 02fae9fc54c10b5a932102bac43f32199d4cb612)

Conflicts:
src/osd/ReplicatedPG.cc
src/test/librados/snapshots.cc

History

#1 Updated by Sage Weil over 9 years ago

  • Subject changed from All OSDs in the cluster crashing at the same time to osd/osd_types.h: 2944: FAILED assert(rwstate.empty())

#2 Updated by Sage Weil over 9 years ago

Is this an erasure or replicated pool? Are you using cache tiering?

#3 Updated by Daniel Schneller over 9 years ago

Replicated Pool. No cache tiering.

#4 Updated by Samuel Just over 9 years ago

Can you reproduce? The logs don't have much information, I need it reproduced with

debug osd = 20
debug filestore = 20
debug ms = 1

-Sam

#5 Updated by Daniel Schneller over 9 years ago

Unfortunately I doubt it. From what I have read, cranking up the logs that much would eat up the available log space extremely quickly.
Also, we have no idea what caused it to happen two days in a row after a month of production use, and many more months before that during which the cluster was already used for shared home volumes (CephFS), OpenStack images, volumes for test VMs, etc.
Is there any other way I could be of help? Any other parameters, environment descriptions or queries I could run against the cluster to provide more insight?
E.g. we are creating nightly snapshots (but not currently deleting them).

#6 Updated by Craig Lewis over 9 years ago

I'm not (yet) having an outage, but I have had three different crashes in an hour. They all match this assert and stack trace. I have a couple gig available for logging.

I ran `ceph tell osd.\* injectargs '--debug_osd 20 --debug_filestore 20 --debug_ms=1'`. Unfortunately, nothing crashed, and I had to go back to normal logging after 2 hours. I continued to have no problems after disabling logging.

Still, I think I can be somewhat helpful.

I just enabled nightly snapshots yesterday. All 17 pools have a single snapshot.

I just upgraded from kernel 3.13.0-30 to 3.13.0-43 yesterday, including a reboot on all nodes. Uptime is 19 hours.

I made the same config changes that Daniel made, but a long time ago. I do have scrubbing enabled. Otherwise my setup is very similar: Ubuntu 14.04.1, 4-node cluster, 36 OSDs, 3x replicated pools, no tiering. Tunables are legacy. Traffic is low. I upgraded to Firefly about 7 weeks ago and haven't had any problems since then. The cluster originally started as Cuttlefish.

noout is set, looks like I forgot to unset that after the reboots. I left it set while debugging, but I've unset it now that I'm done.

#7 Updated by Sage Weil over 9 years ago

  • Status changed from New to 12

Hi Craig, can you upload the logs to cephdrop? I can email you the credentials if you don't have them yet. Thanks!

#8 Updated by Craig Lewis over 9 years ago

It happened again today. I had a mix of slow OSDs being marked down, and OSDs throwing this assertion.

I ran `ceph tell osd.\* injectargs '--debug_osd 20 --debug_filestore 20 --debug_ms=1'`, and captured an assert from osd.1. The zip is only 2 MB, so I'll just upload it here.

The assert happened at 2014-12-14 10:44:34.865418.

#9 Updated by Sage Weil over 9 years ago

  • Status changed from 12 to In Progress
  • Assignee set to Sage Weil

#10 Updated by Sage Weil over 9 years ago

  • Status changed from In Progress to Fix Under Review

#11 Updated by Sage Weil over 9 years ago

btw it's the combination of pool snapshots and rgw that triggers this, which is why almost nobody else has seen it (including us).

#12 Updated by Craig Lewis over 9 years ago

Do you need beta testers? It looks like you don't. I'm still seeing a few asserts per day, but still no service outages.

Does the probability of triggering this bug increase as I add more snapshots? If so, I can stop taking new snaps until it's backported to Firefly.

#13 Updated by Samuel Just over 9 years ago

  • Status changed from Fix Under Review to Resolved

#14 Updated by Daniel Schneller over 9 years ago

Glad to read you have found it with Craig's logs :)

Just now I (accidentally) looked into my work email while on vacation. It would indeed be helpful to know whether creating snapshots, or doing anything else with them (like listing them), increases the odds of OSDs crashing, so that we can reduce the risk (especially over the holidays) until the fix has been packaged.

#15 Updated by Daniel Schneller about 9 years ago

We got hit by this issue heavily last week on a production cluster where all the OSDs crashed over and over with this bug (0.80.7). As this is set to Resolved now, is there any information on when the fix will be incorporated into an update release (ideally for Firefly)? From reading the Firefly 0.80.8 release notes it does not seem to be included. Or is it included, and the section header "Notable changes" is a hint that the list does not contain each and every fixed bug?

A short clarification would be much appreciated.

#16 Updated by Sage Weil about 9 years ago

  • Backport set to firefly

Daniel Schneller wrote:

We got hit by this issue heavily last week on a production cluster where all the OSDs crashed over and over with this bug (0.80.7). As this is set to Resolved now, is there any information on when the fix will be incorporated into an update release (ideally for Firefly)? From reading the Firefly 0.80.8 release notes it does not seem to be included. Or is it included, and the section header "Notable changes" is a hint that the list does not contain each and every fixed bug?

A short clarification would be much appreciated.

Thanks for the note. I've backported it to the firefly branch. It will be in 0.80.9 (provided there aren't issues that come up in QA).

#17 Updated by Loïc Dachary about 9 years ago

  • Status changed from Resolved to Pending Backport
  • Assignee changed from Sage Weil to Samuel Just
  • Backport changed from firefly to firefly,giant

#18 Updated by Loïc Dachary about 9 years ago

  • Backport changed from firefly,giant to firefly,giant,dumpling

#19 Updated by Loïc Dachary about 9 years ago

  • Status changed from Pending Backport to Resolved
