Bug #4851 (closed)

leveldb: hang on leveldb::DBImpl::MakeRoomForWrite(bool)

Added by Wido den Hollander about 11 years ago. Updated about 11 years ago.

Status: Resolved
Priority: Urgent
Category: Monitor
% Done: 0%
Source: Community (user)
Severity: 3 - minor

Description

While testing with the next branch (50e58b9f49382d690f5a22af80f6981f1c12d4c3) I stumbled upon the problem that creating, for example, an auth key took a long time:

ceph auth get-or-create XXXX

That took 5 minutes, but I wasn't able to record it with logs, so later on I tried removing the same key, which took 4 minutes:

ceph auth del client.openstack
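
For reference, such a delay can be timed with the shell's time builtin; a minimal example reusing the same key name as above:

time ceph auth del client.openstack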

In this case I have 3 monitors, and this command went to "mon2":

2013-04-29 14:22:15.299854 7fd523711700  1 -- 10.23.24.9:6789/0 <== client.? 10.23.24.4:0/29190 5 ==== mon_command(auth del client.openstack v 0) v1 ==== 73+0+0 (2842674342 0 0) 0xc2af000 con 0xad8a580
2013-04-29 14:22:15.299991 7fd523711700  0 mon.mon2@1(peon) e1 handle_command mon_command(auth del client.openstack v 0) v1
2013-04-29 14:22:15.376928 7fd523711700 10 mon.mon2@1(peon).auth v764 preprocess_query mon_command(auth del client.openstack v 0) v1 from client.? 10.23.24.4:0/29190
2013-04-29 14:22:15.396140 7fd523711700 10 mon.mon2@1(peon) e1 forward_request 828 request mon_command(auth del client.openstack v 0) v1
2013-04-29 14:22:15.396161 7fd523711700  1 -- 10.23.24.9:6789/0 --> mon.0 10.23.24.8:6789/0 -- forward(mon_command(auth del client.openstack v 0) v1) to leader v1 -- ?+0 0xc2ab300
2013-04-29 14:26:09.397449 7fd523f12700 10 mon.mon2@1(leader) e1  requeue for self tid 828 mon_command(auth del client.openstack v 0) v1
2013-04-29 14:26:09.432538 7fd523f12700  0 mon.mon2@1(leader) e1 handle_command mon_command(auth del client.openstack v 0) v1
2013-04-29 14:26:12.245758 7fd523711700 10 mon.mon2@1(leader).auth v764 preprocess_query mon_command(auth del client.openstack v 0) v1 from client.? 10.23.24.4:0/29190
2013-04-29 14:26:12.261239 7fd523711700 10 mon.mon2@1(leader).auth v764 prepare_update mon_command(auth del client.openstack v 0) v1 from client.? 10.23.24.4:0/29190

Here you see that "mon2" forwarded the request to mon1 (10.23.24.8).
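For anyone digging through the attached logs, the relevant lines above can be extracted with zgrep, for example:

zgrep 'client.openstack' mon2.log.gz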

Looking in mon1's logs I see:

2013-04-29 14:27:19.907293 7fdf507ac700  1 -- 10.23.24.8:6789/0 <== mon.1 10.23.24.9:6789/0 5337 ==== forward(mon_command(auth del client.openstack v 0) v1) to leader v1 ==== 343+0+0 (3205198275 0 0) 0x116d9000 con 0x2e409a0
2013-04-29 14:27:19.907431 7fdf507ac700  0 mon.mon1@0(leader) e1 handle_command mon_command(auth del client.openstack v 0) v1

Looking at the timestamps this is very weird. How can mon1 report that this happened at 14:27 while mon2 reports it happened between 14:22 and 14:26?

I verified that the clocks on all 3 monitors are synced; there is no difference between them.
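
For anyone reproducing this, clock agreement across the monitors can be double-checked along these lines (assuming ntpd is in use; chrony users would run chronyc tracking instead):

ntpq -p        # peer offsets on each monitor host should be within a few milliseconds
ceph health    # Ceph itself warns with "clock skew detected" when monitor clocks disagree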

The logs from both mon1 and mon2 are attached.


Files

mon1.log.gz (911 KB) - Wido den Hollander, 04/29/2013 06:01 AM
mon2.log.gz (840 KB) - Wido den Hollander, 04/29/2013 06:01 AM
ceph-mon.londensekaai.log.gz (540 KB) - Single monitor setup logs - Wido den Hollander, 05/01/2013 01:30 PM
bt.txt (21.7 KB) - Wido den Hollander, 05/01/2013 02:18 PM
bt-mon2.txt (24.3 KB) - Backtrace mon2 02-05-2013 - Wido den Hollander, 05/02/2013 10:56 AM
bt-mon3.txt (15.8 KB) - Backtrace mon3 02-05-2013 - Wido den Hollander, 05/02/2013 10:56 AM
bt-mon1.txt (8.52 KB) - Backtrace mon1 03-05-2013 - Wido den Hollander, 05/03/2013 02:25 AM
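
The bt*.txt attachments above are full thread backtraces; a minimal sketch of how such a dump can be captured from a hung ceph-mon with gdb (run on the affected monitor host):

gdb -p $(pidof ceph-mon) -batch -ex 'thread apply all bt' > bt.txt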

Related issues: 1 (0 open, 1 closed)

Related to Ceph - Bug #4896: leveldb: stuck on leveldb::DBImpl::CompactRange() (Resolved, 05/02/2013)
