Bug #2393: objecter: dropping messages (old connection being used) - Ceph

closed

Added by Yehuda Sadeh almost 12 years ago. Updated almost 12 years ago.

Status: Duplicate
Priority: Urgent
Source: Community (dev)

Description

2012-05-10 02:01:02.593957 7f9cda1b0700  0 -- [2607:f298:4:2243::5795]:0/1019484 send_message dropped message ping v1 because of no pipe on con 0x7f9cc459e660
2012-05-10 02:01:02.593970 7f9cda1b0700  1 -- [2607:f298:4:2243::5795]:0/1019484 --> [2607:f298:4:2243::7066]:6807/2644 -- ping v1 -- ?+0 0x7f9cbc3d2f40 con 0x7f9cc4144db0

2012-05-10 02:01:02.668326 7f9cdc2b5700  1 -- [2607:f298:4:2243::5795]:0/1019484 <== osd.290 [2607:f298:4:2243::7066]:6807/2644 46191 ==== watch-notify(c=1 v=1 i=740219 opcode=1) v1 ==== 543+0+0 (3452665825 0 0) 0x7f9cb43ac330 con 0x7f9cc4144db0

...
2012-05-10 02:01:04.271134 7f9c9cff9700  0 -- [2607:f298:4:2243::5795]:0/1019484 send_message dropped message osd_op(client.92928.0:753509 .dir.93029.10938 [call rgw.bucket_prepare_op] 5.6182dfe9) v4 because of no pipe on con 0x7f9cc459e660

Note that we do have an open connection to that OSD: the one the watch-notify message arrived on (con 0x7f9cc4144db0). Even so, requests are still being sent over the broken connection (con 0x7f9cc459e660).

Related issues: 1 (0 open, 1 closed)


Updated by Yehuda Sadeh almost 12 years ago

One more thing to note: all subsequent ping messages show the same issue (dropped message).


Updated by Sage Weil almost 12 years ago

Priority changed from Normal to Urgent
Source changed from Development to Community (dev)


Updated by Yehuda Sadeh almost 12 years ago

I think the ping is a red herring. In tick() we iterate over all the regular sessions and then over all the lingering sessions to determine the ping destinations. The only way to get both the "dropped message" log line and an outgoing ping is to have two different connection objects to the same destination: one regular and one lingering. Since we do see the ping going out (at least the message is logged), I assume the regular one is broken.
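A minimal sketch of the two-session hypothesis, assuming a simplified model: the `Connection`, `Session`, and `PingResult` types and the free function `tick()` below are hypothetical stand-ins, not Ceph's actual Objecter or Messenger API. A regular session and a lingering session to the same OSD each hold their own connection object, so one ping can be dropped while the other goes out.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Hypothetical, simplified model; not Ceph's real Objecter/Messenger types.
struct Connection {
  int id;         // stands in for the "con 0x..." pointer in the log
  bool has_pipe;  // false models "dropped message ... because of no pipe"
};

struct Session {
  int osd;
  std::shared_ptr<Connection> con;
};

struct PingResult {
  int osd;
  int con_id;
  bool sent;  // false means the message was dropped
};

// Models tick(): ping every regular session, then every lingering session.
// Each session sends over its own connection object.
std::vector<PingResult> tick(const std::map<int, Session>& sessions,
                             const std::map<int, Session>& linger_sessions) {
  std::vector<PingResult> results;
  for (const auto& entry : sessions) {
    const Session& s = entry.second;
    assert(s.con && "session must have a connection");
    results.push_back({s.osd, s.con->id, s.con->has_pipe});
  }
  for (const auto& entry : linger_sessions) {
    const Session& s = entry.second;
    assert(s.con && "session must have a connection");
    results.push_back({s.osd, s.con->id, s.con->has_pipe});
  }
  return results;
}
```

With a broken regular connection and a healthy lingering one to osd.290, this model produces one dropped ping and one sent ping on two different connection objects, which is the log pattern the hypothesis tries to explain.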

Actions

Copy link

Updated by Yehuda Sadeh almost 12 years ago

Ah, I need to look at the logs again. Nothing in this excerpt shows that both requests were supposed to go to the same OSD.


Updated by Yehuda Sadeh almost 12 years ago

Objecter::ms_handle_reset() was not acquiring a lock. ms_handle_reset() racing with any operation that grabs the session can lead to exactly this kind of issue, where session->con points at a bad (old) connection.
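A minimal sketch of the locking discipline, assuming the fix is for the reset handler to take the same lock that every session user takes. `ObjecterModel`, `open_session()`, `send_op()`, `con_for()`, and the single-argument `ms_handle_reset()` here are hypothetical simplifications, not Ceph's real class or signatures.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical, simplified model of the locking fix; not Ceph's real classes.
struct Connection {
  int id;
  bool has_pipe = true;  // cleared when the messenger resets the link
};

struct Session {
  int osd;
  std::shared_ptr<Connection> con;
};

class ObjecterModel {
 public:
  void open_session(int osd) {
    std::lock_guard<std::mutex> l(lock_);
    sessions_[osd] = Session{osd, std::make_shared<Connection>(Connection{next_id_++})};
  }

  // Messenger reset callback. It takes the same lock as send_op(), so swapping
  // session->con cannot interleave with a sender reading it.
  void ms_handle_reset(const std::shared_ptr<Connection>& con) {
    assert(con && "reset callback needs a connection");
    std::lock_guard<std::mutex> l(lock_);
    for (auto& entry : sessions_) {
      Session& s = entry.second;
      if (s.con == con) {
        s.con->has_pipe = false;  // the old link is dead
        s.con = std::make_shared<Connection>(Connection{next_id_++});  // reopen
      }
    }
  }

  // Returns the id of the connection used, or -1 if the message was dropped.
  int send_op(int osd) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = sessions_.find(osd);
    if (it == sessions_.end() || !it->second.con->has_pipe) return -1;
    return it->second.con->id;
  }

  std::shared_ptr<Connection> con_for(int osd) {
    std::lock_guard<std::mutex> l(lock_);
    return sessions_.at(osd).con;
  }

 private:
  std::mutex lock_;
  std::map<int, Session> sessions_;
  int next_id_ = 1;
};
```

Because send_op() and ms_handle_reset() serialize on the same mutex, a sender sees either the old connection (before the reset) or the reopened one (after it), never a session whose con is mid-update.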
