Project

General

Profile

Actions

Bug #50503

closed

tasks/libcephsqlite throws "std::out_of_range"

Added by Kefu Chai about 3 years ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Normal
Target version:
% Done:

0%

Source:
Q/A
Tags:
backport_processed
Backport:
pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

rados/basic/{ceph clusters/{fixed-2 openstack} mon_election/classic msgr-failures/many msgr/async-v1only objectstore/bluestore-low-osd-mem-target rados supported-random-distro$/{rhel_8} tasks/libcephsqlite}

2021-04-23T19:54:40.360 INFO:tasks.workunit.client.0.smithi092.stderr:+ sqlite3 -cmd '.output /dev/null' -cmd '.load libcephsqlite.so' -cmd 'pragma journal_mode = PERSIST' -cmd '.open file:///cephsqlite:test_libcephsqlite.sh/baz.db?vfs=ceph' -cmd '.output stdout'
2021-04-23T19:55:00.357 INFO:tasks.workunit.client.0.smithi092.stdout:[1]+ 38807 Running                 sqlite3 -cmd '.output /dev/null' -cmd '.load libcephsqlite.so' -cmd 'pragma journal_mode = PERSIST' -cmd ".open file:///$pool:$ns/baz.db?vfs=ceph" -cmd '.output stdout
' <<< "$a" &
2021-04-23T19:55:00.358 INFO:tasks.workunit.client.0.smithi092.stderr:+ jobs -l
2021-04-23T19:55:00.359 INFO:tasks.workunit.client.0.smithi092.stderr:++ jobs -p
2021-04-23T19:55:00.360 INFO:tasks.workunit.client.0.smithi092.stderr:+ kill -STOP -- 38807
2021-04-23T19:55:00.360 INFO:tasks.workunit.client.0.smithi092.stderr:+ sleep 45
2021-04-23T19:55:45.362 INFO:tasks.workunit.client.0.smithi092.stderr:+ date
2021-04-23T19:55:45.362 INFO:tasks.workunit.client.0.smithi092.stdout:Fri Apr 23 19:55:45 UTC 2021
2021-04-23T19:55:45.363 INFO:tasks.workunit.client.0.smithi092.stderr:++ jobs -p
2021-04-23T19:55:45.363 INFO:tasks.workunit.client.0.smithi092.stderr:+ kill -CONT -- 38807
2021-04-23T19:55:45.363 INFO:tasks.workunit.client.0.smithi092.stderr:+ sleep 10
2021-04-23T19:55:45.373 INFO:tasks.workunit.client.0.smithi092.stderr:Error: near line 3: disk I/O error
2021-04-23T19:55:45.374 INFO:tasks.workunit.client.0.smithi092.stderr:terminate called after throwing an instance of 'std::out_of_range'
2021-04-23T19:55:45.374 INFO:tasks.workunit.client.0.smithi092.stderr:  what():  map::at

/a/kchai-2021-04-23_14:12:53-rados-wip-kefu-testing-2021-04-23-2026-distro-basic-smithi/6068437

i reran the failed test in hope to reproduce it, but it didn't surface again: https://pulpito.ceph.com/kchai-2021-04-24_04:07:09-rados-wip-kefu-testing-2021-04-23-2026-distro-basic-smithi/6070020/


Related issues 2 (0 open2 closed)

Has duplicate cephsqlite - Bug #51372: pacific: libcephsqlite: segmentation faultDuplicatePatrick Donnelly

Actions
Copied to cephsqlite - Backport #51431: pacific: tasks/libcephsqlite throws "std::out_of_range"RejectedActions
Actions #1

Updated by Kefu Chai about 3 years ago

Patrick, could you take a look? not sure how reproducible this issue is though.

Actions #2

Updated by Patrick Donnelly about 3 years ago

So the test "succeeded" in that we expect things errors but not an Abort/core dump:

2021-04-23T19:55:45.373 INFO:tasks.workunit.client.0.smithi092.stderr:Error: near line 3: disk I/O error
2021-04-23T19:55:45.374 INFO:tasks.workunit.client.0.smithi092.stderr:terminate called after throwing an instance of 'std::out_of_range'
2021-04-23T19:55:45.374 INFO:tasks.workunit.client.0.smithi092.stderr:  what():  map::at
2021-04-23T19:55:55.365 INFO:tasks.workunit.client.0.smithi092.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test_libcephsqlite.sh: line 129: 38807 Aborted                 (core dumped) sqlite3 -cmd '.output /dev/null' -cmd '.load libcephsqlite.so' -cmd 'pragma journal_mode = PERSIST' -cmd ".open file:///$pool:$ns/baz.db?vfs=ceph" -cmd '.output stdout' <<< "$a" 

The client log has:

2021-04-23T19:55:45.370+0000 7f02c77fe700 -1 client.4614: SimpleRADOSStriper: lock_keeper_main: baz.db: lock renewal failed: (2) No such file or directory
2021-04-23T19:55:45.370+0000 7f02edf02740  1 -- 172.21.15.92:0/1739936527 --> v1:172.21.15.92:6812/33373 -- osd_op(unknown.0.0:16177 3.19 3:9d4380ad:test_libcephsqlite.sh::baz.db.000000000000000f:head [write 2797568~4096 in=4096b] snapc 0=[] ondisk+write+known_if_redirected e21) v8 -- 0x55d703bef830 con 0x7f02cc073350
2021-04-23T19:55:45.370+0000 7f02edf02740 10 client.4614: SimpleRADOSStriper: write: baz.db:  dirty size: 65716224                                                                             2021-04-23T19:55:45.370+0000 7f02edf02740  5 cephsqlite: Write: (client.4614) [cephsqlite:test_libcephsqlite.sh/baz.db] = 4096
2021-04-23T19:55:45.370+0000 7f02edf02740  5 cephsqlite: Write: (client.4614) [cephsqlite:test_libcephsqlite.sh/baz.db] 65716224~4096                                                          2021-04-23T19:55:45.370+0000 7f02edf02740  5 client.4614: SimpleRADOSStriper: write: baz.db: 65716224~4096
2021-04-23T19:55:45.370+0000 7f02edf02740  5 cephsqlite: Write: (client.4614) [cephsqlite:test_libcephsqlite.sh/baz.db] write failed: (108) Cannot send after transport endpoint shutdown
2021-04-23T19:55:45.371+0000 7f02d5ffb700  1 -- 172.21.15.92:0/1739936527 <== mon.1 v1:172.21.15.194:6789/0 1 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (unknown 2172502797 0 0) 0x7f02bc010080 con 0x55d7034d36e0                                                                                                                                                                2021-04-23T19:55:45.371+0000 7f02d5ffb700  1 -- 172.21.15.92:0/1739936527 --> v1:172.21.15.194:6789/0 -- auth(proto 2 148 bytes epoch 0) v1 -- 0x55d7034cd110 con 0x55d7034d36e0
2021-04-23T19:55:45.371+0000 7f02edf02740  5 cephsqlite: DeviceCharacteristics: (client.4614) [cephsqlite:test_libcephsqlite.sh/baz.db]
2021-04-23T19:55:45.371+0000 7f02edf02740  5 cephsqlite: Close: (client.4614) [cephsqlite:test_libcephsqlite.sh/baz.db-journal]                                                                2021-04-23T19:55:45.371+0000 7f02edf02740  5 client.4614: SimpleRADOSStriper: ~SimpleRADOSStriper: baz.db-journal:
2021-04-23T19:55:45.371+0000 7f02edf02740  5 cephsqlite: Unlock: (client.4614) [cephsqlite:test_libcephsqlite.sh/baz.db] 0                                                                     2021-04-23T19:55:45.371+0000 7f02edf02740  5 client.4614: SimpleRADOSStriper: unlock: baz.db:
2021-04-23T19:55:45.371+0000 7f02edf02740  5 cephsqlite: Unlock: (client.4614) [cephsqlite:test_libcephsqlite.sh/baz.db] failed: -108                                                          2021-04-23T19:55:45.371+0000 7f02d5ffb700  1 -- 172.21.15.92:0/1739936527 <== mon.1 v1:172.21.15.194:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 249+0+0 (unknown 304898747 0 0) 0x55d7034cd110 con 0x55d7034d36e0
2021-04-23T19:55:45.371+0000 7f02d5ffb700  1 -- 172.21.15.92:0/1739936527 >> v1:172.21.15.92:6790/0 conn(0x55d7034d6ff0 legacy=0x7f02bc0053e0 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2021-04-23T19:55:45.371+0000 7f02d5ffb700  1 -- 172.21.15.92:0/1739936527 >> v1:172.21.15.92:6789/0 conn(0x7f02bc007020 legacy=0x7f02bc0094b0 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2021-04-23T19:55:45.371+0000 7f02d5ffb700  1 -- 172.21.15.92:0/1739936527 --> v1:172.21.15.194:6789/0 -- mon_subscribe({config=0+,mgrmap=0+,monmap=2+,osdmap=22}) v3 -- 0x7f02bc012620 con 0x55d7034d36e0
2021-04-23T19:55:45.371+0000 7f02d5ffb700  1 -- 172.21.15.92:0/1739936527 <== mon.1 v1:172.21.15.194:6789/0 3 ==== mon_map magic: 0 v1 ==== 308+0+0 (unknown 2178755549 0 0) 0x7f02c8001400 con 0x55d7034d36e0                                                                                                                                                                                2021-04-23T19:55:45.371+0000 7f02edf02740  5 cephsqlite: FileControl: (client.4614) [cephsqlite:test_libcephsqlite.sh/baz.db] 20, 0x7ffc1ef2a684                                               2021-04-23T19:55:45.371+0000 7f02edf02740  5 cephsqlite: DeviceCharacteristics: (client.4614) [cephsqlite:test_libcephsqlite.sh/baz.db]
2021-04-23T19:55:45.371+0000 7f02edf02740  5 cephsqlite: Unlock: (client.4614) [cephsqlite:test_libcephsqlite.sh/baz.db] 0                                                                     2021-04-23T19:55:45.371+0000 7f02edf02740  5 client.4614: SimpleRADOSStriper: unlock: baz.db:
2021-04-23T19:55:45.371+0000 7f02edf02740  5 cephsqlite: Unlock: (client.4614) [cephsqlite:test_libcephsqlite.sh/baz.db] failed: -108
2021-04-23T19:55:45.371+0000 7f02edf02740  5 cephsqlite: Close: (client.4614) [cephsqlite:test_libcephsqlite.sh/baz.db]
2021-04-23T19:55:45.371+0000 7f02edf02740  5 client.4614: SimpleRADOSStriper: ~SimpleRADOSStriper: baz.db:
2021-04-23T19:55:45.371+0000 7f02edf02740  5 client.4614: SimpleRADOSStriper: unlock: baz.db:
2021-04-23T19:55:45.371+0000 7f02d5ffb700  1 -- 172.21.15.92:0/1739936527 <== mon.1 v1:172.21.15.194:6789/0 4 ==== config(0 keys) v1 ==== 4+0+0 (unknown 0 0 0) 0x7f02c80057c0 con 0x55d7034d36e0

So the lock keeper noticed it lost the lock and "self blocklists". The weird thing here is libcephsqlite.cc/SimpleRADOSStriper.* do not use std::map so this failure must be in librados somewhere? Unfortunately I suspect sqlite3 installs a signal handler for SIGABRT so we don't get the typical dump in the client log. Hard to tell...

Actions #3

Updated by Patrick Donnelly almost 3 years ago

  • Status changed from New to Fix Under Review
  • Target version set to v17.0.0
  • Source set to Q/A
  • Backport set to pacific
  • Pull request ID set to 42035

Probably fixed by this PR.

Actions #4

Updated by Patrick Donnelly almost 3 years ago

  • Related to Bug #51372: pacific: libcephsqlite: segmentation fault added
Actions #5

Updated by Kefu Chai almost 3 years ago

  • Related to deleted (Bug #51372: pacific: libcephsqlite: segmentation fault)
Actions #6

Updated by Kefu Chai almost 3 years ago

  • Has duplicate Bug #51372: pacific: libcephsqlite: segmentation fault added
Actions #7

Updated by Patrick Donnelly almost 3 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #8

Updated by Backport Bot almost 3 years ago

  • Copied to Backport #51431: pacific: tasks/libcephsqlite throws "std::out_of_range" added
Actions #9

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed
Actions #10

Updated by Patrick Donnelly about 1 year ago

  • Status changed from Pending Backport to Resolved
Actions

Also available in: Atom PDF