Project

General

Profile

Actions

Bug #11204

closed

ceph-objectstore-tool: "error (24) Too many open files not handled on operation" during import of a large number of objects

Added by Yuri Weinstein about 9 years ago. Updated about 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
David Zafman
Category:
teuthology
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: http://pulpito.front.sepia.ceph.com/teuthology-2015-03-22_09:17:25-powercycle-hammer-distro-basic-multi/
Jobs: ['815837', '815865']
Logs for one: http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-22_09:17:25-powercycle-hammer-distro-basic-multi/815865/

 015-03-22T09:29:11.936 INFO:teuthology.orchestra.run.burnupi10.stdout:Write 4df9daee/100000001a6.00000000/head//2
2015-03-22T09:29:11.937 INFO:teuthology.orchestra.run.burnupi10.stdout:Write e96b4bee/10000002010.00000000/head//2
2015-03-22T09:29:11.938 INFO:tasks.workunit.client.0.plana80.stdout:linux-3.2.9/arch/powerpc/include/asm/atomic.h
2015-03-22T09:29:11.939 INFO:teuthology.orchestra.run.burnupi10.stderr:os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7f3a863f7700 time 2015-03-22 09:29:11.938804
2015-03-22T09:29:11.939 INFO:teuthology.orchestra.run.burnupi10.stderr:os/FileStore.cc: 2757: FAILED assert(0 == "unexpected error")
2015-03-22T09:29:11.939 INFO:teuthology.orchestra.run.burnupi10.stderr: ceph version 0.93-164-g120300f (120300febebdc34c23fc24633edfdc8faf65e93c)
2015-03-22T09:29:11.939 INFO:teuthology.orchestra.run.burnupi10.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbd239b]
2015-03-22T09:29:11.939 INFO:teuthology.orchestra.run.burnupi10.stderr: 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x99aeb2]
2015-03-22T09:29:11.940 INFO:teuthology.orchestra.run.burnupi10.stderr: 3: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x9a1544]
2015-03-22T09:29:11.940 INFO:teuthology.orchestra.run.burnupi10.stderr: 4: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) [0x9a16ca]
2015-03-22T09:29:11.940 INFO:teuthology.orchestra.run.burnupi10.stderr: 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbc36ce]
2015-03-22T09:29:11.940 INFO:teuthology.orchestra.run.burnupi10.stderr: 6: (ThreadPool::WorkThread::entry()+0x10) [0xbc4770]
2015-03-22T09:29:11.941 INFO:teuthology.orchestra.run.burnupi10.stderr: 7: (()+0x8182) [0x7f3a90848182]
2015-03-22T09:29:11.941 INFO:teuthology.orchestra.run.burnupi10.stderr: 8: (clone()+0x6d) [0x7f3a8c73a38d]
2015-03-22T09:29:11.941 INFO:teuthology.orchestra.run.burnupi10.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-03-22T09:29:11.942 INFO:tasks.workunit.client.0.plana80.stdout:linux-3.2.9/arch/powerpc/include/asm/auxvec.h
2015-03-22T09:29:11.942 INFO:teuthology.orchestra.run.burnupi10.stderr:terminate called after throwing an instance of 'ceph::FailedAssertion'
2015-03-22T09:29:11.942 INFO:teuthology.orchestra.run.burnupi10.stderr:*** Caught signal (Aborted) **
2015-03-22T09:29:11.942 INFO:teuthology.orchestra.run.burnupi10.stderr: in thread 7f3a863f7700
2015-03-22T09:29:11.944 INFO:teuthology.orchestra.run.burnupi10.stderr: ceph version 0.93-164-g120300f (120300febebdc34c23fc24633edfdc8faf65e93c)
2015-03-22T09:29:11.944 INFO:teuthology.orchestra.run.burnupi10.stderr: 1: ceph-objectstore-tool() [0xaf09ea]
2015-03-22T09:29:11.944 INFO:teuthology.orchestra.run.burnupi10.stderr: 2: (()+0x10340) [0x7f3a90850340]
2015-03-22T09:29:11.944 INFO:teuthology.orchestra.run.burnupi10.stderr: 3: (gsignal()+0x39) [0x7f3a8c675f89]
2015-03-22T09:29:11.945 INFO:teuthology.orchestra.run.burnupi10.stderr: 4: (abort()+0x148) [0x7f3a8c679398]
2015-03-22T09:29:11.945 INFO:teuthology.orchestra.run.burnupi10.stderr: 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f3a8cf816b5]
2015-03-22T09:29:11.945 INFO:teuthology.orchestra.run.burnupi10.stderr: 6: (()+0x5e836) [0x7f3a8cf7f836]
2015-03-22T09:29:11.945 INFO:teuthology.orchestra.run.burnupi10.stderr: 7: (()+0x5e863) [0x7f3a8cf7f863]
2015-03-22T09:29:11.945 INFO:teuthology.orchestra.run.burnupi10.stderr: 8: (()+0x5eaa2) [0x7f3a8cf7faa2]
2015-03-22T09:29:11.945 INFO:teuthology.orchestra.run.burnupi10.stderr: 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0xbd2588]
2015-03-22T09:29:11.945 INFO:teuthology.orchestra.run.burnupi10.stderr: 10: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0xa52) [0x99aeb2]
2015-03-22T09:29:11.945 INFO:teuthology.orchestra.run.burnupi10.stderr: 11: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x9a1544]
2015-03-22T09:29:11.946 INFO:teuthology.orchestra.run.burnupi10.stderr: 12: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x16a) [0x9a16ca]
2015-03-22T09:29:11.946 INFO:teuthology.orchestra.run.burnupi10.stderr: 13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbc36ce]
2015-03-22T09:29:11.946 INFO:teuthology.orchestra.run.burnupi10.stderr: 14: (ThreadPool::WorkThread::entry()+0x10) [0xbc4770]
2015-03-22T09:29:11.946 INFO:teuthology.orchestra.run.burnupi10.stderr: 15: (()+0x8182) [0x7f3a90848182]
2015-03-22T09:29:11.946 INFO:teuthology.orchestra.run.burnupi10.stderr: 16: (clone()+0x6d) [0x7f3a8c73a38d]

Related issues 1 (0 open1 closed)

Has duplicate Ceph - Bug #11594: error (24) Too many open files not handled on operationResolvedDavid Zafman05/12/2015

Actions
Actions #1

Updated by David Zafman about 9 years ago

The last line of the remote/burnupi10/log/ceph-osd.1.log.gz is 30 seconds before the assert:

2015-03-22 09:28:37.797638 7fbe13bc8700 10 osd.1 pg_epoch: 63 pg[1.0( v 63'5135 (43'2108,63'5135] local-les=59 n=38 ec=4 les/c 59/59 58/58/58) [1,0] r=0 lpr=58 luod=63'5055 lua=63'5134 crt=63'5054 lcod 63'5054 mlcod 63'5054 active+clean] op_applied version 63'5135

Possibly a power cycle caused unflushed log data to be lost. We need to see this again with logs.

Actions #2

Updated by David Zafman about 9 years ago

  • Status changed from New to Need More Info
  • Assignee set to David Zafman
Actions #4

Updated by David Zafman about 9 years ago

Can there be a configuration issue? I saw "error (24) Too many open files not handled on operation" during an import. This is new. So while there could have a descriptor leak, I have to wonder if the open descriptor limit is lower on some particular test machine(s).

During import

2015-03-24T06:48:29.022 INFO:teuthology.orchestra.run.plana49:Running: 'sudo ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --journal-path /var/lib/ceph/osd/ceph-0/journal --log-file=/var/log/ceph/objectstore_tool.\\$pid.log --op import --file /home/ubuntu/cephtest/data/exp.2.5.0'

    -7> 2015-03-24 06:48:31.649259 7f5b5935f700 10 filestore(/var/lib/ceph/osd/ceph-0) _do_transaction on 0x7ffcc836f840
    -6> 2015-03-24 06:48:31.649267 7f5b5935f700 15 filestore(/var/lib/ceph/osd/ceph-0) touch 2.5_head/135915b5/10000000589.00000000/head//2
    -5> 2015-03-24 06:48:31.649339 7f5b5935f700 10 filestore(/var/lib/ceph/osd/ceph-0) error opening file /var/lib/ceph/osd/ceph-0/current/2.5_head/DIR_5/10000000589.00000000__head_135915B5__2 with flags=66: (24) Too many open files
    -4> 2015-03-24 06:48:31.649361 7f5b5935f700  0 filestore(/var/lib/ceph/osd/ceph-0)  error (24) Too many open files not handled on operation 0x36be000 (24999.0.0, or op 0, counting from 0)
    -3> 2015-03-24 06:48:31.649367 7f5b5935f700  0 filestore(/var/lib/ceph/osd/ceph-0) unexpected error code
    -2> 2015-03-24 06:48:31.649368 7f5b5935f700  0 filestore(/var/lib/ceph/osd/ceph-0)  transaction dump:
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "touch",
            "collection": "2.5_head",
            "oid": "135915b5\/10000000589.00000000\/head\/\/2" 
        },
        {
            "op_num": 1,
            "op_name": "write",
            "collection": "2.5_head",
            "oid": "135915b5\/10000000589.00000000\/head\/\/2",
            "length": 3106,
            "offset": 0,
            "bufferlist length": 3106
        },
        {
            "op_num": 2,
            "op_name": "setattrs",
            "collection": "2.5_head",
            "oid": "135915b5\/10000000589.00000000\/head\/\/2",
            "attr_lens": {
                "_": 255,
                "_parent": 271,
                "snapset": 31
            }
        },
        {
            "op_num": 3,
            "op_name": "omap_setheader",
            "collection": "2.5_head",
            "oid": "135915b5\/10000000589.00000000\/head\/\/2",
            "header_length": "0" 
        }
    ]
}
Actions #5

Updated by David Zafman about 9 years ago

  • Subject changed from "Assertion: os/FileStore.cc: 2757: FAILED assert(0 == "unexpected error")" in powercycle-hammer-distro-basic-multi run to ceph-objectstore-tool: "error (24) Too many open files not handled on operation" during import of a large number of objects
  • Status changed from Need More Info to 4
  • Assignee changed from David Zafman to Yuri Weinstein

Verify that this isn't a configuration issue.

Actions #6

Updated by Yuri Weinstein about 9 years ago

  • Status changed from 4 to New
  • Assignee changed from Yuri Weinstein to David Zafman

the configuration in question above (from http://qa-proxy.ceph.com/teuthology/teuthology-2015-03-23_16:52:36-powercycle-hammer-testing-basic-multi/818339/teuthology.log):

powercycle/osd/{clusters/3osd-1per-target.yaml fs/btrfs.yaml powercycle/default.yaml tasks/cfuse_workunit_kernel_untar_build.yaml}

passed on giant branch, run http://pulpito.front.sepia.ceph.com/teuthology-2014-12-20_14:20:41-powercycle-giant-distro-basic-multi/

job [670877]:

teuthology@teuthology:/a/teuthology-2014-12-20_14:20:41-powercycle-giant-distro-basic-multi$ grep -r "powercycle/osd/{clusters/3osd-1per-target.yaml fs/btrfs.yaml powercycle/default.yaml tasks/cfuse_workunit_kernel_untar_build.yaml}" *.log
results.log:[670877] powercycle/osd/{clusters/3osd-1per-target.yaml fs/btrfs.yaml powercycle/default.yaml tasks/cfuse_workunit_kernel_untar_build.yaml

and powercycle suite had no commits in last ~6 months

Actions #7

Updated by David Zafman about 9 years ago

I see on burnupi10 the default maximum open files is 1024. A teuthology task bumps this up to 16384. Also, I see lots of places running "adjust-ulimits." I guess after power cycles the test user maximum open file limit goes back to 1024. So maybe everytime ceph-objectstore-tool is executed it needs to "adjust-ulimits."

Actions #8

Updated by Samuel Just about 9 years ago

I think you are right.

Actions #9

Updated by David Zafman about 9 years ago

  • Status changed from New to In Progress
Actions #10

Updated by David Zafman about 9 years ago

  • Status changed from In Progress to Fix Under Review
Actions #11

Updated by David Zafman about 9 years ago

  • Category set to teuthology
  • Status changed from Fix Under Review to Resolved
Actions

Also available in: Atom PDF