Bug #61560

open

Upon concatenating a 2GB file several times, s3select fails to parse (CSV) the input object

Added by Gal Salomon 11 months ago. Updated 6 months ago.

Status: Pending Backport
Priority: High
Assignee: -
Target version: -
% Done: 0%
Tags: s3select backport_processed
Backport: reef
Regression: No
Severity: 2 - major

Description

It seems the embedded CSV parser reports a false CSV-parsing error (a non-closed quote).
(When working with Trino, there is no need to concatenate objects in order to append them; Trino can do that "on the fly".)
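To check whether the concatenated input really contains an unclosed quote, independent of s3select, one rough approach is to count double quotes per line. The sketch below is a heuristic, assuming RFC 4180-style quoting (a quote inside a quoted field is escaped as ""); `report_odd_quote_lines` is a hypothetical helper, not part of Ceph or s3select. A line that is complete on its own has an even quote count, so an odd count marks either the start or end of a legitimate multi-line quoted field or a genuinely unclosed quote.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph or s3select): print every line whose
# double-quote count is odd. Under RFC 4180 quoting, a quoted field opens and
# closes with one quote each and an embedded quote is written as "", so a
# self-contained line always has an even count.
report_odd_quote_lines() {
    awk '{
        n = gsub(/"/, "&")    # gsub returns the match count; "&" keeps the text unchanged
        if (n % 2 == 1)
            printf "line %d: %d quotes\n", NR, n
    }' "$1"
}
```

If no lines (or only pairs of lines around intended multi-line fields) are reported, that would point toward the parser rather than the data, but this check alone does not prove it.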


Related issues: 1 (1 open, 0 closed)

Copied to rgw - Backport #63297: reef: upon concatenating a 2GB file several times, the s3select failed to parse(CSV) the input object (In Progress, Gal Salomon)
#1

Updated by Gal Salomon 11 months ago

  • Assignee set to Gal Salomon
#2

Updated by Mark Kogan 10 months ago

Is this happening with the parking citations CSV?

#3

Updated by Gal Salomon 10 months ago

Mark Kogan wrote:

Is this happening with the parking citations CSV?

Yes, I have only tried it with the parking citations CSV.

#4

Updated by Mark Kogan 10 months ago

Gal Salomon wrote:

Mark Kogan wrote:

Is this happening with the parking citations CSV?

Yes, I have only tried it with the parking citations CSV.

In my tests I noticed a problem with the concatenation: the first line is a header,
which had to be removed at concatenation time:

❯ head parking-citations.csv
Ticket number,Issue Date,Issue time,Meter Id,Marked Time,RP State Plate,Plate Expiry Date,VIN,Make,Body Style,Color,Location,Route,Agency,Violation code,Violation Description,Fine amount,Latitude,Longitude,Agency Description,Color Description,Body Style Description
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^>>>
1103341116,2015-12-21T00:00:00.000,1251,,,CA,200304,,HOND,PA,GY,13147 WELBY WAY,01521,1,4000A1,NO EVIDENCE OF REG,50,99999,99999,,,
1103700150,2015-12-21T00:00:00.000,1435,,,CA,201512,,GMC,VN,WH,525 S MAIN ST,1C51,1,4000A1,NO EVIDENCE OF REG,50,99999,99999,,,
1104803000,2015-12-21T00:00:00.000,2055,,,CA,201503,,NISS,PA,BK,200 WORLD WAY,2R2,2,8939,WHITE CURB,58,6439997.9,1802686.4,,,
1104820732,2015-12-26T00:00:00.000,1515,,,CA,,,ACUR,PA,WH,100 WORLD WAY,2F11,2,000,17104h,,6440041.1,1802686.2,,,

#5

Updated by Gal Salomon 10 months ago

Mark Kogan wrote:

Gal Salomon wrote:

Mark Kogan wrote:

Is this happening with the parking citations CSV?

Yes, I have only tried it with the parking citations CSV.

In my tests I noticed a problem with the concatenation: the first line is a header,
which had to be removed at concatenation time:
[...]

Yes, the first line should be removed (one header per object), but
I don't see how it causes bad CSV parsing.
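To reproduce without repeated header rows, the copies can be concatenated so that only the first copy keeps its header. The sketch below assumes a plain local file and uses a hypothetical helper name, `concat_csv_single_header`; it is not how the original object was built. It relies on awk's `NR` counting lines across all inputs while `FNR` restarts at 1 for each file.

```shell
#!/bin/sh
# Hypothetical helper: write COPIES copies of SRC to stdout, keeping only the
# header line of the first copy (one header per object).
# "NR == 1 || FNR > 1" keeps the very first line plus every non-first line of
# each input. awk's print ends every record with a newline, so a source that
# lacks a trailing newline cannot glue its last row onto the next copy.
concat_csv_single_header() {
    src=$1
    copies=$2
    if [ "$copies" -lt 1 ]; then
        echo "copies must be >= 1" >&2
        return 1
    fi
    set --                      # reuse the positional parameters as the file list
    i=0
    while [ "$i" -lt "$copies" ]; do
        set -- "$@" "$src"
        i=$((i + 1))
    done
    awk 'NR == 1 || FNR > 1' "$@"
}
```

For example, `concat_csv_single_header parking-citations.csv 4 > parking-citations-x4.csv` would produce a single object with one header row (the output filename is illustrative). A repeated header would normally parse as an ordinary data row, so removing it mainly eliminates one variable from the reproduction rather than explaining the quote error.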

#6

Updated by Gal Salomon 6 months ago

  • Pull request ID set to 53351
#7

Updated by Gal Salomon 6 months ago

  • Status changed from New to Pending Backport
  • Tags set to s3select backport_processed
  • Backport set to reef
#8

Updated by Gal Salomon 6 months ago

  • Copied to Backport #63297: reef: upon concatenating a 2GB file several times, the s3select failed to parse(CSV) the input object added
#9

Updated by Gal Salomon 6 months ago

  • Assignee deleted (Gal Salomon)
  • Target version deleted (v17.2.7)