Bug #61560
open
upon concatenating a 2GB file several times, the s3select failed to parse(CSV) the input object
Description
It seems the embedded CSV parser wrongly reports a CSV-parsing error (an unclosed quote).
(When working with Trino there is no need to concatenate in order to append; Trino is able to do that "on the fly".)
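One way to tell a parser false positive from a genuinely malformed input is to scan the object for an unbalanced double quote before querying it. A minimal sketch, assuming RFC 4180-style quoting (a `"` inside a quoted field is escaped by doubling it); `find_unclosed_quote` is a hypothetical helper for illustration and is not part of s3select:

```python
def find_unclosed_quote(lines, quote='"'):
    """Return the 1-based number of the line on which a quoted field is
    opened and never closed, or None if every quote is balanced.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    in_quotes = False
    open_line = None
    for lineno, line in enumerate(lines, start=1):
        i = 0
        while i < len(line):
            if line[i] == quote:
                # Inside a quoted field, a doubled quote is an escaped
                # literal quote, not a field terminator.
                if in_quotes and line[i + 1:i + 2] == quote:
                    i += 2
                    continue
                in_quotes = not in_quotes
                if in_quotes:
                    open_line = lineno
            i += 1
    return open_line if in_quotes else None
```

If this returns None for the concatenated object while s3select still reports an unclosed quote, the input itself is probably not the cause.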
Updated by Mark Kogan 10 months ago
is this happening with the parking citations csv?
Updated by Gal Salomon 10 months ago
Yes, I have only tried with the parking citations CSV.
Updated by Mark Kogan 10 months ago
In my tests I noticed a problem with the concatenation because the first line is a header,
which had to be removed at concatenation time:
❯ head parking-citations.csv
Ticket number,Issue Date,Issue time,Meter Id,Marked Time,RP State Plate,Plate Expiry Date,VIN,Make,Body Style,Color,Location,Route,Agency,Violation code,Violation Description,Fine amount,Latitude,Longitude,Agency Description,Color Description,Body Style Description
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^>>>
1103341116,2015-12-21T00:00:00.000,1251,,,CA,200304,,HOND,PA,GY,13147 WELBY WAY,01521,1,4000A1,NO EVIDENCE OF REG,50,99999,99999,,,
1103700150,2015-12-21T00:00:00.000,1435,,,CA,201512,,GMC,VN,WH,525 S MAIN ST,1C51,1,4000A1,NO EVIDENCE OF REG,50,99999,99999,,,
1104803000,2015-12-21T00:00:00.000,2055,,,CA,201503,,NISS,PA,BK,200 WORLD WAY,2R2,2,8939,WHITE CURB,58,6439997.9,1802686.4,,,
1104820732,2015-12-26T00:00:00.000,1515,,,CA,,,ACUR,PA,WH,100 WORLD WAY,2F11,2,000,17104h,,6440041.1,1802686.2,,,
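Dropping the repeated header while concatenating could be sketched as follows. This is an illustration only, not the procedure actually used for this report: `concat_csv_single_header` is a hypothetical helper, and it assumes the header occupies exactly the first line of the source file.

```python
import os
import shutil

def concat_csv_single_header(src_path, dst_path, copies):
    """Write `copies` repetitions of the CSV at src_path into dst_path,
    keeping the header line only once at the top."""
    # Find out whether the source ends with a newline; if it does not,
    # the last row of one copy would fuse with the first row of the next.
    with open(src_path, "rb") as src:
        src.seek(0, os.SEEK_END)
        ends_with_newline = True
        if src.tell() > 0:
            src.seek(-1, os.SEEK_END)
            ends_with_newline = src.read(1) == b"\n"

    with open(dst_path, "wb") as dst:
        for n in range(copies):
            with open(src_path, "rb") as src:
                header = src.readline()  # consume the header of every copy
                if n == 0:
                    dst.write(header)    # but write it only once
                shutil.copyfileobj(src, dst)  # stream the remaining rows
            if not ends_with_newline:
                dst.write(b"\n")
```

The file is streamed in binary mode, so a 2GB input is never loaded into memory and its bytes are copied unchanged.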
Updated by Gal Salomon 10 months ago
Yes, the first line should be removed (one header per object),
but I don't see how it would cause bad CSV parsing.
Updated by Gal Salomon 6 months ago
- Status changed from New to Pending Backport
- Tags set to s3select backport_processed
- Backport set to reef
Updated by Gal Salomon 6 months ago
- Copied to Backport #63297: reef: upon concatenating a 2GB file several times, the s3select failed to parse(CSV) the input object added
Updated by Gal Salomon 6 months ago
- Assignee deleted (Gal Salomon)
- Target version deleted (v17.2.7)