Post by Dave Rolsky Post by Gordon Henriksen
It also seems extremely heavyweight (all sorts of pointless autoloading
and otherwise bloat) for something that could easily be accomplished with
a regular expression and a loop. I don't see any reason whatsoever to
My concern is that your code does not really handle all the possibilities.
To pick one obvious case, you don't allow for single quotes as a quote
delimiter. Since Apache expects double quotes around longer strings and
will strip them by itself, users may want to use single quotes.
Most likely. The code wasn't meant to be taken as gospel. Change the four
double quotes to single quotes and that's done. CSV[_XS] only allows
one quoting character, which defaults to a double quote; I used the
Post by Dave Rolsky
OTOH, if you can write a spec of exactly what your code does and does not
handle and people agree that that is good enough then I'm happy to use it.
The subroutine handles an arbitrary sequence of words, quoted
strings, and whitespace.
A word may contain anything but whitespace or quotes.
A quoted string may contain anything but a quote.
Whitespace is defined to include commas.
Characters within words and strings can be escaped with backslashes.
Escape pairs are simply replaced with the escaped character, so the only
characters which are meaningful to escape are " and \.
'word, word, word' -> ("word", "word", "word")
'word word word' -> ("word", "word", "word")
'"word" "word" "word"' -> ("word", "word", "word")
'"\r\n", \ \, ' -> ("rn", " ,")
The only weird-looking thing this does is to not require whitespace at the
boundaries of quoted strings.
'word"quoted"' -> ("word", "quoted")
'"word""word"' -> ("word", "word")
Its only error cases are unbalanced quotes and escapes at EOS:
'this is an error: "' -> error!
'this is, too: \\' -> error!
The specific quote and escape characters can be changed trivially; it's
only 20 lines of code, after all.
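
For anyone who wants to poke at the spec above without the original subroutine in front of them, here is a rough sketch of the same rules as "a regular expression and a loop". It is written in Python rather than Perl and is not the code under discussion; the name split_words and its quote/escape parameters are made up for illustration, and it assumes the quote and escape are single characters.

```python
import re

def split_words(text, quote='"', escape="\\"):
    """Split text into words and quoted strings.

    Hypothetical helper sketching the spec in this post: whitespace
    (including commas) separates tokens, a word may not contain
    whitespace or quotes, a quoted string may not contain a quote,
    and escape + X is replaced by X.
    Assumes quote and escape are each exactly one character.
    """
    q, e = re.escape(quote), re.escape(escape)
    token = re.compile("|".join([
        r"[\s,]+",                                   # separator run
        rf"{q}(?P<quoted>(?:{e}.|[^{q}{e}])*){q}",   # quoted string
        rf"(?P<word>(?:{e}.|[^\s,{q}{e}])+)",        # bare word
    ]), re.DOTALL)
    escape_pair = re.compile(e + r"(.)", re.DOTALL)

    words = []
    pos = 0
    while pos < len(text):
        m = token.match(text, pos)
        if m is None:
            # Nothing matches only at an unclosed quote or at an
            # escape character that is the last character of the input.
            raise ValueError(
                f"unbalanced quote or trailing escape at offset {pos}")
        pos = m.end()
        if m.lastgroup is not None:  # None means a separator matched
            raw = m.group(m.lastgroup)
            words.append(escape_pair.sub(lambda p: p.group(1), raw))
    return words

# The examples from the post, written as Python raw strings so that
# each backslash is a literal backslash in the input.
assert split_words(r'word, word, word') == ["word", "word", "word"]
assert split_words(r'"\r\n", \ \, ') == ["rn", " ,"]
assert split_words(r'word"quoted"') == ["word", "quoted"]
assert split_words(r'"word""word"') == ["word", "word"]
```

Passing quote="'" switches the delimiter to single quotes, which is the change discussed earlier in the thread. Note that the error example ending in \\ is read here as a Perl single-quoted literal, i.e. input ending in one lone backslash; an input ending in two backslashes is a valid escape pair under these rules.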