[Variant] Enahcne bracket access for VariantPath#9479
[Variant] Enahcne bracket access for VariantPath#9479klion26 wants to merge 2 commits intoapache:mainfrom
Conversation
scovich
left a comment
There was a problem hiding this comment.
Several comments, but only one that would block merge (error handling)
| .and_then(|s| s.strip_suffix('\'')) | ||
| { | ||
| // Quoted field name, e.g., ['field'] or ['123'] | ||
| VariantPathElement::field(inner.to_string()) |
There was a problem hiding this comment.
What are the string-escape rules here? Can the user embed a single quote as \'? as ''?
Or do they have to uri-encode it as %27? (do we even handle that?)
There was a problem hiding this comment.
Update: Here's the relevant grammar from JSONpath spec:
https://www.rfc-editor.org/rfc/rfc9535#name-selector
AI web search overview suggests the actual handling of delimiters and escapes tends to be vendor-specific (e.g. many database engines double up the delimiter as an escape)
There was a problem hiding this comment.
Ah! This function already handles \' escape syntax, L247 above.
parquet-variant/src/utils.rs
Outdated
| } else { | ||
| match unescaped.parse() { | ||
| Ok(idx) => VariantPathElement::index(idx), | ||
| Err(_) => VariantPathElement::field(unescaped), |
There was a problem hiding this comment.
This seems awkward? Especially in a fallible function? I don't think we should accept an unquoted string like [abc] or an invalid number like [123x]?
(I thought the jsonpath spec specifically forbade it, but I can't find the reference now)
There was a problem hiding this comment.
Thanks for pointing this out. At the time, I hadn't shifted my perspective from the previous one and thought Err(_) would be treated as a field. Double-checked the JSON web and Databricks web; the fields are both quoted. This has since been adjusted.
| Ok(idx) => VariantPathElement::index(idx), | ||
| Err(_) => VariantPathElement::field(unescaped), | ||
| let element = if let Some(inner) = unescaped | ||
| .strip_prefix('\'') |
There was a problem hiding this comment.
JSONpath allows either ' or " as the string delimiter. Do we want to support both? Or take the intersection between SQL and JSONpath and only support single quotes?
There was a problem hiding this comment.
Followed the examples on databricks web previously, this is a valid json syntax, not don't add much more complexity, added " now.
e1d08e1 to
5f8582e
Compare
| Ok(idx) => VariantPathElement::index(idx), | ||
| Err(_) => VariantPathElement::field(unescaped), | ||
| let element = if let Some(inner) = unescaped | ||
| .strip_prefix('\'') |
There was a problem hiding this comment.
Followed the examples on databricks web previously, this is a valid json syntax, not don't add much more complexity, added " now.
parquet-variant/src/utils.rs
Outdated
| } else { | ||
| match unescaped.parse() { | ||
| Ok(idx) => VariantPathElement::index(idx), | ||
| Err(_) => VariantPathElement::field(unescaped), |
There was a problem hiding this comment.
Thanks for pointing this out. At the time, I hadn't shifted my perspective from the previous one and thought Err(_) would be treated as a field. Double-checked the JSON web and Databricks web; the fields are both quoted. This has since been adjusted.
|
Close and reopen to trigger the CI, CI failed with |
|
manually download the |
scovich
left a comment
There was a problem hiding this comment.
Looks great! I'll merge later today if nobody else has any comments.
| match unescaped.parse() { | ||
| Ok(idx) => VariantPathElement::index(idx), | ||
| Err(_) => VariantPathElement::field(unescaped), | ||
| Err(_) => { | ||
| return Err(ArrowError::ParseError(format!( | ||
| "Invalid token in bracket request: `{unescaped}`. Expected a quoted string or a number(e.g., `['field']` or `[123]`)" | ||
| ))); | ||
| } |
There was a problem hiding this comment.
nit: consider using let-else?
let Ok(id) = unescaped.parse() else {
return Err(...);
};
VariantPathElement::index(idx)
Which issue does this PR close?
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
No