Skip to content

[Variant] Enahcne bracket access for VariantPath#9479

Open
klion26 wants to merge 2 commits intoapache:mainfrom
klion26:variant-path-followup
Open

[Variant] Enahcne bracket access for VariantPath#9479
klion26 wants to merge 2 commits intoapache:mainfrom
klion26:variant-path-followup

Conversation

@klion26
Copy link
Member

@klion26 klion26 commented Feb 25, 2026

Which issue does this PR close?

What changes are included in this PR?

  • Fix the typo
  • Enhance the bracket access for the variant path

Are these changes tested?

  • Add some tests to cover the logic

Are there any user-facing changes?

No

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Feb 25, 2026
@klion26
Copy link
Member Author

klion26 commented Feb 25, 2026

@scovich Please help review this when you're free. Thanks. This is the follow-up for #9276

Copy link
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several comments, but only one that would block merge (error handling)

.and_then(|s| s.strip_suffix('\''))
{
// Quoted field name, e.g., ['field'] or ['123']
VariantPathElement::field(inner.to_string())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the string-escape rules here? Can the user embed a single quote as \'? as ''?
Or do they have to uri-encode it as %27? (do we even handle that?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: Here's the relevant grammar from JSONpath spec:
https://www.rfc-editor.org/rfc/rfc9535#name-selector

AI web search overview suggests the actual handling of delimiters and escapes tends to be vendor-specific (e.g. many database engines double up the delimiter as an escape)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! This function already handles \' escape syntax, L247 above.

} else {
match unescaped.parse() {
Ok(idx) => VariantPathElement::index(idx),
Err(_) => VariantPathElement::field(unescaped),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems awkward? Especially in a fallible function? I don't think we should accept an unquoted string like [abc] or an invalid number like [123x]?

(I thought the jsonpath spec specifically forbade it, but I can't find the reference now)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. At the time, I hadn't shifted my perspective from the previous one and thought Err(_) would be treated as a field. Double-checked the JSON web and Databricks web; the fields are both quoted. This has since been adjusted.

Ok(idx) => VariantPathElement::index(idx),
Err(_) => VariantPathElement::field(unescaped),
let element = if let Some(inner) = unescaped
.strip_prefix('\'')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSONpath allows either ' or " as the string delimiter. Do we want to support both? Or take the intersection between SQL and JSONpath and only support single quotes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed the examples on databricks web previously, this is a valid json syntax, not don't add much more complexity, added " now.

@klion26 klion26 force-pushed the variant-path-followup branch from e1d08e1 to 5f8582e Compare February 27, 2026 11:54
Copy link
Member Author

@klion26 klion26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scovich Thanks for the review. I've addressed the comments, please take another look.

Ok(idx) => VariantPathElement::index(idx),
Err(_) => VariantPathElement::field(unescaped),
let element = if let Some(inner) = unescaped
.strip_prefix('\'')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed the examples on databricks web previously, this is a valid json syntax, not don't add much more complexity, added " now.

} else {
match unescaped.parse() {
Ok(idx) => VariantPathElement::index(idx),
Err(_) => VariantPathElement::field(unescaped),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. At the time, I hadn't shifted my perspective from the previous one and thought Err(_) would be treated as a field. Double-checked the JSON web and Databricks web; the fields are both quoted. This has since been adjusted.

@klion26
Copy link
Member Author

klion26 commented Feb 27, 2026

Close and reopen to trigger the CI, CI failed with protoc download

@klion26 klion26 closed this Feb 27, 2026
@klion26 klion26 reopened this Feb 27, 2026
@klion26
Copy link
Member Author

klion26 commented Feb 27, 2026

manually download the protoc-21.4-win64.zip successfully from ur in the action yml, try to reopen this to see the result

@klion26 klion26 closed this Feb 27, 2026
@klion26 klion26 reopened this Feb 27, 2026
Copy link
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I'll merge later today if nobody else has any comments.

Comment on lines 282 to 288
match unescaped.parse() {
Ok(idx) => VariantPathElement::index(idx),
Err(_) => VariantPathElement::field(unescaped),
Err(_) => {
return Err(ArrowError::ParseError(format!(
"Invalid token in bracket request: `{unescaped}`. Expected a quoted string or a number(e.g., `['field']` or `[123]`)"
)));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider using let-else?

let Ok(id) = unescaped.parse() else {
    return Err(...);
};
VariantPathElement::index(idx)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Followup for support ['fieldName'] in VariantPath

2 participants