URL-encode partition field names in file locations#1457
URL-encode partition field names in file locations#1457kevinjqliu merged 13 commits intoapache:mainfrom
Conversation
|
BTW #1462 is merged, could you rebase this PR? |
999d753 to
f5a35de
Compare
| spec_id=3, | ||
| ) | ||
|
|
||
| record = Record(**{"my#str%bucket": "my+str", "other str+bucket": "( )", "my!int:bucket": 10}) # type: ignore |
There was a problem hiding this comment.
mypy complains here and elsewhere but I think it's fine
|
|
||
| # Both partition names fields and values should be URL encoded, with spaces mapping to plus signs, to match the Java | ||
| # behaviour: https://github.com/apache/iceberg/blob/ca3db931b0f024f0412084751ac85dd4ef2da7e7/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L198-L204 | ||
| assert spec.partition_to_path(record, schema) == "my%23str%25bucket=my%2Bstr/other+str%2Bbucket=%28+%29/my%21int%3Abucket=10" |
There was a problem hiding this comment.
Cross-checked with Java implementation (integration tests will do this eventually), in particular WRT to ' ' and '+' encoding. It is consistent.
|
Done, @kevinjqliu. Fails due to #1457 (comment) but will think over it. FYI, am away for a little bit now so will pick this back up shortly in the new year! (Feel free to take over if v urgent 😄) |
|
Thanks for the PR! I've dug into the test failure a bit. Heres what I found. There's a subtle difference between
You can verify this by looking up the table partition spec. The integration test assumes that the value for After |
|
Thanks a lot for this explanation and suggestion @kevinjqliu! It sounds good. Had some time so I've made this change so tests pass - using |
| value_str = partition_field.transform.to_human_string(field_types[pos].field_type, value=data[pos]) | ||
|
|
||
| value_str = quote(value_str, safe="") | ||
| value_str = quote_plus(value_str, safe="") |
There was a problem hiding this comment.
It defaults to utf-8, so that's good 👍
| field_str = quote_plus(partition_field.name, safe="") | ||
| field_strs.append(field_str) |
There was a problem hiding this comment.
Nit, I would just collapse these:
| field_str = quote_plus(partition_field.name, safe="") | |
| field_strs.append(field_str) | |
| field_strs.append(quote_plus(partition_field.name, safe="")) |
Fokko
left a comment
There was a problem hiding this comment.
Thanks @smaheshwar-pltr for picking this up, and thanks for @kevinjqliu the review. I'll leave this one open in case Kevin has any further comments.
kevinjqliu
left a comment
There was a problem hiding this comment.
LGTM! Thanks for working on this @smaheshwar-pltr
A few nit comments left on this PR but not blocking. Going to merge this PR as is and we can deal with nit comment as a followup
|
Sounds good, thanks Kevin for your excellent review here! |
I've put up #1499 for the nits. |
Closes #1458
Closes #175