Add flag to allow disabling creation of catalog tables#1155
Add flag to allow disabling creation of catalog tables#1155sungwy merged 5 commits intoapache:mainfrom
Conversation
… default behavior of Metadata.create_all()
kevinjqliu
left a comment
There was a problem hiding this comment.
Thanks for the PR! Do you mind adding a test to show that this works for when
- tables already exist
- some tables exist, but not all
- none of the tables exist yet
|
@isc-patrick do you know how it will check if the table exists? I'm asking since a |
|
I can certainly add tests, but that is really testing the Metadata.create_all() function in SQLAlchemy and not pyiceberg code. I think that CREATE TABLE IF NOT EXISTS requires same permissions as CREATE TABLE. How this is executed will be determined by the Dialect. |
|
please do! we want to ensure that this change does not break new and existing DB integrations. |
|
It does not look like CREATE TABLE IF NOT EXISTS is what is used. It is specific to each dialect and how they implement the has_table method, but the ones I looked at use metadata on the schema to determine the existence of the table. |
sungwy
left a comment
There was a problem hiding this comment.
Hi @isc-patrick - thank you very much for putting up this PR, and for leading the thorough discussion on #1148.
I have added some nits, and in addition could we:
- update the PR description to fit the updated scope of work for this PR
- add a line to the SqlCatalog property so that users will be able to find this property?
iceberg-python/mkdocs/docs/configuration.md
Lines 261 to 265 in 0dc5408
tests/catalog/test_sql.py
Outdated
There was a problem hiding this comment.
| def confirm_no_tables_exist(alchemy_engine: Engine) -> list[str]: | |
| def confirm_no_tables_exist(alchemy_engine: Engine) -> List[str]: |
for python3.8 compatibility (which will be deprecated soon), but this is currently needed for our CI to pass
There was a problem hiding this comment.
Not returning this anymore, so removed, CATALOG_TABLES is no a constant. PR title updated and flag added to docs
tests/catalog/test_sql.py
Outdated
There was a problem hiding this comment.
This function seems to serve two purposes:
- confirm that no tables exist
- return the list of
SqlCatalogBaseTable.__subclasses__()
(2) is a static list of tables required for the SqlCatalog. Would it be cleaner to separate out (2) to a global variable in this module, instead of returning it and passing it as an input to confirm_all_tables_exist ?
There was a problem hiding this comment.
Made it a constant
tests/catalog/test_sql.py
Outdated
There was a problem hiding this comment.
It looks like we are passing the catalog here just to assert its type - is this necessary for this check? Was the intention to check the tables through the catalog's engine instead like:
inspect(catalog.engine).get_table_names()
There was a problem hiding this comment.
Not passing the inspector anymore. Also needed to cast the Catalog to use the engine. Made the uri a fixture to insure using same source between catalog and inspector that was creating tables.
|
Thank you again for working on this PR @isc-patrick ! |
* Remove unnecessary _ensure_tables_exist method as this is already the default behavior of Metadata.create_all() * Added tests for creating Catalog tables when no, some or all tables already exist * add init_catalog_tables flag to SQLCatalog * add _ensure_tables_exists back until postgres integration tests completed * fixed tests, added flag to docs
* Remove unnecessary _ensure_tables_exist method as this is already the default behavior of Metadata.create_all() * Added tests for creating Catalog tables when no, some or all tables already exist * add init_catalog_tables flag to SQLCatalog * add _ensure_tables_exists back until postgres integration tests completed * fixed tests, added flag to docs
Remove unnecessary _ensure_tables_exist method as this is already the default behavior of Metadata.create_all()