Skip to content

Fix COPY TO returning 0 rows during concurrent reorganize#1605

Open
gfphoenix78 wants to merge 1 commit intoapache:mainfrom
gfphoenix78:fix-reorg
Open

Fix COPY TO returning 0 rows during concurrent reorganize#1605
gfphoenix78 wants to merge 1 commit intoapache:mainfrom
gfphoenix78:fix-reorg

Conversation

@gfphoenix78
Copy link
Contributor

When ALTER TABLE ... SET WITH (reorganize=true) runs concurrently with COPY TO, COPY may return 0 rows instead of all rows. The root cause is a snapshot/lock ordering problem: PortalRunUtility() pushes the active snapshot before calling DoCopy(), so the snapshot predates any concurrent reorganize that had not yet committed. After COPY TO blocks on AccessExclusiveLock and the reorganize commits, the stale snapshot cannot see the new physical files (xmin = reorganize_xid is invisible) while the old physical files have already been removed, yielding 0 rows.

Three code paths are fixed:

  1. Relation-based COPY TO (copy.c, DoCopy): After table_openrv() acquires AccessShareLock — which blocks until any concurrent reorganize commits — pop and re-push the active snapshot so it reflects all committed data at lock-grant time.

  2. Query-based COPY TO, RLS COPY TO, and CTAS (copyto.c, BeginCopy): After pg_analyze_and_rewrite() -> AcquireRewriteLocks() acquires all direct relation locks, refresh the snapshot. This covers COPY (SELECT ...) TO, COPY on RLS-protected tables (internally rewritten to a query), and CREATE TABLE AS SELECT.

  3. Partitioned table COPY TO (copy.c, DoCopy): Before entering BeginCopy, call find_all_inheritors() to eagerly acquire AccessShareLock on all child partitions. Child partition locks are normally acquired later in ExecutorStart -> ExecInitAppend, after PushCopiedSnapshot has already embedded a stale snapshot. Locking all children upfront ensures the snapshot refresh in fixes 1 and 2 covers all concurrent child-partition reorganize commits.

In REPEATABLE READ or SERIALIZABLE isolation, GetTransactionSnapshot() returns the same transaction-level snapshot, so the Pop/Push is a harmless no-op.

Tests added:

  • src/test/isolation2/sql/copy_to_concurrent_reorganize.sql Tests 2.1-2.5 for relation-based, query-based, partitioned, RLS, and CTAS paths across heap, AO row, and AO column storage.
  • contrib/pax_storage/src/test/isolation2/sql/pax/ copy_to_concurrent_reorganize.sql Same coverage for PAX columnar storage.

See: Issue#1545 #1545

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


When ALTER TABLE ... SET WITH (reorganize=true) runs concurrently with
COPY TO, COPY may return 0 rows instead of all rows.  The root cause is
a snapshot/lock ordering problem: PortalRunUtility() pushes the active
snapshot before calling DoCopy(), so the snapshot predates any
concurrent reorganize that had not yet committed.  After COPY TO blocks
on AccessExclusiveLock and the reorganize commits, the stale snapshot
cannot see the new physical files (xmin = reorganize_xid is invisible)
while the old physical files have already been removed, yielding 0 rows.

Three code paths are fixed:

1. Relation-based COPY TO (copy.c, DoCopy):
   After table_openrv() acquires AccessShareLock — which blocks until
   any concurrent reorganize commits — pop and re-push the active
   snapshot so it reflects all committed data at lock-grant time.

2. Query-based COPY TO, RLS COPY TO, and CTAS (copyto.c, BeginCopy):
   After pg_analyze_and_rewrite() -> AcquireRewriteLocks() acquires
   all direct relation locks, refresh the snapshot.  This covers
   COPY (SELECT ...) TO, COPY on RLS-protected tables (internally
   rewritten to a query), and CREATE TABLE AS SELECT.

3. Partitioned table COPY TO (copy.c, DoCopy):
   Before entering BeginCopy, call find_all_inheritors() to eagerly
   acquire AccessShareLock on all child partitions.  Child partition
   locks are normally acquired later in ExecutorStart -> ExecInitAppend,
   after PushCopiedSnapshot has already embedded a stale snapshot.
   Locking all children upfront ensures the snapshot refresh in fixes
   1 and 2 covers all concurrent child-partition reorganize commits.

In REPEATABLE READ or SERIALIZABLE isolation, GetTransactionSnapshot()
returns the same transaction-level snapshot, so the Pop/Push is a
harmless no-op.

Tests added:
- src/test/isolation2/sql/copy_to_concurrent_reorganize.sql
  Tests 2.1-2.5 for relation-based, query-based, partitioned, RLS,
  and CTAS paths across heap, AO row, and AO column storage.
- contrib/pax_storage/src/test/isolation2/sql/pax/
  copy_to_concurrent_reorganize.sql
  Same coverage for PAX columnar storage.

See: Issue#1545 <apache#1545>
Copy link
Contributor

@avamingli avamingli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants