WIP: Source Links Fix #440
Conversation
License Check Results 🚀 The license check job ran with the Bazel command: `bazel run --lockfile_mode=error //src:license-check`
The created documentation from the pull request is available at: docu-html
```diff
@@ -0,0 +1,23 @@
# *******************************************************************************
```
I probably want to delete this file to make everything easier
Pull request overview
This PR reworks how “source code links” are generated and injected into Sphinx needs, aiming to support both local builds (derive Git hash from the current repo) and combo/ref-integration builds (use per-module metadata such as repo URL + commit hash).
Changes:
- Add metadata support (module name / repo URL / commit hash) to NeedLink JSON handling and Bazel CLI generation/merge scripts.
- Introduce grouping of needs by module and a new module-grouped cache JSON.
- Update Sphinx extension flow to read the new caches and generate GitHub links using either git-derived or metadata-derived refs.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 15 comments.
Summary per file:
| File | Description |
|---|---|
| src/helper_lib/additional_functions.py | Changes GitHub link generation API to require module metadata. |
| src/extensions/score_source_code_linker/needlinks.py | Extends NeedLink with metadata fields; adds new metadata-aware JSON format/load path. |
| src/extensions/score_source_code_linker/need_source_links.py | Moves group_by_need into this module. |
| src/extensions/score_source_code_linker/module_source_links.py | New module-grouped cache format and grouping logic. |
| src/extensions/score_source_code_linker/metadata.py | New TypedDict + TypeGuard for metadata records. |
| src/extensions/score_source_code_linker/generate_source_code_links_json.py | Refactors extraction helper signature and logging (currently inconsistent call sites). |
| src/extensions/score_source_code_linker/__init__.py | Adds module-linker stage and changes injection to use module-grouped cache + metadata-based link generation. |
| scripts_bazel/merge_sourcelinks.py | Merges per-module sourcelinks and enriches with known-good repo/hash metadata. |
| scripts_bazel/generate_sourcelinks_cli.py | Emits sourcelinks JSON with a leading metadata dict and uses updated extraction helper signature. |
| docs.bzl | Adds optional known_good wiring into the sourcelinks merge genrule and the public docs() macro. |
```python
data = json.load(f)
metadata = data[0]
if metadata["module_name"] and metadata["module_name"] != "local_module":
    hash, repo = parse_info_from_known_good(
        known_good_json=args.known_good, module_name=metadata["module_name"]
    )
```
--known_good is optional, but when a module has a non-empty metadata["module_name"] this code unconditionally calls parse_info_from_known_good(known_good_json=args.known_good, ...). If --known_good is omitted, args.known_good is None and open(None, ...) will fail. Either make --known_good required when external modules are present, or guard this branch and emit a clear error explaining how to provide the known-good file.
```python
print("===THIS IS MODULE NAME WE LOOK FOR===========")
print(module_name)
with open(known_good_json, "r") as f:
    kg_json = json.load(f)
    for category in kg_json["modules"].values():
        print("===THIS IS CATEGORY=========")
        print(category)
        if module_name in category:
            print("===THIS IS MODULE NAME INSIDe CATEGORY===========")
            print(module_name)
```
This helper contains multiple print() debug statements that will pollute build logs (and potentially leak large JSON fragments via print(category)). Prefer using the configured logger at DEBUG level (or remove these statements entirely) once the behavior is validated.
Suggested change:

```diff
-print("===THIS IS MODULE NAME WE LOOK FOR===========")
-print(module_name)
-with open(known_good_json, "r") as f:
-    kg_json = json.load(f)
-    for category in kg_json["modules"].values():
-        print("===THIS IS CATEGORY=========")
-        print(category)
-        if module_name in category:
-            print("===THIS IS MODULE NAME INSIDe CATEGORY===========")
-            print(module_name)
+logger.debug("Looking up module name %r in known_good_json %s", module_name, known_good_json)
+with open(known_good_json, "r") as f:
+    kg_json = json.load(f)
+    for category in kg_json["modules"].values():
+        logger.debug("Inspecting category while searching for module %r: %s", module_name, category)
+        if module_name in category:
+            logger.debug("Found module name %r inside category", module_name)
```
```python
print("======== THIs IS PATH we PARSIGN FOr MODULE NAME")
print(path)
if str(path).startswith("external/"):
    module_raw = str(path).removeprefix("external/")
    filepath_split = str(module_raw).split("/", maxsplit=1)
    module_name = str(filepath_split[0].removesuffix("+"))
    return module_name
return "local_module"
```
parse_module_name_from_path currently uses print() for debugging. This will add noise to Bazel build output; use the module logger at DEBUG level or remove these statements.
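A print-free version of the helper might look like this. This is a sketch reconstructed from the quoted snippet: the function name matches the one mentioned above, but the surrounding module and logger setup are illustrative.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def parse_module_name_from_path(path: Path) -> str:
    """Derive a Bazel module name from a source path, logging at DEBUG instead of print()."""
    logger.debug("Parsing module name from path: %s", path)
    if str(path).startswith("external/"):
        # "external/<module>+/pkg/file.py" -> "<module>"
        module_raw = str(path).removeprefix("external/")
        module_name = module_raw.split("/", maxsplit=1)[0].removesuffix("+")
        logger.debug("Resolved external module name: %r", module_name)
        return module_name
    return "local_module"
```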
```python
store_source_code_links_with_metadata_json(
    file=args.output, metadata=metadata, needlist=all_need_references
)
```
This switches the generated JSON format from a plain list of NeedLinks to a list whose first element is a metadata dict. Any existing consumers/tests that expect the old schema will now fail. Consider either keeping the old format as default (with an opt-in flag for metadata), or updating all in-repo consumers and tests in the same PR to avoid a partially-migrated state.
```python
gruped_cache_exists = grouped_cache.exists()
# TODO this cache should be done via Bazel
if not gruped_cache_exists or not app.config.skip_rescanning_via_source_code_linker:
```
Typo in variable name: gruped_cache_exists should be grouped_cache_exists for readability and consistency with grouped_cache.
```python
    metadata: moduleInfo,
    link: NeedLink | DataForTestLink | DataOfTestCase | None = None,
) -> str:
    if link is None:
        link = DefaultNeedLink()
    if not metadata.hash:
        # Local path (//:docs)
        return get_github_link_from_git(link)
    # Ref-Integration path (//:docs_combo..)
    return get_github_link_from_json(metadata, link)
```
get_github_link now requires a metadata argument, but there are existing call sites in the repo that still call get_github_link(link) (e.g. in tests and xml_parser.py). This is a breaking API change that will raise TypeError. Consider keeping backwards compatibility by making metadata optional with a sensible default (e.g. infer local/git behavior when metadata is omitted) or providing a new function name for the metadata-based behavior.
Suggested change:

```diff
-    metadata: moduleInfo,
-    link: NeedLink | DataForTestLink | DataOfTestCase | None = None,
-) -> str:
-    if link is None:
-        link = DefaultNeedLink()
-    if not metadata.hash:
-        # Local path (//:docs)
-        return get_github_link_from_git(link)
-    # Ref-Integration path (//:docs_combo..)
-    return get_github_link_from_json(metadata, link)
+    metadata: moduleInfo
+    | NeedLink
+    | DataForTestLink
+    | DataOfTestCase
+    | None = None,
+    link: NeedLink | DataForTestLink | DataOfTestCase | None = None,
+) -> str:
+    """
+    Return a GitHub link for the given location.
+    Backwards compatible calling conventions:
+    - New style: get_github_link(metadata, link)
+    - Old style: get_github_link(link)  # metadata omitted
+    """
+    # Distinguish between new-style and old-style calls.
+    if isinstance(metadata, moduleInfo):
+        actual_metadata: moduleInfo | None = metadata
+        actual_link = link
+    else:
+        # Old-style: first argument is actually the link (or None)
+        actual_metadata = None
+        # If both metadata and link are provided but metadata is not a moduleInfo,
+        # prefer the explicitly provided link argument.
+        actual_link = link if link is not None else metadata
+    if actual_link is None:
+        actual_link = DefaultNeedLink()
+    # If no metadata is available or metadata.hash is falsy, fall back to git-based link.
+    if actual_metadata is None or not getattr(actual_metadata, "hash", None):
+        # Local path (//:docs)
+        return get_github_link_from_git(actual_link)
+    # Ref-Integration path (//:docs_combo..)
+    return get_github_link_from_json(actual_metadata, actual_link)
```
```python
    Returns a nested dictionary structure with 'CodeLink' and 'TestLink' categories.
    Example output:
    {
        "need": "<need_id>",
        "module_name": <module_name>,
        "hash": <git hash>,
        "url": <github base url>,
        "links": {
            "CodeLinks": [NeedLink, NeedLink, ...],
            "TestLinks": [testlink, testlink, ...]
```
The group_by_need docstring/documented example includes top-level module_name/hash/url fields, but SourceCodeLinks only contains need and links and this function does not populate metadata. Please update the docstring/example to match the actual returned structure to avoid confusion for future callers.
Suggested change:

```diff
-    Returns a nested dictionary structure with 'CodeLink' and 'TestLink' categories.
-    Example output:
-    {
-        "need": "<need_id>",
-        "module_name": <module_name>,
-        "hash": <git hash>,
-        "url": <github base url>,
-        "links": {
-            "CodeLinks": [NeedLink, NeedLink, ...],
-            "TestLinks": [testlink, testlink, ...]
+    Returns a list of SourceCodeLinks objects, each containing a 'need' ID and
+    associated 'CodeLinks' and 'TestLinks'.
+    Example output element:
+    {
+        "need": "<need_id>",
+        "links": {
+            "CodeLinks": [NeedLink, NeedLink, ...],
+            "TestLinks": [DataForTestLink, DataForTestLink, ...]
```
```python
    def to_dict_full(self) -> dict[str, str | Path]:
        return asdict(self)

    def to_dict_without_metadata(self) -> dict[str, str | Path]:
        d = asdict(self)
        d.pop("module_name", None)
        d.pop("hash", None)
        d.pop("url", None)
        return d
```
The return type annotations for to_dict_full() / to_dict_without_metadata() are too narrow: asdict(self) includes an int (line) and the dict may contain str, int, and Path values. Update the return type to reflect the actual value types (or use dict[str, object]).
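A corrected sketch of the annotations might look like the following. The trimmed `NeedLink` fields here are assumptions for illustration; the real dataclass in the PR has more fields.

```python
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class NeedLink:
    # Trimmed to the fields relevant here; the real class carries more.
    need: str
    file: Path
    line: int
    module_name: str = ""
    hash: str = ""
    url: str = ""

    def to_dict_full(self) -> dict[str, object]:
        # asdict() mixes str, int and Path values, so dict[str, object] is the honest type.
        return asdict(self)

    def to_dict_without_metadata(self) -> dict[str, object]:
        d = asdict(self)
        for key in ("module_name", "hash", "url"):
            d.pop(key, None)
        return d
```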
```python
@dataclass
class moduleInfo:
    name: str
    hash: str
    url: str
```
moduleInfo is a class but is named in lowerCamelCase. This is inconsistent with the rest of this package (e.g. NeedLink, ModuleSourceLinks) and makes type usage harder to read. Rename it to ModuleInfo (and update imports/usages).
```python
def store_source_code_links_json(file: Path, needlist: list[NeedLink]) -> None:
    """
    Writes a JSON array:
        [ meta_dict, needlink1, needlink2, ... ]

    meta_dict must include:
        module_name, hash, url
    """
    file.parent.mkdir(exist_ok=True)
    with open(file, "w", encoding="utf-8") as f:
        json.dump(needlist, f, cls=NeedLinkEncoder, indent=2, ensure_ascii=False)
```
The docstring for store_source_code_links_json claims it writes [meta_dict, needlink1, ...], but the implementation writes only needlist. This is misleading for callers and makes it unclear which on-disk format they should expect. Either update the docstring to match the actual output, or change the function to include metadata (and keep the non-metadata format under a different name for backwards compatibility).
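One way to resolve the mismatch, as the comment suggests, is to keep the legacy writer honest and put the metadata-prefixed schema under a separate name. This is a simplified sketch using plain dicts instead of `NeedLink`/`NeedLinkEncoder`.

```python
import json
from pathlib import Path


def store_source_code_links_json(file: Path, needlist: list[dict]) -> None:
    """Write the legacy schema: a plain JSON array of needlinks."""
    file.parent.mkdir(parents=True, exist_ok=True)
    file.write_text(json.dumps(needlist, indent=2, ensure_ascii=False), encoding="utf-8")


def store_source_code_links_with_metadata_json(
    file: Path, metadata: dict, needlist: list[dict]
) -> None:
    """Write the new schema described by the docstring: [meta_dict, needlink1, ...]."""
    store_source_code_links_json(file, [metadata, *needlist])
```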
Still in progress, do not merge.
It works, but the tests need fixing and the code needs a lot of cleanup.
📌 Description
🚨 Impact Analysis
✅ Checklist