Skip to content

WIP: Source Links Fix#440

Draft
MaximilianSoerenPollak wants to merge 2 commits intoeclipse-score:mainfrom
MaximilianSoerenPollak:MSP_fix_sourcelinks_2
Draft

WIP: Source Links Fix#440
MaximilianSoerenPollak wants to merge 2 commits intoeclipse-score:mainfrom
MaximilianSoerenPollak:MSP_fix_sourcelinks_2

Conversation

@MaximilianSoerenPollak
Copy link
Contributor

Still in progress, do not merge.
It works but need to fix tests and clean it up a lot.

📌 Description

🚨 Impact Analysis

  • This change does not violate any tool requirements and is covered by existing tool requirements
  • This change does not violate any design decisions
  • Otherwise I have created a ticket for new tool qualification

✅ Checklist

  • Added/updated documentation for new or changed features
  • Added/updated tests to cover the changes
  • Followed project coding standards and guidelines

@github-actions
Copy link

License Check Results

🚀 The license check job ran with the Bazel command:

bazel run --lockfile_mode=error //src:license-check

Status: ⚠️ Needs Review

Click to expand output
[License Check Output]
Extracting Bazel installation...
Starting local Bazel server (8.3.0) and connecting to it...
INFO: Invocation ID: edc87323-e0ef-4647-8515-b103e90b355e
Computing main repo mapping: 
Computing main repo mapping: 
Loading: 
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
    currently loading: src
Loading: 0 packages loaded
    currently loading: src
Analyzing: target //src:license-check (1 packages loaded, 0 targets configured)
Analyzing: target //src:license-check (1 packages loaded, 0 targets configured)

Analyzing: target //src:license-check (68 packages loaded, 9 targets configured)

Analyzing: target //src:license-check (72 packages loaded, 9 targets configured)

Analyzing: target //src:license-check (128 packages loaded, 1218 targets configured)

Analyzing: target //src:license-check (132 packages loaded, 2516 targets configured)

Analyzing: target //src:license-check (140 packages loaded, 2574 targets configured)

Analyzing: target //src:license-check (140 packages loaded, 2576 targets configured)

Analyzing: target //src:license-check (140 packages loaded, 2576 targets configured)

Analyzing: target //src:license-check (143 packages loaded, 4465 targets configured)

Analyzing: target //src:license-check (143 packages loaded, 4465 targets configured)

Analyzing: target //src:license-check (143 packages loaded, 4465 targets configured)

Analyzing: target //src:license-check (143 packages loaded, 4465 targets configured)

Analyzing: target //src:license-check (143 packages loaded, 4465 targets configured)

INFO: Analyzed target //src:license-check (145 packages loaded, 4715 targets configured).
[12 / 16] JavaToolchainCompileClasses external/rules_java+/toolchains/platformclasspath_classes; 0s disk-cache, processwrapper-sandbox ... (2 actions, 1 running)
[14 / 16] JavaToolchainCompileBootClasspath external/rules_java+/toolchains/platformclasspath.jar; 0s disk-cache, processwrapper-sandbox
[15 / 16] Building src/license.check.license_check.jar (); 0s disk-cache, multiplex-worker
INFO: Found 1 target...
Target //src:license.check.license_check up-to-date:
  bazel-bin/src/license.check.license_check
  bazel-bin/src/license.check.license_check.jar
INFO: Elapsed time: 24.894s, Critical Path: 2.92s
INFO: 16 processes: 12 internal, 3 processwrapper-sandbox, 1 worker.
INFO: Build completed successfully, 16 total actions
INFO: Running command line: bazel-bin/src/license.check.license_check src/formatted.txt <args omitted>
usage: org.eclipse.dash.licenses.cli.Main [-batch <int>] [-cd <url>]
       [-confidence <int>] [-ef <url>] [-excludeSources <sources>] [-help] [-lic
       <url>] [-project <shortname>] [-repo <url>] [-review] [-summary <file>]
       [-timeout <seconds>] [-token <token>]

@github-actions
Copy link

The created documentation from the pull request is available at: docu-html

@@ -0,0 +1,23 @@
# *******************************************************************************
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably want to delete this file to make everything easier

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reworks how “source code links” are generated and injected into Sphinx needs, aiming to support both local builds (derive Git hash from the current repo) and combo/ref-integration builds (use per-module metadata such as repo URL + commit hash).

Changes:

  • Add metadata support (module name / repo URL / commit hash) to NeedLink JSON handling and Bazel CLI generation/merge scripts.
  • Introduce grouping of needs by module and a new module-grouped cache JSON.
  • Update Sphinx extension flow to read the new caches and generate GitHub links using either git-derived or metadata-derived refs.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
src/helper_lib/additional_functions.py Changes GitHub link generation API to require module metadata.
src/extensions/score_source_code_linker/needlinks.py Extends NeedLink with metadata fields; adds new metadata-aware JSON format/load path.
src/extensions/score_source_code_linker/need_source_links.py Moves group_by_need into this module.
src/extensions/score_source_code_linker/module_source_links.py New module-grouped cache format and grouping logic.
src/extensions/score_source_code_linker/metadata.py New TypedDict + TypeGuard for metadata records.
src/extensions/score_source_code_linker/generate_source_code_links_json.py Refactors extraction helper signature and logging (currently inconsistent call sites).
src/extensions/score_source_code_linker/init.py Adds module-linker stage and changes injection to use module-grouped cache + metadata-based link generation.
scripts_bazel/merge_sourcelinks.py Merges per-module sourcelinks and enriches with known-good repo/hash metadata.
scripts_bazel/generate_sourcelinks_cli.py Emits sourcelinks JSON with a leading metadata dict and uses updated extraction helper signature.
docs.bzl Adds optional known_good wiring into the sourcelinks merge genrule and the public docs() macro.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines 100 to +105
data = json.load(f)
metadata = data[0]
if metadata["module_name"] and metadata["module_name"] != "local_module":
hash, repo = parse_info_from_known_good(
known_good_json=args.known_good, module_name=metadata["module_name"]
)
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--known_good is optional, but when a module has a non-empty metadata["module_name"] this code unconditionally calls parse_info_from_known_good(known_good_json=args.known_good, ...). If --known_good is omitted, args.known_good is None and open(None, ...) will fail. Either make --known_good required when external modules are present, or guard this branch and emit a clear error explaining how to provide the known-good file.

Copilot uses AI. Check for mistakes.
Comment on lines +57 to +66
print("===THIS IS MODULE NAME WE LOOK FOR===========")
print(module_name)
with open(known_good_json, "r") as f:
kg_json = json.load(f)
for category in kg_json["modules"].values():
print("===THIS IS CATEGORY=========")
print(category)
if module_name in category:
print("===THIS IS MODULE NAME INSIDe CATEGORY===========")
print(module_name)
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This helper contains multiple print() debug statements that will pollute build logs (and potentially leak large JSON fragments via print(category)). Prefer using the configured logger at DEBUG level (or remove these statements entirely) once the behavior is validated.

Suggested change
print("===THIS IS MODULE NAME WE LOOK FOR===========")
print(module_name)
with open(known_good_json, "r") as f:
kg_json = json.load(f)
for category in kg_json["modules"].values():
print("===THIS IS CATEGORY=========")
print(category)
if module_name in category:
print("===THIS IS MODULE NAME INSIDe CATEGORY===========")
print(module_name)
logger.debug("Looking up module name %r in known_good_json %s", module_name, known_good_json)
with open(known_good_json, "r") as f:
kg_json = json.load(f)
for category in kg_json["modules"].values():
logger.debug("Inspecting category while searching for module %r: %s", module_name, category)
if module_name in category:
logger.debug("Found module name %r inside category", module_name)

Copilot uses AI. Check for mistakes.
Comment on lines +53 to +60
print("======== THIs IS PATH we PARSIGN FOr MODULE NAME")
print(path)
if str(path).startswith("external/"):
module_raw = str(path).removeprefix("external/")
filepath_split = str(module_raw).split("/", maxsplit=1)
module_name = str(filepath_split[0].removesuffix("+"))
return module_name
return "local_module"
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_module_name_from_path currently uses print() for debugging. This will add noise to Bazel build output; use the module logger at DEBUG level or remove these statements.

Copilot uses AI. Check for mistakes.
Comment on lines +106 to +108
store_source_code_links_with_metadata_json(
file=args.output, metadata=metadata, needlist=all_need_references
)
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This switches the generated JSON format from a plain list of NeedLinks to a list whose first element is a metadata dict. Any existing consumers/tests that expect the old schema will now fail. Consider either keeping the old format as default (with an opt-in flag for metadata), or updating all in-repo consumers and tests in the same PR to avoid a partially-migrated state.

Copilot uses AI. Check for mistakes.
Comment on lines +262 to +264
gruped_cache_exists = grouped_cache.exists()
# TODO this cache should be done via Bazel
if not gruped_cache_exists or not app.config.skip_rescanning_via_source_code_linker:
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in variable name: gruped_cache_exists should be grouped_cache_exists for readability and consistency with grouped_cache.

Copilot uses AI. Check for mistakes.
Comment on lines +30 to +39
metadata: moduleInfo,
link: NeedLink | DataForTestLink | DataOfTestCase | None = None,
) -> str:
if link is None:
link = DefaultNeedLink()
if not metadata.hash:
# Local path (//:docs)
return get_github_link_from_git(link)
# Ref-Integration path (//:docs_combo..)
return get_github_link_from_json(metadata, link)
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_github_link now requires a metadata argument, but there are existing call sites in the repo that still call get_github_link(link) (e.g. in tests and xml_parser.py). This is a breaking API change that will raise TypeError. Consider keeping backwards compatibility by making metadata optional with a sensible default (e.g. infer local/git behavior when metadata is omitted) or providing a new function name for the metadata-based behavior.

Suggested change
metadata: moduleInfo,
link: NeedLink | DataForTestLink | DataOfTestCase | None = None,
) -> str:
if link is None:
link = DefaultNeedLink()
if not metadata.hash:
# Local path (//:docs)
return get_github_link_from_git(link)
# Ref-Integration path (//:docs_combo..)
return get_github_link_from_json(metadata, link)
metadata: moduleInfo
| NeedLink
| DataForTestLink
| DataOfTestCase
| None = None,
link: NeedLink | DataForTestLink | DataOfTestCase | None = None,
) -> str:
"""
Return a GitHub link for the given location.
Backwards compatible calling conventions:
- New style: get_github_link(metadata, link)
- Old style: get_github_link(link) # metadata omitted
"""
# Distinguish between new-style and old-style calls.
if isinstance(metadata, moduleInfo):
actual_metadata: moduleInfo | None = metadata
actual_link = link
else:
# Old-style: first argument is actually the link (or None)
actual_metadata = None
# If both metadata and link are provided but metadata is not a moduleInfo,
# prefer the explicitly provided link argument.
actual_link = link if link is not None else metadata
if actual_link is None:
actual_link = DefaultNeedLink()
# If no metadata is available or metadata.hash is falsy, fall back to git-based link.
if actual_metadata is None or not getattr(actual_metadata, "hash", None):
# Local path (//:docs)
return get_github_link_from_git(actual_link)
# Ref-Integration path (//:docs_combo..)
return get_github_link_from_json(actual_metadata, actual_link)

Copilot uses AI. Check for mistakes.
Comment on lines +120 to +131
Returns a nested dictionary structure with 'CodeLink' and 'TestLink' categories.
Example output:


{
"need": "<need_id>",
"module_name": <module_name>,
"hash": <git hash>,
"url": <github base url>,
"links": {
"CodeLinks": [NeedLink, NeedLink, ...],
"TestLinks": [testlink, testlink, ...]
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The group_by_need docstring/documented example includes top-level module_name/hash/url fields, but SourceCodeLinks only contains need and links and this function does not populate metadata. Please update the docstring/example to match the actual returned structure to avoid confusion for future callers.

Suggested change
Returns a nested dictionary structure with 'CodeLink' and 'TestLink' categories.
Example output:
{
"need": "<need_id>",
"module_name": <module_name>,
"hash": <git hash>,
"url": <github base url>,
"links": {
"CodeLinks": [NeedLink, NeedLink, ...],
"TestLinks": [testlink, testlink, ...]
Returns a list of SourceCodeLinks objects, each containing a 'need' ID and
associated 'CodeLinks' and 'TestLinks'.
Example output element:
{
"need": "<need_id>",
"links": {
"CodeLinks": [NeedLink, NeedLink, ...],
"TestLinks": [DataForTestLink, DataForTestLink, ...]

Copilot uses AI. Check for mistakes.
Comment on lines +36 to +44
def to_dict_full(self) -> dict[str, str | Path]:
return asdict(self)

def to_dict_without_metadata(self) -> dict[str, str | Path]:
d = asdict(self)
d.pop("module_name", None)
d.pop("hash", None)
d.pop("url", None)
return d
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type annotations for to_dict_full() / to_dict_without_metadata() are too narrow: asdict(self) includes an int (line) and the dict may contain str, int, and Path values. Update the return type to reflect the actual value types (or use dict[str, object]).

Copilot uses AI. Check for mistakes.
Comment on lines +30 to +34
@dataclass
class moduleInfo:
name: str
hash: str
url: str
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moduleInfo is a class but is named in lowerCamelCase. This is inconsistent with the rest of this package (e.g. NeedLink, ModuleSourceLinks) and makes type usage harder to read. Rename it to ModuleInfo (and update imports/usages).

Copilot uses AI. Check for mistakes.
Comment on lines +105 to +116
def store_source_code_links_json(file: Path, needlist: list[NeedLink]) -> None:
"""
Writes a JSON array:
[ meta_dict, needlink1, needlink2, ... ]

meta_dict must include:
module_name, hash, url
"""

file.parent.mkdir(exist_ok=True)
with open(file, "w", encoding="utf-8") as f:
json.dump(needlist, f, cls=NeedLinkEncoder, indent=2, ensure_ascii=False)
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring for store_source_code_links_json claims it writes [meta_dict, needlink1, ...], but the implementation writes only needlist. This is misleading for callers and makes it unclear which on-disk format they should expect. Either update the docstring to match the actual output, or change the function to include metadata (and keep the non-metadata format under a different name for backwards compatibility).

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

2 participants