Conversation
|
Hi, thanks for the contribution! I currently don't have much time to review it, please give me some days. |
|
No worries at all, this PR turned out pretty big and I know it will take time to go through everything. Please take your time with the review. I’ve tried to be careful with the Smogon links and file placements, so hopefully everything looks good there, but I’m happy to adjust anything you spot. |
6bb59fb to
d864acc
Compare
|
@anhthang The PR is too big to be reviewed, at least by me. Could you come up with a separate smaller one if possible (related to one script only maybe), so that it's going to be easier to review? |
|
@Naramsim This PR only depends on the download script and an update to the existing rename script to better handle forms/varieties. The check-missing script can be ignored for review purposes — it’s only used to identify missing assets and doesn’t affect the main workflow. If splitting the PR would make it easier to review, which approach would you prefer? I could split the updates per generation sheet. However, Gen 9 might still be quite large since it includes many updates. Another, though it may not be ideal, is reviewing by commit, since the changes are already logically separated across commits. |
|
@Naramsim I noticed that many images in the Do we still need to keep the lowres folder around? If it’s no longer necessary, I’d be happy to open a separate PR to remove it safely. |
Yes let's do it! First we remove the folder and then we merge this PR. |
You wanna do it in this PR? My thought was that it might be better to do it later, once this PR and the other one for renaming varieties are in a good place. |
|
Better in a separate PR |
PR opened |
31540a8 to
98b7acb
Compare
scripts/check_missing_sprites.py
Outdated
| # Export CSV for this category | ||
| file_safe_name = label.lower().replace(" ", "_") | ||
| pd.DataFrame(missing_list).to_csv( | ||
| f"missing_{file_safe_name}.csv", index=False |
There was a problem hiding this comment.
Does it make sense to run this script on our CICD let's say every week or month so to have a recap about which missing sprites are there?
There was a problem hiding this comment.
Definitely. Probably once a month, or whenever there are changes in the sprites folder.
There was a problem hiding this comment.
This sounds like a pretty good idea we can schedule the action to run every time there is a merge to generate a report we can create a tracking issue which maintains a link to this report (maybe using this action https://github.com/JasonEtco/create-an-issue and use this field update_existing to update the issue everytime its run)
One point of concern is that the raw report might not fit in the issue body so alternatives would be
- https://github.com/actions/upload-artifact uploading it as a workflow associated artefact
- maybe using some filebin. gist etc (or we could also just write the report to one of the branches/folders in this repo)
Naramsim
left a comment
There was a problem hiding this comment.
Thanks for the job! Now I understand better the changes. @FallenDeity could you as well take a look here at the files check_missing_sprites.py, smogon_download.py and renameSmogon.sh?
scripts/forms.json
Outdated
| "896": "glastrier", | ||
| "897": "spectrier", | ||
| "978": "tatsugiri-curly", | ||
| "658_2": "greninja-battle-bond", |
There was a problem hiding this comment.
is this intended the order is not maintained anymore?
There was a problem hiding this comment.
I don’t think the old script was actually maintaining the order either, but I can revert it to minimize changes and make the review easier.
There was a problem hiding this comment.
As @Naramsim raised I think it would be better to have this record csv as a part of the pipeline along with maybe a summary report too with categorization of sprites broken down like no. missing, and some diffs metrics and breakdown between back, front, shinies etc. The artifact upload step stores it as a zip so we could have some detailed records stored there
There was a problem hiding this comment.
I haven’t looked into the pipeline yet, but I think the current report is good enough for now. We can definitely extend it later to include a CSV artifact and a more detailed breakdown once we align on the exact format we want.
There was a problem hiding this comment.
Yeah i can write the pipeline after we merge this PR
| local pokemonName | ||
| local inputGeneration="${1:-6}" | ||
| local files="../smogon/gen$inputGeneration/" | ||
| local files="downloads/" |
There was a problem hiding this comment.
Why is it downloads now? we can just keep it smogon gen specific right, I think a little comment regarding files source img dir and how pokeapi sprites (i.e destination root relates to that in terms of placing directories).
Currently it attempts to go two directories up from downloads? the behavior is not clear
There was a problem hiding this comment.
It’s just the folder path/name used to store images downloaded from Smogon — there’s no logic change here. I only updated it to use the downloads directory, which is where smogon_download.py saves the images.
There was a problem hiding this comment.
alright sounds good although I prefer stuff to happen relative to my root project dir but downloads work here if the download script uses it I suppose
scripts/renameSmogon.sh
Outdated
| else | ||
| echo "[+] Copying GMax $smogonName to $destination/$pokemonID.png" | ||
| cp "$smogonName" "$destination/$pokemonID.png" | ||
| mv "$smogonName" "$destination/$pokemonID.png" |
There was a problem hiding this comment.
mv consumes the original file so if the next if statement is triggered after this it will fail since mv will no longer find the original file, which will crash the script
scripts/renameSmogon.sh
Outdated
| # --- FALLBACK LOGIC: Search 'forms' using the full name --- | ||
| echo "[!] $pokemonName not found directly. Searching forms for Species ID: $id..." | ||
|
|
||
| speciesResponse=$(curl -sS "https://pokeapi.co/api/v2/pokemon/$id/" 2>/dev/null) |
There was a problem hiding this comment.
Previously the $id is being used to hit the species endpoint in the gmax if and here its in pokemon endpoint is that intended or can we directly hit the species endpoint because pokemon forms and species varieties give different results for example pikachu (25)
There was a problem hiding this comment.
I'm using the Pokémon endpoint here to properly cover different varieties (e.g. Pikachu caps, Vivillon forms, etc.). Those can’t reliably be resolved from the species endpoint by name alone, since species and forms/varieties return different structures and data.
So this change ensures we correctly handle form-specific cases rather than just the base species.
There was a problem hiding this comment.
but the pokemon endpoint for say pikachu dosent give all the cap variants the species endpoint does that if you check the forms field its just pikachu there
There was a problem hiding this comment.
The old logic is still here (https://github.com/PokeAPI/sprites/pull/196/changes#diff-2074368bbccd08fa9ceeafb33f6af9f1f93044835d3cd67eb74eb4702b257cccR93). If the variants aren’t available from the Pokémon endpoint, we’ll fall back to the species endpoint to retrieve them.
There was a problem hiding this comment.
you are right that the old logic is still there (it hits the pokemon endpoint to check)
but in your fallback logic you are again hitting the pokemon endpoint itself did you mean to hit the pokemon-species endpoint?
# --- FALLBACK LOGIC: Search 'forms' using the full name ---
echo "[!] $pokemonName not found directly. Searching forms for Species ID: $id..."
speciesResponse=$(curl -sS "https://pokeapi.co/api/v2/pokemon/$id/" 2>/dev/null)
because for pokemon endpoint for pikachu https://pokeapi.co/api/v2/pokemon/25
"forms": [
{
"name": "pikachu",
"url": "https://pokeapi.co/api/v2/pokemon-form/25/"
}
],
species endpoint https://pokeapi.co/api/v2/pokemon-species/25/
"varieties": [
{
"is_default": true,
"pokemon": {
"name": "pikachu",
"url": "https://pokeapi.co/api/v2/pokemon/25/"
}
},
{
"is_default": false,
"pokemon": {
"name": "pikachu-rock-star",
"url": "https://pokeapi.co/api/v2/pokemon/10080/"
}
},
{
"is_default": false,
"pokemon": {
"name": "pikachu-belle",
"url": "https://pokeapi.co/api/v2/pokemon/10081/"
}
},
...
or it might be smart to cover both forms and varieties I am not entirely sure how this distinction is draw in pokeapi in bulbapedia they just list everything as form diffs https://bulbapedia.bulbagarden.net/wiki/List_of_Pok%C3%A9mon_with_form_differences
but in pokeapi we have pokemon -> forms and followed by species -> varieties I think we should cover both in this fallback logic? @Naramsim what are your thoughts on this?
There was a problem hiding this comment.
Yes, it's an old debate. Nobody know now the true difference between forms, variants, species.
scripts/smogon_download.py
Outdated
| import requests | ||
| import re | ||
|
|
||
| # The list of URLs you provided |
There was a problem hiding this comment.
Add the google doc link url here or a pointer to the forum/discussion? its not clear where the urls are coming from, if its a public google sheet cant we just download it using requests etc and parse that instead of hardcoding urls? this script feels a bit redundant
There was a problem hiding this comment.
It's fine to explain again here (I mentioned it in the doc too).
The URLs are image links from the Smogon spreadsheet. The issue is that some sprite links have wrong filenames, duplicate IDs, and a few are Imgur links that don’t follow the same naming format. Because of that, we can’t reliably auto-download and rename them with a consistent rule.
Since the content needs manual verification anyway, I added the links explicitly to avoid broken mappings.
There was a problem hiding this comment.
But once a download is done by the pr and merged those urls are no longer relevant right also can you share the spreadsheet url here I wasn't able to find it 😅
There was a problem hiding this comment.
I already added the Smogon thread in the latest commit — you’ll find all the spreadsheet links in that topic.
|
Also just another note some images just seem to be updated with no diffs at all https://github.com/PokeAPI/sprites/pull/196/changes#diff-fbf2451a323761b6e6ee4b4f289800136c2a33650eaa9c279ff54301bb13ed2c Is this happening because the script is overwriting existing files during the form logic? if there is such a case maybe compare the image contents and if there is a diff log it and replace, if not just delete the |
You're right — that's essentially what’s happening. For a few images I’ve already noticed there are slight differences (just a few pixels changed), but I don’t remember all of them off the top of my head. |
6f923b5 to
08039c8
Compare
FallenDeity
left a comment
There was a problem hiding this comment.
Reviewed just two minor changes rest of the stuff looks good
| local pokemonName | ||
| local inputGeneration="${1:-6}" | ||
| local files="../smogon/gen$inputGeneration/" | ||
| local files="downloads/" |
There was a problem hiding this comment.
alright sounds good although I prefer stuff to happen relative to my root project dir but downloads work here if the download script uses it I suppose
scripts/renameSmogon.sh
Outdated
| # --- FALLBACK LOGIC: Search 'forms' using the full name --- | ||
| echo "[!] $pokemonName not found directly. Searching forms for Species ID: $id..." | ||
|
|
||
| speciesResponse=$(curl -sS "https://pokeapi.co/api/v2/pokemon/$id/" 2>/dev/null) |
There was a problem hiding this comment.
you are right that the old logic is still there (it hits the pokemon endpoint to check)
but in your fallback logic you are again hitting the pokemon endpoint itself did you mean to hit the pokemon-species endpoint?
# --- FALLBACK LOGIC: Search 'forms' using the full name ---
echo "[!] $pokemonName not found directly. Searching forms for Species ID: $id..."
speciesResponse=$(curl -sS "https://pokeapi.co/api/v2/pokemon/$id/" 2>/dev/null)
because for pokemon endpoint for pikachu https://pokeapi.co/api/v2/pokemon/25
"forms": [
{
"name": "pikachu",
"url": "https://pokeapi.co/api/v2/pokemon-form/25/"
}
],
species endpoint https://pokeapi.co/api/v2/pokemon-species/25/
"varieties": [
{
"is_default": true,
"pokemon": {
"name": "pikachu",
"url": "https://pokeapi.co/api/v2/pokemon/25/"
}
},
{
"is_default": false,
"pokemon": {
"name": "pikachu-rock-star",
"url": "https://pokeapi.co/api/v2/pokemon/10080/"
}
},
{
"is_default": false,
"pokemon": {
"name": "pikachu-belle",
"url": "https://pokeapi.co/api/v2/pokemon/10081/"
}
},
...
or it might be smart to cover both forms and varieties I am not entirely sure how this distinction is draw in pokeapi in bulbapedia they just list everything as form diffs https://bulbapedia.bulbagarden.net/wiki/List_of_Pok%C3%A9mon_with_form_differences
but in pokeapi we have pokemon -> forms and followed by species -> varieties I think we should cover both in this fallback logic? @Naramsim what are your thoughts on this?
There was a problem hiding this comment.
I don't think this is required can be simplified to a jq command like this which we can put in README, or run in the pipeline if required
jq 'to_entries | sort_by(
.key |
(capture("^(?<n>\\d+)(?<r>.*)") // {n: "999999999", r: .}) |
(.n | tonumber) as $m | .r |
if . == "" then [$m, 0, 0, ""]
elif test("^_\\d+$") then [$m, 1, (.[1:] | tonumber), ""]
elif test("^[a-zA-Z]+$") then [$m, 2, 0, .]
elif startswith("_") then [$m, 3, 0, .]
else [$m, 4, 0, .]
end
) | from_entries' forms.json > sorted_forms.jsonThere was a problem hiding this comment.
thanks. run in pipeline is good idea and @Naramsim can do it I believe
|
@FallenDeity Thanks for pointing that out about the rename script. The first call is by name and the fallback is by dex number on the same endpoint. In my mind the fallback should use pokemon-species to pull varieties, but it seems to work. Let me double-check that logic again. Edit:
|
|
Alright, @anhthang is this mergeable? I don't know if there's more work to do. |
|
Yes please merge the PR @Naramsim, then I will sync this change to generation-v folder and do rename varieties for that |
This PR adds tooling and assets to help complete missing sprites:
check_missing_sprites.pyscript that reads Pokémon and forms CSVs, determines generation via version groups, and reports/export CSVs of missing front/back (normal and shiny) sprites.smogon_download.pyscript to download missing sprites from the Smogon attachment URLs, then userenameSmogon.sh(updated to work from thescripts/downloadsfolder and move files instead of copy) to place them into the correct sprite directories.pokemon_id≥ 10000 that are not covered by the automated scripts.