Skip to content

Add missing Pokémon sprites#196

Open
anhthang wants to merge 29 commits intoPokeAPI:masterfrom
anhthang:add-missing-sprites
Open

Add missing Pokémon sprites#196
anhthang wants to merge 29 commits intoPokeAPI:masterfrom
anhthang:add-missing-sprites

Conversation

@anhthang
Copy link
Contributor

This PR adds tooling and assets to help complete missing sprites:

  • Add a check_missing_sprites.py script that reads Pokémon and forms CSVs, determines generation via version groups, and reports/export CSVs of missing front/back (normal and shiny) sprites.
  • Add a smogon_download.py script to download missing sprites from the Smogon attachment URLs, then use renameSmogon.sh (updated to work from the scripts/downloads folder and move files instead of copy) to place them into the correct sprite directories.
  • Manually add sprites for special Imgur URLs and forms with pokemon_id ≥ 10000 that are not covered by the automated scripts.

@Naramsim
Copy link
Member

Hi, thanks for the contribution! I currently don't have much time to review it, please give me some days.

@anhthang
Copy link
Contributor Author

No worries at all, this PR turned out pretty big and I know it will take time to go through everything. Please take your time with the review.

I’ve tried to be careful with the Smogon links and file placements, so hopefully everything looks good there, but I’m happy to adjust anything you spot.

@anhthang anhthang force-pushed the add-missing-sprites branch from 6bb59fb to d864acc Compare February 20, 2026 04:49
@Naramsim
Copy link
Member

Naramsim commented Feb 24, 2026

@anhthang The PR is too big to be reviewed, at least by me. Could you come up with a separate smaller one if possible (related to one script only maybe), so that it's going to be easier to review?

@anhthang
Copy link
Contributor Author

@Naramsim This PR only depends on the download script and an update to the existing rename script to better handle forms/varieties. The check-missing script can be ignored for review purposes — it’s only used to identify missing assets and doesn’t affect the main workflow.

If splitting the PR would make it easier to review, which approach would you prefer? I could split the updates per generation sheet. However, Gen 9 might still be quite large since it includes many updates.

Another, though it may not be ideal, is reviewing by commit, since the changes are already logically separated across commits.

@anhthang
Copy link
Contributor Author

@Naramsim I noticed that many images in the ‎⁠lowres⁠ folder don’t match the default sprite style, and the folder overall seems quite outdated.

Do we still need to keep the ‎⁠lowres⁠ folder around? If it’s no longer necessary, I’d be happy to open a separate PR to remove it safely.

@Naramsim
Copy link
Member

@Naramsim I noticed that many images in the ‎⁠lowres⁠ folder don’t match the default sprite style, and the folder overall seems quite outdated.

Do we still need to keep the ‎⁠lowres⁠ folder around? If it’s no longer necessary, I’d be happy to open a separate PR to remove it safely.

Yes let's do it! First we remove the folder and then we merge this PR.

@anhthang
Copy link
Contributor Author

Yes let's do it! First we remove the folder and then we merge this PR.

You wanna do it in this PR? My thought was that it might be better to do it later, once this PR and the other one for renaming varieties are in a good place.

@Naramsim
Copy link
Member

Naramsim commented Mar 1, 2026

Better in a separate PR

@anhthang
Copy link
Contributor Author

anhthang commented Mar 2, 2026

Better in a separate PR

PR opened

@anhthang anhthang force-pushed the add-missing-sprites branch from 31540a8 to 98b7acb Compare March 2, 2026 09:14
# Export CSV for this category
file_safe_name = label.lower().replace(" ", "_")
pd.DataFrame(missing_list).to_csv(
f"missing_{file_safe_name}.csv", index=False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to run this script on our CICD let's say every week or month so to have a recap about which missing sprites are there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely. Probably once a month, or whenever there are changes in the sprites folder.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a cool idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a pretty good idea we can schedule the action to run every time there is a merge to generate a report we can create a tracking issue which maintains a link to this report (maybe using this action https://github.com/JasonEtco/create-an-issue and use this field update_existing to update the issue everytime its run)

One point of concern is that the raw report might not fit in the issue body so alternatives would be

Copy link
Member

@Naramsim Naramsim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the job! Now I understand better the changes. @FallenDeity could you as well take a look here at the files check_missing_sprites.py, smogon_download.py and renameSmogon.sh?

@anhthang
Copy link
Contributor Author

anhthang commented Mar 3, 2026

I've added documentation for the new scripts so it's easier to understand and maintain the workflow going forward.

Once this PR is merged, issues #76 and #104 can be considered completed. The remaining missing sprites can be updated later once Smogon finishes adding them on their side.

"896": "glastrier",
"897": "spectrier",
"978": "tatsugiri-curly",
"658_2": "greninja-battle-bond",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this intended the order is not maintained anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think the old script was actually maintaining the order either, but I can revert it to minimize changes and make the review easier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @Naramsim raised I think it would be better to have this record csv as a part of the pipeline along with maybe a summary report too with categorization of sprites broken down like no. missing, and some diffs metrics and breakdown between back, front, shinies etc. The artifact upload step stores it as a zip so we could have some detailed records stored there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven’t looked into the pipeline yet, but I think the current report is good enough for now. We can definitely extend it later to include a CSV artifact and a more detailed breakdown once we align on the exact format we want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah i can write the pipeline after we merge this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

local pokemonName
local inputGeneration="${1:-6}"
local files="../smogon/gen$inputGeneration/"
local files="downloads/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it downloads now? we can just keep it smogon gen specific right, I think a little comment regarding files source img dir and how pokeapi sprites (i.e destination root relates to that in terms of placing directories).

Currently it attempts to go two directories up from downloads? the behavior is not clear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s just the folder path/name used to store images downloaded from Smogon — there’s no logic change here. I only updated it to use the downloads directory, which is where smogon_download.py saves the images.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright sounds good although I prefer stuff to happen relative to my root project dir but downloads work here if the download script uses it I suppose

else
echo "[+] Copying GMax $smogonName to $destination/$pokemonID.png"
cp "$smogonName" "$destination/$pokemonID.png"
mv "$smogonName" "$destination/$pokemonID.png"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mv consumes the original file so if the next if statement is triggered after this it will fail since mv will no longer find the original file, which will crash the script

# --- FALLBACK LOGIC: Search 'forms' using the full name ---
echo "[!] $pokemonName not found directly. Searching forms for Species ID: $id..."

speciesResponse=$(curl -sS "https://pokeapi.co/api/v2/pokemon/$id/" 2>/dev/null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously the $id is being used to hit the species endpoint in the gmax if and here its in pokemon endpoint is that intended or can we directly hit the species endpoint because pokemon forms and species varieties give different results for example pikachu (25)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using the Pokémon endpoint here to properly cover different varieties (e.g. Pikachu caps, Vivillon forms, etc.). Those can’t reliably be resolved from the species endpoint by name alone, since species and forms/varieties return different structures and data.

So this change ensures we correctly handle form-specific cases rather than just the base species.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the pokemon endpoint for say pikachu dosent give all the cap variants the species endpoint does that if you check the forms field its just pikachu there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old logic is still here (https://github.com/PokeAPI/sprites/pull/196/changes#diff-2074368bbccd08fa9ceeafb33f6af9f1f93044835d3cd67eb74eb4702b257cccR93). If the variants aren’t available from the Pokémon endpoint, we’ll fall back to the species endpoint to retrieve them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right that the old logic is still there (it hits the pokemon endpoint to check)

but in your fallback logic you are again hitting the pokemon endpoint itself did you mean to hit the pokemon-species endpoint?

# --- FALLBACK LOGIC: Search 'forms' using the full name ---
                        echo "[!] $pokemonName not found directly. Searching forms for Species ID: $id..."
                        
                        speciesResponse=$(curl -sS "https://pokeapi.co/api/v2/pokemon/$id/" 2>/dev/null)

because for pokemon endpoint for pikachu https://pokeapi.co/api/v2/pokemon/25

"forms": [
{
"name": "pikachu",
"url": "https://pokeapi.co/api/v2/pokemon-form/25/"
}
],

species endpoint https://pokeapi.co/api/v2/pokemon-species/25/

"varieties": [
{
"is_default": true,
"pokemon": {
"name": "pikachu",
"url": "https://pokeapi.co/api/v2/pokemon/25/"
}
},
{
"is_default": false,
"pokemon": {
"name": "pikachu-rock-star",
"url": "https://pokeapi.co/api/v2/pokemon/10080/"
}
},
{
"is_default": false,
"pokemon": {
"name": "pikachu-belle",
"url": "https://pokeapi.co/api/v2/pokemon/10081/"
}
},
...

or it might be smart to cover both forms and varieties I am not entirely sure how this distinction is draw in pokeapi in bulbapedia they just list everything as form diffs https://bulbapedia.bulbagarden.net/wiki/List_of_Pok%C3%A9mon_with_form_differences

but in pokeapi we have pokemon -> forms and followed by species -> varieties I think we should cover both in this fallback logic? @Naramsim what are your thoughts on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's an old debate. Nobody know now the true difference between forms, variants, species.

import requests
import re

# The list of URLs you provided
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the google doc link url here or a pointer to the forum/discussion? its not clear where the urls are coming from, if its a public google sheet cant we just download it using requests etc and parse that instead of hardcoding urls? this script feels a bit redundant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to explain again here (I mentioned it in the doc too).

The URLs are image links from the Smogon spreadsheet. The issue is that some sprite links have wrong filenames, duplicate IDs, and a few are Imgur links that don’t follow the same naming format. Because of that, we can’t reliably auto-download and rename them with a consistent rule.

Since the content needs manual verification anyway, I added the links explicitly to avoid broken mappings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But once a download is done by the pr and merged those urls are no longer relevant right also can you share the spreadsheet url here I wasn't able to find it 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already added the Smogon thread in the latest commit — you’ll find all the spreadsheet links in that topic.

@FallenDeity
Copy link
Contributor

FallenDeity commented Mar 3, 2026

Also just another note some images just seem to be updated with no diffs at all https://github.com/PokeAPI/sprites/pull/196/changes#diff-fbf2451a323761b6e6ee4b4f289800136c2a33650eaa9c279ff54301bb13ed2c

Is this happening because the script is overwriting existing files during the form logic? if there is such a case maybe compare the image contents and if there is a diff log it and replace, if not just delete the <id>-<form_suffix>.png this way there will only be a deletion diff and not both a delete + replacement (the diffs wont be as noisy)

@anhthang
Copy link
Contributor Author

anhthang commented Mar 3, 2026

Also just another note some images just seem to be updated with no diffs at all https://github.com/PokeAPI/sprites/pull/196/changes#diff-fbf2451a323761b6e6ee4b4f289800136c2a33650eaa9c279ff54301bb13ed2c

Is this happening because the script is overwriting existing files during the form logic? if there is such a case maybe compare the image contents and if there is a diff log it and replace, if not just delete the <id>-<form_suffix>.png this way there will only be a deletion diff and not both a delete + replacement (the diffs wont be as noisy)

You're right — that's essentially what’s happening.

For a few images I’ve already noticed there are slight differences (just a few pixels changed), but I don’t remember all of them off the top of my head.

@anhthang anhthang requested a review from FallenDeity March 3, 2026 16:28
@anhthang anhthang force-pushed the add-missing-sprites branch from 6f923b5 to 08039c8 Compare March 3, 2026 16:31
Copy link
Contributor

@FallenDeity FallenDeity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed just two minor changes rest of the stuff looks good

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

local pokemonName
local inputGeneration="${1:-6}"
local files="../smogon/gen$inputGeneration/"
local files="downloads/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright sounds good although I prefer stuff to happen relative to my root project dir but downloads work here if the download script uses it I suppose

# --- FALLBACK LOGIC: Search 'forms' using the full name ---
echo "[!] $pokemonName not found directly. Searching forms for Species ID: $id..."

speciesResponse=$(curl -sS "https://pokeapi.co/api/v2/pokemon/$id/" 2>/dev/null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right that the old logic is still there (it hits the pokemon endpoint to check)

but in your fallback logic you are again hitting the pokemon endpoint itself did you mean to hit the pokemon-species endpoint?

# --- FALLBACK LOGIC: Search 'forms' using the full name ---
                        echo "[!] $pokemonName not found directly. Searching forms for Species ID: $id..."
                        
                        speciesResponse=$(curl -sS "https://pokeapi.co/api/v2/pokemon/$id/" 2>/dev/null)

because for pokemon endpoint for pikachu https://pokeapi.co/api/v2/pokemon/25

"forms": [
{
"name": "pikachu",
"url": "https://pokeapi.co/api/v2/pokemon-form/25/"
}
],

species endpoint https://pokeapi.co/api/v2/pokemon-species/25/

"varieties": [
{
"is_default": true,
"pokemon": {
"name": "pikachu",
"url": "https://pokeapi.co/api/v2/pokemon/25/"
}
},
{
"is_default": false,
"pokemon": {
"name": "pikachu-rock-star",
"url": "https://pokeapi.co/api/v2/pokemon/10080/"
}
},
{
"is_default": false,
"pokemon": {
"name": "pikachu-belle",
"url": "https://pokeapi.co/api/v2/pokemon/10081/"
}
},
...

or it might be smart to cover both forms and varieties I am not entirely sure how this distinction is draw in pokeapi in bulbapedia they just list everything as form diffs https://bulbapedia.bulbagarden.net/wiki/List_of_Pok%C3%A9mon_with_form_differences

but in pokeapi we have pokemon -> forms and followed by species -> varieties I think we should cover both in this fallback logic? @Naramsim what are your thoughts on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is required can be simplified to a jq command like this which we can put in README, or run in the pipeline if required

jq 'to_entries | sort_by(
  .key |
  (capture("^(?<n>\\d+)(?<r>.*)") // {n: "999999999", r: .}) |
  (.n | tonumber) as $m | .r |
  if . == "" then [$m, 0, 0, ""]
  elif test("^_\\d+$") then [$m, 1, (.[1:] | tonumber), ""]
  elif test("^[a-zA-Z]+$") then [$m, 2, 0, .]
  elif startswith("_") then [$m, 3, 0, .]
  else [$m, 4, 0, .]
  end
) | from_entries' forms.json > sorted_forms.json

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. run in pipeline is good idea and @Naramsim can do it I believe

@anhthang
Copy link
Contributor Author

anhthang commented Mar 5, 2026

@FallenDeity Thanks for pointing that out about the rename script. The first call is by name and the fallback is by dex number on the same endpoint. In my mind the fallback should use ‎⁠pokemon-species⁠ to pull varieties, but it seems to work. Let me double-check that logic again.

Edit: pokemonName comes from the pokemon-species endpoint.

  • To determine the correct Pokémon ID, we first attempt to fetch the Pokémon using the variety-name. If it exists, we use the returned Pokémon ID.
  • If it does not exist, we fetch the Pokémon’s forms using the Pokémon ID and check whether the form-name matches. When a matching form is found, the image is renamed using the format <id>-<form-name>

@Naramsim
Copy link
Member

Naramsim commented Mar 6, 2026

Alright, @anhthang is this mergeable? I don't know if there's more work to do.

@anhthang
Copy link
Contributor Author

anhthang commented Mar 6, 2026

Yes please merge the PR @Naramsim, then I will sync this change to generation-v folder and do rename varieties for that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants