This article is incomplete: it does not yet cover the French, German, Italian, and Spanish character encodings.

The Generation I games use a proprietary character encoding to store text data. Versions of the games in different languages may use different encodings, and some differ more than others.
Fixed-length user-input strings are terminated with 0x50. If a fixed-length string is terminated before using its full capacity, the contents of the remaining space are not specified.
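A minimal Python sketch of reading such a field, assuming the raw bytes have already been pulled out of a save file or RAM dump. `trim_fixed_string` is a hypothetical helper name, and treating a buffer that contains no 0x50 at all as filled to capacity is an assumption, not something stated above.

```python
TERMINATOR = 0x50  # the "end" control code


def trim_fixed_string(buf: bytes) -> bytes:
    """Return the meaningful part of a fixed-length user-input string.

    The bytes after the first 0x50 are unspecified, so they are dropped.
    Assumption: a buffer without any 0x50 is filled to capacity.
    """
    end = buf.find(TERMINATOR)
    return buf if end == -1 else buf[:end]


# "RED" (0x91 0x84 0x83), the terminator, then leftover bytes.
raw = bytes([0x91, 0x84, 0x83, 0x50, 0x00, 0x12, 0x50])
print(trim_fixed_string(raw).hex(" "))  # 91 84 83
```

Anything after the terminator is ignored rather than validated, since its contents are not specified.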


Character sets
Note that 0x7F is a space (" "), not empty. Every character that is not a control character prints as exactly one tile.
In some contexts, some characters display differently than the table below suggests. For example, in the character input table, 0xF0 is shown as "ED" instead of the Pokémon Dollar symbol, and in the English Pokédex, 0x60 and 0x61 are the feet (') and inches (") marks.

English
Bytes with a dark gray background are not normally used in the English games. Characters with a light gray background are holdovers from the Japanese games that the English games do not use.

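Each row below covers 16 codes: the row label is the high hex digit, and the entries run from low digit -0 to -F, left to right.

0-: NULL (0x00), then junk (0x01-0x0F)
1-: junk
2-: junk
3-: junk
4-: junk (0x40-0x48 and 0x4D), control characters (0x49-0x4C and 0x4E-0x4F)
5-: control characters
6-: A B C D E F G H I V S L M : ぃ ぅ
7-: ‘ ’ “ ” · … ぁ ぇ ぉ, text box borders (0x79-0x7E), space (0x7F)
8-: A B C D E F G H I J K L M N O P
9-: Q R S T U V W X Y Z ( ) : ; [ ]
A-: a b c d e f g h i j k l m n o p
B-: q r s t u v w x y z é 'd 'l 's 't 'v
C-: blank
D-: blank
E-: ' PK MN - 'r 'm ? ! . ァ ゥ ェ ▷ ▶ ▼ ♂
F-: [Pokémon Dollar] × . / , ♀ 0 1 2 3 4 5 6 7 8 9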
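The printable rows of the table can be turned into a lookup dictionary. This is a Python sketch only; `build_charmap`, `CHARMAP` and `decode` are hypothetical names. It covers letters, digits, punctuation and the PK, MN and gender tiles; control characters, the interface tiles in 0x60-0x7E, the usually blank 0xC0-0xDF range, the arrows, and the unused Japanese holdovers are left out, and any code not covered is shown as `<XX>`. PK and MN are single tiles but are returned as two-letter strings, and ¥ is used as a stand-in for the Pokémon Dollar symbol.

```python
def build_charmap() -> dict[int, str]:
    """Map the printable main-font codes of the English table to text."""
    cmap: dict[int, str] = {0x7F: " "}
    for i in range(26):
        cmap[0x80 + i] = chr(ord("A") + i)  # 0x80-0x99
        cmap[0xA0 + i] = chr(ord("a") + i)  # 0xA0-0xB9
    cmap.update(zip(range(0x9A, 0xA0), ["(", ")", ":", ";", "[", "]"]))
    cmap.update(zip(range(0xBA, 0xC0), ["é", "'d", "'l", "'s", "'t", "'v"]))
    cmap.update({
        0xE0: "'", 0xE1: "PK", 0xE2: "MN", 0xE3: "-", 0xE4: "'r", 0xE5: "'m",
        0xE6: "?", 0xE7: "!", 0xE8: ".", 0xEF: "♂",
        0xF0: "¥",  # stand-in for the Pokémon Dollar symbol
        0xF1: "×", 0xF2: ".", 0xF3: "/", 0xF4: ",", 0xF5: "♀",
    })
    cmap.update({0xF6 + i: str(i) for i in range(10)})  # digits 0-9
    return cmap


CHARMAP = build_charmap()


def decode(data: bytes) -> str:
    """Decode up to the first 0x50; codes not in CHARMAP appear as <XX>."""
    out = []
    for b in data:
        if b == 0x50:  # "end" terminator
            break
        out.append(CHARMAP.get(b, f"<{b:02X}>"))
    return "".join(out)


print(decode(bytes([0xE1, 0xE2, 0x7F, 0xF7, 0xF8, 0xE8, 0x50])))  # PKMN 12.
```

Both 0xE8 and 0xF2 decode to ".", so this direction loses the distinction between them.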

In the Japanese games, 0xF2 is distinct from 0xE8: the former is meant as a decimal point, while the latter is punctuation. This intention was presumably carried over to the English games, whose script uses 0xE8 almost exclusively. However, 0xF2 appears in the character table for user input, so it may appear in user-input names (and, conversely, 0xE8 never should).
The full set of characters available for user input is A-Z, a-z, space, and the following symbols: × ( ) : ; [ ] PK MN - ? ! ♂ ♀ / . ,
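A sketch of checking a name buffer against that list, for example when validating edited save data. The code values come from the English table; `ENTERABLE` and `is_enterable_name` are hypothetical names. The period is taken to be 0xF2, following the observation that user input uses 0xF2 rather than 0xE8.

```python
# Codes reachable from the naming screen, per the list of enterable characters.
ENTERABLE = (
    set(range(0x80, 0x9A))                        # A-Z
    | set(range(0xA0, 0xBA))                      # a-z
    | {0x7F}                                      # space
    | {0xF1, 0x9A, 0x9B, 0x9C, 0x9D, 0x9E, 0x9F}  # × ( ) : ; [ ]
    | {0xE1, 0xE2, 0xE3, 0xE6, 0xE7}              # PK MN - ? !
    | {0xEF, 0xF5, 0xF3, 0xF2, 0xF4}              # ♂ ♀ / . ,
)


def is_enterable_name(name: bytes) -> bool:
    """True if every byte before the 0x50 terminator can be typed by the player."""
    end = name.find(0x50)
    body = name if end == -1 else name[:end]
    return all(b in ENTERABLE for b in body)


print(is_enterable_name(bytes([0x80, 0xF2, 0x50])))  # True
print(is_enterable_name(bytes([0x80, 0xE8, 0x50])))  # False
```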

Tilemap sections
The game divides the tiles loaded into VRAM into sections, and each character code corresponds directly to one tile. Not every tile in VRAM is reachable through a character code, but many are.

VRAM addresses 0x9000 to 0x9480 hold a portion of the current map's tileset, and character codes 0x01 to 0x48 and 0x4D correspond directly to these tiles. For example, while the player is outdoors, tile 0x03 is the animated flower, so character code 0x03 places the animated flower in text; in other locations (such as in battle or in a cave), a completely different tile is displayed.
Characters 0x49-0x5F are also in this section, but with the exception of 0x4D, they are control characters that run code rather than display the tile they would normally correspond to.
VRAM addresses 0x9600 to 0x97F0 correspond to characters 0x60-0x7F. This is where the user interface tiles are stored, such as bold letters and the tiles used to draw borders for text boxes and menus. The space character is also in this range. These tiles can sometimes change, so characters that reference them may print a different image; however, they are far more consistent than the tiles in the 0x9000 to 0x9480 range.
VRAM addresses 0x8800 to 0x8BF0 correspond to characters 0x80-0xBF. This is where the main font is placed when rendering text.
VRAM addresses 0x8C00 to 0x8FF0 correspond to characters 0xC0-0xFF and are split into two sections:
The range 0xC0-0xDF is reserved for screens that need room for extra tiles. It is usually unoccupied, so these codes normally print blank tiles. The player info screen is one screen that uses some of this space.
The range 0xE0-0xFF includes numbers, some symbols, and more user interface characters. The player-enterable characters PK and MN and the gender symbols are also stored here.
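Each tile is 16 bytes, so the address ranges above follow one rule: codes below 0x80 count upward from 0x9000, and codes 0x80 and above count upward from 0x8800. This matches the Game Boy's 0x8800-based tile-data addressing mode, in which tile numbers behave as signed; that the games use this mode for text is inferred from the listed ranges rather than stated in them. A Python sketch with hypothetical function names:

```python
def tile_address(code: int) -> int:
    """VRAM address of the 16-byte tile displayed for a character code."""
    if not 0x00 <= code <= 0xFF:
        raise ValueError("character codes are single bytes")
    if code < 0x80:
        return 0x9000 + code * 16         # 0x00-0x7F count up from 0x9000
    return 0x8800 + (code - 0x80) * 16    # 0x80-0xFF count up from 0x8800


def section(code: int) -> str:
    """Name the tile section that a code falls in."""
    if code == 0x00:
        return "NULL"
    if 0x49 <= code <= 0x5F and code != 0x4D:
        return "control character"
    if code < 0x60:
        return "map tileset"
    if code < 0x80:
        return "user interface"
    if code < 0xC0:
        return "main font"
    if code < 0xE0:
        return "reserved extra space"
    return "numbers, symbols and interface"


for c in (0x03, 0x4D, 0x60, 0x80, 0xC0, 0xFF):
    print(f"0x{c:02X} -> 0x{tile_address(c):04X} ({section(c)})")
```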
Character codes
Control characters occupy the range 0x49-0x5F, with the exception of 0x4D, which prints tile 0x4D like an ordinary character.
A control character works by intercepting the tile that its code would normally display and performing a different action instead, such as ending the text or printing a lengthy message.
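The interception can be modelled as a first pass that separates codes the renderer must act on from codes it simply draws. A minimal Python sketch; `is_control` and `tokenize` are hypothetical names, and only 0x50 is treated as the end of the stream, which simplifies the behavior of other ending codes such as 0x57 and 0x58.

```python
def is_control(code: int) -> bool:
    """0x49-0x5F are control characters, except 0x4D, which prints a tile."""
    return 0x49 <= code <= 0x5F and code != 0x4D


def tokenize(text: bytes) -> list[tuple[str, int]]:
    """Split encoded text into ("tile", code) and ("control", code) tokens.

    Stops after the 0x50 "end" code, which is emitted as a control token.
    """
    tokens: list[tuple[str, int]] = []
    for b in text:
        if is_control(b):
            tokens.append(("control", b))
            if b == 0x50:
                break
        else:
            tokens.append(("tile", b))
    return tokens


print(tokenize(bytes([0x52, 0x7F, 0x4D, 0x50, 0x80])))
```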

Dialogue control codes
These control codes control dialogue text placement, paging, etc.

0x49 - "page" - Begins a new Pokédex page
0x4B - "_cont" - Stops and waits for confirmation before scrolling the dialogue down by one line
0x4C - "autocont" - Scrolls the dialogue down one line without waiting for confirmation
0x4E - "next line" - Moves down one line in the dialogue
0x4F - "bottom line" - Writes on the last line of the dialogue box
0x50 - "end" - Marks the end of a string
0x51 - "paragraph" - Begins a new dialogue page after button confirmation
0x55 - "cont" - A variation of 0x4B and 0x4C
0x57 - "done" - Ends the text box
0x58 - "prompt" - Prompts the player to end the text box
0x5F - "dex" - Displays a period and ends the Pokédex entry
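Because 0x51 separates dialogue pages and 0x50 ends the string, a first step in laying out dialogue can be sketched as splitting on those two codes. This Python sketch is illustrative only: `split_pages` and `DIALOGUE_CODES` are hypothetical names, and it ignores the waiting and scrolling behavior of codes such as 0x4B and 0x4C.

```python
# Names of the dialogue control codes, as listed above.
DIALOGUE_CODES = {
    0x49: "page", 0x4B: "_cont", 0x4C: "autocont", 0x4E: "next line",
    0x4F: "bottom line", 0x50: "end", 0x51: "paragraph", 0x55: "cont",
    0x57: "done", 0x58: "prompt", 0x5F: "dex",
}

END = 0x50        # "end": marks the end of a string
PARAGRAPH = 0x51  # "paragraph": begins a new dialogue page


def split_pages(text: bytes) -> list[bytes]:
    """Split a dialogue string into the pages that 0x51 separates.

    Bytes from the first 0x50 onward are ignored. Line-level codes such as
    0x4E and 0x4F stay inside each page for a later layout step.
    """
    end = text.find(END)
    body = text if end == -1 else text[:end]
    return body.split(bytes([PARAGRAPH]))


# Two pages: "A<next line>B", then "C".
pages = split_pages(bytes([0x80, 0x4E, 0x81, 0x51, 0x82, 0x50, 0x83]))
print([[DIALOGUE_CODES.get(b, hex(b)) for b in page] for page in pages])
```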
Variable control codes
These control codes print text defined elsewhere.

0x52 - "players name" - The player's name
0x53 - "rivals name" - The rival's name
0x59 - "target" - In battle, the target of a move. If the dialogue is referring to the opponent's Pokémon, "Enemy " will be prepended to the Pokémon's name; if referring to the player's Pokémon, it will just display the Pokémon's name. Outside of battle, it will retain the last value that was stored in it in-battle.
0x5A - "user" - In battle, the user of a move. Just like "target", "Enemy " will be prepended to the name of opposing Pokémon.
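A sketch of how a renderer might resolve these codes, in Python. `TextContext` and `expand_variable` are hypothetical; for simplicity the names are held as already-decoded strings, whereas the games store them in their own encoding.

```python
from dataclasses import dataclass


@dataclass
class TextContext:
    """Hypothetical holder for the values that the variable codes read."""
    player_name: str
    rival_name: str
    target_name: str      # last battle target; retained outside battle
    target_is_enemy: bool
    user_name: str
    user_is_enemy: bool


def expand_variable(code: int, ctx: TextContext) -> str:
    """Return the text printed for one variable control code."""
    if code == 0x52:  # "players name"
        return ctx.player_name
    if code == 0x53:  # "rivals name"
        return ctx.rival_name
    if code == 0x59:  # "target": "Enemy " only for the opponent's Pokémon
        return ("Enemy " if ctx.target_is_enemy else "") + ctx.target_name
    if code == 0x5A:  # "user": same prefix rule as "target"
        return ("Enemy " if ctx.user_is_enemy else "") + ctx.user_name
    raise ValueError(f"0x{code:02X} is not a variable control code")


ctx = TextContext("RED", "BLUE", "PIDGEY", True, "PIKACHU", False)
print(expand_variable(0x59, ctx))  # Enemy PIDGEY
print(expand_variable(0x5A, ctx))  # PIKACHU
```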
Text control codes
These control codes print a hardcoded string. Storing a common string as a single byte saves space, while the string still renders with its full number of characters.

0x4A - "pkmn" - Prints "PKMN"
0x54 - "poke" - Prints "Poké"
0x56 - "......" - Prints 2 ellipses, "……"
0x5B - "pc" - Prints "PC"
0x5C - "tm" - Prints "TM"
0x5D - "trainer" - Prints "TRAINER"
0x5E - "rocket" - Prints "ROCKET"
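A small Python sketch of expanding these codes during decoding; `TEXT_CODES` and `expand_text_codes` are hypothetical names, the strings are written as listed above, and any other code is passed through unchanged as an integer for a separate decoder to handle.

```python
from typing import Union

# Text control codes and the strings they print.
TEXT_CODES = {
    0x4A: "PKMN", 0x54: "Poké", 0x56: "……", 0x5B: "PC",
    0x5C: "TM", 0x5D: "TRAINER", 0x5E: "ROCKET",
}


def expand_text_codes(text: bytes) -> list[Union[str, int]]:
    """Replace each text control code with its string; pass other codes through."""
    out: list[Union[str, int]] = []
    for b in text:
        if b == 0x50:  # "end" terminator
            break
        out.append(TEXT_CODES.get(b, b))
    return out


print(expand_text_codes(bytes([0x5E, 0x7F, 0x5D, 0x50])))
```

In this example two stored bytes stand in for the 13 letters of "ROCKET" and "TRAINER". "PKMN" is drawn with the two tiles PK and MN, so string length does not always equal tile count.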
Japanese
Technically, all characters below 0x60 are control characters. Most of them print a specific character from the main font (0x80-0xFF) with a diacritic in the space above it. Those characters that have different, more complicated functions are detailed below.

...