Dragon „Hackst“ IV

On this page I will incrementally collect everything I have found out so far (as of 2020-11-27) by looking into the data of the Dragon Quest IV PlayStation 1 remake (ドラゴンクエストIV 導かれし者たち). If I cannot continue this project, it may help others to get into it. You can contact me for questions, collaborations or hints.

The related source code is on GitHub. There is also a related romhacking.net forum post.

Getting the data

I took the Japanese game from coolrom. Using cebix’s psximager, the *.bin file can be extracted. You will find three files on the disc:

  • SYSTEM.CNF (68 Bytes) – a small config file that states which executable will be started
  • SLPM_869.16 (692.2 KB) – the PS-X EXE executable
  • HBD1PS1D.Q41 (319.4 MB) – the game’s resources

The company Heart Beat Inc. implemented both Dragon Quest IV (DQ4) and Dragon Quest VII (DQ7) for the PlayStation. This is why HBD1PS1D.Q41 could mean Heart Beat Disc/Data 1 for PlayStation 1 Dragon Quest 4. Understanding this archive would make it possible to extract resources and could also help to translate the game from Japanese to English. Similarly, DQ7 has the file HBD1PS1D.Q71.

Other attempts in the past

In 2008 the user Kojiro started an initiative to translate Dragon Quest IV. However, the site is down and it seems that there was no progress. It may be helpful to look into their forum using the Wayback Machine.

In 2010 the question of a translation was raised in the Dragon Den’s Forum, but without any results.

In 2012 a user named rveach got very far with Dragon Quest VII for a French community.

In 2014 the user loveemu extracted the music from the HBD1PS1D.Q41 file. See also the user’s Dragon Quest VII: Sound Engine Analysis. The tool psdq7rip, which extracts the sound files from the HBD1PS1D file, just scans for sound data and does not extract the archive completely.

In 2000, Tonura mentioned that the DQ7 file contains monster images at a certain position, but compressed with an LZ algorithm. The Game Lab magazine (ゲームラボの記事) is mentioned here. It seems that the October 2000 volume is the right one.

Understanding HBD1PS1D file

HBD1PS1D.Q41 has a size of 319436800 bytes, which divides evenly into 2048-byte blocks: 319436800 bytes / 2048 bytes = 155975 blocks. In fact, when we visualize each byte of the file as a gray pixel on a 2048 x 155975 bitmap, we see certain patterns:

The pattern shows that some resources consist of more than one 2048-byte block. We call them * 00 00 00 blocks, since they always start with this pattern. In the middle of the file there are white spots: these resources have a different header than the other resources. We call them 0x60010108 blocks, since they always start with this pattern.

The first 2048 bytes

Hex view of the first 2048 bytes

The very first 2048 bytes of the HBD1PS1D file (the first block) differ from the following blocks. It is noticeable that the ASCII string „hdb1ps1d.q41“ is exactly at position 0x400 (1024). Maybe we see two 1024-byte blocks here. There are hardly any 0x00 bytes, which is why I guess that we do not see short or int numbers here. Decompressing does not yield any meaningful data, and a check with several Japanese text encodings does not reveal text either.

When we compare DQ4 and DQ7, we see that the first 2048 bytes (up to address 0x800) are nearly identical. They only differ in the string „hdb1ps1d.q{4,7}1“. That means that this very first block does not depend on the different data these games have.

Comparison between DQ4 and DQ7 beginning of HBD1PS1D file. In the picture we see it starting from address 0x740.

The * 00 00 00 blocks

These seem to be data blocks. The name comes from the fact that they always start with a small integer, because it states the number of sub-blocks (at most maybe 18). The header of a * 00 00 00 block is 16 bytes long. The integers and shorts are little-endian.

Start | Length (bytes) | Comment
0x00 | 4 | The number of sub-blocks this block has.
0x04 | 4 | The number of 2048-byte sectors the block consists of.
0x08 | 4 | The total data length (raw data without the header information).
0x0c | 4 | Always zero. Maybe the previous value is not an integer but a long.
The * 00 00 00 block header (16 bytes).

At the beginning of a 2048-byte block this header tells us how big the block truly is. The total data length is always smaller than the space the sectors provide, and the rest is filled with 00 bytes until a 2048-byte sector is completed. I guess this is because of reading performance. There are 3243 of these blocks when we read through the whole file. Each of these blocks in turn consists of sub-blocks.
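The header layout above can be read with a few lines of Python. This is only a sketch of the table, not the code from the repository; the function names and dictionary keys are made up for illustration.

```python
import struct

SECTOR = 2048  # block granularity observed in the file

def parse_block_header(block: bytes) -> dict:
    """Read the 16-byte header of a '* 00 00 00' block (little-endian)."""
    sub_blocks, sectors, data_length, reserved = struct.unpack_from("<4I", block, 0)
    return {
        "sub_blocks": sub_blocks,    # 0x00: number of sub-blocks
        "sectors": sectors,          # 0x04: number of 2048-byte sectors
        "data_length": data_length,  # 0x08: raw data length without headers
        "reserved": reserved,        # 0x0c: always zero so far
    }

def block_size_in_bytes(header: dict) -> int:
    """Size of the whole block on disc, including the 00 padding."""
    return header["sectors"] * SECTOR
```

With the sector count from the header, one can skip directly to the start of the next block instead of scanning for it.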

The * 00 00 00 sub-blocks

Each sub-block is described by a 16-byte header. Thus, we first have to read the following header information <number of sub-blocks> times. The i-th header (zero-indexed) is at 0x10 + (i * 16 bytes).

Start | Length (bytes) | Comment
0x00 | 4 | The data length of this sub-block. If the parent block has only one sub-block, this has the same size as the parent block.
0x04 | 4 | If the data is compressed, this is the uncompressed data length. If it is not compressed, it is the same as the previous integer.
0x08 | 4 | Unknown
0x0c | 2 | Seems to be some flags. Most of the time 0, but in about 25% of the cases 1280 (counts below). If it is 1280, the uncompressed data length is always bigger than the data length. Thus, I assume that this indicates whether the sub-block’s data is compressed.
  count | prop. | decimal value
  18001 | 75.546% | 0
  5821 | 24.429% | 1280
0x0e | 2 | Also some flags or maybe type information for the sub-block. It can take the following values: [1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 31, 32, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]. E.g.: in 21 we always find qQES data.
The sub-block header (16 bytes).

After the main block header (16 bytes) and the sub-block headers (a multiple of 16 bytes) comes the raw data. The sum of the sub-blocks’ data lengths is equal to the main block’s total data length. For each sub-block we can extract its data array.

Writing the blocks with their sub-blocks (indented with a tab) as a tree, with an index for each block at the beginning, looks like the following. I also searched for the „pQES“ substring in the sub-block’s data and marked the sub-block as compressed when the value 0x0005 (the bytes 00 05, which is 1280 as a little-endian short) is present at 0x0c. Maybe each block is a scene or level with the sub-block data needed to render it.
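As a sketch of how the sub-block headers and their data could be read: the following Python function is not the repository code, and it assumes that the sub-block data arrays lie back to back in the same order as their headers (the text above only states that the lengths add up). The key names are hypothetical.

```python
import struct

def parse_sub_blocks(block: bytes) -> list:
    """Return a list of (header, data) pairs for one '* 00 00 00' block."""
    sub_count = struct.unpack_from("<I", block, 0)[0]
    headers = []
    for i in range(sub_count):
        length, raw_length, unknown, flags, sub_type = struct.unpack_from(
            "<3I2H", block, 0x10 + i * 16)
        headers.append({
            "length": length,            # 0x00: stored data length
            "raw_length": raw_length,    # 0x04: uncompressed data length
            "unknown": unknown,          # 0x08
            "compressed": flags == 1280, # 0x0c: assumed compression flag
            "type": sub_type,            # 0x0e: sub-block type
        })
    # Assumption: data follows the headers contiguously, in header order.
    pos = 0x10 + sub_count * 16
    result = []
    for header in headers:
        result.append((header, block[pos:pos + header["length"]]))
        pos += header["length"]
    return result
```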

Sub-Block Types

Type | Count [with duplicates] (Proportion) / Count Distinct | Compressed | Comment
1 | 6 (0.03%) / 2 | no | Font Images
6 | 1730 (7.26%) / 691 | yes | multi-image, Chipset Images, maybe textures
7 | 1458 (6.12%) / 518 | no | Has some patterns but maybe is not an image, (width=20)
8 | 309 (1.30%) / 272 | yes | Monster sprites and battle effects images, (width=128 or 256), TIM files with header 0x10000000 09000000 or 08000000
9 | 473 (1.99%) / 444 | yes | Seems to be images (width=512 maybe), looks like gradients
10 | 256 (1.07%) / 221 | yes | Multi-Image data, monster sprites, multi-TIM with header 0x10000000 09000000
11 | 309 (1.30%) / 201 | yes | Unknown, sometimes contains character sequences at the start (e.g. from a – z), maybe ints that are counted up (hex editor width=8)
12 | 309 (1.30%) / 286 | no | data has these white (0xff) patterns (width=32)
13 | 970 (4.07%) / 375 | yes | character (NPC) sprites images, (width=128), starts with 0x0c000000
14 | 44 (0.18%) / 35 | yes | character (NPC) sprites images, starts with 0x0c000000
15 | 2 (0.01%) / 1 | yes | bigger character (NPC) sprites images (width=128)
17 | 5 (0.02%) / 1 | yes | horse sprites images, starts with 0x0c000000
18 | 5 (0.02%) / 1 | yes | sprites images, starts with 0x0c000000
19 | 141 (0.59%) / 96 | yes | battle effects (fire, slashes and so on) images
20 | 3 (0.01%) / 1 | no | qQES format
21 | 3317 (13.92%) / 278 | no | qQES format
22 | 72 (0.30%) / 2 | no | qQES format
23 | 44 (0.18%) / 12 | no | maybe image data (has these vertical black lines)
24 | 1062 (4.46%) / 485 | no | qQES format
25 | 27 (0.11%) / 3 | yes | Image, background texture, in tiles, width=256
26 | 573 (2.40%) / 454 | no | small size, has also a bit these 0xff patterns, maybe level data? (hex width=4), numbers counting down
31 | 1025 (4.30%) / 885 | no | small size, patterns, maybe level data?, some differ only by small changes
32 | 32 (0.13%) / 8 | no | starts sometimes with printline output text and has Japanese text, e.g. シナリオ (path=350/6), starts with 0x081f0180
34 | 975 (4.09%) / 957 | no | small size, patterns, maybe level data?, some differ by small changes only
35 | 1576 (6.61%) / 583 | no | small size, patterns (hex editor width=4)
36 | 1506 (6.32%) / 238 | no | small size, patterns (hex editor width=4), rows start with 0x0*00
37 | 1033 (4.34%) / 298 | no | small size, patterns (hex editor width=2)
38 | 1377 (5.78%) / 392 | no | small size, nearly same patterns
39 | 976 (4.10%) / 927 | yes (but some are not) | has these CCC (0x00434343) patterns often
40 | 1315 (5.52%) / 893 | no | maybe multi data, has the „text“ header
41 | 1730 (7.26%) / 240 | no | small size, mostly 0x00, small diffs
42 | 213 (0.89%) / 213 | no | has the „text“ header
43 | 24 (0.10%) / 23 | yes | sprite images, starts with 0x0c000000
44 | 152 (0.64%) / 29 | no (but some are) | has sometimes error messages, maybe scripts? (path=26022/8 -> エンカウントOFF), (path=637/2 -> job names in Japanese), (path=350/11 -> こうげき [attack]), (path=26024/17 -> RIGHT), (path=26027/4 -> メモリーカード), (path=26028/3 -> start menu script?), (path=26022/0 -> battle magic names?)
45 | 140 (0.59%) / 88 | no | some have „{buki,majinyobi} open NG“ ASCII header
46 | 612 (2.57%) / 118 | yes (but some not) | contains error messages and Japanese text, maybe also messages, always starts with 0xe8ffbd27, contains Japanese text of inn NPC, shop NPC etc., some have DQ41章2章 etc., path=596/8 -> MPが  ふえたHP
47 | 27 (0.11%) / 1 | no | contains error messages: c a n ‚ t . g e t . n e w _ f m a p ! ! ( % d ) ( m a x = % d )

The 0x60010108 blocks

These blocks form the white spots in the gray image described above. They are always 2048 bytes in size and start with 0x60010108. In the image they are often „white“, because their data contains trailing 0xFF bytes.

The header seems to be 32 bytes long.

Start | Length (bytes) | Comment
0x00 | 4 | Always the 0x60010108 „magic number“
0x04 | 2 | An index always ranging from 0 to 4, +1 per block.
0x06 | 2 | Seems to be a count number which is always 5. The previous number loops from 0 to 4, thus has five values.
0x08 | 4 | A kind of part number that counts up after the index at 0x04 reaches 0 again (1-indexed).
0x0c | 4 | An integer
0x10 | 2 | Always 0x8000 (the bytes 80 00), which is 128
0x12 | 2 | Always 0x7800 (the bytes 78 00), which is 120
0x14 | 4 | Unknown number, maybe flags? Fourth byte is always 0x38
0x18 | 2 | Can be 0x0100, 0x0200 or 0x0300
0x1a | 4 | Always 0x03000000
0x1e | 2 | Always 0x00
The 0x60010108 block header (32 bytes)
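The table can be turned into a small parser. This Python sketch assumes that the hex values in the table are written in file byte order (so the magic is the byte sequence 60 01 01 08) and that all numbers are little-endian, as in the other blocks; the field names are placeholders I made up, since the meaning of most fields is unknown.

```python
import struct

MAGIC = bytes.fromhex("60010108")  # assumed to be the byte order in the file

# Hypothetical field names, one per table row after the magic number.
FIELDS = ("index", "count", "part", "int_0c", "short_10", "short_12",
          "int_14", "short_18", "int_1a", "short_1e")

def parse_60010108_header(block: bytes):
    """Return the 32-byte header as a dict, or None if the magic is missing."""
    if block[:4] != MAGIC:
        return None
    values = struct.unpack_from("<HHIIHHIHIH", block, 4)
    return dict(zip(FIELDS, values))
```

Grouping the parsed headers by `part` and sorting by `index` would put the five blocks of each part back together.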

There are 26635 of these blocks. When we put the parts (at 0x08) together, for example, from part 1 to 195, we get 40 entries.

I checked: the data is not Japanese text in any of the Japanese encodings I tried.

Maybe we see here some soundfonts, sound effects or wave files?

Find Japanese Texts

The Japanese Industrial Standards (JIS) define how Japanese text can be encoded in data. It seems that Dragon Quest uses the Shift-JIS standard, in which every Japanese letter takes 2 bytes. You can spot hiragana in the data by looking at byte pairs where the first byte is 0x82. If this happens many times in a row, there is a hiragana sequence.
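A minimal Python sketch of such a scan, assuming plain (uncompressed) Shift-JIS: it looks for consecutive byte pairs whose first byte is 0x82 and whose second byte lies in the hiragana range 0x9F to 0xF1. The function name and the `min_chars` threshold are made up for illustration.

```python
def find_hiragana_runs(data: bytes, min_chars: int = 4) -> list:
    """Return (offset, text) for every run of at least min_chars hiragana."""
    runs = []
    i = 0
    n = len(data)
    while i + 1 < n:
        start = i
        # Shift-JIS hiragana: lead byte 0x82, trail byte 0x9F..0xF1
        while i + 1 < n and data[i] == 0x82 and 0x9F <= data[i + 1] <= 0xF1:
            i += 2
        if (i - start) // 2 >= min_chars:
            runs.append((start, data[start:i].decode("shift_jis")))
        if i == start:
            i += 1  # no hiragana here, move on by one byte
    return runs
```

Real dialog mixes hiragana with kanji and spaces, so the runs are short; a low threshold finds more text but also more false positives in binary data.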

In the first chapter there is already some dialog. This can help to find the position of the text in the data, if it is not compressed. I just set the hero’s name to „ああああ“.

The dialog is:
どうした? <Heroname>。
もう降参かい?
そうだな。今日は このくらいに
しておこう……。
私の役目は はやく お前を
一人前に 育てることだが
あせっても しかたあるまい。
ちて もどるとするか。
<Heroname>も 家で ゆっくり
休むといいだろう。
勇者さま 勇者さま……。
勇者さま どうか たすけて……。

Unfortunately, words from the text above cannot be found in the non-image blocks using typical Japanese encodings.

Use Dragon Quest VII to find the text

The good news is that Dragon Quest VII is available in both English and Japanese. The English version has the HBD1PS1D.W71 file, while the Japanese version has the HBD1PS1D.Q71. We search for the data blocks that differ the most, because the dialog texts had to be changed completely. Based on the sub-block type information, I compared them all using hashes of the binary data. In cases where the hashes do not match, I assume that they changed a lot. The most suspicious types are listed below:

(has some header data, then come the bytes, jp is always larger in size)
type 23: 1911 sub-blocks vs 1911 sub-blocks
877 distinct hashes vs 877 distinct hashes
877 en no match vs 877 jp no match
(same header as 23)
type 24: 167 sub-blocks vs 167 sub-blocks
167 distinct hashes vs 167 distinct hashes
167 en no match vs 167 jp no match
(same header as 23)
type 25: 37 sub-blocks vs 37 sub-blocks
37 distinct hashes vs 37 distinct hashes
37 en no match vs 37 jp no match
(same header as 23)
type 27: 7 sub-blocks vs 7 sub-blocks
6 distinct hashes vs 6 distinct hashes
6 en no match vs 6 jp no match
(beginning is the same but then it changes in some bytes, need to look deeper, en size smaller than jp size, seems like scripts, some output text is translated, looks like multi-data [image-data])
type 31: 2597 sub-blocks vs 2596 sub-blocks, DIFF Unknown
1772 distinct hashes vs 1771 distinct hashes
1769 en no match vs 1768 jp no match
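The comparison behind these numbers can be sketched in Python as follows. Which hash function was used is not stated here, so SHA-1 is an assumption, and the function name is hypothetical; the input is the list of sub-block data arrays of one type from each language version.

```python
import hashlib

def unmatched_hashes(en_sub_blocks, jp_sub_blocks):
    """Return the hashes that occur only in the English or only in the
    Japanese version (both arguments are iterables of bytes)."""
    en = {hashlib.sha1(data).hexdigest() for data in en_sub_blocks}
    jp = {hashlib.sha1(data).hexdigest() for data in jp_sub_blocks}
    return en - jp, jp - en
```

A type where nearly every distinct hash ends up in both difference sets, like type 23 above, is a good candidate for translated text.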

Interestingly, types 23, 24, 25 and 27 share the same header. A closer look reveals that the binary data in fact changes a lot. What we learn from this is what the sub-blocks that very probably contain text look like. We can use this knowledge to search through the types of DQ IV again. It turns out that types 40 and 42 have the same header information. We will call them text-blocks.

Text-Blocks and their huffman coding

Start | Length (bytes) | Comment
0x00 | 4 | An offset that points near the end of this block. We call it „a“. If we read a 4-byte int there, it has the same value.
0x04 | 4 | Unique ID. If the ID is the same, the code is also the same. The first scene in the game has ID 0x0000006C. IDs are never longer than 2 bytes.
0x08 | 4 | An offset. We call it „c“ for now. It tells us where to read the „huffman code“ (begin index).
0x0c | 4 | An offset. We call it „d“. Sometimes 0. Tells us where to read the „huffman tree“ (end index).
0x10 | 4 | An offset. We call it „e“ for now. It tells us where to read the „huffman code“ (end index).
0x14 | 4 | Seems to be often zero.
Header information of the text-blocks.

The text-block header has six 4-byte integers. The offset „c“ is very often 24, which is exactly the size of this header, so I assume that it is the start of a byte array right after the header information in the block. „c“ < „e“ < „d“ (if d is not zero) always holds.

The „c-e range“ covers the huffman code (the bit information), whose length is always a multiple of 4 (it has trailing zeros). After „e“ comes a 4-byte int, another 4-byte int and a 2-byte short. The following „e-d range“ contains the huffman tree: the actual Japanese letters and some node information. There is also the „d-a range“. If d is zero, this range does not exist and we set d to a (which means the „e-d range“ becomes the „e-a range“). At „a“, after the same 4-byte int value, there is a 4-byte int counter (but often zero). If it is not zero, we can read counter * 8 bytes and reach the end of the block.
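The ranges can be cut out of a text-block as in the following Python sketch. It is not the repository code; the function and key names are made up, and the „e-d range“ is returned as-is, so it still starts with the int, int and short that follow „e“.

```python
import struct

def split_text_block(block: bytes) -> dict:
    """Split a text-block into the ranges described by its 24-byte header."""
    a, block_id, c, d, e, _often_zero = struct.unpack_from("<6I", block, 0)
    if d == 0:
        d = a  # no "d-a range": the tree range runs up to "a"
    # At "a" the value of "a" is repeated, followed by a 4-byte counter.
    counter = struct.unpack_from("<I", block, a + 4)[0]
    return {
        "id": block_id,
        "huffman_code": block[c:e],
        "huffman_tree": block[e:d],  # includes the 10 bytes right after "e"
        "d_a_range": block[d:a],
        "trailer_entries": counter,  # counter * 8 bytes follow
    }
```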

A concrete example: This text-block seems to contain the text ダミー (en: dummy).

Two ranges seem to be most relevant: the huffman code and the huffman tree. The huffman tree range consists of byte pairs that hold the actual Japanese Shift-JIS letters. If the first byte is not 0x80 (indicates a node) or 0x7f (indicates a control character), we have to swap the two bytes and add 0x80 to the leading byte to get the actual letter, for example, 0x835f which is ダ. In case of 0x80, it is a node with a number. The 0x7f** values are special control characters such as new line, end of text or <Heroname>. Each letter occurs only once per tree. In order to uncompress the huffman coding, we have to understand the trees first.
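One way to read this rule, sketched in Python: the byte pair is taken as a little-endian 16-bit value, which does the byte swap. The exact test for a node is my assumption (highest bit set, which also covers a node like 0x8149), as is treating 0x0000 as a control value; the function name is hypothetical.

```python
def classify_tree_entry(pair: bytes):
    """Classify one byte pair of the huffman tree range."""
    value = pair[0] | (pair[1] << 8)   # little-endian read = swapped bytes
    if value & 0x8000:                 # assumption: high bit marks a node
        return ("node", value - 0x8000)
    if value == 0x0000 or (value >> 8) == 0x7F:
        return ("control", value)      # e.g. 0x7f1f or end of text
    # Letter: adding 0x8000 restores the Shift-JIS lead byte.
    letter = (value + 0x8000).to_bytes(2, "big").decode("shift_jis")
    return ("letter", letter)
```

For the example above, the stored bytes 5f 03 become 0x035f, and adding 0x8000 gives the Shift-JIS code 0x835f.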

Various huffman trees from different dialog sections.

The text decoding routine

Left: the first dialog in the game. Right: the decoded text starting at 0x800F4DEB in PSX RAM. At the top: the Shift-JIS encoded string, to check that it in fact encodes そうだな etc.

In order to draw the text on screen, the dialog has to be decoded (decompressed) first. Dialog text is decoded line by line and written to RAM at 0x800F4DEB (it is overwritten every time). You can define a memory breakpoint using PCSX 1.5 with Debugger to see it. The preceding 0x50710F80 value is only in RAM while the textbox is shown, otherwise it is 0x00000000.

In the PSX-EXE the decoding routine seems to start at 0x8008F3BC. It is called at 0x8002D084. When the routine returns, one decoded 2-byte Shift-JIS letter is in r2 (register 2). r3 and r6 are pointers to the huffman tree. r6 points to the start and r3 points to the middle of the tree. These two different offsets are used for the left and right side of a branch. The routine uses the offsets to decide where to jump next. The node number (e.g. 0x8149 minus 0x8000 = 0x0149 = 329) tells us how far we have to jump to reach the next node or leaf. With this information we can construct the trees.

Part of the decoding routine. Jumps along node pointers until a letter is found.
The huffman tree of the first scene in the game.

The huffman code is the actual text, but encoded in bits. By reading the bits and traversing the tree accordingly (0 = left, 1 = right), we get the Japanese text. Within each byte we read from the rightmost (least significant) bit to the leftmost bit.
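The traversal itself can be sketched in Python. The tree here is a hypothetical in-memory form (nested `(left, right)` tuples with strings as leaves), not the byte layout with node numbers that the game uses, and the `{0000}` end marker is just a placeholder string.

```python
def bits_lsb_first(code: bytes):
    """Yield the bits of each byte from the rightmost to the leftmost bit."""
    for byte in code:
        for bit in range(8):
            yield (byte >> bit) & 1

def huffman_decode(code: bytes, tree, end_symbol="{0000}") -> list:
    """Walk the tree bit by bit (0 = left, 1 = right) until end_symbol."""
    decoded = []
    node = tree
    for bit in bits_lsb_first(code):
        node = node[bit]
        if not isinstance(node, tuple):  # reached a leaf
            if node == end_symbol:
                break
            decoded.append(node)
            node = tree                  # restart at the root
    return decoded
```

For example, with the toy tree `(("a", "b"), ("c", "{0000}"))` the single byte 0xD2 decodes to `["b", "a", "c"]`.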

The decoded text of the first scene. The first dialog is on the bottom.
Control Character (0x7f**) | Meaning
0x7f1f | Heroname
0x7f02 | New Line + Tab
0x7f0b |
0x7f04 |
0x7f0a | Blinking Cursor, Wait for User Input
0x0000 | End of Text
Control Characters and their meaning.

Translation Preparation

In order to translate the dialog texts, we have to extract them. This is done by the translationPreparation method in the HBD1PS1D class. For each distinct textblock (using the unique IDs) we create a CSV file named after the ID in hex (e.g. 006C.csv). Each row of the CSV file is a dialog segment, usually ending with the {0000} control character. The CSV is used to enter the translated text per dialog segment.

The 006C.csv (first scene) file content

An additional offset in the third column of the CSV tells us the byte offset from the beginning of the textblock to the byte where the huffman coding of the segment starts. This way we can tell by the byte distance which segment will be loaded in the game. Once we replace all the huffman code with translated texts, the start offsets of the segments change as well (because the translated text, in bits, is not exactly as long as the original text). When the game wants to load a certain dialog (e.g. at offset position 0x01F2), we have to correct this offset to the actual start of the translated text. That is why we keep the original offset in order to map it to the new one later.
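The mapping from original to new start positions could be built as in this Python sketch. It assumes that the translated segments are packed bit by bit directly after each other, starting at the beginning of the huffman code (0x18 in the usual case); the function name and the segment lengths in the usage note are made up.

```python
def remap_offsets(original_offsets, new_bit_lengths, code_start=0x18) -> dict:
    """Map each original byte offset to the (byte offset, bit index) at
    which the translated segment starts."""
    mapping = {}
    bit_pos = 0
    for old_offset, bits in zip(original_offsets, new_bit_lengths):
        mapping[old_offset] = (code_start + bit_pos // 8, bit_pos % 8)
        bit_pos += bits  # the next segment starts right after this one
    return mapping
```

If, say, the first translated segment were 650 bits long, a second segment that originally started at 0x01F2 would now start at byte 0x0069, bit 2.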

The extraction creates 925 files (2.8 MB). They can be downloaded in a ZIP file with the following link.

Translation Embedding

In the class TranslationEmbedding the actual embedding happens. My idea is that a translator creates a folder and puts translated files (see above) in it.

A translation is entered in the first line of 006C.csv.

Only the corresponding textblocks will be patched in the HBD1PS1D file. However, it is also necessary to patch the SLPM_869.16 PS-EXE file since, as already mentioned, we also have to change where the loading of a dialog starts. While replacing the textblocks’ huffman codes and trees seems to be rather simple and stable, patching the PS-EXE file is trickier.

When we debug the game, position 0x8008EBA4 (a nop) seems to be the perfect place to correct the dialog pointer.

Breakpoint at 0x8008EBA4 before the dialog is loaded.

This breakpoint is right in the middle of two things: (a) it is after a dialog pointer is written to RAM (at 0x800F4E40 where now 0x5AD91815 is stored) and (b) before the huffman code is decoded to actual letters. The dialog pointer is used by the decoding routine to know where (byte and bit position) the decoding of the dialog should start. For example, the 0x5AD91815 in RAM is read into a register as 0x1518D95A. Using only the right 3 bytes (0x0018D95A) and adding 0x80000000, we get the RAM location of the huffman code byte where the dialog begins: 0x8018D95A. Of the remaining byte 0x15, the upper 4 bits (0x1) tell us the bit position in the byte (0 – 7). I currently do not know what the lower 4 bits (0x5) mean.
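The same arithmetic as a short Python sketch; the function name is made up, and the third return value is the nibble whose meaning is still unknown.

```python
import struct

def decode_dialog_pointer(raw: bytes):
    """Split the 4 bytes stored in RAM into (address, bit index, unknown)."""
    value = struct.unpack("<I", raw)[0]           # 5A D9 18 15 -> 0x1518D95A
    address = 0x80000000 | (value & 0x00FFFFFF)   # right 3 bytes + RAM base
    top = value >> 24                             # remaining byte, e.g. 0x15
    return address, top >> 4, top & 0x0F
```

For the value above this gives the address 0x8018D95A and bit index 1. Subtracting the textblock start 0x8018D768 from that address yields the byte distance 0x01F2.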

I’m unsure if this is always the case, but it gives us all the information we need to actually replace the dialog pointer with the correct one: in register r5 we find 0x8018D768, which is the start position of the loaded text block. In register r9 is 0x800F4DE4, which is the start of the dialog region in RAM. In register r16 is 0x800F4DE8, which is the start of the dialog textbox (this „0x50710F80“). Using r5 we can get the textblock ID, and using r9 or r16 we can get the dialog pointer.

I replaced the nop at position 0x8008EBA4 with a jump operation in order to do the pointer replacement in a separate code section. I decided to overwrite the area at the beginning of the PS-EXE file where debug messages are stored, hoping that overwriting those will not break the game. The PS-EXE program seems to start at 0x80017F00 in PS RAM. I placed the extra code at 0x8001D4CC.

The idea is to always use relative (not absolute) byte distances to find out which dialog segment is loaded, in order to replace it with the correct distance. First, the (outdated) dialog pointer is stored in register r23 and the byte difference (distance) from the beginning of the textblock to the pointer is calculated. The textblock ID is stored in r22. With the textblock ID and the byte distance we can do a sequence of if-else operations to find the matching one. Pseudo-code is:

if (textblockID == 0x006C) { // first scene
  if (byteDist == 0x0018) {
    byteDist = 0x0018;
    bitIndex = 0;
  } else if (byteDist == 0x01F2) { // first dialog segment
    byteDist = 0x0069;
    bitIndex = 2;
  }
  // etc ... one else-if branch per further dialog segment
}

We replace the distance accordingly and also set the correct bit index. The last part of the routine puts this together into the dialog pointer and stores it at the RAM position where the old pointer was. This way we make sure that the huffman decoding routine, which runs right after this, starts at the right byte and bit of the huffman code.

The first dialog successfully replaced with the translated text.

We made sure that the huffman tree and the huffman code are patched correctly in the textblock. The patched huffman tree no longer contains Japanese letters but, still encoded in Shift-JIS, our familiar letters (a-z and A-Z). Because these are already part of the font, the dialog works out of the box. The player name, system text (memory card loading text) and dialog options have to be patched in another way.

Download the assembler-analysis spreadsheet (open office document) for more details:

Translated Text from Dragon Quest IV Mobile Version

If you download the game’s data, you will also get a main.11029.com.square_enix.android_googleplay.dq4.obb file. The obb file is actually a zip file. It contains the game’s assets.

In the msg folder we find for each language a folder: en, ja, ko, sl, tl. All of them have 201 files that are named similarly. They are UTF-8 encoded.

Left the Japanese file and right the English file.

If you align them in a text editor based on these @ annotations, you get the translation of each dialog textblock. With some code I guess we can easily get all translations from Japanese to English. Be aware that the mobile version uses accents.

LZS Compression Scheme

Some subblock data is compressed with the LZS algorithm. Using this helpful documentation it seems I guessed the LZS scheme correctly.

The buffer has a size of 4096 bytes. The control byte has to be read bit-wise from right to left. A 1-bit indicates a literal byte. A 0-bit indicates a reference. The reference is 2 bytes long, e.g. 1110101111110000. The last four bits describe the length, to which 3 has to be added to get the correct length. In our example 0000 is the length, which is 0 in decimal; plus 3 gives the correct length of 3. The offset is combined in the following way: the remaining three parts p1=1110 p2=1011 p3=1111 are combined to p3 p1 p2, which gives 1111 1110 1011. The decimal value of our example offset is 4075. A value of 18 has to be added to the offset, so we get the correct offset of 4093. Since our buffer has a size of 4096 and our example length is 3 bytes, the last three bytes of the buffer are written to the output stream in the given example. The offset value may overflow the buffer size, which is why you should apply % 4096 (modulo the buffer size). Continue until the compressed data has been read completely.

The algorithm is implemented in the DQLZS class.

Doing this for subblocks of type 8 we can decompress the TIM files without an error.

A decompressed TIM file of DQ4.
