Tuesday, April 21, 2020

Fixing NES Headers and Converting them to NES 2.0 : Putting Theory into Practice!

In my last blog entry, I announced the creation of an evolving database of NES ROM headers, focused on cartridge accuracy.  However, while I can make a spreadsheet for easy accessibility, spreadsheets are not the best way to organize data for use by other programs.  I cannot expect someone wanting a full set of proper NES 2.0 ROMs to manually edit the headers of over 2,900 separate files!

There has to be an easier way, right?
The task of manual fixing isn't slight.
Well, if you read further now,
I'll be happy to tell you how.


The most accurate and complete database for NES ROMs is the NES 2.0 XML Database.  A download of the most current version can be found at that link.  The Database almost completely covers the No-Intro NES set (3,012 entries as of 04-05-2020) and a good deal of the last GoodNES set (22,096 entries as of v3.23b).  As of 04-21-2020, the NES 2.0 XML Database contains over 4000 entries and may continue to grow.

This XML file has one entry for each game, written in a markup language that looks something like this :

<game><!-- Licensed North America\Super Mario Bros. 3 (rev1).nes -->
<prgrom size="262144" crc32="A0ED7D20" sha1="46A9ACFC0B2F7C891A90A104D0EA803F96330CED" sum16="8580" />
<chrrom size="131072" crc32="C2928C49" sha1="2697D1F21B72A6D8E7D2A2D2C51C9C5550F68B56" sum16="6166" />
<rom size="393216" crc32="2E6301ED" sha1="BB894D104C796F69BA16587EB66C0275F5C2FC02" />
<prgram size="8192" />
<pcb mapper="4" submapper="0" mirroring="H" battery="0" />
<console type="0" region="0" />
<expansion type="1" />
</game>

Most of that information is also available in my spreadsheet, in a form more pleasing to us human beings.  In the XML file, each game's information is enclosed in tags to set it apart from the next entry in the list, but when the list is over 4,000 entries, it is hard to get a view of the "big picture".

The tags permit a program to access that data and use it.  Scripting languages work at a higher level than C or assembly, which makes a small job like this quicker to write.  One such scripting language is Python, which is very flexible.  To use Python, you need to download an interpreter, which reads a Python script file and translates its commands into something the OS and the CPU can understand.
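To give an idea of how a program can read those tags, here is a minimal sketch using Python's standard xml.etree.ElementTree module on the Super Mario Bros. 3 entry shown above.  The function name and the dictionary keys are illustrative only and are not taken from any existing tool.

```python
import xml.etree.ElementTree as ET

# The example entry from above (the comment line is left out for brevity).
ENTRY = """
<game>
<prgrom size="262144" crc32="A0ED7D20" sha1="46A9ACFC0B2F7C891A90A104D0EA803F96330CED" sum16="8580" />
<chrrom size="131072" crc32="C2928C49" sha1="2697D1F21B72A6D8E7D2A2D2C51C9C5550F68B56" sum16="6166" />
<rom size="393216" crc32="2E6301ED" sha1="BB894D104C796F69BA16587EB66C0275F5C2FC02" />
<prgram size="8192" />
<pcb mapper="4" submapper="0" mirroring="H" battery="0" />
<console type="0" region="0" />
<expansion type="1" />
</game>
"""


def read_entry(game):
    """Collect a few fields of one <game> element into a plain dict.

    The key names are hypothetical; only the tag and attribute names
    come from the entry itself.
    """
    chrrom = game.find("chrrom")  # absent for games that only use CHR-RAM
    pcb = game.find("pcb")
    return {
        "rom_sha1": game.find("rom").get("sha1"),
        "rom_size": int(game.find("rom").get("size")),
        "prg_size": int(game.find("prgrom").get("size")),
        "chr_size": int(chrrom.get("size")) if chrrom is not None else 0,
        "mapper": int(pcb.get("mapper")),
        "mirroring": pcb.get("mirroring"),
    }


entry = read_entry(ET.fromstring(ENTRY))
print(entry)
```

Reading the full database would be a matter of parsing the file and calling a function like this once per game element.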

Before the Database was publicly announced, someone else was busy trying to take that information and use it in a tool that could modify NES file headers to match the data contained in the Database.  The process requires generating a hash of a ROM without including the header bytes, comparing that hash to the Database, and if a match is found, replacing the existing header with one built from the Database entry.  There have long been programs with their own GUIs, like Clrmamepro and ROMcenter, which require external .dat files to audit and rename ROMs.  Those programs can be rather intimidating for many people.
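The hash-and-compare step can be sketched roughly as follows.  This is only an illustration of the idea, not the actual tool's code: it assumes a 16-byte header that starts with the bytes "NES" followed by 0x1A, ignores the optional 512-byte trainer, and uses a plain dictionary keyed by SHA-1 as a stand-in for the Database.  All function names are made up for the example.

```python
import hashlib

INES_MAGIC = b"NES\x1a"  # first four bytes of an iNES / NES 2.0 header
HEADER_SIZE = 16


def split_header(data):
    """Return (header, payload); the header is empty for headerless ROMs."""
    if data[:4] == INES_MAGIC:
        return data[:HEADER_SIZE], data[HEADER_SIZE:]
    return b"", data


def payload_sha1(data):
    """SHA-1 of the ROM contents with the header bytes left out."""
    _, payload = split_header(data)
    return hashlib.sha1(payload).hexdigest().upper()


def lookup(data, database):
    """Return the database entry for this ROM, or None if it is unknown.

    `database` is a hypothetical dict mapping SHA-1 strings to entries.
    """
    return database.get(payload_sha1(data))


# Tiny demonstration with made-up data standing in for a real ROM.
payload = bytes(range(256)) * 64
database = {hashlib.sha1(payload).hexdigest().upper(): "known game"}
headered = INES_MAGIC + bytes(12) + payload
print(lookup(headered, database), lookup(payload, database))
```

Because the header is excluded from the hash, a ROM with a wrong header and a ROM with no header at all both resolve to the same entry.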

A Python script was written to avoid the hassle of programming and maintaining a full-fledged program.  The script is fairly limited in what it can do, but it was designed to do only a few things.  The first thing it can do is to fix the header of any ROM it recognizes.  If the header is already correct, it does nothing.  It can also add headers to headerless ROMs if the hash of the binary matches a hash in the Database.
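Fixing a header comes down to writing 16 bytes built from the fields of the matching entry.  The sketch below follows my reading of the NES 2.0 header layout and is deliberately simplified: it handles only horizontal or vertical mirroring, only volatile PRG-RAM, and none of the exponent-style ROM sizes, trainers, Vs. System or CHR-RAM fields.  The dictionary keys and function names are hypothetical, so treat this as an illustration rather than what the script actually does.

```python
def ram_shift(size):
    """NES 2.0 stores a RAM size as a shift count: size = 64 << shift."""
    return 0 if size == 0 else size.bit_length() - 7


def build_header(e):
    """Build a simplified 16-byte NES 2.0 header from a dict of fields."""
    prg_units = e["prg_size"] // 16384  # PRG-ROM is counted in 16 KiB units
    chr_units = e["chr_size"] // 8192   # CHR-ROM is counted in 8 KiB units
    mapper = e["mapper"]

    header = bytearray(16)
    header[0:4] = b"NES\x1a"
    header[4] = prg_units & 0xFF
    header[5] = chr_units & 0xFF
    # Byte 6: mapper bits 0-3, battery flag, mirroring flag (set = vertical).
    header[6] = (((mapper & 0x0F) << 4)
                 | (0x02 if e["battery"] else 0)
                 | (0x01 if e["mirroring"] == "V" else 0))
    # Byte 7: mapper bits 4-7, the NES 2.0 identifier (0x08), console type.
    header[7] = (mapper & 0xF0) | 0x08 | (e["console_type"] & 0x03)
    # Byte 8: submapper in the high nibble, mapper bits 8-11 in the low one.
    header[8] = ((e["submapper"] & 0x0F) << 4) | ((mapper >> 8) & 0x0F)
    # Byte 9: upper bits of the CHR (high nibble) and PRG (low nibble) sizes.
    header[9] = (((chr_units >> 8) & 0x0F) << 4) | ((prg_units >> 8) & 0x0F)
    header[10] = ram_shift(e["prgram_size"]) & 0x0F
    header[12] = e["region"] & 0x03
    header[15] = e["expansion"] & 0x3F
    return bytes(header)


# Fields of the Super Mario Bros. 3 entry shown earlier.
smb3 = {
    "prg_size": 262144, "chr_size": 131072, "prgram_size": 8192,
    "mapper": 4, "submapper": 0, "mirroring": "H", "battery": 0,
    "console_type": 0, "region": 0, "expansion": 1,
}
print(build_header(smb3).hex(" "))
```

Bytes 8 through 15 are where the NES 2.0 additions live, which is why a ROM limited to iNES 1.0 fields loses information such as the submapper and the RAM sizes.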

The Python script, nes_fix_headers.py, is a single file and requires nes20db.xml to be in the same directory.  It is run on the command line as follows : python nes_fix_headers.py  The script is recursive: it will check all subdirectories below the directory where the script and database files are located.  Just make sure the option START_PATH = '.' is set in the script if you want it to start with the directory where the Script and Database are located.

The Script file has a few options which you can edit by opening the script file in a text editor and adjusting settings.  The script will not actually alter any ROM files unless the variable is set as "TRIAL_RUN = 0".  Change that variable, save the script file with the change, run the script, and, depending on how many files are being examined, be prepared to wait a minute or two for the script to do its job.  The Script also has support for command line arguments.  To get a list and description of the arguments, run the Script as follows : python nes_header_repair.py --help.

The Script file can output what it has done or would do to your files via the "redirect" symbol > to a text file.  The command becomes something like python nes_fix_headers.py > changes.txt  This will tell you the ROMs it recognized and changed and the ROMs it did not recognize.  It is not quite as fully featured as an auditor program because it cannot easily tell you which ROMs in the database are missing from your system.

There is an option to just modify the headers to use iNES 1.0 header features, "NES_20 = ".  This will not add any NES 2.0 features, which almost defeats the purpose of the header fixing tool because some games will not work correctly or at all without NES 2.0 features.  Another option, "SORT_UNKNOWN = ", will move all ROMs which the database does not recognize into an "unsupported" folder.  This can help you find bad ROMs or ROMs the Database does not (yet) recognize.

The way the Script handles UNIF also deserves a mention.  The Script will try to convert UNIF-format ROMs into NES format ROMs.  UNIF-formatted ROMs can have file block tags inserted into the ROMs outside of the UNIF header.  The Script will calculate the hash of the ROM without the UNIF header and the UNIF file block tags, and if a match is found in the Database, it will strip out all the UNIF metadata and add a NES 2.0 header.  If you wish to preserve your UNIF-formatted files, make a backup copy first.  For unrecognized UNIF ROMs, you can use the option "MARK_UNHEADERED = 0" to remove all the UNIF metadata and leave the file as a bare binary.  UNIF files must have a .nes file name extension; you can use a tool like Bulk Rename Utility to rename files with the .unf or .unif extension to .nes.
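For the curious, stripping the UNIF metadata could look something like the sketch below.  It assumes the usual UNIF layout as I understand it: a 32-byte header beginning with "UNIF", followed by blocks that each carry a 4-character tag, a 4-byte little-endian length and the block data, with the ROM contents stored in the PRG0 to PRGF and CHR0 to CHRF blocks.  The function names are made up, and the Script may well go about this differently.

```python
import struct

UNIF_HEADER_SIZE = 32


def unif_payload(data):
    """Return PRG data followed by CHR data, dropping all other UNIF blocks."""
    if data[:4] != b"UNIF":
        raise ValueError("not a UNIF file")
    prg, chr_ = {}, {}
    pos = UNIF_HEADER_SIZE
    while pos + 8 <= len(data):
        tag = data[pos:pos + 4]
        (length,) = struct.unpack_from("<I", data, pos + 4)
        body = data[pos + 8:pos + 8 + length]
        if tag[:3] == b"PRG":
            prg[tag] = body
        elif tag[:3] == b"CHR":
            chr_[tag] = body
        pos += 8 + length  # skip to the next block, whatever its tag was
    # Sorting the tags puts PRG0..PRG9 before PRGA..PRGF, likewise for CHR.
    return (b"".join(prg[t] for t in sorted(prg))
            + b"".join(chr_[t] for t in sorted(chr_)))


def block(tag, body):
    """Helper for the demonstration: build one UNIF block."""
    return tag + struct.pack("<I", len(body)) + body


# Made-up file: a board-name block, then CHR and PRG stored out of order.
unif_file = (b"UNIF" + struct.pack("<I", 7) + bytes(24)
             + block(b"MAPR", b"NES-TLROM\x00")
             + block(b"CHR0", b"\xcc" * 8)
             + block(b"PRG0", b"\xaa" * 16))
print(len(unif_payload(unif_file)))
```

The bytes that come back are what would be hashed and compared against the Database, exactly as with a headerless .nes file.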

Some NES ROMs come with some extra data at the end of the ROM which is truly junk information such as the date the game was dumped and the name of the dumper.  The option "TRIM_UNKNOWN_DATA" can eliminate that data. 

If you open the XML file in a text editor, and I recommend Notepad++ for this task, you will find that each game entry has a name which corresponds to neither the ROM's GoodNES name nor its No-Intro name (if the ROM is included in either set).  Japanese games will use Japanese characters if the original title was in Japanese, and likewise for Chinese, Korean and Russian games.  These characters require Unicode support to display properly, so a renaming function is not provided.

There are instances where the Database will truncate ROM data which was duplicated on the original hardware.  Vs. Dual System ROMs, except for Vs. Tennis, use duplicate CHR-ROM data for each side of the arcade cabinet, so the CHR-ROM size reported in the database is half what is actually in the machine.  Adan & Eva (Gluk Video) (Spain) uses a 32KiB chip for 16KiB CHR-ROM and some copies of Sky Shark (USA) and possibly Knight Rider (Japan) may use a 128KiB chip for 64KiB of PRG-ROM.  The Database categorizes these as "Bad Dumps", but overdumps are not necessarily broken.  

Kid Dracula (Castlevania Anniversary Collection).nes will require a Kid Dracula (Castlevania Anniversary Collection).sav file with a size of 8,192 bytes and a CRC32 of ABA5001B to work properly.  The Database does not include the .sav file; No-Intro provides that information.
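If you want to check whether a .sav file you have is the right one, a few lines of Python can report its size and CRC32.  This is just a convenience sketch using the standard zlib module; the function name is invented for the example and the file name in the comment is a placeholder.

```python
import zlib


def crc32_hex(data):
    """Return the CRC32 of `data` as eight uppercase hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08X")


def describe(data):
    """Return (size in bytes, CRC32 string) for comparison with known values."""
    return len(data), crc32_hex(data)


# Usage with a real file would look like this (the path is a placeholder):
#     with open("Kid Dracula (Castlevania Anniversary Collection).sav", "rb") as f:
#         print(describe(f.read()))
print(describe(b"123456789"))
```

For the Kid Dracula save file, the values to look for are a size of 8192 and a CRC32 of ABA5001B.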

The Analogue Nt Mini supports NES 2.0 ROMs, but it has difficulty with the Miscellaneous ROM field and the Default Controller Type field, which were added after the Nt Mini was released in 2017.  The Database's headers can work with the Nt Mini so long as these fields, bytes 14 and 15, are set to 0.  The Script can be modified to do this by changing the lines :

        header[14] = (miscrom & 0x3)
        header[15] = (expansion & 0x3F)
to :
        header[14] = (miscrom & 0)
        header[15] = (expansion & 0)

You will have to remove the contents of any miscellaneous ROMs as well. 

The short version: grab the database, grab the script, download and install Python, back up your NES ROMs, copy the database and script into the ROMs' "root directory", change the option in the script to "TRIAL_RUN = 0" and from the command line enter : python nes_fix_headers.py > changes.txt.

3 comments:

  1. If u are on Windows 10, you can start PowerShell and type: python3 and if it gets into a different mode, then you know u have Python installed. You can just hit CTRL+C to cancel it. Then you can write the command below. Else Windows 10 will take you to its store and show the Python installer and you can just click install and then try the command again!

    The correct command is:
    python3 .\nes_header_repair.py > changes.txt

  2. Would be nice if there was a way to batch remove fds rom headers
