0xef 0xbb 0xbf characters (Loading the FAA’s Aircraft Registration Database into PostgreSQL)

Trying to import the CSV files supplied by the FAA as the Releasable Aircraft Database Download into PostgreSQL using the COPY command. PostgreSQL was barfing with the error message:

character with byte sequence 0xef 0xbb 0xbf in encoding "UTF8" has no equivalent in encoding "LATIN9"

Turns out those three bytes are the Byte Order Mark, which somehow I’ve never encountered in a “plain text” file before. Removing those bytes was pretty straightforward in 0xED:

Screenshot of 0xED editor
Screenshot of 0xED editor

The same thing can be accomplished by using the terminal, if 0xED is not available (it’s apparently been discontinued by the developer, so it’s only a matter of time before Apple’s relentless march forward leaves it behind):

% tail -c +4 DEREG.txt > DEREG_no_BOM.txt

Will do the same thing (strip the first three bytes from, e.g., DEREG.txt, creating the new file DEREG_no_BOM.txt). This process took about 12 seconds on a MacBook Pro M1 Pro 14" base model, FWIW.

The other thing I had to do to make the COPY command work properly was to strip the trailing and extraneous ',' from each line in the FAA’s export; this I did using the Find and Replace function in TextMate:

TextMate find and replace
TextMate find and replace

Finally, this is particular to the later versions of macOS with sandboxing enabled by default, but to get the files accessible to the PostgreSQL server, I had to move them to /tmp.

postgres=# COPY master(n_number, serial_number, mfr_mdl_code, eng_mgr_mdl, year, type_registrant, name, street1, street2, city, state, zip_code, region, county, country, last_action_date, cert_issue_date, certification, type_aircraft, type_engine, status_code, mode_s_code, fractional_ownership, airworthiness_date, other_name_1, other_name_2, other_name_3, other_name_4, other_name_5, expiration_date, unique_id, kit_mfr, kit_model, mode_s_code_hex) FROM '/tmp/MASTER.txt' DELIMITER ',' CSV HEADER;

COPY 296110

Comments