ID3v1

The ID3 site has a good overview of ID3. There is a type of graph that I would like to reproduce and so I'll try:

Field            Size
Header ("TAG")   (3 bytes)
Title            (30 bytes)
Artist           (30 bytes)
Album            (30 bytes)
Year             (4 bytes)
Comment          (30 bytes)
Genre            (1 byte)

Well, it's a pretty graph, but really better suited to showing proportions than to serving as a technical spec. I guess I'll leave it here for reference… The fields are null padded to their appropriate lengths. The genre is a one-byte code, listed here in hexadecimal. Initially there were 80 possible genres (00 through 4F), and this list was extended by Nullsoft:

00 Blues
01 Classic Rock
02 Country
03 Dance
04 Disco
05 Funk
06 Grunge
07 Hip-Hop
08 Jazz
09 Metal
0A New Age
0B Oldies
0C Other
0D Pop
0E R&B
0F Rap
10 Reggae
11 Rock
12 Techno
13 Industrial
14 Alternative
15 Ska
16 Death Metal
17 Pranks
18 Soundtrack
19 Euro-Techno
1A Ambient
1B Trip-Hop
1C Vocal
1D Jazz+Funk
1E Fusion
1F Trance
20 Classical
21 Instrumental
22 Acid
23 House
24 Game
25 Sound Clip
26 Gospel
27 Noise
28 Alternative Rock
29 Bass
2A Soul
2B Punk
2C Space
2D Meditative
2E Instrumental Pop
2F Instrumental Rock
30 Ethnic
31 Gothic
32 Darkwave
33 Techno-Industrial
34 Electronic
35 Pop-Folk
36 Eurodance
37 Dream
38 Southern Rock
39 Comedy
3A Cult
3B Gangsta
3C Top 40
3D Christian Rap
3E Pop/Funk
3F Jungle
40 Native US
41 Cabaret
42 New Wave
43 Psychadelic
44 Rave
45 Showtunes
46 Trailer
47 Lo-Fi
48 Tribal
49 Acid Punk
4A Acid Jazz
4B Polka
4C Retro
4D Musical
4E Rock & Roll
4F Hard Rock
50 Folk
51 Folk-Rock
52 National Folk
53 Swing
54 Fast Fusion
55 Bebob
56 Latin
57 Revival
58 Celtic
59 Bluegrass
5A Avantgarde
5B Gothic Rock
5C Progressive Rock
5D Psychedelic Rock
5E Symphonic Rock
5F Slow Rock
60 Big Band
61 Chorus
62 Easy Listening
63 Acoustic
64 Humour
65 Speech
66 Chanson
67 Opera
68 Chamber Music
69 Sonata
6A Symphony
6B Booty Bass
6C Primus
6D Porn Groove
6E Satire
6F Slow Jam
70 Club
71 Tango
72 Samba
73 Folklore
74 Ballad
75 Power Ballad
76 Rhythmic Soul
77 Freestyle
78 Duet
79 Punk Rock
7A Drum Solo
7B Acapella
7C Euro-House
7D Dance Hall
7E Goa
7F Drum & Bass
80 Club-House
81 Hardcore
82 Terror
83 Indie
84 BritPop
85 Negerpunk
86 Polsk Punk
87 Beat
88 Christian Gangsta
89 Heavy Metal
8A Black Metal
8B Crossover
8C Contemporary Christian
8D Christian Rock
8E Meringue
8F Salsa
90 Thrash Metal
91 Anime
92 JPop
93 SynthPop
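
To make the fixed layout concrete, here is a minimal Python sketch of reading an ID3v1 block. It assumes the tag is the 128 bytes implied by the field sizes above (3 + 30 + 30 + 30 + 4 + 30 + 1) and that text is ISO-8859-1, which ID3v1 itself never specifies. The name `parse_id3v1` and the abbreviated `GENRES` table are mine, purely for illustration.

```python
import struct

# Only a handful of genre codes, for illustration; the full table is listed above.
GENRES = {0x00: "Blues", 0x01: "Classic Rock", 0x02: "Country", 0x11: "Rock"}


def parse_id3v1(tag: bytes) -> dict:
    """Parse a 128-byte ID3v1 block into a dictionary of fields."""
    if len(tag) != 128 or tag[:3] != b"TAG":
        raise ValueError("not an ID3v1 tag")

    # Title, artist, album, year, comment, genre: 125 bytes after the header.
    title, artist, album, year, comment, genre = struct.unpack(
        "=30s30s30s4s30sB", tag[3:]
    )

    def text(raw: bytes) -> str:
        # Fields are null padded; keep everything before the first null.
        # Assumption: ISO-8859-1 text, with stray trailing spaces trimmed.
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        "genre_code": genre,
        "genre": GENRES.get(genre),  # None for codes missing from the short table
    }
```

In practice the 128 bytes would come from the end of the file, for example by seeking to 128 bytes before the end and reading the rest.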

ID3v1.1

This is essentially the same as v1, but the last two bytes of the comment are taken to represent the track number. The first byte is a null just to guarantee that a v1 parser doesn't try to parse the track number. The second byte is the track number (not an ASCII digit).
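
A small sketch of how a reader might tell the two apart, given the 30-byte comment field. The rule used here (byte 29 is null and byte 30 is non-zero) is a common heuristic rather than something the text above guarantees, and `split_comment_v11` is a name I made up.

```python
def split_comment_v11(comment: bytes):
    """Return (comment_text, track) for a 30-byte ID3v1 comment field.

    track is None when the field looks like a plain ID3v1 comment.
    """
    if len(comment) != 30:
        raise ValueError("comment field must be exactly 30 bytes")

    # Assumed heuristic: a null guard byte followed by a non-zero byte
    # means the last byte is a binary track number (ID3v1.1).
    if comment[28] == 0 and comment[29] != 0:
        text, track = comment[:28], comment[29]
    else:
        text, track = comment, None

    return text.split(b"\x00", 1)[0].decode("latin-1"), track
```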

ID3v2

The rigidity of ID3v1 should be readily apparent. To deal with this, as well as some other issues, ID3v2 was introduced. It is a frame based format allowing for quite a bit of flexibility and extensibility.

ID3v2.2

Two primary components are added, tags and frames:

ID3v2.2 Tag

The tag is a container for several frames that contain actual information. The tag has a ten byte header:

Field        Bytes   Content
Identifier   1-3     "ID3"
Version      4-5     major (02), revision (00)
Flags        6       [uc000000] (u = unsynchronization, c = compression)
Size         7-10    [0xxxxxxx]{4}: 28 bits for the size of the tag after unsynchronization, without the header
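
The size field is the only fiddly part: each of the four bytes contributes just its low seven bits. A minimal sketch of reading the ten-byte header follows; the function names and dictionary keys are my own choices, not anything from the spec.

```python
def decode_synchsafe(raw: bytes) -> int:
    """Combine bytes of the form 0xxxxxxx into one integer, 7 bits per byte."""
    value = 0
    for byte in raw:
        if byte & 0x80:
            raise ValueError("high bit set in a synchsafe byte")
        value = (value << 7) | byte
    return value


def parse_tag_header(header: bytes) -> dict:
    """Parse the ten-byte ID3v2 tag header described above."""
    if len(header) != 10 or header[:3] != b"ID3":
        raise ValueError("not an ID3v2 tag header")
    flags = header[5]
    return {
        "major": header[3],
        "revision": header[4],
        "unsynchronized": bool(flags & 0x80),  # the 'u' flag bit
        "compressed": bool(flags & 0x40),      # the 'c' flag bit
        "size": decode_synchsafe(header[6:10]),  # excludes the header itself
    }
```

For example, the size bytes 00 00 02 01 decode to (2 << 7) | 1 = 257.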

ID3v2.2 Frame

The tag is comprised of several frames which contain the actual information. The header for the frame is:

Field        Bytes   Content
Identifier   1-3     [A-Z0-9]{3} ([XYZ][A-Z0-9]{2} for experimental use)
Size         4-6     size of the frame, excluding the six byte frame header
Encoding
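
Walking the frames inside a tag could look roughly like this. The sketch assumes the ID3v2.2 frame header is six bytes (a three-byte identifier followed by a three-byte big-endian size that does not count the header), that a zeroed identifier marks the start of padding, and that the tag body has already been resynchronized. `iter_frames_v22` is a hypothetical name.

```python
def iter_frames_v22(body: bytes):
    """Yield (identifier, data) pairs from an ID3v2.2 tag body."""
    pos = 0
    while pos + 6 <= len(body):
        ident = body[pos:pos + 3]
        if ident == b"\x00\x00\x00":
            break  # assumed: zero bytes mean padding, so no more frames
        size = int.from_bytes(body[pos + 3:pos + 6], "big")
        data = body[pos + 6:pos + 6 + size]
        if len(data) < size:
            raise ValueError("frame runs past the end of the tag")
        yield ident.decode("ascii"), data
        pos += 6 + size
```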

ID3v2.4

The various frames are pretty straightforward, but there are a couple changes in 2.4 that are particularly interesting to me:

TPE[1-4] Frame

The performer has been identified for a while. What is new in 2.4 is:

"There may only be one text information frame of its kind in an tag. All text information frames supports multiple strings, stored as a null separated list, where null is represented by the termination code for the character encoding. All text frame identifiers begin with "T". Only text frame identifiers begin with "T", with the exception of the "TXXX" frame."

Because of the amount of data I am dealing with in this program, I am planning on backing it with a database. One of the irritating things that I would be doing is taking songs with multiple artists and separating the artists. As it is, when I search for songs by Ciara, I don't get duets which include Ciara because the name doesn't match. If I have the names separated out I can search better.

In ID3v2.3, doing this requires some method for taking a list of artists and combining them. A good method is to separate each artist from the next with a comma, except for the last two, which are separated by "and." This way we get John, Paul and Ringo. What happens with Crosby, Stills and Nash though? That is a single band and I don't want to separate those artists. Likewise, Jay-Z and Linkin Park should be split, but C and C Music Factory should not. Allowing multiple artists to be stored unambiguously is nice.
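
With the null separated list, pulling the artists apart needs no guessing about commas or "and". A rough sketch, assuming the frame body starts with the one-byte text encoding code used throughout 2.4 ($00 to $03) and that the terminator is one null byte for the 8-bit encodings and two for UTF-16. `split_text_frame` is my own name for it.

```python
# Encoding byte -> (Python codec, terminator width in bytes)
ENCODINGS = {
    0: ("latin-1", 1),
    1: ("utf-16", 2),     # with BOM
    2: ("utf-16-be", 2),  # without BOM
    3: ("utf-8", 1),
}


def split_text_frame(data: bytes) -> list:
    """Split the body of a text frame (e.g. TPE1) into its strings."""
    codec, width = ENCODINGS[data[0]]
    payload = data[1:]
    terminator = b"\x00" * width

    parts, start = [], 0
    # Step by the character width so a UTF-16 terminator is only
    # recognised on a character boundary.
    for i in range(0, len(payload) - width + 1, width):
        if payload[i:i + width] == terminator:
            parts.append(payload[start:i])
            start = i + width
    if start < len(payload):
        parts.append(payload[start:])  # last string may be unterminated

    return [part.decode(codec) for part in parts]
```

Each resulting name can then go into its own database row, so a search for Ciara finds the duets too.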

SYLT Frame

Synchronized Lyrics and Text lets me set up karaoke, which is cool, and singalong, which is even cooler. One of my goals in life is to be able to understand French rap and I'd love to be able to have the lyrics play in time with the music. Additionally the program for syncing the lyrics could be used for creating ETCO transition points for the slideshow program. I'd just need some sort of interface to record them in conjunction with keyboard events or something like that.

Synchronized Lyrics/Text Header (SYLT)
Text Encoding
  • $00 • ISO-8859-1 [ISO-8859-1]. Terminated with $00.
  • $01 • UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All strings in the same frame SHALL have the same byteorder. Terminated with $00 00.
  • $02 • UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM. Terminated with $00 00.
  • $03 • UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.
Language
  • ISO-639-2 three byte language code
Time Stamp Format
  • $01 • Absolute time, 32 bit sized, using MPEG [MPEG] frames as unit
  • $02 • Absolute time, 32 bit sized, using milliseconds as unit
Content Type
  • $00 • other
  • $01 • lyrics
  • $02 • text transcription
  • $03 • movement/part name (e.g. "Adagio")
  • $04 • events (e.g. "Don Quixote enters the stage")
  • $05 • chord (e.g. "Bb F Fsus")
  • $06 • trivia/'pop up' information
  • $07 • URLs to webpages
  • $08 • URLs to images
Content Descriptor
  • text string according to encoding

Each syllable is positioned chronologically. The example given in the spec is: (note the placement of spaces and the newline (0A) character)

"Strang" $00 xx xx "ers" $00 xx xx " in" $00 xx xx " the" $00 xx xx " night" $00 xx xx 0A "Ex" $00 xx xx "chang" $00 xx xx "ing" $00 xx xx "glan" $00 xx xx "ces" $00 xx xx

I like this system more than the CD+G used in normal karaoke machines. CD+G allows arbitrary bitmaps to be displayed on screen, which is more versatile, but which destroys the actual text. If all my songs included these tags, I could search for songs based on a lyric from the song.
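
Reading the syllable stream back out might look like the sketch below. It makes several assumptions worth stating: time stamps are four bytes, big-endian (the format list says "32 bit sized", even though the spec's example shows only "xx xx"); only the encodings with a one-byte terminator ($00 and $03) are handled; and `parse_sylt` is a name of my own.

```python
import struct


def parse_sylt(body: bytes) -> dict:
    """Parse a SYLT frame body into its header fields and timed text events."""
    encoding = body[0]
    if encoding not in (0, 3):
        raise NotImplementedError("sketch only handles one-byte terminators")
    codec = "latin-1" if encoding == 0 else "utf-8"

    language = body[1:4].decode("ascii")
    stamp_format = body[4]  # 1 = MPEG frames, 2 = milliseconds
    content_type = body[5]  # 1 = lyrics, and so on

    end = body.index(b"\x00", 6)
    descriptor = body[6:end].decode(codec)
    pos = end + 1

    events = []
    while pos < len(body):
        end = body.index(b"\x00", pos)  # terminator of this piece of text
        text = body[pos:end].decode(codec)
        (stamp,) = struct.unpack(">I", body[end + 1:end + 5])
        events.append((stamp, text))
        pos = end + 5  # skip the terminator and the 4-byte time stamp

    return {
        "language": language,
        "stamp_format": stamp_format,
        "content_type": content_type,
        "descriptor": descriptor,
        "events": events,
    }
```

Joining the text of all the events gives back the plain lyrics, which is what would make searching by lyric possible.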

UFID Frame

The Unique File Identifier frame is interesting to me because, though I really like the idea, I am not at all pleased with MusicBrainz's interface. This frame could be used to hold a MusicBrainz ID.

Unique File Identifier Header (UFID)
Owner Identifier
  • ISO-8859-1 encoded URI (preferably mailto:), terminated with $00
Identifier
  • up to 64 bytes binary data
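
Since the owner is a null terminated string and the rest is opaque binary, splitting the frame body is short. A sketch, with `parse_ufid` as a hypothetical helper name:

```python
def parse_ufid(body: bytes):
    """Split a UFID frame body into (owner, identifier)."""
    owner, separator, identifier = body.partition(b"\x00")
    if not separator:
        raise ValueError("owner identifier is not null terminated")
    if len(identifier) > 64:
        raise ValueError("identifier is longer than 64 bytes")
    return owner.decode("latin-1"), identifier
```

The identifier is left as bytes on purpose, since nothing says it has to be text.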

API


Synchronization

The format of an MPEG frame header is:

Field   Bits   Value
Synchronization   1-11   11111111111
Version   12-13
  • 00 - MPEG Version 2.5
  • 01 - reserved
  • 10 - MPEG Version 2 (ISO/IEC 13818-3)
  • 11 - MPEG Version 1 (ISO/IEC 11172-3)
Layer   14-15
  • 00 - reserved
  • 01 - Layer III
  • 10 - Layer II
  • 11 - Layer I
CRC Present   16
  • 0 - Protected by CRC (16bit crc follows header)
  • 1 - Not protected
Bitrate in kbps   17-20
bits   V1, L1   V1, L2   V1, L3   V2, L1   V2, L2 & L3
0000   free     free     free     free     free
0001   32       32       32       32       8
0010   64       48       40       48       16
0011   96       56       48       56       24
0100   128      64       56       64       32
0101   160      80       64       80       40
0110   192      96       80       96       48
0111   224      112      96       112      56
1000   256      128      112      128      64
1001   288      160      128      144      80
1010   320      192      160      160      96
1011   352      224      192      176      112
1100   384      256      224      192      128
1101   416      320      256      224      144
1110   448      384      320      256      160
1111   bad      bad      bad      bad      bad
Sampling Rate in Hz   21-22
bits   MPEG1     MPEG2     MPEG2.5
00     44100     22050     11025
01     48000     24000     12000
10     32000     16000     8000
11     reserv.   reserv.   reserv.
Padding   23
  • 0 - frame is not padded
  • 1 - frame is padded with one extra slot (4 bytes in L1, 1 byte in L2 and L3)
Experimental   24   Reserved for application use
Channel Mode   25-26
  • 00 - Stereo
  • 01 - Joint stereo (Stereo)
  • 10 - Dual channel (Stereo)
  • 11 - Single channel (Mono)
Mode Extension (for joint stereo)   27-28
  • Layer I & II
    • 00 - bands 4 to 31
    • 01 - bands 8 to 31
    • 10 - bands 12 to 31
    • 11 - bands 16 to 31
  • Layer III [mi]
    • m - m/s stereo
    • i - intensity stereo
Copyright   29
  • 0 - Audio is not copyrighted
  • 1 - Audio is copyrighted
Original   30
  • 0 - Copy of original media
  • 1 - Original media
Emphasis   31-32
  • 00 - none
  • 01 - 50/15 µs
  • 10 - reserved
  • 11 - CCITT J.17
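
As a sanity check on the bit positions, here is a sketch that pulls the fields out of a four-byte header by treating it as one 32-bit big-endian word, so bit 1 in the table is the most significant bit. Only the MPEG 1 Layer III bitrate column and the MPEG 1 sampling rates are included, to keep it short. The function name and dictionary keys are my own.

```python
# MPEG 1, Layer III bitrates in kbps, indexed by the 4-bit bitrate field.
# None stands for "free" (0000) and "bad" (1111).
V1_L3_BITRATES = [None, 32, 40, 48, 56, 64, 80, 96,
                  112, 128, 160, 192, 224, 256, 320, None]
# MPEG 1 sampling rates in Hz; None stands for "reserved".
V1_SAMPLE_RATES = [44100, 48000, 32000, None]


def parse_mpeg_header(header: bytes) -> dict:
    """Extract the raw fields of a 4-byte MPEG audio frame header."""
    if len(header) != 4:
        raise ValueError("an MPEG frame header is 4 bytes")
    word = int.from_bytes(header, "big")
    if word >> 21 != 0x7FF:
        raise ValueError("frame sync (eleven 1 bits) not found")

    fields = {
        "version": (word >> 19) & 0b11,          # bits 12-13
        "layer": (word >> 17) & 0b11,            # bits 14-15
        "crc_protected": (word >> 16) & 1 == 0,  # bit 16: 0 means protected
        "bitrate_index": (word >> 12) & 0xF,     # bits 17-20
        "sample_rate_index": (word >> 10) & 0b11,  # bits 21-22
        "padded": bool((word >> 9) & 1),         # bit 23
        "channel_mode": (word >> 6) & 0b11,      # bits 25-26
        "mode_extension": (word >> 4) & 0b11,    # bits 27-28
        "copyright": bool((word >> 3) & 1),      # bit 29
        "original": bool((word >> 2) & 1),       # bit 30
        "emphasis": word & 0b11,                 # bits 31-32
    }
    # Resolve the lookups only for MPEG 1 (11), Layer III (01).
    if fields["version"] == 0b11 and fields["layer"] == 0b01:
        fields["bitrate_kbps"] = V1_L3_BITRATES[fields["bitrate_index"]]
        fields["sample_rate_hz"] = V1_SAMPLE_RATES[fields["sample_rate_index"]]
    return fields
```

The bytes FF FB 90 00, a very common start for an MP3 frame, come out as MPEG 1, Layer III, no CRC, 128 kbps, 44100 Hz, stereo.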

The important part of all that, so far as tagging is concerned, is the first 11 bits, all 1s, which represent a frame sync. If a tag containing that bit pattern were read by a player that doesn't recognize ID3v2 tags, the tag data would be incorrectly interpreted as MPEG frame data. To avoid this, if a tag is marked as unsynchronized, all occurrences of 11111111 111xxxxx are replaced with 11111111 00000000 111xxxxx. This process is performed after any compression and undone before the frames are interpreted.

The issue is that the sequence 11111111 00000000 could occur in the data before unsynchronization, and upon resynchronization the 00000000 would be erroneously lost. To avoid this, 11111111 00000000 is replaced with 11111111 00000000 00000000 in the unsynchronization process.
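
Both rules boil down to "insert a zero byte after any FF that is followed by 111xxxxx or by 00", and undoing them means dropping the zero byte that follows each FF. A minimal sketch covering only the two cases described here; the function names are mine.

```python
def unsynchronize(data: bytes) -> bytes:
    """Insert 00 after every FF that is followed by 111xxxxx or by 00."""
    out = bytearray()
    for i, byte in enumerate(data):
        out.append(byte)
        if byte == 0xFF and i + 1 < len(data):
            following = data[i + 1]
            # 0xE0 and above is the bit pattern 111xxxxx.
            if following >= 0xE0 or following == 0x00:
                out.append(0x00)
    return bytes(out)


def resynchronize(data: bytes) -> bytes:
    """Undo unsynchronize() by dropping the 00 that follows each FF."""
    return data.replace(b"\xff\x00", b"\xff")
```

For instance, FF E3 becomes FF 00 E3, FF 00 becomes FF 00 00, and resynchronizing either result gives back the original bytes.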