The ID3 site has a good overview of ID3. There is a type of graph that I would like to reproduce and so I'll try:
Well, it's a pretty graph, but really better suited for showing proportions than a technical spec. I guess I'll leave it here for reference…
The fields are null-padded to their appropriate length. The genre is a code. Initially there were 80 possible genres, and this list was extended by Nullsoft:
00 | Blues |
01 | Classic Rock |
02 | Country |
03 | Dance |
04 | Disco |
05 | Funk |
06 | Grunge |
07 | Hip-Hop |
08 | Jazz |
09 | Metal |
0A | New Age |
0B | Oldies |
0C | Other |
0D | Pop |
0E | R&B |
0F | Rap |
10 | Reggae |
11 | Rock |
12 | Techno |
13 | Industrial |
14 | Alternative |
15 | Ska |
16 | Death Metal |
17 | Pranks |
18 | Soundtrack |
19 | Euro-Techno |
1A | Ambient |
1B | Trip-Hop |
1C | Vocal |
1D | Jazz+Funk |
1E | Fusion |
1F | Trance |
20 | Classical |
21 | Instrumental |
22 | Acid |
23 | House |
24 | Game |
25 | Sound Clip |
26 | Gospel |
27 | Noise |
28 | Alternative Rock |
29 | Bass |
2A | Soul |
2B | Punk |
2C | Space |
2D | Meditative |
2E | Instrumental Pop |
2F | Instrumental Rock |
30 | Ethnic |
31 | Gothic |
32 | Darkwave |
33 | Techno-Industrial |
34 | Electronic |
35 | Pop-Folk |
36 | Eurodance |
37 | Dream |
38 | Southern Rock |
39 | Comedy |
3A | Cult |
3B | Gangsta |
3C | Top 40 |
3D | Christian Rap |
3E | Pop/Funk |
3F | Jungle |
40 | Native US |
41 | Cabaret |
42 | New Wave |
43 | Psychadelic |
44 | Rave |
45 | Showtunes |
46 | Trailer |
47 | Lo-Fi |
48 | Tribal |
49 | Acid Punk |
4A | Acid Jazz |
4B | Polka |
4C | Retro |
4D | Musical |
4E | Rock & Roll |
4F | Hard Rock |
50 | Folk |
51 | Folk-Rock |
52 | National Folk |
53 | Swing |
54 | Fast Fusion |
55 | Bebob |
56 | Latin |
57 | Revival |
58 | Celtic |
59 | Bluegrass |
5A | Avantgarde |
5B | Gothic Rock |
5C | Progressive Rock |
5D | Psychedelic Rock |
5E | Symphonic Rock |
5F | Slow Rock |
60 | Big Band |
61 | Chorus |
62 | Easy Listening |
63 | Acoustic |
64 | Humour |
65 | Speech |
66 | Chanson |
67 | Opera |
68 | Chamber Music |
69 | Sonata |
6A | Symphony |
6B | Booty Bass |
6C | Primus |
6D | Porn Groove |
6E | Satire |
6F | Slow Jam |
70 | Club |
71 | Tango |
72 | Samba |
73 | Folklore |
74 | Ballad |
75 | Power Ballad |
76 | Rhythmic Soul |
77 | Freestyle |
78 | Duet |
79 | Punk Rock |
7A | Drum Solo |
7B | Acapella |
7C | Euro-House |
7D | Dance Hall |
7E | Goa |
7F | Drum & Bass |
80 | Club-House |
81 | Hardcore |
82 | Terror |
83 | Indie |
84 | BritPop |
85 | Negerpunk |
86 | Polsk Punk |
87 | Beat |
88 | Christian Gangsta |
89 | Heavy Metal |
8A | Black Metal |
8B | Crossover |
8C | Contemporary Christian |
8D | Christian Rock |
8E | Meringue |
8F | Salsa |
90 | Thrash Metal |
91 | Anime |
92 | JPop |
93 | SynthPop |
ID3v1.1 is essentially the same as v1, but the last two bytes of the comment are taken to represent the track number. The first byte is a null, just to guarantee that a v1 parser doesn't try to parse the track number as part of the comment. The second byte is the track number as a binary value (not an ASCII digit).
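Here is a rough sketch of reading a v1 or v1.1 tag. The field widths (a 3-byte "TAG" marker, 30 bytes each for title, artist, album and comment, 4 bytes for the year, 1 byte for the genre) are the standard ID3v1 layout rather than something spelled out above, and `parse_id3v1` is just a name I made up for illustration.

```python
import struct

def parse_id3v1(tag: bytes) -> dict:
    """Parse a 128-byte ID3v1 / ID3v1.1 tag (hypothetical helper)."""
    if len(tag) != 128 or tag[:3] != b"TAG":
        raise ValueError("not an ID3v1 tag")
    # Assumed layout: marker, title, artist, album, year, comment, genre.
    _, title, artist, album, year, comment, genre = struct.unpack(
        ">3s30s30s30s4s30sB", tag
    )

    track = None
    # v1.1: a null in the second-to-last comment byte, followed by a
    # non-zero byte, means the last byte is a binary track number.
    if comment[28] == 0 and comment[29] != 0:
        track = comment[29]
        comment = comment[:28]

    def text(field: bytes) -> str:
        # Fields are null padded; keep only what precedes the first null.
        return field.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        "track": track,
        "genre": genre,  # index into the genre table above
    }
```

In a real file the tag would be the last 128 bytes, so something like `f.seek(-128, 2)` followed by `f.read(128)` should fetch it.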
The rigidity of ID3v1 should be readily apparent. To deal with this, as well as some other issues, ID3v2 was introduced. It is a frame-based format allowing for quite a bit of flexibility and extensibility.
The two primary components it adds are tags and frames:
The tag is a container for several frames that contain actual information. The tag has a ten byte header:
Identifier | 1-3 | "ID3" |
Version | 4-5 | major (02) minor (02) |
Flags | 6 | [uc000000] |
Size | 7-10 | [0xxxxxxx]{4} 28 bits for the size of the tag after unsynchronization without the header |
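The size field is the only fiddly part: each of the four bytes has its top bit clear, so only 7 bits per byte count. A minimal sketch of decoding the header, assuming `u` means unsynchronization and `c` means compression (the function names are mine):

```python
def decode_synchsafe(raw: bytes) -> int:
    """Combine bytes of the form 0xxxxxxx into one integer, 7 bits each."""
    size = 0
    for byte in raw:
        if byte & 0x80:
            raise ValueError("top bit must be clear in a size byte")
        size = (size << 7) | byte
    return size

def parse_tag_header(header: bytes) -> dict:
    """Parse the ten-byte ID3v2 tag header (hypothetical helper)."""
    if len(header) != 10 or header[:3] != b"ID3":
        raise ValueError("not an ID3v2 tag header")
    flags = header[5]
    return {
        "major": header[3],
        "minor": header[4],
        "unsynchronized": bool(flags & 0x80),  # the 'u' bit
        "compressed": bool(flags & 0x40),      # the 'c' bit (assumed meaning)
        "size": decode_synchsafe(header[6:10]),
    }
```

For example, the size bytes `00 00 02 01` decode to 2 × 128 + 1 = 257 bytes of tag data after the header.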
Each frame in the tag starts with its own header:
Identifier | 1-3 | [A-Z0-9]{3} ([XYZ][A-Z0-9]{2} for experimental use) |
Size | 4-6 | size of the frame |
Encoding |
The various frames are pretty straightforward, but there are a couple of changes in 2.4 that are particularly interesting to me:
The performer has been identified for a while. What is new in 2.4 is:
"There may only be one text information frame of its kind in an tag. All text information frames supports multiple strings, stored as a null separated list, where null is represented by the termination code for the character encoding. All text frame identifiers begin with "T". Only text frame identifiers begin with "T", with the exception of the "TXXX" frame."
Because of the amount of data I am dealing with in this program, I am planning on backing it with a database. One of the irritating things that I would be doing is taking songs with multiple artists and separating the artists. As it is, when I search for songs by Ciara, I don't get duets which include Ciara because the name doesn't match. If I have the names separated out I can search better.
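With a 2.4 text frame the separation is already done for me. A small sketch of pulling the strings out of a frame body, assuming the body starts with the usual encoding byte ($00 ISO-8859-1, $01 UTF-16 with BOM, $02 UTF-16BE, $03 UTF-8, per the 2.4 spec) and that all UTF-16 strings in one frame share a byte order; `split_text_frame` is a made-up name:

```python
# Encoding byte -> Python codec name (values taken from the ID3v2.4 spec).
CODECS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}

def split_text_frame(body: bytes) -> list:
    """Return the null-separated strings of a text frame body."""
    if not body:
        raise ValueError("empty frame body")
    codec = CODECS[body[0]]
    # Decoding first turns every terminator, whatever its byte width,
    # into a single U+0000 character that is easy to split on.
    text = body[1:].decode(codec)
    # Later strings may carry their own BOM, which decodes to U+FEFF.
    parts = [part.lstrip("\ufeff") for part in text.split("\x00")]
    # A trailing terminator would otherwise leave an empty last string.
    while parts and parts[-1] == "":
        parts.pop()
    return parts
```

Each returned string could then go into its own row in an artist table.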
In ID3v2.3 doing this requires some method for taking a list of artists and combining them. A good method is to separate each artist from the next with a comma, except for the last two, which are separated by "and". This way we get John, Paul and Ringo. What happens with Crosby, Stills and Nash though? That is a single band and I don't want to separate those artists. Also, "Jay-Z and Linkin Park" should be split, but "C and C Music Factory" should not. Allowing multiple artists to be stored unambiguously is nice.
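To make the ambiguity concrete, here is a toy version of the comma-and-"and" convention and the obvious way of undoing it. Both function names are hypothetical, and the splitter is deliberately naive:

```python
def join_artists(artists: list) -> str:
    """Join names as 'A, B and C' (the convention described above)."""
    if len(artists) < 2:
        return "".join(artists)
    return ", ".join(artists[:-1]) + " and " + artists[-1]

def naive_split(credit: str) -> list:
    """Undo join_artists by splitting on the last ' and ' and on commas."""
    head, sep, last = credit.rpartition(" and ")
    if not sep:
        return [credit]
    return head.split(", ") + [last]
```

The round trip works for John, Paul and Ringo, but `naive_split` also breaks "Crosby, Stills and Nash" into three artists and "C and C Music Factory" into two. No rule based only on the string can tell those cases apart, which is why a real list is nicer.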
Synchronized Lyrics and Text lets me set up karaoke, which is cool, and singalong, which is even cooler. One of my goals in life is to be able to understand French rap and I'd love to be able to have the lyrics play in time with the music. Additionally the program for syncing the lyrics could be used for creating ETCO transition points for the slideshow program. I'd just need some sort of interface to record them in conjunction with keyboard events or something like that.
Synchronized Lyrics/Text Header (SYLT) | |
---|---|
Text Encoding | |
Language | ISO-639-2 three byte language code |
Time Stamp Format | |
Content Type | |
Content Descriptor | text string according to encoding |
Each syllable is positioned chronologically. The example given in the spec is: (note the placement of spaces and the newline (0A) character)
"Strang" $00 xx xx "ers" $00 xx xx " in" $00 xx xx " the" $00 xx xx " night" $00 xx xx 0A "Ex" $00 xx xx "chang" $00 xx xx "ing" $00 xx xx "glan" $00 xx xx "ces" $00 xx xx
I like this system more than the CD+G used in normal karaoke machines. CD+G allows for arbitrary bitmaps to be displayed on screen, which is more versatile, but which destroys the actual text. If all my songs included these tags, I could search for songs based on a lyric from the song.
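To get at the text for searching, the syllables have to be pulled back out of the frame. This is only a sketch under some assumptions of mine: the body follows the header fields in the table above, each syllable is terminated by a single null and followed by a 32-bit big-endian time stamp (my reading of the spec), and only the single-byte encodings ($00 ISO-8859-1 and $03 UTF-8) are handled. `parse_sylt` and `lyrics_text` are invented names.

```python
import struct

def parse_sylt(body: bytes) -> dict:
    """Parse a SYLT frame body into header fields and (time, text) pairs."""
    encoding = body[0]
    if encoding == 0:
        codec = "latin-1"
    elif encoding == 3:
        codec = "utf-8"
    else:
        raise ValueError("this sketch only handles single-byte terminators")
    language = body[1:4].decode("latin-1")
    timestamp_format = body[4]
    content_type = body[5]
    # Content descriptor: a terminated string right after the fixed fields.
    end = body.index(b"\x00", 6)
    descriptor = body[6:end].decode(codec)
    pos = end + 1
    entries = []
    while pos < len(body):
        end = body.index(b"\x00", pos)            # end of this syllable
        text = body[pos:end].decode(codec)
        (stamp,) = struct.unpack(">I", body[end + 1:end + 5])
        entries.append((stamp, text))
        pos = end + 5                             # skip terminator + stamp
    return {
        "language": language,
        "timestamp_format": timestamp_format,
        "content_type": content_type,
        "descriptor": descriptor,
        "entries": entries,
    }

def lyrics_text(entries: list) -> str:
    """Glue the syllables back together into searchable text."""
    return "".join(text for _, text in entries)
```

Because the spaces and newlines live inside the syllables themselves, joining them gives back readable lines without any extra bookkeeping.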
The Unique File Identifier frame is interesting to me because, though I really like the idea of MusicBrainz, I am not at all pleased with its interface. This frame could be used to hold a MusicBrainz id.
Unique File Identifier Header (UFID) | |
---|---|
Owner Identifier | ISO-8859-1 encoded uri (preferably mailto:) $00 |
Identifier | up to 64 bytes binary data |
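Reading this frame is about as simple as it gets. A minimal sketch, assuming the frame body is exactly the two fields in the table (the function name is mine, and the owner URI in the usage note is a placeholder, not a real registry):

```python
def parse_ufid(body: bytes) -> tuple:
    """Split a UFID frame body into (owner identifier, binary identifier)."""
    owner, sep, identifier = body.partition(b"\x00")
    if not sep:
        raise ValueError("owner identifier is not null terminated")
    if len(identifier) > 64:
        raise ValueError("identifier is longer than 64 bytes")
    # The owner is ISO-8859-1 text; the identifier stays as raw bytes.
    return owner.decode("latin-1"), identifier
```

So `parse_ufid(b"http://example.com/ids\x00abc123")` would give back the owner URI as a string and `b"abc123"` as the identifier.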
The format of an MPEG frame header is:
Synchronization | 1-11 | 11111111111 |
Version | 12-13 | |
Layer | 14-15 | |
CRC Present | 16 | |
Bitrate in kbps | 17-20 | |
Sampling Rate in Hz | 21-22 | |
Padding | 23 | |
Experimental | 24 | Reserved for application use |
Channel Mode | 25-26 | |
Mode Extension (for joint stereo) | 27-28 | |
Copyright | 29 | |
Original | 30 | |
Emphasis | 31-32 | |
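Unpacking those fields is mostly shifting and masking. A sketch, assuming the standard 32-bit MPEG audio header (11 sync bits, then 2, 2, 1, 4, 2, 1, 1, 2, 2, 1, 1 and 2 bits for the fields in order). It returns the raw bit values only and doesn't try to map them to actual bitrates or sampling rates; `parse_mpeg_header` is a made-up name.

```python
def parse_mpeg_header(header: bytes) -> dict:
    """Extract the raw bit fields from a 4-byte MPEG audio frame header."""
    if len(header) != 4:
        raise ValueError("an MPEG frame header is 4 bytes")
    word = int.from_bytes(header, "big")
    if word >> 21 != 0x7FF:  # the top 11 bits must all be 1
        raise ValueError("no frame sync")
    return {
        "version": (word >> 19) & 0b11,
        "layer": (word >> 17) & 0b11,
        "crc_bit": (word >> 16) & 1,
        "bitrate_index": (word >> 12) & 0b1111,
        "sampling_rate_index": (word >> 10) & 0b11,
        "padding": (word >> 9) & 1,
        "private": (word >> 8) & 1,
        "channel_mode": (word >> 6) & 0b11,
        "mode_extension": (word >> 4) & 0b11,
        "copyright": (word >> 3) & 1,
        "original": (word >> 2) & 1,
        "emphasis": word & 0b11,
    }
```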
The important part of all that, so far as tagging is concerned, is the first 11 bits, all 1, which represent a frame sync. If a tag were to contain those bits, a player that doesn't recognize ID3v2 tags would incorrectly interpret the tag data as MPEG frame data. To avoid this, if a tag is marked as unsynchronized, all occurrences of 11111111 111xxxxx are replaced with 11111111 00000000 111xxxxx. This process is performed after any compression and undone before the frames are interpreted.
The issue is that the string 11111111 00000000 could occur in the data before unsynchronization, and upon resynchronization the 00000000 would be erroneously lost. To avoid this, 11111111 00000000 is replaced with 11111111 00000000 00000000 in the unsynchronization process.
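Both rules boil down to "insert a zero byte after FF when the next byte is 111xxxxx or 00". A minimal sketch of the two directions, with function names of my own choosing; it doesn't deal with an FF as the very last byte of the tag:

```python
def unsynchronize(data: bytes) -> bytes:
    """Insert 0x00 after every 0xFF that is followed by 111xxxxx or 0x00."""
    out = bytearray()
    for i, byte in enumerate(data):
        out.append(byte)
        if byte == 0xFF and i + 1 < len(data):
            nxt = data[i + 1]
            # 0xE0 and above looks like the rest of a frame sync;
            # 0x00 must be protected so it survives resynchronization.
            if nxt >= 0xE0 or nxt == 0x00:
                out.append(0x00)
    return bytes(out)

def resynchronize(data: bytes) -> bytes:
    """Undo unsynchronize by dropping the 0x00 that follows each 0xFF."""
    return data.replace(b"\xff\x00", b"\xff")
```

A quick sanity check is that `resynchronize(unsynchronize(x)) == x` for any `x`, and that the unsynchronized output never contains FF followed by a byte of E0 or higher.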