Introduction

There are a number of metadata frameworks and indexers such as Beagle, Kat, Strigi and GLScube, as well as a new freedesktop system Tracker, which is based on this spec and is currently under development. These frameworks provide a rich source of metadata about files including such things as the author of a document or the artist of an mp3 file. The purpose of this specification is to define a common metadata naming scheme that each framework can implement to allow applications to tap into this wealth of information. Some examples of interested applications would include filemanagers that want to display and allow editing of this metadata as well as providing integrated search functionality and virtual folder capability (IE folders whose file contents are defined by metadata rather than physical location). This specification will define a common set of "well-known" metadata.

Also worth reading is Apple's spotlight metadata attributes reference

CommonExtendedAttributes describes common extended attributes that can be used by indexers when retrieving metadata from files.

Metadata

Metadata is usually defined as data about data. In our case the metadata describes data about files that is often user visible in file managers, office applications, document viewers and audio players. Metadata can typically be viewed or written by selecting "properties" from the file menu of one of these applications.

Whilst there are some standards for naming document metadata like Dublin Core, most desktop applications use a propriety set of metadata names. This specification will attempt to define a common set of metadata using a mixture of Dublin Core, ID3 for audio files, EXIF for image files as well as application specific metadata names. The purpose of these common metadata names is not just for the benefit of metadata frameworks and search engines but also for standardising the display of metadata in all applications.

Metadata rules

The only requirement for metadata names is that they are unique and do not overload or cause confusion with each other. To make this possible, all metadata is namespaced by an appropriate class based on the type of the file or the application name (if the metadata is application specific).

This specification only defines a common subset of all possible metadata and is not designed to limit what metadata any file can have nor does it provide any formal names for custom or non-standard metadata other than a namespace class.

None of the metadata defined in this specification is mandatory and the existence of any metadata is dependent on the framework being used and the files being indexed.

All metadata defined here may be used in search strings.

Only metadata that is not derived from the file or file contents may be editable in the interface (applications that want to change non-writable metadata need to modify the embedded metadata in the file's contents themselves).

Metadata Data Types

Metadata typically comes in a variety of formats and types. In order to facilitate efficient storage and querying, we need to define a group of data types and formats that all metadata we are interested can conform to. The basic data types specified here are:

  • String (variable length string - up to 16MB max in length)
  • Array of String (comma delimited list of strings)
  • Integer (signed 32 bit)
  • UInt (unsigned 32 bit)
  • Int64 (signed 64 bit)
  • UInt64 (64 bit unsigned)
  • Float (64 bit signed floating point number - system dependent double precision)
  • Datetime (ISO 8601 format with timezone IE YYYY-MM-DDThh:mm:ss+xx:yy EG "2006-04-01T12:30+01:00") Datetime values that contain no explicit timezone info are assumed to be in the user's timezone.

Metadata Namespaces

For all metadata, each metadata item needs to be namespaced with its class type using a "." qualifier (EG Audio.Artist represents the metadata Artist for an audio class file). Metadata that is strictly application specific should use a namespace class based on the application name (EG "Nautilus.Window_Geometry").

This specification defines the following built-in classes:

  • File
  • Audio
  • Doc
  • Image

Generic File Metadata

Generic file metadata is applicable to all files regardless of their format. The specified metadata uses a few Dublin Core based types where applicable with the rest being custom ones. Generic file metadata types are namespaced with the "File" class. Only some of the generic metadata may be writable. Custom metadata not listed below that is generic and applies to all files should also be namespaced with the "File" class unless it is strictly application specific.

* Name * * Type * Writable * Description*
File.Name string No File name excluding path but including the file extension
File.Path string No
full file path of file excluding the filename 
File.Link string No
Uri of link target 
File.Format string No
Mime type of the file or if a directory it should contain value "Folder"
File.Size uint64 No size of the file in bytes or if a directory no. of items it contains
File.Permissions string No Permission string in unix format eg "-rw-r--r--"
File.Publisher string Yes Editable DC type for the name of the publisher of the file (EG dc:publisher field in RSS feed)
File.Content string No File's contents filtered as plain text (IE as stored by the indexer)
File.Description string
Yes  
Editable free text/notes
File.Keywords array of string
Yes  
Editable array of keywords
File.Rank integer
Yes  
Editable file rank for grading favourites. Value should be in the range 1..10
File.IconPath string
Yes  
Editable file uri for a custom icon for the file
File.SmallThumbnailPath string
Yes  
Editable file uri for a small thumbnail of the file suitable for use in icon views
File.LargeThumbnailPath string
Yes  
Editable file uri for a larger thumbnail of the file suitable for previews
File.Modified datetime No Last modified datetime
File.Accessed datetime No Last access datetime

Audio Metadata

Audio metadata is based on the widespread ID3.1 tags embedded in mp3, ogg and similar files. These are already defined in that specification. All metadata in this section is prefixed with "Audio" and it is recommended that any other metadata not listed below also uses this prefix if its audio related (unless it is application specific).

* Name * * Type * Writable * Description*
Audio.Title string No title of the track
Audio.Artist string No artist of the track
Audio.Album string No
name of the album 
Audio.AlbumArtist string No
artist of the album 
Audio.AlbumTrackCount integer No total no. of tracks on the album
Audio.TrackNo integer No position of track on the album
Audio.DiscNo integer No specifies which disc the track is on
Audio.Performer string No Name of the performer/conductor of the music
Audio.TrackGain float No gain adjustment of track
Audio.TrackPeakGain float No peak gain adjustment of track
Audio.AlbumGain float No gain adjustment of album
Audio.AlbumPeakGain float No peak gain adjustment of album
Audio.Duration integer No duration of track in seconds
Audio.ReleaseDate Datetime No date track was released
Audio.Comment string No
comments on the track 
Audio.Genre string No type of music classification for the track as defined in ID3 spec
Audio.Codec string No codec encoding description
Audio.CodecVersion string No codec version
Audio.Samplerate integer No samplerate in Hz
Audio.Bitrate float No bitrate in kbps
Audio.Channels integer No no. of channels in the audio (2 = stereo)
Audio.LastPlay datetime Yes when track was last played
Audio.PlayCount integer Yes No. of times the track has been played
Audio.IsNew integer Yes set to "1" if track is new to the user. (default "0")
Audio.MBAlbumID string Yes MusicBrainz album ID in UUID format
Audio.MBArtistID string Yes MusicBrainz artist ID in UUID format
Audio.MBAlbumArtistID string Yes MusicBrainz album artist ID in UUID format
Audio.MBTrackID string Yes MusicBrainz track ID in UUID format
Audio.Lyrics string Yes Lyrics of the track
Audio.CoverAlbumThumbnailPath string Yes File path to thumbnail image of the cover album

Document Metadata

For documents, applications have typically used a mixture of Dublin Core types and propriety types. In order to be consistent with them, we have based our metadata names likewise. We have also based these names on metadata names found in Open Office, Ms Word and PDF documents. All metadata in this section is prefixed with the "Doc" class and any other document based metadata should also have this prefix (unless it is application specific). All the metadata here is not editable through the interface as all of it is derived from the file contents.

* Name * * Type * * Description*
Doc.Title string Title of document
Doc.Subject string document subject
Doc.Author string name of the author
Doc.Keywords string string of keywords
Doc.Comments string user definable free text
Doc.PageCount integer no. of pages in document
Doc.WordCount integer total no. of chars in document
Doc.Created datetime datetime document was originally

Image Metadata

For images, most support the EXIF standard and so this largely makes up this specification. SVG files have user definable non-standard metadata so a subset of Dublin Core is also provided here. All metadata in this section is prefixed with the "Image" class and any other image based metadata should also have this prefix (unless it is application specific). All the metadata here is not editable through the interface as all of it is derived from the file contents.

* Name * * Type * * Description*
Image.Height integer Height in pixels
Image.Width integer Width in pixels
Image.Title string Title of image
Image.Album string Name of an album the image belongs to
Image.Date datetime datetime image was originally created
Image.Keywords string string of keywords
Image.Creator string name of the author
Image.Comments string user definable free text
Image.Description string description of the image
Image.Software string software used to produce/enhance the image
Image.CameraMake string make of camera used to take the image
Image.CameraModel string model of camera used to take the image
Image.Orientation string represents the orientation of the image wrt camera IE "top,left" or "bottom,right"
Image.ExposureProgram string The program used by the camera to set exposure when the picture is taken. EG Manual, Normal, Aperture priority etc
Image.ExposureTime float Exposure time used to capture the photo in seconds
Image.Fnumber float Diameter of the aperture relative to the effective focal length of the lens.
Image.Flash integer Set to 1 if flash was fired
Image.FocalLength float Focal length of lens in mm
Image.ISOSpeed float ISO speed used to acquire the document contents. For example, 100, 200, 400, etc
Image.MeteringMode string Metering mode used to acquire the image (IE Unknown, Average, ?CenterWeightedAverage, Spot, ?MultiSpot, Pattern, Partial)
Image.WhiteBalance string White balance setting of the camera when the picture was taken (auto or manual)
Image.Copyright string Embedded copyright message