MediaMosa Transcripting Proof of Concept

28 Dec 2011

This PoC was an implementation of some of the findings described in the technology scout Video Transcription. The implementation resulted in this document and a PoC website: http://spraak.mediamosa.surfnet.nl/.

In this PoC two transcription tools were implemented: SPRAAK for Dutch and CMUSphinx for English speech. Open source speech recognition without further training of the tools gave mixed results when used for subtitles, but the results are, in our opinion, sufficient for searching through metadata. The sample videos used in this PoC were recordings of the eight o’clock news, which produced good results. More research is needed to improve the speech recognition results for videos that are not of ‘studio quality’.

The implementation resulted in two separate “transcoding” modules in MediaMosa. Giving each tool its own module led to clean implementations of the two tools, which can easily be added without modifying the MediaMosa core. Some extra changes are needed in MediaMosa to handle the specific speech results. The two most important are new metadata fields added to an asset and support for more tickets in the HTML object codes of video players.

We highly recommend using the results of this PoC in the next development of MediaMosa (MediaMosa 3.5) so that they become useful to the MediaMosa community.
Besides the obvious benefits to viewers with hearing disabilities, transcription and captioning also offer a number of additional benefits to a much broader community of users that should not be overlooked:

  • Indexing and Searching: Transcription produces additional metadata that is also time-coded. This makes the content easily searchable with traditional text searches. With user-generated metadata alone it is not possible to search within a video.
  • Improved Accessibility: Improved accessibility will make content more useful to a broader audience. Viewers with many types of learning disabilities will benefit from the increased comprehension and retention that captioning brings. Transcription technology will make it easier to produce captions.
  • Improved quality: To improve the quality of the captions, functionality should be developed for manually editing the automatically produced captions.
  • Localization: Adding translations to your captions, with support for multiple caption tracks, widens your potential audience massively.
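
To illustrate the indexing and searching benefit, here is a minimal sketch of a text search over time-coded transcript segments. It is not taken from the PoC: the `Segment` structure, the function names and the sample sentences are assumptions made for this example, and MediaMosa's actual metadata fields may look different.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed fragment (hypothetical structure, not MediaMosa's)."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str     # recognized speech for this time range


def search_transcript(segments, query):
    """Return the segments whose text contains every word of the query."""
    words = query.lower().split()
    return [
        s for s in segments
        if all(w in s.text.lower().split() for w in words)
    ]


def format_timecode(seconds):
    """Format a position in seconds as HH:MM:SS (fractions are dropped)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Made-up sample data standing in for speech recognition output.
segments = [
    Segment(0.0, 4.2, "Good evening and welcome to the news"),
    Segment(4.2, 9.8, "The weather will be cold and wet tomorrow"),
    Segment(9.8, 15.0, "More news after the weather forecast"),
]

for hit in search_transcript(segments, "weather"):
    # Each hit carries the position in the video to jump to.
    print(format_timecode(hit.start), hit.text)
```

Because every hit carries its start time, a player could jump straight to the matching position in the video, which is what user-generated metadata on its own cannot offer. Note that this sketch matches whole words only and that an empty query returns all segments.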

Read the complete document in the attachment below.

Attachment: Proof of Concept Video Transcription in Mediamosa v1.1.pdf (992.53 KB)