This proof of concept (PoC) implemented some of the findings described in the technology scout Video Transcription. It resulted in this document and a PoC website: http://spraak.mediamosa.surfnet.nl/.
In this PoC two transcription tools were implemented: SPRAAK for Dutch and CMUSphinx for English speech. Used without further training, these open source speech recognition tools gave mixed results for subtitles but, in our opinion, results that are sufficient for searching through metadata. The sample videos used in this PoC were recordings of the eight o’clock news, which produced good results. More research is needed to improve the speech recognition results for videos that are not of ‘studio quality’.
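To illustrate why imperfect recognition can still be adequate for searching, the following is a minimal sketch and not the PoC's actual code. It assumes the recognizer yields words with start times; the `Word` class, the `build_index` and `search` helpers, and the sample transcript are all hypothetical and do not reflect the SPRAAK, CMUSphinx, or MediaMosa APIs.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word (hypothetical structure, not a real tool's output format)."""
    text: str
    start: float  # seconds from the start of the video


def build_index(words):
    """Map each lowercased word to the start times at which it was recognized."""
    index = {}
    for w in words:
        index.setdefault(w.text.lower(), []).append(w.start)
    return index


def search(index, term):
    """Return the start times at which the term was recognized (empty list if never)."""
    return index.get(term.lower(), [])


# Invented sample output, including one misrecognition ("whether" for "weather").
transcript = [
    Word("good", 0.4), Word("evening", 0.7), Word("the", 1.5),
    Word("whether", 1.7), Word("forecast", 2.1), Word("evening", 9.3),
]
index = build_index(transcript)
```

In this sketch a search for "weather" finds nothing because of the misrecognition, yet a search for "forecast" still leads to the same moment in the video. A subtitle, by contrast, would show the wrong word to every viewer.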
The implementation resulted in two separate “transcoding” modules in MediaMosa. Giving each tool its own module led to clean implementations, and the tools can easily be added without modifying the MediaMosa core. Some extra changes are needed in MediaMosa to handle the speech recognition results. The two most important are new metadata fields on an asset and support for more tickets in the HTML object code of video players.
We highly recommend using the results of this PoC in the next version of MediaMosa (MediaMosa 3.5) so that they become useful to the MediaMosa community.
Besides the obvious benefits to viewers with hearing disabilities, transcription and captioning offer a number of additional benefits to a much broader community of users that should not be overlooked.
Read the complete document here.
| Attachment | Size |
|---|---|
| Proof of Concept Video Transcription in Mediamosa v1.1.pdf | 992.53 KB |