Automated Metadata Enrichment of Large Speech Radio Archives

Metadata

Publisher: SMPTE
Doc Type: Journal Article
Article Type: orig-research
Abstract: The British Broadcasting Corp. (BBC) manually tags recent programs on its website. Editors draw and assign these tags from open datasets made available within the Linked Data cloud, but this is a time-consuming process. Aside from recent programming, which is tagged, the BBC has a large radio archive that is untagged. Thus the possibility of automatically assigning tags to programs in a reasonable amount of time has been investigated. Tags enable a variety of use cases, such as dynamic building of topical aggregations, retrieval through topic-based search, or cross-domain navigation. Automatic tagging of archive content would ensure archive programs are as findable as recent programs. It would mean that topic-based collections of archive content can be easily built, for example, to find archive content that relates to current news events. This paper describes an infrastructure to process large program archives in a cost-effective and scalable manner using Amazon Web Services. An automated tagging algorithm using speech audio as an input is described. The paper also explains how this algorithm can be separated and distributed and how the workflow can be managed robustly, ensuring appropriate error handling, resource monitoring, and data management on a large scale. Finally, the results from processing the BBC World Service English-speaking audio archive are presented.
Publication Date: 2014-01-01
DOI: 10.5594/j18370
Link: https://doi.org/10.5594/j18370
Author(s): Y. Raimond, C. Lowis, R. Hodgson, D. Tinley

Source Data (JSON)

Full registry record with provenance metadata. Open directly: /api/doc/10.5594-j18370.json

Reference this Doc

Plain text (ISO 690 compliant)

Preview:

Y. Raimond, C. Lowis, R. Hodgson, and D. Tinley; Automated Metadata Enrichment of Large Speech Radio Archives, SMPTE Motion Imaging Journal (Volume: 123, Issue: 1, 2014); SMPTE, 2014. Available at https://doi.org/10.5594/j18370

HTML (ISO 690 compliant)

Snippet:

<span class="citation">Y. Raimond, C. Lowis, R. Hodgson, and D. Tinley; <cite>Automated Metadata Enrichment of Large Speech Radio Archives</cite>, SMPTE Motion Imaging Journal (Volume: 123, Issue: 1, 2014); SMPTE, 2014. Available at <a href="https://doi.org/10.5594/j18370" target="_blank" rel="noopener">https://doi.org/10.5594/j18370</a></span>

SMPTE's HTML Pub

Snippet:

<li>
Y. Raimond, C. Lowis, R. Hodgson, and D. Tinley; <cite id="bib-10-5594-j18370">Automated Metadata Enrichment of Large Speech Radio Archives</cite>, SMPTE Motion Imaging Journal (Volume: 123, Issue: 1, 2014); SMPTE, 2014
<span class="doi">10.5594/j18370</span>
</li>