Automated Metadata Enrichment of Large Speech Radio Archives
Metadata
- Publisher
- SMPTE
- Doc Type
- Journal Article
- Article Type
- orig-research
- Abstract
- The British Broadcasting Corp. (BBC) manually tags recent programs on its website. Editors draw and assign these tags from open datasets made available within the Linked Data cloud, but this is a time-consuming process. Aside from recent programming, which is tagged, the BBC has a large radio archive that is untagged. Thus the possibility of automatically assigning tags to programs in a reasonable amount of time has been investigated. Tags enable a variety of use cases, such as dynamic building of topical aggregations, retrieval through topic-based search, or cross-domain navigation. Automatic tagging of archive content would ensure archive programs are as findable as recent programs. It would mean that topic-based collections of archive content can be easily built, for example, to find archive content that relates to current news events. This paper describes an infrastructure to process large program archives in a cost-effective and scalable manner using Amazon Web Services. An automated tagging algorithm using speech audio as an input is described. The paper also explains how this algorithm can be separated and distributed and how the workflow can be managed robustly, ensuring appropriate error handling, resource monitoring, and data management on a large scale. Finally, the results from processing the BBC World Service English-speaking audio archive are presented.
- Publication Date
- 2014-01-01
- DOI
- 10.5594/j18370
- Link
- https://doi.org/10.5594/j18370
- Author(s)
- Y. Raimond, C. Lowis, R. Hodgson, D. Tinley
Source Data (JSON)
Full registry record with provenance metadata. Open directly: /api/doc/10.5594-j18370.json
Reference this Doc
Plain text (ISO 690 compliant)
Preview:
Y. Raimond, C. Lowis, R. Hodgson, and D. Tinley; Automated Metadata Enrichment of Large Speech Radio Archives, SMPTE Motion Imaging Journal ( Volume: 123, Issue: 1, 2014); SMPTE, 2014. Available at https://doi.org/10.5594/j18370
Snippet:
Y. Raimond, C. Lowis, R. Hodgson, and D. Tinley; Automated Metadata Enrichment of Large Speech Radio Archives, SMPTE Motion Imaging Journal ( Volume: 123, Issue: 1, 2014); SMPTE, 2014. Available at https://doi.org/10.5594/j18370
HTML (ISO 690 compliant)
Preview:
Y. Raimond, C. Lowis, R. Hodgson, and D. Tinley; Automated Metadata Enrichment of Large Speech Radio Archives, SMPTE Motion Imaging Journal ( Volume: 123, Issue: 1, 2014); SMPTE, 2014. Available at https://doi.org/10.5594/j18370
Snippet:
<span class="citation">Y. Raimond, C. Lowis, R. Hodgson, and D. Tinley; <cite>Automated Metadata Enrichment of Large Speech Radio Archives</cite>, SMPTE Motion Imaging Journal ( Volume: 123, Issue: 1, 2014); SMPTE, 2014. Available at <a href="https://doi.org/10.5594/j18370" target="_blank" rel="noopener">https://doi.org/10.5594/j18370</a></span>
SMPTE's HTML Pub
Preview:
Y. Raimond, C. Lowis, R. Hodgson, and D. Tinley; Automated Metadata Enrichment of Large Speech Radio Archives, SMPTE Motion Imaging Journal ( Volume: 123, Issue: 1, 2014); SMPTE, 2014
doi: 10.5594/j18370
url: https://doi.org/10.5594/j18370
Snippet:
<li> Y. Raimond, C. Lowis, R. Hodgson, and D. Tinley; <cite id="bib-10-5594-j18370">Automated Metadata Enrichment of Large Speech Radio Archives</cite>, SMPTE Motion Imaging Journal ( Volume: 123, Issue: 1, 2014); SMPTE, 2014 <span class="doi">10.5594/j18370</span> </li>