This session of the 语音之家 (Speech Home) open lecture series features Wenwu Wang, presenting "Audio-Text Cross Modal Translation".
Lecture Overview
Topic: Audio-Text Cross Modal Translation
Time: 4 April 2023, 9:00–10:00
Speaker
Wenwu Wang
Wenwu Wang is a Professor in Signal Processing and Machine Learning, and a Co-Director of the Machine Audition Lab within the Centre for Vision, Speech and Signal Processing, University of Surrey, UK. He is also an AI Fellow at the Surrey Institute for People-Centred Artificial Intelligence. His current research interests include signal processing, machine learning and perception, artificial intelligence, machine audition (listening), and statistical anomaly detection. He has (co-)authored over 300 papers in these areas. He has been involved as Principal or Co-Investigator in more than 30 research projects funded by UK and EU research councils and by industry (e.g. BBC, NPL, Samsung, Tencent, Huawei, Saab, Atlas, Kaon), with a total grant portfolio of over £30M.
He is a (co-)author or (co-)recipient of over 15 awards, including the 2022 IEEE Signal Processing Society Young Author Best Paper Award, ICAUS 2021 Best Paper Award, DCASE 2020 Judge's Award, DCASE 2019 and 2020 Reproducible System Awards, LVA/ICA 2018 Best Student Paper Award, FSDM 2016 Best Oral Presentation, and Dstl Challenge 2012 Best Solution Award. He is on the 2021 and 2022 Stanford University lists of the Top 2% Scientists Worldwide.
He is a Senior Area Editor for IEEE Transactions on Signal Processing, an Associate Editor for IEEE/ACM Transactions on Audio, Speech, and Language Processing, an Associate Editor for (Nature) Scientific Reports, and a Specialty Chief Editor of Frontiers in Signal Processing. He is the elected Chair of the IEEE Signal Processing Society Machine Learning for Signal Processing Technical Committee, the Vice Chair of the EURASIP Technical Area Committee on Acoustic, Speech and Music Signal Processing, an elected Member of the IEEE Signal Processing Theory and Methods Technical Committee, and an elected Member of the International Steering Committee of Latent Variable Analysis and Signal Separation. He was a Satellite Workshop Co-Chair for INTERSPEECH 2022, a Publication Co-Chair for IEEE ICASSP 2019, Local Arrangements Co-Chair of IEEE MLSP 2013, and Publicity Co-Chair of IEEE SSP 2009.
Abstract
Cross-modal translation of audio and text has emerged as an important research area in artificial intelligence, sitting at the intersection of audio signal processing and natural language processing. Generating a meaningful description for an audio clip is known as automated audio captioning, which is useful in applications such as assisting the hearing-impaired to understand environmental sounds, facilitating retrieval of multimedia content, and analyzing sounds for security surveillance. Generating audio from text prompts is known as text-to-audio generation, which can serve as a sound synthesis tool for film making, game design, virtual reality/metaverse, digital media, and digital assistants that help the visually impaired understand text. Cross-modal audio-text translation requires understanding both the audio events and scenes in an audio clip and the textual information as natural language, and learning the mapping and alignment between these two streams of information. Exciting new developments have emerged recently in automated audio-text cross-modal translation. In this talk, we will give a brief introduction to this field, covering the problem description, potential applications, datasets, open challenges, recent technical progress, and finally, possible future research directions.
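To make the "mapping and alignment" idea above concrete, a common approach (not necessarily the one covered in this talk) is contrastive audio-text pre-training: an audio encoder and a text encoder project each modality into a shared embedding space, and a symmetric InfoNCE loss pulls matched clip/caption pairs together while pushing mismatched pairs apart. The sketch below is illustrative only; the encoders are replaced by random embeddings, and the loss shape is the assumed part.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-computed embeddings for a batch of paired clips/captions;
# in a real system these would come from trained audio and text encoders.
batch, dim = 4, 8
audio_emb = rng.normal(size=(batch, dim))
text_emb = rng.normal(size=(batch, dim))

def l2_normalize(x):
    """Project embeddings onto the unit sphere (cosine similarity)."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def contrastive_loss(a, t, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired audio/text embeddings."""
    a, t = l2_normalize(a), l2_normalize(t)
    logits = a @ t.T / temperature        # pairwise similarity matrix
    labels = np.arange(len(a))            # matched pairs lie on the diagonal

    def xent(lg):
        # Numerically stable cross-entropy against the diagonal targets.
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the audio-to-text and text-to-audio directions.
    return (xent(logits) + xent(logits.T)) / 2

loss = contrastive_loss(audio_emb, text_emb)
print(f"contrastive loss: {loss:.4f}")
```

Once such a shared space is learned, captioning systems typically condition a language decoder on the audio embedding, while text-to-audio generators condition an audio synthesis model on the text embedding.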
Agenda
How to Attend