The Million Song Dataset: Take It, It's Free

  • Main point: The next phase of research on smart music delivery systems is underway with The Echo Nest's release of the Million Song Dataset. The dataset is meant to support the development of new Music Information Retrieval services and is free to download.
  • Key information:

    • The dataset contains audio features and metadata for a million contemporary popular music tracks and is being analyzed by Columbia University's Lab ROSA.
    • Services like Pandora currently rely on trained musicologists to build their music genome. Lab ROSA, by contrast, has been experimenting with machine learning to analyze music tracks.
    • The National Science Foundation's Listening Machine Project focuses on analyzing individual sources in sound recordings.
    • A very large dataset is needed to reveal problems in algorithms, and a shared one lets researchers compare results.
    • The Echo Nest's own applications use this data, and the company hopes the release will strengthen the connection between academic research and commercial development.
  • Important details:

    • The dataset is 300 GB and contains derived features for a million tracks in database-table form; it does not include the audio itself.
    • Audio samples can be obtained from 7digital using code provided by Lab ROSA.
    • Lab ROSA suggests starting with a subset of around 10,000 songs for a quick taste.
    • Assembling a dataset this large has long been difficult because of the recording industry's stance and the expense involved.
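
The idea of starting with a subset of around 10,000 songs can be illustrated with a minimal Python sketch. It is not Lab ROSA's code: the track IDs are made-up placeholders, and `sample_subset` is a hypothetical helper that only shows how one might draw a reproducible random subset from a million-item collection before committing to the full download.

```python
import random


def sample_subset(track_ids, k=10_000, seed=0):
    """Draw k track IDs without replacement, reproducibly.

    A fixed seed means other researchers can recreate the same subset,
    which matters when results are meant to be compared.
    """
    rng = random.Random(seed)
    k = min(k, len(track_ids))  # never ask for more than exists
    return rng.sample(track_ids, k)


# Hypothetical placeholder IDs standing in for the million tracks;
# the real dataset uses its own identifiers.
all_ids = [f"track_{i:07d}" for i in range(1_000_000)]

subset = sample_subset(all_ids)
print(len(subset), len(set(subset)))
```

Fixing the seed is a design choice rather than a requirement: it trades a little randomness for the ability to rerun an experiment on exactly the same songs.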