One use case is converting podcast subtitles to the lyrics format (.lrc), so the text can then be displayed in sync with the audio in various music/media players, such as foobar2000 with the OpenLyrics plugin ...
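Here is a minimal sketch of such a conversion in Python. It assumes the subtitles are in SubRip (.srt) format, which the text above does not specify. The sketch keeps only each cue's start time, truncates milliseconds to the hundredths of a second that LRC timestamps use, and joins multi-line cues into a single lyric line. The function names `srt_to_lrc` and `to_lrc_timestamp` are hypothetical illustrations, not part of any existing tool.

```python
import re

# Matches the start of an SRT timing line, e.g. "00:01:02,345 --> 00:01:05,000".
# Only the start time is captured, since LRC tags mark when a line begins.
TIMING = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->")


def to_lrc_timestamp(hours: int, minutes: int, seconds: int, millis: int) -> str:
    """Format a time as an LRC tag [mm:ss.xx]; hours are folded into minutes."""
    # Work in hundredths of a second; milliseconds are truncated, not rounded.
    total_centis = ((hours * 60 + minutes) * 60 + seconds) * 100 + millis // 10
    mins, centis = divmod(total_centis, 6000)
    secs, cs = divmod(centis, 100)
    return f"[{mins:02d}:{secs:02d}.{cs:02d}]"


def srt_to_lrc(srt_text: str) -> str:
    """Convert SRT subtitle text into LRC text, one line per cue."""
    lines_out = []
    # SRT cues are separated by blank lines (this also handles \r\n endings).
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        rows = [r.strip() for r in block.splitlines() if r.strip()]
        for i, row in enumerate(rows):
            m = TIMING.match(row)
            if m:
                h, mi, s, ms = (int(g) for g in m.groups())
                # Everything after the timing line is the cue text.
                text = " ".join(rows[i + 1:])
                lines_out.append(to_lrc_timestamp(h, mi, s, ms) + text)
                break
    return "\n".join(lines_out) + "\n" if lines_out else ""
```

For example, a cue starting at `00:01:02,345` with the two text lines "Today we talk" and "about Python." becomes `[01:02.34]Today we talk about Python.`. Because hours are folded into minutes, episodes longer than 99 minutes produce three-digit minute fields; whether a given player accepts those is an assumption worth checking.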
Python seems easy to pick up, especially for prototyping, but make sure to avoid these common mistakes when coding.