I posted yesterday about my quick play with the BBC Web API for programme schedules. I wanted to be able to programmatically find out what programme was on a particular channel at a given time.
The problem with the quick code I came up with was that it only gets me BBC channels. What if I want to know what was on a non-BBC channel?
Andrew pointed me at the Radio Times website, which makes programme schedule data available in XMLTV format.
And Dom pointed me at a neat Python library for parsing XMLTV data.
Getting the XMLTV data
Radio Times make the XMLTV data available for each channel individually. For example, BBC1 programme data is at xmltv.radiotimes.com/xmltv/92.dat.
Rather than download each file individually, I downloaded the XMLTV program to do this for me. There is a Windows exe version available in the XMLTV SourceForge project.
Step one is to run the exe:
> xmltv tv_grab_uk_rt --configure
The tv_grab_uk_rt argument tells XMLTV to get the data from the Radio Times website.
The --configure option takes you through some config steps to choose which channels you want to download: either individually choosing each channel you want, or choosing from a preset group (e.g. all FreeView channels).
Step two is to export all of the data to a single XMLTV file:
> xmltv tv_grab_uk_rt --output xmltv.out.xml
This creates an xmltv.out.xml file containing the schedule information for all the channels I chose, in a format that the Python library can understand.
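For anyone curious what is inside that file, here is a minimal sketch that reads XMLTV-style data using only the Python 3 standard library rather than the xmltv library used in this post. The SAMPLE document is a hand-written, hypothetical fragment in the XMLTV layout (not real grabber output), and the read_programmes helper is an illustrative name, not part of any library.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical hand-written sample in the XMLTV layout (assumption: not real grabber output).
SAMPLE = """<tv>
  <channel id="channel4.com">
    <display-name>Channel 4</display-name>
  </channel>
  <programme start="20090727190000 +0100" stop="20090727195500 +0100" channel="channel4.com">
    <title lang="en">Channel 4 News</title>
  </programme>
  <programme start="20090727200000 +0100" stop="20090727210000 +0100" channel="channel4.com">
    <title lang="en">Dispatches</title>
  </programme>
</tv>"""

# XMLTV timestamps look like "YYYYMMDDhhmmss +hhmm"; %z parses the UTC offset.
XMLTV_TIME = "%Y%m%d%H%M%S %z"


def read_programmes(xml_text):
    """Return one dict per <programme> element (illustrative helper)."""
    root = ET.fromstring(xml_text)
    result = []
    for node in root.findall("programme"):
        result.append({
            "channel": node.get("channel"),
            "title": node.findtext("title"),
            "start": datetime.strptime(node.get("start"), XMLTV_TIME),
            "stop": datetime.strptime(node.get("stop"), XMLTV_TIME),
        })
    return result


programmes = read_programmes(SAMPLE)
for programme in programmes:
    print(programme["channel"], programme["title"], programme["start"].isoformat())
```

Parsing the offset with %z keeps the timezone information, whereas slicing it off with [:-6] treats every time as local time.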
Using the XMLTV data – locally
A quick Python script lets me recreate what I was doing in Java before:
#
# IMPORTS
#
import xmltv
from datetime import datetime, timedelta

#
# INPUTS
#
filename = 'C:\\location\\of\\my\\xmltv.out.xml'
requestedChannel = "channel4.com"
requestedStart = "20090727190500"
requestedFinish = "20090727201500"

#
# CODE
#
dateFormat = "%Y%m%d%H%M%S"

xmltv.locale = 'Latin-1'
programmes = xmltv.read_programmes(open(filename, 'r'))

requestedStartTime = datetime.strptime(requestedStart, dateFormat)
requestedEndTime = datetime.strptime(requestedFinish, dateFormat)

for programme in programmes:
    if programme['channel'] == requestedChannel:
        # [:-6] drops the trailing timezone offset (e.g. " +0100") before parsing
        progStartTime = datetime.strptime(programme['start'][:-6], dateFormat)
        if requestedEndTime >= progStartTime:
            progFinishTime = datetime.strptime(programme['stop'][:-6], dateFormat)
            if requestedStartTime <= progFinishTime:
                # time watched = overlap between the programme and the requested window
                duration = min(progFinishTime, requestedEndTime) - max(progStartTime, requestedStartTime)
                if duration != timedelta(0):
                    print(programme['title'][0][0])
                    print(progStartTime)
                    print(progFinishTime)
                    print(duration)
                    print('--------------------')
Running this will output:
> testtvguide.py
Channel 4 News
2009-07-27 19:00:00
2009-07-27 19:55:00
0:50:00
--------------------
3 Minute Wonder: The Estate
2009-07-27 19:55:00
2009-07-27 20:00:00
0:05:00
--------------------
Dispatches
2009-07-27 20:00:00
2009-07-27 21:00:00
0:15:00
--------------------
It tells me which programmes were on during that time, and how much of each one fell within the requested time window.
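The "how much of it was on" figure is an interval overlap: the later of the two start times up to the earlier of the two end times. As a minimal sketch in Python 3, assuming a hypothetical helper name watched_duration and using the three Channel 4 programmes from the example as hard-coded data:

```python
from datetime import datetime, timedelta


def watched_duration(window_start, window_end, prog_start, prog_stop):
    """Time shared by the requested window and the programme (never negative)."""
    overlap = min(window_end, prog_stop) - max(window_start, prog_start)
    return max(overlap, timedelta(0))


fmt = "%Y%m%d%H%M%S"
window_start = datetime.strptime("20090727190500", fmt)
window_end = datetime.strptime("20090727201500", fmt)

# Schedule entries taken from the example in this post.
schedule = [
    ("Channel 4 News", "20090727190000", "20090727195500"),
    ("3 Minute Wonder: The Estate", "20090727195500", "20090727200000"),
    ("Dispatches", "20090727200000", "20090727210000"),
]

durations = {}
for title, start, stop in schedule:
    durations[title] = watched_duration(
        window_start,
        window_end,
        datetime.strptime(start, fmt),
        datetime.strptime(stop, fmt),
    )
    print(title, durations[title])
```

A single min/max expression covers all the cases (programme starts before the window, ends after it, sits entirely inside it, or spans it) without a chain of if/elif branches.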
Making this available over the web
For what I want to do with this service, I need to be able to access it remotely – so I wrapped this bit of Python in a web service that I could run in Google App Engine.
from datetime import datetime, timedelta
import wsgiref.handlers

import xmltv
from django.utils import simplejson
from google.appengine.ext import webapp


class QueryPage(webapp.RequestHandler):

    dateFormat = "%Y%m%d%H%M%S"

    # XMLTV channel ids, each with the short names accepted in the 'channel' parameter
    channelSynonyms = {
        'south.bbc1.bbc.co.uk'    : ['1', 'bbc1'],
        'south.bbc2.bbc.co.uk'    : ['2', 'bbc2'],
        'bbcthree.bbc.co.uk'      : ['bbc3'],
        'bbcfour.bbc.co.uk'       : ['bbc4', 'bbcfour'],
        'news.bbc.co.uk'          : ['bbc24', 'news', 'news24'],
        'cbbc.bbc.co.uk'          : ['cbbc'],
        'cbeebies.bbc.co.uk'      : ['cbeebies'],
        'meridian.itv1.itv.co.uk' : ['3', 'meridian', 'itv'],
        'channel4.com'            : ['4', 'ch4', 'channel4'],
        'e4.channel4.com'         : ['e4'],
        'filmfour.channel4.com'   : ['film4', 'filmfour'],
        'dave.uktv.co.uk'         : ['dave'],
    }

    def getChannelNameSynonyms(self, channelname):
        channelname = channelname.lower()
        for channelid, synonyms in self.channelSynonyms.items():
            if channelname == channelid or channelname in synonyms:
                return channelid
        # unknown channel names return None, which matches no programmes
        return None

    def get(self):
        xmltv.locale = 'Latin-1'
        filename = 'xmltv.out.xml'
        programmes = xmltv.read_programmes(open(filename, 'r'))

        requestedChannel = self.getChannelNameSynonyms(self.request.get('channel'))
        requestedStartTime = datetime.strptime(self.request.get('start'), self.dateFormat)
        requestedEndTime = datetime.strptime(self.request.get('stop'), self.dateFormat)

        matchedProgrammes = []

        for programme in programmes:
            if programme['channel'] == requestedChannel:
                # [:-6] drops the trailing timezone offset (e.g. " +0100") before parsing
                progStartTime = datetime.strptime(programme['start'][:-6], self.dateFormat)
                if requestedEndTime >= progStartTime:
                    progFinishTime = datetime.strptime(programme['stop'][:-6], self.dateFormat)
                    if requestedStartTime <= progFinishTime:
                        # time watched = overlap between the programme and the requested window
                        duration = min(progFinishTime, requestedEndTime) - max(progStartTime, requestedStartTime)
                        if duration != timedelta(0):
                            matchedProgrammes.append({
                                'title'   : programme['title'][0][0],
                                'start'   : str(progStartTime),
                                'stop'    : str(progFinishTime),
                                'watched' : str(duration)
                            })

        self.response.out.write(simplejson.dumps(matchedProgrammes))


application = webapp.WSGIApplication([('/tvquery', QueryPage)], debug=True)

def main():
    wsgiref.handlers.CGIHandler().run(application)

if __name__ == "__main__":
    main()
This means that I now have a web service roughly similar to the BBC Web API one, but one which can give me information for non-BBC channels, too.
For example, using wget to test it:
> wget -O - "http://not-telling-you/tvquery?channel=channel4&start=20090727190500&stop=20090727201500"
[
    {
        "start": "2009-07-27 19:00:00",
        "stop": "2009-07-27 19:55:00",
        "title": "Channel 4 News",
        "watched": "0:50:00"
    },
    {
        "start": "2009-07-27 19:55:00",
        "stop": "2009-07-27 20:00:00",
        "title": "3 Minute Wonder: The Estate",
        "watched": "0:05:00"
    },
    {
        "start": "2009-07-27 20:00:00",
        "stop": "2009-07-27 21:00:00",
        "title": "Dispatches",
        "watched": "0:15:00"
    }
]
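A client of this service gets the "watched" values back as strings in H:MM:SS form, so it has to turn them back into durations before doing any arithmetic. A minimal Python 3 sketch, assuming the response shown above is pasted in as a literal (RESPONSE) instead of being fetched, and with parse_watched as a hypothetical helper name:

```python
import json
from datetime import timedelta

# The JSON from the example above, embedded as a literal (no network call is made).
RESPONSE = """[
  {"start": "2009-07-27 19:00:00", "stop": "2009-07-27 19:55:00",
   "title": "Channel 4 News", "watched": "0:50:00"},
  {"start": "2009-07-27 19:55:00", "stop": "2009-07-27 20:00:00",
   "title": "3 Minute Wonder: The Estate", "watched": "0:05:00"},
  {"start": "2009-07-27 20:00:00", "stop": "2009-07-27 21:00:00",
   "title": "Dispatches", "watched": "0:15:00"}
]"""


def parse_watched(text):
    """Convert an 'H:MM:SS' string into a timedelta (assumes durations under one day)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


programmes = json.loads(RESPONSE)
total = sum((parse_watched(p["watched"]) for p in programmes), timedelta(0))

for p in programmes:
    print(p["title"], parse_watched(p["watched"]))
print("Total watched:", total)
```

The total comes to 1:10:00, which matches the length of the requested window (19:05 to 20:15), a handy sanity check that no time was counted twice or dropped.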
The problem
You might have noticed that I hid the hostname where I’ve put this service in the example above. I’ve done this because not doing so would be breaching the conditions regarding the use of the Radio Times data:
In accessing this XML feed, you agree that you will only access its contents for your own personal and non-commercial use and not for any commercial or other purposes, including advertising or selling any goods or services, including any third-party software applications available to the general public.
I can do this as long as it’s for my own personal use, but I can’t make it available as a web service to anyone else.
Shame.
Is there anywhere else I can get XMLTV format data that wouldn’t have this restriction?
If I stick a TV tuner card in my server and get the data from the digital TV signal that set-top boxes use to produce EPGs, would that be okay?
Tags: gae, google app engine, programmes, python, radiotimes, schedules, tv, xmltv
Hi Dale,
I had been using the RT feeds for some time, to build a web-based service for myself that would automatically search through the listings on demand for programmes of interest – essentially I had set up a large series of rules, including (i) keywords I liked (subjects, actors etc), (ii) keywords I didn’t want. To help build these rules, for all programmes displayed I included button options, for example: ignore programmes of this title, and optionally for that channel only.
I found that using the programme descriptions as the unique identity for programmes helped weed out unwanted programmes.
The main problem I had was that every few days I had to rebuild my internal database from the latest RT feeds, which took quite a while and sometimes failed 🙁
@Gary – That sounds like a cool idea.
I agree that needing to re-run xmltv periodically to get an updated xml file is a pain, but it could be done as a cron job, I guess.
I know what you mean about it failing… I’m only doing selected channels at the moment, because some of the channels in there have data that the Python library can’t handle. I need to go through them in turn to work out which ones are causing the trouble. 🙁
An ex-IBMer friend, Andrew Flegg, was looking at this sort of thing years ago and wrote his site, http://bleb.org/tv/. It would be interesting to see how times have changed with what you can do now vs back then. He’s @jaffa2 on Twitter.
@Graham – Thanks for the link. Looks like just the sort of data source I wanted. Wonder where he gets it from…?