HTTP Based Importers
This page describes how to write an importer that downloads data via HTTP. It assumes that the data is structured in files that each represent one day, one week, or all available data for one channel. This is the case for most HTTP-based importers.
Note that not all grabbers use the methods described on this page. The way to write a grabber has evolved over time, and this page describes only the most recent method, since it is the simplest and most powerful.
Configuration
The configuration for the importer comes from the nonametv.conf configuration file and all configuration parameters are available in the $self object in the constructor and in all subsequent calls to the object methods.
To use the Skeleton importer, you should have the following in nonametv.conf:
Importers => {
  Skeleton => {
    Type => 'Skeleton',
    UrlRoot => 'http://baseurl.com/test',
    MaxWeeks => 4,
    Channels => {
      "test.skeleton.se" =>
        [ "Skeleton TV", "skel", "sv", 0, ],
    }
  }
}
- The first "Skeleton" defines the name of the grabber that you use in the call to nonametv-import.
- The Type parameter defines that the NonameTV::Importer::Skeleton class shall be used.
- The UrlRoot parameter is not interpreted by the base classes; it is read by the importer itself.
- The MaxWeeks parameter tells NonameTV::Importer::BaseWeekly how many weeks of data it shall try to download.
- The Channels section defines the channels that the importer shall download data for. It consists of a hashref with xmltvids as keys and an array with channel name, grabber_info, sched_lang and empty_ok. These are copied to the channels table and available in the $chd parameter in calls to the importer.
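As a rough illustration of how one Channels entry lines up with the field names listed above, the following standalone sketch unpacks the array into a hash. The helper name channel_entry_to_hash and the display_name key are made up for this example; the actual copying into the channels table is done inside NonameTV and may use different names.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of NonameTV: turn one Channels entry
# into a hash keyed by the field names described above.
sub channel_entry_to_hash {
    my ( $xmltvid, $entry ) = @_;
    my ( $name, $grabber_info, $sched_lang, $empty_ok ) = @{$entry};
    return {
        xmltvid      => $xmltvid,
        display_name => $name,            # assumed key name
        grabber_info => $grabber_info,
        sched_lang   => $sched_lang,
        empty_ok     => $empty_ok,
    };
}

my %channels = (
    "test.skeleton.se" => [ "Skeleton TV", "skel", "sv", 0, ],
);

for my $xmltvid ( sort keys %channels ) {
    my $chd = channel_entry_to_hash( $xmltvid, $channels{$xmltvid} );
    print "$chd->{xmltvid}: grabber_info=$chd->{grabber_info}, ",
          "sched_lang=$chd->{sched_lang}\n";
}
```

The grabber_info value ("skel" here) is the natural place for whatever channel-specific identifier the remote site uses.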
Inheritance
The base class for these types of importers is NonameTV::Importer::BasePeriodic. There are three classes that derive from BasePeriodic: BaseDaily, BaseWeekly and BaseOne. Your importer should normally derive from one of these. The file lib/NonameTV/Importer/Skeleton.pm contains a starting point for a new importer.
First of all, you need to derive from the right base class:
use NonameTV::Importer::BaseWeekly;
use base 'NonameTV::Importer::BaseWeekly';
Then, you need to add a constructor that at least calls the constructor of the base class and sets the name of the grabber (which must be equal to the class name):
sub new {
  my $proto = shift;
  my $class = ref($proto) || $proto;
  my $self = $class->SUPER::new( @_ );
  bless ($self, $class);
  $self->{grabber_name} = 'Skeleton';
  return $self;
}
Additionally, you should verify that the importer has been properly configured and set up a DataStore::Helper object if you need it:
defined( $self->{UrlRoot} ) or die "You must specify UrlRoot";
my $dsh = NonameTV::DataStore::Helper->new( $self->{datastore} );
$self->{datastorehelper} = $dsh;
Required Methods
An importer that derives from BasePeriodic must implement a number of methods to actually perform any action.
Object2Url
The Object2Url method converts a batchname to a URL. The batchname is of the form "xmltvid_year-week" for BaseWeekly, e.g. "tv1.svt.se_2008-32". For BaseDaily, the batchname has the form "xmltvid_year-month-day" and for BaseOne it is "xmltvid_all".
Object2Url can return more than one URL. This is useful when the data can be at several different URLs. The URLs are tried in order until one of them succeeds.
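The batchname parsing described above can be sketched as a standalone function. This is only an illustration: the function name weekly_batch_to_urls and both URL layouts are invented, and the exact arguments and return convention that BasePeriodic expects from Object2Url should be taken from Skeleton.pm rather than from this sketch.

```perl
use strict;
use warnings;

# Hypothetical standalone helper: map a BaseWeekly batchname such as
# "tv1.svt.se_2008-32" to a list of candidate URLs, in the order they
# should be tried. Both URL layouts below are invented for the example.
sub weekly_batch_to_urls {
    my ( $url_root, $grabber_info, $batchname ) = @_;

    # The xmltvid itself contains dots, so anchor the match on the
    # trailing "_year-week" part of the batchname.
    my ( $year, $week ) = $batchname =~ /_(\d{4})-(\d{1,2})$/
        or return;    # empty list: not a weekly batchname

    my $primary  = sprintf( "%s/%s/%04d/%02d.xml",
                            $url_root, $grabber_info, $year, $week );
    my $fallback = sprintf( "%s/%s/%04d-%02d.xml",
                            $url_root, $grabber_info, $year, $week );
    return ( $primary, $fallback );
}

my @urls = weekly_batch_to_urls( 'http://baseurl.com/test', 'skel',
                                 'tv1.svt.se_2008-32' );
print "$_\n" for @urls;
```

In a real importer the first two arguments would presumably come from $self->{UrlRoot} and the channel's grabber_info.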
FilterContent
The FilterContent method is optional, but it can be very useful. It allows the downloaded data to be filtered before it is passed to ImportContent. The filter serves several purposes:
- The output from the filter is compared with the output from the filter the last time it was run. If it is equivalent, then ImportContent is not called. This allows the filter to remove any content that changes every time a URL is loaded.
- Both the downloaded content and the filtered content are written to disk where they can be viewed for debugging purposes. If the filter removes any unused data in the file, the filtered content becomes easier to understand.
The default FilterContent method passes the data through unchanged.
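To illustrate the first purpose, here is a minimal sketch of a filter that strips content which differs between otherwise identical downloads. The function name filter_volatile_parts and the "generated at" comment are assumptions made for the example; the real FilterContent method may take additional parameters and return values.

```perl
use strict;
use warnings;

# Hypothetical filter: takes a reference to the downloaded content and
# returns a reference to a cleaned copy. The input is left untouched.
sub filter_volatile_parts {
    my ($cref) = @_;
    my $filtered = ${$cref};

    # Drop a "generated at" comment that changes on every download.
    $filtered =~ s/<!--\s*generated at .*?-->\s*//gs;

    # Normalise line endings so that they cannot trigger a re-import.
    $filtered =~ s/\r\n/\n/g;

    return \$filtered;
}

my $first  = "<!-- generated at 2008-08-04 10:00:01 -->\n<tv><p>News</p></tv>\n";
my $second = "<!-- generated at 2008-08-04 11:30:47 -->\n<tv><p>News</p></tv>\n";

my $same = ${ filter_volatile_parts( \$first ) }
        eq ${ filter_volatile_parts( \$second ) };
print $same ? "unchanged, import can be skipped\n"
            : "changed, import needed\n";
```

Because the two downloads differ only in the timestamp comment, their filtered forms compare equal, which is the condition under which ImportContent would not be called.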
ContentExtension
The ContentExtension method should simply return a string such as "xml", "html", or "txt". This string is used as the filename extension when the downloaded content is written to disk. Setting a sane extension here makes it easier to look at the files with a file-browser. ContentExtension does not take any parameters.
FilteredExtension
Same as ContentExtension, but this method is called to get an extension for writing the data output by FilterContent.
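Both extension methods are typically one-liners. The sketch below assumes a source that serves HTML which the filter reduces to plain text; the package name My::Importer::Example is hypothetical and only there to make the example runnable on its own.

```perl
use strict;
use warnings;

package My::Importer::Example;    # hypothetical package name

# Raw downloads are assumed to be HTML in this example.
sub ContentExtension {
    return 'html';
}

# The filter is assumed to reduce the HTML to plain text.
sub FilteredExtension {
    return 'txt';
}

package main;

print My::Importer::Example->ContentExtension(),  "\n";
print My::Importer::Example->FilteredExtension(), "\n";
```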
ImportContent
This method shall take the data returned by FilterContent and import it into the MySQL database. It is called with three parameters:
- $batchname - the same batchname that is passed to Object2Url
- $cref - a reference to the filtered content
- $chd - a hashref containing data about the channel from the channels table.
To add the data to the database, use the NonameTV::DataStore object available as $self->{datastore} or a NonameTV::DataStore::Helper object that you create from it.
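The parsing half of ImportContent can be sketched without touching the database. Everything specific in this example is an assumption: the function name import_lines, the tab-separated input format, the keys of the programme hash, and the xmltvid key in $chd. The callback $store_cb merely stands in for whatever DataStore or DataStore::Helper call the real importer would make.

```perl
use strict;
use warnings;

# Hypothetical body for ImportContent. Assumes one programme per line
# in the invented form "YYYY-MM-DD HH:MM<TAB>title". Each parsed
# programme is handed to $store_cb, a stand-in for the datastore.
sub import_lines {
    my ( $batchname, $cref, $chd, $store_cb ) = @_;
    my $count = 0;

    for my $line ( split /\n/, ${$cref} ) {
        next if $line =~ /^\s*$/;    # skip blank lines

        my ( $start, $title ) =
            $line =~ /^(\d{4}-\d{2}-\d{2} \d{2}:\d{2})\t(.+)$/
            or next;                 # skip lines that do not parse

        $store_cb->( {
            batch      => $batchname,
            xmltvid    => $chd->{xmltvid},    # assumed key in $chd
            start_time => $start,
            title      => $title,
        } );
        $count++;
    }
    return $count;
}

my $content = "2008-08-04 18:00\tNews\n\n"
            . "2008-08-04 18:30\tWeather\nbroken line\n";
my $chd = { xmltvid => 'test.skeleton.se' };
my @stored;

my $n = import_lines( 'test.skeleton.se_2008-32', \$content, $chd,
                      sub { push @stored, shift } );
print "stored $n programmes for $chd->{xmltvid}\n";
```

Keeping the parsing separate from the storage call in this way also makes it easy to test the parser against the filtered files that are written to disk.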
