NAME
HTML::Microformats - parse microformats in HTML
SYNOPSIS
use HTML::Microformats;
my $doc = HTML::Microformats
->new_document($html, $uri)
->assume_profile(qw(hCard hCalendar));
print $doc->json(pretty => 1);
use RDF::TrineShortcuts qw(rdf_query);
my $results = rdf_query($sparql, $doc->model);
DESCRIPTION
The HTML::Microformats module is a wrapper for parser and handler
modules of various individual microformats (each of those modules has a
name like HTML::Microformats::Format::Foo).
The general pattern of usage is to create an HTML::Microformats object
(which corresponds to an HTML document) using the "new_document" method;
then ask for the data, as a Perl hashref, a JSON string, or an
RDF::Trine model.
Constructor
"$doc = HTML::Microformats->new_document($html, $uri, %opts)"
Constructs a document object.
$html is the HTML or XHTML source (string) or an
XML::LibXML::Document.
$uri is the document URI, important for resolving relative URL
references.
%opts are additional parameters; currently only one option is
defined: $opts{'type'} is set to 'text/html' or
'application/xhtml+xml', to control how $html is parsed.
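As a rough sketch of how the 'type' option might be chosen in practice, the helper below maps an HTTP Content-Type header value to that option. Both type_option_for and load_document are hypothetical names, not part of HTML::Microformats, and falling back to 'text/html' for anything unrecognised is an assumption of this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Microformats): map an HTTP
# Content-Type header value to the constructor's 'type' option.
sub type_option_for {
    my ($content_type) = @_;
    # Drop parameters such as "; charset=utf-8" and normalise case.
    my ($media_type) = lc($content_type // '') =~ m{^\s*([^;\s]+)};
    return (type => 'application/xhtml+xml')
        if defined $media_type && $media_type eq 'application/xhtml+xml';
    return (type => 'text/html');    # assumed fallback for anything else
}

# Hypothetical wrapper around the documented constructor. The module is
# loaded lazily, so type_option_for can also be used on its own.
sub load_document {
    my ($html, $uri, $content_type) = @_;
    require HTML::Microformats;
    return HTML::Microformats->new_document(
        $html, $uri, type_option_for($content_type),
    );
}
```

A call such as load_document($html, $uri, 'application/xhtml+xml; charset=utf-8') would then request XHTML parsing.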
Profile Management
HTML::Microformats uses HTML profiles (i.e. the profile attribute on the
HTML <head> element) to detect which microformats are used on a page.
Any microformat whose profile URI is not declared will not be parsed.
Because many pages fail to properly declare which profiles they use,
there are various profile management methods to tell HTML::Microformats
to assume the presence of particular profile URIs, even if they're
actually missing.
"$doc->profiles"
This method returns a list of profile URIs declared by the document.
"$doc->has_profile(@profiles)"
This method returns true if and only if one or more of the profile
URIs in @profiles is declared by the document.
"$doc->add_profile(@profiles)"
Using "add_profile" you can add one or more profile URIs, and they
are treated as if they were found in the document.
For example:
$doc->add_profile('http://microformats.org/profile/rel-tag')
This is useful for adding profile URIs declared outside the document
itself (e.g. in HTTP headers).
Returns a reference to the document.
"$doc->assume_profile(@microformats)"
This method acts similarly to "add_profile" but allows you to use
names of microformats rather than URIs.
For example:
$doc->assume_profile(qw(hCard adr geo))
Microformat names are case sensitive, and must match
HTML::Microformats::Format::Foo module names.
Returns a reference to the document.
"$doc->assume_all_profiles"
This method is equivalent to calling "assume_profile" for all known
microformats.
Returns a reference to the document.
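The profile methods above can be combined so that a profile is only assumed when the page has not declared it. The following is a minimal sketch; ensure_profiles is a hypothetical helper, not part of the module, and the caller supplies the mapping from microformat names to profile URIs.

```perl
use strict;
use warnings;

# Hypothetical helper: %wanted maps a microformat name (as accepted by
# assume_profile) to the profile URI a page would normally declare.
sub ensure_profiles {
    my ($doc, %wanted) = @_;
    my @assumed;
    for my $name (sort keys %wanted) {
        next if $doc->has_profile($wanted{$name});   # already declared
        $doc->assume_profile($name);                 # treat as declared
        push @assumed, $name;
    }
    return @assumed;    # names that had to be assumed
}
```

For instance, ensure_profiles($doc, RelTag => 'http://microformats.org/profile/rel-tag') returns an empty list when the page declares that profile, and ('RelTag') otherwise.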
Parsing Microformats
Generally speaking, you can skip this step: the "data", "json" and
"model" methods parse the document automatically when needed.
"$doc->parse_microformats"
Scans through the document, finding microformat objects.
On subsequent calls, does nothing (as everything is already parsed).
Returns a reference to the document.
"$doc->clear_microformats"
Forgets information gleaned by "parse_microformats" and thus allows
"parse_microformats" to be run again. This is useful if you've
modified or added some profiles between runs of "parse_microformats".
Returns a reference to the document.
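Because each of these methods returns the document, a re-parse after changing profiles can be written as one chain. A minimal sketch, where reparse_with is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: assume extra formats on a document that may
# already have been parsed, then discard the old results and parse again.
sub reparse_with {
    my ($doc, @formats) = @_;
    return $doc->assume_profile(@formats)
               ->clear_microformats
               ->parse_microformats;
}
```

Without the "clear_microformats" call, the second "parse_microformats" would do nothing, as described above.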
Retrieving Data
These methods allow you to retrieve the document's data, and do things
with it.
"$doc->objects($format);"
$format is, for example, 'hCard', 'adr' or 'RelTag'.
Returns a list of objects of that type. (If called in scalar
context, returns an arrayref.)
Each object is, for example, an HTML::Microformats::Format::hCard
object, or an HTML::Microformats::Format::RelTag object, etc. See the
relevant documentation for details.
"$doc->all_objects"
Returns a hashref of data. Each hashref key is the name of a
microformat (e.g. 'hCard', 'RelTag', etc), and the values are
arrayrefs of objects.
Each object is, for example, an HTML::Microformats::Format::hCard
object, or an HTML::Microformats::Format::RelTag object, etc. See the
relevant documentation for details.
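To illustrate the shape of the "all_objects" result, the sketch below counts the objects found for each microformat. count_objects is a hypothetical helper; it relies only on the hashref-of-arrayrefs structure described above.

```perl
use strict;
use warnings;

# Hypothetical helper: summarise how many objects of each microformat
# a document contains.
sub count_objects {
    my ($doc) = @_;
    my $all = $doc->all_objects;    # hashref: format name => arrayref
    my %count;
    $count{$_} = scalar @{ $all->{$_} } for keys %$all;
    return \%count;
}
```

The result could be printed with something like: printf "%s: %d\n", $_, $count->{$_} for sort keys %$count;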
"$doc->json(%opts)"
Returns data roughly equivalent to the "all_objects" method, but as
a JSON string.
%opts is a hash of options, suitable for passing to the JSON
module's to_json function. The 'convert_blessed' and 'utf8' options
are enabled by default, but can be disabled by explicitly setting
them to 0, e.g.
print $doc->json( pretty=>1, canonical=>1, utf8=>0 );
"$doc->model"
Returns data as an RDF::Trine::Model, suitable for serialising as
RDF or running SPARQL queries.
"$object->serialise_model(as => $format)"
As "model", but returns the data serialised as a string in the format
named by $format.
"$doc->add_to_model($model)"
Adds data to an existing RDF::Trine::Model.
Returns a reference to the document.
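One use for "add_to_model" is collecting the data from several documents in a single model. A minimal sketch, assuming the caller supplies the model (for example one created with RDF::Trine::Model->temporary_model); merge_into is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: add the data from each document to one shared
# model and hand the model back.
sub merge_into {
    my ($model, @docs) = @_;
    $_->add_to_model($model) for @docs;
    return $model;
}

# Possible usage (requires RDF::Trine):
#   my $model = merge_into(RDF::Trine::Model->temporary_model, @docs);
```

The merged model can then be queried in the same way as a single document's model, as in the SYNOPSIS.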
Utility Functions
"HTML::Microformats->modules"
Returns a list of Perl modules, each of which implements a specific
microformat.
"HTML::Microformats->formats"
As per "modules", but strips 'HTML::Microformats::Format::' off the
module name, and sorts alphabetically.
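Since microformat names passed to "assume_profile" are case sensitive, the list returned by "formats" can be used to normalise user-supplied names first. The following sketch takes that list as an arrayref; canonical_format_names is a hypothetical helper, and dying on an unknown name is a design choice of this example.

```perl
use strict;
use warnings;

# Hypothetical helper: map names in any letter case to the exact
# spelling found in $known (e.g. [ HTML::Microformats->formats ]).
sub canonical_format_names {
    my ($known, @names) = @_;
    my %by_lc;
    $by_lc{ lc $_ } = $_ for @$known;
    my @canonical;
    for my $name (@names) {
        die "Unknown microformat: $name\n" unless exists $by_lc{ lc $name };
        push @canonical, $by_lc{ lc $name };
    }
    return @canonical;
}
```

With a known list of adr, hCard and RelTag, the names 'hcard' and 'reltag' would map to 'hCard' and 'RelTag', which could then be passed to "assume_profile".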
WHY ANOTHER MICROFORMATS MODULE?
There already exist two microformats packages on CPAN (see
Text::Microformat and Data::Microformat), so why create another?
Firstly, HTML::Microformats isn't being created from scratch. It's
actually a fork/clean-up of a non-CPAN application (Swignition), and in
that sense predates Text::Microformat (though not Data::Microformat).
It has a number of other features that distinguish it from the existing
packages:
* It supports more formats.
HTML::Microformats supports hCard, hCalendar, rel-tag, geo, adr,
rel-enclosure, rel-license, hReview, hResume, hRecipe, xFolk, XFN,
hAtom, hNews and more.
* It supports more patterns.
HTML::Microformats supports the include pattern, abbr pattern, table
cell header pattern, value excerpting and other intricacies of
microformat parsing better than the other modules on CPAN.
* It offers RDF support.
One of the key features of HTML::Microformats is that it makes data
available as RDF::Trine models. This allows your application to
benefit from a rich, feature-laden Semantic Web toolkit. Data
gleaned from microformats can be stored in a triple store; output in
RDF/XML or Turtle; queried using the SPARQL or RDQL query languages;
and more.
If you're not comfortable using RDF, HTML::Microformats also makes
all its data available as native Perl objects.
BUGS
Please report any bugs to <http://rt.cpan.org/>.
SEE ALSO
HTML::Microformats::Documentation::Notes.
Individual format modules:
* HTML::Microformats::Format::adr
* HTML::Microformats::Format::figure
* HTML::Microformats::Format::geo
* HTML::Microformats::Format::hAtom
* HTML::Microformats::Format::hAudio
* HTML::Microformats::Format::hCalendar
* HTML::Microformats::Format::hCard
* HTML::Microformats::Format::hListing
* HTML::Microformats::Format::hMeasure
* HTML::Microformats::Format::hNews
* HTML::Microformats::Format::hProduct
* HTML::Microformats::Format::hRecipe
* HTML::Microformats::Format::hResume
* HTML::Microformats::Format::hReview
* HTML::Microformats::Format::hReviewAggregate
* HTML::Microformats::Format::OpenURL_COinS
* HTML::Microformats::Format::RelEnclosure
* HTML::Microformats::Format::RelLicense
* HTML::Microformats::Format::RelTag
* HTML::Microformats::Format::species
* HTML::Microformats::Format::VoteLinks
* HTML::Microformats::Format::XFN
* HTML::Microformats::Format::XMDP
* HTML::Microformats::Format::XOXO
Similar modules: RDF::RDFa::Parser, HTML::HTML5::Microdata::Parser,
XML::Atom::Microformats, Text::Microformat, Data::Microformat.
Related web sites: <http://microformats.org/>,
<http://www.perlrdf.org/>.
AUTHOR
Toby Inkster <tobyink@cpan.org>.
COPYRIGHT AND LICENCE
Copyright 2008-2012 Toby Inkster
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
DISCLAIMER OF WARRANTIES
THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.