This tests autolink:: >>> from lxml.html import usedoctest >>> from lxml_html_clean import autolink_html >>> print(autolink_html(''' ...

Link here: http://test.com/foo.html.

... '''))

Link here: http://test.com/foo.html.

>>> print(autolink_html(''' ...

Mail me at mailto:ianb@test.com or http://myhome.com

... '''))

Mail me at ianb@test.com or http://myhome.com

>>> print(autolink_html(''' ...

The great thing is the http://link.com links and ... the http://foobar.com links.

'''))

The great thing is the http://link.com links and the http://foobar.com links.

>>> print(autolink_html(''' ...

Link: <http://foobar.com>

'''))

Link: <http://foobar.com>

>>> print(autolink_html(''' ...

Link: (http://foobar.com)

'''))

Link: (http://foobar.com)

Parenthesis are tricky, we'll do our best:: >>> print(autolink_html(''' ...

(Link: http://en.wikipedia.org/wiki/PC_Tools_(Central_Point_Software))

... '''))

(Link: http://en.wikipedia.org/wiki/PC_Tools_(Central_Point_Software))

>>> print(autolink_html(''' ...

... a link: http://foo.com)

... '''))

... a link: http://foo.com)

Some cases that won't be caught (on purpose):: >>> print(autolink_html(''' ...

A link to http://localhost/foo/bar won't, but a link to ... http://test.com will

'''))

A link to http://localhost/foo/bar won't, but a link to http://test.com will

>>> print(autolink_html(''' ...

A link in

'''))

A link in

>>> print(autolink_html(''' ...

A link in http://bar.com

'''))

A link in http://bar.com

>>> print(autolink_html(''' ...

A link in http://foo.com or ... http://bar.com

'''))

A link in http://foo.com or http://bar.com

There's also a word wrapping function, that should probably be run after autolink:: >>> from lxml_html_clean import word_break_html >>> def pascii(s): ... print(s.encode('ascii', 'xmlcharrefreplace').decode('ascii')) >>> pascii(word_break_html( u''' ...

Hey you ... 12345678901234567890123456789012345678901234567890

'''))

Hey you 12345678901234567890123456789012345678901234567890

Not everything is broken: >>> pascii(word_break_html(''' ...

Hey you ... 12345678901234567890123456789012345678901234567890

'''))

Hey you 12345678901234567890123456789012345678901234567890

>>> pascii(word_break_html(''' ... text''')) text