sfeed
-----
RSS and Atom parser (and some format programs).
It converts RSS or Atom feeds from XML to a TAB-separated file. There are
formatting programs included to convert this TAB-separated format to various
other formats. There are also some programs and scripts included to import and
export OPML and to fetch, filter, merge and order feed items.
Build and install
-----------------
$ make
# make install
To build sfeed without sfeed_curses set SFEED_CURSES to an empty string:
$ make SFEED_CURSES=""
# make SFEED_CURSES="" install
To change the theme for sfeed_curses you can set SFEED_THEME. See the themes/
directory for the theme names.
$ make SFEED_THEME="templeos"
# make SFEED_THEME="templeos" install
Usage
-----
Initial setup:
mkdir -p "$HOME/.sfeed/feeds"
cp sfeedrc.example "$HOME/.sfeed/sfeedrc"
Edit the sfeedrc(5) configuration file and change any RSS/Atom feeds. This file
is included and evaluated as a shellscript for sfeed_update, so its functions
and behaviour can be overridden:
$EDITOR "$HOME/.sfeed/sfeedrc"
or you can import existing OPML subscriptions using sfeed_opml_import(1):
sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"
an example to export from another RSS/Atom reader called newsboat and import
for sfeed_update:
newsboat -e | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"
an example to export from another RSS/Atom reader called rss2email (3.x+) and
import for sfeed_update:
r2e opmlexport | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"
Update the feeds. This script merges the new items; see sfeed_update(1) for
more information on what it can do:
sfeed_update
Format feeds:
Plain-text list:
sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"
HTML view (no frames), copy style.css for a default style:
cp style.css "$HOME/.sfeed/style.css"
sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"
HTML view with the menu as frames, copy style.css for a default style:
mkdir -p "$HOME/.sfeed/frames"
cp style.css "$HOME/.sfeed/frames/style.css"
cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*
To automatically update your feeds periodically and format them in a way you
like you can make a wrapper script and add it as a cronjob.
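A minimal sketch of such a wrapper script. The function name sfeed_refresh and
the output paths are assumptions chosen for illustration, not part of sfeed;
adjust them to your setup:

```sh
#!/bin/sh
# Hypothetical wrapper: update all feeds, then regenerate the formatted files.
# The function name and the output paths are assumptions, not part of sfeed.
sfeed_refresh() {
	feedsdir="$HOME/.sfeed/feeds"
	# update the feeds; formatting continues even if a feed failed.
	sfeed_update "$HOME/.sfeed/sfeedrc"
	# write to a temporary file first so readers never see a partial file.
	sfeed_plain "$feedsdir"/* > "$HOME/.sfeed/feeds.txt.tmp" &&
		mv "$HOME/.sfeed/feeds.txt.tmp" "$HOME/.sfeed/feeds.txt"
	sfeed_html "$feedsdir"/* > "$HOME/.sfeed/feeds.html.tmp" &&
		mv "$HOME/.sfeed/feeds.html.tmp" "$HOME/.sfeed/feeds.html"
}
```

Call sfeed_refresh at the end of the script and reference the script from a
crontab(5) entry, for example once per hour (the schedule is also just an
example).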
Most protocols are supported because curl(1) is used by default. Proxy settings
from the environment (such as the $http_proxy environment variable) are also
used.
The sfeed(1) program itself is just a parser that parses XML data from stdin
and is therefore network protocol-agnostic. It can be used with HTTP, HTTPS,
Gopher, SSH, etc.
See the section "Usage and examples" below and the man-pages for more
information on how to use sfeed(1) and the additional tools.
Dependencies
------------
- C compiler (C99).
- libc (recommended: C99 and POSIX >= 200809).
Optional dependencies
---------------------
- POSIX make(1) for the Makefile.
- POSIX sh(1),
  used by sfeed_update(1) and sfeed_opml_export(1).
- POSIX utilities such as awk(1) and sort(1),
  used by sfeed_content(1), sfeed_markread(1), sfeed_opml_export(1) and
  sfeed_update(1).
- curl(1) binary: https://curl.haxx.se/ ,
  used by sfeed_update(1), but can be replaced with any tool like wget(1),
  OpenBSD ftp(1) or hurl(1): https://codemadness.org/hurl.html
- iconv(1) command-line utilities,
  used by sfeed_update(1). If the text in your RSS/Atom feeds is already UTF-8
  encoded then you don't need this. For a minimal iconv implementation:
  https://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
- xargs with support for the -P and -0 options,
  used by sfeed_update(1).
- mandoc for documentation: https://mdocml.bsd.lv/
- curses (typically ncurses), otherwise see minicurses.h,
  used by sfeed_curses(1).
- a terminal (emulator) supporting UTF-8 and the used capabilities,
  used by sfeed_curses(1).
Optional run-time dependencies for sfeed_curses
-----------------------------------------------
- xclip for yanking the URL or enclosure. See $SFEED_YANKER to change it.
- xdg-open, used as a plumber by default. See $SFEED_PLUMBER to change it.
- awk, used by the sfeed_content and sfeed_markread scripts.
  See the ENVIRONMENT VARIABLES section in the man page to change it.
- lynx, used by the sfeed_content script to convert HTML content.
  An alternative: webdump: https://codemadness.org/webdump.html
  See the ENVIRONMENT VARIABLES section in the man page to change it.
Formats supported
-----------------
sfeed supports a subset of XML 1.0 and a subset of:
- Atom 1.0 (RFC 4287): https://datatracker.ietf.org/doc/html/rfc4287
- Atom 0.3 (draft, historic).
- RSS 0.90+.
- RDF (when used with RSS).
- MediaRSS extensions (media:).
- Dublin Core extensions (dc:).
Other formats like JSON Feed, twtxt or certain RSS/Atom extensions are
supported by converting them to RSS/Atom or to the sfeed(5) format directly.
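As a rough sketch of converting another format to the sfeed(5) format directly:
the function below turns a hypothetical line-based input of "UNIX timestamp,
URL, title" (separated by single spaces) into TAB-separated lines. The input
format and the function name are assumptions for illustration. Only the field
order (timestamp, title, link, content, content-type, id, author, enclosure)
follows the examples in this README:

```sh
# lines2sfeed: convert lines of "timestamp url title..." read from stdin
# to TAB-separated sfeed(5)-like lines (hypothetical input format).
lines2sfeed() {
	LC_ALL=C awk '
	NF >= 3 && $1 ~ /^[0-9]+$/ {
		ts = $1; url = $2;
		# the rest of the line is the title.
		title = $0;
		sub(/^[^ ]+ +[^ ]+ +/, "", title);
		# TABs would break the field layout, replace them.
		gsub(/\t/, " ", title);
		# timestamp, title, link, content, content-type, id, author,
		# enclosure. The URL is reused as the id.
		printf("%s\t%s\t%s\t\t\t%s\t\t\n", ts, title, url, url);
	}'
}
```

The output can then be passed to any of the format programs, for example:
lines2sfeed < input.txt | sfeed_plain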
OS tested
---------
- Linux,
  compilers: clang, gcc, chibicc, cproc, lacc, pcc, scc, tcc,
  libc: glibc, musl.
- OpenBSD (clang, gcc).
- NetBSD (with NetBSD curses).
- FreeBSD
- DragonFlyBSD
- GNU/Hurd
- Illumos (OpenIndiana).
- Windows (cygwin gcc + mintty, mingw).
- HaikuOS
- SerenityOS
- FreeDOS (djgpp, Open Watcom).
- FUZIX (sdcc -mz80, with the sfeed parser program).
Architectures tested
--------------------
amd64, ARM, aarch64, HPPA, i386, MIPS32-BE, RISCV64, SPARC64, Z80.
Files
-----
sfeed             - Read XML RSS or Atom feed data from stdin. Write feed data
                    in TAB-separated format to stdout.
sfeed_atom        - Format feed data (TSV) to an Atom feed.
sfeed_content     - View item content, for use with sfeed_curses.
sfeed_curses      - Format feed data (TSV) to a curses interface.
sfeed_frames      - Format feed data (TSV) to HTML file(s) with frames.
sfeed_gopher      - Format feed data (TSV) to Gopher files.
sfeed_html        - Format feed data (TSV) to HTML.
sfeed_json        - Format feed data (TSV) to JSON Feed.
sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file.
sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file.
sfeed_markread    - Mark items as read/unread, for use with sfeed_curses.
sfeed_mbox        - Format feed data (TSV) to mbox.
sfeed_plain       - Format feed data (TSV) to a plain-text list.
sfeed_twtxt       - Format feed data (TSV) to a twtxt feed.
sfeed_update      - Update feeds and merge items.
sfeed_web         - Find URLs to RSS/Atom feed from a webpage.
sfeed_xmlenc      - Detect character-set encoding from an XML stream.
sfeedrc.example   - Example config file. Can be copied to $HOME/.sfeed/sfeedrc.
style.css         - Example stylesheet to use with sfeed_html(1) and
                    sfeed_frames(1).
Files read at runtime by sfeed_update(1)
----------------------------------------
sfeedrc - Config file. This file is evaluated as a shellscript in
sfeed_update(1).
At least the following functions can be overridden per feed:
- fetch: to use wget(1), OpenBSD ftp(1) or another download program.
- filter: to filter on fields.
- merge: to change the merge logic.
- order: to change the sort order.
See also the sfeedrc(5) man page documentation for more details.
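For example, a sketch of an order() override that sorts one feed from oldest to
newest and keeps a newest-first order for the others. The feed name "changelog"
is hypothetical and the arguments (name, url) are assumed to match the filter()
function shown in this README; check sfeedrc(5) for the exact interface:

```sh
# order(name, url): read items from stdin, write them sorted to stdout.
order() {
	tab="$(printf '\t')"
	case "$1" in
	"changelog")
		# oldest first: numeric sort on the timestamp field.
		sort -t "$tab" -k1,1n;;
	*)
		# newest first.
		sort -t "$tab" -k1,1rn;;
	esac
}
```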
The feeds() function is called to process the feeds. The default feed()
function is executed concurrently as a background job in your sfeedrc(5) config
file to make updating faster. The variable maxjobs can be changed to limit or
increase the number of concurrent jobs (8 by default).
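A minimal sfeedrc sketch showing the feeds() function and the maxjobs variable.
The feed names are examples and the value 4 is an arbitrary choice. The feed()
arguments used here are the name followed by the feed URL; see sfeedrc(5) for
the optional arguments:

```sh
# limit the number of concurrent jobs (an arbitrary example value).
maxjobs=4

# list of feeds to fetch:
feeds() {
	# feed <name> <feedurl>
	feed "codemadness" "https://codemadness.org/atom.xml"
	feed "xkcd" "https://xkcd.com/atom.xml"
}
```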
Files written at runtime by sfeed_update(1)
-------------------------------------------
feedname - TAB-separated format containing all items per feed. The
sfeed_update(1) script merges new items with this file.
The format is documented in sfeed(5).
File format
-----------
man 5 sfeed
man 5 sfeedrc
man 1 sfeed
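Because the format is plain TAB-separated text it can be processed with
standard tools. A small sketch, assuming the field order used by the examples
in this README (1 = UNIX timestamp, 2 = title, 3 = link, 8 = enclosure);
sfeed(5) remains the authoritative description. The function name is
hypothetical:

```sh
# listitems: print "timestamp title link" for each item on stdin,
# flagging items that have an enclosure.
listitems() {
	LC_ALL=C awk -F '\t' '{
		flag = length($8) ? " [enclosure]" : "";
		printf("%s %s %s%s\n", $1, $2, $3, flag);
	}'
}
```

Usage: listitems < "$HOME/.sfeed/feeds/xkcd"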
Usage and examples
------------------
Find RSS/Atom feed URLs from a webpage:
url="https://codemadness.org"; curl -L -s "$url" | sfeed_web "$url"
output example:
https://codemadness.org/atom.xml application/atom+xml
https://codemadness.org/atom_content.xml application/atom+xml
- - -
Make sure your sfeedrc config file exists, see the sfeedrc.example file. To
update your feeds (configfile argument is optional):
sfeed_update "configfile"
Format the feeds files:
# Plain-text list.
sfeed_plain $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.txt
# HTML view (no frames), copy style.css for a default style.
sfeed_html $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.html
# HTML view with the menu as frames, copy style.css for a default style.
mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*
View formatted output in your browser:
$BROWSER "$HOME/.sfeed/feeds.html"
View formatted output in your editor:
$EDITOR "$HOME/.sfeed/feeds.txt"
- - -
View formatted output in a curses interface. The interface has a look inspired
by the mutt mail client. It has a sidebar panel for the feeds, a panel with a
listing of the items and a small statusbar for the selected item/URL. Some
functions like searching and scrolling are integrated in the interface itself.
Just like the other format programs included in sfeed you can run it like this:
sfeed_curses ~/.sfeed/feeds/*
... or by reading from stdin:
sfeed_curses < ~/.sfeed/feeds/xkcd
By default sfeed_curses marks the items of the last day as new/bold. This limit
can be overridden by setting the environment variable $SFEED_NEW_AGE to the
desired maximum in seconds. To manage read/unread items in a different way a
plain-text file with a list of the read URLs can be used. To enable this
behaviour the path to this file can be specified by setting the environment
variable $SFEED_URL_FILE to the URL file:
export SFEED_URL_FILE="$HOME/.sfeed/urls"
[ -f "$SFEED_URL_FILE" ] || touch "$SFEED_URL_FILE"
sfeed_curses ~/.sfeed/feeds/*
It then uses the shellscript "sfeed_markread" to process the read and unread
items.
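Since the URL file is just a list of read URLs, one per line, it can also be
maintained with standard tools. A sketch that marks every item of a feed file
as read by appending its link (field 3), or its id (field 6) when the link is
empty, to the URL file. The function name is hypothetical and this is not how
sfeed_markread itself is implemented:

```sh
# markallread(feedfile, urlfile): append the URL or id of each item in
# feedfile to urlfile and remove duplicate lines, keeping the order.
markallread() {
	tmp="$(mktemp)" || return 1
	{
		cat "$2" 2>/dev/null
		LC_ALL=C awk -F '\t' '
		length($3) { print $3; next; }
		length($6) { print $6; }' "$1"
	} | awk '!seen[$0]++' > "$tmp" && cat "$tmp" > "$2"
	status="$?"
	rm -f "$tmp"
	return "$status"
}
```

Usage: markallread "$HOME/.sfeed/feeds/xkcd" "$SFEED_URL_FILE"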
- - -
Example script to view feed items in a vertical list/menu in dmenu(1). It opens
the selected URL in the browser set in $BROWSER:
#!/bin/sh
url=$(sfeed_plain "$HOME/.sfeed/feeds/"* | dmenu -l 35 -i | \
sed -n 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@p')
test -n "${url}" && $BROWSER "${url}"
dmenu can be found at: https://git.suckless.org/dmenu/
- - -
Generate a sfeedrc config file from your exported list of feeds in OPML
format:
sfeed_opml_import < opmlfile.xml > $HOME/.sfeed/sfeedrc
- - -
Export an OPML file of your feeds from a sfeedrc config file (configfile
argument is optional):
sfeed_opml_export configfile > myfeeds.opml
- - -
The filter function can be overridden in your sfeedrc file. This allows
filtering items per feed. It can be used to shorten URLs, filter away
advertisements, strip tracking parameters and more.
# filter fields.
# filter(name, url)
filter() {
case "$1" in
"tweakers")
awk -F '\t' 'BEGIN { OFS = "\t"; }
# skip ads.
$2 ~ /^ADV:/ {
next;
}
# shorten link.
{
if (match($3, /^https:\/\/tweakers\.net\/[a-z]+\/[0-9]+\//)) {
$3 = substr($3, RSTART, RLENGTH);
}
print $0;
}';;
"yt BSDNow")
# filter only BSD Now from channel.
awk -F '\t' '$2 ~ / \| BSD Now/';;
*)
cat;;
esac | \
# replace youtube links with embed links.
sed 's@www.youtube.com/watch?v=@www.youtube.com/embed/@g' | \
awk -F '\t' 'BEGIN { OFS = "\t"; }
function filterlink(s) {
# protocol must start with HTTP, HTTPS or Gopher.
if (match(s, /^(http|https|gopher):\/\//) == 0) {
return "";
}
# shorten feedburner links.
if (match(s, /^(http|https):\/\/[^\/]+\/~r\/.*\/~3\/[^\/]+\//)) {
s = substr(s, RSTART, RLENGTH);
}
# strip tracking parameters
# urchin, facebook, piwik, webtrekk and generic.
gsub(/\?(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "?", s);
gsub(/&(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "", s);
gsub(/\?&/, "?", s);
gsub(/[\?&]+$/, "", s);
return s
}
{
$3 = filterlink($3); # link
$8 = filterlink($8); # enclosure
# try to remove tracking pixels: <img/> tags with 1px width or height.
gsub("<img[^>]*(width|height)[[:space:]]*=[[:space:]]*[\"'"'"' ]?1[\"'"'"' ]?[^0-9>]+[^>]*>", "", $4);
print $0;
}'
}
- - -
Aggregate feeds. This filters new entries (maximum one day old) and sorts them
by newest first. Prefix the feed name in the title. Convert the TSV output data
to an Atom XML feed (again):
#!/bin/sh
cd ~/.sfeed/feeds/ || exit 1
awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
BEGIN { OFS = "\t"; }
int($1) >= old {
$2 = "[" FILENAME "] " $2;
print $0;
}' * | \
sort -k1,1rn | \
sfeed_atom
- - -
To have a "tail(1) -f"-like FIFO stream filtering for new unique feed items and
showing them as plain-text per line similar to sfeed_plain(1):
Create a FIFO:
fifo="/tmp/sfeed_fifo"
mkfifo "$fifo"
On the reading side:
# This keeps track of unique lines so might consume much memory.
# It tries to reopen the $fifo after 1 second if it fails.
while :; do cat "$fifo" || sleep 1; done | awk '!x[$0]++'
On the writing side:
feedsdir="$HOME/.sfeed/feeds/"
cd "$feedsdir" || exit 1
test -p "$fifo" || exit 1
# 1 day is old news, don't write older items.
awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
BEGIN { OFS = "\t"; }
int($1) >= old {
$2 = "[" FILENAME "] " $2;
print $0;
}' * | sort -k1,1n | sfeed_plain | cut -b 3- > "$fifo"
cut -b is used to trim the "N " prefix of sfeed_plain(1).
- - -
For a podcast feed the following code can be used to print the latest
enclosure URL (probably some audio file):
awk -F '\t' 'BEGIN { latest = 0; }
length($8) {
ts = int($1);
if (ts > latest) {
url = $8;
latest = ts;
}
}
END { if (length(url)) { print url; } }'
... or on a file already sorted from newest to oldest:
awk -F '\t' '$8 { print $8; exit }'
- - -
Over time your feeds file might become quite big. You can archive items of a
feed from (roughly) the last week by doing for example:
awk -F '\t' -v "old=$(($(date +'%s') - 604800))" 'int($1) > old' < feed > feed.new
mv feed feed.bak
mv feed.new feed
This could also be run weekly in a crontab to archive the feeds. Like throwing
away old newspapers. It keeps the feeds list tidy and the formatted output
small.
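A sketch that applies the above to every feed file in a directory, suitable for
a weekly cron job. The function name and the separate backup directory are
assumptions; the backups are kept outside the feeds directory so they are not
picked up as feeds by the format programs:

```sh
# archivefeeds(feedsdir, backupdir, maxage): keep only items newer than
# maxage seconds in each feed file. The previous version of each file is
# moved to backupdir as "<feed>.bak".
archivefeeds() {
	old="$(($(date +'%s') - $3))"
	mkdir -p "$2" || return 1
	for f in "$1"/*; do
		[ -f "$f" ] || continue
		name="${f##*/}"
		LC_ALL=C awk -F '\t' -v "old=$old" 'int($1) > old' \
			< "$f" > "$2/$name.new" || return 1
		mv "$f" "$2/$name.bak" || return 1
		mv "$2/$name.new" "$f" || return 1
	done
}
```

Usage: archivefeeds "$HOME/.sfeed/feeds" "$HOME/.sfeed/archive" 604800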
- - -
Convert mbox to separate maildirs per feed and filter duplicate messages using the
fdm program.
fdm is available at: https://github.com/nicm/fdm
fdm config file (~/.sfeed/fdm.conf):
set unmatched-mail keep
account "sfeed" mbox "%[home]/.sfeed/mbox"
$cachepath = "%[home]/.sfeed/fdm.cache"
cache "${cachepath}"
$maildir = "%[home]/feeds/"
# Check if message is in the cache by Message-ID.
match case "^Message-ID: (.*)" in headers
action {
tag "msgid" value "%1"
}
continue
# If it is in the cache, stop.
match matched and in-cache "${cachepath}" key "%[msgid]"
action {
keep
}
# Not in the cache, process it and add to cache.
match case "^X-Feedname: (.*)" in headers
action {
# Store to local maildir.
maildir "${maildir}%1"
add-to-cache "${cachepath}" key "%[msgid]"
keep
}
Now run:
$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
$ fdm -f ~/.sfeed/fdm.conf fetch
Now you can view feeds in mutt(1) for example.
- - -
Read from mbox and filter duplicate messages using the fdm program and deliver
it to an SMTP server. This works similarly to the rss2email program.
fdm is available at: https://github.com/nicm/fdm
fdm config file (~/.sfeed/fdm.conf):
set unmatched-mail keep
account "sfeed" mbox "%[home]/.sfeed/mbox"
$cachepath = "%[home]/.sfeed/fdm.cache"
cache "${cachepath}"
# Check if message is in the cache by Message-ID.
match case "^Message-ID: (.*)" in headers
action {
tag "msgid" value "%1"
}
continue
# If it is in the cache, stop.
match matched and in-cache "${cachepath}" key "%[msgid]"
action {
keep
}
# Not in the cache, process it and add to cache.
match case "^X-Feedname: (.*)" in headers
action {
# Connect to a SMTP server and attempt to deliver the
# mail to it.
# Of course change the server and e-mail below.
smtp server "codemadness.org" to "hiltjo@codemadness.org"
add-to-cache "${cachepath}" key "%[msgid]"
keep
}
Now run:
$ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
$ fdm -f ~/.sfeed/fdm.conf fetch
Now you can view feeds in mutt(1) for example.
- - -
Convert mbox to separate maildirs per feed and filter duplicate messages using
procmail(1).
procmail_maildirs.sh file:
maildir="$HOME/feeds"
feedsdir="$HOME/.sfeed/feeds"
procmailconfig="$HOME/.sfeed/procmailrc"
# message-id cache to prevent duplicates.
mkdir -p "${maildir}/.cache"
if ! test -r "${procmailconfig}"; then
printf "Procmail configuration file \"%s\" does not exist or is not readable.\n" "${procmailconfig}" >&2
echo "See procmailrc.example for an example." >&2
exit 1
fi
find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
name=$(basename "${d}")
mkdir -p "${maildir}/${name}/cur"
mkdir -p "${maildir}/${name}/new"
mkdir -p "${maildir}/${name}/tmp"
printf 'Mailbox %s\n' "${name}"
sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}"
done
Procmailrc(5) file:
# Example for use with sfeed_mbox(1).
# The header X-Feedname is used to split into separate maildirs. It is
# assumed this name is sane.
MAILDIR="$HOME/feeds/"
:0
* ^X-Feedname: \/.*
{
FEED="$MATCH"
:0 Wh: "msgid_$FEED.lock"
| formail -D 1024000 ".cache/msgid_$FEED.cache"
:0
"$FEED"/
}
Now run:
$ procmail_maildirs.sh
Now you can view feeds in mutt(1) for example.
- - -
The fetch function can be overridden in your sfeedrc file. This allows
replacing the default curl(1) for sfeed_update with any other client to fetch
the RSS/Atom data, or changing the default curl options:
# fetch a feed via HTTP/HTTPS etc.
# fetch(name, url, feedfile)
fetch() {
hurl -m 1048576 -t 15 "$2" 2>/dev/null
}
- - -
Caching, incremental data updates and bandwidth saving
For servers that support it some incremental updates and bandwidth saving can
be done by using the "ETag" HTTP header.
Create a directory for storing the ETags and modification timestamps per feed:
mkdir -p ~/.sfeed/etags ~/.sfeed/lastmod
The curl ETag options (--etag-save and --etag-compare) can be used to store and
send the previous ETag header value. curl version 7.73+ is recommended for it
to work properly.
The curl -z option can be used to send the modification date of a local file as
an HTTP "If-Modified-Since" request header. The server can then respond whether
the data is modified or not, or respond with only the incremental data.
The curl --compressed option can be used to indicate the client supports
decompression. Because RSS/Atom feeds are textual XML content, they generally
compress very well.
These options can be set by overriding the fetch() function in the sfeedrc
file:
# fetch(name, url, feedfile)
fetch() {
basename="$(basename "$3")"
etag="$HOME/.sfeed/etags/${basename}"
lastmod="$HOME/.sfeed/lastmod/${basename}"
output="${sfeedtmpdir}/feeds/${basename}.xml"
curl \
-f -s -m 15 \
-L --max-redirs 0 \
-H "User-Agent: sfeed" \
--compressed \
--etag-save "${etag}" --etag-compare "${etag}" \
-R -o "${output}" \
-z "${lastmod}" \
"$2" 2>/dev/null || return 1
# successful, but no file written: assume it is OK and Not Modified.
[ -e "${output}" ] || return 0
# use server timestamp from curl -R to set Last-Modified.
touch -r "${output}" "${lastmod}" 2>/dev/null
cat "${output}" 2>/dev/null
# use write output status, other errors are ignored here.
fetchstatus="$?"
rm -f "${output}" 2>/dev/null
return "${fetchstatus}"
}
These options can come at a cost of some privacy, because they expose
additional metadata from the previous request.
- - -
CDNs blocking requests due to a missing HTTP User-Agent request header
sfeed_update will not send the "User-Agent" header by default for privacy
reasons. Some CDNs like Cloudflare or websites like Reddit.com don't like this
and will block such HTTP requests.
A custom User-Agent can be set by using the curl -H option, like so:
curl -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'
The above example string pretends to be a Windows 10 (x86-64) machine running
Firefox 78.
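A sketch of setting this per feed in the sfeedrc file by overriding the fetch()
function, so that the header is only sent where it is needed. The feed name
pattern "reddit"* is hypothetical; the remaining curl options mirror the ones
used in the other examples in this README:

```sh
# fetch(name, url, feedfile)
fetch() {
	case "$1" in
	"reddit"*)
		# hypothetical feed names: these sites require a User-Agent.
		useragent="User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0";;
	*)
		# a header without a value makes curl omit it.
		useragent="User-Agent:";;
	esac
	curl -L --max-redirs 0 -H "$useragent" -f -s -m 15 "$2" 2>/dev/null
}
```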
- - -
Page redirects
For security and efficiency reasons redirects are not allowed by default and
are treated as an error.
This prevents, for example, hijacking of an unencrypted http:// to https://
redirect and avoids spending time on an unnecessary page redirect on each
update. It is encouraged to use the final redirected URL in the sfeedrc config
file.
If you want to ignore this advice you can override the fetch() function in the
sfeedrc file and change the curl options "-L --max-redirs 0".
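To find the final URL to put in the sfeedrc file, curl can follow the redirects
once and print where it ended up. A sketch, assuming a curl version that
supports the -w '%{url_effective}' write-out variable; the function name and
the limit of 5 redirects are arbitrary choices:

```sh
# finalurl(url): follow redirects and print the final URL.
finalurl() {
	curl -s -L --max-redirs 5 -o /dev/null -w '%{url_effective}\n' "$1"
}
```

Usage: finalurl "http://example.org/feed"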
- - -
Shellscript to handle URLs and enclosures in parallel using xargs -P.
This can be used to download and process URLs for downloading podcasts,
webcomics, download and convert webpages, mirror videos, etc. It uses a
plain-text cache file for remembering processed URLs. The match patterns are
defined in the shellscript fetch() function and in the awk script and can be
modified to handle items differently depending on their context.
The arguments for the script are files in the sfeed(5) format. If no file
arguments are specified then the data is read from stdin.
#!/bin/sh
# sfeed_download: downloader for URLs and enclosures in sfeed(5) files.
# Dependencies: awk, curl, flock, xargs (-P), yt-dlp.
cachefile="${SFEED_CACHEFILE:-$HOME/.sfeed/downloaded_urls}"
jobs="${SFEED_JOBS:-4}"
lockfile="${HOME}/.sfeed/sfeed_download.lock"
# log(feedname, s, status)
log() {
if [ "$1" != "-" ]; then
s="[$1] $2"
else
s="$2"
fi
printf '[%s]: %s: %s\n' "$(date +'%H:%M:%S')" "${s}" "$3"
}
# fetch(url, feedname)
fetch() {
case "$1" in
*youtube.com*)
yt-dlp "$1";;
*.flac|*.ogg|*.m3u|*.m3u8|*.m4a|*.mkv|*.mp3|*.mp4|*.wav|*.webm)
# allow 2 redirects, hide User-Agent, connect timeout is 15 seconds.
curl -O -L --max-redirs 2 -H "User-Agent:" -f -s --connect-timeout 15 "$1";;
esac
}
# downloader(url, title, feedname)
downloader() {
url="$1"
title="$2"
feedname="${3##*/}"
msg="${title}: ${url}"
# download directory.
if [ "${feedname}" != "-" ]; then
mkdir -p "${feedname}"
if ! cd "${feedname}"; then
log "${feedname}" "${msg}: ${feedname}" "DIR FAIL" >&2
return 1
fi
fi
log "${feedname}" "${msg}" "START"
if fetch "${url}" "${feedname}"; then
log "${feedname}" "${msg}" "OK"
# append it safely in parallel to the cachefile on a
# successful download.
(flock 9 || exit 1
printf '%s\n' "${url}" >> "${cachefile}"
) 9>"${lockfile}"
else
log "${feedname}" "${msg}" "FAIL" >&2
return 1
fi
return 0
}
if [ "${SFEED_DOWNLOAD_CHILD}" = "1" ]; then
# Downloader helper for parallel downloading.
# Receives arguments: $1 = URL, $2 = title, $3 = feed filename or "-".
# It should write the URI to the cachefile if it is successful.
downloader "$1" "$2" "$3"
exit $?
fi
# ...else parent mode:
tmp="$(mktemp)" || exit 1
trap 'rm -f "${tmp}"' EXIT
[ -f "${cachefile}" ] || touch "${cachefile}"
cat "${cachefile}" > "${tmp}"
echo >> "${tmp}" # force it to have one line for awk.
LC_ALL=C awk -F '\t' '
# fast prefilter what to download or not.
function filter(url, field, feedname) {
u = tolower(url);
return (match(u, "youtube\\.com") ||
match(u, "\\.(flac|ogg|m3u|m3u8|m4a|mkv|mp3|mp4|wav|webm)$"));
}
function download(url, field, title, filename) {
if (!length(url) || urls[url] || !filter(url, field, filename))
return;
# NUL-separated for xargs -0.
printf("%s%c%s%c%s%c", url, 0, title, 0, filename, 0);
urls[url] = 1; # print once
}
{
FILENR += (FNR == 1);
}
# lookup table from cachefile which contains downloaded URLs.
FILENR == 1 {
urls[$0] = 1;
}
# feed file(s).
FILENR != 1 {
download($3, 3, $2, FILENAME); # link
download($8, 8, $2, FILENAME); # enclosure
}
' "${tmp}" "${@:--}" | \
SFEED_DOWNLOAD_CHILD="1" xargs -r -0 -L 3 -P "${jobs}" "$(readlink -f "$0")"
- - -
Shellscript to export existing newsboat cached items from sqlite3 to the sfeed
TSV format.
#!/bin/sh
# Export newsbeuter/newsboat cached items from sqlite3 to the sfeed TSV format.
# The data is split per file per feed with the name of the newsboat title/url.
# It writes the URLs of the read items line by line to a "urls" file.
#
# Dependencies: sqlite3, awk.
#
# Usage: create some directory to store the feeds then run this script.
# newsboat cache.db file.
cachefile="$HOME/.newsboat/cache.db"
test -n "$1" && cachefile="$1"
# dump data.
# .mode ascii: Columns/rows delimited by 0x1F and 0x1E
# get the first fields in the order of the sfeed(5) format.
sqlite3 "$cachefile" <<!EOF |
.headers off
.mode ascii
.output
SELECT
i.pubDate, i.title, i.url, i.content, i.content_mime_type,
i.guid, i.author, i.enclosure_url,
f.rssurl AS rssurl, f.title AS feedtitle, i.unread
-- i.id, i.enclosure_type, i.enqueued, i.flags, i.deleted, i.base
FROM rss_feed f
INNER JOIN rss_item i ON i.feedurl = f.rssurl
ORDER BY
i.feedurl ASC, i.pubDate DESC;
.quit
!EOF
# convert to sfeed(5) TSV format.
LC_ALL=C awk '
BEGIN {
FS = "\x1f";
RS = "\x1e";
}
# normal non-content fields.
function field(s) {
gsub("^[[:space:]]*", "", s);
gsub("[[:space:]]*$", "", s);
gsub("[[:space:]]", " ", s);
gsub("[[:cntrl:]]", "", s);
return s;
}
# content field.
function content(s) {
gsub("^[[:space:]]*", "", s);
gsub("[[:space:]]*$", "", s);
# escape chars in content field.
gsub("\\\\", "\\\\", s);
gsub("\n", "\\n", s);
gsub("\t", "\\t", s);
return s;
}
function feedname(feedurl, feedtitle) {
if (feedtitle == "") {
gsub("/", "_", feedurl);
return feedurl;
}
gsub("/", "_", feedtitle);
return feedtitle;
}
{
fname = feedname($9, $10);
if (!feed[fname]++) {
print "Writing file: \"" fname "\" (title: " $10 ", url: " $9 ")" > "/dev/stderr";
}
contenttype = field($5);
if (contenttype == "")
contenttype = "html";
else if (index(contenttype, "/html") || index(contenttype, "/xhtml"))
contenttype = "html";
else
contenttype = "plain";
print $1 "\t" field($2) "\t" field($3) "\t" content($4) "\t" \
contenttype "\t" field($6) "\t" field($7) "\t" field($8) "\t" \
> fname;
# write URLs of the read items to a file line by line.
if ($11 == "0") {
print $3 > "urls";
}
}'
- - -
Progress indicator
------------------
The below sfeed_update wrapper script counts the number of feeds in a sfeedrc
config. It then calls sfeed_update and pipes the output lines to a function
that counts the current progress. It writes the total progress to stderr.
Alternative: pv -l -s totallines
#!/bin/sh
# Progress indicator script.
# Pass lines as input to stdin and write progress status to stderr.
# progress(totallines)
progress() {
total="$(($1 + 0))" # must be a number, no divide by zero.
test "${total}" -le 0 -o "$1" != "${total}" && return
LC_ALL=C awk -v "total=${total}" '
{
counter++;
percent = (counter * 100) / total;
printf("\033[K") > "/dev/stderr"; # clear EOL
print $0;
printf("[%s/%s] %.0f%%\r", counter, total, percent) > "/dev/stderr";
fflush(); # flush all buffers per line.
}
END {
printf("\033[K") > "/dev/stderr";
}'
}
# Counts the feeds from the sfeedrc config.
countfeeds() {
count=0
. "$1"
feed() {
count=$((count + 1))
}
feeds
echo "${count}"
}
config="${1:-$HOME/.sfeed/sfeedrc}"
total=$(countfeeds "${config}")
sfeed_update "${config}" 2>&1 | progress "${total}"
- - -
Counting unread and total items
-------------------------------
It can be useful to show the counts of unread items, for example in a
windowmanager or statusbar.
The below example script counts the items of the last day in the same way the
formatting tools do:
#!/bin/sh
# Count the new items of the last day.
LC_ALL=C awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
{
total++;
}
int($1) >= old {
totalnew++;
}
END {
print "New: " totalnew;
print "Total: " total;
}' ~/.sfeed/feeds/*
The below example script counts the unread items using the sfeed_curses URL
file:
#!/bin/sh
# Count the unread and total items from feeds using the URL file.
LC_ALL=C awk -F '\t' '
# URL file: number of fields is 1.
NF == 1 {
u[$0] = 1; # lookup table of URLs.
next;
}
# feed file: check by URL or id.
{
total++;
if (length($3)) {
if (u[$3])
read++;
} else if (length($6)) {
if (u[$6])
read++;
}
}
END {
print "Unread: " (total - read);
print "Total: " total;
}' ~/.sfeed/urls ~/.sfeed/feeds/*
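To try the URL file approach without touching real data, the following is a
minimal sketch that wraps the same awk logic in a function and runs it on
throwaway sample files. The function name count_unread, the temporary directory
and the sample items are assumptions made for illustration; they are not part
of sfeed.

```shell
#!/bin/sh
# Hypothetical helper (not part of sfeed): count unread and total items.
# usage: count_unread urlfile feedfile...
count_unread() {
	LC_ALL=C awk -F '\t' '
	# URL file: number of fields is 1.
	NF == 1 {
		u[$0] = 1; # lookup table of URLs.
		next;
	}
	# feed file: check by URL (field 3), else by id (field 6).
	{
		total++;
		if (length($3)) {
			if (u[$3])
				read++;
		} else if (length($6)) {
			if (u[$6])
				read++;
		}
	}
	END {
		print "Unread: " (total - read);
		print "Total: " (total + 0);
	}' "$@"
}

# Sample data: two items, one of which is listed in the URL file (read).
tmpdir="$(mktemp -d)" || exit 1
printf '%s\n' 'https://example.org/a' > "${tmpdir}/urls"
printf '1700000000\tItem A\thttps://example.org/a\t\thtml\tid-a\t\t\t\n' > "${tmpdir}/feed"
printf '1700000100\tItem B\thttps://example.org/b\t\thtml\tid-b\t\t\t\n' >> "${tmpdir}/feed"
count_unread "${tmpdir}/urls" "${tmpdir}/feed"
rm -rf "${tmpdir}"
```

With the sample data above this prints "Unread: 1" and "Total: 2". The URL file
has to be passed first, so the lookup table is complete before the feed files
are read.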
- - -
sfeed.c: adding new XML tags or sfeed(5) fields to the parser
-------------------------------------------------------------
sfeed.c contains definitions to parse XML tags and map them to sfeed(5) TSV
fields. Parsed RSS and Atom tag names are first stored as a TagId, which is a
number. This TagId is then mapped to the output field index.
Steps to modify the code:
* Add a new TagId enum for the tag.
* (optional) Add a new FeedField* enum for the new output field or you can map
it to an existing field.
* Add the new XML tag name to the array variable of parsed RSS or Atom
tags: rsstags[] or atomtags[].
These must be defined in alphabetical order, because a binary search is used
which uses the strcasecmp() function.
* Add the parsed TagId to the output field in the array variable fieldmap[].
When another tag is also mapped to the same output field then the tag with
the highest TagId number value overrides the mapped field: the order is from
least to most important.
* If the tag only uses the inner data of the XML tag, then this definition is
enough. If it has to parse a certain attribute, for example, you have to add a
check for the TagId to the xmlattr() callback function.
* (optional) Print the new field in the printfields() function.
Below is a patch example to add the MRSS "media:content" tag as a new field:
diff --git a/sfeed.c b/sfeed.c
--- a/sfeed.c
+++ b/sfeed.c
@@ -50,7 +50,7 @@ enum TagId {
RSSTagGuidPermalinkTrue,
/* must be defined after GUID, because it can be a link (isPermaLink) */
RSSTagLink,
- RSSTagEnclosure,
+ RSSTagMediaContent, RSSTagEnclosure,
RSSTagAuthor, RSSTagDccreator,
RSSTagCategory,
/* Atom */
@@ -81,7 +81,7 @@ typedef struct field {
enum {
FeedFieldTime = 0, FeedFieldTitle, FeedFieldLink, FeedFieldContent,
FeedFieldId, FeedFieldAuthor, FeedFieldEnclosure, FeedFieldCategory,
- FeedFieldLast
+ FeedFieldMediaContent, FeedFieldLast
};
typedef struct feedcontext {
@@ -137,6 +137,7 @@ static const FeedTag rsstags[] = {
{ STRP("enclosure"), RSSTagEnclosure },
{ STRP("guid"), RSSTagGuid },
{ STRP("link"), RSSTagLink },
+ { STRP("media:content"), RSSTagMediaContent },
{ STRP("media:description"), RSSTagMediaDescription },
{ STRP("pubdate"), RSSTagPubdate },
{ STRP("title"), RSSTagTitle }
@@ -180,6 +181,7 @@ static const int fieldmap[TagLast] = {
[RSSTagGuidPermalinkFalse] = FeedFieldId,
[RSSTagGuidPermalinkTrue] = FeedFieldId, /* special-case: both a link and an id */
[RSSTagLink] = FeedFieldLink,
+ [RSSTagMediaContent] = FeedFieldMediaContent,
[RSSTagEnclosure] = FeedFieldEnclosure,
[RSSTagAuthor] = FeedFieldAuthor,
[RSSTagDccreator] = FeedFieldAuthor,
@@ -677,6 +679,8 @@ printfields(void)
string_print_uri(&ctx.fields[FeedFieldEnclosure].str);
putchar(FieldSeparator);
string_print_trimmed_multi(&ctx.fields[FeedFieldCategory].str);
+ putchar(FieldSeparator);
+ string_print_trimmed(&ctx.fields[FeedFieldMediaContent].str);
putchar('\n');
if (ferror(stdout)) /* check for errors but do not flush */
@@ -718,7 +722,7 @@ xmlattr(XMLParser *p, const char *t, size_t tl, const char *n, size_t nl,
}
if (ctx.feedtype == FeedTypeRSS) {
- if (ctx.tag.id == RSSTagEnclosure &&
+ if ((ctx.tag.id == RSSTagEnclosure || ctx.tag.id == RSSTagMediaContent) &&
isattr(n, nl, STRP("url"))) {
string_append(&tmpstr, v, vl);
} else if (ctx.tag.id == RSSTagGuid &&
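
Because the tag arrays must stay in alphabetical order for the binary search,
it can help to check the order after adding a tag name. The following is a
minimal sketch, assuming strcasecmp() compares the names as if they were
lowercased in the C locale. The function name check_tag_order is hypothetical,
and the tag list only contains the names visible in the patch context above,
not the full rsstags[] array.

```shell
#!/bin/sh
# Hypothetical helper (not part of sfeed): read tag names from stdin, one per
# line, and succeed only if they are in case-insensitive byte order.
check_tag_order() {
	LC_ALL=C tr 'A-Z' 'a-z' | LC_ALL=C sort -c 2>/dev/null
}

# Tag names taken from the patch context, with "media:content" added.
if printf '%s\n' enclosure guid link media:content media:description \
    pubdate title | check_tag_order; then
	echo "tag order: ok"
else
	echo "tag order: not sorted"
fi
```

The names could also be extracted from sfeed.c itself, but how to do that
depends on the exact layout of the source, so it is left out here.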
- - -
Running custom commands inside the sfeed_curses program
-------------------------------------------------------
Running commands inside the sfeed_curses program can be useful, for example to
sync items or mark all items across all feeds as read. It can be convenient to
have a keybind for this inside the program to perform a scripted action and
then reload the feeds by sending the signal SIGHUP.
In the input handling code you can then add a case:
case 'M':
forkexec((char *[]) { "markallread.sh", NULL }, 0);
break;
or
case 'S':
forkexec((char *[]) { "syncnews.sh", NULL }, 1);
break;
The specified script should be in $PATH or be an absolute path.
Example of a `markallread.sh` shell script to mark all URLs as read:
#!/bin/sh
# mark all items/URLs as read.
tmp="$(mktemp)" || exit 1
(cat ~/.sfeed/urls; cut -f 3 ~/.sfeed/feeds/*) | \
awk '!x[$0]++' > "$tmp" &&
mv "$tmp" ~/.sfeed/urls &&
pkill -SIGHUP sfeed_curses # reload feeds.
Example of a `syncnews.sh` shell script to update the feeds and reload them:
#!/bin/sh
sfeed_update
pkill -SIGHUP sfeed_curses
Running programs in a new session
---------------------------------
By default processes are spawned in the same session and process group as
sfeed_curses. When sfeed_curses is closed this can also close the spawned
process in some cases.
When the setsid command-line program is available, the following wrapper command
can be used to run a program, for example a plumb program, in a new session:
setsid -f xdg-open "$@"
Alternatively the code can be changed to call setsid() before execvp().
Open a URL directly in the same terminal
----------------------------------------
To open a URL directly in the same terminal using the text-mode lynx browser:
SFEED_PLUMBER=lynx SFEED_PLUMBER_INTERACTIVE=1 sfeed_curses ~/.sfeed/feeds/*
Yank to tmux buffer
-------------------
This changes the yank command to set the tmux buffer, instead of X11 xclip:
SFEED_YANKER="tmux set-buffer \`cat\`"
Alternative for xargs -P and -0
-------------------------------
Most xargs implementations support the options -P and -0.
The GNU and *BSD implementations have supported them for over 20 years!
These functions in sfeed_update can be overridden in sfeedrc, if you don't want
to use xargs:
feed() {
# wait until ${maxjobs} are finished: will stall the queue if an item
# is slow, but it is portable.
[ ${signo} -ne 0 ] && return
[ $((curjobs % maxjobs)) -eq 0 ] && wait
[ ${signo} -ne 0 ] && return
curjobs=$((curjobs + 1))
_feed "$@" &
}
runfeeds() {
# job counter.
curjobs=0
# fetch feeds specified in config file.
feeds
# wait till all feeds are fetched (concurrently).
[ ${signo} -eq 0 ] && wait
}
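
To see the batching behaviour in isolation, the following is a minimal,
self-contained sketch of the same wait-per-batch pattern with a dummy job
instead of a feed fetch. The names job, run and outdir are made up for this
example, and the signal handling (${signo}) of the real functions is left out.

```shell
#!/bin/sh
# Sketch of the batching pattern: start jobs in the background and wait
# each time ${maxjobs} jobs have been started.
maxjobs=2
curjobs=0
outdir="$(mktemp -d)" || exit 1

# job(name): stand-in for fetching one feed.
job() {
	echo "done: $1" > "${outdir}/$1"
}

# run(name): same queueing logic as feed() above, without signal handling.
run() {
	[ $((curjobs % maxjobs)) -eq 0 ] && wait
	curjobs=$((curjobs + 1))
	job "$@" &
}

for name in a b c d e; do
	run "${name}"
done
wait # wait for the last batch.

cat "${outdir}"/*
rm -rf "${outdir}"
```

Each batch only starts after the previous one has finished completely, which
is why one slow item stalls the queue.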
Known terminal issues
---------------------
Below is a list of bugs or missing features in terminals that were found while
testing sfeed_curses. Some of them might already be fixed upstream:
- cygwin + mintty: the xterm mouse-encoding of the mouse position is broken for
scrolling.
- HaikuOS terminal: the xterm mouse-encoding of the mouse button number of the
middle button and right button is incorrect / reversed.
- putty: the full reset attribute (ESC c, typically `rs1`) does not reset the
window title.
- Mouse button encoding for extended buttons (like side-buttons) is unsupported
in some terminals or maps to the same button: for example side-buttons 7 and 8
map to the scroll buttons 4 and 5 in urxvt.
License
-------
ISC, see LICENSE file.
Author
------
Hiltjo Posthuma <hiltjo@codemadness.org>