Stars: 195
Forks: 32
Pull Requests: 93
Issues: 99
Watchers: 16
Last Updated: 2023-05-30 08:03:40
Evolution of ttrss_plugin-af_feedmod
License: MIT License
Languages: PHP
https://discourse.tt-rss.org/t/plugin-update-feediron-v1-2-0/2018
Reforge your feeds
Recipes moved to separate Repository
About | Table Of Contents |
---|---|
This is a plugin for Tiny Tiny RSS (tt-rss). It allows you to replace an article's contents with the contents of an element on the linked URL's page, i.e. create a "full feed". Keep up to date by subscribing to the Release Feed |
Check out the repository into your plugins folder like this (from the tt-rss root directory):
$ cd /var/www/ttrss
$ git clone https://github.com/feediron/ttrss_plugin-feediron.git plugins.local/feediron
Then enable the plugin in TT-RSS preferences menu.
Install Readability.php using composer. Assuming composer is installed, navigate to the FeedIron plugin filter folder filters/fi_mod_readability, where composer.json is present, and run:
$ composer install
Install Readability.php when using docker-compose:
In your docker-compose.yaml ensure your version is set to at least 3.6
version: '3.6'
Install php81-phar in the app container
sudo docker-compose exec app apk add php81-phar
Download the latest composer.phar
sudo docker-compose exec --workdir /var/www/html/tt-rss/plugins.local/feediron/filters/fi_mod_readability/ --user app app php81 -r "copy('https://getcomposer.org/download/latest-stable/composer.phar', 'composer.phar');"
Run the composer install
sudo docker-compose exec --workdir /var/www/html/tt-rss/plugins.local/feediron/filters/fi_mod_readability/ --user app app php81 -d extension=phar.so ./composer.phar install
After installation, the Tiny Tiny RSS preferences menu will contain a new tab called FeedIron. Under this tab you will have access to the FeedIron Configuration tab and the FeedIron Testing tab.
The configuration for FeedIron is done in JSON format and will be displayed in the large configuration text field. Use the large field to enter/modify the configuration data and click the Save button to store it.
Additionally you can load predefined rules/recipes submitted by the community or export your own rules. To submit your own rules/recipes you can submit a pull request through Github in the recipes repository.
There are filters, general options and global options. Note: The rule type must be defined and has to be one of the following: xpath, split or readability.
The best way to understand FeedIron is to read the Full configuration example.
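A basic configuration must define:
The site string, e.g. example.com. The same configuration can be used for multiple URLs by separating them with the | delimiter, e.g. "example.com|example.net".
The configuration is applied when the site string matches the <link> or <author> tag of the RSS feed item. <link> takes precedence over <author>. <author> based configurations will NOT automatically show in the Testing Tab.
The filter type, e.g. "type":"xpath"
The filter configuration, e.g. "xpath":"div[@id='content']" or the array "xpath": [ "div[@id='article']", "div[@id='footer']"]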
Example:
{
"example.com":{
"type":"xpath",
"xpath":"div[@id='content']"
},
"secondexample.com":{
"type":"xpath",
"xpath": [
"div[@id='article']",
"div[@id='footer']"
]
}
}
Note: Values are separated by a , (comma), but a trailing , (comma) is not valid.
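Because a trailing comma makes the whole configuration invalid, it can help to check the JSON before pasting it into the configuration field. The following is a minimal sketch in Python (not part of FeedIron; the helper name `check_config` is hypothetical) that relies only on the standard `json` module, which rejects trailing commas:

```python
import json


def check_config(text):
    """Return (True, parsed config) for valid JSON, else (False, error message)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as err:
        # lineno/colno point at the place where parsing failed
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"


valid = '{"example.com": {"type": "xpath", "xpath": "article"}}'
invalid = '{"example.com": {"type": "xpath",}}'  # trailing comma

print(check_config(valid)[0])
print(check_config(invalid))
```

The first call reports success; the second returns `False` together with the position of the offending comma.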
"type":"xpath"
"type":"split"
"type":"readability"
"tags":"{options}"
FeedIron can fetch text from a page and save it as article tags. This can be used to improve the filtering options found in TT-RSS. Note: The Tag filter can use all the options available to the xpath filter and the modify option.
The order of execution for tags is:
Usage Example:
"tags": {
"type": "xpath",
"replace-tags":true,
"xpath": [
"p[@class='topics']"
],
"split":",",
"cleanup": [
"strong"
],
"modify":[
{
"type": "replace",
"search": "-",
"replace": " "
}
]
}
"type": "xpath"
"xpath":"xpath str" / [ "array of xpath str" ]
"tags":{
"type":"xpath",
"xpath":"p[@class='topics']"
}
"type": "regex"
Uses PHP preg_match() in order to find and return a string from the article. Requires at least one pattern.
"pattern": "/regex str/" / [ "/array of regex str/" ]
"tags":{
"type":"regex",
"pattern": "/The quick.*fox jumped/"
}
"index":int
Specifies the number of the matched entry in the article to return.
Default value 1
"tags":{
"type":"regex",
"pattern": "/The quick.*fox jumped/",
"index": 2
}
"type": "search"
Searches the article using a regex. If a match is found, the corresponding pre-defined tag is returned.
"tags":{
"type":"search",
"pattern": [
"/feediron/",
"/ttrss/"
],
"match": [
"FeedIron is here",
"TT-RSS is here"
]
}
"pattern": "/regex str/" / [ "/array of regex str/" ]
Must have corresponding match entries
"match": "str" / [ "array of str" ]
Must have corresponding pattern entries. This can be inverted using the ! symbol at the beginning of the match entry, so that the tag is returned if NO match is found.
"tags":{
"type":"search",
"pattern": [
"/feediron/",
"/ttrss/"
],
"match": [
"!FeedIron is not here",
"TT-RSS is here"
]
}
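The pairing of pattern and match entries, including the ! inversion, can be sketched as follows. This is an illustration in Python using `re.search`, not the plugin's PHP code. The function name `search_tags` is hypothetical, the PHP `/` delimiters are dropped from the patterns, and stripping the leading `!` from the returned tag is an assumption.

```python
import re


def search_tags(article, patterns, matches):
    """Pair each pattern with its match entry; a leading '!' inverts the test."""
    tags = []
    for pattern, match in zip(patterns, matches):
        found = re.search(pattern, article) is not None
        if match.startswith("!"):
            if not found:
                tags.append(match[1:])  # assumption: the '!' is not part of the tag
        elif found:
            tags.append(match)
    return tags


article = "A short post about ttrss plugins."
print(search_tags(article,
                  ["feediron", "ttrss"],
                  ["!FeedIron is not here", "TT-RSS is here"]))
# prints ['FeedIron is not here', 'TT-RSS is here']
```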
"replace-tags":bool
Default value false
Replace the article tags with fetched ones. By default tags are merged.
"tags":{
"type":"xpath",
"xpath":"p[@class='topics']",
"replace-tags": true
}
"split":"str"
String - Splits tags using a delimiter
"tags":{
"type":"xpath",
"xpath":"p[@class='topics']",
"split":"-"
}
Input: Tag1-Tag2-Tag3
Result: Tag1, Tag2, Tag3
"reformat":[array of options]
"modify":[array of options]
"force_charset":"charset"
"force_unicode":bool
"tidy-source":bool
"tidy":bool
"reformat":[array of options]
"modify":[array of options]
Reformat is an array of formatting rules for the URL of the full article. The rules are applied before the full article is fetched. Modify, by contrast, is an array of formatting rules applied to the article content, using the same options.
"type":"regex"
regex takes a regex in an option called pattern and the replacement in replace. For details see preg_replace in the PHP documentation.
"pattern":"/regex str/"
A regular expression or regex string.
"replace":"str"
String to replace regex match with
"count":"int"
Optional integer defining the number of replacements done.
Example reformat golem.de url:
"golem0Bde0C":{
"type":"xpath",
"xpath":"article",
"reformat": [
{
"type": "regex",
"pattern": "/(?:[a-z0-9A-Z\\/.\\:]*?)golem0Bde0C(.*)0Erss0Bhtml\\/story01.htm/",
"replace": "http://www.golem.de/$1.html"
}
]
}
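To see what this reformat rule does to a link, the same substitution can be tried outside the plugin. The sketch below uses Python's `re.sub` as a stand-in for PHP's preg_replace: the pattern is written without the `/` delimiters and without the redundant escapes, `$1` becomes `\1`, and the feed URL is a made-up example rather than a real golem.de feed link.

```python
import re

# Pattern from the example above, without PHP delimiters.
PATTERN = r"(?:[a-z0-9A-Z/.:]*?)golem0Bde0C(.*)0Erss0Bhtml/story01.htm"
REPLACEMENT = r"http://www.golem.de/\1.html"  # \1 is Python's spelling of $1


def reformat_url(url):
    """Rewrite an encoded feed link into a direct article URL."""
    return re.sub(PATTERN, REPLACEMENT, url)


# Hypothetical feed link, for illustration only.
feed_url = ("http://feeds.example.invalid/c/1/l/"
            "0L0Sgolem0Bde0Cnews0Cexample0Erss0Bhtml/story01.htm")
print(reformat_url(feed_url))
# prints http://www.golem.de/news0Cexample.html
```

The remaining encoded sequences such as `0C` are not touched by this rule; they need a separate replace rule.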
"type":"replace"
Uses the PHP function str_replace, which takes either a string or an array as search and replace value.
"search":"search str" / [ "array of search str" ]
String to search for. If an array is given, its order must match the order of the replacement strings.
"replace":"str" / [ "array of str" ]
String to replace search match with. Array must have the same number of options as the search array.
Example search and replace instances of srcset with null:
{
"type": "xpath",
"xpath": "img",
"modify": [
{
"type": "replace",
"search": "srcset",
"replace": "null"
}
]
}
Example search and replace h1 and h2 tags with h3 tags:
"example.tld":{
"type": "xpath",
"xpath": "article",
"modify": [
{
"type": "replace",
"search": [
"<h1>",
"<\/h1>",
"<h2>",
"<\/h2>"
],
"replace": [
"<h3>",
"<\/h3>",
"<h3>",
"<\/h3>"
]
}
]
}
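With arrays, the search and replace entries are paired by position and applied one after another. A minimal Python sketch of that behaviour, assuming both arrays have the same length (the helper below only imitates PHP's str_replace for this case and is not the plugin's code):

```python
def str_replace(search, replace, subject):
    """Apply each search/replace pair in order, like PHP str_replace with arrays."""
    if isinstance(search, str):
        search, replace = [search], [replace]
    for needle, substitute in zip(search, replace):
        subject = subject.replace(needle, substitute)
    return subject


html = "<h1>Title</h1><h2>Sub</h2>"
print(str_replace(["<h1>", "</h1>", "<h2>", "</h2>"],
                  ["<h3>", "</h3>", "<h3>", "</h3>"],
                  html))
# prints <h3>Title</h3><h3>Sub</h3>
```

Note that `"<\/h1>"` in the JSON configuration decodes to `</h1>`, which is why the sketch uses the unescaped form. Because pairs are applied sequentially, an earlier replacement can produce text that a later pair matches again.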
"multipage":{[options]}
This option indicates that the article may be split into two or more pages. FeedIron can combine all the parts into the content of the article.
You have to specify an xpath which identifies the links (<a>) to the pages.
"example.com":{
"type": "xpath",
"multipage": {
"xpath": "a[contains(@data-ga-category,'Pagination') and text() = 'Next']",
"append": true,
"recursive": true
}
}
"append":bool
Boolean - If false, only the links are used and the original link is ignored. Otherwise the links found using the xpath expression are added to the original page link.
"recursive":bool
Boolean - If true, every following page is parsed for more links. To avoid infinite loops, fetching stops if a URL is added twice.
"pages":int
Integer - Maximum number of pages to recursively fetch. Default value 10
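How append, recursive and pages interact can be sketched without any network access. The Python below is a simplified model, not FeedIron's implementation. The function `collect_pages` and the `links_on` callback (standing in for "apply the multipage xpath to a page") are hypothetical, duplicates are skipped rather than aborting the fetch, and counting the original page towards the limit is an assumption.

```python
def collect_pages(start, links_on, append=True, recursive=True, max_pages=10):
    """Return the ordered list of page URLs whose content would be combined."""
    pages = [start] if append else []  # append=False ignores the original link
    seen = {start}
    queue = [start]
    while queue and len(pages) < max_pages:
        current = queue.pop(0)
        for link in links_on(current):
            if link in seen:
                continue  # never add a URL twice, so cycles terminate
            if len(pages) >= max_pages:
                break
            seen.add(link)
            pages.append(link)
            if recursive:
                queue.append(link)  # follow links found on later pages too
    return pages


# Hypothetical three-page article whose last page links back to the first.
SITE = {"p1": ["p2"], "p2": ["p3"], "p3": ["p1"]}


def links_on(url):
    return SITE.get(url, [])


print(collect_pages("p1", links_on))
# prints ['p1', 'p2', 'p3']
print(collect_pages("p1", links_on, append=False, recursive=False))
# prints ['p2']
```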
"force_charset":"charset"
force_charset allows you to override automatic charset detection. If it is omitted, the charset will be parsed from the HTTP headers or loadHTML() will decide on its own.
"example.tld":{
"type": "xpath",
"xpath": "article",
"force_charset": "utf-8"
}
"force_unicode":bool
force_unicode performs a UTF-8 character set conversion on the html via iconv.
"example.tld":{
"type": "xpath",
"xpath": "article",
"force_unicode": true
}
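Why a forced charset can matter is easiest to see with bytes that are not UTF-8. The following Python sketch is only a rough analogue of a forced charset followed by a conversion to UTF-8. It does not use iconv or loadHTML(), and the function `to_utf8` and its parameters are hypothetical.

```python
def to_utf8(raw, force_charset=None, detected_charset=None):
    """Decode raw bytes with the forced (or detected) charset and re-encode as UTF-8."""
    charset = force_charset or detected_charset or "utf-8"
    return raw.decode(charset, errors="replace").encode("utf-8")


raw = "Grüße".encode("iso-8859-1")  # a page served as ISO-8859-1

# With the correct charset forced, the text survives the conversion.
print(to_utf8(raw, force_charset="iso-8859-1").decode("utf-8"))
# prints Grüße

# Without it, the bytes are misread and replacement characters appear.
print(to_utf8(raw).decode("utf-8") == "Grüße")
# prints False
```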
"tidy-source":bool
Requires the optional php-tidy extension to be installed. Default - false
Use tidy::cleanrepair to attempt to fix fetched article source, useful for improperly closed tags interfering with xpath queries.
Note: If the character set of the page cannot be detected, tidy will not be executed. In this case force_charset is required.
"tidy":bool
Requires the optional php-tidy extension to be installed. Default - true
Use tidy::cleanrepair to attempt to fix the modified article, useful for unclosed tags such as iframes.
Note: If the character set of the page cannot be detected, tidy will not be executed. In this case force_charset is required.
"debug":bool
Activate debugging information (Note: not for the Testing tab). Default - false
At the moment there is not much debug information to activate. This option must be placed at the same level as the site configs.
Example:
{
"example.com":{
"type":"xpath",
"xpath":"div[@id='content']"
},
"secondexample.com":{
"type":"xpath",
"xpath": [
"div[@id='article']",
"div[@id='footer']"
]
},
"debug":false
}
"tidy-source":bool
Allows you to globally disable the use of php-tidy on the fetched HTML source. Default - true
Uses tidy::cleanrepair to attempt to fix fetched article source, useful for improperly closed tags interfering with xpath queries.
Example:
{
"example.com":{
"type":"xpath",
"xpath":"div[@id='content']"
},
"secondexample.com":{
"type":"xpath",
"xpath": [
"div[@id='article']",
"div[@id='footer']"
]
},
"tidy-source":false
}
The Testing tab is where you can debug/create your configurations and view a preview of the filter results. The configuration in the Testing tab is identical to the one in the Configuration tab, except that the domain/URL key is omitted.
{
"type":"xpath",
"xpath":"article"
}
Not:
"example.tld":{
"type":"xpath",
"xpath":"article"
}
Full configuration example:
{
"heise.de": {
"name": "Heise Newsticker",
"url": "http://heise.de/ticker/",
"type": "xpath",
"xpath": "div[@class='meldung_wrapper']",
"force_charset": "utf-8"
},
"berlin.de/polizei": {
"type": "xpath",
"xpath": "div[@class='bacontent']"
},
"n24.de": {
"type": "readability"
},
"www.dorkly.com": {
"type": "xpath",
"multipage": {
"xpath": "a[contains(@data-ga-category,'Pagination') and text() = 'Next']",
"append": true,
"recursive": true
},
"xpath": "div[contains(@class,'post-content')]"
},
"golem0Bde0C": {
"type": "xpath",
"xpath": "article",
"multipage": {
"xpath": "ol/li/a[contains(@id, 'atoc_')]",
"append": true
},
"reformat": [
{
"type": "regex",
"pattern": "/(?:[a-z0-9A-Z\\/.\\:]*?)golem0Bde0C(.*)0Erss0Bhtml\\/story01.htm/",
"replace": "http://www.golem.de/$1.html"
},
{
"type": "replace",
"search": [
"0A",
"0C",
"0B",
"0E"
],
"replace": [
"0",
"/",
".",
"-"
]
}
]
},
"oatmeal": {
"type": "xpath",
"xpath": "div[@id='comic']"
},
"blog.beetlebum.de": {
"type": "xpath",
"xpath": "div[@class='entry-content']",
"cleanup": [ "header", "footer" ]
},
"sueddeutsche.de": {
"type": "xpath",
"xpath": [
"h2/strong",
"section[contains(@class,'authors')]"
],
"join_element": "<p>",
"cleanup": [
"script"
]
},
"www.spiegel.de": {
"type": "split",
"steps": [
{
"after": "/article-section clearfix\"\\W*>/",
"before": "/<div\\W*class=\"module-box home-link-box/"
},
{
"before": "/<div\\W*class=\"btwBarInArticles/"
}
],
"cleanup" : [ "~<script([^<]|<(?!/script))*</script>~msi" ],
"force_unicode": true
},
"debug": false
}
Thanks to mbirth, who wrote af_feedmod, which gave me a starting base.