Demonoid.parser

class demonoid.parser.Parser[source]

The Parser is a static class, responsible for parsing HTML elements and text. It shouldn’t be used directly.

Attr:TORRENTS_LIST_XPATH is an XPath expression used to capture the torrent list rows in range [4:-3] from the HTML parent table element. The last() - 3 bound is handled with Python slicing in the get_torrents_rows method, as that is more DRY than writing a much longer XPath expression.
Attr:DATE_TAG_XPATH is an XPath expression used to capture the HTML parent tr’s td element holding the date row.
Attr:DATE_STRPTIME_FORMAT is a datetime-compliant format string used to parse the DATE_TAG’s date text.
Attr:FIRST_ROW_XPATH is an XPath expression used to capture the id, title, tracked_by, category_url and torrent_url from a torrent’s first table row (each torrent consists of 2 table rows).
static get_date_td(rows)[source]

Static method that gets the torrent data element containing the torrents’ date. Executes DATE_TAG_XPATH on given dom.

Parameters:rows (list of lxml.HtmlElement) – the rows to search in
Returns:table data containing the torrents’ date
Return type:lxml.HtmlElement
static get_params(url, ignore_empty=False)[source]

Static method that parses a given url and retrieves the url’s parameters. Can optionally ignore parameters with empty values. Also handles parameters-only urls such as q=banana&peel=false.

Parameters:
  • url (str) – url to parse
  • ignore_empty (bool) – whether to ignore parameters with empty values
Returns:

dictionary of params and their values

Return type:

dict
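
The behaviour described above can be approximated with the standard library. This is a minimal sketch, not the library's implementation: the name get_params_sketch is hypothetical, and it assumes that values are returned as strings and that a string with no scheme, host or "?" is treated as a bare query.

```python
from urllib.parse import parse_qsl, urlsplit


def get_params_sketch(url, ignore_empty=False):
    """Return a dict of query parameters found in url (illustrative only)."""
    parts = urlsplit(url)
    query = parts.query
    # A parameters-only string such as "q=banana&peel=false" has no "?",
    # so urlsplit puts all of it in .path; treat that path as the query.
    if not query and not parts.scheme and not parts.netloc and "=" in parts.path:
        query = parts.path
    # keep_blank_values=True keeps "key=" pairs so that the ignore_empty
    # flag, not the parser, decides whether they are dropped.
    pairs = parse_qsl(query, keep_blank_values=True)
    if ignore_empty:
        pairs = [(key, value) for key, value in pairs if value != ""]
    return dict(pairs)
```

With this sketch, a url ending in ?category=1&quality= keeps the empty quality entry by default and drops it when ignore_empty=True.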

static get_torrents_rows(dom)[source]

Static method that gets the torrent list rows from the given dom by running TORRENTS_LIST_XPATH and trims the last() - 3 non-torrent rows, which are actually sorting preferences rows.

Parameters:dom (lxml.HtmlElement) – the dom to operate on
Returns:torrent rows
Return type:list of lxml.HtmlElement
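
The trimming idea can be illustrated without the real page markup. The sketch below swaps lxml for the standard library's xml.etree.ElementTree so it has no third-party dependency, and it does both trims with slicing because ElementTree's limited XPath cannot express a position() >= 4 predicate. The table layout (3 leading non-torrent rows, 3 trailing sorting rows) and the name get_torrents_rows_sketch are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: 3 leading rows, 2 torrent rows, 3 sorting-preference rows.
TABLE = """
<table>
  <tr><td>header 1</td></tr>
  <tr><td>header 2</td></tr>
  <tr><td>header 3</td></tr>
  <tr><td>torrent A, row 1</td></tr>
  <tr><td>torrent A, row 2</td></tr>
  <tr><td>sort 1</td></tr>
  <tr><td>sort 2</td></tr>
  <tr><td>sort 3</td></tr>
</table>
"""


def get_torrents_rows_sketch(table):
    """Return only the torrent rows of the given table element."""
    rows = table.findall("tr")  # every direct <tr> child
    # XPath positions are 1-based, so "from row 4" is index 3 in Python;
    # the negative stop drops the last 3 rows.
    return rows[3:-3]


table = ET.fromstring(TABLE.strip())
texts = [row.find("td").text for row in get_torrents_rows_sketch(table)]
```

A negative slice bound keeps the expression short, which matches the reason given for not encoding last() - 3 in the XPath itself.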
static is_language(params)[source]

Static method that, given a dict of url parameters, casts the parameters’ language value to int and compares it to the default search query value, Language.ALL.

Parameters:params (dict) – parameters to get language value from
Returns:whether the given parameters’ language differs from Language.ALL
Return type:bool
static is_quality(params)[source]

Static method that, given a dict of url parameters, casts the parameters’ quality value to int and compares it to the default search query value, Quality.ALL.

Parameters:params (dict) – parameters to get quality value from
Returns:whether the given parameters’ quality differs from Quality.ALL
Return type:bool
static is_subcategory(params)[source]

Static method that, given a dict of url parameters, casts the parameters’ subcategory value to int and compares it to the default search query value, Category.ALL, which is also the default ALL search query value for all subcategories.

Parameters:params (dict) – parameters to get subcategory value from
Returns:whether the given parameters’ subcategory differs from Category.ALL
Return type:bool
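
The three is_* checks above share one pattern: cast a parameter to int and compare it to an ALL default. A minimal sketch of that pattern follows. The constant value 0, the helper name differs_from_all and the handling of a missing key are all assumptions, since the actual values of Language.ALL, Quality.ALL and Category.ALL are not given here.

```python
ALL = 0  # hypothetical stand-in for Language.ALL, Quality.ALL and Category.ALL


def differs_from_all(params, key, default=ALL):
    """Return True when params[key], cast to int, is not the ALL default."""
    # Assumption: a missing key counts as the default, so the result is False.
    return int(params.get(key, default)) != default


params = {"language": "3", "quality": "0"}
language_set = differs_from_all(params, "language")
quality_set = differs_from_all(params, "quality")
```

The int cast matters because values parsed from a url arrive as strings, and "0" != 0 would otherwise always be true.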
static parse_date(table_data)[source]

Static method that parses a given table data element with Url.DATE_STRPTIME_FORMAT and creates a date object from the td’s text content.

Parameters:table_data (lxml.HtmlElement) – table_data tag to parse
Returns:date object from td’s text date
Return type:datetime.date
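
The conversion from cell text to a date can be sketched with datetime.strptime. The format string below is a hypothetical example, since the real DATE_STRPTIME_FORMAT value is not shown in this reference, and parse_date_text is an illustrative name that takes the cell's text rather than an lxml element.

```python
from datetime import datetime

# Hypothetical format; the real DATE_STRPTIME_FORMAT value is not shown here.
DATE_FORMAT = "%A, %b %d, %Y"


def parse_date_text(text, date_format=DATE_FORMAT):
    """Turn the text of a date cell into a datetime.date."""
    # strip() guards against whitespace around the cell text.
    return datetime.strptime(text.strip(), date_format).date()


parsed = parse_date_text(" Monday, Feb 16, 2015 ")
```

Note that %A and %b are locale-dependent, so this sketch assumes English day and month names.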
static parse_first_row(row, url_instance)[source]

Static method that parses a given table row element by executing Parser.FIRST_ROW_XPATH and scraping the torrent’s id, title, tracked by status, category url and torrent url. Used specifically with a torrent’s first table row.

Parameters:
  • row (lxml.HtmlElement) – row to parse
  • url_instance (urls.Url) – Url used to combine the base url with scraped links from the tr
Returns:

scraped id, title, tracked by status, category url and torrent url

Return type:

list

static parse_second_row(row, url)[source]

Static method that parses a given table row element using the helper methods Parser.parse_category_subcategory_and_or_quality and Parser.parse_torrent_link, scraping the torrent’s category, subcategory, quality, language, user, user url, torrent link, size, comments, times completed, seeders and leechers. Used specifically with a torrent’s second table row.

Parameters:
  • row (lxml.HtmlElement) – row to parse
  • url_instance (urls.Url) – Url used to combine the base url with scraped links from the tr
Returns:

scraped category, subcategory, quality, language, user, user url, torrent link, size, comments, times completed, seeders and leechers

Return type:

list

static parse_torrent_link(table_data)[source]

Static method that parses a list of table data elements, finds all anchor elements and gets the torrent url. The torrent url is usually hidden behind a fake spam ad url; the method handles this case.

Parameters:table_data (list of lxml.HtmlElement) – table_data tags to parse
Returns:torrent url from anchor (link) element
Return type:str
static parse_torrent_properties(table_datas)[source]

Static method that parses a given list of table data elements and, using the helper methods Parser.is_subcategory, Parser.is_quality and Parser.is_language, collects the torrent properties.

Parameters:table_datas (list of lxml.HtmlElement) – table_datas to parse
Returns:identified category, subcategory, quality and languages.
Return type:dict