Demonoid.parser

class demonoid.parser.Parser

The Parser is a static class, responsible for parsing HTML elements and text. It shouldn’t be used directly.

Attr:	TORRENTS_LIST_XPATH is an XPath expression that captures the torrent list rows in the range [4:-3] of the parent HTML table element. The trailing last() - 3 bound is applied with Python slicing in the get_torrents_rows method, which is more DRY than writing a much longer XPath expression.
Attr:	DATE_TAG_XPATH is an XPath expression that captures the td element, within its parent tr, that holds the date row.
Attr:	DATE_STRPTIME_FORMAT is a datetime-compliant format string used to parse the DATE_TAG’s date text.
Attr:	FIRST_ROW_XPATH is an XPath expression that captures the id, title, tracked_by, category_url and torrent_url from a torrent’s first table row (each torrent consists of 2 table rows).

static get_date_td(rows)

Static method that gets the table data element containing the torrents’ date. Executes DATE_TAG_XPATH on the given rows.

Parameters:	rows (list of lxml.HtmlElement) – the rows to search in
Returns:	table data containing the torrents’ date
Return type:	lxml.HtmlElement

static get_params(url, ignore_empty=False)

Static method that parses a given url and retrieves the url’s parameters. It can optionally ignore parameters with empty values. It also handles parameters-only urls such as q=banana&peel=false.

Parameters:	url (str) – url to parse
	ignore_empty (bool) – whether to ignore parameters with empty values
Returns:	dictionary of params and their values
Return type:	dict
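
The behaviour described for get_params can be approximated with the standard library alone. The following is a minimal sketch, not the library's implementation: the name get_params_sketch and the fallback rule for parameters-only input are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlsplit


def get_params_sketch(url, ignore_empty=False):
    """Hypothetical stand-in for Parser.get_params (not the real code)."""
    parts = urlsplit(url)
    query = parts.query
    # Assumption: input such as "q=banana&peel=false" has no scheme, host
    # or "?" part, so urlsplit reports it as a path; treat it as the query.
    if not query and not parts.scheme and not parts.netloc and "=" in parts.path:
        query = parts.path
    # keep_blank_values=True keeps "q=" so the filter below decides what to drop.
    pairs = parse_qsl(query, keep_blank_values=True)
    return {key: value for key, value in pairs if value or not ignore_empty}
```

With this sketch, get_params_sketch("q=banana&peel=false") yields {"q": "banana", "peel": "false"}, and passing ignore_empty=True drops any parameter whose value is an empty string.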

static get_torrents_rows(dom)

Static method that gets the torrent list rows from the given dom by running TORRENTS_LIST_XPATH, then trims the last 3 rows, which are not torrents but sorting-preference rows.

Parameters:	dom (lxml.HtmlElement) – the dom to operate on
Returns:	torrent rows
Return type:	list of lxml.HtmlElement
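
The split between the XPath selection and the Python slice can be illustrated as follows. This is only a sketch under stated assumptions: it uses the standard library's xml.etree.ElementTree as a stand-in for lxml, a made-up 12-row table, and a hypothetical helper trim_rows; the real TORRENTS_LIST_XPATH is not reproduced here.

```python
import xml.etree.ElementTree as ET


def trim_rows(rows, tail=3):
    """Drop the last `tail` rows (hypothetical helper, not the real code)."""
    return rows[:-tail] if tail else list(rows)


# Made-up table with 12 rows, ids r0 .. r11.
table = ET.fromstring(
    "<table>" + "".join(f"<tr id='r{i}'/>" for i in range(12)) + "</table>"
)

# Stand-in for the XPath step: skip the first 4 rows.
selected = table.findall("tr")[4:]
# Python slicing then removes the 3 trailing non-torrent rows.
torrent_rows = trim_rows(selected)
ids = [row.get("id") for row in torrent_rows]
```

Keeping the upper bound in Python means the expression only has to express the lower bound, which is the trade-off the attribute description for TORRENTS_LIST_XPATH refers to.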

static is_language(params)

Static method that, given a dict of url parameters, casts the language value to int and compares it to the default search query value, Language.ALL.

Parameters:	params (dict) – parameters to get language value from
Returns:	True if the given parameters’ language differs from Language.ALL, False otherwise
Return type:	bool

static is_quality(params)

Static method that, given a dict of url parameters, casts the quality value to int and compares it to the default search query value, Quality.ALL.

Parameters:	params (dict) – parameters to get quality value from
Returns:	True if the given parameters’ quality differs from Quality.ALL, False otherwise
Return type:	bool

static is_subcategory(params)

Static method that, given a dict of url parameters, casts the subcategory value to int and compares it to the default search query value, Category.ALL, which is also the default ALL search query value for all subcategories.

Parameters:	params (dict) – parameters to get subcategory value from
Returns:	True if the given parameters’ subcategory differs from Category.ALL, False otherwise
Return type:	bool
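
The is_language, is_quality and is_subcategory predicates share one pattern: cast a parameter to int and compare it with a default. A minimal sketch of that pattern follows. The constant values, the dict keys ("language", "quality", "subcategory"), the helper name differs_from_default and the handling of a missing key are all assumptions; the real values live in Language, Quality and Category.

```python
# Hypothetical default values; the real ones are defined by Language.ALL,
# Quality.ALL and Category.ALL and are not shown in this page.
LANGUAGE_ALL = 0
QUALITY_ALL = 0
CATEGORY_ALL = 0


def differs_from_default(params, key, default):
    """Return True when params[key], cast to int, is not the default."""
    # Assumption: a missing key counts as the default value.
    return int(params.get(key, default)) != default


# URL parameters arrive as strings, hence the int cast above.
params = {"language": "3", "quality": "0"}
has_language = differs_from_default(params, "language", LANGUAGE_ALL)
has_quality = differs_from_default(params, "quality", QUALITY_ALL)
has_subcategory = differs_from_default(params, "subcategory", CATEGORY_ALL)
```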

static parse_date(table_data)

Static method that parses a given table data element with Url.DATE_STRPTIME_FORMAT and creates a date object from the td’s text content.

Parameters:	table_data (lxml.HtmlElement) – table_data tag to parse
Returns:	date object from td’s text date
Return type:	datetime.date
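
Parsing a td's text into a date comes down to a single strptime call. The sketch below assumes a format string and a sample text for illustration only; the actual value of DATE_STRPTIME_FORMAT is not given in this page, and parse_date_text is a hypothetical name.

```python
from datetime import date, datetime

# Hypothetical format string; the real DATE_STRPTIME_FORMAT may differ.
ASSUMED_DATE_FORMAT = "Added on %A, %B %d, %Y"


def parse_date_text(text, fmt=ASSUMED_DATE_FORMAT):
    """Turn a td's text content into a datetime.date."""
    # strip() removes whitespace that often surrounds text in HTML cells.
    return datetime.strptime(text.strip(), fmt).date()


parsed = parse_date_text("  Added on Friday, January 02, 2015 ")
```

Here parsed equals date(2015, 1, 2). A text that does not match the format raises ValueError, so callers may want to guard the call.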

static parse_first_row(row, url_instance)

Static method that parses a given table row element by executing Parser.FIRST_ROW_XPATH and scraping the torrent’s id, title, tracked by status, category url and torrent url. Used specifically with a torrent’s first table row.

Parameters:	row (lxml.HtmlElement) – row to parse
	url_instance (urls.Url) – Url used to combine the base url with links scraped from the tr
Returns:	scraped id, title, tracked by status, category url and torrent url
Return type:	list

static parse_second_row(row, url)

Static method that parses a given table row element by using the helper methods Parser.parse_category_subcategory_and_or_quality and Parser.parse_torrent_link, scraping the torrent’s category, subcategory, quality, language, user, user url, torrent link, size, comments, times completed, seeders and leechers. Used specifically with a torrent’s second table row.

Parameters:	row (lxml.HtmlElement) – row to parse
	url_instance (urls.Url) – Url used to combine the base url with links scraped from the tr
Returns:	scraped category, subcategory, quality, language, user, user url, torrent link, size, comments, times completed, seeders and leechers
Return type:	list

static parse_torrent_link(table_data)

Static method that parses a list of table data elements, finds all anchor elements and gets the torrent url. The torrent url is usually hidden behind a fake spam ad url; this case is handled.

Parameters:	table_data (list of lxml.HtmlElement) – table_data tags to parse
Returns:	torrent url from anchor (link) element
Return type:	str

static parse_torrent_properties(table_datas)

Static method that parses a given list of table data elements and collects torrent properties using the helper methods Parser.is_subcategory, Parser.is_quality and Parser.is_language.

Parameters:	table_datas (list of lxml.HtmlElement) – table_datas to parse
Returns:	identified category, subcategory, quality and languages
Return type:	dict