Demonoid.parser
- class demonoid.parser.Parser
The Parser is a static class, responsible for parsing HTML elements and text. It shouldn’t be used directly.
Attr: TORRENTS_LIST_XPATH is an XPath expression used to capture the torrent rows in range [4:-3] of the parent HTML table element. The last() - 3 bound is applied with Python slicing in the get_torrents_rows method instead, as that is more DRY than writing a much longer XPath expression.
Attr: DATE_TAG_XPATH is an XPath expression used to capture the td element, within the parent tr, that holds the date row.
Attr: DATE_STRPTIME_FORMAT is a datetime-compliant format string used to parse the DATE_TAG’s date text.
Attr: FIRST_ROW_XPATH is an XPath expression used to capture the id, title, tracked_by, category_url and torrent_url from a torrent’s first table row (each torrent consists of 2 table rows).
- static get_date_td(rows)
Static method that gets the table data element containing the torrents’ date by executing DATE_TAG_XPATH on the given rows.
Parameters: rows (list of lxml.HtmlElement) – the rows to search in
Returns: table data containing the torrents’ date
Return type: lxml.HtmlElement
- static get_params(url, ignore_empty=False)
Static method that parses a given url and retrieves the url’s parameters. It can optionally ignore parameters with empty values. It also handles parameters-only urls such as q=banana&peel=false.
Parameters: - url (str) – url to parse
- ignore_empty (bool) – whether to ignore parameters with empty values
Returns: dictionary of params and their values
Return type: dict
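The behaviour described for get_params can be approximated with the standard library. The following is a minimal sketch, not the library's implementation: the name get_params_sketch and the example.com url are hypothetical, and the rule used to detect a parameters-only string is an assumption.

```python
from urllib.parse import parse_qsl, urlsplit


def get_params_sketch(url, ignore_empty=False):
    """Return a dict of the query parameters found in ``url``."""
    parts = urlsplit(url)
    if parts.query:
        # Regular url: the parameters live in the query component.
        query = parts.query
    elif not parts.scheme and not parts.netloc and "=" in parts.path:
        # Parameters-only string such as "q=banana&peel=false": without a
        # "?", urlsplit() reports it as the path (assumed detection rule).
        query = parts.path
    else:
        query = ""
    # keep_blank_values=True keeps "key=" pairs so they can be filtered here.
    pairs = parse_qsl(query, keep_blank_values=True)
    return {key: value for key, value in pairs if value or not ignore_empty}


print(get_params_sketch("q=banana&peel=false"))
print(get_params_sketch("http://example.com/files/?category=1&query=",
                        ignore_empty=True))
```

Values stay strings in this sketch; casting them (for example to int) is left to the caller.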
- static get_torrents_rows(dom)
Static method that gets the torrent list rows from the given dom by running TORRENTS_LIST_XPATH, then trims the trailing 3 non-torrent rows (the last() - 3 bound), which are actually sorting preferences rows.
Parameters: dom (lxml.HtmlElement) – the dom to operate on
Returns: torrent rows
Return type: list of lxml.HtmlElement
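The [4:-3] range can be illustrated with a plain Python slice. This sketch uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra dependencies; the table contents and the name get_torrents_rows_sketch are hypothetical, and both bounds are applied by slicing here, whereas the library is described as applying only the trailing bound that way.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: 4 leading non-torrent rows, 2 torrent rows and
# 3 trailing sorting preferences rows.
labels = ["head"] * 4 + ["torrent"] * 2 + ["sort"] * 3
table = ET.fromstring(
    "<table>" + "".join(f"<tr>{label}</tr>" for label in labels) + "</table>"
)


def get_torrents_rows_sketch(table):
    # Select every direct tr child, then drop the first 4 and last 3 rows.
    rows = table.findall("tr")
    return rows[4:-3]


rows = get_torrents_rows_sketch(table)
print([row.text for row in rows])  # ['torrent', 'torrent']
```

A negative slice bound keeps the trimming independent of how many torrent rows a page holds, which is what last() - 3 would otherwise have to express inside the XPath.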
- static is_language(params)
Static method that, given a dict of url parameters, casts the parameters’ language value to int and compares it to the default search query value, Language.ALL.
Parameters: params (dict) – parameters to get language value from
Returns: whether the given parameters’ language differs from Language.ALL
Return type: bool
- static is_quality(params)
Static method that, given a dict of url parameters, casts the parameters’ quality value to int and compares it to the default search query value, Quality.ALL.
Parameters: params (dict) – parameters to get quality value from
Returns: whether the given parameters’ quality differs from Quality.ALL
Return type: bool
- static is_subcategory(params)
Static method that, given a dict of url parameters, casts the parameters’ subcategory value to int and compares it to the default search query value, Category.ALL, which is also the default ALL search query value for all subcategories.
Parameters: params (dict) – parameters to get subcategory value from
Returns: whether the given parameters’ subcategory differs from Category.ALL
Return type: bool
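The three is_* checks share one pattern: cast a parameter to int and compare it with a default. A minimal sketch of that pattern follows, under loud assumptions: the constant ALL stands in for Language.ALL, Quality.ALL and Category.ALL, whose real values are not given here, and the dictionary keys and the handling of a missing key are guesses.

```python
ALL = 0  # hypothetical stand-in for Language.ALL, Quality.ALL, Category.ALL


def differs_from_default(params, key, default=ALL):
    """Return True if params[key], cast to int, is not the default value."""
    # Assumption: a missing key is treated as if it held the default.
    return int(params.get(key, default)) != default


def is_language_sketch(params):
    # "language" is an assumed key name for the url parameter.
    return differs_from_default(params, "language")


print(is_language_sketch({"language": "3"}))  # True
print(is_language_sketch({"language": "0"}))  # False
```

The cast matters because url parameters arrive as strings, so comparing "0" with 0 directly would always report a difference.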
- static parse_date(table_data)
Static method that parses a given table data element with Url.DATE_STRPTIME_FORMAT and creates a date object from the td’s text content.
Parameters: table_data (lxml.HtmlElement) – table_data tag to parse
Returns: date object from the td’s text date
Return type: datetime.date
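Turning the td’s text into a datetime.date can be sketched with datetime.strptime. The format string and sample text below are hypothetical, since the actual value of DATE_STRPTIME_FORMAT is not shown here, and the function takes the text directly instead of an lxml element.

```python
from datetime import datetime

# Hypothetical format; the real DATE_STRPTIME_FORMAT may differ.
DATE_FORMAT_SKETCH = "Added on %Y-%m-%d"


def parse_date_sketch(text, fmt=DATE_FORMAT_SKETCH):
    # strptime() returns a datetime; .date() drops the time component.
    return datetime.strptime(text.strip(), fmt).date()


print(parse_date_sketch(" Added on 2015-01-05 "))  # 2015-01-05
```

strptime raises ValueError when the text does not match the format, so a caller may want to handle that case explicitly.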
- static parse_first_row(row, url_instance)
Static method that parses a given table row element by executing Parser.FIRST_ROW_XPATH and scraping the torrent’s id, title, tracked by status, category url and torrent url. Used specifically with a torrent’s first table row.
Parameters: - row (lxml.HtmlElement) – row to parse
- url_instance (urls.Url) – Url used to combine base urls with scraped links from the tr
Returns: scraped id, title, tracked by status, category url and torrent url
Return type: list
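Combining a base url with a scraped link can be sketched with urllib.parse.urljoin. How urls.Url actually performs this step is not documented here, so the function name, the base url and the sample hrefs below are all hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = "http://example.com/"  # hypothetical base url


def combine_sketch(base_url, scraped_href):
    # urljoin() resolves a relative href against the base and returns an
    # already absolute href unchanged.
    return urljoin(base_url, scraped_href)


print(combine_sketch(BASE_URL, "/files/details/1/"))
print(combine_sketch(BASE_URL, "http://other.example/files/2/"))
```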
- static parse_second_row(row, url)
Static method that parses a given table row element by using the helper methods Parser.parse_category_subcategory_and_or_quality and Parser.parse_torrent_link, scraping the torrent’s category, subcategory, quality, language, user, user url, torrent link, size, comments, times completed, seeders and leechers. Used specifically with a torrent’s second table row.
Parameters: - row (lxml.HtmlElement) – row to parse
- url_instance (urls.Url) – Url used to combine base urls with scraped links from the tr
Returns: scraped category, subcategory, quality, language, user, user url, torrent link, size, comments, times completed, seeders and leechers
Return type: list
- static parse_torrent_link(table_data)
Static method that parses a list of table data elements, finds all anchor elements and gets the torrent url. The torrent url is usually hidden behind a fake spam ad url; this case is handled.
Parameters: table_data (list of lxml.HtmlElement) – table_data tag to parse
Returns: torrent url from the anchor (link) element
Return type: str
- static parse_torrent_properties(table_datas)
Static method that parses a given list of table data elements and collects torrent properties using the helper methods Parser.is_subcategory, Parser.is_quality and Parser.is_language.
Parameters: table_datas (list of lxml.HtmlElement) – table_datas to parse
Returns: identified category, subcategory, quality and languages
Return type: dict