scrachy.http_.SeleniumRequest

class scrachy.http_.SeleniumRequest(*args: Any, **kwargs: Any)

Bases: Request

A subclass of scrapy.http.Request that provides extra information for downloading pages using Selenium.

Based on the code from Scrapy-Selenium.

A new SeleniumRequest.

Parameters:
  • wait_timeout – The number of seconds to wait before accessing the data.

  • wait_until – One of the “selenium.webdriver.support.expected_conditions”. The response will not be returned until the given condition is fulfilled.

  • screenshot – If True, a screenshot of the page will be taken and the screenshot data will be returned in the response “meta” attribute.

  • script_executor – A function that takes a webdriver and a response as its parameters and optionally returns a list of new response objects as a side effect of its actions (e.g., executing arbitrary JavaScript code on the page). Any returned responses are stored in the request.meta attribute under the key script_result. Note that the returned responses will not be further processed by any other middleware.
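To illustrate wait_until: Selenium's expected conditions are callables that receive the driver and return a truthy value once the condition holds. The sketch below writes a custom condition in that same shape. The names document_is_ready, FakeDriver and poll are hypothetical, the polling loop is a simplified stand-in for a real wait, and whether scrachy accepts arbitrary callables in addition to the built-in conditions is an assumption.

```python
from typing import Any, Callable, List


def document_is_ready(driver: Any) -> bool:
    """Return True once the browser reports the document as fully loaded."""
    # execute_script is the standard WebDriver method for running JavaScript.
    return driver.execute_script("return document.readyState") == "complete"


class FakeDriver:
    """Hypothetical stand-in for a WebDriver so the sketch runs without a browser."""

    def __init__(self, states: List[str]) -> None:
        self._states = iter(states)

    def execute_script(self, script: str) -> str:
        # Each call reports the next scripted readyState value.
        return next(self._states)


def poll(condition: Callable[[Any], bool], driver: Any, attempts: int) -> bool:
    """Simplified polling: evaluate the condition up to `attempts` times."""
    return any(condition(driver) for _ in range(attempts))
```

With scrachy installed, such a callable would presumably be passed as SeleniumRequest(url="https://example.org", wait_timeout=10, wait_until=document_is_ready).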

__init__(wait_timeout: float | None = None, wait_until: WaitCondition | None = None, screenshot: bool = False, script_executor: ScriptExecutor | None = None, *args, **kwargs)

A new SeleniumRequest.

Parameters:
  • wait_timeout – The number of seconds to wait before accessing the data.

  • wait_until – One of the “selenium.webdriver.support.expected_conditions”. The response will not be returned until the given condition is fulfilled.

  • screenshot – If True, a screenshot of the page will be taken and the screenshot data will be returned in the response “meta” attribute.

  • script_executor – A function that takes a webdriver and a response as its parameters and optionally returns a list of new response objects as a side effect of its actions (e.g., executing arbitrary JavaScript code on the page). Any returned responses are stored in the request.meta attribute under the key script_result. Note that the returned responses will not be further processed by any other middleware.
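A minimal sketch of a script_executor, assuming only what the parameter description states: it receives a webdriver and a response, and may return nothing. The names scroll_to_bottom and RecordingDriver are hypothetical; RecordingDriver merely records calls so the example runs without a browser.

```python
from typing import Any, List, Optional


def scroll_to_bottom(driver: Any, response: Any) -> Optional[List[Any]]:
    """Scroll to the end of the page, e.g. to trigger lazy-loaded content."""
    # execute_script is the standard WebDriver method for running JavaScript.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # No extra responses are produced, so nothing is returned.
    return None


class RecordingDriver:
    """Hypothetical stand-in for a WebDriver that records executed scripts."""

    def __init__(self) -> None:
        self.scripts: List[str] = []

    def execute_script(self, script: str) -> None:
        self.scripts.append(script)


driver = RecordingDriver()
result = scroll_to_bottom(driver, response=None)
```

Following the __init__ signature above, the function would be supplied as SeleniumRequest(url="https://example.org", screenshot=True, script_executor=scroll_to_bottom).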

Methods

__init__([wait_timeout, wait_until, ...])

A new SeleniumRequest.

copy()

from_curl(curl_command[, ignore_unknown_options])

Create a Request object from a string containing a cURL command.

replace(*args, **kwargs)

Create a new Request with the same attributes except for those given new values.

to_dict(*[, spider])

Return a dictionary containing the Request's data.

Attributes

attributes

A tuple of str objects containing the name of all public attributes of the class that are also keyword parameters of the __init__ method.

body

cb_kwargs

encoding

meta

url

attributes: Tuple[str, ...] = ('url', 'callback', 'method', 'headers', 'body', 'cookies', 'meta', 'encoding', 'priority', 'dont_filter', 'errback', 'flags', 'cb_kwargs')

A tuple of str objects containing the name of all public attributes of the class that are also keyword parameters of the __init__ method.

Currently used by Request.replace(), Request.to_dict() and request_from_dict().

classmethod from_curl(curl_command: str, ignore_unknown_options: bool = True, **kwargs) → RequestTypeVar

Create a Request object from a string containing a cURL command. It populates the HTTP method, the URL, the headers, the cookies and the body. It accepts the same arguments as the Request class; these take precedence over, and override, the values of the same arguments contained in the cURL command.

Unrecognized options are ignored by default. To raise an error when finding unknown options call this method by passing ignore_unknown_options=False.
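The precedence rule can be pictured with a simplified, standard-library stand-in. This is not Scrapy's implementation: curl_to_kwargs is a hypothetical helper that handles only a few cURL options, ignores unknown ones, and lets explicit keyword arguments override whatever was parsed from the command.

```python
import argparse
import shlex
from typing import Any, Dict


def curl_to_kwargs(curl_command: str, **overrides: Any) -> Dict[str, Any]:
    """Parse a small subset of cURL options; explicit keyword arguments win."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("url")
    parser.add_argument("-X", "--request", dest="method")
    parser.add_argument("-H", "--header", dest="headers", action="append", default=[])
    parser.add_argument("-d", "--data", dest="body")

    tokens = shlex.split(curl_command)
    if tokens and tokens[0] == "curl":
        tokens = tokens[1:]
    # parse_known_args sets unrecognized options aside instead of failing,
    # which mirrors the "ignored by default" behaviour described above.
    args, _unknown = parser.parse_known_args(tokens)

    headers = {}
    for header in args.headers:
        name, _, value = header.partition(":")
        headers[name.strip()] = value.strip()

    parsed: Dict[str, Any] = {
        "url": args.url,
        "method": args.method or ("POST" if args.body else "GET"),
        "headers": headers,
    }
    if args.body is not None:
        parsed["body"] = args.body
    # Values passed explicitly override those found in the cURL command.
    return {**parsed, **overrides}
```

With Scrapy itself, the equivalent call would be Request.from_curl(curl_command, method="PUT"), where method="PUT" replaces the method given in the command.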

Caution

Using from_curl() from Request subclasses, such as JsonRequest or XmlRpcRequest, or having downloader middlewares and spider middlewares enabled, such as DefaultHeadersMiddleware, UserAgentMiddleware, or HttpCompressionMiddleware, may modify the Request object.

To translate a cURL command into a Scrapy request, you may use curl2scrapy.

replace(*args, **kwargs) → Request

Create a new Request with the same attributes except for those given new values.

to_dict(*, spider: Spider | None = None) → dict

Return a dictionary containing the Request’s data.

Use request_from_dict() to convert back into a Request object.

If a spider is given, this method will try to find the names of the spider methods used as callback and errback and include them in the output dict, raising an exception if they cannot be found.
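The callback lookup can be sketched with plain Python. This is an illustration of the idea, not Scrapy's code: DemoSpider, callback_name and request_to_dict are hypothetical names, and raising ValueError is a choice made for this sketch.

```python
from typing import Any, Callable, Dict, Optional


class DemoSpider:
    """Hypothetical spider-like object with a single callback method."""

    name = "demo"

    def parse_item(self, response: Any) -> None:
        return None


def callback_name(spider: Any, func: Optional[Callable]) -> Optional[str]:
    """Return the method name if `func` is a bound method of `spider`."""
    if func is None:
        return None
    # A bound method exposes the instance it belongs to as __self__.
    if getattr(func, "__self__", None) is spider:
        return func.__name__
    raise ValueError(f"{func!r} is not a method of {spider!r}")


def request_to_dict(url: str, callback: Optional[Callable], spider: Any) -> Dict[str, Any]:
    """Serialize the callback by name so that the dict holds only plain data."""
    return {"url": url, "callback": callback_name(spider, callback)}


spider = DemoSpider()
data = request_to_dict("https://example.org", spider.parse_item, spider)
```

Storing the method name rather than the function object is what lets the dictionary be persisted and later turned back into a request against the same spider.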