scrachy.http_.SeleniumRequest
- class scrachy.http_.SeleniumRequest(*args: Any, **kwargs: Any)[source]
Bases: Request
A subclass of scrapy.http.Request that provides extra information for downloading pages using Selenium.
Based on the code from Scrapy-Selenium.
A new SeleniumRequest.
- Parameters:
wait_timeout – The number of seconds to wait before accessing the data.
wait_until – One of the conditions in selenium.webdriver.support.expected_conditions. The response will not be returned until the given condition is fulfilled.
screenshot – If True, a screenshot of the page will be taken and the screenshot data will be returned in the response's meta attribute.
script_executor – A function that takes a webdriver and a response as its parameters and optionally returns a list of new response objects as a side effect of its actions (e.g., executing arbitrary JavaScript code on the page). Any returned responses are stored in the request.meta attribute under the key script_result. Note that the returned responses will not be further processed by any other middleware.
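The script_executor contract described above (a callable that receives a webdriver and a response and may return a list of responses or nothing) can be sketched as follows. This is a minimal illustration only: scroll_to_bottom and FakeDriver are hypothetical names, and FakeDriver is a stand-in so the sketch runs without a browser. The only real API assumed is Selenium's WebDriver.execute_script method.

```python
from typing import Any, List, Optional


def scroll_to_bottom(driver: Any, response: Any) -> Optional[List[Any]]:
    """Hypothetical script executor: scroll the page and return no extra responses."""
    # execute_script is the standard Selenium WebDriver method for running JavaScript.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    return None


class FakeDriver:
    """Stand-in for a Selenium WebDriver (assumption, for illustration only)."""

    def __init__(self) -> None:
        self.scripts: List[str] = []

    def execute_script(self, script: str, *args: Any) -> None:
        # Record the script instead of running it in a browser.
        self.scripts.append(script)


driver = FakeDriver()
result = scroll_to_bottom(driver, response=None)
```

In a spider, such a function would presumably be passed as SeleniumRequest(url=..., script_executor=scroll_to_bottom), with url given as a keyword because the documented signature places wait_timeout first.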
- __init__(wait_timeout: float | None = None, wait_until: WaitCondition | None = None, screenshot: bool = False, script_executor: ScriptExecutor | None = None, *args, **kwargs)[source]
Methods
__init__([wait_timeout, wait_until, ...]) – A new SeleniumRequest.
copy()
from_curl(curl_command[, ignore_unknown_options]) – Create a Request object from a string containing a cURL command.
replace(*args, **kwargs) – Create a new Request with the same attributes except for those given new values.
to_dict(*[, spider]) – Return a dictionary containing the Request's data.
Attributes
attributes – A tuple of str objects containing the name of all public attributes of the class that are also keyword parameters of the __init__ method.
body
cb_kwargs
encoding
meta
url
- attributes: Tuple[str, ...] = ('url', 'callback', 'method', 'headers', 'body', 'cookies', 'meta', 'encoding', 'priority', 'dont_filter', 'errback', 'flags', 'cb_kwargs')
A tuple of str objects containing the name of all public attributes of the class that are also keyword parameters of the __init__ method.
Currently used by Request.replace(), Request.to_dict() and request_from_dict().
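To illustrate how a tuple like attributes can drive replace() and to_dict(), here is a simplified sketch. MiniRequest is a hypothetical stand-in with only three attributes; it is not Scrapy's Request class and omits most of its behaviour.

```python
from typing import Any, Dict, Optional, Tuple


class MiniRequest:
    """Hypothetical stand-in for a Request (not Scrapy's actual class)."""

    # Names of public attributes that are also keyword parameters of __init__.
    attributes: Tuple[str, ...] = ("url", "method", "meta")

    def __init__(self, url: str, method: str = "GET",
                 meta: Optional[Dict[str, Any]] = None) -> None:
        self.url = url
        self.method = method
        self.meta = meta or {}

    def replace(self, **kwargs: Any) -> "MiniRequest":
        # Copy every attribute the caller did not override, then rebuild.
        for name in self.attributes:
            kwargs.setdefault(name, getattr(self, name))
        return self.__class__(**kwargs)

    def to_dict(self) -> Dict[str, Any]:
        # Serialize exactly the attributes listed in the tuple.
        return {name: getattr(self, name) for name in self.attributes}


original = MiniRequest("https://example.com", meta={"depth": 1})
updated = original.replace(method="POST")
```

Because both methods iterate over the same tuple, a subclass that adds a constructor parameter only needs to extend attributes for the new value to survive copying and serialization.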
- classmethod from_curl(curl_command: str, ignore_unknown_options: bool = True, **kwargs) → RequestTypeVar
Create a Request object from a string containing a cURL command. It populates the HTTP method, the URL, the headers, the cookies and the body. It accepts the same arguments as the Request class, taking preference and overriding the values of the same arguments contained in the cURL command.
Unrecognized options are ignored by default. To raise an error when finding unknown options call this method by passing ignore_unknown_options=False.
Caution
Using from_curl() from Request subclasses, such as JSONRequest, or XmlRpcRequest, as well as having downloader middlewares and spider middlewares enabled, such as DefaultHeadersMiddleware, UserAgentMiddleware, or HttpCompressionMiddleware, may modify the Request object.
To translate a cURL command into a Scrapy request, you may use curl2scrapy.
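The idea behind from_curl() and its ignore_unknown_options flag can be sketched with the standard library alone. The helper below, curl_to_kwargs, is hypothetical and handles only a small subset of cURL syntax (URL, -X, -H, -d); it is a rough illustration of the behaviour described above, not Scrapy's parser.

```python
import argparse
import shlex
from typing import Any, Dict


def curl_to_kwargs(curl_command: str, ignore_unknown_options: bool = True) -> Dict[str, Any]:
    """Hypothetical helper: turn a small subset of cURL syntax into request kwargs."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("url")
    parser.add_argument("-X", "--request", dest="method")
    parser.add_argument("-H", "--header", dest="headers", action="append", default=[])
    parser.add_argument("-d", "--data", dest="body")

    tokens = shlex.split(curl_command)
    if not tokens or tokens[0] != "curl":
        raise ValueError("command must start with 'curl'")

    # parse_known_args collects options the parser does not recognize.
    parsed, unknown = parser.parse_known_args(tokens[1:])
    if unknown and not ignore_unknown_options:
        raise ValueError(f"unrecognized options: {unknown}")

    headers = {}
    for header in parsed.headers:
        name, _, value = header.partition(":")
        headers[name.strip()] = value.strip()

    # Default to POST when a body is given without an explicit method.
    method = parsed.method or ("POST" if parsed.body else "GET")
    return {"url": parsed.url, "method": method.upper(),
            "headers": headers, "body": parsed.body}


example = curl_to_kwargs("curl https://example.com -H 'Accept: text/html' --compressed")
```

With the default setting the unknown --compressed flag is silently skipped; passing ignore_unknown_options=False to this sketch raises a ValueError instead.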
- replace(*args, **kwargs) → Request
Create a new Request with the same attributes except for those given new values.
- to_dict(*, spider: Spider | None = None) → dict
Return a dictionary containing the Request's data.
Use request_from_dict() to convert back into a Request object.
If a spider is given, this method will try to find out the name of the spider methods used as callback and errback and include them in the output dict, raising an exception if they cannot be found.