scrachy.http_.SeleniumRequest
- class scrachy.http_.SeleniumRequest(*args: Any, **kwargs: Any)
Bases: Request
A subclass of scrapy.http.Request that provides extra information for downloading pages using Selenium. Based on the code from Scrapy-Selenium.
A new SeleniumRequest.
- Parameters:
wait_timeout – The number of seconds to wait before accessing the data.
wait_until – One of the conditions in selenium.webdriver.support.expected_conditions. The response will not be returned until the given condition is fulfilled.
screenshot – If True, a screenshot of the page will be taken and the screenshot data will be returned in the response's meta attribute.
script_executor – A function that takes a webdriver and a response as its parameters and optionally returns a list of new response objects as a side effect of its actions (e.g., executing arbitrary JavaScript code on the page). Any returned responses are stored in the request.meta attribute under the key script_result. Note that the returned responses will not be further processed by any other middleware.
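As an illustration of the script_executor contract described above, here is a minimal sketch of a function that takes a webdriver and a response, runs some JavaScript, and returns no extra responses. The name scroll_to_bottom and the FakeDriver stand-in are hypothetical and exist only so the sketch runs without a browser; execute_script is the standard Selenium WebDriver method for running JavaScript.

```python
from typing import Any, List, Optional


def scroll_to_bottom(driver: Any, response: Any) -> Optional[List[Any]]:
    """Hypothetical script executor: scroll the rendered page to the bottom."""
    # Run JavaScript in the page through the driver.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Returning a list is optional; this sketch produces no new responses.
    return None


class FakeDriver:
    """Stand-in for a Selenium WebDriver (assumption, for demonstration only)."""

    def __init__(self) -> None:
        self.scripts: List[str] = []

    def execute_script(self, script: str, *args: Any) -> None:
        # Record the script instead of executing it in a real browser.
        self.scripts.append(script)


driver = FakeDriver()
result = scroll_to_bottom(driver, response=None)
print(result, len(driver.scripts))
```

A function like this would presumably be passed as script_executor=scroll_to_bottom when building the request. Because the Selenium options come first in the __init__ signature, passing the URL by keyword (url=...) looks like the safer choice.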
- __init__(wait_timeout: float | None = None, wait_until: WaitCondition | None = None, screenshot: bool = False, script_executor: ScriptExecutor | None = None, *args, **kwargs)
Methods
__init__([wait_timeout, wait_until, ...]) – A new SeleniumRequest.
copy()
from_curl(curl_command[, ignore_unknown_options]) – Create a Request object from a string containing a cURL command.
replace(*args, **kwargs) – Create a new Request with the same attributes except for those given new values.
to_dict(*[, spider]) – Return a dictionary containing the Request's data.
Attributes
attributes – A tuple of str objects containing the name of all public attributes of the class that are also keyword parameters of the __init__ method.
body
cb_kwargs
encoding
meta
url
- attributes: Tuple[str, ...] = ('url', 'callback', 'method', 'headers', 'body', 'cookies', 'meta', 'encoding', 'priority', 'dont_filter', 'errback', 'flags', 'cb_kwargs')
A tuple of str objects containing the name of all public attributes of the class that are also keyword parameters of the __init__ method.
Currently used by Request.replace(), Request.to_dict() and request_from_dict().
- classmethod from_curl(curl_command: str, ignore_unknown_options: bool = True, **kwargs) → RequestTypeVar
Create a Request object from a string containing a cURL command. It populates the HTTP method, the URL, the headers, the cookies and the body. It accepts the same arguments as the Request class, taking preference and overriding the values of the same arguments contained in the cURL command.
Unrecognized options are ignored by default. To raise an error when finding unknown options, call this method by passing ignore_unknown_options=False.
Caution
Using from_curl() from Request subclasses, such as JSONRequest or XmlRpcRequest, as well as having downloader middlewares and spider middlewares enabled, such as DefaultHeadersMiddleware, UserAgentMiddleware, or HttpCompressionMiddleware, may modify the Request object.
To translate a cURL command into a Scrapy request, you may use curl2scrapy.
- replace(*args, **kwargs) → Request
Create a new Request with the same attributes except for those given new values.
- to_dict(*, spider: Spider | None = None) → dict
Return a dictionary containing the Request’s data.
Use request_from_dict() to convert back into a Request object.
If a spider is given, this method will try to find out the name of the spider methods used as callback and errback and include them in the output dict, raising an exception if they cannot be found.