scrachy.http_.CachedHtmlResponse
- class scrachy.http_.CachedHtmlResponse(*args: Any, **kwargs: Any)
Bases: CachedResponseMixin, HtmlResponse
A subclass of scrapy.http.HtmlResponse that contains a subset of the extra information stored in the cache.
Parameters:
scrape_timestamp – The most recent date the request was scraped.
body_number_of_bytes – The total number of bytes of the downloaded HTML.
text_number_of_bytes – The number of bytes in the extracted plain text.
body_text – The text extracted from the HTML.
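To make the four cached fields concrete, the sketch below derives comparable values from a raw HTML body using only the standard library. It is an illustration under assumptions, not scrachy's implementation: `CachedFields`, `describe`, and `_TextCollector` are hypothetical names, and the way scrachy actually extracts text from HTML is not shown in this reference and may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


@dataclass
class CachedFields:  # hypothetical container, not a scrachy class
    scrape_timestamp: datetime
    body_number_of_bytes: int
    text_number_of_bytes: int
    body_text: str


def describe(body: bytes, encoding: str = "utf-8") -> CachedFields:
    """Compute values analogous to the documented parameters."""
    collector = _TextCollector()
    collector.feed(body.decode(encoding))
    collector.close()
    text = " ".join(collector.parts)
    return CachedFields(
        scrape_timestamp=datetime.now(timezone.utc),
        body_number_of_bytes=len(body),                   # size of the raw download
        text_number_of_bytes=len(text.encode(encoding)),  # size of the plain text
        body_text=text,
    )


fields = describe(b"<html><body><h1>Hi</h1><p>caf\xc3\xa9</p></body></html>")
```

Note that both sizes are byte counts, so a non-ASCII character such as "é" contributes two bytes under UTF-8 even though it is one character.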
- __init__(*args, **kwargs)
A subclass of scrapy.http.HtmlResponse that contains a subset of the extra information stored in the cache.
Parameters:
scrape_timestamp – The most recent date the request was scraped.
body_number_of_bytes – The total number of bytes of the downloaded HTML.
text_number_of_bytes – The number of bytes in the extracted plain text.
body_text – The text extracted from the HTML.
Methods
__init__(*args, **kwargs) – A subclass of scrapy.http.HtmlResponse that contains a subset of the extra information stored in the cache.
copy() – Return a copy of this Response.
css(query) – Shortcut method implemented only by responses whose content is text (subclasses of TextResponse).
follow(url[, callback, method, headers, ...]) – Return a Request instance to follow a link url.
follow_all([urls, callback, method, ...]) – A generator that produces Request instances to follow all links in urls.
jmespath(query, **kwargs) – Shortcut method implemented only by responses whose content is text (subclasses of TextResponse).
json() – Deserialize a JSON document to a Python object. New in version 2.2.
replace(*args, **kwargs) – Create a new Response with the same attributes except for those given new values.
urljoin(url) – Join this Response's url with a possible relative url to form an absolute interpretation of the latter.
xpath(query, **kwargs) – Shortcut method implemented only by responses whose content is text (subclasses of TextResponse).
Attributes
attributes – A tuple of str objects containing the name of all public attributes of the class that are also keyword parameters of the __init__ method.
body
cb_kwargs
encoding
meta
selector
text – Body as unicode
url
- attributes: Tuple[str, ...] = ('url', 'status', 'headers', 'body', 'flags', 'request', 'certificate', 'ip_address', 'protocol', 'encoding')
A tuple of str objects containing the name of all public attributes of the class that are also keyword parameters of the __init__ method.
Currently used by Response.replace().
- copy()
Return a copy of this Response
- css(query)
Shortcut for response.selector.css(query). Implemented only by responses whose content is text (subclasses of TextResponse).
- follow(url, callback=None, method='GET', headers=None, body=None, cookies=None, meta=None, encoding=None, priority=0, dont_filter=False, errback=None, cb_kwargs=None, flags=None) -> Request
Return a Request instance to follow a link url. It accepts the same arguments as the Request.__init__ method, but url can be not only an absolute URL, but also:
    - a relative URL
    - a Link object, e.g. the result of Link Extractors
    - a Selector object for a <link> or <a> element, e.g. response.css('a.my_link')[0]
    - an attribute Selector (not SelectorList), e.g. response.css('a::attr(href)')[0] or response.xpath('//img/@src')[0]
See "A shortcut for creating Requests" for usage examples.
- follow_all(urls=None, callback=None, method='GET', headers=None, body=None, cookies=None, meta=None, encoding=None, priority=0, dont_filter=False, errback=None, cb_kwargs=None, flags=None, css=None, xpath=None) -> Generator[Request, None, None]
A generator that produces Request instances to follow all links in urls. It accepts the same arguments as the Request's __init__ method, except that each urls element does not need to be an absolute URL; it can be any of the following:
    - a relative URL
    - a Link object, e.g. the result of Link Extractors
    - a Selector object for a <link> or <a> element, e.g. response.css('a.my_link')[0]
    - an attribute Selector (not SelectorList), e.g. response.css('a::attr(href)')[0] or response.xpath('//img/@src')[0]
In addition, css and xpath arguments are accepted to perform the link extraction within the follow_all method (only one of urls, css and xpath is accepted).
Note that when passing a SelectorList as argument for the urls parameter or using the css or xpath parameters, this method will not produce requests for selectors from which links cannot be obtained (for instance, anchor tags without an href attribute).
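The skipping rule described in the note above can be illustrated without Scrapy. The following is a minimal standard-library sketch, not Scrapy's implementation: `_AnchorHrefs` and `followable_urls` are hypothetical helpers that collect anchor links, drop anchors that have no href attribute, and resolve the remaining links against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _AnchorHrefs(HTMLParser):
    """Record the href of every <a> tag that actually has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href is not None:  # anchors without href yield no link
            self.hrefs.append(href)


def followable_urls(html, base_url):
    """Return absolute URLs for the anchors a link follower could use."""
    parser = _AnchorHrefs()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, href) for href in parser.hrefs]


page = '<a href="/about">About</a><a name="top">Top</a><a href="page2.html">Next</a>'
urls = followable_urls(page, "https://example.com/docs/index.html")
```

Here the middle anchor has only a name attribute, so it contributes nothing to `urls`, mirroring how follow_all produces no request for selectors from which a link cannot be obtained.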
- jmespath(query, **kwargs)
Shortcut for response.selector.jmespath(query, **kwargs). Implemented only by responses whose content is text (subclasses of TextResponse).
- json()
New in version 2.2.
Deserialize a JSON document to a Python object.
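As a rough picture of what deserializing a response body involves, the sketch below parses a JSON body with the standard library. The `body` value is made up for illustration, and the assumption that the result is equivalent to calling json.loads on the response body is mine rather than something stated in this reference.

```python
import json

# A made-up response body, as bytes, the way a downloaded body would arrive.
body = b'{"title": "Example", "tags": ["a", "b"]}'

# json.loads accepts bytes as well as str and returns plain Python objects.
data = json.loads(body)
```

The result is an ordinary dict, so `data["tags"]` is a Python list.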
- replace(*args, **kwargs)
Create a new Response with the same attributes except for those given new values
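The relationship between the attributes tuple and replace() can be sketched with a small stand-in class. `MiniResponse` is hypothetical and much simpler than Scrapy's Response; it only shows the general pattern of copying every listed attribute unless the caller supplies a new value.

```python
class MiniResponse:  # hypothetical stand-in, not Scrapy's Response
    # Names that are both public attributes and __init__ keyword parameters.
    attributes = ("url", "status", "body")

    def __init__(self, url, status=200, body=b""):
        self.url = url
        self.status = status
        self.body = body

    def replace(self, **kwargs):
        """Build a new object, reusing current values for anything not given."""
        for name in self.attributes:
            kwargs.setdefault(name, getattr(self, name))
        return type(self)(**kwargs)


original = MiniResponse("https://example.com", body=b"<p>old</p>")
updated = original.replace(body=b"<p>new</p>")
```

The original object is left unchanged; `updated` is a separate instance that shares the url and status but carries the new body.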
- property text: str
Body as unicode
- urljoin(url)
Join this Response’s url with a possible relative url to form an absolute interpretation of the latter.
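The joining rule follows standard URL resolution, which can be tried out with urllib.parse.urljoin from the standard library. This is a sketch of the behavior rather than a call into Scrapy; the URLs are made up, and `base` stands in for the response's url.

```python
from urllib.parse import urljoin

base = "https://example.com/docs/guide/index.html"  # stands in for response.url

relative = urljoin(base, "../api/response.html")        # resolved against the base path
root_relative = urljoin(base, "/about")                 # resolved against the host
absolute = urljoin(base, "https://other.example/page")  # already absolute, kept as-is
```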
- xpath(query, **kwargs)
Shortcut for response.selector.xpath(query, **kwargs). Implemented only by responses whose content is text (subclasses of TextResponse).