scrachy.settings.defaults.storage

The default settings for configuring AlchemyCacheStorage.

Module Attributes

SCRACHY_CACHE_ACTIVATION_SECS

Treat any cached page as stale (do not retrieve it) until it has been in the cache for at least this many seconds.

SCRACHY_CACHE_ACTIVATION_SECS_PATTERNS

A list of tuples consisting of a pattern and a delay time in seconds.

SCRACHY_CACHE_EXPIRATION_SECS_PATTERNS

Similar to SCRACHY_CACHE_ACTIVATION_SECS_PATTERNS, but overrides HTTPCACHE_EXPIRATION_SECS for matching URLs.
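
As a rough illustration of how the activation and expiration settings above might interact, the following sketch checks whether a cached page of a given age would be used. It is not the library's implementation. It assumes that patterns are regular expressions, that each tuple is ordered (pattern, seconds), that the first matching pattern wins, and that an expiration of 0 means "never expire" as it does for Scrapy's HTTPCACHE_EXPIRATION_SECS. The helper names secs_for and is_usable and all values are hypothetical.

```python
import re

# Hypothetical values for illustration only. Patterns are assumed to be
# regular expressions and tuples are assumed to be (pattern, seconds).
ACTIVATION_SECS = 0
ACTIVATION_SECS_PATTERNS = [(r"/live/", 300)]
EXPIRATION_SECS = 86400
EXPIRATION_SECS_PATTERNS = [(r"/archive/", 0)]  # assumed: 0 = never expire


def secs_for(url, default, patterns):
    """Return the seconds of the first pattern found in url, else default."""
    for pattern, secs in patterns:
        if re.search(pattern, url):
            return secs
    return default


def is_usable(url, age_secs):
    """Sketch of a freshness check for a cached page that is age_secs old."""
    activation = secs_for(url, ACTIVATION_SECS, ACTIVATION_SECS_PATTERNS)
    expiration = secs_for(url, EXPIRATION_SECS, EXPIRATION_SECS_PATTERNS)
    if age_secs < activation:
        return False  # not yet active, so treated as stale
    if 0 < expiration < age_secs:
        return False  # older than the expiration time
    return True
```

Under these assumptions, a page matching /live/ is ignored for its first five minutes in the cache, while a page matching /archive/ is never expired by age.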

SCRACHY_CACHE_EXPIRATION_SCHEDULE

Expire all cached responses that do not match a schedule pattern according to this schedule.

SCRACHY_CACHE_EXPIRATION_SCHEDULE_PATTERNS

Expire any response whose URL matches the given pattern according to the corresponding schedule.

SCRACHY_CACHE_DEFAULT_ENCODING

Sometimes it is not possible to determine the encoding of a page because it was not set properly at the source.

SCRACHY_CACHE_RESPONSE_RETRIEVAL_METHOD

The cache stores quite a bit of information about each response.

SCRACHY_DB_DIALECT

This specifies the database dialect to use, which must be supported by SQLAlchemy.

SCRACHY_DB_DRIVER

This specifies the name of the driver used to connect to the database.

SCRACHY_DB_HOST

The hostname (or IP address) where the database server is running.

SCRACHY_DB_PORT

The port number the database server is listening on.

SCRACHY_DB_DATABASE

For SQLite, this is the path to the database file, which will be created if it does not already exist.

SCRACHY_DB_SCHEMA

This will set the schema for databases that support them (e.g., PostgreSQL).

SCRACHY_DB_USERNAME

The username used to connect to the database.

SCRACHY_DB_PASSWORD

The password (if any) used to connect to the database.

SCRACHY_DB_CONNECT_ARGS

Any other arguments that should be passed to sqla.create_engine().
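
The database settings above correspond to the parts of a SQLAlchemy database URL, which has the general form dialect+driver://username:password@host:port/database. The sketch below shows how such a URL could be assembled from these values. How the library itself builds its URL is not stated here, so build_url is a hypothetical helper and the example values are made up. SCRACHY_DB_SCHEMA and SCRACHY_DB_CONNECT_ARGS are not part of the URL and are left out.

```python
from urllib.parse import quote_plus


def build_url(dialect, driver=None, username=None, password=None,
              host=None, port=None, database=None):
    """Assemble a SQLAlchemy-style URL string from its parts (a sketch)."""
    scheme = f"{dialect}+{driver}" if driver else dialect

    # Credentials are percent-encoded so special characters stay valid.
    auth = ""
    if username:
        auth = quote_plus(username)
        if password:
            auth += ":" + quote_plus(password)
        auth += "@"

    netloc = host or ""
    if port:
        netloc += f":{port}"

    path = f"/{database}" if database else ""
    return f"{scheme}://{auth}{netloc}{path}"


# A file-based SQLite database needs only the dialect and the file path.
sqlite_url = build_url("sqlite", database="cache.db")

# A server-based database uses every part (all values are hypothetical).
postgres_url = build_url("postgresql", "psycopg2", "scraper", "s3cret",
                         "localhost", 5432, "scrachy")
```

With these inputs, sqlite_url is "sqlite:///cache.db" and postgres_url is "postgresql+psycopg2://scraper:s3cret@localhost:5432/scrachy".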

SCRACHY_CACHE_SAVE_HISTORY

Whether to store the full scrape history for each page (identified by its fingerprint).

SCRACHY_CONTENT_EXTRACTOR

A class implementing the ContentExtractor protocol or an import path to a class implementing it.
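
A setting that accepts either a class or an import path is typically resolved by importing the dotted path when a string is given. The sketch below shows that general technique using only the standard library. It is an assumption about how such a value could be handled, not the library's own loader, and resolve_class is a hypothetical name. A standard library class stands in for an extractor because the ContentExtractor protocol's methods are not described here.

```python
from importlib import import_module


def resolve_class(value):
    """Return value unchanged if it is a class, else import it by path."""
    if isinstance(value, str):
        # Split "package.module.ClassName" into the module and class name.
        module_path, _, class_name = value.rpartition(".")
        return getattr(import_module(module_path), class_name)
    return value


# Both forms yield a class object (stand-in classes, not real extractors).
from_path = resolve_class("collections.OrderedDict")
from_class = resolve_class(dict)
```

Using an import path keeps the settings module free of imports, which is useful when the setting is supplied from the command line or a configuration file.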

SCRACHY_CONTENT_BS4_PARSER

The parser to use for constructing the DOM.

SCRACHY_BOILERPY_EXTRACTOR

A boilerpy Extractor class or the import path to one of the classes.