Table Config#
“Table config” refers to options which are valid at the per-table level, such as:
backup:
  tables:
    - name: foo # <--- here
Remember that any option defined below can be specified at more general levels of config as a fallback for options which are the same across many/all tables.
All options are intentionally identical, and in most cases are valid for both backup and restore commands, unless specifically noted otherwise.
name#
When tables are given as a mapping, defaults to the key-side of the mapping.
When tables are given as a list, this field is required!
The name field defines the matching criteria for tables which should be backed up/restored. For the simplest case, this can just be the name (or {schema}.{name}) of the table. However, this name can also be “globbed” to match multiple tables!
Note
Remember each item in this list has a corresponding query field which can be an arbitrary query. This means that you can utilize the same name more than once. It’s just a way of matching the source table that each iteration should target.
# This
tables:
  public.foo:
  '*.*':
# Is the same as this
tables:
  - name: public.foo
  - name: '*.*'
# Is the same as this
tables:
  - public.foo
  - '*.*'
Note
name can also be omitted entirely, with some caveats. The “name” field populates the {table} value templated into queries and location paths (both of which default to including the {table} template value).
Thus, if you omit the “name” field, you must also provide a concrete “query” and “location” field.
tables:
  - query: select * from for_example_a_view
    location: backups/public.for_example_a_view
Globbing#
Using common globbing rules:
Pattern | Meaning |
---|---|
* | matches everything |
? | matches any single character |
[seq] | matches any character in seq |
[!seq] | matches any character not in seq |
For some common examples:
public.*: All tables in a schema
*.foo: Tables with a given name in all schemas
*_log: All tables ending with some suffix
*_*_log: Multiple globs
See also the exclude key below.
Note
Globbing was chosen over regex as a simpler way of quickly matching table names that is easy to grok. It’s conceivable that regex matching could be supported in the future, but globs with exclusions should cover most common cases.
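The pattern table above follows the same rules as Python's standard fnmatch module, so the common examples can be tried out with a minimal sketch. This is only an illustration: the table names and the match_tables helper are hypothetical, and databudgie's own matching code may differ in its details.

```python
from fnmatch import fnmatchcase

# Hypothetical "{schema}.{name}" table names, for illustration only.
TABLES = [
    "public.foo",
    "public.audit_log",
    "sales.foo",
    "sales.order_item_log",
]


def match_tables(pattern, tables):
    """Return the tables whose full name matches the glob pattern."""
    return [t for t in tables if fnmatchcase(t, pattern)]


print(match_tables("public.*", TABLES))   # every table in the public schema
print(match_tables("*.foo", TABLES))      # tables named foo in any schema
print(match_tables("*_log", TABLES))      # tables ending in _log
print(match_tables("*_*_log", TABLES))    # only names with two underscores
print(match_tables("[!p]*.foo", TABLES))  # foo tables whose schema does not start with p
```

Note that * also matches the dot between schema and table name, which is why a pattern such as '*.*' can match every table.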
location#
Defaults to backups/{table}.
location paths use URI protocols for determining (on a per-path basis) what protocol to use for the backup/restore of that path.
Tip
Output files are separated into table-specific folders by default through {table}. They can be colocated in a single folder by removing that template source, e.g. backups/.
Local files#
Note that an otherwise unadorned path will be assumed to be a local file path, for example path/to/folder.
For backups, if the path leading up to the leaf folder does not yet exist, it will be automatically created.
S3#
A path is identified as an “S3 path” when it is prefixed with the S3 protocol: s3://.
For example, s3://bucket/path/to/folder references a path path/to/folder inside of a bucket bucket.
S3 paths make use of the s3 config for authorization against the included bucket. Alternatively, the common environment variables recognized by the aws CLI (e.g. AWS_PROFILE, AWS_REGION, AWS_SECRET_ACCESS_KEY, AWS_ACCESS_KEY_ID, etc.) will be automatically read.
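To make the local-versus-S3 distinction concrete, here is a minimal sketch of how a location string could be classified by its protocol prefix. The describe_location helper is hypothetical and is not databudgie's implementation; it only mirrors the rule described above.

```python
from urllib.parse import urlparse


def describe_location(location):
    """Classify a location as S3 or local based on its protocol prefix.

    Hypothetical helper: anything without an s3:// prefix is treated as
    a local path, matching the rule described in this section.
    """
    parsed = urlparse(location)
    if parsed.scheme == "s3":
        return {
            "kind": "s3",
            "bucket": parsed.netloc,
            "path": parsed.path.lstrip("/"),
        }
    return {"kind": "local", "path": location}


print(describe_location("s3://bucket/path/to/folder"))
print(describe_location("path/to/folder"))
```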
filename#
Defaults to {timestamp}.{ext}.
Coupled with the “location” configuration, the fully templated path is (by default) backups/{table}/{timestamp}.{ext}. This yields a new file each time a command is run.
Tip
{timestamp} is a “variable” template source, meaning a new value will be yielded each time. In order to reference a static filename, configure a filename without a variable source, e.g. {table}.{ext}.
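The way “location” and “filename” combine can be sketched with plain string templating. The render_path helper below is hypothetical, and both the timestamp format and the csv extension are assumptions made for the example; the real values produced by databudgie may differ.

```python
from datetime import datetime, timezone


def render_path(table, location="backups/{table}",
                filename="{timestamp}.{ext}", ext="csv", now=None):
    """Join location and filename, then fill in the template values.

    The timestamp format and the "csv" extension are assumptions.
    """
    now = now or datetime.now(timezone.utc)
    values = {
        "table": table,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%S"),
        "ext": ext,
    }
    template = location.rstrip("/") + "/" + filename
    return template.format(**values)


# Default: a new, timestamped file per run.
print(render_path("public.foo"))
# Static filename: the same path on every run.
print(render_path("public.foo", filename="{table}.{ext}"))
# Colocated files: no per-table folder.
print(render_path("public.foo", location="backups/", filename="{table}.{ext}"))
```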
strategy#
Defaults to use_latest_filename.
This option is only read during restore commands and has two valid values: use_latest_filename and use_latest_metadata.
The restore-time “strategy” defines how databudgie should determine which file, on a per-table basis, to read from. Note that each time you run databudgie backup, it never alters preexisting files; instead it writes new files to disk with a timestamp in the name to disambiguate.
use_latest_filename will make use of the default file naming scheme, which includes write-time timestamps in the name of the file, and chooses the most recent timestamp.
use_latest_metadata will use the operating system file attributes for file creation time (or equivalent in S3), and chooses the most recent one.
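The difference between the two strategies can be sketched for local files as follows. Both helpers are hypothetical. The first assumes filenames carry zero-padded timestamps that sort lexicographically; the second uses the file modification time as a stand-in for the creation-time attribute, which is an assumption of this example.

```python
import os


def latest_by_filename(paths):
    """Pick the file whose name sorts last.

    Assumes zero-padded timestamps in the filename, so lexicographic
    order equals chronological order.
    """
    return max(paths, key=os.path.basename)


def latest_by_metadata(paths):
    """Pick the file with the most recent filesystem timestamp.

    Uses modification time as a stand-in for creation time.
    """
    return max(paths, key=os.path.getmtime)
```

The two strategies can disagree, for example when an older backup file is copied or touched after a newer one was written, or when a static filename without a timestamp is configured.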
truncate#
Defaults to false.
This option is only read during restore commands. When true, truncates the contents of the table before attempting to restore into it.
Note
This can run afoul of foreign key constraints, depending on your table structure.
The tables are intentionally ordered in such a way as to avoid or reduce the possibility of foreign key related issues; however, self-referential or circular foreign key relationships may encounter issues with this option (on those tables).
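As a rough illustration of why ordering helps and why circular relationships are problematic, the following sketch orders tables by their foreign key dependencies using Python's graphlib. It is not databudgie's ordering logic; the table names and the dependency_order helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def dependency_order(references):
    """Order tables so that referenced tables come before referencing ones.

    `references` maps each table to the set of tables it points at
    through foreign keys (hypothetical input for this sketch).
    """
    return list(TopologicalSorter(references).static_order())


# order_item -> order -> customer: a clean chain can always be ordered.
print(dependency_order({
    "order_item": {"order"},
    "order": {"customer"},
    "customer": set(),
}))

# A circular relationship has no valid ordering at all.
try:
    dependency_order({"a": {"b"}, "b": {"a"}})
except CycleError as exc:
    print("cannot order:", exc.args[0])
```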
query#
Defaults to select * from {table}.
Specify the query to be used on a given match. The default, which simply selects the whole table, is the most obviously useful query: it backs up the whole table.
There aren’t any constraints on the query to be executed, however, so this field can apply filters, perform joins, alter/obfuscate the data, or otherwise do whatever it wants.
compression#
Defaults to null.
Depending on the size of tables, the backups can get quite large. By default compression is disabled, but it can be enabled for any/all table “data” backups.
Valid values include: gzip.
Note
This automatically appends the compression file extension to the backup files (i.e. .gz for gzip), and will only work correctly if both the backup side and restore side agree on the value of the compression key.
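The requirement that both sides agree on the compression value can be illustrated with Python's gzip module. The helpers below are hypothetical and only mimic the described behavior (appending .gz and compressing the data); they are not databudgie's code.

```python
import gzip


def backup_name(filename, compression=None):
    """Append the compression extension when gzip is enabled."""
    if compression == "gzip":
        return filename + ".gz"
    return filename


def encode(data, compression=None):
    """Backup side: optionally compress the bytes before writing."""
    return gzip.compress(data) if compression == "gzip" else data


def decode(data, compression=None):
    """Restore side: optionally decompress the bytes after reading."""
    return gzip.decompress(data) if compression == "gzip" else data


rows = b"id,name\n1,foo\n"
stored = encode(rows, compression="gzip")

print(backup_name("backup.csv", compression="gzip"))
# Matching settings round-trip the data.
print(decode(stored, compression="gzip") == rows)
# Mismatched settings hand back compressed bytes instead of rows.
print(decode(stored, compression=None) == rows)
```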
exclude#
Defaults to [].
This is most commonly useful when using globs, particularly when running up against the limitations of glob matching versus regex:
tables:
  # All log tables
  - name: "*_log"
    exclude:
      - "tree_log"
Note that exclude list entries can also be globs themselves. So you can use them to arrive at more complex matching criteria than could be achieved with the single name matching glob.
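How a name glob and its exclude list combine can be sketched with fnmatch. The select_tables helper and the table names are hypothetical, and databudgie may apply the exclusions differently (for instance against schema-qualified names).

```python
from fnmatch import fnmatchcase


def select_tables(name, exclude, tables):
    """Keep tables matching `name` unless they match any exclude glob."""
    return [
        t for t in tables
        if fnmatchcase(t, name)
        and not any(fnmatchcase(t, pattern) for pattern in exclude)
    ]


# Hypothetical table names, for illustration only.
tables = ["audit_log", "tree_log", "tmp_audit_log", "users"]

# A literal exclusion, as in the config above.
print(select_tables("*_log", ["tree_log"], tables))
# Exclusions can be globs themselves.
print(select_tables("*_log", ["tree_log", "tmp_*"], tables))
```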
follow_foreign_keys#
Defaults to false.
When true, any foreign keys on the table will be recursively followed when performing the backup/restore. This allows one to specify only the table one seeks to backup/restore and any tables related through foreign keys will also be backed up.
Note
The backup file that is stored/read from will be relative to the explicit table that originated the inclusion of that table in the config.
That is, if your config file includes some table “public.foo” with location “backups/{table}”, then any tables backed up as a result of follow_foreign_keys on behalf of this table will end up at backups/public.foo/{table}.
In the event that two tables produce “followed” versions of the same table, only one backup will be produced, under whichever table happens to resolve first (a necessary measure on the restore-side due to foreign key constraints). For a given hierarchy of foreign keys, this should remain constant, but doesn't preclude some future table/foreign key from taking control of that table by virtue of being higher up in the hierarchy.
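The location rule and the “first to resolve wins” behavior described in this note can be sketched as a simple traversal. Everything here is hypothetical: the followed_locations helper, the table names, and the assumption that “following” means walking from a table to the tables it references. The actual resolution order in databudgie may differ.

```python
def followed_locations(roots, references, location="backups/{table}"):
    """Map each table to the folder its backup would be stored under.

    `roots` are the explicitly configured tables; `references` maps a
    table to the tables it points at through foreign keys. A followed
    table is claimed by whichever root reaches it first.
    """
    resolved = {root: location.format(table=root) for root in roots}
    for root in roots:
        base = resolved[root]
        pending = list(references.get(root, []))
        while pending:
            table = pending.pop()
            if table in resolved:
                continue  # already claimed, so only one backup is produced
            resolved[table] = f"{base}/{table}"
            pending.extend(references.get(table, []))
    return resolved


references = {
    "public.order": ["public.customer"],
    "public.invoice": ["public.customer"],
    "public.customer": ["public.address"],
}
for table, path in followed_locations(
    ["public.order", "public.invoice"], references
).items():
    print(table, "->", path)
```

In this sketch public.customer is reachable from both explicit tables but ends up only under backups/public.order, because that root is resolved first.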
skip_if_exists#
Note
Unlike most options, this option only has an effect in the backup-side of the config.
Defaults to false.
When true, skips backing up the table if backup data already exists for the annotated table.