Features and Configuration¶
Features¶
Per-account configuration¶
Allows a user to run GoWF with multiple configuration files to connect to different Swift accounts and monitor the folders they want.
You can use --config <config_account1> --config <config_account2> ...
in your GoWF systemd file to enable this feature.
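For example, a systemd unit might pass several configuration files like this (a sketch; the install path and config file names are assumptions, not shipped defaults):

# /etc/systemd/system/gowf.service (excerpt; paths are illustrative)
[Service]
ExecStart=/usr/local/bin/gowf --config /etc/gowf/account1.conf --config /etc/gowf/account2.conf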
Log rotate¶
The option --log-rotate
allows GoWF to rotate log files automatically. It compresses the current log to gowf-<datetime>.log.gz
and creates a new gowf.log file once the log reaches 512MB
. It also keeps 20
backup files.
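For example (an illustrative invocation combining flags documented on this page):

$ gowf --config gowf.conf --log-file gowf.log --log-rotate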
With rotation enabled, you might see log files like these:
$ ls -alh gowf*
-rw-r--r--  1 charles  staff   105K Aug 14 11:40 gowf-2019-08-14T03-40-18.073.log.gz
-rw-r--r--  1 charles  staff   105K Aug 14 11:40 gowf-2019-08-14T03-40-39.586.log.gz
-rw-r--r--  1 charles  staff   104K Aug 14 11:40 gowf-2019-08-14T03-40-57.306.log.gz
-rw-r--r--  1 charles  staff   106K Aug 14 12:44 gowf-2019-08-14T12-44-59.826.log.gz
-rw-r--r--  1 charles  staff   105K Aug 14 12:45 gowf-2019-08-14T12-45-21.291.log.gz
-rw-r--r--  1 charles  staff   103K Aug 14 12:45 gowf-2019-08-14T12-45-42.502.log.gz
-rw-r--r--  1 charles  staff   105K Aug 14 12:46 gowf-2019-08-14T12-46-00.206.log.gz
-rw-r--r--  1 charles  staff   150K Aug 14 12:47 gowf-2019-08-14T12-47-17.842.log.gz
-rw-r--r--  1 charles  staff   354K Aug 14 12:47 gowf.log
List Containers/Objects¶
Allows a user to list containers, or objects under a specific container, via --list
, based on your default or a specified --config <config_account1>
configuration file.
# List containers based on your default GoWF configuration file
# Using the default /etc/gowf/gowf.conf for listing Swift containers
$ gowf --list

# Using a specific gowf.conf for listing Swift containers
$ gowf --config gowf.conf --list

# Using the default /etc/gowf/gowf.conf for listing Swift objects under a Swift container
# Output goes to stdout; a timestamped CSV file is also generated in ./ if --log-level DEBUG is set
$ gowf --list <container name>

# Using a specific gowf.conf for listing Swift objects under a Swift container
# Output goes to stdout; a timestamped CSV file is also generated in ./ if --log-level DEBUG is set
$ gowf --config gowf.conf --list <container name>
Download Object¶
Allows a user to download objects under a specific container via --download
, based on your default or a specified --config <config_account1>
configuration file.
# Download a Swift object based on your default GoWF configuration file (/etc/gowf/gowf.conf)
$ gowf --download <container name> <object name> <download directory/file name>

# If you don't give <download directory/file name>, it will download to "/tmp/<object name>"
$ gowf --download <container name> <object name>

# Using a specific gowf.conf for downloading a Swift object
$ gowf --config gowf.conf --download <container name> <object name> <download directory/file name>

# If you don't give <download directory/file name>, it will download to "/tmp/<object name>"
$ gowf --config gowf.conf --download <container name> <object name>
Dedup¶
Deduplication is an experimental feature. Used on files that have not already been compressed, it can reduce the storage footprint of large files. The file will be uploaded as an SLO (static large object).
Please use --dedup
to enable it. If you would like to change the settings of the dedup feature, you can find them under the dedup options (in the Folder section of the configuration file).
# run it in debug mode (turn on log-level to debug) for dedup
$ gowf --config gowf.conf --log-file log.txt --log-level DEBUG --dedup
Note
Since GoWF dedup uses Swift SLOs, you MUST install the swift-gowf-deduper
middleware from the GitHub repo resource/middleware folder to block dedup segments from being deleted.
Purge Dedup Segments¶
This is an experimental feature, which can delete dedup SLO segments based on the dedup object manifest.
Please use --purgededup
to enable it. If you would like to change the dedup settings, you can find them in the dedup options (in the Folder section of the configuration file); note that the anchor parameter must NOT be 1
.
# run it in debug mode (turn on log-level to debug) to purge dedup segments
$ gowf --config gowf.conf --log-file log.txt --log-level DEBUG --purgededup
Dedup Segments Reference Counting Statistics¶
This is an experimental feature, which reports reference counts for dedup SLO segments based on the dedup object manifests.
Please use --statdedup
to enable it. By default the output goes to stdout, but if you set --log-level DEBUG
it will dump a CSV file under /tmp/gowf/
.
# run it in debug mode (turn on log-level to debug) to get the dedup segment reference counting statistics CSV file
$ gowf --config gowf.conf --log-file log.txt --log-level DEBUG --statdedup
Statistic Report¶
Prometheus Exporter¶
GoWF embeds a Prometheus exporter, so you can find the metrics on port 9988
(which you can change with --exporter-port <new_port>
).
$ curl 127.0.0.1:9988/metrics 2>/dev/null | grep gowf
# HELP gowf_diagnostic_objects_total Number of objects, partitioned by location and type
# TYPE gowf_diagnostic_objects_total gauge
gowf_diagnostic_objects_total{container="b1_demo",location="local",type="single",uuid="3c:15:c2:d7:4f:30"} 17
gowf_diagnostic_objects_total{container="b1_demo",location="remote",type="single",uuid="3c:15:c2:d7:4f:30"} 53
gowf_diagnostic_objects_total{container="b1_demo2",location="local",type="single",uuid="3c:15:c2:d7:4f:30"} 17
gowf_diagnostic_objects_total{container="b1_demo2",location="remote",type="single",uuid="3c:15:c2:d7:4f:30"} 53
# HELP gowf_diagnostic_proccessing_total Number of upload operations waiting to be processed, partitioned by type
# TYPE gowf_diagnostic_proccessing_total gauge
gowf_diagnostic_proccessing_total{container="b1_demo",type="inqueue",uuid="3c:15:c2:d7:4f:30"} 1
gowf_diagnostic_proccessing_total{container="b1_demo",type="md5calc",uuid="3c:15:c2:d7:4f:30"} 0
gowf_diagnostic_proccessing_total{container="b1_demo",type="single",uuid="3c:15:c2:d7:4f:30"} 0
gowf_diagnostic_proccessing_total{container="b1_demo",type="slo",uuid="3c:15:c2:d7:4f:30"} 0
gowf_diagnostic_proccessing_total{container="b1_demo2",type="inqueue",uuid="3c:15:c2:d7:4f:30"} 1
gowf_diagnostic_proccessing_total{container="b1_demo2",type="md5calc",uuid="3c:15:c2:d7:4f:30"} 0
gowf_diagnostic_proccessing_total{container="b1_demo2",type="single",uuid="3c:15:c2:d7:4f:30"} 0
gowf_diagnostic_proccessing_total{container="b1_demo2",type="slo",uuid="3c:15:c2:d7:4f:30"} 0
<SKIP>
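To scrape these metrics, a minimal Prometheus scrape configuration might look like this (a sketch; it assumes GoWF runs on the same host with the default exporter port):

# prometheus.yml (excerpt; target address is illustrative)
scrape_configs:
  - job_name: gowf
    static_configs:
      - targets: ['127.0.0.1:9988']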
Local JSON file¶
Provides a statistics report for each container, showing how many objects have been uploaded and how many are ongoing. You can find <configuration>_<container>.json
under the <statistic_folder>
, as shown below.
$ cat tmp/gowf.conf_test-container.json |python -m json.tool
{
"local": 27,
"remote": 27,
"triggers": {
"user": 1,
"gowf": 0
},
"inqueue": 0,
"inprogress": {
"slo": 0,
"single": 0,
"md5calc": 0
},
"uploads": {
"successes": 1,
"failures": 0,
"disregard": 0
}
}
Please use -stats-folder <statistic_folder>
to enable it.
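For example (an illustrative invocation; the folder path is arbitrary):

# write per-container statistics JSON files under /tmp
$ gowf --config gowf.conf -stats-folder /tmp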
Autoupdate¶
GoWF has an auto-update feature built-in. If you run gowf --update
, GoWF will try to reach SwiftStack’s GoWF repo and upgrade to the latest version.
$ gowf --update
New version(0.0.7) is ready! Your current version: 0.0.5
Are you going to update(y/N)?y
Done! Next run should be 0.0.7
$ gowf --version
0.0.7
After GoWF has been updated, you’ll need to restart the service for the new version to be used. On a systemd based server, you can restart GoWF with:
$ sudo systemctl restart gowf.service
Configuration file¶
The gowf configuration file gowf.conf
includes two major sections. The Global section starts with [global]
and is only allowed once in the configuration file. The Folder section starts with [<folder directory>]
and can be repeated multiple times in the configuration file. Examples of both global
and folder
sections can be found below. Here is an example of how to configure GoWF:
[global]
user = demo
auth = https://192.168.190.21/auth/v1.0
key = demo
concurrency = 4
# default segment_size is 100MB
# Allow suffix size string (B, KB, MB, GB)
segment_size = 100MB
segment_container_prefix = .segment_
recursive = False
preserve_path = True
[/tmp/b1]
storage_policy = Standard-Replica
container = b1RR
segment_container = b1RR+mysegment
# `archive` mode:
# Once the local files are uploaded to Swift, it will
# try to delete the local files after the time you
# set in `keep_local_files`
# `sync` mode:
# Sync local files to Swift, and won't delete local files.
#
mode = sync
# How long you would like to keep these files on the local file system
# This option is only valid when you set `mode = archive`
# y: year, w: week, d: day, h: hour, m: minute, s: second
# keep_local_files = 30m
# split with comma
#file_patterns = *.txt, *.log
#file_patterns = *abc*
# Expire remote objects after the time set in `expired_after`
# y: year, w: week, d: day, h: hour, m: minute, s: second
expired_after = 60d
# Metadata for objects
metadata = key1:val1, key2:val2
# dedup parameters
# Allow suffix size string (B, KB, MB, GB)
anchor = 0
anchor_upper_bound = 512MB
anchor_lower_bound = 1MB
anchor_divider = 128
min_divider = 2
max_multiplier = 2
buffer_read_gate = 512MB
buffer_read_size = 1GB
Global section¶
The global
section indicates which Swift endpoint should be used, including the username and password for the account. You can’t have multiple users in a configuration file, but you can set up another configuration file for a second user.
[global]
user = demo
auth = https://swift.example.com/auth/v1.0
key = demo
concurrency = 10
# default segment_size is 100MB
# Allow suffix size string (B, KB, MB, GB)
segment_size = 100MB
segment_container_prefix = .segment_
recursive = True
preserve_path = True
user¶
User name for obtaining an auth token.
auth¶
URL for obtaining an auth token.
key¶
Key for obtaining an auth token.
Note
If you want to use an encrypted file to store the key, please leave it empty and run
gowf --config <PATH>/<config-file>
to generate an encrypted file.

$ gowf --config /tmp/demo.conf
Can't find user's password in /tmp/demo.conf!
Please enter password (User: charles, Auth: https://swift.example.com/auth/v1.0):

### And you'll see the encrypted file in the `.vaults` folder.
$ ls -al /tmp/.vaults/13664203caceb3a61d351cbeece72001613de4ac
-rw-------  1 charles  wheel  519 May  6 16:58 /tmp/.vaults/13664203caceb3a61d351cbeece72001613de4ac

### Run gowf again to make sure everything is good
$ gowf --config /tmp/demo.conf --log-file /tmp/gowf.conf

In some cases you might need to regenerate the encrypted files because the user's key changed; just run
rm <PATH>/.vaults/*
to remove them.
concurrency¶
Allows the watch folder to spawn multiple concurrent threads to handle upload jobs in the queue. The default value is 4
.
segment_size¶
The segment size for a Static Large Object (SLO). The default value is 100MB
.
segment_container_prefix¶
The prefix string of a segments container when GoWF creates a new segment container. The default value is .segment_
.
recursive¶
Watch sub-directories for file changes if set to True
. The default value is False
.
preserve_path¶
When set to True
, GoWF constructs the object name from the file's relative path. The default value is True
.
For example, if you have a file called woof.txt
under the folder /<the_folder_you_watched>/dog/staffy/
, the relative path of the file is dog/staffy/woof.txt
, so the object name in Swift will be dog/staffy/woof.txt
.
When False
, objects will always be named after the basename of the source file (e.g. woof.txt
).
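Conceptually, the naming works like this minimal Go sketch (illustrative only, not GoWF's actual code; the paths are the ones from the example above):

package main

import (
	"fmt"
	"path/filepath"
)

func main() {
	watched := "/the_folder_you_watched"
	file := "/the_folder_you_watched/dog/staffy/woof.txt"

	// preserve_path = True: the object name is the path relative to the watched folder
	rel, _ := filepath.Rel(watched, file)
	fmt.Println(rel) // dog/staffy/woof.txt

	// preserve_path = False: the object name is just the basename
	fmt.Println(filepath.Base(file)) // woof.txt
}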
If you have files with the same name (0ab5.db
, 53ef.db
, … etc) in different subfolders (312/0ab5
, 435/0ab5
) as outlined below,
└── subfolder
    └── 38965
        ├── 312
        │   ├── 0ab5
        │   │   └── 0ab5.db
        │   ├── 53ef
        │   │   └── 53ef.db
        │   ├── 9c3a
        │   │   └── 9c3a.db
        │   └── f19ed
        │       └── f19ed.db
        └── 435
            ├── 0ab5
            │   └── 0ab5.db
            ├── 53ef
            │   └── 53ef.db
            ├── 9c3a
            │   └── 9c3a.db
            └── f19ed
                └── f19ed.db
... then, if you set recursive = True
and preserve_path = False
, GoWF will overwrite the target remote object (0ab5.db
) whenever one of the local files is updated (subfolder/38965/312/0ab5/0ab5.db
or subfolder/38965/435/0ab5/0ab5.db
). This is probably not what you want: the ‘wrong’ remote object will be overwritten because the path isn’t preserved. Please be aware of this behavior when you configure GoWF.
checker_interval¶
This interval determines how often GoWF performs an integrity check on local and remote objects. The default value is 5m
.
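For example (a sketch; since this option is documented under the Global section, we assume it lives in [global]):

[global]
user = demo
auth = https://swift.example.com/auth/v1.0
key = demo
# run the local/remote integrity check every 10 minutes instead of the default 5m
checker_interval = 10m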
Folder section¶
[/tmp/b1]
storage_policy = Standard-Replica
container = b1RR
# segment_container = b1RR+mysegment
# `archive` mode:
# Once the local files are uploaded to Swift, it will
# try to delete the local files after the time you
# defined in `keep_local_files`
#
# `sync` mode:
# Sync local files to Swift. Will not delete local files.
#
mode = sync
# How long you would like to keep these files on the local
# file system.
# This option is only valid when you set `mode = archive`
# y: year, w: week, d: day, h: hour, m: minute, s: second
keep_local_files = 1d
# split with comma
file_patterns = *.txt, *.log
# Expire remote objects after the time set in `expired_after`
# y: year, w: week, d: day, h: hour, m: minute, s: second
expired_after = 60s
# Metadata for objects
metadata = key1:val1, key2:val2
# dedup parameters
# Allow suffix size string (B, KB, MB, GB)
anchor = 0
anchor_upper_bound = 512MB
anchor_lower_bound = 1MB
anchor_divider = 128
min_divider = 2
max_multiplier = 2
buffer_read_gate = 512MB
buffer_read_size = 1GB
storage_policy¶
This option allows you to create containers and segment containers under the specified storage policy. The default value is empty, in which case the default policy defined in the Swift cluster is used.
Note:
If the container or segment container already exists and has a different storage policy, GoWF won’t upload any objects until you correct the policy in the config file or use a new container/segment container. You’ll see errors like these:
2017-08-28 08:33:01,502 - watch-uploader - INFO - Created Container(post_container) testSWF with Standard-Replica policy
2017-08-28 08:33:01,502 - watch-uploader - INFO - Created Container(post_container) testSWF with Standard-Replica policy
2017-08-28 08:33:01,671 - watch-uploader - INFO - Get(stat_container) container (testSWF_seg), policy: Reduced-Redundancy
2017-08-28 08:33:01,671 - watch-uploader - ERROR - Current container(testSWF_seg) policy(Reduced-Redundancy) is mismatch! (Standard-Replica)
container¶
The remote container name.
segment_container¶
The remote segment container name.
mode¶
archive mode:
Once the local files have been uploaded to Swift, GoWF will try to delete the local files after the time you defined in keep_local_files
.
sync mode:
Sync local files to Swift; local files won’t be deleted.
Note
If you don’t specify which mode you want to use, sync will be the default mode.
keep_local_files¶
How long you would like to keep these files on the local file system.
This option is only valid when you set mode = archive.
The default value of keep_local_files
is 1d
(1 day).
# Delete local files immediately
keep_local_files = 0

# keep files for 300 seconds
keep_local_files = 300s

# keep files for 10 minutes
keep_local_files = 10m

# keep files for 10 hours
keep_local_files = 10h

# keep files for 2 days
keep_local_files = 2d

# keep files for 1 year
keep_local_files = 1y

Note
The local files might not be deleted immediately, because GoWF doesn’t check files every second. So you might see delayed deletions, which should complete within
checker_interval
+ keep_local_files
. For example, with checker_interval = 5m and keep_local_files = 10m, a file may persist for up to 15 minutes after upload.
file_patterns¶
Only upload files that match the specified patterns, separated by commas.
file_patterns = *.log.gz
file_patterns = *.gz,*.zip
expired_after¶
When you set this option in the folder section, the uploaded files will expire after the time you set. If you want to keep the uploaded files forever, do not use this option in the folder section.
# 300 seconds
expired_after = 300s

# 5 minutes
expired_after = 5m

# 24 hours
expired_after = 24h

# 90 days
expired_after = 90d

# 7 years
expired_after = 7y

# 2 days 10 hours
expired_after = 2d10h
metadata¶
You can add metadata to each file that you want to upload from this folder.
metadata = key-1:value-1,key-2:value-2
Dedup options (under folder section)¶
Here is an example of using deduplication on your data. The deduplication feature of GoWF uses variable-size chunking and leverages anchors to keep track of it. In the example below, using a 10 MiB file with a divider of 128
, you can control your deduplication chunk sizes between 65536
and 262144
bytes, and the number of chunks between 40
(10485760 / 262144) and 160
(10485760 / 65536).

# dedup parameters
# Allow suffix size string (B, KB, MB, GB)
anchor = 0
anchor_upper_bound = 512MB
anchor_lower_bound = 1MB
anchor_divider = 128
min_divider = 2
max_multiplier = 2
buffer_read_gate = 512MB
buffer_read_size = 1GB
anchor¶
The dedup anchor is used for tracking dedup rules and generally represents the
average chunk size
of your deduplication. If anchor = 0, GoWF calculates file size / anchor_divider
and rounds it up to the next power of two, 2^X
; X
becomes the anchor. However, after the calculation, if X > anchor_upper_bound
, then anchor_upper_bound is used as the anchor, and if X < anchor_lower_bound
, then anchor_lower_bound is used as the anchor. The pseudo code is below.

if anchor != 0 then
    anchor = anchor   # stop, break
elif anchor == 0 and 2^Y < (file size / anchor_divider) < 2^X then
    anchor = X
    if X > anchor_upper_bound then
        anchor = anchor_upper_bound
    elif X < anchor_lower_bound then
        anchor = anchor_lower_bound
Note
If you don’t apply an anchor in the folder level configuration, GoWF defaults to anchor = 1
, which means GoWF won’t run dedup or purgededup in this folder.
anchor_upper_bound¶
The anchor_upper_bound is applied after the anchor is calculated and found to be LARGER
than the anchor_upper_bound. If that is true, dedup will just use anchor_upper_bound as the anchor.
anchor_lower_bound¶
The anchor_lower_bound is applied after the anchor is calculated and found to be SMALLER
than the anchor_lower_bound. If that is true, dedup will just use anchor_lower_bound as the anchor.
anchor_divider¶
The anchor_divider is the
denominator
(divider) of the file size for getting the anchor.

# e.g. anchor calculation example
file size = 10485760
10485760 / 128 = 81920
2^16 = 65536 < 81920 < 2^17 = 131072
anchor = 17
min_divider¶
After determining the anchor, the dedup function will use it to calculate the chunk lower bound, which is derived from
anchor / min_divider
.

# e.g. smallest chunk size example
file size = 10485760
10485760 / 128 = 81920
2^16 = 65536 < 81920 < 2^17 = 131072
anchor = 17, average chunk size = 131072
smallest chunk size = 131072 / 2 = 65536
max_multiplier¶
After determining the anchor, the dedup function will use it to calculate the chunk upper bound, which is derived from
anchor * max_multiplier
.

# e.g. max chunk size example
file size = 10485760
10485760 / 128 = 81920
2^16 = 65536 < 81920 < 2^17 = 131072
anchor = 17, average chunk size = 131072
max chunk size = 131072 * 2 = 262144
buffer_read_gate¶
The buffer_read_gate sets the file-size threshold above which GoWF reads a file buffer by buffer instead of reading all of its content into memory. If your file size is larger than buffer_read_gate, the file will be read into a buffer of buffer_read_size in pieces, instead of being read all at once. This results in a much slower upload than when the file can be read in one pass.
buffer_read_size¶
The buffer_read_size is the buffer size you would like to use, e.g. buffer_read_size = 1073741824
(1 GiB) when processing a 128GiB
file. GoWF will then create a 1GiB
read buffer and read the file 128
times.
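A minimal Go sketch of the pattern these two options control (illustrative only, not GoWF's actual implementation; the file path, gate/size values, and consume callback are assumptions):

package main

import (
	"io"
	"os"
)

// processFile sketches the gate described above: files at or below the gate
// are read in one pass; larger files are streamed through a fixed-size buffer.
// The consume callback stands in for whatever per-read work (hashing,
// chunking, upload) happens next.
func processFile(path string, gate, bufSize int64, consume func([]byte)) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return err
	}

	if info.Size() <= gate {
		// Small file: read the whole content into memory at once.
		data, err := io.ReadAll(f)
		if err != nil {
			return err
		}
		consume(data)
		return nil
	}

	// Large file: read piece by piece into a buffer of buffer_read_size.
	buf := make([]byte, bufSize)
	for {
		n, err := f.Read(buf)
		if n > 0 {
			consume(buf[:n])
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	// e.g. a 512MB gate with a 1GiB buffer, mirroring the sample config above
	_ = processFile("/tmp/example.bin", 512<<20, 1<<30, func(p []byte) {})
}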