Features and Configuration

Features

Per-account configuration

Allows a user to run GoWF with multiple configuration files to connect to different Swift accounts and monitor the folders they want.

You can use --config <config_account1> --config <config_account2> ... in your GoWF systemd file to enable this feature.

Log rotate

The --log-rotate option lets gowf rotate its log files automatically. Once gowf.log reaches 512MB, it is compressed to gowf-<datetime>.log.gz and a new gowf.log file is created. Up to 20 backup files are kept.

So you might see log files like the following:

$ ls -alh gowf*
-rw-r--r--  1 charles  staff   105K Aug 14 11:40 gowf-2019-08-14T03-40-18.073.log.gz
-rw-r--r--  1 charles  staff   105K Aug 14 11:40 gowf-2019-08-14T03-40-39.586.log.gz
-rw-r--r--  1 charles  staff   104K Aug 14 11:40 gowf-2019-08-14T03-40-57.306.log.gz
-rw-r--r--  1 charles  staff   106K Aug 14 12:44 gowf-2019-08-14T12-44-59.826.log.gz
-rw-r--r--  1 charles  staff   105K Aug 14 12:45 gowf-2019-08-14T12-45-21.291.log.gz
-rw-r--r--  1 charles  staff   103K Aug 14 12:45 gowf-2019-08-14T12-45-42.502.log.gz
-rw-r--r--  1 charles  staff   105K Aug 14 12:46 gowf-2019-08-14T12-46-00.206.log.gz
-rw-r--r--  1 charles  staff   150K Aug 14 12:47 gowf-2019-08-14T12-47-17.842.log.gz
-rw-r--r--  1 charles  staff   354K Aug 14 12:47 gowf.log

List Containers/Objects

Allows a user to list containers, or the objects under a specific container, via --list, based on your default or chosen --config <config_account1> configuration file.

# List swift containers using the default /etc/gowf/gowf.conf
$ gowf --list

# Use a specific gowf.conf for listing swift containers
$ gowf --config gowf.conf --list

# List swift objects under a container using the default /etc/gowf/gowf.conf
# Output goes to stdout; with --log-level DEBUG, a timestamped CSV file is also written to ./
$ gowf --list <container name>

# Use a specific gowf.conf for listing swift objects under a container
# Output goes to stdout; with --log-level DEBUG, a timestamped CSV file is also written to ./
$ gowf --config gowf.conf --list <container name>

Download Object

Allows a user to download objects under a specific container via --download, based on your default or chosen --config <config_account1> configuration file.

# Download a swift object based on your default gowf configuration file (/etc/gowf/gowf.conf)
$ gowf --download <container name> <object name> <download directory/file name>

# If you don't give <download directory/file name>, it will download to /tmp/<object name>
$ gowf --download <container name> <object name>

# Using specific gowf.conf for downloading swift object
$ gowf --config gowf.conf --download <container name> <object name> <download directory/file name>

# If you don't give <download directory/file name>, it will download to /tmp/<object name>
$ gowf --config gowf.conf --download <container name> <object name>

Dedup

Deduplication is an experimental feature. Used on files that have not already been compressed, it can reduce the storage footprint of large files. The file will be uploaded as an SLO (static large object).

Please use --dedup to enable it. If you would like to change the dedup settings, you can find them under the dedup options (in the Folder section of the configuration file).

# run dedup in debug mode (--log-level DEBUG)
$ gowf --config gowf.conf --log-file log.txt --log-level DEBUG --dedup

Note

Since GoWF dedup uses Swift SLOs, you MUST install the swift-gowf-deduper middleware from the GitHub repo resource/middleware folder to block dedup segments from being deleted.

Purge Dedup Segments

This is an experimental feature that deletes dedup SLO segments based on the dedup object manifest.

Please use --purgededup to enable it. If you would like to change the dedup settings, note that the anchor parameter must NOT be 1; you can find the settings in the dedup options (in the Folder section of the configuration file).

# run purge dedup segments in debug mode (--log-level DEBUG)
$ gowf --config gowf.conf --log-file log.txt --log-level DEBUG --purgededup

Dedup Segments Reference Counting Statistics

This is an experimental feature that reports dedup SLO segment reference counts based on the dedup object manifests.

Please use --statdedup to enable it. By default the statistics are dumped to stdout, but if you set --log-level DEBUG, a CSV file is also written under /tmp/gowf/.

# run in debug mode (--log-level DEBUG) to get the dedup segments reference counting CSV file
$ gowf --config gowf.conf --log-file log.txt --log-level DEBUG --statdedup

Statistic Report

Prometheus Exporter

GoWF embeds a Prometheus exporter, so you can find the metrics on port 9988 (you can change this with --exporter-port <new_port>).

$ curl 127.0.0.1:9988/metrics 2>/dev/null | grep gowf
# HELP gowf_diagnostic_objects_total Number of objects, partitioned by location and type
# TYPE gowf_diagnostic_objects_total gauge
gowf_diagnostic_objects_total{container="b1_demo",location="local",type="single",uuid="3c:15:c2:d7:4f:30"} 17
gowf_diagnostic_objects_total{container="b1_demo",location="remote",type="single",uuid="3c:15:c2:d7:4f:30"} 53
gowf_diagnostic_objects_total{container="b1_demo2",location="local",type="single",uuid="3c:15:c2:d7:4f:30"} 17
gowf_diagnostic_objects_total{container="b1_demo2",location="remote",type="single",uuid="3c:15:c2:d7:4f:30"} 53
# HELP gowf_diagnostic_proccessing_total Number of upload operations waiting to be processed, partitioned by type
# TYPE gowf_diagnostic_proccessing_total gauge
gowf_diagnostic_proccessing_total{container="b1_demo",type="inqueue",uuid="3c:15:c2:d7:4f:30"} 1
gowf_diagnostic_proccessing_total{container="b1_demo",type="md5calc",uuid="3c:15:c2:d7:4f:30"} 0
gowf_diagnostic_proccessing_total{container="b1_demo",type="single",uuid="3c:15:c2:d7:4f:30"} 0
gowf_diagnostic_proccessing_total{container="b1_demo",type="slo",uuid="3c:15:c2:d7:4f:30"} 0
gowf_diagnostic_proccessing_total{container="b1_demo2",type="inqueue",uuid="3c:15:c2:d7:4f:30"} 1
gowf_diagnostic_proccessing_total{container="b1_demo2",type="md5calc",uuid="3c:15:c2:d7:4f:30"} 0
gowf_diagnostic_proccessing_total{container="b1_demo2",type="single",uuid="3c:15:c2:d7:4f:30"} 0
gowf_diagnostic_proccessing_total{container="b1_demo2",type="slo",uuid="3c:15:c2:d7:4f:30"} 0
<SKIP>
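The per-container gauges above can be post-processed from the plain-text exposition format. A minimal sketch (the parsing is simplified and assumes the label layout shown in the curl output):

```python
# Sketch: compute the difference between remote and local object counts
# per container from gowf_diagnostic_objects_total lines.
import re

sample = """\
gowf_diagnostic_objects_total{container="b1_demo",location="local",type="single",uuid="3c:15:c2:d7:4f:30"} 17
gowf_diagnostic_objects_total{container="b1_demo",location="remote",type="single",uuid="3c:15:c2:d7:4f:30"} 53
"""

counts = {}
for line in sample.splitlines():
    m = re.match(r'gowf_diagnostic_objects_total\{(.*)\}\s+(\d+)', line)
    if not m:
        continue
    # turn container="b1_demo",location="local",... into a dict
    labels = dict(kv.split("=") for kv in m.group(1).replace('"', "").split(","))
    counts[(labels["container"], labels["location"])] = int(m.group(2))

difference = counts[("b1_demo", "remote")] - counts[("b1_demo", "local")]
print(difference)  # 36
```

In practice you would scrape 127.0.0.1:9988/metrics rather than a hard-coded sample.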

Local JSON file

Provides a statistics report for each container showing how many objects have been uploaded and how many are ongoing. You can find <configuration>_<container>.json under the <statistic_folder>, as shown below.

$ cat tmp/gowf.conf_test-container.json |python -m json.tool
{
  "local": 27,
  "remote": 27,
  "triggers": {
      "user": 1,
      "gowf": 0
  },
  "inqueue": 0,
  "inprogress": {
      "slo": 0,
      "single": 0,
      "md5calc": 0
  },
  "uploads": {
      "successes": 1,
      "failures": 0,
      "disregard": 0
  }
}
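The report can be consumed programmatically. A minimal sketch, assuming the field meanings shown in the example (inqueue plus the inprogress counters give the ongoing work):

```python
# Sketch: load the per-container statistics report and summarize it.
import json

report = json.loads("""
{
  "local": 27,
  "remote": 27,
  "triggers": {"user": 1, "gowf": 0},
  "inqueue": 0,
  "inprogress": {"slo": 0, "single": 0, "md5calc": 0},
  "uploads": {"successes": 1, "failures": 0, "disregard": 0}
}
""")

# ongoing work = queued uploads plus everything currently in progress
ongoing = report["inqueue"] + sum(report["inprogress"].values())
in_sync = report["local"] == report["remote"] and ongoing == 0
print(ongoing, in_sync)  # 0 True
```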

Please use --stats-folder <statistic_folder> to enable it.

Autoupdate

GoWF has an auto-update feature built-in. If you run gowf --update, GoWF will try to reach SwiftStack’s GoWF repo and upgrade to the latest version.

$ gowf --update
New version(0.0.7) is ready! Your current version: 0.0.5
Are you going to update(y/N)?y
Done! Next run should be 0.0.7

$ gowf --version
0.0.7

After GoWF has been updated, you’ll need to restart the service for the new version to be used. On a systemd based server, you can restart GoWF with:

$ sudo systemctl restart gowf.service

Configuration file

The gowf configuration file gowf.conf includes two major sections. The Global section starts with [global] and is only allowed once in the configuration file. The Folder section starts with [<folder directory>] and can be repeated multiple times in the configuration file. Examples of both global and folder sections can be found below. Here is an example for how to configure GoWF:

[global]
user = demo
auth = https://192.168.190.21/auth/v1.0
key = demo

concurrency = 4

# default segment_size is 100MB
# Allow suffix size string (B, KB, MB, GB)
segment_size = 100MB
segment_container_prefix = .segment_

recursive = False
preserve_path = True

[/tmp/b1]
storage_policy = Standard-Replica
container = b1RR
segment_container = b1RR+mysegment

# `archive` mode:
#       Once the local files are uploaded to Swift, it will
#       try to delete these local files after the time that
#       you set in `keep_local_files`
# `sync` mode:
#       Sync local files to Swift, and won't delete local files.
#
mode = sync

# How long you would like to keep these files in local
# This option is only valid when you set `mode = archive`
# y: year, w: week, d: day, h: hour, m: minute, s: second
# keep_local_files = 30m

# split with comma
#file_patterns = *.txt, *.log
#file_patterns = *abc*

# expired remote object in `expired_after`
# y: year, w: week, d: day, h: hour, m: minute, s: second
expired_after = 60d

# Metadata for objects
metadata = key1:val1, key2:val2

# dedup parameters
# Allow suffix size string (B, KB, MB, GB)
anchor = 0
anchor_upper_bound = 512MB
anchor_lower_bound = 1MB
anchor_divider = 128
min_divider = 2
max_multiplier = 2
buffer_read_gate = 512MB
buffer_read_size = 1GB

Global section

The global section indicates which Swift endpoint should be used, including the username and password for the account. You can’t have multiple users in a configuration file, but you can set up another configuration file for a second user.

[global]
user = demo
auth = https://swift.example.com/auth/v1.0
key = demo

concurrency = 10

# default segment_size is 100MB
# Allow suffix size string (B, KB, MB, GB)
segment_size = 100MB

segment_container_prefix = .segment_

recursive = True
preserve_path = True

user

User name for obtaining an auth token.

auth

URL for obtaining an auth token.

key

Key for obtaining an auth token.

Note

If you want to use an encrypted file to store the key, please leave it empty and run gowf --config <PATH>/<config-file> to generate an encrypted file.

$ gowf --config /tmp/demo.conf
Can't find user's password in /tmp/demo.conf!
Please enter password (User: charles, Auth: https://swift.example.com/auth/v1.0):

### And you'll see the encrypted file in `.vaults` folder.
$ ls -al /tmp/.vaults/13664203caceb3a61d351cbeece72001613de4ac
-rw-------  1 charles  wheel  519 May  6 16:58 /tmp/.vaults/13664203caceb3a61d351cbeece72001613de4ac

### Run gowf again to make sure everything is good
$ gowf --config /tmp/demo.conf --log-file /tmp/gowf.log

In some cases you might need to regenerate the encrypted files because the user's key changed; just run rm <PATH>/.vaults/* to remove them.

concurrency

Allows the watch folder to spawn multiple concurrent threads to handle upload jobs in the queue. The default value is 4.

segment_size

The segment size for a Static Large Object (SLO). The default value is 100MB
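For a rough idea of how many segments an SLO upload produces, assuming fixed-size segments with a final partial segment (standard Swift SLO behavior; the exact split is up to GoWF):

```python
# Sketch: number of SLO segments for a given file and segment_size.
import math

segment_size = 100 * 1024 * 1024    # the 100MB default
file_size = 1 * 1024 ** 3           # a 1GiB file

# each segment is segment_size bytes; the last one may be smaller
segments = math.ceil(file_size / segment_size)
print(segments)  # 11
```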

segment_container_prefix

The prefix string of a segments container when GoWF creates a new segment container. The default value is .segment_

recursive

Watch sub-directories for file changes if set to True. The default value is False.

preserve_path

When set to True, the object name is constructed to include the relative path. The default value is True.

For example, if you have a file called woof.txt, and it is under the folder /<the_folder_you_watched>/dog/staffy/, the relative path of the file is dog/staffy/woof.txt, so the object name in Swift will be dog/staffy/woof.txt.

When False, objects will always be named after the basename of the source file (e.g. woof.txt).
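The naming rule can be sketched with Python's path helpers (paths here are illustrative; GoWF's actual implementation may differ):

```python
# Sketch: object naming with preserve_path True vs False.
import os

watched = "/tmp/b1"                    # the folder GoWF watches
path = "/tmp/b1/dog/staffy/woof.txt"   # a file that appears under it

object_name_preserved = os.path.relpath(path, watched)  # preserve_path = True
object_name_basename = os.path.basename(path)           # preserve_path = False
print(object_name_preserved)  # dog/staffy/woof.txt
print(object_name_basename)   # woof.txt
```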

If you have files with the same name (0ab5.db, 53ef.db, … etc) in different subfolders (312/0ab5, 435/0ab5) as outlined below,

└── subfolder
          └── 38965
              ├── 312
              │   ├── 0ab5
              │   │   └── 0ab5.db
              │   ├── 53ef
              │   │   └── 53ef.db
              │   ├── 9c3a
              │   │   └── 9c3a.db
              │   └── f19ed
              │       └── f19ed.db
              └── 435
                  ├── 0ab5
                  │   └── 0ab5.db
                  ├── 53ef
                  │   └── 53ef.db
                  ├── 9c3a
                  │   └── 9c3a.db
                  └── f19ed
                      └── f19ed.db

... then, if you set recursive = True and preserve_path = False, GoWF will overwrite the remote target object (0ab5.db) whenever one of the local files is updated (subfolder/38965/312/0ab5/0ab5.db or subfolder/38965/435/0ab5/0ab5.db). This is probably not what you want: the ‘wrong’ remote object gets overwritten because the path isn’t preserved. Please be aware of this behavior when you configure GoWF.

checker_interval

This interval determines how often GoWF performs an integrity check of local and remote objects. The default value is 5m.

Folder section

[/tmp/b1]
storage_policy = Standard-Replica
container = b1RR
# segment_container = b1RR+mysegment

# `archive` mode:
#       Once the local files are uploaded to Swift, it will
#       try to delete the local files after the time you
#       defined in `keep_local_files`
#
# `sync` mode:
#       Sync local files to Swift. Will not delete local files.
#
mode = sync

# How long you would like to keep these files on the local
# file system.
# This option is only valid when you set `mode = archive`
# y: year, w: week, d: day, h: hour, m: minute, s: second
keep_local_files = 1d

# split with comma
file_patterns = *.txt, *.log

# expired remote object in `expired_after`
# y: year, w: week, d: day, h: hour, m: minute, s: second
expired_after = 60s

# Metadata for objects
metadata = key1:val1, key2:val2

# dedup parameters
# Allow suffix size string (B, KB, MB, GB)
anchor = 0
anchor_upper_bound = 512MB
anchor_lower_bound = 1MB
anchor_divider = 128
min_divider = 2
max_multiplier = 2
buffer_read_gate = 512MB
buffer_read_size = 1GB

storage_policy

This option allows you to create containers and segment containers under the specified storage policy. The default value is empty, in which case the default policy defined in the Swift cluster is used.

Note:

If the container or segment container already exists and has a different storage policy, GoWF won’t upload any objects until you correct the policy in the config file or use a new container/segment container.

You’ll see errors like the following:

2017-08-28 08:33:01,502 - watch-uploader - INFO - Created Container(post_container) testSWF with Standard-Replica policy
2017-08-28 08:33:01,502 - watch-uploader - INFO - Created Container(post_container) testSWF with Standard-Replica policy
2017-08-28 08:33:01,671 - watch-uploader - INFO - Get(stat_container) container (testSWF_seg), policy: Reduced-Redundancy
2017-08-28 08:33:01,671 - watch-uploader - ERROR - Current container(testSWF_seg) policy(Reduced-Redundancy) is mismatch! (Standard-Replica)

container

The remote container name.

segment_container

The remote segment container name.

mode

archive mode:

Once the local files have been uploaded to Swift, it will try to delete the local files after the time you defined in keep_local_files.

sync mode:

Sync local files to Swift, and won’t delete local files.

Note

If you don’t specify which mode to use, sync is the default.

keep_local_files

How long you would like to keep these files locally. This option is only valid when you set mode = archive. The default value of keep_local_files is 1d (1 day).

# Delete local files immediately
keep_local_files = 0

# keep files for 300 seconds
keep_local_files = 300s
# keep files for 10 minutes
keep_local_files = 10m
# keep files for 10 hours
keep_local_files = 10h
# keep files for 2 days
keep_local_files = 2d
# keep files for 1 year
keep_local_files = 1y
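The suffix format above can be sketched as follows. This is an illustrative parser, not GoWF's own; it treats a year as 365 days, which is an assumption:

```python
# Sketch: convert duration strings (y/w/d/h/m/s suffixes, compound
# values such as "2d10h" included) into seconds.
import re

# seconds per unit; a year is treated as 365 days (an assumption)
UNIT_SECONDS = {"y": 365 * 86400, "w": 7 * 86400, "d": 86400,
                "h": 3600, "m": 60, "s": 1}

def duration_to_seconds(value):
    # "0" means delete immediately
    if value == "0":
        return 0
    total = 0
    for amount, unit in re.findall(r"(\d+)([ywdhms])", value):
        total += int(amount) * UNIT_SECONDS[unit]
    return total

print(duration_to_seconds("300s"))   # 300
print(duration_to_seconds("10m"))    # 600
print(duration_to_seconds("2d10h"))  # 208800
```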

Note

The local files might not be deleted immediately, because GoWF doesn’t check files every second. So you might see deletions delayed by up to checker_interval + keep_local_files.

file_patterns

Only upload files that match the specified patterns, separated by commas.

file_patterns = *.log.gz
file_patterns = *.gz,*.zip
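The patterns look like shell-style globs. A sketch using Python's fnmatch, on the assumption that GoWF's matching is glob-like:

```python
# Sketch: which files would be uploaded under file_patterns = *.gz,*.zip
import fnmatch

patterns = ["*.gz", "*.zip"]
files = ["app.log.gz", "backup.zip", "notes.txt"]

# a file is uploaded if it matches any configured pattern
matched = [f for f in files
           if any(fnmatch.fnmatch(f, p) for p in patterns)]
print(matched)  # ['app.log.gz', 'backup.zip']
```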

expired_after

When you set this option in the folder section, the uploaded files will expire after the period you set. If you want to keep the uploaded files forever, do not use this option in the folder section.

# 300 seconds
expired_after = 300s

# 5 minutes
expired_after = 5m

# 24 hours
expired_after = 24h

# 90 days
expired_after = 90d

# 7 years
expired_after = 7y

# 2 days 10 hours
expired_after = 2d10h

metadata

You can add metadata to each file that you want to upload from this folder.

metadata = key-1:value-1,key-2:value-2

Dedup options (under folder section)

Here is an example of configuring deduplication of your data. GoWF's deduplication uses variable-size chunking and leverages anchors to keep track of chunk boundaries. In the example below, with a 10 MiB file and a divider of 128, the chunk sizes are kept between 65536 and 262144 bytes, and the number of chunks between 40 ( 10485760 / 262144 ) and 160 ( 10485760 / 65536 ).

# dedup parameters
# Allow suffix size string (B, KB, MB, GB)
anchor = 0
anchor_upper_bound = 512MB
anchor_lower_bound = 1MB
anchor_divider = 128
min_divider = 2
max_multiplier = 2
buffer_read_gate = 512MB
buffer_read_size = 1GB

anchor

The dedup anchor is used for tracking dedup rules and generally represents the average chunk size of your deduplication. If anchor = 0, it is calculated from file size / anchor_divider, taking the upper power-of-two bound 2^X; X becomes the anchor. However, after the calculation, if the result is larger than anchor_upper_bound, anchor_upper_bound is used as the anchor, and if it is smaller than anchor_lower_bound, anchor_lower_bound is used. The pseudo code is below.

if anchor != 0 then
  # use the configured anchor as-is
  anchor = anchor
elif anchor == 0 and 2^(X-1) < (file size / anchor_divider) < 2^X then
  anchor = X
  if X > anchor_upper_bound then
    anchor = anchor_upper_bound
  elif X < anchor_lower_bound then
    anchor = anchor_lower_bound
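The pseudo code can be turned into a runnable sketch. Following the worked examples later in this section, the computed anchor is the exponent X, and the bounds are applied to the resulting average chunk size (2^X); that interpretation is an assumption based on those examples. Note that with the default 1MB lower bound, the 10 MiB example's 131072-byte chunk would be raised to the lower bound:

```python
# Sketch: derive the average dedup chunk size from the anchor rules.
import math

def average_chunk_size(file_size, anchor=0, anchor_divider=128,
                       lower=1 * 1024 ** 2, upper=512 * 1024 ** 2):
    if anchor != 0:
        return anchor                     # explicit anchor wins
    # find X with 2^(X-1) < file_size / anchor_divider <= 2^X
    x = math.ceil(math.log2(file_size / anchor_divider))
    chunk = 2 ** x
    return max(lower, min(chunk, upper))  # clamp to the bounds

# the 10 MiB worked example: 10485760 / 128 = 81920, so X = 17
print(average_chunk_size(10 * 1024 ** 2, lower=64 * 1024))  # 131072
# with the default 1MB lower bound, the chunk is raised to it
print(average_chunk_size(10 * 1024 ** 2))                   # 1048576
```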

Note

If you don’t set an anchor in the folder-level configuration, GoWF defaults to anchor = 1, which means GoWF won’t run dedup or purgededup in this folder.

anchor_upper_bound

The anchor_upper_bound is applied after the anchor is calculated: if the result is LARGER than anchor_upper_bound, dedup will simply use anchor_upper_bound as the anchor.

anchor_lower_bound

The anchor_lower_bound is applied after the anchor is calculated: if the result is SMALLER than anchor_lower_bound, dedup will simply use anchor_lower_bound as the anchor.

anchor_divider

The anchor_divider is the denominator (divider) applied to the file size to derive the anchor.

e.g. anchor calculation example
file size = 10485760
10485760 / 128 = 81920
2^16=65536 < 81920 < 2^17=131072
anchor = 17

min_divider

After determining the anchor, the dedup function will use it to calculate the chunk lower bound, which is derived as average chunk size / min_divider.

e.g. smallest chunk size example
file size = 10485760
10485760 / 128 = 81920
2^16=65536 < 81920 < 2^17=131072
anchor = 17, average chunk size = 131072
smallest chunk size = 131072 / 2 = 65536

max_multiplier

After determining the anchor, the dedup function will use it to calculate the chunk upper bound, which is derived as average chunk size * max_multiplier.

e.g. max chunk size example
file size = 10485760
10485760 / 128 = 81920
2^16=65536 < 81920 < 2^17=131072
anchor = 17, average chunk size = 131072
max chunk size = 131072 * 2 = 262144
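Putting the two examples together, assuming min_divider and max_multiplier scale the average chunk size:

```python
# Sketch: smallest and largest chunk sizes from the average chunk size.
avg_chunk = 131072        # 2^17, from the anchor worked example
min_divider = 2
max_multiplier = 2

smallest_chunk = avg_chunk // min_divider   # lower bound on chunk size
largest_chunk = avg_chunk * max_multiplier  # upper bound on chunk size
print(smallest_chunk, largest_chunk)  # 65536 262144
```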

buffer_read_gate

The buffer_read_gate controls when a large file is read buffer by buffer instead of having all of its content read into memory at once. If the file size is larger than buffer_read_gate, the file is read into a buffer in pieces of buffer_read_size, instead of in one pass. Of course, this results in a slower upload than when the file can be read in one pass.

buffer_read_size

The buffer_read_size is the buffer size you would like to use. For example, with buffer_read_size = 1073741824 (1GiB) and a 128GiB file, GoWF will allocate a 1GiB read buffer and read the file 128 times.
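The read count for the 128GiB example can be checked directly (the gate comparison mirrors the description above; the exact read loop is up to GoWF):

```python
# Sketch: how many buffered reads a large file needs once it exceeds
# buffer_read_gate.
import math

buffer_read_gate = 512 * 1024 ** 2   # 512MB
buffer_read_size = 1 * 1024 ** 3     # 1GiB
file_size = 128 * 1024 ** 3          # the 128GiB example

if file_size > buffer_read_gate:
    # large file: read piece by piece into a buffer_read_size buffer
    reads = math.ceil(file_size / buffer_read_size)
else:
    reads = 1                        # small file: read in one pass
print(reads)  # 128
```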