Features and Configuration
==========================

Features
--------

Per-account configuration
+++++++++++++++++++++++++

Allows a user to run GoWF with multiple configuration files to connect to
different Swift accounts and monitor the folders they want. You can use
``--config <config file> --config <config file> ...`` in your GoWF systemd
file to enable this feature.
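For example, a systemd unit might pass one ``--config`` flag per account. The
snippet below is a minimal sketch; the binary path, unit layout, and config
file names are examples, not part of the GoWF distribution:

.. code-block:: bash

    # Sketch of a systemd service passing one --config per Swift account.
    # Binary path and config file names below are examples only.
    [Service]
    ExecStart=/usr/local/bin/gowf --config /etc/gowf/account-a.conf --config /etc/gowf/account-b.conf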
Log rotate
++++++++++

The ``--log-rotate`` option makes GoWF rotate its log files automatically. It
compresses the current log to ``gowf-<timestamp>.log.gz`` and creates a new
``gowf.log`` file once it reaches ``512MB``. It also keeps ``20`` backup
files, so you might see log files like the following:

.. code-block:: bash

    $ ls -alh gowf*
    -rw-r--r--  1 charles  staff   105K Aug 14 11:40 gowf-2019-08-14T03-40-18.073.log.gz
    -rw-r--r--  1 charles  staff   105K Aug 14 11:40 gowf-2019-08-14T03-40-39.586.log.gz
    -rw-r--r--  1 charles  staff   104K Aug 14 11:40 gowf-2019-08-14T03-40-57.306.log.gz
    -rw-r--r--  1 charles  staff   106K Aug 14 12:44 gowf-2019-08-14T12-44-59.826.log.gz
    -rw-r--r--  1 charles  staff   105K Aug 14 12:45 gowf-2019-08-14T12-45-21.291.log.gz
    -rw-r--r--  1 charles  staff   103K Aug 14 12:45 gowf-2019-08-14T12-45-42.502.log.gz
    -rw-r--r--  1 charles  staff   105K Aug 14 12:46 gowf-2019-08-14T12-46-00.206.log.gz
    -rw-r--r--  1 charles  staff   150K Aug 14 12:47 gowf-2019-08-14T12-47-17.842.log.gz
    -rw-r--r--  1 charles  staff   354K Aug 14 12:47 gowf.log

List Containers/Objects
+++++++++++++++++++++++

Allows a user to list containers, or the objects under a specific container,
via ``--list``, based on your default or desired ``--config <config file>``
config file.

.. code-block:: bash

    # Using default /etc/gowf/gowf.conf for listing swift containers
    $ gowf --list

    # Using a specific gowf.conf for listing swift containers
    $ gowf --config gowf.conf --list

    # Using default /etc/gowf/gowf.conf for listing swift objects under a swift container
    # Output goes to stdout; a timestamped CSV file is also generated in ./ if --log-level DEBUG
    $ gowf --list <container>

    # Using a specific gowf.conf for listing swift objects under a swift container
    # Output goes to stdout; a timestamped CSV file is also generated in ./ if --log-level DEBUG
    $ gowf --config gowf.conf --list <container>

Download Object
+++++++++++++++

Allows a user to download objects under a specific container via
``--download``, based on your default or desired ``--config <config file>``
config file.

.. code-block:: bash

    # Download a swift object based on your default gowf configuration file (/etc/gowf/gowf.conf)
    $ gowf --download <container> <object> <target path>

    # If you don't give <target path>, it will download to "/tmp/<object name>"
    $ gowf --download <container> <object>

    # Using a specific gowf.conf for downloading a swift object
    $ gowf --config gowf.conf --download <container> <object> <target path>

    # If you don't give <target path>, it will download to "/tmp/<object name>"
    $ gowf --config gowf.conf --download <container> <object>

Dedup
+++++

Deduplication is an experimental feature. Used on files that have not already
been compressed, it can reduce the storage footprint of large files. The file
will be uploaded as an **SLO (static large object)**. Please use ``--dedup``
to enable it. If you would like to change the settings of the dedup feature,
you can find them under the dedup options (in the Folder section of the
configuration file).

.. code-block:: bash

    # run it in debug mode (turn on log-level to debug) for dedup
    $ gowf --config gowf.conf --log-file log.txt --log-level DEBUG --dedup

.. note:: Since GoWF dedup uses Swift SLOs, you **MUST** install the
   ``swift-gowf-deduper`` middleware from the GitHub repo
   **resource/middleware** folder to block dedup segments from being deleted.

Purge Dedup Segments
++++++++++++++++++++

This is an experimental feature, which deletes dedup SLO segments based on the
dedup object manifest. Please use ``--purgededup`` to enable it. If you would
like to change the dedup settings, in particular the anchor parameter, which
must **NOT** be ``1``, you can find them in the dedup options (in the Folder
section of the configuration file).

.. code-block:: bash

    # run it in debug mode (turn on log-level to debug) to purge dedup segments
    $ gowf --config gowf.conf --log-file log.txt --log-level DEBUG --purgededup

Dedup Segments Reference Counting Statistic Information
+++++++++++++++++++++++++++++++++++++++++++++++++++++++

This is an experimental feature, which computes reference counts for dedup SLO
segments from the dedup object manifests. Please use ``--statdedup`` to enable
it. By default the result is dumped to stdout, but if you set
``--log-level DEBUG`` a CSV file is also written under ``/tmp/gowf/``.

.. code-block:: bash

    # run it in debug mode (turn on log-level to debug) to get the dedup
    # segments reference counting statistic csv file
    $ gowf --config gowf.conf --log-file log.txt --log-level DEBUG --statdedup

Statistic Report
++++++++++++++++

Prometheus Exporter
###################

GoWF embeds a Prometheus exporter, so you can find the metrics on port
``9988`` (you can change it with ``--exporter-port <port>``).

.. code-block:: console

    $ curl 127.0.0.1:9988/metrics 2>/dev/null | grep gowf
    # HELP gowf_diagnostic_objects_total Number of objects, partitioned by location and type
    # TYPE gowf_diagnostic_objects_total gauge
    gowf_diagnostic_objects_total{container="b1_demo",location="local",type="single",uuid="3c:15:c2:d7:4f:30"} 17
    gowf_diagnostic_objects_total{container="b1_demo",location="remote",type="single",uuid="3c:15:c2:d7:4f:30"} 53
    gowf_diagnostic_objects_total{container="b1_demo2",location="local",type="single",uuid="3c:15:c2:d7:4f:30"} 17
    gowf_diagnostic_objects_total{container="b1_demo2",location="remote",type="single",uuid="3c:15:c2:d7:4f:30"} 53
    # HELP gowf_diagnostic_proccessing_total Number of upload operations waiting to be processed, partitioned by type
    # TYPE gowf_diagnostic_proccessing_total gauge
    gowf_diagnostic_proccessing_total{container="b1_demo",type="inqueue",uuid="3c:15:c2:d7:4f:30"} 1
    gowf_diagnostic_proccessing_total{container="b1_demo",type="md5calc",uuid="3c:15:c2:d7:4f:30"} 0
    gowf_diagnostic_proccessing_total{container="b1_demo",type="single",uuid="3c:15:c2:d7:4f:30"} 0
    gowf_diagnostic_proccessing_total{container="b1_demo",type="slo",uuid="3c:15:c2:d7:4f:30"} 0
    gowf_diagnostic_proccessing_total{container="b1_demo2",type="inqueue",uuid="3c:15:c2:d7:4f:30"} 1
    gowf_diagnostic_proccessing_total{container="b1_demo2",type="md5calc",uuid="3c:15:c2:d7:4f:30"} 0
    gowf_diagnostic_proccessing_total{container="b1_demo2",type="single",uuid="3c:15:c2:d7:4f:30"} 0
    gowf_diagnostic_proccessing_total{container="b1_demo2",type="slo",uuid="3c:15:c2:d7:4f:30"} 0
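If you run a Prometheus server, a minimal scrape job for this exporter might
look like the sketch below. The job name and target address are examples,
assuming the default port ``9988``:

.. code-block:: yaml

    # prometheus.yml (sketch): scrape the GoWF exporter on its default port
    scrape_configs:
      - job_name: 'gowf'
        static_configs:
          - targets: ['127.0.0.1:9988']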
Local JSON file
###############

Provides a statistics report for each container, so you can see how many
objects have been uploaded and how many are ongoing. You can find
``<config file>_<container>.json`` under the ``<stats folder>``, as shown
below.

.. code-block:: console

    $ cat tmp/gowf.conf_test-container.json | python -m json.tool
    {
        "local": 27,
        "remote": 27,
        "triggers": {
            "user": 1,
            "gowf": 0
        },
        "inqueue": 0,
        "inprogress": {
            "slo": 0,
            "single": 0,
            "md5calc": 0
        },
        "uploads": {
            "successes": 1,
            "failures": 0,
            "disregard": 0
        }
    }

Please use ``--stats-folder <folder>`` to enable it.

Autoupdate
++++++++++

GoWF has an auto-update feature built in. If you run ``gowf --update``, GoWF
will try to reach SwiftStack's GoWF repo and upgrade itself to the latest
version.

.. code-block:: console

    $ gowf --update
    New version(0.0.7) is ready!
    Your current version: 0.0.5
    Are you going to update(y/N)?y
    Done! Next run should be 0.0.7

    $ gowf --version
    0.0.7

After GoWF has been updated, you'll need to restart the service for the new
version to be used. On a systemd based server, you can restart GoWF with:

.. code-block:: console

    $ sudo systemctl restart gowf.service

Configuration file
------------------

The gowf configuration file ``gowf.conf`` includes two major sections. The
**Global section** starts with ``[global]`` and is only allowed once in the
configuration file. The **Folder section** starts with ``[<folder path>]`` and
can be repeated multiple times in the configuration file. Examples of both
``global`` and ``folder`` sections can be found below.

Here is an example of how to configure GoWF:

.. literalinclude:: ../../gowf.conf
   :language: bash
   :linenos:

Global section
++++++++++++++

The ``global`` section indicates which Swift endpoint should be used,
including the username and password for the account. You can't have multiple
users in a configuration file, but you can set up another configuration file
for a second user.

.. code-block:: bash

    [global]
    user = demo
    auth = https://swift.example.com/auth/v1.0
    key = demo
    concurrency = 10
    # default segment_size is 100MB
    # Allow suffix size string (B, KB, MB, GB)
    segment_size = 100MB
    segment_container_prefix = .segment_
    recursive = True
    preserve_path = True

user
####

User name for obtaining an auth token.

auth
####

URL for obtaining an auth token.

.. _encrypted_password:

key
###

Key for obtaining an auth token.

.. note:: If you want to use an encrypted file to store the key, please leave
   it empty and run ``gowf --config <path>/<config file>`` to generate an
   encrypted file.

.. code-block:: bash

    $ gowf --config /tmp/demo.conf
    Can't find user's password in /tmp/demo.conf!
    Please enter password (User: charles, Auth: https://swift.example.com/auth/v1.0):

    ### And you'll see the encrypted file in the `.vaults` folder.
    $ ls -al /tmp/.vaults/13664203caceb3a61d351cbeece72001613de4ac
    -rw-------  1 charles  wheel  519 May  6 16:58 /tmp/.vaults/13664203caceb3a61d351cbeece72001613de4ac

    ### Run gowf again to make sure everything is good
    $ gowf --config /tmp/demo.conf --log-file /tmp/gowf.log

In some cases you might need to regenerate the encrypted files because the
user's key has changed; just run ``rm <config path>/.vaults/*`` to remove
them.

concurrency
###########

Allows the watch folder to spawn multiple concurrent threads to handle upload
jobs in the queue. The default value is ``4``.

segment_size
############

The segment size for a Static Large Object (SLO). The default value is
``100MB``.

segment_container_prefix
########################

The prefix string of a segments container when GoWF creates a new segment
container. The default value is ``.segment_``.

recursive
#########

Watch sub-directories for file changes if set to ``True``. The default value
is ``False``.
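For background on what recursive watching involves: inotify-style watchers
monitor a single directory, so a recursive watcher has to add a watch per
sub-directory. The sketch below illustrates this in Go using the
``github.com/fsnotify/fsnotify`` package; it is illustrative only and is not
GoWF's actual implementation.

.. code-block:: go

    // Sketch: recursive directory watching in Go with fsnotify.
    // Illustrative only; not GoWF's actual implementation.
    package main

    import (
        "log"
        "os"
        "path/filepath"

        "github.com/fsnotify/fsnotify"
    )

    func main() {
        watcher, err := fsnotify.NewWatcher()
        if err != nil {
            log.Fatal(err)
        }
        defer watcher.Close()

        // fsnotify watches one directory at a time, so walk the tree
        // and add a watch for every sub-directory.
        root := "/tmp/b1" // example watched folder
        err = filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
            if err != nil {
                return err
            }
            if info.IsDir() {
                return watcher.Add(path)
            }
            return nil
        })
        if err != nil {
            log.Fatal(err)
        }

        for {
            select {
            case event := <-watcher.Events:
                log.Println("event:", event)
            case err := <-watcher.Errors:
                log.Println("error:", err)
            }
        }
    }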
preserve_path
#############

Constructs the object name, including the relative path, when it is ``True``.
The default value is ``True``.

For example, if you have a file called ``woof.txt`` under the folder
``<watched folder>/dog/staffy/``, the relative path of the file is
``dog/staffy/woof.txt``, so the object name in Swift will be
``dog/staffy/woof.txt``.

When ``False``, objects will always be named after the basename of the source
file (e.g. ``woof.txt``). If you have files with the same name (``0ab5.db``,
``53ef.db``, etc.) in different subfolders (``312/0ab5``, ``435/0ab5``) as
outlined below,

.. code-block:: console

    └── subfolder
        └── 38965
            ├── 312
            │   ├── 0ab5
            │   │   └── 0ab5.db
            │   ├── 53ef
            │   │   └── 53ef.db
            │   ├── 9c3a
            │   │   └── 9c3a.db
            │   └── f19ed
            │       └── f19ed.db
            └── 435
                ├── 0ab5
                │   └── 0ab5.db
                ├── 53ef
                │   └── 53ef.db
                ├── 9c3a
                │   └── 9c3a.db
                └── f19ed
                    └── f19ed.db

... then, if you set ``recursive = True`` and ``preserve_path = False``, GoWF
will overwrite the remote target object (``0ab5.db``) whenever either of the
local files is updated (``subfolder/38965/312/0ab5/0ab5.db`` or
``subfolder/38965/435/0ab5/0ab5.db``). This is probably not what you want:
because the path isn't preserved, the 'wrong' remote object will be
overwritten. Please be aware of this behavior when you configure GoWF.
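The naming rule can be summarized in a few lines of Go. The function below is
a sketch of the behavior described above (``objectName`` is a hypothetical
helper, not GoWF's actual code):

.. code-block:: go

    // Sketch: deriving an object name from a watched file, following the
    // preserve_path behavior described above. Not GoWF's actual code.
    package main

    import (
        "fmt"
        "path/filepath"
    )

    func objectName(watchedFolder, file string, preservePath bool) string {
        if preservePath {
            rel, err := filepath.Rel(watchedFolder, file)
            if err == nil {
                return filepath.ToSlash(rel) // e.g. "dog/staffy/woof.txt"
            }
        }
        return filepath.Base(file) // e.g. "woof.txt"
    }

    func main() {
        fmt.Println(objectName("/tmp/b1", "/tmp/b1/dog/staffy/woof.txt", true))  // dog/staffy/woof.txt
        fmt.Println(objectName("/tmp/b1", "/tmp/b1/dog/staffy/woof.txt", false)) // woof.txt
    }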
checker_interval
################

This interval decides how often GoWF performs an integrity check on local and
remote objects. The default value is ``5m``.

Folder section
++++++++++++++

.. code-block:: bash

    [/tmp/b1]
    storage_policy = Standard-Replica
    container = b1RR
    # segment_container = b1RR+mysegment

    # `archive` mode:
    # Once the local files are uploaded to Swift, it will
    # try to delete the local files after the time you
    # defined in `keep_local_files`
    #
    # `sync` mode:
    # Sync local files to Swift. Will not delete local files.
    #
    mode = sync

    # How long you would like to keep these files on the local
    # file system.
    # This option is only valid when you set `mode = archive`
    # y: year, w: week, d: day, h: hour, m: minute, s: second
    keep_local_files = 1d

    # split with comma
    file_patterns = *.txt, *.log

    # expire remote objects after `expired_after`
    # y: year, w: week, d: day, h: hour, m: minute, s: second
    expired_after = 60s

    # Metadata for objects
    metadata = key1:val1, key2:val2

    # dedup parameters
    # Allow suffix size string (B, KB, MB, GB)
    anchor = 0
    anchor_upper_bound = 512MB
    anchor_lower_bound = 1MB
    anchor_divider = 128
    min_divider = 2
    max_multiplier = 2
    buffer_read_gate = 512MB
    buffer_read_size = 1GB

storage_policy
##############

This option allows you to create containers and segment containers under the
specified storage policy. The default value is empty, which uses the default
policy defined in the Swift cluster.

**Note:** If the container or segment container already exists and has a
different storage policy, GoWF won't upload any objects until you correct the
policy in the config file or use a new container/segment container. You'll see
errors like the following:

.. code-block:: console

    2017-08-28 08:33:01,502 - watch-uploader - INFO - Created Container(post_container) testSWF with Standard-Replica policy
    2017-08-28 08:33:01,502 - watch-uploader - INFO - Created Container(post_container) testSWF with Standard-Replica policy
    2017-08-28 08:33:01,671 - watch-uploader - INFO - Get(stat_container) container (testSWF_seg), policy: Reduced-Redundancy
    2017-08-28 08:33:01,671 - watch-uploader - ERROR - Current container(testSWF_seg) policy(Reduced-Redundancy) is mismatch! (Standard-Replica)

container
#########

The remote container name.

segment_container
#################

The remote segment container name.

mode
####

**archive** mode: Once the local files have been uploaded to Swift, GoWF will
try to delete the local files after the time you defined in
``keep_local_files``.

**sync** mode: Sync local files to Swift, and never delete local files.

.. note:: If you don't specify which mode you want to use, **sync** is the
   default mode.

keep_local_files
################

How long you would like to keep these files locally. This option is only valid
when you set **mode = archive**. The default value of ``keep_local_files`` is
``1d`` (1 day).

.. code-block:: bash

    # Delete local files immediately
    keep_local_files = 0

    # keep files for 300 seconds
    keep_local_files = 300s

    # keep files for 10 minutes
    keep_local_files = 10m

    # keep files for 10 hours
    keep_local_files = 10h

    # keep files for 2 days
    keep_local_files = 2d

    # keep files for 1 year
    keep_local_files = 1y

.. note:: Local files might not be deleted immediately, because GoWF does not
   check files every second. You might therefore see delayed deletions, which
   should stay under ``checker_interval`` + ``keep_local_files``.

file_patterns
#############

Only upload files that match the specified patterns, split by commas.

.. code-block:: bash

    file_patterns = *.log.gz
    file_patterns = *.gz,*.zip

expired_after
#############

When you set this option in the folder section, the uploaded files will expire
after the value you set. If you want to keep the uploaded files forever, do
not use this option in the folder section.

.. code-block:: bash

    # 300 seconds
    expired_after = 300s

    # 5 minutes
    expired_after = 5m

    # 24 hours
    expired_after = 24h

    # 90 days
    expired_after = 90d

    # 7 years
    expired_after = 7y

    # 2 days 10 hours
    expired_after = 2d10h

metadata
########

You can add metadata to each file that you upload from this folder.

.. code-block:: bash

    metadata = key-1:value-1,key-2:value-2

Dedup options (under folder section)
++++++++++++++++++++++++++++++++++++

Here is an example of using deduplication on your data. GoWF's deduplication
feature uses variable-size chunking and leverages anchors to keep track of the
chunks. In the example below, with a 10 MiB file and a divider of ``128``, the
deduplication chunk sizes stay between ``65536`` and ``262144`` bytes, and the
number of chunks between ``40`` (10485760 / 262144) and ``160``
(10485760 / 65536).

.. code-block:: bash

    # dedup parameters
    # Allow suffix size string (B, KB, MB, GB)
    anchor = 0
    anchor_upper_bound = 512MB
    anchor_lower_bound = 1MB
    anchor_divider = 128
    min_divider = 2
    max_multiplier = 2
    buffer_read_gate = 512MB
    buffer_read_size = 1GB

anchor
######

The dedup **anchor** is used for tracking dedup rules and generally represents
the ``average chunk size`` of your deduplication. If **anchor** = 0, GoWF
calculates ``file size / anchor_divider`` and rounds it up to the next power
of two, ``2^X``; ``X`` becomes the **anchor**. After the calculation, if
``2^X > anchor_upper_bound``, then **anchor_upper_bound** is used as the
**anchor**; if ``2^X < anchor_lower_bound``, then **anchor_lower_bound** is
used. The pseudo code is as below.

.. code-block:: bash

    if anchor != 0 then
        anchor = anchor                 # use the configured value; stop
    elif anchor == 0 and 2^(X-1) < (file size / anchor_divider) < 2^X then
        anchor = X
        if 2^X > anchor_upper_bound then
            anchor = anchor_upper_bound
        elif 2^X < anchor_lower_bound then
            anchor = anchor_lower_bound
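The sketch below expresses the same calculation in Go, using the 10 MiB
example from this section. It is illustrative only (not GoWF's actual
implementation); ``anchorFor`` is a hypothetical helper, and the bounds are
assumed to be powers of two, as in the sample configuration.

.. code-block:: go

    // Sketch: anchor (average chunk size) selection as described above.
    // Illustrative only; not GoWF's actual implementation.
    package main

    import (
        "fmt"
        "math"
    )

    // anchorFor returns the exponent X such that the average chunk size is
    // 2^X, clamped so that 2^X stays within [lowerBound, upperBound].
    // Assumes the bounds are powers of two.
    func anchorFor(fileSize, divider, lowerBound, upperBound int64) int {
        // Round file size / divider up to the next power of two: 2^X.
        x := int(math.Ceil(math.Log2(float64(fileSize) / float64(divider))))
        if int64(1)<<uint(x) > upperBound {
            x = int(math.Log2(float64(upperBound)))
        } else if int64(1)<<uint(x) < lowerBound {
            x = int(math.Log2(float64(lowerBound)))
        }
        return x
    }

    func main() {
        // 10 MiB file, divider 128, bounds 1MB..512MB (the documented example):
        // 10485760 / 128 = 81920, 2^16 < 81920 < 2^17, so anchor = 17.
        x := anchorFor(10*1024*1024, 128, 1<<20, 512<<20)
        fmt.Println("anchor =", x, "average chunk size =", int64(1)<<uint(x))
        // Output: anchor = 17 average chunk size = 131072
    }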
.. note:: If you don't apply an **anchor** in the folder-level configuration,
   GoWF defaults to ``anchor = 1``, which means GoWF won't run **dedup** or
   **purgededup** in this folder.

anchor_upper_bound
##################

The **anchor_upper_bound** applies when the calculated **anchor** turns out to
be ``LARGER`` than the **anchor_upper_bound**. If that is the case, dedup will
simply use **anchor_upper_bound** as the **anchor**.

anchor_lower_bound
##################

The **anchor_lower_bound** applies when the calculated **anchor** turns out to
be ``SMALLER`` than the **anchor_lower_bound**. If that is the case, dedup
will simply use **anchor_lower_bound** as the **anchor**.

anchor_divider
##############

The **anchor_divider** is the ``denominator`` (**divider**) of the **file
size** for deriving the anchor.

.. code-block:: bash

    e.g. anchor calculation example

    file size = 10485760
    10485760 / 128 = 81920
    2^16=65536 < 81920 < 2^17=131072
    anchor = 17

min_divider
###########

After determining the **anchor**, the dedup function uses it to calculate the
**chunk lower bound**, which is derived from ``anchor / min_divider``.

.. code-block:: bash

    e.g. smallest chunk size example

    file size = 10485760
    10485760 / 128 = 81920
    2^16=65536 < 81920 < 2^17=131072
    anchor = 17, average chunk size = 131072
    smallest chunk size = 131072 / 2 = 65536

max_multiplier
##############

After determining the **anchor**, the dedup function uses it to calculate the
**chunk upper bound**, which is derived from ``anchor * max_multiplier``.

.. code-block:: bash

    e.g. max chunk size example

    file size = 10485760
    10485760 / 128 = 81920
    2^16=65536 < 81920 < 2^17=131072
    anchor = 17, average chunk size = 131072
    max chunk size = 131072 * 2 = 262144

buffer_read_gate
################

The **buffer_read_gate** applies when your file is large and needs to be read
buffer by buffer instead of reading all of the content into memory. If your
file size is larger than **buffer_read_gate**, the file will be read in pieces
into a buffer of the given size instead of being read in one pass. Of course,
this results in a much slower upload than when the file can be read at once.

buffer_read_size
################

The **buffer_read_size** is the buffer size you would like to use. For
example, with **buffer_read_size** = ``1073741824`` (``1GiB``), processing a
``128GiB`` file creates a ``1GiB`` read buffer and fills it ``128`` times.
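The gate/size relationship can be sketched as follows. This is a minimal
illustration under the assumptions above, not GoWF's actual reader;
``process`` is a hypothetical per-chunk handler.

.. code-block:: go

    // Sketch: choosing between whole-file and buffered reads based on
    // buffer_read_gate / buffer_read_size. Illustrative only.
    package main

    import (
        "io"
        "log"
        "os"
    )

    const (
        bufferReadGate = 512 << 20 // 512MB: larger files are read in chunks
        bufferReadSize = 1 << 30   // 1GB:   size of each chunked read
    )

    // process is a hypothetical per-chunk handler (e.g. chunking + upload).
    func process(chunk []byte) { _ = chunk }

    func readFile(path string) error {
        info, err := os.Stat(path)
        if err != nil {
            return err
        }
        if info.Size() <= bufferReadGate {
            // Small enough: read the whole file in one pass.
            data, err := os.ReadFile(path)
            if err != nil {
                return err
            }
            process(data)
            return nil
        }
        // Large file: read buffer_read_size bytes at a time.
        f, err := os.Open(path)
        if err != nil {
            return err
        }
        defer f.Close()
        buf := make([]byte, bufferReadSize)
        for {
            n, err := f.Read(buf)
            if n > 0 {
                process(buf[:n])
            }
            if err == io.EOF {
                return nil
            }
            if err != nil {
                return err
            }
        }
    }

    func main() {
        if err := readFile("/tmp/b1/bigfile.bin"); err != nil {
            log.Fatal(err)
        }
    }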