datastore: implement sync-level tuning for datastores

currently, we don't (f)sync on chunk insertion (or at any point after
that), which can lead to broken chunks in case of e.g. an unexpected
powerloss. To fix that, offer a tuning option for datastores that
controls the level of syncs it does:

* None (default): same as current state, no (f)syncs done at any point
* Filesystem: at the end of a backup, the datastore issues
  a syncfs(2) to the filesystem of the datastore
* File: issues an fsync on each chunk as they get inserted
  (using our 'replace_file' helper) and a fsync on the directory handle

a small benchmark showed the following (times in mm:ss):
setup: virtual pbs, 4 cores, 8GiB memory, ext4 on spinner

size                none    filesystem  file
2GiB (fits in ram)  00:13   0:41        01:00
33GiB               05:21   05:31       13:45

so if the backup fits in memory, there is a large difference between all
of the modes (expected), but as soon as it exceeds the memory size,
the difference between not syncing and syncing the fs at the end becomes
much smaller.

i also tested on an nvme, but there the syncs basically made no difference

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2025-10-24 11:36:02 +00:00 · 2022-10-20 09:40:56 +02:00 · 2022-10-20 09:40:56 +02:00 · e0fb53e41d
commit e0fb53e41d
parent 4a13373c4b
1 changed files with 37 additions and 0 deletions
--- a/pbs-api-types/src/datastore.rs
+++ b/pbs-api-types/src/datastore.rs
@@ -168,6 +168,42 @@ pub enum ChunkOrder {
    Inode,
 }

+#[api]
+#[derive(PartialEq, Serialize, Deserialize)]
+#[serde(rename_all = "lowercase")]
+/// The level of syncing that is done when writing into a datastore.
+pub enum DatastoreFSyncLevel {
+    /// No special fsync or syncfs calls are triggered. The system default dirty write back
+    /// mechanism ensures that data is flushed eventually via the `dirty_writeback_centisecs`
+    /// and `dirty_expire_centisecs` kernel sysctls, defaulting to ~ 30s.
+    ///
+    /// This mode provides generally the best performance, as all write back can happen async,
+    /// which reduces IO pressure.
+    /// But it may cause losing data on powerloss or system crash without any uninterruptible power
+    /// supply.
+    None,
+    /// Triggers a fsync after writing any chunk on the datastore. While this can slow down
+    /// backups significantly, depending on the underlying file system and storage used, it
+    /// will ensure fine-grained consistency. Depending on the exact setup, there might be no
+    /// benefits over the file system level sync, so if the setup allows it, you should prefer
+    /// that one. Despite the possible negative impact in performance, it's the most consistent
+    /// mode.
+    File,
+    /// Trigger a filesystem wide sync after all backup data got written but before finishing the
+    /// task. This ensures that every finished backup is fully written back to storage
+    /// while reducing the impact on many file systems in contrast to the file level sync.
+    /// Depending on the setup, it might have a negative impact on unrelated write operations
+    /// of the underlying filesystem, but it is generally a good compromise between performance
+    /// and consistency.
+    Filesystem,
+}
+
+impl Default for DatastoreFSyncLevel {
+    fn default() -> Self {
+        DatastoreFSyncLevel::None
+    }
+}
+
 #[api(
    properties: {
        "chunk-order": {
@@ -182,6 +218,7 @@ pub enum ChunkOrder {
 pub struct DatastoreTuning {
    /// Iterate chunks in this order
    pub chunk_order: Option<ChunkOrder>,
+    pub sync_level: Option<DatastoreFSyncLevel>,
 }

 pub const DATASTORE_TUNING_STRING_SCHEMA: Schema = StringSchema::new("Datastore tuning options")
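
For illustration only, here is a minimal sketch of how the file-level mode described in the commit message could behave on chunk insertion. It is not the code from this commit: `SyncLevel`, `effective_level` and `insert_chunk` are hypothetical stand-ins, the real implementation goes through the 'replace_file' helper (not shown in this diff), and the sketch uses only the Rust standard library. The filesystem-level mode is left as a comment because `std` has no wrapper for syncfs(2).

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical local stand-in for `DatastoreFSyncLevel` (no `#[api]` or serde derives).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum SyncLevel {
    None,
    File,
    Filesystem,
}

impl Default for SyncLevel {
    fn default() -> Self {
        SyncLevel::None
    }
}

/// An unset tuning option (`None` in `Option<...>`) falls back to the default level.
fn effective_level(configured: Option<SyncLevel>) -> SyncLevel {
    configured.unwrap_or_default()
}

/// Hypothetical chunk insertion: write to a temporary file, then rename it into place.
/// Only `SyncLevel::File` triggers fsync calls here.
fn insert_chunk(path: &Path, data: &[u8], level: SyncLevel) -> io::Result<()> {
    // Fall back to the current directory for bare file names.
    let dir = match path.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    let tmp = path.with_extension("tmp");

    let mut file = File::create(&tmp)?;
    file.write_all(data)?;
    if level == SyncLevel::File {
        // fsync(2) on the chunk data before it becomes visible under its final name
        file.sync_all()?;
    }
    drop(file);

    fs::rename(&tmp, path)?;
    if level == SyncLevel::File {
        // fsync(2) on the directory handle so the rename itself is persisted
        File::open(dir)?.sync_all()?;
    }
    // SyncLevel::Filesystem: nothing to do per chunk; a single syncfs(2) on the
    // datastore's filesystem would be issued once at the end of the backup.
    Ok(())
}
```

Calling `insert_chunk(&path, data, effective_level(None))` in this sketch performs no sync at all, which matches the default described above.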