SkySync Best Practices and Requirements

Introduction

SkySync is a highly scalable file and folder transfer engine capable of managing synchronization and copy processing operations between many different file management platforms.  It offers significant flexibility around all aspects of enterprise file logistics.  However, with flexibility comes a level of complexity in certain product configuration and deployment options.  This whitepaper provides prescriptive guidance around a series of scenario-based deployment models and uses those models to highlight important configuration considerations.  The ultimate purpose of this document is to educate migration/synchronization administrators on the concepts necessary for a successful and well-performing SkySync deployment.

Transfer Performance Factors

Before this document addresses the various performance models, it is important to understand the factors that influence transfer performance.  These variables all have a significant effect, positive or negative, on the throughput that any migration will achieve. Below is a list of the most critical factors that influence transfer performance:

1) Corpus Profile:  For the purposes of this document, the collection of all documents and folders located in any given storage platform is known as the “Corpus”.  The constitution of the corpus can have a significant impact on transfer throughput. Particularly when at least one of the transfer endpoints employs an API for managing content (basically any endpoint other than a standard server file system), the size and number of documents can have a dramatic effect on performance.

Given 100GB of data, if that 100GB consists of (10,240) 10MB files, transfer throughput will be considerably higher than if that data consisted of (209,715) 500KB files.  This is because SkySync will have to make approximately 200,000 more API calls to transfer the 100GB of 500KB files vs 100GB of 10MB files.

So if the corpus is weighted more towards many small files versus relatively fewer large files, it should be expected that the transfer throughput will generally be lower due to the latency expense of significantly more API calls.
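The arithmetic above can be sketched in a few lines.  This assumes a single API call per file, which understates real call counts (most platforms need several calls per file), but the relative difference holds:

```python
# Back-of-the-envelope estimate of per-file upload API calls for a corpus,
# assuming one API call per file (real platforms often need several calls
# per file, so actual counts are higher).

KB_PER_GB = 1024 * 1024

def file_count(corpus_gb: int, file_size_kb: int) -> int:
    """How many files of the given size make up the corpus."""
    return (corpus_gb * KB_PER_GB) // file_size_kb

large = file_count(100, 10 * 1024)  # 100GB of 10MB files
small = file_count(100, 500)        # 100GB of 500KB files

print(large)          # 10240
print(small)          # 209715
print(small - large)  # 199475 -- roughly 200,000 extra API calls
```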

2) Source or Destination Rate Limiting:  When at least one of the transfer endpoints employs an API for managing content, those systems will generally employ some form of API “rate limiting”.  Essentially, most API based endpoint platforms implement algorithms that throttle the number of API calls that can be made in a specified length of time.

This is a necessary device to ensure that all tenant environments have an opportunity to experience the same level of performance.  It also provides a mechanism to guard against Denial of Service (DoS) attacks that could potentially cause all tenant environments to become inoperable.

Typically, when a highly scalable transfer application such as SkySync “trips” the API throttle mechanism, the API platform will send a rate limit message to the calling application.  When this happens, SkySync has no choice but to “back off” and wait for an increasing amount of time until throttle messages are no longer encountered.  This process of throttling and backing off can result in a significant decrease in transfer throughput.  SkySync utilizes “Smart-Throttling” technology to pull and push data as fast as it possibly can while respecting these rate limit messages.
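SkySync’s Smart-Throttling implementation is proprietary, but the general back-off pattern described above can be sketched as follows.  The `RateLimited` signal and the retry parameters here are illustrative assumptions, not SkySync internals:

```python
import time

class RateLimited(Exception):
    """Signals a rate-limit response from the endpoint (e.g. HTTP 429)."""

def with_backoff(send_request, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Run send_request, waiting exponentially longer after each
    rate-limit signal before trying again."""
    for attempt in range(max_retries):
        try:
            return send_request()
        except RateLimited:
            sleep(base_delay * (2 ** attempt))  # wait 1s, 2s, 4s, ...
    raise RateLimited("rate limit persisted after all retries")
```

Each throttle message doubles the wait, which is why sustained rate limiting degrades throughput so sharply.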

3) Source Platform Read Performance:  It is possible to saturate a source platform with requests such that adding additional job processing servers and threads no longer improves transfer performance.  Also, when saturation occurs, end users are typically affected.  So, transfer performance must be balanced against the ability of end users to continue performing their job functions.

4) Number of Parallel Writes:  The number of parallel writes configured in SkySync determines how hard SkySync hits the destination platform.  If that platform is API based, the number of parallel writes directly correlates to the likelihood of seeing rate limit messages.  When the destination connector supports multiple user connection pooling, this is much less of a factor and Parallel Writes can often be increased.  If the destination connector does not support multiple user connection pooling, the likelihood of seeing rate limit messages is much higher and often the number of Parallel Writes must be decreased in the SkySync configuration.

5) Number of Processing Threads:  Each SkySync server can be configured for a specific number of processing threads.  By default, SQL CE based deployments allow up to (2) processing threads.  By default, SQL Server Express and standard SQL Server based deployments allow up to (6) processing threads.  In general, if rate limits are not being experienced and the SQL Server CPU, RAM, and disk I/O are not being taxed, then the number of processing threads can be increased.

6) Number of SkySync Servers:  More processing servers running more job processing threads typically result in a higher overall transfer throughput and a lower overall transfer duration at the expense of additional hardware and administrative overhead.  However, there is a limit to this as mentioned above.

7) SQL Server Performance:  SQL Server storage I/O, CPU, and RAM resources all affect the performance of the migration from end to end.  During data transfers, SkySync audits source and destination file system objects, allowing it to manage copy and synchronization processes while respecting delete propagation and conflict resolution rules.  However, logging operations can impact transfer throughput when the SQL Server is not properly scaled based on the amount of content being tracked in the database and the number of SkySync servers and threads that are processing work.  For this reason, SQL disk I/O should be optimized for highly intense READ and WRITE operations (generally, enterprise class solid state disk based storage subsystems are recommended for data volumes).  This is also one of the reasons that configuring the SkySync database for the “Simple” recovery model is recommended, as it reduces log file I/O.

8) Network Performance:  Network performance affects all aspects of the migration.  For example, legacy document retrieval, exported binary storage, SQL Server call latency, import server binary retrieval, and uploading to Office 365 are all affected by network performance. SkySync provides network utilization controls to manage network saturation.

9) Network / Cloud Connection Performance:  The level of available bandwidth to connect to the source and destination endpoints is a significant factor in transfer throughput.  For on premise to on premise transfers, network latency is much less of a factor.  However, when a cloud storage platform is in play as a source, destination, or both, then internet bandwidth and latency become a significant factor in transfer performance.

For cloud to cloud migrations, it is strongly recommended that Azure or AWS VMs be used to host SkySync.  In addition to significant potential benefits with data center co-location between SkySync processing and the source or destination endpoint, it simply doesn’t make sense to pull data down from the cloud to an on premise server and then push it back up to the cloud.  This transfer path is very inefficient. This concept will be expanded upon in additional prescriptive guidance later in the document.

Ultimately, it is impossible to accurately predict transfer throughput performance until the end-to-end infrastructure is configured and testing is performed. Based on various transfer throughput metrics, it is possible to make small adjustments to configuration or major adjustments like adding hardware resources to improve the weakest link within the throughput performance chain.  Initial phases of deployment and configuration will include a series of performance tuning configuration adjustments.  Throughput metrics should not be gathered until performance tuning has been completed.

Guidance Models

SkySync can scale from very small, easy to implement single server deployments to very large multi-server clustered deployments.  For simple implementations with a few million documents or less where transfer performance isn’t of primary concern, SkySync can be implemented very easily.  But for larger corpus transfer solutions or when throughput is of high value, it’s important to understand the correct architectural considerations.

Because there are significant differences in the deployment guidance based on the size and throughput requirements of various transfer solutions, this guidance will be broken down into (3) different architectural models.  Actual transfer throughput will not be identified as part of this guidance due to the significant variables involved in achievable throughput as identified in Transfer Performance Factors above. 

The models below provide general architectural examples that serve as a good starting point for solution design.  SkySync can be easily scaled up if necessary and prudent.

Model A – Low Volume / Modest Throughput

This model represents deployments that include the transfer or synchronization of (3) million file and folder objects or less.  Achievable throughput will be somewhat modest and generally isn’t the top priority for the solution.  This will be an easy configuration with minimal planning requirements. While this whitepaper can help and deployments can be fully managed by typical administrative staff, SkySync recommends that the Client Solutions Silver Launch Package be utilized to ensure solution success.

Model A SkySync Architecture Guidance

The following architecture configuration concepts apply to Model A:

  • This is a single server model with no clustering configured

  • 8+GB RAM, 60GB+ system drive, dual core processor or better

  • Windows Server 2008 SP2 or newer, fully patched, with .NET framework 4.5 or newer

  • SkySync installation generally consists of accepting the default answers which results in the creation and usage of a local SQL CE database

  • SQL CE supports a maximum database size of 4GB

  • If the 4GB SQL CE database size is reached, it is possible to convert the database to a full SQL Server database format

  • When SkySync is deployed to use SQL CE for the database, a maximum of (2) jobs may be run simultaneously

  • This model is intended for ease of implementation as opposed to high throughput and/or redundancy

Model B – Moderate Volume and/or Moderate Throughput

This model represents deployments that include the transfer or synchronization of (10) million file and folder objects or less at a moderate throughput.  This model essentially represents the scale of what can be reasonably accomplished with a single server solution that is provisioned with better than standard resources.  This solution will require some planning around database configuration and overall server specifications. Again, this whitepaper will be useful in deploying this solution model, but SkySync strongly encourages all customers to take advantage of the Client Solutions Gold Launch Package to ensure solution success.

Model B SkySync Architecture Guidance

The following architecture configuration concepts apply to Model B:

  • This is a single server model with no clustering configured

  • 32+GB RAM, 60GB+ system drive (solid state if possible), 4 or 8 core processor or better

  • Windows Server 2008 SP2 or newer, fully patched, with .NET framework 4.5 or newer

  • Before SkySync installation, SQL Server Express or full SQL Server Standard/Enterprise should be deployed on the SkySync processing server

  • Database Planning and Tuning Concepts (below) should also be implemented

  • SQL Server Express supports a maximum database size of 10GB

  • SQL Server Standard/Enterprise maximum database size is not a factor for this model

  • When SkySync is deployed to use SQL Server Express/Standard/Enterprise for the database, a maximum of (6) jobs may be run simultaneously by default

  • SkySync Tuning Concepts (below) may be employed to increase throughput when processing server resources are available and rate limiting is not a factor

  • This model is intended for advanced, single server implementation as opposed to the simpler default install that uses SQL CE

Model C – High Volume and/or High Throughput

This model represents deployments that include the transfer or synchronization of more than (10) million file and folder objects and/or transfers or synchronizations that require a very high throughput.  Achievable throughput will be significantly determined by available bandwidth, network latency, SQL Server I/O and other resources, number of processing servers/threads and the level of API rate limiting on either the source or destination. This solution will require very careful planning around database configuration, SkySync cluster configuration, cluster specifications, and cluster location with respect to source and destination location.  To be able to achieve the highest levels of throughput, SkySync Client Solutions will need to be engaged to assist with planning, deployment and cluster tuning processes.

Model C SkySync Architecture Guidance

Exact cluster sizing guidance can be provided by SkySync Client Solutions based on an assessment of customer requirements.  For the purposes of this document, a (4) server cluster is specified.  With proper configuration (and guidance from SkySync Client Solutions), an extreme scale SkySync cluster serviced by a single high performance SQL Server (or Availability Group) has been tested to support up to (10) high performance processing servers. 

The following architecture configuration concepts apply to Model C:

  • This is a (3) processing server and (1) database server cluster model

  • SkySync Processing Servers (3): 16+GB RAM, 60GB+ system drive, 4 core processor or better

  • Microsoft SQL Server [Standard Scale] (1): 32+GB RAM, 60GB+ system drive, 200GB data drive, 8 core processor or better

  • Microsoft SQL Server [Extreme Scale] (1): 128+GB RAM, 60GB+ system drive, 300GB solid state data drive (high IOPS/low latency), 300GB solid state tempdb/snapshot drive (high IOPS/low latency) to support report processing.  SQL Server I/O performance will be a significant driving factor in cluster transfer throughput and stability.

  • Microsoft SQL Server [Extreme Scale] additional guidance: The Performance Best Practices for SQL Server in Azure Virtual Machines whitepaper from Microsoft provides additional guidance for building a high scale SQL Server instance in an Azure VM.  Much of this guidance also applies to an on premise high scale SQL Server.

  • SkySync Processing Server OS: Windows Server 2008 SP2 or newer, fully patched, with .NET framework 4.5 or newer

  • Before SkySync installation, SQL Server Standard/Enterprise should be deployed

  • In order to deploy SkySync in a clustered configuration, follow the Guide to Installation and Upgrade of Multiple SkySync Nodes.

  • Database Planning and Tuning Concepts (below) should also be implemented

  • SQL Server Standard/Enterprise maximum database size is not a factor for this model

  • When SkySync is deployed to use SQL Server Express/Standard/Enterprise for the database, a maximum of (6) jobs may be run simultaneously by default

  • SkySync Tuning Concepts (below) may be employed to increase throughput when processing server resources are available and rate limiting is not a factor

Database Planning and Tuning Concepts

The database subsystem can have a significant positive or negative impact on performance and scalability of the SkySync solution.  That said, not all SkySync solutions are required to be high scale implementations.  The concepts in this section walk through database deployment and tuning guidance depending on the amount of content being transferred and the desired performance of the solution.

At the most basic level, SkySync can function with a simple SQL Server CE database that is deployed automatically by the SkySync installer application.  SkySync also supports using SQL Server Express/Standard/Enterprise versions 2012, 2014 and 2016.  There are however significant differences that can impact a SkySync synchronization or migration configuration in each of these versions. 

Microsoft SQL Server Compact Edition (CE)

Microsoft SQL CE is a small footprint relational database that is useful for low scale applications.  It is a freely licensed database that works well with SkySync solutions that do not require high throughput levels and that will manage a lower number of file and folder objects (3 million or less).  The 3 million figure is a rough guideline, and other factors apply as well.  The primary limiting factor of SQL CE is its 4GB database file size limitation.

It is not recommended to use SQL CE when the database will track over 1 million objects, as significant performance degradation can result.

Microsoft SQL Server Express

Microsoft SQL Express is also free to download and use.  It is a more robust database engine that is essentially a “light” version of the full SQL Server Standard database engine with many of the same capabilities.  SQL Express can manage many more file and folder objects in a SkySync solution (up to approximately 10 million).  It does, however, have some key limitations that affect solution scalability.  These limitations make SQL Express a reasonable candidate for a single server SkySync solution when SQL Server licensing cost is an issue but they do limit its effectiveness as a database that can support a multi-node SkySync cluster.  Those limitations include:

  • A maximum database size of 10GB (SQL Server 2008 R2 Express and higher)

  • No SQL Server Agent Service to handle job scheduling and automated tasks.  Database maintenance and backups are generally a manual effort involving command line operations and Windows Scheduler.

  • Will consume a maximum of 1GB RAM even when more is available

  • Limited to 1 physical CPU.  Note that it can consume up to 4 CPU cores available in a single physical CPU.

Microsoft SQL Server Standard

Microsoft SQL Standard is the typical SQL Server platform for a standard multi-node SkySync clustered solution.  It supports up to 24 CPU cores, 128GB RAM, a maximum database size of 524PB (petabyte), database backup compression, and database snapshots.  So it can easily handle extreme scale SkySync solutions when properly configured.  That said, it does have a couple of key limitations that affect extreme scale SkySync solutions:

  • Does not support AlwaysOn Availability Groups (AG) redundancy

  • Does not support online table index rebuilds

  • Lacks some RDBMS scalability and performance features, such as Resource Governor

Microsoft SQL Server Enterprise

Microsoft SQL Enterprise is the gold standard SQL platform for a standard or extreme scale multi-node SkySync clustered solution.  CPU and RAM limitations are only bound by Operating System maximums.  Database file size can be as large as 524PB (petabyte).  Database backup compression and SQL snapshots are supported.  AlwaysOn Availability Groups, online table index rebuilds, and full resource governance are also available.

SQL Enterprise will always be the best choice for extreme scale SkySync solutions, particularly when custom reporting is required, which is almost always the case in larger synchronization / migration solutions.  The ability to run maintenance jobs that perform online index rebuilds without interrupting 24/7 transfer processing is key to maintaining throughput performance.  The ability to query large processing tables using SQL snapshots also ensures that SQL interaction from continuous transfer operations is not impacted.

Disk I/O Performance

Disk I/O performance will have a direct impact on the number of SkySync servers and total processing threads that can be used to execute transfer jobs.  Because more concurrent transfer jobs (usually) means higher total throughput, database resources, particularly disk I/O, must be carefully planned for any SkySync solution where performance is important. The most important storage resource for SkySync transfer performance and scalability is the disk volume where the SkySync database data file is stored.  Also, the disk volume where the SQL Server TEMPDB is stored can be very important as well if custom reporting packages will be processing against SkySync data tables or SkySync database snapshot data tables.

“Standard” storage performance for a given organization’s disk subsystem will be sufficient for approximately 1 to 3, or perhaps 4, processing servers.  Beyond that, or when a very large volume of file and folder objects (tens of millions) will be under transfer management, storage volume performance must be better than standard. For these extreme scale SkySync solutions with (5) or more clustered processing servers, it’s good to start thinking about Solid State Drive (SSD) class or Tier 1 class storage.  Some organizations consider SSD class performance to be Tier 0.  The point here is that very high IOPS support and very low latency become the single most important factors in ensuring high sustained transfer throughput and overall solution stability throughout the SkySync processing cluster.  The SkySync transfer engine can be massively parallelized, and given the amount of data logging that occurs during data transfer operations, this can put tremendous READ/WRITE pressure on the SQL Server database disk subsystem.

Guidance for disk performance is simple for extreme scale solutions.  If throughput is the highest priority, then the SkySync SQL Server should be provisioned with the best available disk the organization has access to.  If the disk is strong enough, then additional processing servers can be added to the cluster.  If not, then adding additional processing servers will actually make transfer performance or cluster stability worse.  So the key then becomes how to understand whether or not the disk subsystem is “keeping up” with requests.

The easiest way to ensure that the SQL disk I/O subsystem is servicing requests fast enough is to monitor the Disk Queue Length available in Resource Monitor.  To do this, launch the Windows Task Manager and navigate to the “Performance” tab, where basic system performance can be viewed first.  Then click the “Open Resource Monitor” link at the bottom of the Task Manager.

Once the Resource Monitor is open, navigate to the “Disk” tab and observe the disk queue length for the logical disk associated with the volume where the SkySync database and TempDB database data files are stored. The general rule of thumb is that Disk Queue Length should be less than the total number of physical drives that comprise the volume.  That can get complicated with certain storage arrays where the number of disks for a given volume is obfuscated or, in the case of an SSD volume, there may only be a single drive. 

In general, the ideal Disk Queue Length is less than 1.  Anything in the single digit range during heavy processing is generally OK.  If Disk Queue Length is in the low double-digit range, then the disk is a little underpowered and is beginning to impact performance.  But if Disk Queue Length continues to climb, or is in the hundreds or thousands, then the disk subsystem is certainly insufficient to handle incoming requests.  In this situation, the disk subsystem will need to be upgraded to improve transfer throughput.  If this is not possible, then it is recommended to shut down one or more SkySync processing servers while monitoring Disk Queue Length until reasonable numbers are achieved and maximum transfer throughput is identified.
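The rule of thumb above can be summarized in a small helper.  The cut-offs mirror the guidance in this section; they are heuristics, not hard limits:

```python
def assess_disk_queue(queue_length: float) -> str:
    """Map an observed Disk Queue Length onto the general guidance:
    <1 ideal, single digits OK, low double digits underpowered,
    hundreds or more insufficient."""
    if queue_length < 1:
        return "ideal"
    if queue_length < 10:
        return "generally OK during heavy processing"
    if queue_length < 100:
        return "disk underpowered; beginning to impact performance"
    return "disk insufficient; upgrade storage or remove processing servers"
```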

SQL Instant File Initialization

As data and log files grow in SQL Server, by default, during the extent operation SQL Server will overwrite any data in the new segment with zeros.  This is a security measure designed to make sure that data from any old files that once consumed the same space on disk has no possibility of being read by SQL administrators.  It is an edge case scenario and a very minor security risk, but because of it, Instant File Initialization is not enabled by default, which allows the “zero overwrite” to occur.

This becomes a relevant concept as SQL databases grow by substantial amounts (another topic addressed below).  An extent operation of several hundred megabytes or even gigabytes can take many seconds or even minutes to occur under the default condition when zeros must be written.  During this time, the tables in the database are blocked.  Blocked tables are very bad for SkySync’s highly tuned transfer scheduler engine.  This condition can cause unexpected behavior as job processing threads try to figure out how to continue.
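As a rough illustration, the blocking time scales linearly with the growth increment and the volume’s sequential write rate.  The 100MB/s figure below is an assumption for illustration only; measure your own disks:

```python
def extent_zeroing_seconds(growth_mb: float, write_mb_per_s: float) -> float:
    """Approximate time spent zero-filling a file growth increment
    at a given sustained sequential write rate."""
    return growth_mb / write_mb_per_s

# A 2GB growth on a volume sustaining 100MB/s of sequential writes
# blocks for roughly 20 seconds:
print(extent_zeroing_seconds(2048, 100))  # 20.48
```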

There are (2) ways to combat this condition.  The best way to ensure fast extent operations is to implement Instant File Initialization.  Another way, also recommended as a best practice, is to pre-create multiple data files that are pre-sized in anticipation of the amount of data that may potentially be stored in the database.  This latter option will be addressed below in further detail. 

The issue with the second option is that it can be difficult for inexperienced SkySync Administrators and DBAs to determine just how big those pre-sized files should be.  So, the tendency is to over-allocate which can waste precious high performance disk space.  Essentially, the best solution is a combination of the two.  Pre-create and conservatively pre-size the SkySync and TempDB database data files to minimize extent operations.  Then enable Instant File Initialization to ensure that if there must be extent operations, they happen quickly.

Steps to Enable Instant File Initialization

This topic is discussed in greater detail on blogs from leading SQL experts such as Kimberly Tripp and Brent Ozar, among others.  The general steps are included below:

1) Ensure that the SQL Server Service is running as a domain service account or at least a standard local service account (essentially not System Account or another “special” account).

2) Launch the Local Security Policy editor by typing and then executing “secpol.msc” after clicking the Start button.

3) Navigate to Local Policies → User Rights Assignment and then edit the “Perform Volume Maintenance Tasks” policy.

4) Add the SQL Server service account to this list and click OK.

5) Restart the SQL Server service.

Pre-Creating and Pre-Sizing SQL Data Files

SQL Server experts suggest that multiple database files can have a meaningful impact on database performance.  Paul Randal is one of the leading authorities on all things SQL Server.  His company, SQLSkills, maintains a website full of useful and reliable information regarding SQL Server.  Among the many articles in his “In Recovery…” blog is one that highlights the benefits of properly scaling out database data files and draws attention to the potential performance pitfalls of not doing it correctly.

When a database is first created in SQL Server through the user interface, the database will be created with a single “mdf” data file by default.  It is possible to create this database with the primary “mdf” data file and then multiple “ndf” data files as well.  This gives the database an opportunity to store content across multiple data files. Configuring SQL data files in this way allows for improved data access performance because it allows SQL Server to multi-thread disk I/O operations.  This is particularly helpful when the administrator has the freedom to store each of the data files on a unique, high performance disk volume.  This technique can be used to enhance the ability of the SkySync solution to scale out with many processing servers. 

The general rule of thumb for the number of data files that should be created is 1/4 to 1/2 the number of physical cores available to SQL Server.  For an 8 core SQL Server, deploying (2) to (4) data files is ideal.  For a 16 core SQL Server, (4) to (8) files would be good.  For fewer cores, trend towards the “1/2” number; for a machine with 16 cores or more, trending towards the “1/4” number is generally typical.
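As a sketch, the rule of thumb translates to a simple range calculation.  The integer rounding and the floor of one file are assumptions; treat the output as a starting point, not a prescription:

```python
def data_file_range(physical_cores: int) -> tuple[int, int]:
    """Suggested (low, high) count of SQL data files: 1/4 to 1/2 of
    the physical cores available to SQL Server, never fewer than one."""
    return max(1, physical_cores // 4), max(1, physical_cores // 2)

print(data_file_range(8))   # (2, 4)
print(data_file_range(16))  # (4, 8)
```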

SkySync Database Maintenance

Given the highly transactional nature of any migration platform, database maintenance plans become very important.  Without proper maintenance, indexes can quickly become fragmented resulting in reduced performance.

It is beyond the scope of this document to provide specific scripts for database maintenance.  However, it is important to run index defragmentation and reorganization scripts at least twice per week to ensure that SkySync tables are properly maintained.  If maintenance scripts are not standardized in the organization, Ola Hallengren provides very useful scripts that can rebuild indexes across all tables in a database but only when necessary.

SkySync Database Recovery Mode

Like any migration solution, the data in the database can generally be considered transient.  In other words, even if the database is lost, organizational content is not lost.  This means that while database backups are useful to minimize any lost migration or synchronization processing time, recent log backups are not necessary.

For these reasons, it is recommended that the SkySync database be configured for the Simple Recovery model.  There is no need to spend storage space and processing resources retaining transaction log history.

SkySync Database Backups

With the SkySync database operating in the Simple Recovery model, database backups are straightforward.  For high throughput solutions, a nightly full backup is generally sufficient.  However, if the Recovery Point Objective (RPO) for the organization indicates that a full day of migration/synchronization processing is too much time loss, then additional differential backups can be scheduled at intervals throughout the day.

Ideally, database backups should be executed with compression enabled to minimize backup storage size and backup duration.  However, this will come at a cost of increased CPU utilization on the SQL Server which is important to consider.

SkySync Tuning Concepts

SkySync includes a wide array of options for both scale up and scale out solutions to drive higher overall transfer throughput. 

Parallel Processing Concepts

Simply having a multi-threaded transfer processing engine is not good enough in the complex world of migration and synchronization.  This is because, for any given transfer operation, source and destination systems operate at different speeds.  The SkySync solution manages this inequity by providing a single processing thread for each overall job but then it also provides an additional pool of “parallel write” processors that are available to all job threads. 

This architecture allows for the transfer solution to scale as high as possible on both the content source (job processing threads) and the transfer destination (parallel writes).  The advantage of this architecture is that both source retrieval and destination insert processing is maximized. 

There are limits to scaling (up and out) the number of processing threads as well as the number of parallel writes.  These limits are discussed below.

Scaling Up

In the world of server processing, conventional wisdom indicates that when any one server resource reaches 80% of capacity, it is time to consider scaling server resources (either up or out).

In terms of SkySync processing servers, the most important server resource is usually network capacity followed by CPU and then RAM.  Since SkySync is a stream based content transfer engine, local disk performance is not much of a concern other than for local job logging when necessary. 

In general, it is typically more efficient and easier to manage fewer servers that are scaled up higher than it is to manage more servers that aren’t taking advantage of available resources. 

If scalability is a requirement, it is best to start with a 4 CPU server and 8 to 16GB of RAM.  From there, begin testing the execution of the default number of concurrent jobs (6) to evaluate the impact on system resources while also tracking transfer performance.  All aspects of the SkySync solution (including database metrics) should be monitored during this process so that the weak link can be identified for each given test.  Note that it may be necessary to test increasing parallel writes at the same time to ensure that the bottleneck isn’t caused by destination performance (see Parallel Writes below to identify limits). 

Assuming that throughput performance is good with 6 concurrent jobs, the SkySync administrator can begin to increase this number to maximize performance on a single server.  In order to change the number of concurrent jobs, follow this procedure:

  • Close the SkySync user interface and stop the SkySync service

  • Navigate to c:\Program Files (x86)\SkySync

  • Open “appSettings.config” using a text editor in administrator mode

  • Add this key to the “<appSettings>” configuration node:

    <add key="quartz.threadPool.threadCount" value="10" />

  • Start the SkySync service and open the SkySync user interface as necessary 

The procedure above allows for control of concurrent jobs.  Note that whenever the quartz.threadPool.threadCount value is changed, the SkySync service must be restarted. 

Continue to ratchet up the number of concurrent jobs while managing parallel writes and monitoring processing server and database server resources.  As long as throughput continues to increase without rate limiting (see Parallel Writes below) and both the processing server and the database server still have resource headroom, then the number of concurrent jobs can continue to be increased. 

With just (1) processing server in the SkySync solution, scaling up the concurrent jobs and parallel writes will eventually cause the server to reach 80% capacity for network, CPU, or RAM.  When this happens, the admin can either add additional resources to the server, or consider a scale OUT solution. 

Scaling Out

Once the optimal number of parallel processing threads is identified for a given processing server hardware configuration, it is best to scale out using the exact same configuration.  Having the processing servers use the same configuration will make it easier to manage transfer processing tests and eventually the transfer jobs during active migration. 

Start with adding a 2nd processing server using the exact same specifications and SkySync configuration.  It is very possible that parallel writes may begin to once again present an issue with rate limiting now that double the throughput is available to push to the destination.  Continue to manage Parallel Writes (see below) as much as possible. 

If adding the 2nd server results in rate limiting, then the maximum number of SkySync servers has been reached.  Back down the number of concurrent jobs on each server and the number of parallel writes until consistent peak performance without rate limiting is achieved.  

If the 2nd server once again reaches 80% capacity and testing indicates that there is still performance headroom on parallel writes and with the SQL Server, then continue to repeat the process of adding additional servers until:

     1)Rate limiting becomes unavoidable

     2)The SQL Server begins to show strain, particularly in the I/O subsystem

     3)Throughput is sufficient for the project and additional servers would add unacceptable administrative overhead

     4)The cost to operate additional processing servers is unacceptable 

Once one of the above conditions has been reached, then there is no need to continue adding processing servers. 

The SkySync database is highly optimized.  If the SQL Server is properly scaled, particularly in the I/O subsystem, a tremendous amount of parallelism is possible.  A transfer solution of over 200 concurrent processing jobs has been successfully implemented resulting in a cloud to cloud transfer peak rate of over 22TB per day.  This occurred under very carefully managed conditions, but it is a very real and proven throughput. 
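As a sanity check on that figure (assuming binary terabytes here; decimal units shift the numbers slightly), the per-job rate implied by 22TB per day across 200 jobs is surprisingly modest:

```python
# Back-of-the-envelope check on the peak rate cited above.
TB = 1024 ** 4                 # bytes in a (binary) terabyte
seconds_per_day = 24 * 60 * 60

aggregate_bps = 22 * TB / seconds_per_day   # total bytes per second
per_job_bps = aggregate_bps / 200           # bytes per second per job

print(round(aggregate_bps / 2**20))         # aggregate MiB/s -> 267
print(round(per_job_bps / 2**20, 2))        # per-job MiB/s   -> 1.33
```

In other words, each of the 200 jobs only needs to sustain roughly 1.3 MiB/s; the aggregate rate comes from parallelism, not from any single fast stream.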

Parallel Writes

“Parallel Writes” is a setting that can have a significant effect, positive or negative, on transfer throughput.  It allows SkySync to determine exactly “how hard” (how parallel) the API of the destination solution is called.  

Many cloud platforms implement throttling techniques to rate limit API calls by user or by tenant.  By dialing the parallel writes up or down, SkySync can maximize the number of calls made to the destination platform in an attempt to maximize throughput while minimizing rate limiting. 

Throttle algorithms can vary widely from provider to provider and are often dynamic depending on the time of day or load on the destination platform.  Many throttle algorithms operate “by user”.  For many SkySync connectors, SkySync allows the implementation of a “user pool”. 

When user pooling is possible, it can be very powerful.  It allows SkySync to share API calls across many user instances.  This can dramatically increase the number of API calls, and thus the throughput, allowed for transfer operations.  However, even this solution has limits.  At some point, this enhanced level of user parallelization can show up on the destination system’s threat radar as a form of Denial of Service (DoS) attack.  When this happens, many cloud platforms have manual override capabilities that can completely shut down, or severely throttle, the migration.  
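In essence, user pooling rotates API calls across a set of accounts so that no single user exhausts its per-user limit.  A minimal round-robin sketch follows (the account names are hypothetical, and real pooling is configured within the SkySync connector rather than coded by the administrator):

```python
from itertools import cycle

# Hypothetical pool of service accounts the connector may impersonate.
user_pool = cycle(["svc_migrate_01", "svc_migrate_02", "svc_migrate_03"])

def next_caller():
    """Attribute each API call to the next pooled user, spreading
    per-user rate limits across the whole pool."""
    return next(user_pool)

# Six calls cycle through the three accounts twice.
calls = [next_caller() for _ in range(6)]
```

With N pooled users, a per-user throttle of R calls per minute becomes an effective limit of roughly N × R, until the platform's tenant-level or anomaly-detection limits take over.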

The best practice is to increase concurrent jobs and parallel writes until throughput targets for the transfer project have been reached, then go a little bit higher to account for down time and let that be the limit for throughput.  Remember that all cloud platforms are shared hardware solutions.  It is always good to be a responsible tenant! 

To increase parallel writes, follow this procedure:

     1)In the SkySync management application, click the settings icon in the upper right corner. 

     2)Click on the Performance tab in the SkySync configuration window. 

     3)Increase or decrease parallel writes as necessary by (1) or (2) at a time. 

Parallel writes can be increased as long as rate limiting is not observed and as long as performance continues to increase.  If performance does not continue to increase or if rate limiting begins to happen then parallel writes should be decreased. 

When multiple servers are scaled up and/or out, it is common to see parallel writes at (4) or (6). But given the right conditions, they may be able to go higher.  For example, when the destination is a Network File Share, then rate limiting is not really a factor.  As long as I/O is not saturated, parallel writes can be increased. 
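The increase-until-it-stops-helping guidance above amounts to a simple hill-climbing loop.  The sketch below is illustrative only; measure_throughput and rate_limited stand in for real observations of the destination:

```python
def tune_parallel_writes(measure_throughput, rate_limited, start=2, max_writes=16):
    """Raise parallel writes one step at a time until throughput stops
    improving or rate limiting appears, then keep the last good setting."""
    best = start
    best_rate = measure_throughput(start)
    for writes in range(start + 1, max_writes + 1):
        if rate_limited(writes):
            break                      # rate limiting observed: back off
        rate = measure_throughput(writes)
        if rate <= best_rate:
            break                      # no further gain: keep previous setting
        best, best_rate = writes, rate
    return best

# Synthetic example: throughput (in MB/s) peaks at 6 parallel writes.
curve = {2: 50, 3: 70, 4: 85, 5: 95, 6: 100, 7: 98, 8: 90}
setting = tune_parallel_writes(curve.get, lambda w: w >= 9)
```

Against this synthetic curve the loop settles on 6, the last setting that still improved throughput; in practice the same walk is performed manually through the Performance tab.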

Rate Limits

As mentioned above, rate limiting or throttling can happen when too many calls are made to a cloud platform API in too short a time.  When this occurs, most cloud providers require an exponential back-off strategy.  This means that when SkySync encounters a rate limit, it must wait a specified period of time before retrying.  If the retry receives another rate limit error, SkySync must double the previous wait time before trying again.  This pattern continues until the rate limit error is no longer encountered.  Given the doubling back-off strategy, rate limiting can dramatically reduce transfer throughput. 

Ultimately, this means that it is very important to configure SkySync in such a way that rate limits do not occur. 
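The doubling back-off described above can be expressed generically as follows (a sketch of the common pattern, not SkySync’s actual implementation; the wait times are illustrative):

```python
import time

class RateLimitError(Exception):
    """Raised when the destination API returns a throttling response."""

def call_with_backoff(api_call, initial_wait=1.0, max_retries=5,
                      sleep=time.sleep):
    """Retry api_call, doubling the wait after every rate-limit error."""
    wait = initial_wait
    for attempt in range(max_retries + 1):
        try:
            return api_call()
        except RateLimitError:
            if attempt == max_retries:
                raise                  # give up after the final retry
            sleep(wait)                # wait 1s, then 2s, 4s, 8s, ...
            wait *= 2
```

With this pattern, three consecutive rate-limit responses cost 1 + 2 + 4 = 7 seconds of idle time before the fourth attempt, which is why avoiding rate limits in the first place is so important.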

If rate limits do occur, they will be represented as warnings in the Job History.

SkySync High Availability Concepts

There are two primary architectural nodes in a SkySync configuration.  While the database node and the SkySync processing server node can both exist on the same server, they can also be deployed in such a way as to facilitate high availability. 

Database Node

As mentioned above, the SkySync database can operate on SQL Server Express, or even SQL CE on the same box as a SkySync processing node.  However, for high availability, it should be deployed in a fault-tolerant solution. 

It is beyond the scope of this document to provide architectural guidance for SQL Server high availability, but standard high availability solutions are supported by the SkySync platform.  This would include SQL Server Availability Groups or other clustered SQL Server solutions.  However, it is important to understand that any delays in transaction processing may impact overall SkySync solution scalability and throughput.  For example, a synchronous mirroring solution over a long distance or otherwise high latency link could have significant negative impacts on transfer throughput. 

SkySync Processing Node

All SkySync processing servers talk directly to the database.  There is no centralized communication server or other single point of failure in the SkySync solution.  If there are at least 2 processing servers operating on independent hardware, the SkySync processing solution is highly available.  If one server becomes unavailable, another available server will eventually pick up where the failed server left off and continue processing the job.

SkySync Connections and Security 

SkySync stores connection information (e.g. user name, password, URL, UNC path) for some platforms encrypted in its database.  This makes it important to secure the local file system of the server running SkySync to add additional protection for this information. 

Future versions of SkySync may use additional strategies to encrypt and protect connection information, such as DPAPI (Data Protection API) or DPAPI-NG (next generation), along with customer-controlled encryption using a key provided by the user at install time.  When migrating SkySync to another server, it may be necessary to copy encryption key files manually to ensure shared databases can correctly decrypt this sensitive data.


Cloud platforms use other authentication mechanisms, such as OAuth2, and present only an access token to SkySync instead of allowing visibility into credentials.


In general, SkySync does not control, edit, move, modify, or otherwise interact with security directly.  SkySync acts as an external user to storage systems, interacting with them via their public APIs.

For more information on SkySync and security please see the SkySync Security document.

Recommendations:

• Determine up front as part of your security design how SkySync will access the content you will ask it to transfer. Consider SkySync another “power user” in your organization manipulating content.
• Integrated Authentication scenarios can be tricky to configure due to the Windows service identity and other, ultimately permissions-based, issues.
• Avoid creating lots of connections by correctly defining security permissions, groups, and proxy users. SkySync works best with platforms that have rich API support for concepts such as On Behalf Of, setting time stamps, and methods of ownership preservation.

Summary

The purpose of this document was to provide SkySync customers with the background information and prescriptive guidance necessary to maximize transfer throughput while also minimizing the overhead of managing the SkySync solution.  The information in this document will be instrumental for organizations that are able to manage their own transfer jobs.  

SkySync is carefully architected to ensure that the software will never be the bottleneck of any transfer solution.  But it can be very difficult to manage all of the variables that affect transfer rate in order to find the best configuration. 

It is also important to know that SkySync Client Solutions is highly experienced in the art of transfer solution optimization.  They are available to help ensure transfer projects are executed as efficiently and effectively as possible.