This article covers the primary performance considerations when migrating files between cloud providers.
1. Microsoft Import Service instead of CSOM
For migrations to Microsoft 365, Cloudiway uses the Microsoft Import Service.
This service is usually the bottleneck of the migration.
Why are we using this service?
Files can be migrated to SharePoint (SharePoint sites, Team sites, OneDrive) using the CSOM (Client Side Object Model) APIs or the Microsoft Import Service.
Microsoft has put strict throttling policies in place that make CSOM impractical for migrations (a large number of users multiplied by a large number of files can result in millions of calls, which will be throttled).
Instead, Microsoft provides the Microsoft Import Service for uploading data to SharePoint, as it performs better.
2. Limitations of this service
2.1. Blob storage lifetime
Data are first uploaded (and encrypted) into a Blob Storage hosted in the Microsoft infrastructure.
This Blob Storage has a short lifetime, which prevents sending hundreds of jobs at once.
If you submit too many jobs and Microsoft doesn’t have the time to process them, the jobs are lost.
2.2. Slowness of the service
The jobs are submitted asynchronously and Microsoft provides no ETA for the execution.
A new job may stay scheduled for several hours if many jobs have been submitted.
Once the job is started, it may take hours to complete. For instance, if you submit a job containing 1000 folders to create in SharePoint, the job may take 1 or 2 hours to complete.
2.3. Maximum number of files to submit in a job
Microsoft recommends submitting no more than 1000 files per job.
However, if you submit 1000 files in a very complex folder structure, the job may contain 5000 or 10,000 nodes (files plus the folders that hold them), not only 1000.
For instance:
/folder/subfolder/subfolder/subfolder/subfolder/subfolder/subfolder/etc/fil1
/folder/subfolder1/subfolder/subfolder/subfolder/subfolder/subfolder/etc/fil2
/folder/subfolder3/subfolder/subfolder/subfolder/subfolder/subfolder/etc/fil3
All the folders have to be created and must exist in order to migrate the files.
You have two options. Either you submit jobs in advance to pre-create the folders, or you include the folder creation in the same jobs as the files. In the second case, you end up submitting 10,000 creation requests in a single job, even if you submit only 1000 files.
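To illustrate how the node count grows with folder depth, here is a minimal sketch that counts the distinct folders and files implied by a list of file paths. The function name `count_nodes` and the sample paths are hypothetical and only serve the illustration; this is not how the Microsoft Import Service counts items internally.

```python
from pathlib import PurePosixPath


def count_nodes(file_paths):
    """Return (folder_count, file_count) for a list of file paths."""
    folders = set()
    for path in file_paths:
        # Every ancestor folder of a file must exist before the file is migrated.
        for parent in PurePosixPath(path).parents:
            if parent.name:  # skip the root "/" (and "." for relative paths)
                folders.add(parent)
    return len(folders), len(file_paths)


# Hypothetical sample: 3 files spread over a small folder tree.
paths = [
    "/folder/a/b/c/file1",
    "/folder/a/b/d/file2",
    "/folder/e/b/c/file3",
]
folder_count, file_count = count_nodes(paths)
print(folder_count, file_count, folder_count + file_count)  # 8 3 11
```

In this sample, 3 files already require 8 folders, so the job would carry 11 nodes instead of 3.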
Another problem is that the Microsoft import job stops processing after hitting 100 errors (even cosmetic errors).
When this happens, Microsoft doesn’t tell you which files have been successfully processed.
You have to resubmit the entire job, even if it took hours to extract the files from the source.
For these reasons, Cloudiway submits jobs of no more than 200 files at a time.
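The batching rule can be sketched as follows. This is an illustrative example, not Cloudiway's actual implementation: the names `batch_files` and `MAX_FILES_PER_JOB` are assumptions, and only the limit of 200 files comes from the text above.

```python
from itertools import islice

MAX_FILES_PER_JOB = 200  # per-job limit stated in this article


def batch_files(files, batch_size=MAX_FILES_PER_JOB):
    """Yield consecutive lists of at most batch_size files."""
    iterator = iter(files)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical file list: 1000 files become 5 jobs of 200 files each.
jobs = list(batch_files([f"file{i}" for i in range(1000)]))
print(len(jobs), len(jobs[0]))  # 5 200
```

Smaller batches mean that a job stopped after 100 errors costs at most 200 files to extract and resubmit, rather than 1000.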
2.4. Maximum number of submitted jobs
Based on our experience, we know that we can’t submit hundreds of jobs simultaneously.
In the past, we tried to submit 1000 jobs. Microsoft processed the jobs so slowly that the Blob Storage expired before the jobs were processed and we had to recreate the jobs (and download the files from the source again).
For this reason, we send no more than 10 jobs at a time per migrated user.
When 10 jobs are submitted and not yet processed by Microsoft, we keep preparing additional jobs, but we do not submit them yet. As soon as we detect that Microsoft has processed some jobs, we submit new ones.
When we have accumulated more than 50 non-submitted jobs, we must stop the migration, sleep for a while (between 15 and 90 minutes), and restart.
This adds delay, but the root cause is that the Microsoft Import Service has not yet processed the submitted jobs.
- No more than 10 submitted jobs per user
- No more than 50 non-submitted jobs per user
- Sleep 15 minutes (Pending state) and restart migration when the limit of 50 jobs is reached.
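The limits above can be read as a small decision rule applied per user. The sketch below is a simplified illustration under stated assumptions: the function `next_action` and its return values are hypothetical, and the real scheduler is not described in this article. Only the limits of 10 and 50 come from the text.

```python
MAX_SUBMITTED = 10  # jobs submitted and not yet processed by Microsoft, per user
MAX_WAITING = 50    # jobs prepared but not yet submitted, per user


def next_action(in_flight, waiting):
    """Decide what the migration of one user should do next.

    in_flight: jobs submitted and not yet processed by the import service
    waiting:   jobs prepared locally and not yet submitted
    """
    if in_flight < MAX_SUBMITTED and waiting > 0:
        return "submit"   # a slot is free, send one waiting job
    if waiting > MAX_WAITING:
        return "sleep"    # backlog too large: go to Pending and restart later
    return "prepare"      # keep preparing jobs without submitting them


print(next_action(in_flight=3, waiting=5))    # submit
print(next_action(in_flight=10, waiting=5))   # prepare
print(next_action(in_flight=10, waiting=51))  # sleep
```

The rule checks for a free slot first, so the backlog only grows while all 10 slots are occupied, which matches the situation where the import service is the limiting factor.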
3. Discovery Implementation
As explained above, we have to pre-create the folders.
Otherwise, we would have to include the folder hierarchy in every submitted job, and the processing of each job by the Microsoft Import Service would be excessively slow.
Pre-creating the folder structure has performance impacts.
Each time a migration starts, a discovery process must iterate through every folder, list its contents, and read the metadata and permissions.
Assume you can process 2 folders per second, that is 500 ms per folder, which is a good ratio (depending on the source, it may take more than 1 call per folder to bind to the folder, get the permissions, and read the createdby/modifiedby fields).
If you have a drive containing 30,000 folders, it will take 15,000 seconds (250 minutes, a little over 4 hours) to perform the discovery before starting the migration.
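The arithmetic can be reproduced with a short sketch. The function name `discovery_seconds` and the `passes` parameter are illustrative assumptions; the rate of 0.5 seconds per folder is the figure used above, and real rates depend on the source.

```python
def discovery_seconds(folder_count, seconds_per_folder=0.5, passes=1):
    """Estimate the total discovery time, in seconds, for a drive."""
    return folder_count * seconds_per_folder * passes


seconds = discovery_seconds(30_000)
hours, remainder = divmod(int(seconds), 3600)
print(f"{seconds:.0f} s = {seconds / 60:.0f} min = {hours} h {remainder // 60} min")
# 15000 s = 250 min = 4 h 10 min
```

Setting `passes` to the number of times the migration is started or restarted gives a rough idea of the cumulative discovery cost.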
Every time you start a delta pass, or the migration goes to pending, this time will have to be spent again (new folders may have been created).
- Discovery is mandatory each time a migration starts.
- Not doing so will cause additional delays (adding the folder structure to each submitted job may add 15 to 30 minutes to each job processed by Microsoft, multiplied by the hundreds of jobs submitted).
4. Performance considerations
Drives differ by their content.
- A drive with 1 file of 15 GB will take less time to migrate than a drive with 15,000 files of 1 MB, even though the total size is the same.
- A drive with many folders (e.g. 30,000) and a few files (e.g. 500) may take 1 day to migrate.
- Conversely, a drive with few folders (e.g. 50) and 10,000 files may also take only 1 day.
Therefore, drives are hard to compare, and neither you nor we can make assumptions based on size alone: drives that are small in volume may take longer than larger ones, and drives with more files may migrate quicker than drives with fewer files but more folders.
From our experience, it is very challenging:
- To migrate a drive that contains more than 15,000 folders.
- To migrate a drive that contains a large number of files. A drive with more than 100,000 files may take weeks to complete or even not succeed if it contains more than 300,000 or 400,000 files.
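These thresholds can be turned into a quick pre-migration check. The sketch below is illustrative only: the function `drive_risk` and its messages are hypothetical, and the numbers are the experience-based figures listed above, not hard limits of the service.

```python
def drive_risk(folder_count, file_count):
    """Return a list of warnings for a drive, based on the article's thresholds."""
    warnings = []
    if folder_count > 15_000:
        warnings.append("more than 15,000 folders: challenging to migrate")
    if file_count > 300_000:
        warnings.append("more than 300,000 files: migration may not succeed")
    elif file_count > 100_000:
        warnings.append("more than 100,000 files: migration may take weeks")
    return warnings


# Hypothetical drive: many folders and a large number of files.
for warning in drive_risk(folder_count=30_000, file_count=120_000):
    print(warning)
```

An empty list means the drive falls below all of the thresholds, which does not guarantee a fast migration.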
5. Conclusion
Migrating large drives and large SharePoint libraries is complex because of the constraints of the Microsoft Import Service.
Cloudiway continuously works around its limitations and throttling issues to migrate at the maximum possible speed.
If you are hitting a performance issue for a particular drive or site, try to grab as much information as possible regarding its content:
- Number of files to migrate
- Nature of the files (small, large)
- Number of folders
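If the drive happens to be available as a local or mounted folder (an assumption, since many cloud sources are only reachable through their API), these figures can be collected with a short script. The function `profile_tree` and the 100 MB cutoff between small and large files are hypothetical choices for illustration.

```python
import os


def profile_tree(root, large_file_bytes=100 * 1024 * 1024):
    """Count folders and files under root, and sum the file sizes."""
    profile = {"folders": 0, "files": 0, "large_files": 0, "total_bytes": 0}
    for dirpath, dirnames, filenames in os.walk(root):
        profile["folders"] += len(dirnames)
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # unreadable or vanished file, skip it
            profile["files"] += 1
            profile["total_bytes"] += size
            if size >= large_file_bytes:
                profile["large_files"] += 1
    return profile


if __name__ == "__main__":
    # Profiles the current directory as an example.
    print(profile_tree("."))
```

The resulting counts cover the three items listed above: number of files, nature of the files (through the large file count and total size), and number of folders.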