Link Detection and Remediation



Overview

Link detection scans files and identifies the links they contain. You enable this option when setting the behaviors for the job during job creation, and it runs for both simulation and transfer jobs. The Job reports display link information when it is available for the job. Once all the job runs have completed, you can execute link remediation to update the links so they don’t have to be edited manually.

Link Detection File Handling

DryvIQ scans for links while the document is in memory during migration and does not make additional API calls as part of the link detection feature. Content analysis for link detection requires a seekable stream. To obtain one, DryvIQ downloads the file into memory if it is small enough, or into a temp location on the processing node if the file is too large. DryvIQ analyzes that stream, resets it, and uploads the file to the destination. After the transfer is complete, DryvIQ removes the temp file if one was needed for file analysis.
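The buffering approach described above can be sketched in Python. This is only an illustration of the general technique (buffer in memory, spill to a temp file when large, analyze, rewind, upload), not DryvIQ's implementation. The size threshold and the `analyze` and `upload` callables are hypothetical; the source does not state the actual limit.

```python
import tempfile

# Hypothetical threshold; the actual size limit is not documented here.
MAX_IN_MEMORY_BYTES = 10 * 1024 * 1024


def buffer_for_analysis(chunks, max_in_memory=MAX_IN_MEMORY_BYTES):
    """Copy a one-pass download into a seekable buffer.

    SpooledTemporaryFile keeps data in memory until it grows past
    max_size, then rolls over to a temporary file on disk.
    """
    buffer = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    for chunk in chunks:
        buffer.write(chunk)
    buffer.seek(0)
    return buffer


def analyze_then_upload(chunks, analyze, upload):
    """Analyze the buffered stream, rewind it, then pass it to the uploader."""
    with buffer_for_analysis(chunks) as stream:
        links = analyze(stream)
        stream.seek(0)  # reset so the upload starts from the first byte
        upload(stream)
    # Leaving the with-block closes the buffer and removes any temp file.
    return links
```

Because the same buffered stream serves both the analysis and the upload, no second download is needed in this sketch.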

 

Link detection only scans the latest version of each file and reports the links detected. It does not scan previous versions.

Supported File Types

Link detection currently only identifies links in files with the following extensions:

  • DOCX (available in Microsoft Word 2007 and newer)

  • PPTX (available in Microsoft PowerPoint 2007 and newer)

  • XLSX (available in Microsoft Excel 2007 and newer)

Link detection identifies the following types of links:

  • Hyperlinks: These are links to websites or documents. Hyperlinks can be http/https/ftp/ftps URLs or links to files.
    In Microsoft Word, Excel, and PowerPoint files, these links are created using the Link option on the Insert tab or by right-clicking on selected text/cell and selecting Link from the shortcut menu.

  • References to other Excel spreadsheets: In Microsoft Excel files, these are links to cells in other Microsoft Excel files. These links are made by creating a formula that references a cell or range of cells in another Microsoft Excel file. The cells are formatted similar to the following examples:

    • =[AnotherSpreadsheet.xlsx]SheetName!A1

    • ='C:\Absolute\Path\To\[AnotherSpreadsheet.xlsx]SheetName'!B1

  • Linked documents/objects: In Microsoft PowerPoint files, this is content that has been imported into the presentation. This content is imported using the Object option on the Insert tab or using the Paste Special option to insert a link to a Microsoft Word Document Object.

Link detection does not identify the following:

  • Unformatted links: DryvIQ will not count unformatted links (URLs that are added as plain text in the document).

  • IncludeText fields: In Microsoft Word files, link detection does not support links added through IncludeText fields using the Insert Quick Parts option.
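As a rough illustration of the link types listed above, the sketch below reads hyperlink relationships from an Office Open XML package and extracts workbook names from formula text. It is an assumption-laden example, not DryvIQ's detection logic: the function names are hypothetical, and the regular expression works on formula text as displayed in the examples above, which may differ from how Excel stores the formula inside the file.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
HYPERLINK_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
)
# Matches the workbook name in formula text such as =[Book.xlsx]Sheet!A1
EXTERNAL_WORKBOOK = re.compile(r"\[([^\[\]]+\.xlsx)\]", re.IGNORECASE)


def find_hyperlinks(stream):
    """Return hyperlink targets recorded in the package's .rels parts."""
    targets = []
    with zipfile.ZipFile(stream) as package:
        for name in package.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(package.read(name))
            for rel in root.iter(RELS_NS + "Relationship"):
                if rel.get("Type") == HYPERLINK_TYPE:
                    targets.append(rel.get("Target"))
    return targets


def find_external_workbooks(formula):
    """Return workbook names referenced in a displayed formula string."""
    return EXTERNAL_WORKBOOK.findall(formula)
```

Plain-text URLs never appear as hyperlink relationships, which is consistent with unformatted links not being counted.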

 

Job filter exclusions take precedence over Link Detection. Therefore, if a job filter exclusion is set to ignore DOCX, PPTX, or XLSX files, Link Detection will also ignore these files.

Link Detection Impact on Job Performance

Simulation Jobs: When link detection is enabled on Simulation jobs, the simulation job execution will be longer because the document must be downloaded into memory to detect links. (Files are not normally loaded into memory during a simulation job because they are not actually being migrated.) DryvIQ estimates a 5-10% impact.

Transfer Jobs: As noted above, DryvIQ scans for links while the document is already in memory during migration, so the impact on job time is minimal. The document's size has a negligible impact on link detection times unless the file is very large (GBs in size). Link detection uses a nominal amount of CPU to detect links. Memory is not affected.

Enabling Link Detection in the UI

Link Detection is available as one of the Behavior options when creating a job. This feature is disabled by default. Select the Allow link detection on supported files toggle to enable the feature.

Viewing Link Information

When enabled, link detection will identify the links in files and make the information available for review on the individual Job reports and the roll-up reports. Information is available on the Content Insights, Items, and Links pages.

 

Note that link counts for spreadsheets depend on how the links were added to cells. If a link is added to multiple cells at the same time, DryvIQ reads it as one link shared across those cells and counts it once. If links are added to multiple cells separately (one cell at a time), DryvIQ treats each cell as separate and counts each link individually.
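The counting rule above can be illustrated with a small model. The representation used here, one entry per stored link with the cell range it covers, is an assumption made for illustration and is not taken from DryvIQ.

```python
from collections import Counter


def count_links(hyperlink_entries):
    """Count one link per entry, regardless of how many cells it spans.

    hyperlink_entries is a list of (cell_range, url) tuples (hypothetical model).
    """
    counts = Counter()
    for _cell_range, url in hyperlink_entries:
        counts[url] += 1
    return counts


url = "https://example.com/report.docx"
added_together = [("A1:A3", url)]  # one link shared across three cells
added_separately = [("A1", url), ("A2", url), ("A3", url)]

print(count_links(added_together)[url])    # 1
print(count_links(added_separately)[url])  # 3
```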

Content Insights

For jobs that have link detection enabled, the bottom of the Content Insights page displays a “Link remediation status overview” section. This section lists the number of files without links, the number of links identified that need to be remediated, the number of links for which remediation has been completed, the number of links where remediation did not complete and needs to be executed again, and the number of links for which remediation failed. Specific details about the individual links can be viewed on the Items page and Links page.

This information can be exported to a csv file for further review using the Export this report link. The export includes the following information.

Field | Description
source_id | The ID assigned to the file on the source platform.
source_name | The filename on the source platform. The source and destination file names may not match if DryvIQ needed to sanitize the filename due to character or length restrictions for the destination platform.
source_path | The path where the file is located on the source platform.
destination_id | The ID assigned to the file on the destination platform.
destination_name | The filename on the destination platform. The source and destination file names may not match if DryvIQ needed to sanitize the filename due to character or length restrictions for the destination platform.
destination_path | The path where the file is located on the destination platform.
link | The URL for the link detected.
count | The number of times the link was found in the file.

As noted above, link counts for spreadsheets depend on how the links were added to cells: a link added to multiple cells at the same time is counted as one shared link, while links added one cell at a time are counted individually.
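Once exported, the report can be summarized with a short script. The sketch below totals the count column per source file. It assumes the CSV header row uses the field names listed above exactly as written; the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def links_per_file(csv_text):
    """Sum the count column for each source_path in an exported report."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["source_path"]] += int(row["count"])
    return dict(totals)
```

To use it with a downloaded file, read the file's text and pass it in, for example `links_per_file(open("links.csv", newline="").read())`, where `links.csv` is a placeholder filename.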

Items

A link remediation status is assigned to every file included in a migration even if link detection isn’t enabled for a job. You can configure the Items page to display the status by changing the third or fourth column header to Link remediation status.

The column will display the link remediation status for every file. There are five statuses:

  • Nothing to remediate: No links were detected in the file.

  • Remediation needed: Links were detected in the file and require remediation to be executed to update the links.

  • Complete: Remediation was executed and finished processing. Regular URLs and unsupported URLs will also be considered “Complete” as there is no action to take against them.

  • Retry: Remediation was triggered but was not completed. Link remediation needs to be executed again to remediate the link.

  • Failed: At least one link in the file failed to be remediated. Failed files will not be reprocessed during subsequent link remediation executions unless the status is changed to “Retry.”

You also have the option of filtering the Items page based on a specific link remediation status. This allows you to narrow the results to display only files that need to be remediated, retried, etc.
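The five statuses and the reprocessing rule described above can be modeled as follows. This is an illustrative sketch only; the enum, the helper names, and the item representation are hypothetical and do not reflect a DryvIQ API.

```python
from enum import Enum


class LinkRemediationStatus(Enum):
    NOTHING_TO_REMEDIATE = "Nothing to remediate"
    REMEDIATION_NEEDED = "Remediation needed"
    COMPLETE = "Complete"
    RETRY = "Retry"
    FAILED = "Failed"


# Per the descriptions above, Failed files are skipped on later runs
# unless their status is changed to Retry.
PROCESSED_STATUSES = {
    LinkRemediationStatus.REMEDIATION_NEEDED,
    LinkRemediationStatus.RETRY,
}


def will_be_processed(status):
    """Return True if the next link remediation run would act on this file."""
    return status in PROCESSED_STATUSES


def filter_by_status(items, status):
    """Return the names of items with the given status.

    items is a list of (file_name, LinkRemediationStatus) tuples.
    """
    return [name for name, item_status in items if item_status == status]
```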

Links

The Links page provides information about each link identified. There will be an entry for each link identified; therefore, you will see the source item listed multiple times if multiple links were identified within the file. You can edit the second, third, and fourth columns to display the information most relevant to your review. Review the table below for a summary of the available column options.

Heading | Description
Source Item | This is the header for the first column. It cannot be changed. It displays the item on the source that contains the identified link.
Detected Link | This displays the link that was found.
Remediated link | This is the new link that was created to reflect the item’s new location on the destination.
Destination item | This is the item on the destination. The file name should match between the source and destination unless the file required revision during transfer due to restrictions enforced by the destination platform.
Linked Source item | This is the item the link points to on the source platform.
Linked Destination item | This is the item the link points to on the destination platform.
Remediation status | This displays the current remediation status for the link.
Remediated date | This displays the date and time when the link was remediated.
Count | This is the number of times the identified link appears in the file.

Remediating Links

You must manually trigger link remediation for the job(s) that contain links. When link remediation runs, it will remediate the linked URL so it matches the new location of the linked file.

  1. Choose the job(s) by selecting the box in front of the job name.

  2. Click More options and select Execute link remediation in the menu that displays.

     

  3. The job will be queued to run.

  4. Once the job is finished running, the link remediation status will be Complete if remediation was successful for all identified links.
    If the link remediation status is Retry, link remediation did not complete. You need to execute link remediation against the job again.
    If the link remediation status is Failed, at least one link could not be remediated. You need to edit the link manually.

  5. The link detection information on the Content Insights, Items, and Links pages will be updated to reflect the current link information.

Link Remediation Impact

Link remediation does not affect the transfer times or speed of migration jobs because it is a separate process executed after migration, once link detection has completed. It does, however, make additional calls to the source and destination platforms, so platforms with caps or overage charges may be impacted. Link remediation also adds time to the overall migration project because it is a separate process that must be executed. On a document count basis, the time required is roughly equivalent to another delta run. For example, remediating links in 1,000 files in a job takes about as much time as a delta run with modifications to 1,000 files. Factor this in when planning your project if you intend to use link detection and remediation.


REST API

When creating the job through the REST API, you can enable Link Detection in the transfer block by sending “link_detection” set to “true”.

POST {{url}}v1/jobs

{
  "name": "Job Name",
  "kind": "transfer",
  "transfer": {
    "transfer_type": "copy",
    "link_detection": "true",
    "source": {
      "connection": { "id": "source_connection_id" },
      "target": { "path": "source_connection_path" }
    },
    "destination": {
      "connection": { "id": "destination_connection_id" },
      "target": { "path": "destination_connection_path" }
    }
  }
}
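For readers scripting job creation, the request above could be built in Python roughly as follows. This is a sketch under assumptions: the function name is hypothetical, the base URL is assumed to end with a slash (as {{url}} does in the example), and the bearer-token Authorization header is an assumption, since this page does not describe authentication.

```python
import json
import urllib.request


def build_create_job_request(base_url, token, source_id, source_path,
                             destination_id, destination_path, name="Job Name"):
    """Build (but do not send) a POST request that creates a transfer job."""
    payload = {
        "name": name,
        "kind": "transfer",
        "transfer": {
            "transfer_type": "copy",
            "link_detection": "true",  # sent as a string, matching the example above
            "source": {
                "connection": {"id": source_id},
                "target": {"path": source_path},
            },
            "destination": {
                "connection": {"id": destination_id},
                "target": {"path": destination_path},
            },
        },
    }
    return urllib.request.Request(
        url=base_url + "v1/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; adjust to your deployment.
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request.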

 

The code below will export the link detection report for a specified job (if the job has any embedded links). The report will be exported to a CSV file.

GET {{url}}v1/transfers/{job}/links.csv
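A minimal Python sketch of calling this endpoint for a single job and saving the result is shown below. The helper names, the output path, and the bearer-token header are assumptions for illustration; only the URL shape comes from the request above.

```python
import urllib.request


def links_report_url(base_url, job_id):
    """Build the export URL; base_url is assumed to end with a slash."""
    return f"{base_url}v1/transfers/{job_id}/links.csv"


def download_links_report(base_url, job_id, token, out_path):
    """Download the link detection report for one job to a local CSV file."""
    request = urllib.request.Request(
        links_report_url(base_url, job_id),
        # Assumption: bearer-token auth; adjust to your deployment.
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out_file:
        out_file.write(response.read())
```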

You can specify multiple jobs by separating the job IDs with commas.

GET {{url}}v1/transfers/jobs=jobID1, JobID2, JobID3/links.csv

The code below exports the link detection report for all jobs (if the jobs have any embedded links). The report will be exported to a CSV file.

DryvIQ Migrate Version: 5.6.3.4210
Release Date: April 4, 2024