I am trying to use regular expressions (-imatch) to search for hyperlinks in Word documents and output a list of filepath/hyperlink pairs. Please don't laugh at my code for finding the hyperlinks; it is probably far from optimal, but it works. What I can't get to work is what should be the easier part of the project: building the table of results. I create a custom object called $Output with two NoteProperty members, FlPth (for the file paths) and Hyperlink (for the hyperlinks). Then I have an ArrayList called $colOutput that is supposed to collect the individual $Output objects as I step through the files containing hyperlinks. In the end I just get an array of 20 identical FlPth/Hyperlink pairs. I know that I am getting the right values for FlPth and Hyperlink as the script steps through the matches, but something is wrong with the way I am adding them to my array. Can you tell me where my error is?
$Path = "M:\My Documents\GoodSync\SOPSnap\QA" # Temporary path to concentrate on a problem file!
#$Path = "M:\My Documents\GoodSync\SOPSnap" # This is the general path.
$Regex = '\x13\s*HYPERLINK.*\x14' # Regex to catch Hyperlinks in word docs. It works!
$Excludes = '(\\_gsdata_\\|\\Archive\\|\\Drafts\\|\\Folder Settings\\|\\SOPSignOff\\|\\SOPSignOffHistory\\|\\Trash\\|lnk$)' # paths to skip
[System.Collections.ArrayList]$colOutput = @() # Make an arraylist for the results
$idx = 0 #array index counter
$output = New-Object -TypeName PSObject
$Output | Add-Member -type NoteProperty -Name FlPth -Value ""
$Output | Add-Member -type NoteProperty -Name Hyperlink -Value ""
$output | Get-Member | Format-Table
# Get all the files in $Path that end in ".doc". Expand this later to include .docx and .docm files as well and add the code to get into them
Get-ChildItem $Path -Filter "*.doc" -Recurse | #only doc files
    Where-Object { $_.Attributes -ne "Directory" -and $_.FullName -notmatch $Excludes} |
    ForEach-Object {
        $FlPthTemp = $_.FullName
        $MatchedLine = Get-Content $FlPthTemp | Select-String -Pattern $Regex -Allmatches |
            ForEach-Object {
                $_ -imatch '(?<=\x13\s*HYPERLINK).*(?=\x14)' | out-null; $Hlink = $Matches[0]
                $idx
                $Output.FlPth = $FlPthTemp
                $Output.Hyperlink = $Hlink
                $colOutput.Add($Output)
            }
    }
# Display the results for now (the export to a csv still has to be added)
$colOutput | Format-List
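
In case it helps, here is a stripped-down sketch that seems to reproduce the same symptom without any Word files. It builds the object and fills the ArrayList the same way as the script above. The sample paths and URLs, and the names $samples, $item and $colRepro, are made up for illustration and are not part of my real script.

```powershell
# Hypothetical sample data standing in for the real file/hyperlink matches.
$samples = @(
    @{ FlPth = 'C:\docs\a.doc'; Hyperlink = 'http://example.com/1' },
    @{ FlPth = 'C:\docs\b.doc'; Hyperlink = 'http://example.com/2' },
    @{ FlPth = 'C:\docs\c.doc'; Hyperlink = 'http://example.com/3' }
)

# ArrayList for the results, created the same way as $colOutput.
[System.Collections.ArrayList]$colRepro = @()

# The object is created once, before the loop, as in the script above.
$item = New-Object -TypeName PSObject
$item | Add-Member -MemberType NoteProperty -Name FlPth -Value ''
$item | Add-Member -MemberType NoteProperty -Name Hyperlink -Value ''

foreach ($s in $samples) {
    # Overwrite the two properties, then add the object to the list.
    $item.FlPth = $s.FlPth
    $item.Hyperlink = $s.Hyperlink
    [void]$colRepro.Add($item)   # [void] discards the index returned by Add()
}

$colRepro | Format-Table -AutoSize
```

With this sketch the list ends up with three entries, but every row shows the values from the last sample, which matches what I see with $colOutput.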