Parse Text With Regular Expression

Hello, Dear Colleagues.

Could you help me please parse text with regular expression:

--=_alternative XXXXXXXXXXXXXX_=
Content-Type: text/plain; charset="KOI8-R"
Content-Transfer-Encoding: base64

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

--=_alternative XXXXXXXXXXXXXX_=
Content-Type: text/html; charset="KOI8-R"
Content-Transfer-Encoding: base64

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
--=_alternative XXXXXXXXXXXXXX_=--
--=_related XXXXXXXXXXXXXX_=--_=
Content-Type: image/jpeg
Content-ID: <_2_XXXXXXXXXXXXXX>
Content-Transfer-Encoding: base64

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
--=_related XXXXXXXXXXXXXX_=
Content-Type: image/jpeg
Content-ID: <_2_XXXXXXXXXXXXXX>
Content-Transfer-Encoding: base64
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

--=_related XXXXXXXXXXXXXX_=
Content-Type: image/jpeg
Content-ID: <_2_XXXXXXXXXXXXXX>
Content-Transfer-Encoding: base64

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
--=_related XXXXXXXXXXXXXX_=--

I want to get separately:

1). text between  Content-Transfer-Encoding: base64 and --=_alternative, if there is above line Content-Type: text/html 

2). text between  Content-Transfer-Encoding: base64 and --=_related, if there is two lines above line Content-Type: image/jpeg

In this example it will be

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

If to use "greedy" RegEx

$regex = "(?ms).+?Content-Type: image/jpeg(.+?)--=_related"

I get all text between first Content-Type: image/jpeg and last --=_related

Content-ID: <_2_XXXXXXXXXXXXXX>
Content-Transfer-Encoding: base64

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
--=_related XXXXXXXXXXXXXX_=
Content-Type: image/jpeg
Content-ID: <_2_XXXXXXXXXXXXXX>
Content-Transfer-Encoding: base64
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

--=_related XXXXXXXXXXXXXX_=
Content-Type: image/jpeg
Content-ID: <_2_XXXXXXXXXXXXXX>
Content-Transfer-Encoding: base64

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

How to make RegEx "lazy" and get all pieces separately?

Thanks.





  • Edited by fapw Monday, July 20, 2015 2:59 PM
July 20th, 2015 1:57pm

Thanks you, guys.

Answered by Jessie Westlake in this thread.

Code:

$files = Get-ChildItem -Path "\\<SERVER_NAME>\mailroot\Drop"
Foreach ($file in $files){
    $text = Get-Content $file.FullName

    $RegexText = '(?:Content-Type: text/html.+?Content-Transfer-Encoding: base64(.+?)(?:--=_))'
    $RegexImage = '(?:Content-Type: image/jpeg.+?Content-Transfer-Encoding: base64(.+?)(?:--=_))'

    $TextMatches = [Regex]::Matches($text, $RegexText, [System.Text.RegularExpressions.RegexOptions]::Singleline)
    $ImageMatches = [Regex]::Matches($text, $RegexImage, [System.Text.RegularExpressions.RegexOptions]::Singleline)

    If ($TextMatches[0].Success)
    {
        Write-Host "Found $($TextMatches.Count) Text Matches:"
        Write-Output $TextMatches.ForEach({$_.Groups[1].Value})
    }
    If ($ImageMatches[0].Success)
    {
        Write-Host "Found $($ImageMatches.Count) Image Matches:"
        Write-Output $ImageMatches.ForEach({$_.Groups[1].Value})
    }
}


  • Marked as answer by fapw 21 hours 6 minutes ago
  • Edited by fapw 21 hours 5 minutes ago
  • Unmarked as answer by fapw 15 hours 52 minutes ago
  • Marked as answer by fapw 15 hours 51 minutes ago
Free Windows Admin Tool Kit Click here and download it now
July 21st, 2015 6:06am

Thanks for your reply, D'Thompson.

Maybe I didn't clearly described what I want. Sorry for that. I want something like this. Take a look please.

$text = @"
--=_alternative XXXXXXXXXXXXXX_=
Content-Type: text/html; charset="KOI8-R"
Content-Transfer-Encoding: base64

111111111111111111111111111111111111111111111111111111

--=_alternative XXXXXXXXXXXXXX_=
Content-Type: text/html; charset="KOI8-R"
Content-Transfer-Encoding: base64

222222222222222222222222222222222222222222222222222222
--=_alternative XXXXXXXXXXXXXX_=--
--=_related XXXXXXXXXXXXXX_=--_=
Content-Type: image/jpeg
Content-ID: <_2_XXXXXXXXXXXXXX>
Content-Transfer-Encoding: base64

333333333333333333333333333333333333333333333333333333
--=_related XXXXXXXXXXXXXX_=
Content-Type: image/jpeg
Content-ID: <_2_XXXXXXXXXXXXXX>
Content-Transfer-Encoding: base64
444444444444444444444444444444444444444444444444444444

--=_related XXXXXXXXXXXXXX_=
Content-Type: image/jpeg
Content-ID: <_2_XXXXXXXXXXXXXX>
Content-Transfer-Encoding: base64

555555555555555555555555555555555555555555555555555555
--=_related XXXXXXXXXXXXXX_=--
"@

$regex1 = "(?ms).+?Content-Transfer-Encoding: base64(.+?)--=_alternative"
$text1 = ([regex]::Matches($text,$regex1) | foreach {$_.groups[1].value})
Write-Host "text1 : " -fore red
Write-Host  $text1

#I want to get as output elements (of array, maybe, or one after another)
#1). text between  Content-Transfer-Encoding: base64 and --=_alternative, if there is above line Content-Type: text/html
#this
#1111111111111111111111111111111111111111111111111111111
#then this
#2222222222222222222222222222222222222222222222222222222

$regex2 = "(?ms).+?Content-Transfer-Encoding: base64(.+?)--=_related"
$text2 = ([regex]::Matches($text,$regex2) | foreach {$_.groups[1].value})
#I want to get as output elements (of array, maybe, or one after another)
#2). text between  Content-Transfer-Encoding: base64 and --=_related, if there is two lines above line Content-Type: image/jpeg
#this
#3333333333333333333333333333333333333333333333333333333
#then this
#4444444444444444444444444444444444444444444444444444444
#then this
#5555555555555555555555555555555555555555555555555555555
Write-Host "text2 : " -fore red
Write-Host  $text2
  • Edited by fapw Tuesday, July 21, 2015 6:53 AM
  • Marked as answer by fapw 15 hours 52 minutes ago
  • Unmarked as answer by fapw 15 hours 52 minutes ago
July 21st, 2015 6:07am

Thanks for your reply, D'Thompson.

Maybe I didn't clearly described what I want. Sorry for that. I want something like this. Take a look please.

$text = @"
--=_alternative XXXXXXXXXXXXXX_=
Content-Type: text/html; charset="KOI8-R"
Content-Transfer-Encoding: base64

111111111111111111111111111111111111111111111111111111

--=_alternative XXXXXXXXXXXXXX_=
Content-Type: text/html; charset="KOI8-R"
Content-Transfer-Encoding: base64

222222222222222222222222222222222222222222222222222222
--=_alternative XXXXXXXXXXXXXX_=--
--=_related XXXXXXXXXXXXXX_=--_=
Content-Type: image/jpeg
Content-ID: <_2_XXXXXXXXXXXXXX>
Content-Transfer-Encoding: base64

333333333333333333333333333333333333333333333333333333
--=_related XXXXXXXXXXXXXX_=
Content-Type: image/jpeg
Content-ID: <_2_XXXXXXXXXXXXXX>
Content-Transfer-Encoding: base64
444444444444444444444444444444444444444444444444444444

--=_related XXXXXXXXXXXXXX_=
Content-Type: image/jpeg
Content-ID: <_2_XXXXXXXXXXXXXX>
Content-Transfer-Encoding: base64

555555555555555555555555555555555555555555555555555555
--=_related XXXXXXXXXXXXXX_=--
"@

$regex1 = "(?ms).+?Content-Transfer-Encoding: base64(.+?)--=_alternative"
$text1 = ([regex]::Matches($text,$regex1) | foreach {$_.groups[1].value})
Write-Host "text1 : " -fore red
Write-Host  $text1

#I want to get as output elements (of array, maybe, or one after another)
#1). text between  Content-Transfer-Encoding: base64 and --=_alternative, if there is above line Content-Type: text/html
#this
#1111111111111111111111111111111111111111111111111111111
#then this
#2222222222222222222222222222222222222222222222222222222

$regex2 = "(?ms).+?Content-Transfer-Encoding: base64(.+?)--=_related"
$text2 = ([regex]::Matches($text,$regex2) | foreach {$_.groups[1].value})
#I want to get as output elements (of array, maybe, or one after another)
#2). text between  Content-Transfer-Encoding: base64 and --=_related, if there is two lines above line Content-Type: image/jpeg
#this
#3333333333333333333333333333333333333333333333333333333
#then this
#4444444444444444444444444444444444444444444444444444444
#then this
#5555555555555555555555555555555555555555555555555555555
Write-Host "text2 : " -fore red
Write-Host  $text2
  • Edited by fapw Tuesday, July 21, 2015 6:53 AM
  • Marked as answer by fapw Tuesday, July 21, 2015 3:18 PM
  • Unmarked as answer by fapw Tuesday, July 21, 2015 3:18 PM
Free Windows Admin Tool Kit Click here and download it now
July 21st, 2015 6:07am

Thanks you, guys.

Answered by Jessie Westlake in this thread.

Code:

$files = Get-ChildItem -Path "\\<SERVER_NAME>\mailroot\Drop"
Foreach ($file in $files){
    $text = Get-Content $file.FullName

    $RegexText = '(?:Content-Type: text/html.+?Content-Transfer-Encoding: base64(.+?)(?:--=_))'
    $RegexImage = '(?:Content-Type: image/jpeg.+?Content-Transfer-Encoding: base64(.+?)(?:--=_))'

    $TextMatches = [Regex]::Matches($text, $RegexText, [System.Text.RegularExpressions.RegexOptions]::Singleline)
    $ImageMatches = [Regex]::Matches($text, $RegexImage, [System.Text.RegularExpressions.RegexOptions]::Singleline)

    If ($TextMatches[0].Success)
    {
        Write-Host "Found $($TextMatches.Count) Text Matches:"
        Write-Output $TextMatches.ForEach({$_.Groups[1].Value})
    }
    If ($ImageMatches[0].Success)
    {
        Write-Host "Found $($ImageMatches.Count) Image Matches:"
        Write-Output $ImageMatches.ForEach({$_.Groups[1].Value})
    }
}


  • Marked as answer by fapw Tuesday, July 21, 2015 10:04 AM
  • Edited by fapw Tuesday, July 21, 2015 10:05 AM
  • Unmarked as answer by fapw Tuesday, July 21, 2015 3:18 PM
  • Marked as answer by fapw Tuesday, July 21, 2015 3:19 PM
July 21st, 2015 10:04am

This works for me:

$text = @"
--=_alternative XXXXXXXXXXXXXX_=
Content-Type: text/html; charset="KOI8-R"
Content-Transfer-Encoding: base64

111111111111111111111111111111111111111111111111111111

--=_alternative XXXXXXXXXXXXXX_=
Content-Type: text/html; charset="KOI8-R"
Content-Transfer-Encoding: base64

222222222222222222222222222222222222222222222222222222
--=_alternative XXXXXXXXXXXXXX_=--
--=_related XXXXXXXXXXXXXX_=--_=
Content-Type: image/jpeg
Content-ID: <_2_XXXXXXXXXXXXXX>
Content-Transfer-Encoding: base64

333333333333333333333333333333333333333333333333333333
--=_related XXXXXXXXXXXXXX_=
Content-Type: image/jpeg
Content-ID: <_2_XXXXXXXXXXXXXX>
Content-Transfer-Encoding: base64
444444444444444444444444444444444444444444444444444444

--=_related XXXXXXXXXXXXXX_=
Content-Type: image/jpeg
Content-ID: <_2_XXXXXXXXXXXXXX>
Content-Transfer-Encoding: base64

555555555555555555555555555555555555555555555555555555
--=_related XXXXXXXXXXXXXX_=--
"@
$regex1 = "(?ms).+?Content-Transfer-Encoding: base64(.+?)--=_(?:alternative|related)"
$text | Select-String -Pattern $regex1 -AllMatches | % Matches | % {$_.Groups[1].Value}


  • Edited by D'Thompson 16 hours 4 minutes ago
  • Marked as answer by fapw 15 hours 52 minutes ago
Free Windows Admin Tool Kit Click here and download it now
July 21st, 2015 11:01am

This works for me:

$text = @"
--=_alternative XXXXXXXXXXXXXX_=
Content-Type: text/html; charset="KOI8-R"
Content-Transfer-Encoding: base64

111111111111111111111111111111111111111111111111111111

--=_alternative XXXXXXXXXXXXXX_=
Content-Type: text/html; charset="KOI8-R"
Content-Transfer-Encoding: base64

222222222222222222222222222222222222222222222222222222
--=_alternative XXXXXXXXXXXXXX_=--
--=_related XXXXXXXXXXXXXX_=--_=
Content-Type: image/jpeg
Content-ID: <_2_XXXXXXXXXXXXXX>
Content-Transfer-Encoding: base64

333333333333333333333333333333333333333333333333333333
--=_related XXXXXXXXXXXXXX_=
Content-Type: image/jpeg
Content-ID: <_2_XXXXXXXXXXXXXX>
Content-Transfer-Encoding: base64
444444444444444444444444444444444444444444444444444444

--=_related XXXXXXXXXXXXXX_=
Content-Type: image/jpeg
Content-ID: <_2_XXXXXXXXXXXXXX>
Content-Transfer-Encoding: base64

555555555555555555555555555555555555555555555555555555
--=_related XXXXXXXXXXXXXX_=--
"@
$regex1 = "(?ms).+?Content-Transfer-Encoding: base64(.+?)--=_(?:alternative|related)"
$text | Select-String -Pattern $regex1 -AllMatches | % Matches | % {$_.Groups[1].Value}


  • Edited by D'Thompson Tuesday, July 21, 2015 3:06 PM
  • Marked as answer by fapw Tuesday, July 21, 2015 3:18 PM
July 21st, 2015 2:59pm

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics