Hi John,
jrv is absolutely right that this is really a job for a database. For the sake of PowerShell technique though, here's an example of how you can do this without blowing up your memory usage:
The basic concept is to keep everything within a single pipeline, so that the lines of text flow through and get emitted by the final function as they come. This means all the data may only flow through the Process block of each function/cmdlet involved. Here's an example:
function Select-Set
{
    [CmdletBinding()]
    Param (
        [Parameter(ValueFromPipeline = $true)]
        $InputObject,

        [int]
        $Count
    )

    Begin
    {
        # Wrap the collection in a PSObject so that emitting it does not
        # unroll it onto the pipeline item by item. A generic list avoids
        # the cost of rebuilding an array on every += with large sets.
        $list = New-Object PSObject -Property @{ List = New-Object System.Collections.Generic.List[object] }
    }
    Process
    {
        foreach ($Object in $InputObject)
        {
            $list.List.Add($Object)
            if ($list.List.Count -ge $Count)
            {
                # Emit the full set and start a fresh one
                $list
                $list = New-Object PSObject -Property @{ List = New-Object System.Collections.Generic.List[object] }
            }
        }
    }
    End
    {
        # Emit whatever is left over (note: check the inner list, not the wrapper)
        if ($list.List.Count -gt 0)
        {
            $list
        }
    }
}
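To see what Select-Set emits, here's a quick sanity check; given the function above, it should produce the output shown in the comments:

1..10 | Select-Set -Count 3 | ForEach-Object { $_.List -join ',' }
# 1,2,3
# 4,5,6
# 7,8,9
# 10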
function Set-SerialContent
{
    [CmdletBinding()]
    Param (
        [Parameter(ValueFromPipeline = $true)]
        $InputObject,

        [string]
        $Path,

        [string]
        $Name,

        [string]
        $Extension
    )

    Begin
    {
        # Running index used to number the output files
        $count = 0
    }
    Process
    {
        foreach ($Object in $InputObject)
        {
            # Each incoming object is one set; write it to its own file
            $Object.List | Set-Content "$Path\$($Name)$($count).$($Extension)"
            $count++
        }
    }
}
Get-Content "*.csv" | Select-Set -Count 1000000 | Set-SerialContent -Path "C:\Output" -Name "Csv_" -Extension "csv"
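As a side note: Get-Content also has a -ReadCount parameter that makes it emit lines in batches rather than one by one, which tends to speed up reading large files considerably. Since Select-Set's Process block already loops over whatever it receives, the pipeline stays the same (a sketch, not benchmarked by me):

Get-Content "*.csv" -ReadCount 1000 | Select-Set -Count 1000000 | Set-SerialContent -Path "C:\Output" -Name "Csv_" -Extension "csv"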
Please note that this is not designed to be the most aesthetic code, but a proof of concept. The two helper functions follow the basic rule I explained above: all objects passed through only do so on the pipeline. Out of necessity, Select-Set keeps up to -Count objects in memory at a time (so you need that much memory available; with a million lines of text, that can easily mean a few hundred MB).
Furthermore, for a tailored fit, I could have combined the two functions into one (and saved myself some trouble with collections and pipelines; see the sketch below), but I like to keep separate functionalities in separate functions. A bit of pointless vanity, perhaps?
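For illustration, such a combined variant might look roughly like this (an untested sketch; the Split-Content name is made up, and it writes files directly instead of emitting sets):

function Split-Content
{
    [CmdletBinding()]
    Param (
        [Parameter(ValueFromPipeline = $true)]
        $InputObject,

        [int]
        $Count,

        [string]
        $Path,

        [string]
        $Name,

        [string]
        $Extension
    )

    Begin
    {
        # Buffer for the current set and a running file index
        $buffer = New-Object System.Collections.Generic.List[object]
        $fileIndex = 0
    }
    Process
    {
        foreach ($Object in $InputObject)
        {
            $buffer.Add($Object)
            if ($buffer.Count -ge $Count)
            {
                # Write the full set to its own file and start over
                $buffer | Set-Content "$Path\$($Name)$($fileIndex).$($Extension)"
                $fileIndex++
                $buffer.Clear()
            }
        }
    }
    End
    {
        # Write whatever is left over
        if ($buffer.Count -gt 0)
        {
            $buffer | Set-Content "$Path\$($Name)$($fileIndex).$($Extension)"
        }
    }
}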
Anyway, this may work, but even if it does, there's no way its performance will top (or even approach) that of a database.
Cheers,
Fred