How can I export a table row in Internet Explorer?

I need to export a single table row on a website and I can't figure out how to do it.  The source view for the row I need is:

<tr class="alt"> <td id="16ZwhxLjCN8fafA8wuYEnMFtGJGrFy6qcE">16ZwhxLjCN8fafA8wuYEnMFtGJGrFy6qcE</td> <td></td> <td></td> <td>0.00065227</td> <td>0.01233629</td> <td>0.00371003</td> </tr>

I can get the table ID using the code below, but I don't know how to get the rest of the values. The table ID does not change but the numerical values do.

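$ie = New-Object -ComObject InternetExplorer.Application
$ie.Silent = $false
$ie.Navigate2("mywebsite.com")
# Navigate2 returns immediately, so wait until the page has finished loading
while ($ie.Busy -or $ie.ReadyState -ne 4) { Start-Sleep -Milliseconds 100 }
$ie.Document.getElementById("16ZwhxLjCN8fafA8wuYEnMFtGJGrFy6qcE")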

January 10th, 2014 6:10pm

Hi Tom,

this may not be quite the perfect solution, but it works for me at least. I'm not using the IE ComObject, but rather the .NET Webclient for it ...

# Load downloader function
function Get-WebContent
{
	<#
		.SYNOPSIS
			Downloads a file
	
		.DESCRIPTION
			Download any file using a valid weblink and either store it locally or return its content
		
		.PARAMETER webLink
			The full link to the file (Example: "http://www.example.com/files/examplefile.dat"). Adds "http://" if webLink starts with "www".
	
		.PARAMETER destination
			The target where you want to store the file, including the filename (Example: "C:\Example\examplefile.dat"). Folder needs not exist but path must be valid. Optional.
	
		.PARAMETER getContent
			Switch that controls whether the function returns the file content.
	
		.EXAMPLE
			Get-WebContent -webLink "http://www.technet.com" -destination "C:\Example\technet.html"
			This will download the technet website and store it as a html file to the target location
	
		.EXAMPLE
			Get-WebContent -webLink "www.technet.com" -getContent
			This will download the technet website and return its content (as a string)
	#>
	Param(
	[Parameter(Mandatory=$true,Position="0")]
	[Alias('from')]
	[string]
	$WebLink,
	
	[Parameter(Position="1")]
	[Alias('to')]
	[string]
	$Destination,
	
	[Alias('grab')]
	[switch]
	$GetContent
	)
	
	# Correct WebLink for typical errors
	if ($webLink.StartsWith("www") -or $webLink.StartsWith("WWW")){$webLink = "http://" + $webLink}
	
	$webclient = New-Object Net.Webclient
	$file = $webclient.DownloadString($webLink)
	if ($destination -ne "")
	{
		try {Set-Content -Path $destination -Value $file -Force}
		catch {}
	}
	if ($getContent){return $file}
}

# Download website
$website = Get-WebContent -WebLink "http://www.mywebsite.com" -GetContent

# Cut away everything before the relevant part
$string = $website.SubString($website.IndexOf('<td id="16ZwhxLjCN8fafA8wuYEnMFtGJGrFy6qcE">'))

# Cut away everything after the row
$string = $string.SubString(0,$string.IndexOf('</tr>'))

# Split the string into each individual line
$lines = $string.Split("`n")

# Prepare the result variable
$results = @()

# For each line, cut away the clutter
foreach ($line in $lines)
{
	$temp = $line.SubString(4,($line.length - 10))
	
	# for the first line, the td has an id, which this compensates for
	if ($temp -like 'id="16ZwhxLjCN8fafA8wuYEnMFtGJGrFy6qcE">*'){$temp = $temp.SubString(($temp.IndexOf(">") + 1))}
	
	# Add cleaned line to results
	$results += $temp
}

You may need to adapt the string parsing below the function if the text you posted is not literally identical to what the function returns. It did work on a string block I copied and pasted from your post. :)
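If the downloaded markup puts all cells on one line or uses different line endings, the fixed SubString offsets will not line up. A regular expression is less sensitive to layout. The following is only a minimal sketch under that assumption: `$website` is a hypothetical stand-in for the downloaded page, built from the row in the original post, and `$rowText` is a name introduced purely for illustration.

```powershell
# The cell ID from the original post
$id = '16ZwhxLjCN8fafA8wuYEnMFtGJGrFy6qcE'

# Hypothetical stand-in for the downloaded page (normally the result of Get-WebContent)
$website = '<table><tr><td>other</td></tr><tr class="alt"> <td id="{0}">{0}</td> <td></td> <td></td> <td>0.00065227</td> <td>0.01233629</td> <td>0.00371003</td> </tr></table>' -f $id

# Find the first cell of the row and stop if the page does not contain it
$start = $website.IndexOf('<td id="' + $id + '">')
if ($start -lt 0) { throw "Cell with id '$id' not found." }

# Keep only the text from that cell up to the end of its row
$rowText = $website.Substring($start)
$rowText = $rowText.Substring(0, $rowText.IndexOf('</tr>'))

# Capture the content of every <td>, with or without attributes, across line breaks
$results = @([regex]::Matches($rowText, '(?s)<td[^>]*>(.*?)</td>') |
	ForEach-Object { $_.Groups[1].Value.Trim() })
```

`$results` then holds one string per cell, including empty strings for the empty cells, so the positions stay aligned with the table columns.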

I certainly would be more than happy to read a more elegant version, if someone has one to offer.

Cheers,
Fred

January 10th, 2014 6:50pm

I see two issues. The table ID will likely be different every time you download the page.

If the page is XHTML or well-formed HTML5, then it can be parsed most easily as XML. You can just find all nodes that are 'table' and pick the one you want. All <tr> elements will be the rows. From this point it is very easy to convert the table into data objects.

You can also directly import HTML tables into Excel and MS Access.
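As a rough illustration of the XML route, here is a minimal sketch that assumes the markup is well-formed XML. It uses the row text from the original post as a hypothetical stand-in for the downloaded page, and the variable names are illustrative only.

```powershell
# Hypothetical sample: the row from the original post instead of a live download
$rowHtml = '<tr class="alt"> <td id="16ZwhxLjCN8fafA8wuYEnMFtGJGrFy6qcE">16ZwhxLjCN8fafA8wuYEnMFtGJGrFy6qcE</td> <td></td> <td></td> <td>0.00065227</td> <td>0.01233629</td> <td>0.00371003</td> </tr>'

# Cast the fragment to an XML document; this throws if the markup is not well-formed
$row = [xml]$rowHtml

# Read the text of every <td> cell in document order
$cells = @($row.SelectNodes('/tr/td') | ForEach-Object { $_.InnerText })

# Join the cells into a single comma-separated line
$csvLine = $cells -join ','
```

For a whole page the same idea applies with a broader XPath such as `//tr`, but real-world HTML is often not valid XML, in which case the cast fails and string or regex parsing is the fallback.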

January 10th, 2014 10:45pm

I modified the end of your script. Instead of parsing, I just used:

$lines = $string.Split("`n")
$lines = $lines.Replace("<td>", "")
$lines.Replace("</td>", "") | Out-File -FilePath c:\test\test.csv -Append

-----------------------------------

Everything after that I removed and it's nice and clean. Thank you very much for your help!!

January 10th, 2014 11:04pm
