Parsing Web Data with PowerShell

The need to parse web data and consume Internet data is growing as more applications move to the cloud. Parsing web data with PowerShell is relatively simple and incredibly useful for data analysis and automation. I will explain how to do this with pre-formatted data (in this case JSON) and with unformatted data, such as a web page.

Parsing Web Data: JSON

Let’s use the weather for this example. Say you want to pull the latest temperature from the internet to help you decide whether you need a jumper when you head outside. We start with a weather provider that offers formatted data. The Australian Bureau of Meteorology provides this freely on its website.

Head over to http://www.bom.gov.au/catalogue/data-feeds.shtml and scroll down to “Observations – individual stations”, where you can select any state or city you wish (we’ll be using the Sydney area).

Area Selection Table

This takes us to a page listing all the Sydney area weather stations. For this example we will select “Newcastle Nobbys”.

Weather Station List

The page shows a large list of weather data, but we want the raw data, not the web page. Scrolling to the bottom of the page, we find the JSON URL that we will use in our PowerShell.

JSON URL Link

If you click on this JSON URL and view it in your browser, it will look like a dump of keys and values. This is what we need in order to turn it into something useful in PowerShell. To consume this JSON data we require two commands, Invoke-WebRequest and ConvertFrom-Json. The Invoke-WebRequest command connects to the URL and downloads the JSON file. ConvertFrom-Json parses this raw JSON data and converts it into a set of PowerShell objects whose properties follow the keys and values.

JSON Raw
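
To see what ConvertFrom-Json does without touching the network, here is a minimal sketch that parses a small hand-written JSON string. The values are made up for illustration; only the observations/data nesting and the field names mirror what is used later in this article.

```powershell
# Hand-written sample that imitates the shape of the feed (values are illustrative, not real readings)
$json = @'
{
  "observations": {
    "data": [
      { "local_date_time": "13/10:30am", "air_temp": 21.4, "rel_hum": 60 },
      { "local_date_time": "13/10:00am", "air_temp": 20.9, "rel_hum": 63 }
    ]
  }
}
'@

# ConvertFrom-Json turns the text into nested PowerShell objects
$sample = $json | ConvertFrom-Json

# Keys become properties, and JSON arrays become PowerShell arrays
$sample.observations.data.Count
$sample.observations.data[0].air_temp
```

The same property paths should work on the live data once it has been downloaded and converted.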

Here is the command. It looks fairly simple, doesn’t it? What this actually does is connect to the BOM website, pull down the JSON data, convert it to a PowerShell object and store it in a variable called $weather.

$weather = Invoke-WebRequest -Uri "http://reg.bom.gov.au/fwo/IDN60901/IDN60901.94774.json" | ConvertFrom-Json

The Full Invoke-WebRequest Command

If you enter the variable $weather in the PowerShell window, you will notice it returns a set of objects. We need to walk through these objects to get to the required data. To do this, we add a period followed by the name of the next data node, so the finished line looks like the following.

$weather.observations.data

Viewing the $weather Variable
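
If you are unsure which node names are available at each level, Get-Member can list them. The sketch below uses a tiny made-up object in place of $weather; the "header" node and station name are assumptions for illustration only.

```powershell
# Made-up stand-in for the parsed feed
$sample = '{"observations":{"header":[{"name":"Example Station"}],"data":[{"air_temp":21.4}]}}' | ConvertFrom-Json

# List the property names available at the top level and one level down
$topLevel  = $sample | Get-Member -MemberType NoteProperty | Select-Object -ExpandProperty Name
$nextLevel = $sample.observations | Get-Member -MemberType NoteProperty | Select-Object -ExpandProperty Name

$topLevel
$nextLevel
```

Each name returned can be appended after a period to step one level deeper into the data.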

This will still return a long list of data similar to the web page. To make use of the information, we need to adjust the command. Let’s say we wanted to know the temperature and humidity. We’d adjust the command to look something like this.

$weather.observations.data | select local_date_time,air_temp,rel_hum

Weather Variable Filtered
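
Once the data is a set of objects, the usual pipeline cmdlets apply to it. The following sketch filters and sorts a small made-up array that stands in for $weather.observations.data; the readings and the 20 degree threshold are assumptions chosen for the example.

```powershell
# Stand-in for $weather.observations.data (made-up values)
$data = @(
    [pscustomobject]@{ local_date_time = '13/10:30am'; air_temp = 21.4; rel_hum = 60 }
    [pscustomobject]@{ local_date_time = '13/10:00am'; air_temp = 20.9; rel_hum = 63 }
    [pscustomobject]@{ local_date_time = '13/09:30am'; air_temp = 19.8; rel_hum = 67 }
)

# Keep only readings above 20 degrees, showing just the columns of interest
$warm = $data | Where-Object { $_.air_temp -gt 20 } | Select-Object local_date_time, air_temp

# Find the single warmest reading
$warmest = $data | Sort-Object air_temp -Descending | Select-Object -First 1

$warm
$warmest.local_date_time
```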

This command gives us a much clearer and simpler view of the data. We could even go as far as getting the temperature at a certain date and time, and make a function out of it for repeatability and automation!

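function Get-TempForDate {
    # $datetime must match the feed's local_date_time text exactly
    param($datetime)

    # Download the feed and convert the JSON into PowerShell objects
    $weather = Invoke-WebRequest -Uri "http://reg.bom.gov.au/fwo/IDN60901/IDN60901.94774.json" | ConvertFrom-Json

    # Return the air temperature recorded at the requested time
    $weather.observations.data | Where local_date_time -eq $datetime | Select air_temp
}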

Get-TempForDate Function
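
For use in automation, a function like this usually benefits from typed parameters and some error handling. The variant below is only a sketch, not a replacement for the function above: the name Get-TempForDateSketch and the -Uri and -Observations parameters are hypothetical additions, and it uses Invoke-RestMethod, which downloads and parses JSON in one step. It returns the bare temperature value rather than an object with an air_temp property.

```powershell
function Get-TempForDateSketch {
    [CmdletBinding()]
    param(
        # Timestamp to look for, in the same text format the feed uses
        [Parameter(Mandatory)]
        [string]$DateTime,

        # Feed to query; defaults to the URL used in this article
        [string]$Uri = 'http://reg.bom.gov.au/fwo/IDN60901/IDN60901.94774.json',

        # Optional pre-fetched observations, so the filter can be tried offline
        [object[]]$Observations
    )

    if (-not $Observations) {
        try {
            # Invoke-RestMethod downloads the feed and converts the JSON for us
            $Observations = (Invoke-RestMethod -Uri $Uri -ErrorAction Stop).observations.data
        }
        catch {
            Write-Error "Could not retrieve weather data from ${Uri}: $_"
            return
        }
    }

    # Return only the temperature value for the matching timestamp
    $Observations |
        Where-Object local_date_time -eq $DateTime |
        Select-Object -ExpandProperty air_temp
}

# Offline usage with a made-up observation
$fake = @([pscustomobject]@{ local_date_time = '13/10:30am'; air_temp = 21.4 })
Get-TempForDateSketch -DateTime '13/10:30am' -Observations $fake
```

Separating the download from the filter makes the filtering logic easy to try without a network connection.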

Parsing Web Data: HTML

Now I hear you saying, “That is a great party trick, but what about parsing web data for a real-world use case, where the data isn’t pre-formatted like JSON?” Let’s say you have a script that requires the current public IP of your network. If you were to check this manually, you’d go to a site like https://www.whatismyipaddress.com and note the public IP shown on the page. So let’s use that site, grab only the IP and consume it in PowerShell.

Firstly, we need to find the elements that lead to the IP address. After launching the website and opening Developer Tools (HowTo: Chrome, Safari, Internet Explorer), we use the Element Selector (highlighted in yellow), hover over our IP address and left click. This shows us exactly where the IP address sits in the HTML code, and which elements we need to enumerate to get to it. In this case, it is a unique “div” tag called “section_left” (shown by the black arrow) and its first child “a” tag (shown by the blue arrow).

Whatismyipaddress.com Showing Elements

Now for the PowerShell. We start with Invoke-WebRequest again, but this time we don’t require ConvertFrom-Json.

$html = Invoke-WebRequest -Uri https://www.whatismyipaddress.com

Invoke-WebRequest Command

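We know the element name we are looking for is “section_left” and that it sits within the HTML elements, so we tell PowerShell to get the element by name from the parsed HTML. Note that the ParsedHtml property is only available in Windows PowerShell (5.1 and earlier); it is not present in PowerShell 6 and later.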

$html.ParsedHtml.getElementsByName('section_left')

getElementsByName Command

To retain all of the sub-objects and methods, we need to evaluate the line above as a subexpression. This gives us a new object with the original methods intact (more information: The Complete Guide to PowerShell Punctuation). To achieve this, we place $( ) around the line and store the result in a new variable called $section_left.

$section_left = $($html.ParsedHtml.getElementsByName('section_left'))

Modified getElementsByName Command

We know the last element we need is the “a” tag, and that it is the first “a” element. Because array indexes start at 0, we write the following to get the “a” tag data out of $section_left.

$section_left.getElementsByTagName('a')[0]

a Tag Without innertext

While this contains the IP we want, we only need the inner text of the returned objects. So let’s just get the inner text of the “a” tag, which should only be the IP address.

$section_left.getElementsByTagName('a')[0].innertext

a Tag With innertext

The result is the IP address by itself, with no HTML surrounding it!

If we wanted to have it formatted as a nice little one-liner, this is what it would look like, returning our public-facing IP!

$((Invoke-WebRequest -Uri http://www.whatismyipaddress.com).ParsedHtml.getElementsByName('section_left')).getElementsByTagName('a')[0].innertext

One Line Public IP from PowerShell
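
On PowerShell 6 and later there is no ParsedHtml property, so one fallback is to run a regular expression over the raw Content string. The sketch below works on a hypothetical HTML fragment rather than the live page: the markup and the address (taken from a documentation-only range) are assumptions, and a regular expression like this tends to break if the page layout changes.

```powershell
# Hypothetical fragment standing in for (Invoke-WebRequest -Uri ...).Content
$content = '<div id="section_left"><a href="/ip/203.0.113.10">203.0.113.10</a></div>'

# Capture the text of the first <a> tag that follows the section_left marker
$pattern = '(?s)id="section_left".*?<a\b[^>]*>\s*([^<]+?)\s*</a>'

$ip = $null
if ($content -match $pattern) {
    # $Matches[1] holds the first capture group, the text inside the <a> tag
    $ip = $Matches[1]
}
$ip
```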

There are plenty of other uses, and ways to pull out tables, images and even hyperlinks. If you’d like to learn more, drop me a line! Also note that when I need the public IP for my scripts, I use the following instead.

(Invoke-WebRequest -Uri http://checkip.dyndns.com).content -replace '[^\d\.]'

However, showing only this one-liner would not have given you a well-rounded understanding of how to get what you need from a web request.
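
Because the -replace approach simply strips every character that is not a digit or a dot, it can be worth checking that what remains really looks like an IP address before a script relies on it. Here is a minimal sketch, assuming the service replies with a short HTML page of roughly the shape shown; the sample reply and its documentation-range address are made up.

```powershell
# Assumed shape of the service's reply, with a documentation-range address
$content = '<html><head><title>Current IP Check</title></head><body>Current IP Address: 203.0.113.10</body></html>'

# Same technique as the one-liner: strip everything except digits and dots
$candidate = $content -replace '[^\d\.]'

# Let .NET decide whether the leftover text parses as an IP address
$parsed = $null
$publicIp = $null
if ([System.Net.IPAddress]::TryParse($candidate, [ref]$parsed)) {
    $publicIp = $parsed.ToString()
}
else {
    Write-Warning "Unexpected response: '$candidate'"
}
$publicIp
```

TryParse is a fairly loose check, so treat this as a basic sanity test rather than strict validation.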

Joshua Bauer
Operations Manager – Vantage Networks

If you have any questions about Invoke-WebRequest or PowerShell for parsing web data, reach out.
