NOAA RSS
NOAA RSS Feeds
Well, actually they’re Atom feeds I guess, but they’re basically interchangeable, the library I use handles both automagically, and people know what RSS is. This is more about parsing the data than a web format, so here’s a wikipedia link if you care about that.
So, NOAA publishes a ton of data using their feeds, in a few formats. Some is historical data, some is re-analyzed data, graphics, GIS, etc etc. For this project, the most interesting is the Public Advisories, which are issued regularly and contain key statistics and forecaster commentary. The model isn’t using the statistics dirctly from th Advisories, but the data is there. The Discussion section is interesting, and now gets included on hurricane pages as updates happen. But, the most useful part of the Advisories is as an update event source: NOAA issues Advisories every 6 hours for active storms, and every 3 hours for major storms.
Reading Atom in Go
Fetching the feed in Go was fairly easy, as was parsing it according to the Atom spec. gofeed took all the work out of it, put in a URL, get out a feed object.
import "github.com/mmcdole/gofeed"
fp := gofeed.NewParser()
feed, _ := fp.ParseURL("https://www.nhc.noaa.gov/index-at.xml")
The URL there will fetch all the “active” feed items as maps, which each have a text area where the notable content goes. Which NOAA didn’t use, they put the text into the description field, odd choice but oh well. From there, loop through the items, sort out which ones are the Public Advisories, and parse the text data into structs. Sounds simple, but next comes Regex.
Regex Parsing in Go
The Advisories follow a format, ish, but the best tool I could come up with to extact the data was muddling through a regex matcher. And as with all non-trivial regexes, it’s a jumbled mess (citation needed). Behold the hubris!
var publicAdvRegexOfDoom = regexp.MustCompile(`(?s).*(AL[0-9][0-9][0-9][0-9][0-9][0-9])\n(.*)\n.\n.*\n.\n.*SUMMARY.*LOCATION\.\.\.([0-9]?[0-9]?[0-9]?\.[0-9]?[0-9]?)([NS]) ([0-9]?[0-9]?[0-9]?\.[0-9]?[0-9]?)([EW]).*MAXIMUM.*\.\.\.([0-9]?[0-9]?[0-9]?) MPH.*PRESENT.*OR ([0-9]?[0-9]?[0-9]?).*AT ([0-9]?[0-9]?[0-9]?) MPH.*MINIMUM CENTRAL PRESSURE\.\.\.([0-9]?[0-9]?[0-9]?[0-9]?) MB.*DISCUSSION AND OUTLOOK\n(?:[-]*\n)(.*)(?:\n[[:blank:]]\n[[:blank:]]\n).*(?:\n[[:blank:]]\n[[:blank:]]\n).*`)
Yes I named it publicAdvRegexOfDoom
in the actual code, yes I’m sticking to that. It is not efficent (I assume) nor robust (I know), but it does the thing I need it to. After using FindAllStringSubmatch
, each part of the Advisory text that matches where the Regex is in parenthesis gets pulled out in a slice. Using a less confusing example:
var regex = regexp.MustCompile(`Test ([0-9])([0-9])`)
match [][]string = regex.FindAllStringSubmatch("Test 12", -1)
// match[0][0] is 1
// match[0][1] is 2
The parenthesis mark groups that get extracted out when the regex is able to match the text. The matches are only returned as text, so the code needs to deal with conversion to numeric types, parsing out the Latitude/Longitude format, etc. I won’t put all of that rather boring code here, you can view that here. Its worth noting that Go doesn’t support some of the more complex regex features due efficiency concerns, which for me showed up as no support for “match everything except ‘x’”.
Final Stucture
The final struct at the end looks like this
type StormFeedInfo struct {
Name string
StormID string
AdvNumber int
Timestamp time.Time
LatY float64
LonX float64
BearingDeg float64
ForwardSpeedKts float64
VMaxKts float64
MinCpMb float64
Discussion string
Graphics []string
Sources []string
}
and this is dumped as a document into two Firestore locations: one in a “pending” collection, and one in a more formal archive location, to be later picked up by the build process.
Where It Lives
The parser itself is run every 2 hours in Cloud Functions, kicked off by Cloud Scheduler, with PubSub as the glue layer. Cloud Scheduler is pretty straightforward, cron style format pointed to a PubSub topic, the UI and docs will work for setting that up. The Cloud Function is triggered by new messages in the topic, but the message content isn’t used for anything. Creating the Cloud Function itself is largely pasting the code into the UI, and making sure the entry point is defined (which is provided by the initial template). The only gotcha here was making sure to also update the go.mod
file with any dependancies (gofeed and Firestore in this case).