NetLogger is an application designed to create logged ham radio nets and have those logs available online in real time. If you are serious about radio nets, it is highly likely you have run into it before. It lets a Net Controller create a net log and allows others to monitor that log while the net is actually in session. Pretty handy!
NetLogger publishes an XML REST API which can be used by third-party applications to gather data about nets in progress and past nets. The documentation is a little behind the curve, as an XML query to the NetLogger server returns some data fields not covered in the documentation. No problem really, XML is designed to make expansion easy; no sweat on my brow here. But I did manage to look at the official NetLogger.exe application and noticed some fields displayed there that are not accounted for in the XML API or the actual XML data. My curiosity piqued, a little investigation was in order.
I had long surmised that the NetLogger Status field was being used as a catchall field for all kinds of ‘status flags’ and such. What you see on screen in the Status field is filtered, because some of the flags in it are not meant to be displayed literally; they are used to control things like which log entry is currently highlighted and so on. Others, like Net Control status “(nc)”, are indeed displayed within the Status field. Interesting stuff, but you must grok it yourself, as it is not documented. No probs here, but it raised the question, “How is NetLogger itself handling these things?”, because it seems to have access to more data than the XML interface provides. “RST Sent”, “RST Received”; these fields just do not exist in the XML feed, but there they are in NetLogger.exe. I must investigate.
Naturally, the simple thing to do is to capture some HTTP traffic and inspect it. Easy enough to do with a program like Wireshark – it has been around a long time and it is freely available. So I went and captured some ‘conversations’ that NetLogger had with its server, looked at them, and it turns out NetLogger.exe is not using XML at all itself. I guess XML is for the hoi polloi… Anyway, what the NetLogger server is doing is simply cramming data into the <body> of an HTML page and surrounding it with some HTML comments. Hmm… did not expect that.
Is it even ‘kosher’ to do that kind of (cram it between the comments) data interchange for a client/server application? I do not know. I am old school enough to have eschewed a lot of web technologies in the past due to their inherent fragilities. (Of course, most of my stuff back in the day was embedded firmware – code that must work no matter what). Web technologies have soft failure modes – things can just fall over in ways that you cannot predict, from causes you have no control over. It is the way of the world for a lot of technologies these days, but it is inherently fragile. But enough wailing – what to do?
The data in those comments is simple enough – it looks very much like the data you get when you save a net in NetLogger.exe to a file on your local PC. It is simply lines of pipe-delimited values in the form of:
value | value | value |
and so on. Easy enough. I can work with that. But what to do about the HTML “reply” itself? Do I simply use some sort of regular expression to locate the data in the HTML response, or should I use an HTML DOM parser? And where can I get a DOM parser for C++, other than the old Internet Explorer Windows COM Object? (Which is no longer actively supported). But, before someone tells me to use JavaScript or some other garbage-collected or interpreted language, you can stop right there.
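For what it is worth, the delimited part really is the easy bit. Here is a minimal C++17 sketch of splitting one such line into fields. It assumes the delimiter is a plain '|', that padding around a value is not significant, and that a trailing '|' just ends the record. The names split_record and trim are my own, and what each column actually means is not documented, so the sketch makes no attempt to name them.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Strip leading and trailing whitespace from one field.
static std::string trim(std::string_view s) {
    const char* ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string_view::npos) return {};
    const auto last = s.find_last_not_of(ws);
    return std::string(s.substr(first, last - first + 1));
}

// Split one record on '|'. Empty fields between delimiters are kept,
// because a blank column still occupies a position in the record.
// ASSUMPTION: a trailing '|' terminates the record and does not
// introduce one more (empty) field.
std::vector<std::string> split_record(std::string_view line) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    while (start <= line.size()) {
        const auto bar = line.find('|', start);
        if (bar == std::string_view::npos) {
            // Text after the last delimiter, if there is any.
            std::string tail = trim(line.substr(start));
            if (!tail.empty()) fields.push_back(std::move(tail));
            break;
        }
        fields.push_back(trim(line.substr(start, bar - start)));
        start = bar + 1;
    }
    return fields;
}
```

Under those assumptions, split_record("N0CALL | | 599 |") gives three fields with the middle one empty (N0CALL being the usual placeholder callsign, not real data).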
Building a browser-based or Node.js JavaScript app simply does not appeal to me. Sorry; I am too old and crusty and used to native code development. Those script languages let you knock out trivial applications easily, but for more than that they lose their advantage. And creating HTML is such a pain-in-the-ass that hardly anybody hand codes HTML anymore – websites today are largely generated by server-side code such as PHP and peppered with JavaScript code to make the HTML page actually do something useful, and it is all a Dog’s Breakfast. Press the F12 key in your browser if you don’t believe me.
I have a clue how the NetLogger server guys are doing things though. Some of the HTTP requests NetLogger.exe makes of its server are obviously to a Perl script (.pl), so they are presumably running Perl regular expressions to manipulate data. So that is likely how they are doing it.
The question left is, should I just use a regular expression to capture the data from an HTML response? This is a doable approach – if things do not change much. The other approach is to use an HTML parser, and there is the freely available ‘gumbo’ package from Google, and there is one other free one available on GitHub, Lexbor. There are commercial libraries as well – but I am in hobby mode here – so those are out.
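To make the regular expression option concrete, here is a minimal C++17 sketch using std::regex. The comment markers BEGIN DATA and END DATA are made up for illustration; the real comment text has to be read off a capture and substituted in. The function name extract_payload is likewise my own.

```cpp
#include <optional>
#include <regex>
#include <string>

// Pull out whatever sits between two HTML comments in a response body.
// HYPOTHETICAL markers: "BEGIN DATA" / "END DATA" are placeholders,
// not the text the NetLogger server actually sends.
std::optional<std::string> extract_payload(const std::string& html) {
    // [\s\S]*? is used instead of .*? because '.' does not match
    // line breaks in the default ECMAScript grammar, and the payload
    // spans several lines. The '?' keeps the match non-greedy so it
    // stops at the first closing marker.
    static const std::regex pattern(
        R"(<!--\s*BEGIN DATA\s*-->([\s\S]*?)<!--\s*END DATA\s*-->)");
    std::smatch m;
    if (!std::regex_search(html, m, pattern)) return std::nullopt;
    return m[1].str();  // the text between the two comments, unmodified
}
```

The function returns an empty optional when the markers are missing, which is exactly the failure you would see if the server ever changed its comment text. With two fixed markers, a pair of std::string::find calls would do the same job without the regex engine, which may be the safer choice for very large responses.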
Another issue is: I have the feeling that the NetLogger authors want to let their sleeping dogs lie. The XML data interface is a bit behind, and they have extraordinarily little traffic discussing any of the inner workings of their servers. Not exactly a welcome mat for a third-party developer – but I cannot blame them for it. It is their baby – and they likely do not have the time to support an outside developer, let alone have much of their own time to fiddle with things.
I think using a regular expression is the way to go – but I am leaning toward abandoning any further application development against the NetLogger servers, since nothing about them, apart from the XML data interface, signals that they are friendly to third-party developers. I should set this NetLogger-type application aside for the moment. Then, I can go and play with the new HTML parser I downloaded from GitHub to see how that thing works and use it in some other application’s context.
Lexbor, the parser I have decided to play with (it renders too, but I am not interested in that part), uses CMake to build itself. Ugh. Here we go again. There is nothing more fragile than a command-line based build tool. Do not be fooled by the siren song of CMake claiming the ability to build everything on any platform – because that is just smoke and mirrors. For anything more than a trivial project, CMake is going to take some serious configuration scripting. After all, CMake is just (yet) another DSL (Domain Specific Language), and like all text-based tools (looking at you, Unix/Linux peeps) it can fall flat on its face in a New York minute. Which it did. Which took me a few hours to grok and get working correctly. There is that weasel fragility rearing its ugly head again… and don’t get me started on Git….