Mmm, software: April 2007

Tuesday, April 24, 2007

My list of stuff to learn

I frequently find that there are dozens of technologies I want to learn but just don't have the time for. As a note to myself (and anybody else who might care), here's a short list:

FreeMarker
Scheme
Groovy
Grails
Ruby
Rails
Python
JPA (Java Persistence API)
EJB Annotations (primarily for Hibernate)
XSLT

And that's just the stuff I can remember off the top of my head as I write this. I'm sure that I'll add more later.

Monday, April 23, 2007

Reading blogs pays off

I've recently been trying to read more and more developer blogs in the hopes of improving myself as a developer. One of the blogs I came across is Brian Burridge's, and he recently had a good post about a little utility called Denim. It's a storyboarding/diagramming tool that was originally meant for web developers to quickly model their sites, but I see no reason why it can't be used for user interfaces in general. The program's made in Java, so (not unexpectedly) its Swing UI is ugly as shit, but it's so functional it's insane. This is how I'd make an interface (though I'd try to pretty mine up); they got the basics right: functionality before beauty. If you're a serious developer, I think you should at least give it a shot and play around with it. It's free, open-source, and made at an academic institution. BTW, I jacked the latter link right from his page to save anybody reading this some time, not to take away from his blog, which I think you should check out.

Monday, April 16, 2007

The difference between links and forms ... at least for Spring

This post is really just a mental note to myself: the difference between links and forms is that links do not submit form variables. (Duh!) Therefore, you can't bind variables in forms to Form Backing Objects in Spring if you're going to use a link to do the transition rather than a form button.

Friday, April 13, 2007

(Web)Harvesting the web

I was recently assigned a task whereby I had to obtain data from the website of one of our commercial services, but the website is completely Web 1.0 and has no APIs of any kind that we could hook into. To give you a bit of context, the site performs transactions on our behalf and we need information about those transactions. As my boss saw it, there were really only two solutions available:
1) Have somebody sit at a computer and download the transactions every 15 minutes
2) Have a computer set up to run a screen macro to log in and download the transactions file every 15 minutes (better, but still nowhere near ideal)

I think without even realizing it, my boss gave me the idea for option three. He continually mentioned the idea of screen-scraping the page to retrieve the information. Traditionally this means taking a visual representation of something and extracting information from what's essentially a picture. After doing some reading, I interpreted his suggestions to mean that I should find a way to do a web-scrape on the page (subtle difference). A bit more digging turned up the perfect library for doing it in Java.

It's a project called WebHarvest, and it's fairly simple yet really powerful. You start off by writing a Scraper configuration in XML, load it in code, run the scraper, and it will store your data in the Scraper for you to retrieve when you need it. The library itself works by doing either POSTs or GETs to a page, taking the response data, (almost certainly) doing a transform to convert the HTML to well-formed XHTML, and then running XPath queries and regular expressions on the result to get the data you need (i.e. rows of a table). It's incredibly powerful, and if you need a solution where you want to automate logging into a website and retrieving data, then this is a great way to do it.
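
To make that last step a bit more concrete, here's a rough sketch of the "run XPath on the cleaned-up XHTML" idea using nothing but the JDK's built-in XML classes. To be clear, this is not WebHarvest's API or its configuration format; the class name, the sample markup, and the "transactions" table id are all made up for illustration, and the sketch assumes the page has already been fetched and converted to well-formed XHTML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TableScrapeSketch {

    // Hypothetical, already-cleaned XHTML standing in for a fetched page.
    static final String SAMPLE_XHTML =
        "<html><body><table id=\"transactions\">"
        + "<tr><th>Id</th><th>Amount</th></tr>"
        + "<tr><td>1001</td><td>25.00</td></tr>"
        + "<tr><td>1002</td><td>13.50</td></tr>"
        + "</table></body></html>";

    // Returns the text of every <td> cell, grouped by data row.
    static List<List<String>> extractRows(String xhtml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Select only rows that contain <td> cells, which skips the header row.
        NodeList rows = (NodeList) xpath.evaluate(
            "//table[@id='transactions']/tr[td]", doc, XPathConstants.NODESET);

        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < rows.getLength(); i++) {
            // Evaluate a relative XPath against the current row.
            NodeList cells = (NodeList) xpath.evaluate(
                "td", rows.item(i), XPathConstants.NODESET);
            List<String> row = new ArrayList<>();
            for (int j = 0; j < cells.getLength(); j++) {
                row.add(cells.item(j).getTextContent().trim());
            }
            result.add(row);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (List<String> row : extractRows(SAMPLE_XHTML)) {
            System.out.println(row);
        }
    }
}
```

The real thing obviously needs the fetch and the HTML cleanup in front of this, which is exactly the part a library like WebHarvest takes off your hands.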