Webscrape

A Web 'Screen Scraper'

To get the Euro / US Dollar Exchange Rate

The web page http://www.x-rates.com has a conversion table which contains some of the main currencies. We shall get PageScrape to web-clip the current Euro/Dollar exchange rate, and then write a small Perl script which converts a Dollar value to its Euro equivalent.

Looking at the main page, we see the various conversion rates arranged in a table. The Dollar / Euro exchange rate is in a table cell located in the first column, beside the EU flag. Looking at the source code, we see that the required exchange rate is the value (the text content) of an <A> tag within the table cell. The associated href gives us an easy way to anchor our search, as the URL contains the text EUR/USD. Using this anchor, and the fact that the required rate is the value of the <A> tag, we can design a search.

The basic target HTML format is:

    <A some attributes>the exchange rate</A>

The href attribute contains the anchor we will use in our search:

    <A some stuff EUR/USD some more stuff>the exchange rate</A>

So we could use the following regular expression in a non-greedy search:

    EUR/USD.*>(.*)<

It is better to avoid .* (it literally means 'anything', and can lead to strange results), so we rephrase the expression as:

    EUR/USD[^>]*>([^<]+)<

This expression is safer and can be happily used in a greedy search. It can be read as:

    EUR/USD
    followed by anything but >
    followed by >
    followed by anything but < (one or more characters, captured by the parentheses)
    followed by <

The exchange rate is a decimal number, so we can modify the regular expression to reflect this and thereby produce an even more discriminating search:

    EUR/USD[^>]*>([+\-.0-9]+)<

The associated PageScrape command line is:

    pscrape -u"www.x-rates.com" -e"EUR/USD[^>]*>([+\-.0-9]+)<"
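As a rough illustration of how the three expressions above differ, the sketch below runs them with Python's re module against a made-up HTML fragment. PageScrape itself is not involved: SAMPLE_HTML, its link paths, the extra 'history' link, and the rate value are all assumptions invented for the example, and first_capture is a hypothetical helper. Python's regex engine may also differ from PageScrape's in details. Note that Python searches greedily by default, so the first expression is run here as a greedy search, which shows the kind of strange result .* can produce.

```python
import re

# Hypothetical fragment: two links whose href contains EUR/USD, only one of
# which holds a number. The real page's markup is not known and may differ.
SAMPLE_HTML = (
    '<td><a href="/d/EUR/USD/hist.html">history</a></td>'
    '<td><a href="/d/EUR/USD/graph120.html">0.834168</a></td>'
)

LOOSE = r'EUR/USD.*>(.*)<'                # '.*' matches anything, greedily
SAFER = r'EUR/USD[^>]*>([^<]+)<'          # stops at the tag boundaries
NUMERIC = r'EUR/USD[^>]*>([+\-.0-9]+)<'   # value must look like a number


def first_capture(pattern, text):
    """Return group 1 of the first match, or None when nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Greedy '.*' runs to the last '>' that is still followed by a '<',
# so the captured value is an empty string.
print(repr(first_capture(LOOSE, SAMPLE_HTML)))    # ''

# The safer form captures the value of the first tag containing the anchor,
# which in this invented fragment is not the rate.
print(repr(first_capture(SAFER, SAMPLE_HTML)))    # 'history'

# Requiring a numeric value skips the non-numeric link.
print(repr(first_capture(NUMERIC, SAMPLE_HTML)))  # '0.834168'
```

On this fragment, only the numeric form picks out the rate, which is the sense in which it is the most discriminating of the three.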
Retrieving all of the Euro Exchange Rates

If we want to get all of the Euro exchange rates, we can specify EUR/... rather than EUR/USD and use the -m option to tell PageScrape to search for multiple matches. EUR/... specifies EUR/ followed by any three characters; to be more precise, we could specify EUR/[A-Z]{3}.

    pscrape -u"www.x-rates.com" -e"/d/EUR/[A-Z]{3}[^>]*>([+\-.0-9]+)<" -m

This returns a list of rates, but it is not obvious which currency each rate applies to. To return each currency name as well as each exchange rate, we can do the following:

    pscrape -u"www.x-rates.com" -e"/d/EUR/([A-Z]{3})[^>]*>([+\-.0-9]+)<" -f"\$1 \$2" -m

This returns a list containing each exchange rate along with its currency identifier. It should look something like:

    USD 0.834168

The -f option tells PageScrape how to format the output. \$1 refers to the first regular expression buffer/register and \$2 to the second, so -f"\$1 \$2" makes PageScrape output the currency name followed by the rate for each currency.

Perl Script to Convert from Dollars to Euro

Using the above, a simple Perl script to convert a Dollar amount into its Euro equivalent could look like the following:

    $dollarAmount = $ARGV[0];
    # Execute the command, storing the result in $rate