Using Perl and Regular Expressions to Process HTML Files – Part 2

In this article we will look at how to change the contents of an HTML file by running a Perl script on it.

The file we are going to process is called file1.htm:

Note: To ensure that the code is displayed correctly, in the example code shown in this article, square brackets '[..]' are used in HTML tags instead of angle brackets '<..>'.

[head][title]Sample HTML File[/title]
[link rel="stylesheet" type="text/css" href="style.css"]
[p]Welcome to the world of Perl and regular expressions[/p]
[h2]Programming Languages[/h2]
[table border="1" width="400"]
[tr][th colspan="2"]Programming Languages[/th][/tr]
[tr][td]Language[/td][td]Typical use[/td][/tr]
[tr][td]JavaScript[/td][td]Client-side scripts[/td][/tr]
[tr][td]Perl[/td][td]Processing HTML files[/td][/tr]
[tr][td]PHP[/td][td]Server-side scripts[/td][/tr]
[p]JavaScript, Perl, and PHP are all interpreted programming languages.[/p]

Suppose that we want to change both occurrences of [h1]heading[/h1] to [h1 class="large"]heading[/h1]. Not a big change, and something that could easily be done manually or with a simple search and replace. But we're just getting started here.

To do this, we could use the following Perl script:

1 open (IN, "file1.htm");
2 open (OUT, ">new_file1.htm");
3 while ($line = <IN>) {
4 $line =~ s/[h1]/[h1 class="large"]/;
5 print OUT $line;
6 }
7 close (IN);
8 close (OUT);

Note: You do not need to enter the line numbers. I have included them simply so that I can refer to individual lines in the script.

Let's look at each line of the script.

Line 1
In this line file1.htm is opened so that it can be processed by the script. In order to process the file, Perl uses something called a filehandle, which provides a kind of link between the script and the operating system and holds information about the file that is being processed. I have called this "opening" filehandle 'IN', but I could have used any name within reason. Filehandles are normally written in capitals.
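As a hedged aside, the same open can be written with a lexical filehandle and an error check, which is the style usually recommended in current Perl. This is a minimal sketch, not the script used in this article; the helper name open_for_reading is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the article's script): open a file
# for reading and return a lexical filehandle.
sub open_for_reading {
    my ($path) = @_;
    # Three-argument open keeps the mode ('<' = read) separate from the
    # file name. If the open fails, $! holds the operating system's reason.
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    return $in;
}
```

Calling open_for_reading('file1.htm') would return a handle that can be read with <$in>, in the same way that the script reads from IN.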

Line 2
This line creates a new file called 'new_file1.htm', which is written to by using another filehandle, OUT. The '>' before the filename indicates that the file will be written to.
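One detail worth knowing about '>' is that it replaces the contents of a file that already exists, whereas '>>' appends to it. The sketch below illustrates the write mode under that assumption; write_lines is a hypothetical helper name, not something the article's script defines.

```perl
use strict;
use warnings;

# Hypothetical helper: write lines to a file, replacing any existing contents.
sub write_lines {
    my ($path, @lines) = @_;
    # '>' creates the file if it does not exist and empties it if it does.
    # ('>>' would add to the end of the file instead.)
    open(my $out, '>', $path) or die "Cannot write to $path: $!";
    print {$out} @lines;
    close($out) or die "Cannot close $path: $!";
    return;
}
```

Running the article's script twice therefore leaves a single, fresh copy of new_file1.htm rather than two copies joined together.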

Line 3
This line sets up a loop in which each line in file1.htm will be examined individually.

Line 4
This is the regular expression. It searches for one occurrence of [h1] on each line of file1.htm and, if it finds it, changes it to [h1 class="large"].

Looking at Line 4 in more detail:

    • $line – This is a variable that contains a line of text. It gets modified if the substitution is successful.
    • =~ is the binding operator. It applies the substitution to the contents of $line.
    • s is the substitution operator.
    • [h1] is what needs to be substituted (replaced).
    • [h1 class="large"] is what [h1] has to be changed to.
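The pieces listed above can be tried out on a single string. This is a small sketch using real angle brackets rather than the square brackets used for display in this article; the wrapper name add_large_class is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution from Line 4.
sub add_large_class {
    my ($line) = @_;
    # s/PATTERN/REPLACEMENT/ replaces only the first match in $line.
    # It returns the number of substitutions made, or a false value if none.
    my $changed = ($line =~ s/<h1>/<h1 class="large">/);
    return ($line, $changed ? 1 : 0);
}

my ($new_line, $was_changed) = add_large_class("<h1>Heading</h1>\n");
print $new_line;    # prints: <h1 class="large">Heading</h1>
```

A line without an opening h1 tag passes through unchanged, which is why the script can safely apply the substitution to every line of the file.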

Line 5
This line takes the contents of the $line variable and, via the OUT filehandle, writes the line to new_file1.htm.

Line 6
This line closes the ‘while’ loop. The loop is repeated until all the lines in file1.htm have been examined.

Lines 7 and 8
These two lines close the two filehandles that have been used in the script. If you left out these two lines the script would still work, but it is good programming practice to close filehandles, thus freeing up the filehandle names so they can be used, for example, by another file.
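For comparison, here is one way the whole script might look with lexical filehandles, error checks, and real angle brackets. Treat it as a sketch under those assumptions rather than a replacement for the script above; the function name add_class_to_h1 and the line counter are additions for illustration.

```perl
use strict;
use warnings;

# Hypothetical function: copy $in_path to $out_path, adding class="large"
# to the first <h1> tag on each line. Returns the number of lines changed.
sub add_class_to_h1 {
    my ($in_path, $out_path) = @_;
    open(my $in,  '<', $in_path)  or die "Cannot open $in_path: $!";
    open(my $out, '>', $out_path) or die "Cannot write to $out_path: $!";

    my $changed = 0;
    while (my $line = <$in>) {
        # Count the line only if the substitution actually matched.
        $changed++ if $line =~ s/<h1>/<h1 class="large">/;
        print {$out} $line;
    }

    close($in);
    close($out) or die "Cannot close $out_path: $!";
    return $changed;
}
```

Calling add_class_to_h1('file1.htm', 'new_file1.htm') would do the same job as Lines 1 to 8. Lexical filehandles are also closed automatically when they go out of scope, although closing the output handle explicitly makes it possible to catch write errors.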

Running the Script

As the aim of this article is to show how to use regular expressions to process HTML files, and not necessarily how to use Perl, I do not want to spend too long describing how to run Perl scripts. Suffice to say that you can run them in various ways, for example, from within a text editor such as TextPad, by double-clicking the Perl script, or by running the script from an MS-DOS window.

(The location of the Perl interpreter will need to be in your PATH statement so that you can run Perl scripts from any location on your computer and not just from within the directory in which the interpreter (perl.exe) itself is installed.)

So, to run our script we could open an MS-DOS window and navigate to the location where the script and the HTML file are located. To keep life simple I have assumed that these two files are in the same folder (or directory). The command to run the script is perl followed by the name of the script file.


If the script does work (and hopefully it will), a new file (new_file1.htm) is created in the same folder as file1.htm. If you open the file you can see that the two lines that contained [h1] tags have been changed so that they now read [h1 class="large"].

In Part 3 we'll look at how to handle multiple files.