Philipp's Homepage

Usage of importsci.pl

How to add new references from Web of Science to existing .bib files

As a first step, download the HTML pages from Web of Science and save them as files in a directory on a volume mounted read-write (e.g. a hard drive). In many browsers, this is done by right-clicking in the frame containing the publication details and selecting "Save frame content as" or similar from the menu that appears. Users of Mozilla-based browsers (this may also apply to Netscape; untested) need to select View Source, copy the HTML source into a text editor and save the file, again with an .html or .htm extension. The directory holding the downloaded files must not contain any other .html or .htm files.

(When you choose "Save source" in Mozilla, the browser requests the same web page a second time. The server does not allow this and delivers a "Not found" page instead. You will not notice this until you try to convert your HTML into .bib, which will fail for lack of data!)
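To catch such pages before running the conversion, a small check along the following lines may help. It is only a sketch: the function name find_not_found_pages is made up for this example, and it assumes the error page contains the literal text "Not found" in some capitalisation. A genuine record that happens to contain those words would be flagged too, so treat the output as a list of files to inspect by hand.

```shell
# Print the .html/.htm files in a directory that look like "Not found" pages.
# Assumption: the server's error page contains the text "Not found"
# (matched case-insensitively); this is not taken from importsci.pl itself.
find_not_found_pages() {
    dir=${1:-.}                        # default to the current directory
    for f in "$dir"/*.html "$dir"/*.htm; do
        [ -f "$f" ] || continue        # skip glob patterns that matched nothing
        if grep -qi 'not found' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling find_not_found_pages without an argument checks the current directory; any file it lists should be downloaded again before calling importsci.pl.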

In a terminal (console), change to that directory and call importsci.pl without any arguments. It writes a file refs.bib, which you can append to your existing .bib file by calling cat refs.bib >> your_older_refs.bib if you have the required tools available. (Windows users: while you're installing Win32 Perl, why not install GNU for Windows? Make your Windows (almost) as versatile as Linux!) Afterwards, you can sort your file by calling sortrefs.pl your_references.bib if you have sortrefs.pl available.
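The append step can be wrapped in a small helper if you would like a safety net. This is a sketch rather than part of the scripts: the name append_refs and the .bak backup suffix are made up for this example. It refuses to append an empty or missing file (which is what a failed conversion may leave behind) and keeps a copy of the old .bib file.

```shell
# Append new references to an existing .bib file, keeping a backup copy.
# append_refs and the ".bak" suffix are hypothetical names for illustration.
append_refs() {
    new=$1
    existing=$2
    if [ ! -s "$new" ]; then           # missing or empty: nothing to append
        echo "no references found in $new" >&2
        return 1
    fi
    if [ -f "$existing" ]; then
        cp "$existing" "$existing.bak" || return 1
    fi
    cat "$new" >> "$existing"
}

# Typical session, using the file names from the text above:
#   cd /path/to/downloaded/html      (placeholder path)
#   importsci.pl
#   append_refs refs.bib your_older_refs.bib
#   sortrefs.pl your_older_refs.bib
```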

A quick word about how importsci.pl generates reference handles: for up to three authors, the handle is the concatenation of the author surnames followed by the last two digits of the publication year. For more than three authors, only the surnames of the first and senior authors and the year are used.
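The rule can be illustrated with a few lines of shell. This is not the code used by importsci.pl, only a sketch of the rule as described, under two assumptions: handles are lower case (suggested by the example king03z further down), and the senior author is the last one listed. The function name make_handle is made up for this example.

```shell
# make_handle YEAR SURNAME...
# Sketch of the handle rule; assumes lower-case handles and that the
# senior author is the last surname given (both are assumptions).
make_handle() {
    year=$1
    shift
    yy=$(printf '%s' "$year" | tail -c 2)    # last two digits of the year
    if [ "$#" -le 3 ]; then
        names=$(printf '%s' "$@")            # concatenate all surnames
    else
        first=$1
        for last in "$@"; do :; done         # leaves the final surname in $last
        names=$first$last
    fi
    printf '%s%s\n' "$(printf '%s' "$names" | tr '[:upper:]' '[:lower:]')" "$yy"
}
```

For example, make_handle 2003 King Smith prints kingsmith03, while make_handle 2001 King Smith Jones Brown prints kingbrown01.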

Known bugs/to do/limitations: importsci.pl does not, as yet, handle special issues, supplements etc. sortrefs.pl does not support more than 26 references from the same author or combination of author names in the same year, i.e. king03z is the last available handle.

Dependencies: Perl 5 and the modules HTML::TreeBuilder and HTML::FormatText. (Perl 4 is unlikely to work, as these object-oriented modules rely on Perl 5 features.)