README

Note: This is an old version of the README file.


Database Advisor      Last Updated: October 7, 1998

 Copyright (c) 1998 Regents of the University of California
--------------------------------------------------------------------------------

Table of Contents:
------------------

	I.	Overview

	II.	License Information

	III.	Required Files

	IV.	Other Vital Files

	V.	How the pieces fit together
		A.	The Engine
		B.	Server Pushing: Netscape vs Microsoft
		C.	The Database Interfaces
		D.	The Profiles
		E.	The Subject Files

	VI.	How to Add a Database Interface

	VII.	Error Messages in Database Interfaces

	VIII.	The Message Queue

	IX.	Signal Handling

	X.	Running Multiple Libraries off the Same Engine

	XI.     Passwords

	Appendix - Authors and Contact Points

--------------------------------------------------------------------------------
Section I:

Overview: 
-----------------

       This software was developed by the web programmers of the Science Libraries
       at the University of California, San Diego Campus.
       
       Database Advisor (DBA) was created to help database users select the
       best database for their query.  DBA spawns a search process for each
       database vendor and returns the number of hits on the query to the
       user.  It sorts these results so the user can see where each database
       stands relative to the others.  Each database has a link which can be
       followed to access the database (though the vendor's terms of usage
       still apply).  Each database also has a Profile which stores
       information about the database.

--------------------------------------------------------------------------------
Section II:

License Information:
--------------------

	Copyright (c) 1998 Regents of the University of California 
	
	This program is free software; you can redistribute it and/or
  	modify it under the terms of the GNU General Public License
	as published by the Free Software Foundation; either version 2
	of the License, or (at your option) any later version.

	This program is distributed in the hope that it will be useful,
	but WITHOUT ANY WARRANTY; without even the implied warranty of
	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
	GNU General Public License (available at 
        http://www.gnu.ai.mit.edu/copyleft/gpl.html) for more details. 

  	You should have received a copy of the GNU General Public License
	along with this program; if not, write to the Free Software
	Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, 		
	USA.

	This software was developed by the Science Libraries at the University of 	
	California, San Diego.  For more information, contact Christy Hightower 	
	 at the Science & Engineering Library, 0175E, 		
	University of California, San Diego, 9500 Gilman Drive, La Jolla, 		
	California, 92093-0175.


--------------------------------------------------------------------------------
Section III:

Required Files:
---------------

   CGI packages
	- These packages, found in Perl 5, include useful modules such as
	  URL.pm, CGI.pm, Headers.pm, Request.pm, etc.
   dbaLocal.pl
	- This is a group of directory identifiers
   dbaPasswd.pl
	- This contains the variables which hold your user names and
	  passwords for various databases.
   Display.pl
	- Display.pl has all the different display strings which are used
	  in the program (e.g., YouSearchedFor, dbaFooter).  Because
	  they are in this file, you can use multiple different displays.
   loaddb.pl
	- This file holds the functions which will load the database into
	  a hash which the various database interfaces will access.
   nph-dba
	- This is the engine of Database Advisor.  It has several auxiliary
	  files which it needs in order to run.
   nph-profile
	- This file generates the profiles from the .db files found in the
	  /dba/dbfiles directory.
   push.pl
	- This handles the underlying code for the server pushes
   stopwords.db    
	- This is the database file which stopwords.pl will parse
   stopwords.pl
	- This parses a list of various "stopwords": words, symbols, or phrases 
	  which will cause problems in the search engines (e.g., AND, OR, &)
   subjects.db
	- This has a list of all the subjects and the databases which are 
	  classified under each of them.
   subjects.pl
	- subjects.pl reads subjects.db and parses out the various subjects
	  and therefore which databases will be searched.


DBA dependencies:

	Here is information on obtaining Perl 5 and the LWP modules, which
	are required to run DBA.   A Z39.50 client is required to run the
	Melvyl module as well as other Z39.50 databases.   Once you build
	zclient, it should be installed in this directory under z39
	as zclient.

Z39.50 Client
---------------------------------------------------------------------- 
	"the Z39.50 API zclient" software for Z39.50 connections.  
	You can obtain the code from 

 	http://lindy.stanford.edu/~hrf/z3950/www_gateway.html

 	This code does not come from the UCSD/DBA project.  Its source is:

         name: Finkbeiner, Harold R
       e-mail: harold_finkbeiner@Stanford.Edu
   department: Info Tech Systems & Services
     position: Sys Sw Developer,Prin
      address: Polya Hall, Rm. 208
        phone: (650) 725-3353
          fax: (650) 723-3253
    mail-code: 4136
 date-updated: Dec 12 1997 12:15AM
----------------------------------------------------------------------


GNU C compiler
----------------------------------------------------------------------
	Needed to compile the zclient code for your machine.

  	Go to http://www.delorie.com/gnu/ for more information on the 
	GNU project and a list of FTP sites for GNU software.

----------------------------------------------------------------------

perl
----------------------------------------------------------------------
	Perl, version 5.004_01 or later.
	
	You can get the latest version of Perl and the modules listed
	below from:

		http://www.perl.com

	modules:
		CGI.pm

	the LWP (formerly known as "libwww") Module 
	(this includes: HTTP::Request; LWP::UserAgent; URI::URL)
	
	Note that LWP has its own dependencies, which are documented
	when you download and install it.  These Perl modules are also
	available from the www.perl.com site above.
	These LWP prerequisites include:

  		MIME-Base64
  		HTML-Parser
  		libnet
  		MD5

	wwwurl.pl (should be included with your perl distribution)

	cgi-lib.pl (available from

		http://www.seas.upenn.edu/~mengwong/forms/cgi-lib.pl.txt)

--------------------------------------------------------------------------------
Section IV:

Other Vital Files:
------------------

   Various profiles
	- The profiles are what users see when they would like to know
	  more about the database in question.  They also contain useful 
	  knowledge about which URLs to use, as well as short descriptions
	  and the subjects that each database belongs to.
   Various Database Interfaces
	- Without the database interfaces, DBA does nothing.  These scripts 
	  actually contact the vendor and request the information.  There
	  are three ways they can do this: Z39.50, telnet, and web.  The Z39.50
	  interfaces are fairly simple, using a single connection, query, 
	  and response.  The telnet and web interfaces are based on pattern
	  matching and take longer as they must first traverse several layers
	  of information before being able to input the desired query.
   z3950.pl
	- This contains the commands to use when connected to a Z39.50 server

--------------------------------------------------------------------------------
Section V:

How the pieces fit together: 
----------------------------

-------------
A. The Engine
-------------

To understand Database Advisor (DBA), you must first understand the engine.
The engine keeps track of various things:

	Timing - How long will DBA run before timing out?
	Library id - Which library is running DBA, and therefore which 
		display strings and databases should we be using?
	Subjects - Which subjects did the user select, and therefore which
		databases did they select?
	Logging - Should we log this session in the DBA logfile?
	Server Pushing - What information should we send to the "push" functions?
	Child Processes - Which child processes (searches) have checked back
		in during the time allotment?
	IP Checking - Should we warn users that they are not using an on-campus
		machine? (therefore they might not have access to the databases)

The main job of the engine, however, is to run the database interfaces and
wait for their replies.  Each database interface contains a fork which allows
the main engine to continue running while the new search process gathers the
data.  The program runs in a loop until the time allotment has been met.  While
in the loop, there is a message queue waiting for inter-process communications.
If it receives information, then it unpacks the data and sorts it according 
to the number of hits (with previously received messages).  Then it performs
a server push, which will allow the user to see the data as it comes in.
After the time limit is up, it will consider any remaining processes to be 
timed out and it performs a final server push with this new information.
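
The loop described above can be sketched roughly as follows.  This is an
illustration in Python, not the Perl code in nph-dba: the names run_engine,
rank, and push are hypothetical, an in-process queue stands in for the
message queue, and stopping early once every search has reported is an
assumption of the sketch (the text above only says the loop runs until the
time allotment has been met).

```python
import queue
import time


def rank(results):
    """Sort (database, hits) pairs so the database with the most hits comes first."""
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


def run_engine(messages, timeout_seconds, expected, push):
    """Collect results until the deadline, pushing a fresh ranking after each one.

    messages : queue.Queue of (database_name, hits) tuples (stand-in for the IPC queue)
    expected : names of the databases whose searches were started
    push     : callable(ranked_results, timed_out_names), stand-in for a server push
    """
    results = {}
    deadline = time.monotonic() + timeout_seconds
    # Assumption: stop early if every search has already checked back in.
    while len(results) < len(expected):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            name, hits = messages.get(timeout=remaining)
        except queue.Empty:
            break  # time allotment used up
        results[name] = hits
        push(rank(results), [])  # intermediate push: results so far
    # Anything that has not reported by now is treated as timed out.
    timed_out = sorted(set(expected) - set(results))
    push(rank(results), timed_out)  # final push
    return rank(results), timed_out
```

A caller would start the searches, hand their names to run_engine as
expected, and supply a push function that writes the ranked list to the
browser.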

----------------------------------------
B. Server Pushing: Netscape vs Microsoft
----------------------------------------

One of the nice features of Database Advisor is the "real time" hits display.
This is accomplished via Server Pushing.  Unfortunately Netscape and Microsoft
deal with Server Pushing differently.

In Netscape Navigator, when we push information to the browser, the browser 
clears the screen and displays the new information.  In this way, we can send 
the current results to the user, allowing them to halt the program and choose
a database (if desired).
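
For readers unfamiliar with the mechanism, the replace-style push that
Netscape Navigator supports is generally done with a
multipart/x-mixed-replace response.  The Python sketch below shows the
general shape of such a stream.  It is not the code in push.pl: the
boundary string and the function names are assumptions made for
illustration, and the status line is written by the script itself only
because an "nph" (non-parsed headers) script is assumed.

```python
BOUNDARY = "DbaPushBoundary"  # hypothetical boundary string


def push_header():
    """Return the response header that opens a server-push stream."""
    return (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: multipart/x-mixed-replace;boundary={BOUNDARY}\r\n"
        "\r\n"
        f"--{BOUNDARY}\r\n"
    )


def push_part(html, final=False):
    """Return one pushed document; the browser replaces the previous one with it."""
    # A closing boundary ("--boundary--") tells the browser the stream is over.
    closing = f"--{BOUNDARY}--\r\n" if final else f"--{BOUNDARY}\r\n"
    return "Content-Type: text/html\r\n\r\n" + html + "\r\n" + closing
```

A script would write push_header() once, then one push_part(...) each time
new results arrive, flushing its output after every part, and mark the last
part with final=True.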

In Microsoft Internet Explorer (MSIE), the Server Push results in appending the 
information onto the current document, creating a long and repetitive
results screen.  We modified the "push" functions to handle MSIE by appending
only a message stating that results have been *received* from the database,
without specifying the number of hits.  On the final server push, the engine
writes a temporary file and then refreshes the MSIE browser to that location.

***Note***: This creates files that need to be deleted using a cron job or
some other scheduled script.
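
One possible shape for such a cleanup script is sketched below in Python.
It is only an illustration: the README does not say where the temporary
files are written or how long they should be kept, so the directory and
the maximum age are parameters the administrator would have to supply.

```python
import os
import time


def remove_stale_files(directory, max_age_seconds, now=None):
    """Delete regular files in `directory` older than `max_age_seconds`.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in os.scandir(directory):
        # Only plain files are considered; subdirectories are left alone.
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Run from a scheduler, a call such as remove_stale_files(some_dir, 3600)
would clear result pages more than an hour old; both values here are
placeholders.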

--------------------------
C. The Database Interfaces
--------------------------

The Database Interfaces do the actual work of Database Advisor.  There are
three different types of database interfaces: Z39.50, telnet, and web.

The Z39.50 protocol is used by Melvyl(r) and other vendors to provide a quick 
way to access the data in the databases.  Usually, there is so little time
expended in running the searches that we run the searches serially instead
of running them in parallel.  This cuts down the overhead of opening a Z39.50 
server for each database to retrieve the information.  For more technical info
on Z39.50, please see the Melvyl.pl file (which is an actual Z39.50 database
interface), or see the documentation on Z39.50.

The Telnet protocol is used in databases such as BIOSIS, where there is no 
web version available as of yet.  Most of these are currently being moved
to the web.  For these, we open a socket to the destination, and wait until the
pattern we are looking for (often a prompt) appears.  This is rather unreliable
as we don't know when we have received all the information.  For more technical
info on Telnet, see the Telnet.pl file (which is an actual Telnet database
interface).
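
The "wait until the pattern appears" step can be sketched as below.  This
is a Python illustration rather than the code in Telnet.pl; the function
name read_until and its return convention are assumptions.  Because the
end of the transmission cannot be detected, the sketch relies on a
deadline to give up.

```python
import re
import socket
import time


def read_until(sock, pattern, timeout_seconds):
    """Read from `sock` until `pattern` matches the text received so far.

    Returns (text, matched).  `matched` is False if the deadline passed or
    the remote side closed the connection before the pattern appeared.
    """
    compiled = re.compile(pattern)
    deadline = time.monotonic() + timeout_seconds
    received = ""
    while not compiled.search(received):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return received, False
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            return received, False
        if not chunk:  # connection closed by the remote host
            return received, False
        received += chunk.decode("latin-1")
    return received, True
```

An interface would call this once per screen it has to pass through, for
example waiting for a login prompt, then a menu prompt, then the line
that reports the number of hits.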

The Web protocol is used by the rest of the databases.  Each search is limited
by the speed of the webserver.  Depending on the database implementation, there
is the possibility of accessing the search engine directly from the web.  
Sometimes it is necessary to travel through the various pages to reach the 
point where the search query can be input.  The nice part of the web interfaces
is that the HTTP packages leave the interfaces with very little connection
code of their own.  Also, the information is all sent at once (unlike
telnet, where we don't know when the transmission ends), and can be parsed
out after receiving the reply.  

Each type starts off by forking off a varying number (depending on how many
databases are offered by that vendor) of database searches so the engine can
proceed with running other database interfaces.  Each search then retrieves
the information (via Z39.50, telnet, or web) and returns it to the
engine via an interprocess communication.  After it sends the message, the
process dies.  If the database limits concurrent users, then the interface
should also properly log out of the database.

Each process has its own timeout method as well.  The time for timeout
is taken from $main'timeout (which can be specified by the user).  This way,
no process runs forever.

---------------
D. The Profiles
---------------

The profiles are an important part of Database Advisor.  In them is the
information that the user needs to see, as well as information DBA needs
in order to complete the queries.  The important field in the profiles 
(to DBA) is the "DBA URL" field.  It contains the URL which the database 
interface will use to start the search query.  This is useful if the interface
runs several databases from the same vendor, some of which require different
URLs.

--------------------
E. The Subject Files
--------------------

The subjects.db file lists the various subjects as well as the databases
which are considered in that subject.  To facilitate multi-library use, the
subjects.db file has a library id which is declared before every database
name.  For formatting details, consult the subjects.db file.  This file is
then parsed and put into an associative array by library and by subject.  
This array is then used in the engine and the database interfaces to determine
which databases the user wants searched.
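
To make the "by library and by subject" lookup concrete, here is a small
Python sketch.  The file layout it parses is invented for the example (a
"[Subject]" line followed by "libid database" lines); the real layout is
whatever the shipped subjects.db uses, and the real parser is subjects.pl.
The function names are likewise hypothetical.

```python
from collections import defaultdict


def parse_subjects(text):
    """Build table[libid][subject] -> list of database names.

    Assumed (hypothetical) layout: a "[Subject]" line opens a subject, and
    each following "libid database" line files a database under it.
    """
    table = defaultdict(lambda: defaultdict(list))
    subject = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            subject = line[1:-1]
            continue
        if subject is None:
            raise ValueError("database listed before any subject: " + line)
        libid, database = line.split(None, 1)
        table[libid][subject].append(database)
    return table


def databases_for(table, libid, subjects):
    """Return the databases one library lists under the chosen subjects."""
    selected = []
    for subject in subjects:
        for database in table.get(libid, {}).get(subject, []):
            if database not in selected:  # a database may sit under several subjects
                selected.append(database)
    return selected
```

With a table built this way, the engine only needs the library id and the
list of subjects from the submitted form to decide which searches to start.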

--------------------------------------------------------------------------------
Section VI:

How to Add a Database Interface:
--------------------------------

If you cannot find the database interface you are looking for on the web,
then you can attempt to create one on your own.  The easiest way to accomplish
this is to take a look at existing interfaces and model the new one on them.
Once you have either created a new interface, or appended an extra fork
on an existing interface, you need to create a profile for it.  To avoid
parsing errors, please copy an existing profile and edit it to suit your
database.  

Then you will need to edit the subjects.db file to include your profile 
under one or more of the subject headings found there.  Note: If you want
it to show up in the All Subjects category, you must include it there
as well.  DBA will not search through the subjects for non-declared databases.

Once this is done, add the interface (yourdbname.pl) to the list of Database
Interface Includes.  This will allow DBA to "see" the interface file.  If the
file doesn't compile, then DBA will crash as well (you will get a "Document
Contains no Data" error most likely).  

After you have required the file, you will need to go to where the engine
starts the child processes and add an execute function line.  If your interface
has multiple database accesses within it, you will need to pass the associative
variable %databases to it.  %databases is a list of all the profile names under
the subject(s) the user defined.  If your interface just searches one database,
you might want to run an "if" statement to see if your profile was defined.

After you do this, you are done: the database has been added.

--------------------------------------------------------------------------------
Section VII:

Error Messages in Database Interfaces:
--------------------------------------

Often, a database will have a Remote Server Error, or the interface will parse
the data incorrectly.  Both result in an error.  If a connection fails between
DBA and the host computer, then that is considered a Remote Server Error.  If
the data is parsed incorrectly, that is considered a Local Server Error, which
means that the DBA host can (and should!) fix it.

When the database interface finds the number of hits, it adds 10 
(or whatever is in $main'baseReturnValue) to this value.  Adding 10
reserves the values below 10 for special return types, which are used for
errors.  These reserved values are used as indexes in an array of error
strings.  For now, 1 is reserved for Local Errors, 2 is for Remote Errors,
4 is for Timeouts, and 6 is for Too Many Users.
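
The offset scheme can be illustrated with a few lines of Python.  This is
a sketch, not DBA's Perl: BASE_RETURN_VALUE stands in for
$main'baseReturnValue, a dictionary stands in for the array of error
strings, and the wording of the strings is a placeholder.  Only the four
code numbers come from the description above.

```python
BASE_RETURN_VALUE = 10  # stands in for $main'baseReturnValue

# Codes named in this README; the message wording is a placeholder.
ERROR_STRINGS = {
    1: "Local Error",
    2: "Remote Error",
    4: "Timeout",
    6: "Too Many Users",
}


def encode_hits(hits):
    """Shift a hit count up so it can never collide with an error code."""
    return hits + BASE_RETURN_VALUE


def decode(value):
    """Turn a returned value back into ("hits", n) or ("error", message)."""
    if value >= BASE_RETURN_VALUE:
        return ("hits", value - BASE_RETURN_VALUE)
    return ("error", ERROR_STRINGS.get(value, "Unknown Error"))
```

So a search that found zero hits returns 10, which the engine can tell
apart from a returned 4, meaning the search timed out.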
 
If the DBA support team finds other errors (such as Service Unavailable), 
then they can add errors themselves and use alternate return values in the 
interface code.

--------------------------------------------------------------------------------
Section VIII:

The Message Queue:
------------------

The message queue is what ties the child processes (database interfaces) to the
parent process (DBA engine).  After the database interface has attempted to 
retrieve the number of hits on a user's query, it uses "pack" to bundle the
name of the database with the result (which is offset by
$main'baseReturnValue to account for errors).  Then, it uses "msgsnd" to send
the message back to the parent process, where it is unpacked and displayed.
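
The pack and unpack halves of this exchange can be sketched in Python with
the struct module.  The record layout below (a 32-byte name followed by a
4-byte integer) is an assumption for the example; the README does not give
the template DBA passes to Perl's pack, and the transport (msgsnd on the
child side, a matching receive on the parent side) is not shown.

```python
import struct

# Hypothetical layout: 32-byte NUL-padded name, then a 4-byte signed integer.
RESULT_FORMAT = "!32sl"


def pack_result(name, value):
    """Child side: bundle the database name and its (offset) result value."""
    encoded = name.encode("ascii")
    if len(encoded) > 32:
        raise ValueError("database name too long for the record layout")
    return struct.pack(RESULT_FORMAT, encoded, value)


def unpack_result(message):
    """Parent side: recover (name, value) from a received message."""
    raw_name, value = struct.unpack(RESULT_FORMAT, message)
    return raw_name.rstrip(b"\0").decode("ascii"), value
```

A fixed-size record keeps the receiving side simple, since every message
has the same length and can be unpacked without any framing logic.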

--------------------------------------------------------------------------------
Section IX:

Signal Handling:
----------------

There is an important piece of code in DBA called "handler".  This handles 
the signals that come from the user.  If the user halts the browser before 
the message queue is closed, then there will be an orphan inter-process 
communication left.  If left untended, this can result in a complete breakdown 
of DBA (the user will hit submit, and they will get only the first push).
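
The general pattern, cleaning up the shared resource before the process
exits on a signal, can be sketched in Python as follows.  This is not
DBA's "handler" routine: the cleanup callable is a hypothetical stand-in
for whatever removes the message queue, and the choice of signals is an
assumption, since the README does not list which ones DBA traps.

```python
import signal
import sys


def make_handler(cleanup):
    """Build a signal handler that releases resources and then exits."""
    def handler(signum, frame):
        cleanup()    # e.g. remove the message queue so it is not orphaned
        sys.exit(1)  # stop the process once cleanup is done
    return handler


def install(handler, signals=(signal.SIGTERM, signal.SIGINT)):
    """Register `handler` for each signal (assumed set; adjust as needed)."""
    for signum in signals:
        signal.signal(signum, handler)
```

A script would call install(make_handler(remove_queue)) right after
creating the queue, where remove_queue is its own cleanup function, so
that an interrupted request still releases it.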

--------------------------------------------------------------------------------
Section X:

Running Multiple Libraries off the Same Engine:
-----------------------------------------------

For expansion purposes, Database Advisor was created so it can run multiple 
request types if need be.  If, for instance, there was one Library that wanted 
Database Advisor for science databases and another which wanted humanities 
subjects covered, both could be accommodated using only one copy of Database 
Advisor.  

The change is made in the subjects.db file.  This is the file which stores all
the various subjects and what databases fall under them.  By adding a new 
Library ID, a library can set up its own specifications for Database Advisor.
Then, a new DBA HTML page must be created with the new subjects and with the
hidden element "libid" set to the new Library ID.  

When the search is started using this new HTML page, the engine will select the
databases which correspond to the "libid".  Thus, you can have various subjects,
and various *sets* of subjects.

-------------------------------------------------------------------------------
Section XI:

Passwords
---------

The file dbaPasswd.pl is available to house usernames and passwords for
those databases which require them.   This version of DBA includes
database interfaces which require passwords for ASFA, METADEX and SocAbs.
Calls to these interfaces have been disabled by commenting out calls to
them from nph-dba.   To include them in your system, edit dbaPasswd.pl to 
include your usernames and passwords and uncomment the require and 
execute calls in nph-dba.

--------------------------------------------------------------------------------
Appendix:

Authors and Contact Points:
---------------------------


Authorship of Database Advisor (in chronological order):

   Neil Spring 
      - DBA Engine Author, Z39.50 Connection Interfaces

   Greg Kogut 
      - Telnet Database Interfaces

   Scott Petersen 
      - Web Database Interfaces, Implementation, Documentation 


Maintenance and Contact Point

   UCSD Science Library Web Programmers (techies@scilib.ucsd.edu)


Project Management and Supervision

   Christy Hightower, UCSD Science Librarian chightow@ucsd.edu


Database Advisor Interface Design Team

   Christy Hightower, UCSD Science Librarian 
   Jennifer Reiswig, UCSD Biomedical Librarian 
   Susan Berteaux, Scripps Inst. of Oceanography Librarian 

--------------------------------------------------------------------------------