udmSearch - News Extension README

Date: 2000-04-19
Author: Heiko Stoermer <heiko.stoermer@innominate.de>
 

Introduction:

Since version 3.10, udmSearch comes with an integrated extension for archiving news servers (currently MySQL only; see Restrictions).
This means that you can now download all messages from a news server and save them completely in a database.
 

Benefits:

  • you can expire the messages on the news server to keep it slim and fast
  • you can search the complete message base with all the features that regular udmSearch offers
  • you can still browse discussion threads over the complete archive

Restrictions:

  • currently MySQL only (I would really have liked to do this for PostgreSQL, but some really annoying restrictions on query size and field size in PostgreSQL finally made me switch to MySQL.)
  • Perl frontend only
  • single dict only (because the mysql-perl frontend does not support multi-dict)

Future:

No new features are planned for this extension. It works the way it is (at least as far as I can see) and does everything I wanted it to do.
What I will do is make the code a bit more portable to other databases and fix the few very minor bugs in the frontend.
Of course, newly discovered bugs will be fixed. I'm maintaining it as well as I can.
 

Performance:

Of course, the important questions always are: how fast.../how big.../how long...
  • Our local intranet installation of udmSearch reports the following:
          UdmSearch statistics
 
    Status    Expired      Total
   -----------------------------
       200      76132      76132 OK
       404        119        119 Not found
       503         17         17 Service Unavailable
       504        802        802 Gateway Timeout
   -----------------------------
     Total      77070      77070
which means that roughly 77,000 messages are archived in the database
  • Current database size is: 423 megabytes
  • The dict table has 6,076,462 entries
  • It runs on an AMD K6 400 with 64 MB of RAM (a very small machine)
  • Typical queries take between 2 and 10 seconds.

Installation

Compile:

Unpack the udmSearch distribution archive.
Run the configure script with the options --enable-news-extension and --with-mysql.
Then run make and make install as described in the regular install instructions.
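
The compile steps above can be sketched as a small shell function. This is only an illustration: the function name build_news_extension is made up for this example, and it assumes the archive has already been unpacked into the directory you pass in. The two configure options are the ones named above.

```shell
#!/bin/sh
# Hypothetical helper (not part of udmSearch): build with the news extension.
# Usage: build_news_extension /path/to/unpacked/udmsearch-source
build_news_extension() {
    src_dir=$1
    (
        # Work inside the source tree; the subshell keeps the caller's cwd.
        cd "$src_dir" || exit 1
        ./configure --enable-news-extension --with-mysql &&
        make &&
        make install
    )
}
```

Depending on the install prefix, the make install step may need to be run as root.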
 

Create Database:

The news extension uses a slightly different database layout. The create files can be found in frontends/mysql-perl-news/create/
(Of course, you first have to run mysqladmin create udmsearch and grant permissions to the account that the web frontend and indexer run as.)
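
As a rough sketch, creating the database and loading the layout could look like the function below. The function name create_news_database is hypothetical, and the loop assumes that every file in the create/ directory is a plain SQL script that the mysql command-line client can read, in any order; please check the files before relying on that. Granting permissions to the frontend and indexer account is not shown.

```shell
#!/bin/sh
# Hypothetical helper (not part of udmSearch): create the database and
# load the news-extension table layout.
# Usage: create_news_database /path/to/udmsearch-source [dbname]
create_news_database() {
    src_dir=$1
    db_name=${2:-udmsearch}

    mysqladmin create "$db_name" || return 1

    # Assumption: each create file is an SQL script for the mysql client.
    for sql_file in "$src_dir"/frontends/mysql-perl-news/create/*; do
        [ -f "$sql_file" ] || continue
        mysql "$db_name" < "$sql_file" || return 1
    done
}
```

If your MySQL server requires a login, mysqladmin and mysql will need the usual user and password options as well.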
 
 

Install indexer.conf:

An indexer.conf for incremental news archiving (messages hardly ever change...) can be found in frontends/mysql-perl-news/etc/, together with a sample cron shell script that can be run once a day or so.
Please see indexer.conf for a detailed description of the indexing process.
 

Install perl frontend:

Copy frontends/mysql-perl-news/*.pl and frontends/mysql-perl-news/*.htm* to your cgi-bin directory.
Copy frontends/mysql-perl-news/*.pm to your site's Perl library directory (site_perl or similar), where the Perl scripts can find the modules.
Edit search.htm and change the database login information it contains.
The Perl frontend has additional features that allow you to browse message threads. You will see.
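
The copy steps above might be scripted roughly as follows. The function name install_news_frontend is invented for this sketch, and the cgi-bin path is whatever your web server uses. If no library directory is given, the sketch asks perl for its site library directory via the standard Config module; that is an assumption about where you want the modules, not a requirement of the frontend.

```shell
#!/bin/sh
# Hypothetical helper (not part of udmSearch): install the Perl news frontend.
# Usage: install_news_frontend /path/to/udmsearch-source /path/to/cgi-bin [perl-lib-dir]
install_news_frontend() {
    src_dir=$1
    cgi_dir=$2
    perl_lib=$3
    frontend=$src_dir/frontends/mysql-perl-news

    # Assumption: fall back to the site library directory perl reports.
    if [ -z "$perl_lib" ]; then
        perl_lib=$(perl -MConfig -e 'print $Config{sitelib}') || return 1
    fi

    cp "$frontend"/*.pl "$frontend"/*.htm* "$cgi_dir"/ || return 1
    cp "$frontend"/*.pm "$perl_lib"/ || return 1

    echo "Now edit $cgi_dir/search.htm and set the database login information."
}
```

The scripts copied into cgi-bin usually also have to be executable by the web server.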
 
 

Now you are set and can run indexer for the first time, following the instructions in indexer.conf.

I hope this is a nice feature for you.
If anyone is interested in porting this to other databases/multidict mode/the PHP frontend, PLEASE DO SO! I would be pleased and will assist you.


Heiko Stoermer
Last changed: Wed Apr 19 12:43:23 CEST 2000
EOF