FAQs - kSearch v. 1.6
The following are the most frequently asked questions about kSearch. This page has the same content as the faqs.html file in your kSearch download package.
- What is kSearch?
kSearch is a fast, efficient, and highly configurable FREE website search engine and website indexer. It includes more features than are available in the most popular commercial search engines, yet it's completely free. Key features include unlimited page search capabilities, phrase searching, case sensitivity, whole word searches, + and - logical operators, searching within previous results, a completely customizable HTML template, and much more.
- How does kSearch work?
The kSearch indexer starts at a specified directory and crawls through all of its files and subdirectories, except for those listed in a special ignore-files list. For each file, the indexer removes HTML tags and embedded scripts and styles, translates Latin-1 characters to their English equivalents, and then indexes the remaining contents. The title, description, links, meta-description, meta-keywords, meta-author, modification date, and file path are saved in a DBM database or flat file. To increase search speed, the contents of each file can also be saved in the database or flat file. The search script counts the query terms in each document, ignoring terms listed in a special stop-terms file, either by directly accessing the file information stored in the DBM database or flat file or by searching the files on the fly. The results are sorted according to the user's preference and returned to the user via a customized HTML template.
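The two core steps described above (stripping markup, then counting query terms while skipping stop terms) can be sketched as follows. This is a minimal Python illustration, not kSearch's Perl source; the names TextExtractor, strip_html, and count_query_terms are hypothetical, and the sample page is made up.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_html(html):
    """Return the page text with tags, scripts, and styles removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the text is easy to split into terms
    return " ".join(" ".join(parser.parts).split())


def count_query_terms(text, query, stop_terms):
    """Count how often each non-stop query term occurs (case-insensitive)."""
    words = text.lower().split()
    return {
        term: words.count(term)
        for term in query.lower().split()
        if term not in stop_terms
    }


page = (
    "<html><head><style>p {color: red}</style><title>Perl Tips</title></head>"
    "<body><p>Perl is fun. Use Perl daily.</p><script>var x = 1;</script></body></html>"
)
text = strip_html(page)
hits = count_query_terms(text, "the perl", stop_terms={"the"})
print(text)
print(hits)
```

Here the stop term "the" is dropped from the query before counting, so only "perl" is looked up in the stripped text.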
- What do I need to use kSearch?
kSearch can be installed on Unix or Windows platforms. Perl (5.003 or higher) must be installed to run both the indexer and search scripts. Most Unix web hosts already include Perl. The search script uses the Benchmark and CGI Perl modules, which are included in the standard Perl distribution. kSearch uses the best Perl DBM library module available on your system: SDBM_File is included in the standard Perl distribution, while DB_File or GDBM_File may be required for very large sites. If you want to index PDF files, you must install Xpdf from http://www.foolabs.com/xpdf; the kSearch indexer uses pdftotext in Xpdf to convert PDF files to text content. The disk space needed depends on the user's configuration and the size of the website. To give an idea of size requirements, the index for the 5 MB Perl manual is 4.9 MB. kSearch can be configured to use a very compact index for minimal size requirements.
Installation and Usage Troubleshooting
- How do I open the "ksearch.tar.gz" file?
UNIX: Uncompress the file first by typing "gunzip ksearch.tar.gz", and then type "tar xvf ksearch.tar" to extract the files.
WINDOWS: Extract the ZIP file to your desktop or to a working folder that you can easily find later.
Recommended software: 7-Zip is a compression and decompression utility that can open both "tar.gz" and "zip" archives.
- How do I install kSearch?
After you extract the compressed file, you can open the README file for detailed instructions to complete the installation.
General Installation Steps:
- Set the path to Perl on the first line of the "indexer.pl", "indexer.cgi", and "ksearch.cgi" files.
- Set the appropriate file permissions.
- Edit the "configuration/configuration.pl" file.
- Upload the SEARCH folder package to your server.
- Run the indexer by opening your browser to the "indexer.cgi" file.
- How do I prevent certain files and directories from being indexed?
To skip files and directories during indexing, add their full paths to the configuration/ignore_files.txt file, one per line. For example, to prevent everything (all files) within the folder http://www.YourWebsite.com/private/ from being indexed, add the full path of that folder on its own line, with NO trailing slash.
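As a rough illustration of how such an ignore list behaves, the Python sketch below skips a file when its path equals an ignored entry or lies inside an ignored directory. The matching rule is an assumption rather than kSearch's documented behavior, and the paths under /home/user/public_html as well as the function names are hypothetical.

```python
from pathlib import PurePosixPath


def load_ignore_list(text):
    """Parse ignore-file content: one full path per line, blank lines skipped."""
    return [PurePosixPath(line.strip()) for line in text.splitlines() if line.strip()]


def is_ignored(path, ignore_list):
    """True if path equals an ignored entry or lies inside an ignored directory."""
    p = PurePosixPath(path)
    return any(p == entry or entry in p.parents for entry in ignore_list)


# Hypothetical ignore-file content: full paths, one per line, no trailing slash
ignore_text = """
/home/user/public_html/private
/home/user/public_html/drafts/old.html
"""
ignored = load_ignore_list(ignore_text)

print(is_ignored("/home/user/public_html/private/notes.html", ignored))
print(is_ignored("/home/user/public_html/private-area/index.html", ignored))
```

Comparing whole path components, rather than raw string prefixes, keeps a folder such as "private-area" from being skipped just because its name starts with "private".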
- How do I stop the search engine from finding certain terms?
Add terms you do not want indexed in the "configuration/stop_terms.txt" file. Be sure that they are added on separate lines.
- How do I stop the search engine from indexing certain parts of a webpage?
Wrap each area of HTML you DON'T want indexed in the following "nosearch" comment tags:
<!--nosearch--> Anything in this section will not be searched. <!--/nosearch-->
- How do I stop the search engine from finding common terms not already in the "stop_terms.txt" list?
Set $IGNORE_COMMON_TERMS = 80 (configuration.pl file, line 60).
This will add the common terms to the stop_terms.txt file when indexing. For example, if set to 80, terms that are found in over 80% of all files will automatically be added to the stop_terms.txt file.
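The percentage rule can be sketched in a few lines of Python. This is only an illustration of the "found in over N% of all files" idea, not kSearch's actual code; the function name common_terms and the sample documents are hypothetical.

```python
def common_terms(documents, threshold_percent):
    """Return the terms that occur in more than threshold_percent of the documents."""
    doc_count = len(documents)
    files_containing = {}
    for text in documents:
        # Count each term once per document, however often it repeats
        for term in set(text.lower().split()):
            files_containing[term] = files_containing.get(term, 0) + 1
    return sorted(
        term
        for term, n in files_containing.items()
        if n * 100 > threshold_percent * doc_count
    )


docs = [
    "home products contact widgets",
    "home products contact gadgets",
    "home products contact pricing",
    "home products support",
    "home news",
]
print(common_terms(docs, 80))
```

With a threshold of 80, only "home" (present in 5 of 5 files) qualifies; "products" appears in exactly 80% of the files, which is not over 80%, so it stays searchable.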
- What is the full path?
The full path is the absolute path of a file or directory. For Windows, you may or may not need to include the drive letter, for example "C:\WINDOWS\kscripts\" (or "/wamp/www/KSearch1.6/search/" if you're running a web server such as WAMP). On Unix systems, you can determine the full path of the current directory by typing "pwd".
The download includes a PHP file, fullpath.php, to help you find your full path. After uploading your search folder, open the fullpath.php file inside it in a web browser, and it will display your full path.
A couple examples will look like this...
- How do I set the Perl path in "indexer.pl" and "ksearch.cgi"?
In Unix, Perl is generally in the /usr/bin/ folder, so the first line is typically "#!/usr/bin/perl".
In Windows, the path to Perl is the exact path, including the drive letter.
The first line of each script file tells the computer that the file is a Perl program. On the first line you will see "#!/usr/bin/perl", which is a common location for Perl. On Unix, you can type "which perl" to determine the Perl path on your system. If it is different, you will have to change the first line to the correct path, keeping the "#!" in front. Your hosting provider can give you the exact path and string. Some hosts may also require a trailing argument, such as -w.
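If you need to change that first line in several scripts, the edit itself is simple enough to sketch. The Python helper below is a hypothetical illustration (set_shebang is not part of kSearch), and /usr/local/bin/perl is only an example of a path that "which perl" might report.

```python
def set_shebang(script_text, perl_path, flags=""):
    """Return script_text with its first line set to a '#!' line for perl_path."""
    shebang = "#!" + perl_path + (" " + flags if flags else "")
    lines = script_text.split("\n")
    if lines and lines[0].startswith("#!"):
        lines[0] = shebang       # replace the existing interpreter line
    else:
        lines.insert(0, shebang)  # no interpreter line yet, so add one
    return "\n".join(lines)


original = '#!/usr/bin/perl\nprint "hello";\n'
updated = set_shebang(original, "/usr/local/bin/perl", "-w")
print(updated)
```

Only the first line changes; the rest of the script is passed through untouched.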
- How do I set file permissions?
For Unix platforms, type "chmod 755 filename" to give the owner full access and everyone else read/execute access, and "chmod 744 filename" to give the owner full access and everyone else read-only access (filename is the name of the file). Windows (NT) users: right-click the file or directory, click Properties on the shortcut menu, and then click Permissions on the Security tab. See the README file for a list of file permissions for each file.
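To see what the two octal values mean, the short Python sketch below expands them into the familiar rwx notation (owner, group, others). The helper name describe_mode is hypothetical; stat.filemode is part of the Python standard library.

```python
import stat


def describe_mode(octal_string):
    """Translate a chmod-style octal string such as '755' into 'rwxr-xr-x'."""
    mode = int(octal_string, 8)
    # stat.filemode() puts a file-type character first; drop it to keep
    # only the nine permission characters
    return stat.filemode(mode)[1:]


for value in ("755", "744"):
    print(value, describe_mode(value))
```

The first three characters apply to the file's owner, the next three to the group, and the last three to everyone else.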
- When I run the indexer, I get the error message 'dbm store returned -1, errno 28, key "trap" at - line 3.' What does this mean?
This means that your system does not have either the DB_File or GDBM_File module and that you have reached the (key/value) memory limit of the DBM database. You will have to install either the DB_File or GDBM_File module from CPAN (http://www.perl.com/CPAN-local). Most systems will have DB_File.
- I do not have DB_File. Is there an alternative?
Set $USE_DBM = 0 ("configuration.pl" file, line 49).
Since version 1.5a, kSearch includes (by default) a flat-file database. This prevents the database from running into memory limits or database access problems. The flat-file DB is located in the /database folder.
- How do I add a search box on my site?
To add a search box, insert the following HTML code and be sure to change "Domain.com" to your actual domain. Adjust the path string (in line 1 below) to point to the "ksearch.cgi" file.
<form action="http://www.Domain.com/search/ksearch.cgi" method="get">
<input type="text" name="terms" value="" >
<input type="submit" value="Search">
</form>
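Because the form uses method="get" with a single text field named "terms", submitting it amounts to requesting ksearch.cgi with a "terms" query parameter. The Python sketch below builds such a URL, which can be handy for linking directly to a search; the helper name search_url is hypothetical, and "Domain.com" is the same placeholder used in the form.

```python
from urllib.parse import urlencode


def search_url(script_url, terms):
    """Build the URL a browser would request when the search form is submitted."""
    return script_url + "?" + urlencode({"terms": terms})


url = search_url("http://www.Domain.com/search/ksearch.cgi", "perl dbm")
print(url)
```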
- How do I configure the search engine for speed?
Set $SAVE_CONTENT = 1 (configuration.pl file, line 55). This is RECOMMENDED, and is already set as the Default.
Set $IGNORE_COMMON_TERMS = 80 (configuration.pl file, line 60) to the maximum percentage of files that indexed terms can exist in. This will ignore common terms not present in stop_terms.txt. See the README documentation for details.
- How do I configure the search engine to save disk space?
Set $MAKE_LOG = 0 (configuration.pl file, line 64) so you don't create a logfile of the indexing routine. RECOMMENDED.
- My descriptions always show the same terms that are in my navigation bar. How do I show useful content in the descriptions?
Descriptions may always show the same content if your pages all start with the same navigation bar. To show more useful content, set $DESCRIPTION_START (configuration.pl file, line 41) to the number of the term at which the description should start. The very first term is 0. For example, if you set $DESCRIPTION_START to 50, the description will start from the 51st term in the file.
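The zero-based offset can be pictured with a small Python sketch. It assumes terms are simply whitespace-separated words; make_description and its length parameter are hypothetical and do not correspond to kSearch settings.

```python
def make_description(text, start, length):
    """Return `length` whitespace-separated terms, beginning at zero-based index `start`."""
    terms = text.split()
    return " ".join(terms[start:start + length])


page_text = "Home Products Contact Welcome to our widget catalog for spring"

# A start of 0 begins with the navigation bar terms
print(make_description(page_text, 0, 4))
# A start of 3 skips the three navigation terms and begins at the 4th term
print(make_description(page_text, 3, 4))
```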
- Why does the search seem to take longer than the indicated search time?
The search time is the actual CPU time used to complete the search process. This time does not represent the actual "wallclock" time that takes into account CPU scheduling and other factors. It also does not include the time used by busy networks. The CPU time is a more accurate measure of the search engine itself.
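The gap between CPU time and wallclock time is easy to reproduce. The Python sketch below is a generic illustration (kSearch itself uses Perl's Benchmark module for its timing); the helper name timed is hypothetical, and the sleep call stands in for time spent waiting rather than computing.

```python
import time


def timed(func):
    """Run func and return (cpu_seconds, wall_seconds)."""
    cpu_start = time.process_time()   # CPU time used by this process
    wall_start = time.perf_counter()  # elapsed real ("wallclock") time
    func()
    return time.process_time() - cpu_start, time.perf_counter() - wall_start


# Sleeping consumes wallclock time but almost no CPU time
cpu, wall = timed(lambda: time.sleep(0.2))
print(f"cpu={cpu:.3f}s wall={wall:.3f}s")
```

The CPU figure stays close to zero while the wallclock figure includes the whole wait, which is why a search can feel slower than its reported time.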
- What is the score?
Unless weights are used, the score is simply the percentage of matching characters in the document (0.00 - 100.00).
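One plausible reading of "percentage of matching characters" is sketched below in Python: the characters covered by occurrences of the query terms, divided by the total characters in the document. The exact counting rules kSearch uses (whitespace, overlaps, case) are not stated here, so treat the function score and its details as assumptions.

```python
def score(document, terms):
    """Percentage of the document's characters covered by the query terms (0.00 - 100.00)."""
    text = document.lower()
    if not text:
        return 0.0
    matched = sum(text.count(t.lower()) * len(t) for t in terms if t)
    # Overlapping terms could push the raw figure past 100, so cap it
    return min(100.0, round(100.0 * matched / len(text), 2))


# "perl" occurs 3 times (12 characters) in a 15-character document
print(score("perl perl perl!", ["perl"]))
```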
- How do I index PDF files?
In order to index PDF files, you must install Xpdf from http://www.foolabs.com/xpdf/. The kSearch indexer uses pdftotext in Xpdf to convert PDF documents to text. Xpdf is freely available for Unix and Windows platforms under the GNU General Public License. Once you have installed Xpdf, set $PDF_TO_TEXT to the full path of the 'pdftotext' executable in the Xpdf package, and then index your web site as per usual. Note: Using this option produces a security risk since the indexer script must run pdftotext using a shell command. Do not use this option if you do not trust the content of the PDF files you want to index. Results for phrases may be less accurate due to the output format of pdftotext; columns of text are left in the conversion rather than continuous content.
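The security note above comes from passing file names through a shell. As a general illustration of the safer pattern (not a description of how the kSearch indexer works), the Python sketch below runs pdftotext with an argument list so that no shell ever parses the file name. The helper names and the path /usr/local/bin/pdftotext are hypothetical; the trailing "-" asks pdftotext to write the text to standard output.

```python
import subprocess


def pdftotext_command(pdftotext_path, pdf_path):
    """Build the argument list for converting pdf_path to text on standard output."""
    return [pdftotext_path, pdf_path, "-"]


def pdf_to_text(pdftotext_path, pdf_path):
    """Run pdftotext without a shell and return the extracted text."""
    result = subprocess.run(
        pdftotext_command(pdftotext_path, pdf_path),
        capture_output=True,
        check=True,  # raise an error if pdftotext exits with a failure status
    )
    return result.stdout.decode("utf-8", errors="replace")


# A hostile file name stays a single argument instead of becoming a command
cmd = pdftotext_command("/usr/local/bin/pdftotext", "report; rm -rf x.pdf")
print(cmd)
```

This removes the shell from the picture, but it does not make untrusted PDF content safe to process.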
Common Problems and Returned Errors
- Cannot open ".../database/files: No such file or directory...?
This error may occur if you do not have a database directory in the kSearch folder, if your server does not grant write access to CGI scripts, if your permissions are set incorrectly, or if the DBM module is not installed correctly. In order to run both the indexer and search script, your server must allow write access to CGI scripts.
If your server grants write access to CGI scripts and you still get this error, then try using a flat file database by setting $USE_DBM = 0 in configuration.pl.
- When I run the indexer from the web, the script ends prematurely.
Most servers give HTTP requests a time limit (time out) to prevent tying up the server's resources. If you are indexing a relatively large site from a web browser, you will probably reach this limit. The only way to get around this limit with the current version of kSearch is to ask your host service or system administrator to increase the time so the server does not time out.
- When I try running the indexer or search script from the web, my browser returns the contents of the file instead of running the script.
Be sure to run indexer.cgi (not indexer.pl). Additionally, be sure your server is set to run Perl for .cgi scripts, particularly in the directory in which kSearch is installed.
Download a copy of kSearch, save it to your desktop, and extract it. Next, and most importantly, read both the README and FAQs files included with your download. When you have finished both, go back to the README file and follow its instructions. Take your time and complete each step as outlined, without skipping any. The most difficult part is the configuration.pl file. The average time to get your kSearch Website Search Engine up and running is about 45 minutes (which includes all the reading).