I have a task on my hands.
Basically, I have to create a simple search engine that goes through a group of text documents and records, for each word in the collection, every document that contains that word.
The search engine must accept a search query (a set of keywords) and identify each document that contains all or some of the keywords.
It should then print the document names in descending order of the number of keywords found, so the document that contains all the keywords appears at the top of the list.
I'm struggling with the pseudocode, let alone the program.
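The structure described above, where each word maps to the documents that contain it, is usually called an inverted index. Below is a minimal Java sketch of one, under assumptions that are not in the question: the documents are already loaded as (name, text) strings, a "word" is a run of letters or digits, and matching ignores case. The names `InvertedIndex`, `addDocument` and `search` are hypothetical.

```java
import java.util.*;

// Sketch of an inverted index: each word maps to the names of the documents containing it.
// Assumptions: documents arrive as (name, text) strings, a word is a run of letters/digits,
// and comparison is case-insensitive.
class InvertedIndex {
    private final Map<String, Set<String>> index = new HashMap<>();

    // Record every word of one document against that document's name.
    void addDocument(String name, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                index.computeIfAbsent(token, k -> new HashSet<>()).add(name);
            }
        }
    }

    // Names of documents containing at least one keyword, most distinct keywords first.
    List<String> search(Collection<String> keywords) {
        Set<String> distinct = new HashSet<>();
        for (String kw : keywords) distinct.add(kw.toLowerCase(Locale.ROOT));

        // Count, per document, how many of the distinct keywords it contains.
        Map<String, Integer> hits = new HashMap<>();
        for (String kw : distinct) {
            for (String doc : index.getOrDefault(kw, Collections.emptySet())) {
                hits.merge(doc, 1, Integer::sum);
            }
        }

        List<String> result = new ArrayList<>(hits.keySet());
        // Descending by keyword count; ties broken alphabetically so the output is deterministic.
        result.sort((a, b) -> {
            int byCount = Integer.compare(hits.get(b), hits.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return result;
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.addDocument("a.txt", "The cat sat on the mat.");
        idx.addDocument("b.txt", "A dog and a cat.");
        idx.addDocument("c.txt", "Other things entirely.");
        System.out.println(idx.search(List.of("cat", "the", "mat"))); // [a.txt, b.txt]
    }
}
```

The index is built once, so each query only looks up its keywords instead of rescanning every file; `c.txt` is not returned because "Other" is indexed as its own word, not as "the".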
For example, the pseudocode might be:
> define a class Result with variables int count and String filename
> make an ArrayList (or other collection) to add Results to
> get the list of file names from the directory
> get the list of keywords from the user
> for each file in file names do:
>> set count = 0
>> for each keyword do:
>>> search the file for the keyword
>>> if found: count++
>> if count > 0: add a Result(count, filename) to the list
> sort the list by count in descending order
> print the filename of each Result
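Here is one way the pseudocode above could look in Java, as a sketch rather than a finished solution. It assumes Java 11+ (for `Files.readString`), UTF-8 text files in a single directory, and that a keyword counts as found when it appears at least once as a whole word, ignoring case. `KeywordSearch`, `countKeywords` and `search` are illustrative names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Translation of the pseudocode. Assumptions: the directory holds UTF-8 text files, and a
// keyword is "found" if it appears at least once as a whole word, ignoring case.
class KeywordSearch {

    static class Result {
        final int count;        // number of keywords found in the file
        final String filename;
        Result(int count, String filename) { this.count = count; this.filename = filename; }
    }

    static int countKeywords(String text, List<String> keywords) {
        int count = 0;
        for (String kw : keywords) {
            // \b on both sides avoids matching "the" inside "other"
            Pattern p = Pattern.compile("\\b" + Pattern.quote(kw) + "\\b", Pattern.CASE_INSENSITIVE);
            if (p.matcher(text).find()) count++;
        }
        return count;
    }

    static List<Result> search(Path dir, List<String> keywords) throws IOException {
        List<Path> files;
        try (Stream<Path> s = Files.list(dir)) {
            files = s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<Result> results = new ArrayList<>();
        for (Path file : files) {
            int count = countKeywords(Files.readString(file), keywords); // count restarts per file
            if (count > 0) results.add(new Result(count, file.getFileName().toString()));
        }
        // Highest count first; List.sort is stable, so ties stay in file-name order.
        results.sort((a, b) -> Integer.compare(b.count, a.count));
        return results;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java KeywordSearch.java <directory> <keyword>...");
            return;
        }
        List<String> keywords = Arrays.asList(args).subList(1, args.length);
        for (Result r : search(Paths.get(args[0]), keywords)) {
            System.out.println(r.filename + " (" + r.count + " keywords)");
        }
    }
}
```

For example, `java KeywordSearch.java docs cat the mat` would list the files under `docs` that contain any of the three words, with files containing all three first.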
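I don't know Java, but in C# I'd read the whole file with System.IO.File.ReadAllText(String) and then search it with a regular expression.
I definitely would not use IndexOf, because it matches substrings and will give false positives: searching for "the" would also find "other" and "there". A regular expression with word boundaries (\b) only matches whole words.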
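string text = System.IO.File.ReadAllText ( args [ 0 ] ) ; // args [ 0 ] is the path of the file to search
System.Text.RegularExpressions.Regex reg = new System.Text.RegularExpressions.Regex
( @"(?i)\b(a|the|this)\b" ) ; // Build the expression from the provided terms; the group keeps \b around every term
System.Text.RegularExpressions.MatchCollection mat = reg.Matches ( text ) ;
System.Console.WriteLine ( mat.Count ) ; // Number of whole-word matches (all occurrences, not distinct terms)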
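A rough Java counterpart of the C# snippet, using `java.util.regex`. It assumes keywords begin and end with word characters (otherwise `\b` is not a whole-word test) and that matching should ignore case. `mat.Count`, like `countMatches` here, counts every occurrence; for the ranking in the question you want the number of distinct keywords per document, which `distinctMatches` returns. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Java counterpart of the C# snippet: one case-insensitive pattern with \b around the whole
// alternation. Pattern.quote stops keywords being read as regex syntax; \b still assumes
// each keyword starts and ends with a word character.
class WordBoundaryDemo {

    static Pattern keywordPattern(List<String> keywords) {
        StringBuilder alt = new StringBuilder();
        for (String kw : keywords) {
            if (alt.length() > 0) alt.append('|');
            alt.append(Pattern.quote(kw));
        }
        return Pattern.compile("\\b(?:" + alt + ")\\b", Pattern.CASE_INSENSITIVE);
    }

    // Total whole-word occurrences, the same quantity mat.Count reports in C#.
    static int countMatches(String text, List<String> keywords) {
        Matcher m = keywordPattern(keywords).matcher(text);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    // Distinct keywords found (lowercased), which is what the ranking needs.
    static Set<String> distinctMatches(String text, List<String> keywords) {
        Matcher m = keywordPattern(keywords).matcher(text);
        Set<String> found = new HashSet<>();
        while (m.find()) found.add(m.group().toLowerCase(Locale.ROOT));
        return found;
    }

    public static void main(String[] args) {
        String text = "There is another theory in this book.";
        // indexOf finds "the" inside "There", "another" and "theory": a false positive.
        System.out.println(text.toLowerCase(Locale.ROOT).indexOf("the") >= 0); // true
        System.out.println(countMatches(text, List.of("a", "the", "this")));   // 1
    }
}
```

Only "this" matches as a whole word in the sample sentence, while `indexOf` reports a hit for "the" three words earlier.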