
SearchEngine

A tiny search engine that uses a trie as its primary data structure to speed up lookups. It includes a custom HTTP server that supports GET requests and serves pages from a specific directory. The directory contents are generated by a bash script that splits text files into random HTML pages and adds links between them for indexing purposes. The Crawler downloads the pages from the web server, analyzes them, and follows the links to the rest of the pages and "web sites". After crawling has finished, the Crawler accepts remote commands, including search, over a telnet connection.

Installation

Simply fork or clone the project into a directory and give it the required permissions with chmod 755
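For example (the clone URL is taken from the repository path and may differ for your fork; chmod here targets the bash script, which is the file that needs execute permission):

git clone https://github.com/devkot/SearchEngine.git
cd SearchEngine
chmod 755 webcreator.sh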

Usage

While in the project's main directory, build everything by running make

Creating the web sites

./webcreator.sh root_dir text_file w p

root_dir: needs to be created beforehand and will store the web sites
text_file: text source to create the content of the web pages
w: number of web sites to be created
p: number of pages to be created per web site
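For example, to create 3 web sites with 5 pages each (the root directory and text file names here are illustrative):

mkdir mysites
./webcreator.sh mysites moby_dick.txt 3 5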

Starting the server

./myhttpd -p serving_port -c command_port -t number_of_threads -d root_dir

serving_port and command_port: the ports the server will be listening on (e.g. 8080)
number_of_threads: the number of threads the server will execute in parallel
root_dir: the directory specified in the web creator
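For example (the port numbers and thread count are illustrative; mysites is the root directory from the webcreator example above):

./myhttpd -p 8080 -c 9090 -t 4 -d mysites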

The server accepts the following commands on its command port via telnet: STATS, SHUTDOWN
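For example, using the command port from the invocation above:

telnet localhost 9090
STATS

SHUTDOWN stops the server in the same way.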

Crawler

./mycrawler -h host_or_IP -p port -c command_port -t num_of_threads -d save_dir starting_URL

host_or_IP: the domain name or the IP of the machine the server is running on
port: the server's port
command_port: the port on which to connect via telnet and query the crawler
num_of_threads: the number of threads the crawler will execute in parallel
save_dir: the directory where the crawler will store the downloaded pages
starting_URL: the URL of the web page where the crawling will start (e.g. http://localhost:8080/siteX/pageY_Z.html)
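For example, matching the server started above (the save directory and the exact starting page are illustrative; the page names depend on what webcreator.sh generated):

./mycrawler -h localhost -p 8080 -c 9091 -t 4 -d crawled_pages http://localhost:8080/site0/page0_1.html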

After the crawling has finished, you can connect to the command port via telnet and use the following commands: STATS, SEARCH word1 word2 ... word10, SHUTDOWN
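For example, connecting to the crawler's command port from the invocation above (the search terms are illustrative):

telnet localhost 9091
STATS
SEARCH whale ocean harpoon
SHUTDOWN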

Authors

Stylianos Kotanidis - University of Athens

License

This project is licensed under the MIT License - see the LICENSE.md file for details