Frequent Items Counting

A Study on Memory-Efficient Algorithms

Description

Determining the most frequent items in a data stream has many applications and is an active research topic. Processing a stream in a memory-efficient way raises challenges that are well worth exploring, and some existing solutions already offer effective optimization strategies.

In this project, we use one of the best-known approximate counters to estimate the most frequent words in literary works by several authors in several languages, and we compare its results with those of an exact counter. We also present a few conclusions drawn from applying the study to the dataset.
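To make the comparison between an exact and an approximate counter concrete, here is a minimal sketch. This description does not name the approximate counter used in the project, so the example uses the Misra-Gries summary purely as an illustration of the idea; it may differ from what is implemented in /src. The function names, the parameter k, and the sample sentence are all assumptions made for this example.

```python
from collections import Counter


def exact_counts(words):
    """Exact counter: keeps one entry per distinct word."""
    return Counter(words)


def misra_gries(words, k):
    """Illustrative Misra-Gries summary (an assumption, not necessarily
    the counter used in this project).

    Keeps at most k - 1 counters. Each stored estimate is never above the
    true count and falls short of it by at most n / k, where n is the
    number of words processed.
    """
    counters = {}
    for word in words:
        if word in counters:
            counters[word] += 1
        elif len(counters) < k - 1:
            counters[word] = 1
        else:
            # No free slot: decrement every counter and drop the zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical input, standing in for a tokenized literary work.
words = "the cat and the dog and the bird saw the cat".split()
k = 3

exact = exact_counts(words)
approx = misra_gries(words, k)

print("exact :", exact.most_common(3))
print("approx:", sorted(approx.items(), key=lambda item: -item[1]))
```

The exact counter needs memory proportional to the number of distinct words, while the summary never holds more than k - 1 entries. The price is that its counts are underestimates, so it is best read as a list of candidate frequent words rather than as precise frequencies.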

Repository Structure

/dataset - literary works taken from Project Gutenberg used as input data

/out - contains the programs' output

/report - contains the written report on the study

/src - contains the source code, written in Python

Instructions to Run

First install all required packages:

$ pip3 install -r requirements

To run the word counting program, execute the following command:

$ python frequentWordFinder.py -d 1 -m 100 aliceInput/

Authors

The authors of this repository are Filipe Pires and João Alegria, and the project was developed for the Advanced Algorithms Course of the Master's degree in Informatics Engineering of the University of Aveiro.

For further information, please read our report or contact us at filipesnetopires@ua.pt or joao.p@ua.pt.
