FilipePires98/FastCount

Frequent Items Counting

A Study on Memory-Efficient Algorithms

Description

Determining the most frequent items in a data stream has many applications and is an active research topic. Processing data streams in a memory-efficient way raises challenges that are well worth exploring, and some existing solutions already offer effective optimization strategies.

In this project, we focus on one of the best-known approximate counting algorithms and use it to estimate the most frequent words in literary works by several authors, written in several languages. We compare its estimates with those of an exact counter and present a few conclusions drawn from applying the study to this dataset.
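The README does not say which approximate counter is implemented in /src. As an illustration only, the sketch below assumes a classic frequent-items algorithm, Misra-Gries (our choice, not necessarily the one used in this project), and contrasts it with an exact counter. The function names `exact_counts`, `misra_gries` and `top_n` are hypothetical and do not come from the repository.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def exact_counts(words: Iterable[str]) -> Counter:
    """Exact counter: one entry per distinct word, so memory grows with the vocabulary."""
    return Counter(words)


def misra_gries(words: Iterable[str], k: int) -> Dict[str, int]:
    """Misra-Gries summary keeping at most k - 1 counters.

    For a stream of n words, every word occurring more than n / k times is
    guaranteed to be kept, and each kept count underestimates the true
    frequency by at most n / k.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[str, int] = {}
    for word in words:
        if word in counters:
            counters[word] += 1
        elif len(counters) < k - 1:
            # A free slot is available: start tracking this word.
            counters[word] = 1
        else:
            # No free slot: decrement every counter and drop those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def top_n(counts: Dict[str, int], n: int) -> List[Tuple[str, int]]:
    """Return the n highest counts, breaking ties alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    words = "the cat and the dog and the bird saw the cat".split()
    print("exact :", top_n(exact_counts(words), 3))
    print("approx:", top_n(misra_gries(words, k=3), 3))
```

The design trade-off is the one the study examines: the exact counter stores every distinct word, while the Misra-Gries summary never holds more than `k - 1` entries, at the cost of underestimated counts and possibly missing words whose frequency is at most `n / k`.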

Repository Structure

/dataset - literary works taken from Project Gutenberg used as input data

/out - contains the programs' output

/report - the written report on the study conducted is made available here

/src - contains the source code, written in Python

Instructions to Run

First install all required packages:

$ pip3 install -r requirements

To run the word counting program, execute the following command:

$ python frequentWordFinder.py -d 1 -m 100 aliceInput/

Authors

The authors of this repository are Filipe Pires and João Alegria. The project was developed for the Advanced Algorithms course of the Master's degree in Informatics Engineering at the University of Aveiro.

For further information, please read our report or contact us at filipesnetopires@ua.pt or joao.p@ua.pt.
