The primary objective of SectionSeeker AI was to develop a search bot that lets users query their uploaded documents. The bot identifies the relevant sections and leaves interpretation of the results to the user.

LucasMYSDev/SectionSeeker_AI

🤖 SectionSeeker AI

📌 Overview

SectionSeeker AI is a search tool that pinpoints relevant sections within uploaded DOCX files. It accepts either keywords or full sentences as queries. Unlike traditional exact-match search, it uses AI models to interpret the query and match it to sections by meaning.

✨ Core Features

  • DOCX File Support: Instantly render your DOCX files searchable.
  • Dual Search Mode: Whether it's a specific keyword or a full-fledged sentence, we've got you covered.
  • Multiple Document Selection: Why limit to one? Search across several uploaded documents simultaneously.

[Demo video: SectionSeeker_AI_Chat_Demo.mov]

🔧 Technical Details

SectionSeeker AI uses an embedding search technique built on the text-embedding-ada-002 model from OpenAI. It also uses the gpt-3.5-turbo-16k model for AI-driven filtering while it learns from uploaded documents. This lets users ask questions in natural language instead of relying on exact-match tools such as the "find" feature in standard browsers.
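
As a rough illustration of embedding search, the sketch below ranks sections by cosine similarity between a query vector and per-section vectors. It is not taken from the project's code: the function names, the toy 3-dimensional vectors, and the choice of cosine similarity as the metric are all assumptions made for the example (real text-embedding-ada-002 vectors have 1,536 dimensions and would come from the OpenAI API).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sections(
    query_vec: Sequence[float],
    section_vecs: Dict[str, Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the top_k (section title, similarity) pairs, best match first."""
    scored = [
        (title, cosine_similarity(query_vec, vec))
        for title, vec in section_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors standing in for real embeddings.
sections = {
    "Leave Policy": [0.9, 0.1, 0.0],
    "Expense Claims": [0.1, 0.9, 0.0],
    "Office Hours": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

for title, score in rank_sections(query, sections, top_k=2):
    print(f"{title}: {score:.3f}")
```

Because the comparison happens between vectors rather than strings, a query does not need to share exact words with a section to rank it highly.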

📂 SectionSeeker AI Website Pages

  1. Contents Page
    • Organize and manage with folders.
    • Delete folders and remove uploaded documents.
    • Uploaded documents remain in the database until they are deleted.
    • Upload documents for the search bot to learn from.
    • View how the search bot interprets the uploaded documents.
  2. Chat Page
    • Chat directly with the search bot once your documents are uploaded.

⚡ Quick Start

To test and try SectionSeeker AI:

📖 How to Use

Step 0: Preparing Your Document

  • Section Identification: SectionSeeker AI primarily identifies sections through heading styles, but it also uses GPT to categorize larger sections into sub-sections.

  • Optimal Heading Styles: For the best results, apply only the essential heading styles. Adding further heading styles can, however, speed up the AI's processing.

  • Performance Insight: As a reference, a .docx file of about 4,000 words, formatted correctly, is processed in under 15 seconds.

  • Recognizable Styles: The AI identifies headings based on Microsoft Word styles with terms like "Headings" or "SubTitle". See the example below for clarification:

    [Image: heading_style_example]
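
The heading-based splitting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `HEADING_MARKERS`, `is_heading_style`, and `split_into_sections` are hypothetical names, the case-insensitive substring match is a guess at how style names are recognized, and the GPT-based sub-sectioning step is not modeled. The input is a list of (style name, text) pairs; with the python-docx library these could be read as `[(p.style.name, p.text) for p in Document(path).paragraphs]`, though whether the project uses python-docx is also an assumption.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed markers: any style whose name contains one of these is a heading.
HEADING_MARKERS = ("heading", "subtitle")


def is_heading_style(style_name: str) -> bool:
    """Case-insensitive check for a heading-like Word style name."""
    name = style_name.lower()
    return any(marker in name for marker in HEADING_MARKERS)


def split_into_sections(
    paragraphs: Iterable[Tuple[str, str]],
) -> List[Tuple[Optional[str], str]]:
    """Group (style_name, text) pairs into (title, body) sections.

    Text that appears before the first heading gets the title None.
    """
    sections: List[Tuple[Optional[str], str]] = []
    title: Optional[str] = None
    body: List[str] = []
    for style_name, text in paragraphs:
        if is_heading_style(style_name):
            # A new heading closes the section collected so far.
            if title is not None or body:
                sections.append((title, "\n".join(body)))
            title, body = text, []
        elif text.strip():
            # Skip empty paragraphs; keep everything else as body text.
            body.append(text)
    if title is not None or body:
        sections.append((title, "\n".join(body)))
    return sections


# Hypothetical document content for illustration.
paragraphs = [
    ("Normal", "Preface text."),
    ("Heading 1", "Leave Policy"),
    ("Normal", "Staff receive annual leave."),
    ("Normal", ""),
    ("Heading 2", "Carry-over"),
    ("Normal", "Unused days may roll over."),
]

for section_title, section_body in split_into_sections(paragraphs):
    print(repr(section_title), "->", repr(section_body))
```

A paragraph that is wrongly given a heading style would start a new section here, which is one way a single logical section can end up split in two.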

Step 1: Getting Started

  • Log in with the credentials given in the Quick Start section.

Step 2: Navigating Content

  • Access the content page by selecting "Contents" from the main navigation bar.

Step 3: Document Management

  1. Add a new folder.
  2. Choose a folder and upload your document.
  3. Review the processed documents by clicking on "Documents" within the Contents page.

Step 4: Engaging with Chat

  1. Access the chat interface via the "Chat" option in the navigation bar.
  2. Choose your desired document(s) to inquire about.
  3. Hit "Ask" or simply press "Enter" to submit your question.
  4. The AI will present the top 5 relevant sections, each with:
    • Document source and section title.
    • A relevance score (as a percentage).
    • The extracted text recognized by SectionSeeker AI.
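
One possible shape for a single result entry is sketched below. The `SearchResult` fields mirror the three items listed above, but the class, the `format_result` helper, the 0 to 1 score range, and the way the score is converted to a percentage are all assumptions for illustration; the project may compute and display relevance differently.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    document: str       # source document name
    section_title: str  # title of the matched section
    score: float        # assumed similarity in the range 0.0 to 1.0
    text: str           # extracted section text


def format_result(result: SearchResult, preview_chars: int = 80) -> str:
    """Render one result with its score as a percentage and a text preview."""
    # Clamp so out-of-range scores never display below 0% or above 100%.
    percent = max(0.0, min(1.0, result.score)) * 100
    if len(result.text) <= preview_chars:
        preview = result.text
    else:
        preview = result.text[:preview_chars].rstrip() + "..."
    return f"[{percent:.1f}%] {result.document} > {result.section_title}\n{preview}"


# Hypothetical result for illustration.
example = SearchResult(
    document="handbook.docx",
    section_title="Leave Policy",
    score=0.9838,
    text="Staff receive annual leave.",
)
print(format_result(example))
```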

Limitations of SectionSeeker AI for Documents

Please note the following constraints when using SectionSeeker AI:

  • Development Stage: As SectionSeeker AI is still in its early development stages, it currently supports only .docx file types.

  • Document Complexity: The tool may not work well with .docx files that contain images or intricate structures. It is best suited to plain-text documents, such as company handbooks or business agreements, which usually have a straightforward format.

  • Styles Dependency: SectionSeeker AI primarily uses heading styles to demarcate sections. If a .docx file contains styles that are incorrectly assigned, the tool may still process the document, but it can segment it in unintended ways. For instance, what should be a single section might be split into two. Search results for queries that target those sections may then be inaccurate.

To achieve the best results, ensure your document adheres to the guidelines and is compatible with the tool's current capabilities.

📂 Project Structure

  • Main Directory: SectionSeeker_AI
    • Essential files: requirements.txt, wsgi.py, config.py.
  • Subdirectory: flask_APP (Within SectionSeeker_AI)
    • Contains files and folders for styling, web interface templates, user registration, search bot web endpoints, and more.

🔗 Extend or Clone the Project

To expand upon or set up SectionSeeker AI on your local machine, check out the detailed installation guide on GitHub:
