
Add a task for automatic text recognition #455

Open
PonteIneptique opened this issue Jan 25, 2024 · 7 comments
Labels
tasks @huggingface/tasks related

Comments

@PonteIneptique

Hi :)
We are in the process of building a pipeline to help people publish their HTR/OCR ground-truth data to Hugging Face in the context of HTR-United, and we have a fair amount of data ourselves.
Would it be possible to have an ATR (Automatic Text Recognition) or OCR/HTR (Optical Character Recognition / Handwritten Text Recognition) task to register our datasets under, instead of the much broader Vision to Text, which seems more focused on image-description datasets?
Thanks!

@coyotte508 coyotte508 added the tasks @huggingface/tasks related label Jan 25, 2024
@coyotte508
Member

cc @merveenoyan @osanseviero

@osanseviero
Member

cc @sanchit-gandhi and @Vaibhavs10 for our audio experts :)

@Vaibhavs10
Member

This is more vision, no?

@PonteIneptique
Author

This is more Vision than Text (although that depends on who you ask...), but I don't think Multimodal > Vision-to-text is a good match for HTR/OCR/ATR.

@osanseviero
Member

Sorry for my confusion, I read too quickly and did string matching with ASR 🥲

Yes, this is indeed vision. In the past, OCR models have been tagged as image-to-text, for example https://huggingface.co/microsoft/trocr-base-handwritten. I think we could potentially keep image-to-text and add a secondary subtype for this use case (either ocr or atr, as suggested). WDYT @merveenoyan @NielsRogge @lhoestq?

@lhoestq
Member

lhoestq commented Jan 26, 2024

I'm OK with adding a new task_id "ocr" or "optical-character-recognition" under "image-to-text".

@merveenoyan
Contributor

I agree with @lhoestq.


6 participants