I want to introduce q_transcribe, a simple tool to transcribe images using QWEN 2 VL AI models.
What did I do to write q_transcribe? I’ve added some simple logic to a CLI wrapper that Andy Janco wrote to run QWEN 2 VL from the CLI.
q_transcribe can be used to transcribe typed and handwritten text from any image.
How could it be used?

- Transcribe handwritten notes. One of the methods I use is freewriting longhand. Notetaking is often the first step in my writing process. But, at times, it can feel like a slog to transcribe 20 pages of handwritten notes. Enter q_transcribe.
- Transcribe handwritten archives. One of the projects I am working on with colleagues is an archival project in Colombia. We're using QWEN 2B to extract text from images as part of a longer pipeline. q_transcribe is a simplification of our workflow, which works on an image, a folder of images, or a folder of folders of images.
What is my contribution? I added logic to Andy Janco’s CLI wrapper to QWEN 2 VL’s sample code. My logic handles JPG, JPEG, and PNG files, sorts them, skips files that have already been transcribed, and chooses between CUDA (Nvidia GPU), MPS (Apple Silicon GPU), and CPU.
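The file-handling and device-selection logic described above can be sketched roughly as follows. This is a minimal illustration, not the actual q_transcribe code: it assumes transcriptions are saved as sibling `.txt` files next to each image, and the real script’s naming convention may differ.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def images_to_transcribe(folder: str) -> list[Path]:
    """Collect supported images (recursively, so folders of folders work),
    sorted, skipping any image that already has a transcription.

    Assumes transcriptions live beside each image as .txt files --
    an illustrative convention, not necessarily q_transcribe's own.
    """
    root = Path(folder)
    images = sorted(
        p for p in root.rglob("*") if p.suffix.lower() in IMAGE_EXTS
    )
    return [p for p in images if not p.with_suffix(".txt").exists()]

def pick_device() -> str:
    """Prefer CUDA, then MPS, then CPU, mirroring the choice described
    above. Falls back to "cpu" if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```

Checking the device once up front, and skipping already-transcribed files, means a long run over a big folder can be interrupted and resumed without redoing work.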
In my testing, it works with QWEN 2B on my M1 MacBook Pro with 16 GB of RAM, and on a https://lightning.ai server which offers free access to a GPU for researchers.
To install, clone the repository from GitHub, install the necessary dependencies, and then run it on a folder of images:
```shell
git clone https://github.com/dtubb/q_transcribe.git
cd q_transcribe
pip install -r requirements.txt
python q_transcribe.py images
```