I want to introduce q_transcribe, a simple tool to transcribe images using QWEN 2 VL AI models.
What did I do to write q_transcribe? I’ve added some simple logic to a CLI wrapper that Andy Janco wrote to run QWEN 2 VL from the CLI.
q_transcribe can be used to transcribe typed and handwritten text from any image.
How could it be used?

- Transcribe handwritten notes. One of the methods I use is freewriting longhand. Notetaking is often the first step in my writing process. But, at times, it can feel like a slog to transcribe 20 pages of handwritten notes. Enter q_transcribe.
- Transcribe handwritten archives. One of the projects I am working on with colleagues is an archival project in Colombia. We're using QWEN 2B to extract text from images as part of a longer pipeline. q_transcribe is a simplification of our workflow, which works on an image, a folder of images, or a folder of folders of images.
What is my contribution? I added logic to Andy Janco’s CLI wrapper to QWEN 2 VL’s sample code. My logic handles JPG, JPEG, and PNG files, sorts them, skips files that have already been transcribed, and chooses between CUDA (Nvidia GPU), MPS (Apple Silicon GPU), and CPU.
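The file-handling and device-selection logic described above can be sketched roughly as follows. This is a minimal illustration, not the actual q_transcribe code: it assumes transcriptions are saved as sibling `.txt` files next to each image, and the real script’s naming convention may differ.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def images_to_transcribe(folder: str) -> list[Path]:
    """Collect supported images (recursively, so folders of folders work),
    sorted, skipping any image that already has a transcription.

    Assumes transcriptions live beside each image as .txt files --
    an illustrative convention, not necessarily q_transcribe's own.
    """
    root = Path(folder)
    images = sorted(
        p for p in root.rglob("*") if p.suffix.lower() in IMAGE_EXTS
    )
    return [p for p in images if not p.with_suffix(".txt").exists()]

def pick_device() -> str:
    """Prefer CUDA, then MPS, then CPU, mirroring the choice described
    above. Falls back to "cpu" if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```

Checking the device once up front, and skipping already-transcribed files, means a long run over a big folder can be interrupted and resumed without redoing work.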
In my testing, it works with QWEN 2B on my M1 MacBook Pro with 16 GB of RAM, and on a https://lightning.ai server which offers free access to a GPU for researchers.
To install, clone the repository from GitHub, install the necessary dependencies, and then run it on a folder of images:
```shell
git clone https://github.com/dtubb/q_transcribe.git
cd q_transcribe
pip install -r requirements.txt
python q_transcribe.py images
```