DejaText is a Python script for identifying duplicate and similar text in a directory of text or Markdown files. It scans a directory of .txt or .md files, identifies duplicate and similar text segments, and produces organized reports for easy review. As part of my writing, I find it useful to go through a project and flag repeated words, phrases, or sentences. DejaText helps me with this.
Daniel Tubb
Writer’s Diary #52: Repeated Words
Today was revision. Cutting and tightening a few sections on pencils. A week ago it was 6,000 words. Now, it’s 4,000. The task today was words and phrases that are superfluous. Overused. Bugaboos. It’s not that all repeats are bad. But, the trick is to be deliberate. My drafts are full of words and phrases reused, without deliberation. They can often be cut. The idea for this came to me from John McPhee’s Draft No. 4.
It is toward the end of the second draft, if I’m lucky, when the feeling comes over me that I have something I want to show to other people, something that seems to be working and is not going to go away. The feeling is more than welcome, but it is hardly euphoria. It’s just a new lease on life, a sense that I’m going to survive until the middle of next month. After reading the second draft aloud, and going through the piece for the third time (removing the tin horns and radio static that I heard while reading), I enclose words and phrases in pencilled boxes for Draft No. 4. If I enjoy anything in this process it is Draft No. 4. I go searching for replacements for the words in the boxes. The final adjustments may be small-scale, but they are large to me, and I love addressing them. You could call this the copy-editing phase if real copy editors were not out there in the future prepared to examine the piece. The basic thing I do with college students is pretend that I’m their editor and their copy editor. In preparation for conferences with them, I draw boxes around words or phrases in the pieces they write. I suggest to them that they might do this for themselves.
This is an early step. Cut early, then revise with care.
I use tools: a script that lists repeated words, and Pro Writing Aid, which has a tool to list repeated words and phrases.
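A script of this kind can be quite small. What follows is a minimal sketch, not the script I actually use and not how DejaText or Pro Writing Aid work internally: the stop-word list, the function names, and the three-word phrase length are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "is", "that", "i"}


def tokenize(text):
    """Lowercase the text and split it into words, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def repeated_words(text, min_count=2):
    """Return (word, count) pairs for non-stop words used at least min_count times."""
    counts = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
    return [(w, c) for w, c in counts.most_common() if c >= min_count]


def repeated_phrases(text, n=3, min_count=2):
    """Return (phrase, count) pairs for n-word sequences seen at least min_count times."""
    words = tokenize(text)
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    counts = Counter(grams)
    return [(g, c) for g, c in counts.most_common() if c >= min_count]


draft = "The trick is to be deliberate. The trick is to cut early, then cut again."
print(repeated_words(draft))    # [('trick', 2), ('cut', 2)]
print(repeated_phrases(draft))  # [('the trick is', 2), ('trick is to', 2)]
```

The list is only a prompt for the pencilled boxes. Deciding which repeats are deliberate is still the writer's job.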
NaNoWriMo Update #4: Artisanal Writing
What’s my update today? I was working on a section on cane toads and sapos, and then moved on to spies and the anxiety of fieldwork.
I think I’m using it all to get at a discussion of the basic extractive nature of research, which I don’t contest. However, I’m not totally convinced that the normal solutions people propose (participatory research, though do people really want more workshops?) solve the issue. But, in any case, I am still committed to the ethnographic endeavour. So, how does one square that circle? I don’t think you can. So, my answer is a. It comes down to the labour of ethnography, both in the field and afterwards.
The conditions of the research and the writing, as it were. It would seem silly to ask my friends from the field to participate in my half-decade-long writing process. But, to see that writing as labour, as embodied, as a practice, is to think of it, perhaps, like artisanal gold mining or subsistence production. Maybe? Is this a point worth pursuing? My writing process, at least as I imagine it to make myself feel better about being a cane toad, sapo, snitch, inquisitorial ethnographer, is artisanal, makeshift, craft work.
Maybe what I’m trying to get at is the difference between an open-pit mega mine and an artisanal gold mine. Both are extractive, but the latter does little harm. I of course imagine myself on that latter side of things.
This is what I was trying to articulate today. Didn’t come together, but I’ll try tomorrow.
No finishing Friday today. But, maybe 5000 words are good.
Step by step; bird by bird.
NaNoWriMo Update #3: Zonked by Words
It’s 2 p.m. I’m driving home from the office without my computer. I got to campus at 8:30, went to a coffee shop, and wrote intensely until about 11:00. Then I went to my office, did some more, had lunch, worked a little bit with a student on a grant, then did some more writing on the book. I’m done for the day, and it’s only 2:30, which is good because I’m zonked. I’ll go home for a hike. It feels like cheating, stopping early. But, it’s already been a six-hour day of editing, cutting, quoting, polishing, revising, and rewriting. More manual than intellectual, really. The book’s argument is that writing can be a form of manual labor, after all. The section is one I have picked at for years. Maybe five years? It’s about fieldwork and feelings of anxiety, of being a spy, an outsider, out of place, doing something where one doesn’t belong. It’s a common feeling, I suspect, for anthropologists. It’s also about cane toads, gossips, tattletales, snitches, and spies. It’s now 800 words or so. That doesn’t seem like a lot, but because it’s tight, it’s dense.
At the end of the day, before checking out, I did some rough notes for tomorrow using voice dictation. It’s about thinking about ethnography as both an extractive and an artisanal endeavor. Tomorrow I’ll tighten up the rest of what I’ve already written and then do a first pass on the notes. Friday I’ll finish the whole chapter. It will be about 7,000 words.
I have no idea how that will count for NaNoWriMo records. I’ve been doing a lot in the last few weeks, but haven’t updated or tracked word counts. The election derailed me a lot. Then, I did a bit of programming. Today and Friday were pretty good writing days.
NaNoWriMo Update #2
Today’s brief update: I went into my Cane Toad section and reviewed it. I worked on structure and coded my really rough notes into a draft outline. This meant putting pieces about the same thing together. I structur. Tomorrow, I’ll tighten each section down to remove all repetition, get the tense correct, and make it as short as possible. Wednesday, I’ll step back and look at the whole.
While I was at it, I put into order the section on places where I did fieldwork, and described the work of writing in the field.
In total, I have 6,000 words that are in a tentative order. Is it perfect? Good? No. Will it change? Yes. But, it gives me something to work on tomorrow. I will take each small section of maybe 150 or 200 words and edit them individually.
Once I’ve done that, I can review the whole and see what I have on Thursday.
So far, 6,200 words. A win. I’d say. It’ll give me 2,500 words a day, give or take. On track for NaNoWriMo, I hope. But, word counts are somewhat silly if I think too hard about them. But, since my goal is not a word count, but a draft in a few weeks, it feels okay.
NaNoWriMo Update #1
Update on Today’s Writing.
I was at my mum’s and started flipping through Kyo Maclear’s book Unearthing, about plants, gardens, and tangled roots. On the third page, there’s a line that left me wondering. In it, Maclear reflects on how her own failure to grow plants transferred in 2019. She describes the crucial change:
“When I stopped attributing every little event to my own doing and realized I did not have control (the opposite of a storyteller’s mindset), the plants began to grow” (p. 3).
I know the feeling, but I feel it with words.
I know the art of writing involves an attempt at control. Part of why words can be so torturous is because we’re trying to make them perfect. But in this book, I’m increasingly convinced that the trick is to give up control—to let things come, to recognize that there is agency in the words, in the fingers, and in the process that isn’t merely a reflection of the mind’s control. The result of all this hard work reads like something that comes out of control, but the output has little resemblance to the process.
I wrote a section on this. It’s a bit like Peter Elbow’s metaphor of “growing and cooking.” One grows a garden, where things are messy and uncontrolled. Later, one cooks a meal. But even in cooking, there’s a lot that is out of control. Or, maybe, much of cooking is based on practical knowledge. Certainly, this is De Certeau’s point. Writing is both embodied and practical, of course. The point: it’s not fully controlled.
With all that written in shitty first draft rough sketch, I then turned to a section on cane toads. I’d written five versions of a cane toad hopping into the room I slept in. I worked to cut and revise them into one canonical section.
My point?
Cane toads in Colombia are also known as sapos. Sapo is a colloquial term for snitch and spy, often spoken with venom. Sapos often die young in a country at war. Is writing ethnography an exercise in getting into places one does not belong? Are we not professional strangers, but spies, tattletales, snitches? Anthropologists are often mistaken for spies. But, I don’t think that feels fair. Yet, it does reflect a lot of our professional anxieties.
All in all, with about an hour’s work, little abstract thought, and certainly not much planning, I wrote 800 new words on writing as gardening without control. I also revised and condensed several drafts I’d written over the years about a cane toad into a tighter scene of about 3,000 words. So, let’s say, Day Three of NaNoWriMo, I got about 4,000 words done. Makes up for not really doing much yesterday.
Writer’s Diary #51: NaNoWriMo
This is a quick update. It’s November—time for NaNoWriMo, time for ambitious goals, audacious writing, even if the words themselves at the end will be ever contingent. The words end up being imperfect. Often, in many cases, so imperfect as to be nearly useless. But the aim is to get something done.
On Friday, my task was a “Finishing Friday.” I sent an article that I’ve been fiddling with for a long time. Is it good? No. Is it perfect? No. Am I happy with it? Not really. But I sent it off to a journal. It will get reviewed, sent back, and then I’ll try again. Finishing Friday.
I think this month is going to be something similar: Finishing November. Or, of course, NaNoWriMo. My goal this year? “Finish the Goddamn Book Writing Month.”
I have a small writing group with some friends. One of them, along with me, is adopting some goals. She’s going to write the first three chapters of the book she’s working on.
My goal? I will revise and reorder and magpie my way into a complete draft of the book by the end of the month.
What does that mean? At first blush, that means 90,000 words.
90,000 words is a good-sized academic book. Mine will be ordered, broken into scenes, with narrative and argument woven together, in support of a makeshift way of proceeding.
I don’t mean 90,000 words, perfect. I don’t mean done, for good. I don’t mean tight as I can get it.
But, I do mean that I want to take forward momentum, stop revising, and weave together a book of about 90,000 new words, organized, put into a temporary, contingent place, lightly polished enough that I can get a feel for the whole thing.
That’s the task.
What are the milestones?
Let’s see. First, a word budget can help.
90,000 words is a good-sized academic book, at least according to William Germano (2009, Getting It Published). Let’s break that down: take out 5,000 words for references and another 10,000 words for notes. That gets us to maybe 80,000. Add in some padding both ways. Say, 75,000 words. So, we’re left with getting a draft of about 75,000 words.
I’ve already got 5,000 words polished. So, that means my task for November is 70,000 words.
I have 10,000 words from Finishing Friday. So, that leaves 60,000 words.
60,000 words is the goal. It’s November 3rd today. Let’s break 60,000 into 27 days. My goal is to write, revise, or reorder 2,500 words or so each day. Seems audacious. But, it’s doable. I’ve done it before. Quite a few times actually.
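The word budget above is simple arithmetic, and it can be checked in a few lines. This is only a sketch using the figures from this post; the variable names are mine.

```python
import math

# Figures taken from the post above.
draft_target = 75_000      # draft length once references, notes, and padding are set aside
already_polished = 5_000   # words already polished
finishing_friday = 10_000  # words from Finishing Friday
days_left = 27             # days counted from November 3rd

remaining = draft_target - already_polished - finishing_friday
per_day = math.ceil(remaining / days_left)

print(f"Words still to write, revise, or reorder: {remaining:,}")
print(f"Minimum per day over {days_left} days: {per_day:,}")
```

The minimum works out to 2,223 words a day, so a daily goal of 2,500 or so leaves a little slack for slow days.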
Crucially, my task right now is not to write 2,500 words. Or at least, not most of the time. Rather, it’s to code, reorganize, gather, bring together, cut up.
The model is more like making a patchwork quilt than knitting something from scratch.
But, the method is like a magpie. Taking shiny things, bringing them together, attacking them, seeing how they work.
Crucially, at this stage of the game, it’s not much thinking. It’s manual work. Craft-like. Physical labour.
Wish me luck.
I’ll do updates, daily.
q_transcribe
I want to introduce q_transcribe, a simple tool to transcribe images using QWEN 2 VL AI models.
What did I do to write q_transcribe? I’ve added some simple logic to a CLI wrapper that Andy Janco wrote to run QWEN 2 VL from the CLI.
q_transcribe can be used to transcribe typed and handwritten text from any image.
How could it be used?
- Transcribe handwritten notes. One of the methods I use is freewriting longhand. Notetaking is often the first step in my writing process. But, at times, it can feel a slog to transcribe 20 pages of handwritten notes. Enter, q_transcribe.
- Transcribe handwritten archives. One of the projects I am working on with colleagues is an archival project in Colombia. We’re using QWEN 2B to extract text from images as part of a longer pipeline. q_transcribe is a simplification of our workflow, which works on an image, a folder of images, or a folder of folders of images.
What is my contribution? I added logic to Andy Janco’s CLI wrapper to QWEN 2 VL’s sample code. My logic handles JPG, JPEG, or PNG files, sorts them, skips files that have already been transcribed, and chooses between CUDA (Nvidia GPU), MPS (Apple Silicon GPU), or CPU.
In my testing, it works with QWEN 2B on my M1 MacBook Pro with 16 GB of RAM, and on a https://lightning.ai server which offers free access to a GPU for researchers.
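The file handling and device choice described above can be sketched roughly as follows. This is an illustration, not the code in the repository: the function names are hypothetical, and the rule that a transcription sits next to its image as a .txt file is an assumption made for the example.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def pick_device():
    """Prefer CUDA, then MPS, then fall back to the CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so there is nothing else to choose
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def pending_images(root):
    """Return image paths under root, sorted, minus those already transcribed.

    Assumption: a transcription is saved beside its image with a .txt suffix.
    """
    root = Path(root)
    candidates = [root] if root.is_file() else root.rglob("*")
    images = sorted(
        p for p in candidates
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    return [p for p in images if not p.with_suffix(".txt").exists()]


if __name__ == "__main__":
    print("device:", pick_device())
    for image in pending_images("images"):
        print("would transcribe:", image)
```

Because rglob walks subfolders, the same function covers a single image, a folder of images, or a folder of folders. Skipping images that already have output means an interrupted run can be restarted without redoing work.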
To install, clone the repository from GitHub, install the necessary dependencies, and then run.
git clone https://github.com/dtubb/q_transcribe.git
cd q_transcribe
pip install -r requirements.txt
python q_transcribe.py images
Writer’s Diary #50: Become a Writer
Writing is a craft; it takes practice. The evocative power of ethnography to convey understanding requires careful attention to words. Words matter. Notes, fragments and jottings are the opening gambit of anthropology. They give energy to anthropologists’ writing and give us subjects to write about. We often begin with stories, fragments, and ethnographic shorts. These are the building blocks of an anthropological enterprise. But, to write them requires reflecting on the writing process. So, why not experiment? Play. See what works. Read too. Keep reading. Write about what you read. Publish before you are ready. Write more. Rinse. Repeat. Writing is a practice.
Writing as a practice needs to be decolonised. Writing is often a black box, something that undergraduates, graduate students, and professors aren’t really taught how to do.
As students, we engage with finished pieces and rarely see the messiness of the writing process. Writing is messy. It’s so, so messy. Write. Cut. Revise. Reorder. Move. Writing is done on the page. But, the work can be creative and playful. It need not be a slog. It can be a place to experiment. So, play. Play with free writing, play with genres, write essays, research notes, book reviews, articles, blog posts, social media posts. Experiment with writing short. Play with writing long. Write to think. Experiment with collaborative and group writing. Experiment in the classroom. See what works. See what doesn’t. Failure is fine. Try again.
What I am trying to say is: separate the writing from the anxieties and emotions of academic work. Ideas are important, sure. But, for me, they always emerge, truly, on the page. It’s on the page where I can make them do wonders. Where I can test them out. Feel them. Taste them. Let the words sing.
As Chilean Poet Pablo Neruda wrote in his memoir:
… You can say anything you want, yessir, but it’s the words that sing, they soar and descend … I bow to them … I love them, I cling to them, I run them down, I bite into them, I melt them down… I love words so much… The unexpected ones… The ones I wait for greedily or stalk until, suddenly, they drop… Vowels I love… They glitter like colored stones, they leap like silver fish, they are foam, thread, metal, dew… I run after certain words … They are so beautiful that I want to fit them all into my poem … I catch them in mid-flight, as they buzz past, I trap them, clean them, peel them, I set myself in front of the dish, they have a crystalline texture to me, vibrant, ivory, vegetable, oily, like fruit, like algae, like agates, like olives… And then I stir them, I shake them, I drink them, I gulp them down, I mash them, I garnish them, I let them go… I leave them in my poem like stalactites, like slivers of polished wood, like coals, pickings from a shipwreck, gifts from the waves… Everything exists in the word… An idea goes through a complete change because one word shifted its place, or because another settled down like a spoiled little thing inside a phrase that was not expecting her but obeys her… They have shadow, transparence, weight, feathers, hair, and everything they gathered from so much rolling down the river, from so much wandering from country to country, from being roots so long … They are very ancient and very new… They live in the bier, hidden away, and in the budding flower …
— Pablo Neruda, “The Word,” in Memoirs. Penguin Books, 1978, p. 53.
Ideas that are unwritten cannot be made external, or thought about, or tested, or changed, or made to sing. Instead, they remain amorphous. Always a potential. Write them, reflect on them, rework them. Revise. Writing is fun. As fun, revising.
Don’t say, “What am I going to write about today?” Instead, go back. Ask “What have I already written about?”
Try this exercise. Make a slip box. Put into it a collection of notes and essays and pieces of text you’ve already worked on.
Your task?
Engage with what you’ve already prepared. Reread it. Recycle. Work like British Marxist Eric Hobsbawm did. Develop a willingness to go back into the well of ideas that are yours and rework them. Revise them into new forms. Test them in public. Write lectures, op-eds, letters, articles, book chapters, and books from each other.
Writing becomes a task of preparing for publication, rather than starting carte blanche.
The trick? Not to draft or write every day, but to prepare for publication. Preparing for publication short pieces, long pieces. Book reviews, articles, essays. It’s not the biggest and most complicated piece; it would be small pieces. What’s the smallest piece you could work on? Start with that.
Start with the words. See what comes from them.
I am working on a book that revisits particular moments from Colombia and which, for now, is also about writing. I say for now, because I keep changing the book. At times it’s both. At times, it’s neither.
But, its building blocks are ethnographic shorts. These are moments that let me build stories, place those stories in a wider context, and make more general claims. My method is inductive, relying on detailed description through elaboration and explanation, trying to make the particular speak to the universal. I boil moments down to their essence, return to field notes to expand and explain, develop one moment and link it to the next, and supplement my accounts with others’ reports: newspapers, archives, videos, and research. I decide what to focus on and whose voices to hear.
The words matter too, though. Play with them.
Punctuate dialogue, consider verb choice, and revise for active voice unless passive voice is preferred. The words matter. I use transitive verbs to drive description and animate the inanimate. Strive for clarity. Write then rewrite to assemble moments and analysis. Consider exposition, character, scenes, narrative voice, point of view, time and rhythm. I move around, change perspectives, take different approaches and revisit the same theme from different directions. I remove myself from the action or take someone else’s point of view. I adopt an omniscient perspective and choose my approach.
Each is an ethnographic choice, a way of writing to describe and make sense of moments. The raw material for this slip box is the field notes, written on the laptop or in one of the journals or notebooks, as the beginning of a long, slow, sustained process of my becoming a writer.
As a graduate student, in the field, in the Chocó, when there was electricity (carried by an unreliable power line through the jungle and across at least two rivers), I filled a database on my computer with field notes. When there was no electricity because of storms, rain and fallen branches, I wrote longhand with a pen and the light of a candle. Writing the notes was part of my learning to write. The book I am working on reflects on that learning because I am convinced that the evocative power of ethnography to give understanding requires careful attention to words.
Writing requires thinking, but it also requires playing, experimenting, and refining different techniques and processes to discover effective ways of communicating.
Writer’s Diary #49: Digital Workshops
At its best, my computer is not a distraction, but a place to work—a digital workshop. A text workshop, not so different from a carpenter’s workshop with its wood, chisels, drafting tables, power tools, planers, band saws, and jigsaws. In my digital workshop, many things are at hand.
Partly, I mean the storage—the hard drives and flash drives where I keep field notes, first drafts, projects in progress, publications, finished notes, video, film, maps, and photographs. Some of it comes from the computers, laptops, and iDevices I have used over the years: the tablets and readers. Much of it is the detritus accumulated over two decades as a student and then as an academic, stored in various folders. Mostly, I mean the tools. The tools of the word processors, screenwriting apps, mind-mapping apps, search tools, bibliographic managers, and search engines.
One of my favorite pieces of software is Eastgate Systems’ Tinderbox, which is, in many ways, a digital equivalent of an analogue notebook and a carpenter’s workshop and much more besides.
It’s a digital tool, and a place to work with text and to do things that were impossible before the digital age. To write, link, make maps, collect, edit, cut up, revise, reorder, outline, search, and much more. Mark Bernstein, the lead developer at Eastgate Systems, has offered regular updates for decades.
Tinderbox is a Swiss Army knife for notes, providing a single interface that suits the way I work.
It also has a powerful set of programming and automation tools that allow me to work with notes.
I think of it the same way John McPhee thinks of his tools.
John McPhee, one of the most prolific writers of long-form creative non-fiction, has an article in The New Yorker about his writing process, which became part of his book Draft No. 4. McPhee tells the story of lying on a picnic table with all his notes, research, interviews, and everything else in manila envelopes, but he’s distraught because he doesn’t know the structure. He says this is no way to write.
I agree with him, and indeed, structure is the hard part.
McPhee used to use analogue tools to find structure. But by the 1980s, he had adopted a computer and, with it, a text editor. Kedit, from Mansfield Software Group, is a full-screen text editor. McPhee describes how he moved chunks of text around using custom-built text macros to code, split up, and bring back together text. It’s something I’ve duplicated for my own work.
For me, Tinderbox is my computer. A lot of writing is rewriting and revising, linking and connecting, making connections, and undertaking an archaeology of your own ideas and notes. Tinderbox is, for me, a powerful tool for that.
It’s the heart of my digital workshop.
Still, at times the computer is a place of distraction. There are times when I sit down with nothing but a pen and write longhand for an hour to see what comes out.