[supplied title]

I think I can make this work

2024-05-08T00:00:00-07:00

... and by this, I mean "this blog." I think I can make this blog work. I mean that both in a personal sense (I'm going to write again) and in a technical sense (I'm probably not going to break it).

For my first post in two years, I figure I should write about why I almost gave up on running a static site, and why I decided to stick with it.

But first a little anecdote.

A few years ago, a small organization was having problems securing their self-hosted wordpress blog. The manager of IT had an idea: they would move to a static site. IT set up the new site and began the process of porting over all the existing blog posts.¹ Somewhere along the way, they told the rest of the staff what they were doing. It was not a great development for blog post authors.

The new site had a GUI for authoring, which put it ahead of many static site user interfaces, but there wasn't much in the way of an accounts system. Previously, each author could compose their post in wordpress, submit it for editorial review, and have it published, all from the blog interface. Now the process would be: write a document in a text editor or word processor and send it to the editor, who was tasked with manually entering the content into the static site.

Within a month or two, the organization decided to revert to running wordpress.²

I bring this up not to tell a story about organizational dysfunction (or not just for that reason) but for the perhaps less interesting point about static sites and authoring interfaces. The static blog frameworks I've looked at have not been great for authoring. I almost went back to wordpress earlier this year.

If you're reading this, and you're a static convert, you might be thinking "why not use markdown? markdown is great! many static sites support markdown, what's the problem?" I don't entirely disagree; I've gotten very used to writing notes in markdown. But markdown isn't the entire story.

For blog posts, I'd like to have a bit more WYSIWYG/GUI when I'm including images or file attachments, rather than feeling like I'm just guessing where to put my files and how to format the paths to them, just to get working links at the end of the process. "Just markdown" works great only for a limited range of post genres. Image and media management is the main reason I considered returning to wordpress.

My other problem has been themes. I started my static site on Octopress, which I thought looked pretty good out of the box. I liked it enough that I stuck with the theme when I switched over to my current static framework, Pelican.³ But I found customizing that theme an unrewarding struggle, and after changing some colors I basically gave up. I can't say I enjoy having to install a compiler just to update some CSS.

The theme sounds like a small thing, but it left me feeling like I wasn't really "running" my own site, I was just getting by. At least my wordpress install was managed by my hosting provider, and the wordpress themes I've used have had some amount of customization built into the wordpress GUI.

By the time I bought a new desktop computer last year, I'd basically given up on this site. I didn't even bother going through the setup process to be able to publish from the new machine until I wrote this post. I figured I shouldn't write anything until I decided whether to migrate back to wordpress. I kept putting off that decision.

So why have I decided to stick it out in static land? I looked at the current state of wordpress and got so overwhelmed that I remembered why I went to a static site in the first place: I don't actually need many publishing features. I just want to put up some web pages in an organized way that is readable and persistent. I can deal with the image management problem.

As for themes, I'm trying to embrace the DIY approach. A few weeks ago, I went through all the Pelican themes and chose one of the few responsive ones that I felt I could live with, just as a stopgap to free my blog from Octopress. No more sass compiler or javascript of questionable worth. And the site looked … ok out of the box, I guess.

Very few Pelican themes appear to be maintained, which I usually take as a bad sign. But in this case it felt like a license to treat my new theme as a tear down and start adding my own styles, as there's apparently no reason to worry about incorporating updates. So I pulled out the old bootstrap stylesheet (from 2015?), deleted a bunch of classes I would never use ("jumbotron", "marketing"), and added just enough CSS to feel like I knew I could keep making it better.

Yes, the blog looks a bit off-center (it is, I'll eventually fix that). Yes, I will be adjusting some of the colors. Yes, I probably need to tweak font size and line spacing. Yes, the theme has two files named styles.css and I have no idea why because only one seems to have an effect on anything. And yes, the page templates use a lot of divs where people would use semantic tags today, and I need to sort out which classes to keep and which to remove.

But I can read my blog with my nearsighted eyes without too much extra zoom, the posts and pages resize to fit a tablet or phone, and when I edit the CSS or html templates, I can see that my changes have an immediate effect. It feels like it's my blog again.

¹ A story for another day, but this process also led to mangling the formatting on posts in ways that reflected poorly on the organization, and incidentally on the authors themselves, who were not told what was happening until after their posts were mangled.
² The authoring workflow wasn't the only reason. See the above note about mangling post formatting during the migration. And other reasons not included in this telling of the story.
³ I switched because updating Octopress turned out to be more difficult than it looked, especially in terms of retaining customization. Pelican seemed lighter weight and easier to maintain, with a clearer separation between blog publishing and the theme. At a development level, Octopress now seems abandoned, while Pelican appears to be actively maintained.

better late than never

2022-01-22T00:00:00-08:00

reflection

This post is a week late. I want to blame computers but the problem is my own tendency towards procrastination. Computers just changed the nature of it.

In the last couple of weeks, I've seen a few social media posts that I think of as technological nostalgia prompts. By which I mean prompts like:

What was academic life like before broad internet access?

What do you miss about the pre-smartphone era?

Someone's response to the first prompt, about running across a college campus to turn in assignments, reminded me how for all of my student years, including grad school, I inevitably found myself finishing up papers at the last minute. What computers changed was the precise timing of what the "last minute" really meant.

I am just old enough that I did all of my college writing on paper, but just young enough that I never used a typewriter. I made heavy use of printers.

Having to print your papers enforced a certain schedule. It didn't stop me from writing all night when I absolutely had to, but it did mean I had to leave enough time for after-writing activities like: printing, trying to print again, unjamming the paper feed, regretting my attempt to print double-sided, looking for a stapler, showering and getting dressed for leaving my room, going to campus, handing over the paper. I'm still surprised that I never missed a deadline.

Just a couple years later, in grad school, almost every assignment had to be turned in as a file, freeing me up to write and edit until the literal last minute. I had thought that I would develop better writing discipline because historians tend to write a lot. I was wrong. Even when I gave myself more time to finish, I just wrote more, later.

It was only in library school, which I came to after switching careers, that I started to try to work with my bad habits. I'd written so many short-to-medium length papers by then, and was confident enough that I could meet the baseline of "adequate work" required of a professional degree program, that I started to set limits on my writing time by working backwards from deadlines. I'd say, "I think I can write this in X number of hours" and then start writing X hours before it was due.¹ I still couldn't avoid the occasional late night but it took a lot of the stress out of procrastination.

This is all a long way of saying that I've been trying to write my posts on Sundays without losing any sleep to the blog. If there's one thing I want to get out of doing this as a regular exercise, it's a healthier relationship to writing. But last Sunday night I drove down to southern California, arriving late, and missed my self-imposed deadline. Since I can "publish" literally any time, I kept telling myself I'd get to it "tomorrow" after work, then kept putting it off when it was clear I'd be writing too late at night.

Which is how I've ended up writing this on a Saturday afternoon, and also why I'm writing today about procrastination itself, which wasn't what I'd originally planned to do. I'll try to get back to schedule tomorrow.

observation

I'm getting pretty tired of people engaging in what I think of as a sort of Covid-19 data arbitrage. It's come up every time there's been a surge in cases, following the original rise and decline in spring 2020. It goes like this:

Cases are rising but hospitalizations are steady or declining, and that's what matters
Hospitalizations are rising but serious cases remain low, and that's what matters
Serious cases are rising, but the death rate remains low, and that's what matters
We've reached the peak in cases, and that's what matters (pay no attention to that high death rate, it's a lagging indicator)

I'd like to think one of the points of having data is to use it to inform policy. And some people actually do that. But there's quite a few people who look at multiple available metrics like they're a menu from which they can pick and choose whichever one supports the decisions they've already made.

memory

The closest I came to missing a deadline in the paper-printout era was for my undergraduate history thesis. It was a small class and at the midpoint we were all supposed to read and comment on each other's papers, so I had to print one copy per person. I left myself the whole afternoon just to do the printing, but it was also the longest paper I'd ever written, and I had a rickety inkjet printer at home.

I did some math after the second copy was done and realized I wasn't going to make the deadline if I stuck with the home printer's pace. So I gathered up my two finished copies and ran to a copy shop to make the rest. When I got to the history department, my classmates were all waiting at the mailroom, also having just barely made the deadline. We tracked down a stapler, exchanged papers, and went on our ways. I don't remember cutting things so close for the final draft.

you're on your own

2022-01-10T00:00:00-08:00

reflection

I've been having a difficult time writing this week's post. It's not that I've been lacking in topics; it's that they all feel like too much to take on right now: the pandemic, the anniversary of the January 6th insurrection, my recent decision to move to southern California this coming summer, away from the place where I grew up and where I've lived for the past eight years. But it's the pandemic that's been on my mind the most.

With so much community transmission, so little public health and policy mitigation, and so many reports of breakthrough cases, I've found it difficult to calibrate both my risk and my anxiety. I don't feel all that worried about my personal health, though I don't dismiss the chances of long Covid, but the uncertainty over how effective the vaccines are against Omicron and the transparent wishful thinking involved in calling the variant "mild" make it hard to judge the risk to my parents. My dad especially has multiple risk factors, and he's scheduled to start a short series of cancer treatments in less than two weeks. It's not reassuring to hear public health officials cheerily talk about how among the vaccinated only people with pre-existing health problems are getting severe cases when so many people fit in that category and so little is being done at the community level to help them avoid exposure.

I don't miss the pre-vaccination period of the pandemic at all, but I did feel like there was a collective sense of clarity around what we needed to do to reduce infection and why, even while we ultimately failed to follow through on those responsibilities as a country. Now the response seems much more fragmented, with many attempts to separate out this or that group as one to blame, dismiss, or ignore.

observation

Things seem bad. I've found myself wondering if any public health official who's been holding the line at "Omicron is mild" and "yep, cases are rising, but what are you going to do? [shrug]" is going to look back at this period, once the worst of it is over (whenever that may be) and think: "What have I done?" But few probably will.

They have plenty of stories they can tell themselves and us about why giving up on mitigation is better than any alternative. For purposes of self-justification, it doesn't matter if these stories are internally consistent or based on false dichotomies. They just need to be repeated. The economy is too strong to justify any provision of relief, but also too fragile to withstand even the most temporary of pauses. The isolation period for someone who has tested positive must be either 5 days or 10, not 6, 7, 8, or 9, and even though isolation starts with a test result, testing should play only a peripheral role in determining its end. Our only policy choices are a total and complete lockdown without definite end, or doing essentially nothing.

Community spread is so high now that it's easier to say there's no possible way to contain it. Contact tracing can't keep up, so few infections have verified causes. If we can't see the transmission chains, how could anyone possibly intervene? Failure becomes its own justification; leadership becomes the regular delivery of updates on the consequences of failure.

I'm left wondering, in the absence of mitigation, what is going to stop the current wave? If "everybody" in a population of 300 million is going to get something, and that something is being distributed at a rate of 1 million people per day, you're looking at a long time before literally everyone has gotten it. In that scenario, a rolling average of a few million people would be home (or possibly worse, at work) sick at any given time.

But I doubt many people, even the loudest voices in the "everybody will get it" crowd, believe that literally everyone will get the current strain of Covid before cases start to drop. It's more likely that somewhere along the way a mixture of immunity and people staying home will eventually break enough chains to bring cases down. But if we're not trying to make that happen, when will it happen?¹

A pandemic doesn't end just because you became unresponsive.

memory

Fifteen years ago this January I rented a room in a house where five other guys lived. The kitchen was usually a mess and it got worse as winter turned into spring, bringing warmth, new odors, and insects. When you live somewhere with no collective sense of responsibility, cleaning up often falls to the person with the least tolerance for filth. To my surprise that person turned out to be me.

I wouldn't say I cleaned the kitchen daily, or even on a regular schedule. But I did occasionally wash all the dishes that had been left out, run the dishwasher, and put everything away. As the weather warmed, and as I got more fed up with the house, the landlord, and my housemates, I started to back off on cleaning. I remember coming home once on a hot spring day, walking into the kitchen, looking at the sink and counter, and walking right back out to make the half-mile walk to the nearest place I could buy a take-out meal.

The last straw was when someone started putting unrinsed bowls with cereal residue on them into the dishwasher. Maybe today's modern dishwashers can handle that level of grime but that's not what our cheap landlord had installed in the house. The spray of the dishwasher propelled bits of cereal from the bowls in the lower rack up into the cups and glasses on the upper rack. The cereal bits then dried in place. It got so that I couldn't assume any dish I didn't clean myself was actually clean.

I lived there only six months and by the time I left I'd settled into a new cleaning routine. I'd clean my own dishes after I used them. Beyond that, I'd only clean a dish if I was about to use it myself.

links

I'd say this just about sums up the U.S. Covid situation right now.

South Africa, apparently everyone's favorite model for Omicron projections, appears to have had mitigation policies in place before, during, and after their Omicron wave. This included a mask mandate, capacity limits in public spaces, social distancing rules, 11 PM closing hours for bars and restaurants, and a curfew from 12-4 AM. The mitigations may not have prevented the rise in cases, but it's certainly possible they helped break up the wave.

It's becoming increasingly clear that the U.S. and many European countries are not going to see as short of a cycle as South Africa saw. South Africa's current public health regulations can be found here; they were recently updated after a sustained drop in cases.

¹ Similarly, South Korea brought back restrictions when Omicron reached there, and so far they seem to be keeping cases down. Things might spiral out of control but so far I'd say it's hard to look at their example and conclude that mitigation simply can't work. Meanwhile, according to a reader of The Guardian (which is consistent with other stories I've read), keeping Covid-19 under control has allowed South Korea to avoid major disruptions in other aspects of life.

one more try

2022-01-03T00:00:00-08:00

reflection

If time is a social construct, scheduling is a construct on top of a construct. It may be arbitrary that a week has seven days, or that a week is a unit of time at all, but it's even more arbitrary to set yourself a goal to do something weekly. Last year I wanted to get back into the habit of writing and I started the year off writing one blog post per week. Somehow I managed to keep up that pace for nine weeks. Then I didn't post again until October, when I wrote four posts in one month. And then didn't post again in 2021.

By the arbitrary metric of one post per week, last year was not a success: I barely wrote more than one post per month, if you average out the whole year. But thirteen posts is more than I wrote during the years 2014-2020 combined, so I'm going to declare my vague goal of "writing more in 2021" to have been achieved.

This year, I'm going to try the weekly posting schedule again and I'm going to try to learn from what I think was my biggest mistake last year: thinking of each potential post as a sort of essay. What I had wanted to do was form a writing habit, even if what I wrote was boring, trivial, or diaristic, of interest only to myself. And I certainly wrote a few posts along those lines, when I just wanted to get something written down for the week. But I also found that unsatisfying, not much better than not posting at all.

The kinds of things I enjoyed writing were much harder to put together on a weekly basis, such as this surprising history of a viral email from the mid-1990s or this look into an obsolete audio format. Both of those posts took some research; I started writing the first one without even knowing where it would lead. I ended up leaving a few more posts in draft state when I realized they were eating away more of my weekends than I was willing to give up to sitting in front of a computer.

So why am I trying again this year? I felt like my weekly posting schedule, when I was keeping up with it, actually was making it easier for me to do more writing in general. And I was happy to see that some people actually read the posts I wrote last October, which were mostly about digital archives, and even found them useful. So I'd like to do more of that this year, plus possibly write some things formal enough for publication. And the hardest thing for me to do when it comes to writing is to start.

My approach to weekly posts this year is going to follow a template. I'm not going to call it a newsletter but I will admit it has a newsletter-ish resemblance. Each post will have four prompts:

reflection
observation
memory
links

I have no particular goals for the length of each section; the point of the prompts is simply to not make myself think up a new topic and structure every week. And maybe when I've gotten into the routine, I'll start doing more of the other kinds of writing again too.

That concludes this week's reflection, which doubles as an introduction.

observation

I'll have to come back to this when I have more time, but I've been thinking a lot about the difference between approaches to leadership that seek to take responsibility and approaches that seek to avoid blame. A lot of pandemic policy decisions in the U.S., especially in the past month, seem to be following the latter approach.

memory

I read earlier this week that California's rainy season is off to the wettest start since 1983. I was pretty young then, but I remember the winter of 1983 as the one time in my life I got to sled off a rooftop all the way down to the ground. We were staying somewhere in the Sierra Nevada mountains, I think near Bear Valley, and the snow had practically buried one side of the lodge.

That winter was followed by nearly a decade of drought, though I don't remember any of those years being as extreme as the drought years we've had since 2014.

copying files isn't always a straightforward process (or, some things I've learned working with digital archives)

2021-10-19T00:00:00-07:00

Copying files is a task that seems like it should be simple and often it is. Pick the right tool for your needs, set up a workflow, repeat. You often don't even need to know what you're copying, you can just duplicate the bits, verify that the copies match, and you're done.
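For the "verify that the copies match" step, here is a minimal sketch in Python that compares checksums of a source file and its copy. The function names (sha256_of, copies_match) are made up for illustration, and SHA-256 is just one reasonable choice of checksum, not a requirement.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source: Path, copy: Path) -> bool:
    """True when both files contain exactly the same bytes.
    Note: this compares content only, not names or attributes."""
    return sha256_of(source) == sha256_of(copy)
```

A check like this only tells you about the bits. It says nothing about whether the filename or the file's attributes survived the trip, which is where most of the rest of this post comes in.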

Except ...

sometimes filenames are too long

Are you copying to a Windows system? Then you might have to watch out for long paths and long filenames. Lots of Windows systems are configured with filename or file path character limits. Usually this issue will creep up on you. Maybe a filename on a USB drive looks long but doesn't seem to be causing any problems. But then you try to copy it from the drive to a new location on a Windows machine, and the path to that location is itself a few levels deep. Something like D:\archives\collections\collection_number\accession_number\identifier. And suddenly you find that the long filename is too long to be copied because the combination of the destination path (on drive D:) plus the filename put it over the limit.

I've been lucky to have never learned a solution to this problem: I've used Windows in my archives work but never as the destination for long-named files. So while I've seen the issue, I've always had the option to send files to Mac or Linux systems that don't have the same limits. I believe that recent versions of Windows 10 now offer the option of removing the previous limits. But you may not have access to that yet in your workplace, depending on how often your systems get updates.
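If you want to find out before copying whether anything will go over the limit, a rough pre-check is possible. The sketch below assumes the classic Windows limit of 260 characters for a full path (a figure that includes the terminating null character); your systems may be configured differently, so treat ASSUMED_MAX_PATH and the function name as placeholders.

```python
from pathlib import PureWindowsPath

# Assumption: the classic Windows MAX_PATH of 260 characters, which
# counts the terminating null. Adjust this for your own environment.
ASSUMED_MAX_PATH = 260


def too_long_for_destination(dest_dir, relative_names, limit=ASSUMED_MAX_PATH):
    """Return the names whose full destination path would hit the limit."""
    problems = []
    for name in relative_names:
        # PureWindowsPath joins with backslashes on any operating system,
        # so this check can run on the Mac or Linux machine doing the copy.
        full_path = str(PureWindowsPath(dest_dir) / name)
        # The path has to stay under the limit to leave room for the null.
        if len(full_path) >= limit:
            problems.append(name)
    return problems
```

Called with a destination like D:\archives\collections\collection_number\accession_number\identifier and a list of filenames, it returns the ones that are fine on their own but too long once the destination path is added in front.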

sometimes filenames use characters that other systems won't accept

This is another problem I've seen most often when copying to a Windows system. Windows has relatively strict rules for allowable characters in filenames. Unix-based systems, especially Linux ones, are a lot more accepting. So you might find that you can't copy a file from a non-Windows system to a Windows system without either having to rename it or letting the name get mangled during the copying process.

I've seen a few different characters, including question marks, inserted into filenames when one system couldn't deal with the characters it was being asked to interpret. This doesn't always prevent copying the bits so it's a good idea to have a check in place to make sure that filenames, not just files, have gotten copied. You don't want someone to come back years later to find a folder full of names like secret??_�_.txt. Sometimes you might find you have to rename the files yourself. It's a good idea to make a log that includes the original filenames if you have to do that.
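As a sketch of what that check and log could look like: the Python below flags characters that Windows reserves in filenames and builds a list of original and proposed names. It is deliberately partial. It does not handle reserved device names (like CON) or names ending in a dot or space, and the underscore replacement is just an assumption about what you'd want.

```python
# Characters Windows reserves in filenames; control characters
# (code points below 32) are checked separately.
WINDOWS_RESERVED_CHARS = set('<>:"/\\|?*')


def _is_invalid(ch):
    return ch in WINDOWS_RESERVED_CHARS or ord(ch) < 32


def invalid_windows_chars(filename):
    """Return the characters in filename that Windows would reject."""
    return [ch for ch in filename if _is_invalid(ch)]


def propose_rename(filename, replacement="_"):
    """Return filename with each rejected character replaced."""
    return "".join(replacement if _is_invalid(ch) else ch for ch in filename)


def rename_log_rows(filenames):
    """Return (original, proposed) pairs for names that need changing.
    These rows are what you'd write to a log before renaming anything."""
    return [
        (name, propose_rename(name))
        for name in filenames
        if invalid_windows_chars(name)
    ]
```

The rows can be written out with Python's csv module, or anything else that keeps the original names next to the new ones.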

sometimes filesystems are too insensitive to case

I've run into this one when copying (or trying to copy) files to Windows as well. But I've seen it in multiple operating systems. This problem is more an issue of filesystem support than it is a problem with filenames. Some filesystems will preserve case and treat the names New_File and new_file as different files. But many systems will display New_File exactly how you typed it (with or without capitals) while in the background the system actually treats upper and lower case characters as if they're the same, making it impossible to create both a New_File and a new_file. Not sure what your system does? Try to create new files with those names and see what happens. You'll either end up with two files or get a message back from the machine telling you you're asking it to do something it can't.

So what happens if someone gives you a drive containing both New_File and new_file (from a case sensitive system) and you try to copy those two files to a system that sees both of those filenames as the same?

Good question! I don't have a great answer. It seems to depend on which systems are involved and which tools you're using. I've seen:

One of the files doesn't appear in the destination system. Maybe it was just not copied (because the destination system sees only one file with that name) or maybe both files were copied but the second one to get copied overwrote the first, leaving only one file in the destination. You aren't shown an error and have to work out that there's a missing file yourself.
The copying tool reports an error. It tells you something like "can't copy new_file because the file already exists" (referring to New_File, which was copied successfully moments earlier). You stare at the error dialog box wondering what happened to your day.
The copying tool automatically renames one of the files, adding something like (conflicted copy) into the file name, so you end up with New_File and new_file (conflicted copy).
The copying tool is itself case insensitive in how it reads and displays filenames so it doesn't even indicate to you that there are two files with nearly the same name for you to copy in the first place. Instead, it looks like there's only one file. You might not even know you're missing something until you run a check (with a different tool) to make sure you copied the expected number of files and discover that one is missing.
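One way to avoid being surprised by any of these outcomes is to look for colliding names on the source before copying. Here is a minimal sketch, assuming that Python's str.casefold() is a close enough stand-in for how the destination compares names (real filesystems have their own case-folding rules, so this is an approximation). The function names are made up.

```python
import os
from collections import defaultdict


def case_collisions(names):
    """Group names that differ only by upper/lower case."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


def collisions_in_tree(root):
    """Check each directory under root separately, since names only
    collide with other names in the same directory."""
    found = {}
    for dirpath, dirnames, filenames in os.walk(root):
        groups = case_collisions(dirnames + filenames)
        if groups:
            found[dirpath] = groups
    return found
```

Running collisions_in_tree on the source (from a case-sensitive system) gives you a list of the pairs to deal with by hand before any tool gets a chance to pick a winner silently.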

sometimes there's more than one way to encode a character

I recommend searching for "Unicode normalization" and weeping.

I ran into this when copying files that had umlauts (and other characters from outside the ASCII range) in their names from a Mac to a Linux system. I was using rsync and I noticed that each time I reran the command, it would delete the files with the not-entirely-ASCII filenames from the destination location and then recopy them. The problem, I learned, was that Macs and Linux systems make different encoding choices and these encodings aren't always translated across system boundaries. To the human eye, the umlaut on the Mac and the umlaut on Linux might look the same, but at a system level they were treated as different characters, giving the files different names.

I was running rsync with the --delete option, which should result in the destination directory matching the source directory exactly. But because the systems used different encodings, rsync kept deleting files on the destination that it saw as extraneous (because they did not appear to match any names on the source) and then re-copying those same files to the destination (because it did not recognize that those files had already been present before the command deleted them).¹

What made this especially puzzling to me was that the names looked correct on both sides of the transfer and the rsync command always reported success. Up until that moment, I'd thought there was only one valid way to encode each character using Unicode. I had no idea there could be multiple valid ways to arrive at what looked like the same character to a human eye.
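To make the "multiple valid ways" point concrete, here is a small Python illustration using the standard unicodedata module. The two strings below both display as ü, but one is a single code point and the other is a plain u followed by a combining mark. The helper function name is made up, and the choice of NFC as the form to compare in is an assumption; the point is only that you have to pick one form before comparing.

```python
import unicodedata

composed = "\u00fc"      # "ü" as one code point (the NFC form)
decomposed = "u\u0308"   # "u" plus a combining diaeresis (the NFD form)


def same_name_after_normalizing(a, b, form="NFC"):
    """Compare two filenames after converting both to the same
    Unicode normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The two spellings look identical on screen but are not equal as strings.
looks_same_but_differs = composed != decomposed
```

Comparing "M\u00fcller.txt" and "Mu\u0308ller.txt" directly reports two different names; comparing them with same_name_after_normalizing reports a match. A tool that compares names without normalizing will see two different files, which matches the delete-and-recopy behavior described above.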

You might also run into a more general problem of non-recognized characters, where one system doesn't recognize some or all characters in a filename that was produced on a different system. On the broader issues surrounding filename encoding, I highly recommend reading Elvia Arroyo-Ramirez, Invisible Defaults and Perceived Limitations: Processing the Juan Gelman Files (2015) and Ashley Blewer, Artist_Exhibition-copy (FINAL)(2).mov: Preserving diacritics in filenames as significant properties in media conservation (2019).

sometimes files have attributes that you don't notice at first that turn out to matter

Permissions are the instance of this issue that I've seen the most. Different filesystems can have different permissions schemes. You might want to copy files using the -a (for archive) option of rsync because that's the option that's supposed to copy "everything" without you needing to think about the details. But then it turns out that the permissions to be copied don't actually exist on the destination system because the source filesystem (maybe a USB drive, formatted as NTFS or exFAT) and the destination (maybe a Linux server) don't use the same permissions. So either your copying attempt fails or you get a bunch of error messages back saying that everything was copied except for the permissions.

I've generally gotten around this by using the rsync options of -r (for recursive) to copy directories and --times to copy the file's timestamps, the main attribute I've always tried to preserve. I've rarely been in a position where I needed to preserve a file or directory's original permissions. I'd probably end up making a disk image if I needed to do that. Or I'd make a log file and record the original permissions there.
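For the log file option, a minimal sketch in Python: walk the source, and record each entry's permission string and modification time before copying. The function name and the choice of columns are assumptions for illustration, and on a filesystem without Unix-style permissions (like exFAT) the permission string will only reflect what the operating system reports for that mount.

```python
import os
import stat
from datetime import datetime, timezone


def attribute_rows(root):
    """Return (relative path, permission string, modified time in UTC)
    for every file and directory under root."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(dirnames + filenames):
            full_path = os.path.join(dirpath, name)
            # lstat describes a symlink itself rather than its target.
            info = os.lstat(full_path)
            modified = datetime.fromtimestamp(
                info.st_mtime, tz=timezone.utc
            ).isoformat()
            rows.append(
                (
                    os.path.relpath(full_path, root),
                    stat.filemode(info.st_mode),  # e.g. "-rw-r--r--"
                    modified,
                )
            )
    return rows
```

Written out as a CSV next to the copied files, rows like these keep a record of the original permissions and timestamps even when the destination can't hold them.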

Beyond permissions, different filesystems may have other ("extended") attributes worth considering when making copies. I haven't spent a lot of time with these, but it can be worth getting familiar with the supported attributes of common filesystems. I've heard stories about important information being stored in the tags that you can associate with files in OSX systems. Those tags might be a feature that's unique to OSX, and I think they're in the attributes. But I could be wrong.

Macs also create things called resource forks, which often appear on non-Mac systems as files that start with ._, but I'm not going to go into detail about those in this post. Partly because I never did have to research them myself. Resource forks are a bigger issue for older Mac filesystems as they often contained essential information for reading a file, and failing to copy a resource fork could result in the corresponding file being unreadable. For newer Mac systems, it often doesn't matter if you copy the resource fork. It might have some information (like last downloaded date), but it's generally not information critical to being able to open the file later. If you're on a newer Mac filesystem and you look at resource forks in a hex editor, you might see a message like This resource fork intentionally left blank. I'm not sure what that really means but have taken it as a sign that I can stop thinking about that file.

sometimes you're getting files from a "cloud" service and the download method you choose affects the filenames

All the consumer-oriented cloud services I've used (products with "box" or "drive" in the names) have provided an interface that looks like a traditional folder-file filesystem. But are the names you see in those interfaces the actual names of the files? Are they even storing your files as "files"? Who knows? Google Drive will let you name two "files" in the same "folder" with exactly the same "name" so it's clearly not enforcing the rules that you would expect out of an ordinary file system.

What I have found when I was trying to come up with a standard workflow for downloading files from cloud providers was that the filenames you ended up with could vary depending on the method you used to download.

It's been a couple of years since I looked closely, but the big difference I remember had to do with spaces and a few other "special" characters in filenames. Depending on whether you downloaded the file directly (i.e. by itself, via a browser) or downloaded the whole folder that contained it (usually as a .zip), you'd get either:

the filename, but with spaces and "special" characters replaced by underscores
the filename exactly as it appeared in the cloud interface²

And if you downloaded an entire "folder" where two "files" have the same "name" you'd either see an error or see that one of the files got automatically renamed for download, as a "traditional" filesystem will require names to be unique within a folder.

Bonus annoyance: when using a browser's "save page as" option to save a webpage as a file, you might find that the browser will try to name the resulting file with the webpage "title" (i.e. the value in the webpage's HTML <title> tag, which may or may not be the human-readable title of the document). If that title uses a character that isn't valid on Windows (like a |) but is valid on the system you used to save the page, you might end up with that character in the filename. Then, later on, you might try to copy that file to a Windows system and run into one of the filename incompatibility issues described above. (Not that this has ever happened to me!)
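
One way to catch this kind of name before it causes trouble is to check filenames against the characters Windows refuses. The sketch below is a minimal Python example under a few assumptions: the function names are hypothetical, it only checks for the reserved characters (< > : " / \ | ? *), control characters, and a trailing space or period, and it ignores other Windows rules such as reserved device names and path length limits.

```python
import re

# Characters Windows does not allow in file names, plus ASCII control characters.
WINDOWS_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def windows_problems(filename):
    """Return a list of reasons a name might fail on a Windows filesystem."""
    problems = []
    bad = sorted(set(WINDOWS_INVALID.findall(filename)))
    if bad:
        problems.append("invalid characters: " + " ".join(bad))
    if filename.endswith((" ", ".")):
        problems.append("ends with a space or a period")
    return problems


def underscore_version(filename):
    """Replace each invalid character with an underscore."""
    return WINDOWS_INVALID.sub("_", filename)
```

For a saved page named `Home | My Site.html`, `windows_problems` would flag the `|`, and `underscore_version` would give `Home _ My Site.html`. Whether to rename at all, and how to record the original name, is a separate decision.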

sometimes files turn out not to be files

I was tasked with copying half a million files from a hard drive once. They were all source code and it turned out that they had been kept in a system that used symbolic links to relate different files and directories to each other. This made "copying" the "files" much more complicated than I had expected. I ended up making a disk image because I couldn't get the "ordinary" copying process to work well enough to be sure I hadn't lost data. It is often possible to copy symbolic links, but there was something weird about these files that prevented me from getting an error-free copy. I think the problem may not have been the symbolic links themselves, but some incompatibility whose origins were in the 1980s (these were files from an old system) but I never worked that out.
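
If I were facing a drive like that again, I'd probably start by taking an inventory of the symbolic links before trying to copy anything. The following is a minimal Python sketch of that idea, not what I actually ran at the time; `find_symlinks` is a hypothetical name, and it assumes the drive is mounted on a system where Python can see the links as links.

```python
import os
from pathlib import Path


def find_symlinks(root):
    """Return (link path, target, is_broken) for every symlink under root."""
    found = []
    # followlinks=False: list linked directories but do not descend into them
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            full = Path(dirpath) / name
            if full.is_symlink():
                target = os.readlink(full)
                broken = not full.exists()  # exists() follows the link
                found.append((str(full), target, broken))
    return sorted(found)
```

A list like this at least shows how many "files" are really pointers to other files, and which of those pointers lead nowhere.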

The disk image I made just pushed the problem of copying the files down the road for when someone had the chance to extract them. But we needed to return the hard drive to the donor and I couldn't keep sinking time into figuring out what was going on. I remember explaining the situation to my fellow archivists when I left that job. I hope I was suitably apologetic for leaving them with that mess, but at least we had all the bytes, right?

Apparently, newer versions of rsync can handle this problem but the Mac was running an older version that couldn't translate the encodings. Did I ever want to know this much about rsync and its versions? I did not. I just wanted to copy some files. ↩
Except possibly on Windows, where an unsupported character might get converted to an underscore no matter what. Again, I can't quite remember the details, just that it's worth looking at multiple download methods and watching what happens. ↩

void where uninhabited

2021-10-17T00:00:00-07:00

I associate receiving texts at unusual hours of the day with emergencies, so I felt a bit of anxiety when I heard my phone go off a few days ago right around the time I woke up. It turned out to be my cousin reminding me that William Shatner was going to space. I did not rearrange my schedule to watch it live.

I did read about it afterwards and Shatner's reaction, as reported here, was more interesting than I'd expected:

But Shatner’s comments on return stood this all on its head. His reactions weren’t about first steps on the precipice, with him as some network TV spaceman Moses walking to the edge of the promised land others will explore. He spoke of space as death, darkness, ugliness. He describes having the blue blanket of the Earth’s atmosphere suddenly ripped away moments after takeoff. Suddenly you’re in darkness. “Is it death?” he asks Bezos. He described this as akin to having a blanket suddenly ripped off you before you’re ready to get out of bed.

I had a roommate one summer between college and grad school who watched a Star Trek episode every night. Not the original series or the next generation but one of the later ones, probably Voyager. One night there was an episode where nothing happened and the lack of happenings became the central tension in the story. With no cultures to contact, no elements to battle, no unusual phenomena to study, nothing but the vast emptiness of space all around them, the crew struggled to maintain their sanity. But they survived their challenge as they always did, episode after episode.

The go-to metaphor for people who can't help but relate space exploration to early American colonial expeditions is Lewis and Clark: they were, it's said, "the first astronauts." But they had guides, they crossed lands where people had been living for thousands of years, and they weren't even the first English-speaking people to see the Pacific coast or to cross North America. The regular contact in the fictional world of Star Trek episodes more closely resembles their journey than actual trips into orbit.

When I watched that episode about the nothingness of space I didn't think of overland journeys but instead about the memoirs and journals I've read about ocean voyages by sail, how they often talked about long stretches at sea where very little happened, days of nothing but water, weather, and sky. And it occurred to me that that episode, which presents boredom as unusual, might turn out to be the most accurate depiction ever made of what the day to day experience of space exploration is really going to be like.

so, you want to "archive" your facebook account

2021-10-08T00:00:00-07:00

Since my previous post about archiving and deleting my Facebook account is a narrative of my experience and offers no real advice, I thought I'd post a follow-up about how I'd approach doing the same thing today. I should be clear that because I don't have a Facebook account, I don't know exactly what steps I'd follow. But I think that's ok, as most of what I say below isn't technical.

Things to think about before you save anything

1. Think about what you actually want to save, and what format you want it to be in.

Do you want all of your posts? Your photos? Comments on your posts? The URLs of all the links you shared? All of the above? How much context do you need? Do you need to display your archive in a way that looks like a real Facebook page or could it look more like a simpler web page or even a text file?

Knowing where you want to end up before you start the process will help you decide what effort you want to make and what tools you might need to use. I've been using twitter's "archive" export for years and it's basically fine for my purposes. I get all of my tweets and my photos and that's all I want. Sure, I could take things to another level and gather up all of the replies to my tweets so that I could reconstruct the threads, but I'm not interested in doing that. So I just run the twitter archive export every few months, save the zip file twitter gives me, and that's it.

If the official Facebook export has all the data you want, even if it's not ideal from an archivist's perspective, that's completely fine. Using the official export will save you a lot of time and trouble. But if the official export is missing some data you want, you'll either need to look for a new strategy or revise your expectations so that the official download meets them. In my case, I decided it was worth it to take the time to run a web archiving tool so that I could save comment threads and shared links, two types of data left out of the official export.

2. Think about how you're going to manage what you download and how (or if) you plan to preserve it.

Depending on how much data is in your account and what approach you take to saving it, you could find yourself with a few (dozen) gigabytes of files. You may also need special tools to access them, particularly if you choose to use a web archiving approach. Generally speaking, the official account export will likely take up much less space than the files you would get from using a web archive tool.

Some of your data is likely to be sensitive, if only because it's personal. I'd seriously consider turning disk encryption on if you have that option on whatever computer system you use. I'd encrypt your backups (which you should be keeping!) too. At the very least, I would treat your personal archives as carefully as I'd treat whatever other personal data you may have on your computer.

3. Think carefully about whether to capture any data that you didn't create yourself, such as comments posted by other people. As many privacy experts say, the best way to protect other people's data is to not collect it at all.

As I mentioned above, I wanted to keep both my posts and the comments on anything that I posted. That meant saving comments that other people posted. I felt comfortable doing that at the time (though as I write this blog post I feel less comfortable having done it) because of the resemblance between saving those comments and saving incoming correspondence: I don't usually throw away letters and emails I get in response to things I write simply because other people wrote them. So it seemed fair to save things written directly in reply to me.

But I drew the line at saving things other people posted on their own timelines. I wanted to preserve "my" archives, not surveil the whole set of timelines available to me as a logged in user. I did consider saving the comments that I wrote in response to other people's posts but felt like I couldn't disentangle them cleanly enough from the rest of the system to make that work. At least not without putting in significantly more effort than I was willing to put in. So I left those comments of mine behind.

In any case, my Facebook use was fairly innocuous by most standards, political and otherwise, and I don't think there was much in there to implicate others in anything other than having been on my "friends" list. If you've used the system to engage in activist work, for example, you're probably facing a whole set of privacy and security issues I never had to face. It might be best under the circumstances to carefully review your posts and be selective about what you keep for yourself.¹

Some ways to approach the archiving process

If you've gotten this far, you've identified what you want to save and what you're ok with leaving behind, and you have some idea of how you're going to preserve whatever it is you capture from your account. Great! So how would you actually go about getting your data?

As I said above, I'd start with the official export. Since I don't have a Facebook account anymore, I don't know what the export looks like these days. Maybe it's fine for your purposes and if so, you might be able to stop there. As long as my account was active, I'd run an export a few times a year and save that. I'd recommend doing that if you're not planning to delete your account, just so you don't leave yourself with a big gap to cover in case you do decide to delete your account later and want to make sure you've archived it first.

But as I wrote earlier, over time the official export stopped being sufficient for me because it didn't include comments on my posts or links that I shared. That's what drove me to look into using dedicated web archiving tools. Since I haven't used these tools myself in a couple of years, I'm not going to go through detailed steps because I don't know exactly what those would be. Instead, I'll lay out the strategy I followed when I crawled my own account. Some of it probably still applies.

My goals were these: 1) get every post from my account, including posts where I shared links to other web pages, and 2) get the comments on those posts. I followed a two-step process: I got a list of every one of my posts, then visited each of those posts using a web archiving tool to capture the data.

1. Figure out a way to get a list of all of your posts.

For the entire time I used Facebook, the service lacked any dashboard or management interface that provided a simple list, with links, to all of my posts. Instead, I remember being subject to the lazy loading of the infinite scroll even for my own profile page. The result was that I could not be sure I found every one of my posts simply by scrolling through my timeline. Some posts were always skipped when I tried this. I don't know if that's still the case. Maybe it's easier to see all your posts in a list these days.

What I ended up doing was to use the official export of my account to compile the list I needed. The export didn't have everything I wanted but it did have all of my posts. I then pulled out the unique identifiers for each of those posts and turned that list into a list of the direct links to each post on the open web. At the time, the identifier was a part of every direct link URL. It's possible that it's easier to get these links now.
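
The "identifiers to links" step can be sketched in a few lines of Python. This is an illustration only: I don't know what the export or the URLs look like now, so the function name `build_post_urls` and the URL template are placeholders you would have to replace with whatever pattern the service actually uses.

```python
def build_post_urls(post_ids, url_template):
    """Turn post identifiers into direct links, dropping duplicates in order.

    url_template is an assumption supplied by the caller; it must contain
    a {post_id} placeholder.
    """
    seen = set()
    urls = []
    for post_id in post_ids:
        post_id = str(post_id).strip()
        if not post_id or post_id in seen:
            continue  # skip blanks and identifiers already handled
        seen.add(post_id)
        urls.append(url_template.format(post_id=post_id))
    return urls


# Hypothetical identifiers and a made-up URL pattern, for illustration only.
example_urls = build_post_urls(
    ["1001", "1002", "1001", " 1003 "],
    "https://social.example/posts/{post_id}",
)
```

Saving the resulting list to a text file gives you something to work through, and check off, during the capture step.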

2. Visit every post using your chosen web archiving tool

Equipped with the list of URLs for my posts, I launched the web archiving tool I used, an early version of what has since become the suite of webrecorder tools, and visited each of my posts one by one. After loading each post, I scrolled down to capture the comments, clicking on the prompts to load additional comments when necessary. It took a full weekend for me to cover everything I'd posted over the previous two or three years. It might take longer if you have a particularly long-running or prolific account.

This process got tedious pretty quickly, and was made more challenging by the fact that Facebook kept logging me off and forcing me to log in again. I guess systematically visiting every one of my posts isn't the kind of engagement they favored. I would not underestimate the time it will take to capture your posts like this, and I'd be prepared for the possibility that you might get locked out for a while.

This brings me to an important point: if you're going to capture your account via web archiving, you should be thinking about the security of your login information. I took what was at the time the more complicated route of installing the webrecorder tools on my own computer so that nothing I did went through a third-party hosted service (except for Facebook itself). The web archive tools are a bit different today² but the principle that you should think about the security implications of capturing your account still applies.

I changed my Facebook account password before capturing anything. When I was done, I changed my password again. There is a possibility that the web archiving process will capture your credentials while you save your posts, so I would recommend not only changing your password before and after capture, but also using a password manager to generate your new passwords. Unless every one of your Facebook posts is public, you're going to have to log in before you can see and capture your own posts. As a consequence, you're going to have to do something about your password.

That's pretty much it for my advice. The short summary is: think about what you want to get from your account, what you'll do with that data, and how you'll get it, then go out and (try to) get it. And think about the privacy and security of your account data and others' throughout the whole process.

Then, once you've archived your account, consider deleting everything you posted there, and then consider deleting the account itself.

It's not clear how long Facebook keeps your data after you delete it. So even if you don't save your own copy, it might sit around on their servers for quite a while. ↩
It looks to me, from a brief read of the webrecorder tools page, that the capture and replay tools have gotten easier to install and use in the past few years. I would try out "archiveweb.page" to capture a few pages, then download that data for playback using "replayweb.page", just to see how well those tools work. ↩

delete my account

2021-10-05T00:00:00-07:00

I was thinking about deleting my Facebook account today, something I did four years ago. At that point, I hadn't used it for almost a year, except once: to let a friend know I wasn't using it anymore. I'd already deleted everything I'd ever posted. I'd been deleting my posts regularly for years.

Whenever a social site or cloud service goes down, people will tell you: back up your stuff! Or export it. Or use the "Save Page Now" feature of the Internet Archive. Or click on links hundreds of times, saving every one using your browser's save function. Or print it all to PDF, or to paper. Or run wget with the --warc-file option, or use the webrecorder suite of tools to create web archive files that you can then play back on a web archive server you maintain yourself. Or write your own personal client to interface with social media services via their APIs and then save all of your outgoing messages before you send them, like a pre-computer business agent keeping carbons. Or recognize that all is flux and leave your data to the whims of time and third-party data retention policies. Posting on social media can feel like posting on the boarded up exterior of a condemned building without knowing when it's going to implode.

I was taking classes in archival studies when I first started to use Facebook regularly, around 2010. I made a lazy attempt to apply records management principles to my posts: every few months I'd use the Facebook-provided "archive" option to export my data, then about a week later I'd delete everything that was part of that export. That way I could avoid leaving everything up "forever" and I could avoid downloading the same data twice.

Facebook's idea of an archive wasn't complete in the sense of having literally all of my data, but the first few iterations were good enough. I wanted my own posts, the comments on those posts, the URLs of the links I shared, and copies of the photos I posted. I wanted the photos less for the photos themselves¹ than for their context within my feed: when I posted them, how I described them. That's what more or less came with the first exports.

This cycle of post, share, export, delete worked well enough until I noticed that the Facebook archive had become a lot less useful. At some point they stopped including comments from other people on my posts, and stopped including the URLs of some of the links I shared. In my more charitable moments, I've wondered if this was a side effect of Facebook becoming either more careful or more risk averse with user data. The official Facebook archive is a download of "your data"; by definition, comments left by other people are not "your" data. But I didn't come up with a plausible explanation for the missing shared links. Maybe if Facebook was a more responsive organization I'd have filed a bug report. (They may have eventually fixed the issue.)

The decline of the Facebook archive put me in a bind: I could either live with preserving a less useful set of exported data or I could come up with a more complex preservation strategy. For the next few years I did neither. I grumbled about the changes, left all my Facebook activity online, and told myself I'd come up with something eventually. And eventually I did: the strategy was to crawl my account manually, once, and then delete everything, including the account itself.

Despite everything I hated about the platform, I actually liked using Facebook for most of the period I was active. It wasn't blogging, the quasi-social media format I'd started out with when I moved beyond passively reading news and journal articles online, but almost all the bloggers I knew had given up by then for a combination of short posting platforms and published work. That's part of how I ended up on Facebook. But the algorithmic feed eventually became too much. I was one of the last chronological holdouts among my friends.

I've tried a few times to find, in my Facebook archive, a discussion I remember having about the potential of the non-chronological feed to shape what we see online in ways that may involve more conscious intent than we usually attribute to an algorithm. But I haven't been able to find it. I may have left it in the comments on someone else's post. My concern, which was not an original one, was that once you remove a relatively transparent organizing principle, like reverse chronology, you open the door to everything from randomness to deliberate manipulation. If you once saw all the links your friends posted in order, and those links were your main source of information about the world, what would you do if your feed started showing only a subset of those links? What if that subset was highly skewed? Again, I am not claiming any great insight here, or even to have seen signs that my feed was being manipulated at the time, though I remember a few of us speculating about whether Facebook would downrank links to articles that were critical of Facebook. These are scattered conversations I remember having around the time the algorithmic feed came about, years before the Brexit vote and the 2016 U.S. elections.

In the end, what pushed me off of Facebook was the combined effect of learning more about the company's political activities and the way the feed intersected with my life more personally. Like a lot of people my age, especially people who grew up without using the social internet much when they were younger, I've had the repeated experience of losing touch with people. The summer ends, you graduate, you move, you get a new job, you say you'll write, and then no one actually does. Sometimes losing touch felt like a big deal, often it was just part of life.

Being on Facebook changed all that because it was so easy to keep up with people without having to manage a contact list or remember to schedule a time to meet. Asynchronicity meant you didn't even have to post often or check in often, you just had to do a little clicking and scrolling. Do Facebook posts skew towards what people are willing to reveal about themselves in semi-public? Yes, of course. But it's not like people aren't managing their self-presentation in other contexts. To me the difference was less between having a phone call with a close friend and seeing a Facebook post and more between seeing a Facebook post and running into someone twenty years later.

But over time I just couldn't deal with the feed anymore: the way it skewed towards "engagement"; the way it skewed towards the more frequent posters irrespective of how well I knew them; the way I found myself spending more and more time tracking down the posts from closer friends who, I often learned, had been posting the whole time but without all of their posts surfacing in my feed; the regular changes to the interface that never seemed to result in anything that could improve my own experience of the feed; the sheer number of links people would post to thinly sourced and often historically inaccurate sites that I could never figure out how to block completely ("show me less" isn't "never show again"); and the seemingly random approach to people sharing personal tragedies. I would see in my feed people I didn't know offer condolences to people I'd never met, but I never saw a friend post about a death in his family, something I learned about later in a different forum, then found on Facebook by digging back through weeks. When a different friend ended up in the hospital with a serious condition, my feed didn't show all of his updates regularly, and when I checked his wall for new posts, Facebook kept helpfully hiding ones I'd seen before, making it difficult to be sure of what I had and hadn't seen. "Wouldn't you like to see this post from four months ago that did some good numbers?" No, Facebook, I want to see if my friend has regained consciousness.

I'd started using Facebook in the way Facebook marketed itself: to connect with friends. Writing that now, I'm still surprised at how well that went for a while. By the time I quit, using Facebook felt more like a job where Facebook was the intranet and you were always fighting it in some way to get to whatever it was you were trying to do. I knew there would be a cost to leaving, a return to mostly losing touch with a lot of people. But I'd come to resent the constant and overt mediation, the skew of the feed, and the feeling of being trapped by the poor quality "archive" that had stopped me from keeping up my old archive-and-delete routine. So at the end of 2016, I came up with a plan.

In my work as an archivist, I'd dabbled in web archiving and had become familiar with the service known then as webrecorder (now Conifer²). Unlike most other crawling tools, the tools behind webrecorder made it possible to capture dynamic interactions like scrolling through a Facebook feed while repeatedly clicking on the 'see more' links required to expand the page or to show additional comments. I'd seen demos of people using it to capture social media in institutional archives contexts. I figured I could do that too, just for myself.

This is not meant to be a technical blog post, so I'm going to leave out most of the details. But here is what I did. I installed the webrecorder tools on my own computer rather than use the third-party web service. I knew I'd need to log in to Facebook and I didn't want to expose my credentials to anyone else. Then I downloaded one last Facebook "archive" to serve as a complete list of all of my posts. And then I systematically opened up every post in the webrecorder-ed browser and clicked to expand every comment. I'd built up 2-3 years of "unarchived" posts by that point; it took a whole weekend to get through everything. Towards the end, I was getting prompted repeatedly to log back in and fill out captchas along the way, slowing things down even more. But I got through it.

Once I was satisfied that I'd gotten all of my posts, I launched a web archives playback application just to make sure I'd be able to read them again. The tools were remarkably effective. Almost five years later, I can still replay the pages, complete with vintage 2016 display ads. Archives in hand, I was ready to delete.

Facebook at the time provided a central activity log, a way to track not just your posts but also your comments, likes, photos, friend requests. I went through my log and deleted everything: every comment, every post, every like. I left only the friendships. I could have skipped all of those steps and just deleted my account, trusting that Facebook would take care of the remaining deletions. But I didn't trust Facebook. I wanted every delete to be on record.

I let my account sit for a while, empty. I've quit enough things to which I soon returned that I didn't want to make a dramatic exit only to come right back. Instead I logged out for a week, two weeks, a month. Somewhere along the way I logged in to find a question from a friend about how I was doing. I appreciated the concern, explained how quitting was improving my life, left the post up for a couple of weeks, then came back and deleted that too. Towards the end of a year of not using Facebook, I got a message implying someone had tried to log into my account, and would I come back and change my password? I changed the password, took one more day to think about the account, then came back and deleted it.

I don't know what the account deletion process is like today but in 2017 it encapsulated everything I'd come to dislike about Facebook: the dark patterns, the blatant manipulation, the attempt to leverage personal relationships for corporate gain. I wish I'd been recording my deletion session because I don't remember all the details now. Towards the end of the form filling and the "are you sure?" questions, Facebook started showing me pictures of friends with messages like "Aren't you going to miss hearing from them? Are you sure you really want to leave?" And all I could think of were all the steps that had brought me to that point and how I'd made exactly the right decision.

Which I was already preserving alongside my other photos. Everything I posted to Facebook was a version (resized, cropped) of something I kept in its original format. ↩
For more information, see the homepage. I have not used the service since it became Conifer. ↩

long drives

2021-02-28T00:00:00-08:00

When I committed to writing one post per week this year even if that one post wasn't much, I meant it. But I didn't define the week. I started by thinking I'd put something out every Saturday morning, but that was unrealistic. Then I thought I could post every Saturday, until I actually spent time not on the computer for most of a Saturday. Last week, I didn't finish my post until Sunday because it takes time to look things up and cite them correctly.¹ And now it's Sunday night and I haven't posted anything yet.

I spent the night at home this weekend for the first time since the weekend after Thanksgiving. Santa Clara County has had a travel quarantine in place for three months now, which means that when I go home I need to stay in my apartment for 10 days before I can do anything else. The rule only applies if you spend the night, so for months I've been going home to check on my apartment and get my mail, and then returning to my parents' home on the same day, a round trip of about 650 miles. Am I using a narrowly literal interpretation of the travel quarantine rule? Kind of! But my apartment is literally the only place in the county I've stopped on these trips.

This routine has gotten pretty tiring and recently I re-read the county's FAQ more closely and it seems that I'm ok staying in my apartment for less than 10 days if: 1) it's the only place I go within the county and 2) I leave the county entirely when I leave my apartment.² That's what I did this weekend. The benefit is that it's much easier to split up the drive over two days. The drawback is that driving did eat up much of both days, leaving me too tired to finish the blog post I started yesterday. Hence, no real post today.

Who knew? ↩
Arguably, it's not clear where my "home" is now. I had to decide whether to renew my lease during the early shelter-in-place period in April and I was in no position at the time to move all of my stuff out. Maybe it would have made sense to have done so, or to have broken the lease later, but I've always planned to go back home and it would be difficult to find somewhere better for a similar rent.

I looked at some other places pre-pandemic, since I'd just changed jobs and had a slightly different commute, and as far as I could tell even a marginally "better" apartment would likely mean a 25-40% rent increase (because my current place is at the low end of the market), giving up the protection from large rent hikes that has kept my current place affordable (I've been there five years), and giving up a landlord who's so far proven not to be someone who tries to extract the maximum rent out of every tenant (my rent has increased a few times, but never to the maximum amount allowable). So I feel like if I gave up my place, I might never return to the Bay Area.

As it is, my parents recently got their second Covid-19 vaccine doses, and my dad seems to be at a point where his treatments are routine and manageable at once a month, so it looks like I'll be able to go home soon. And then sit there for 10 days or have to leave again. ↩

what's in a format

2021-02-21T00:00:00-08:00

Usually, when people highlight things they've come across in archives, they focus on content. Here I'm going to highlight a format.

In my first job as an archivist, I asked to get hands-on experience working with digital materials, but the collection I was processing didn't have any.¹ So I found myself assisting on the processing of a massive collection that had a small amount of computer media. Most of that media consisted of floppy disks or CDs, and my task was to image them and determine the contents. There was already a well-established workflow set up for this and the process was uneventful, though I seem to remember there was a disk that couldn't be read. None of the content was interesting or unique, and pretty much all of it was also represented in paper.

Except there was one other (potentially) digital object that was a mystery: a strip or two (I can't remember) of magnetic tape with an IBM logo on it. The tape was unusual not just in format but in temporal origin. The collection consisted of the papers of a major political figure whose active career was essentially over by the 1990s. Every position he held was one where he either worked in a paper world (he was born over a decade before the first computers were built) or where computers were present but other people likely would have been the ones assigned to use them. All of the computer media in the collection, except the magnetic tape, was from after the 1980s. And pretty much all of it had accompanying material indicating that it was created by someone other than the politician: speeches he made that someone else transcribed, photos taken for digital portraits, a CD of documents someone else compiled.

The magnetic tape was from the 1970s, when the politician was still active in government. Contextual information suggested it contained notes from an interview with a journalist, most likely created by the journalist rather than by the politician. I can't remember if the interview had been published; I can remember that there was some hope that whatever was on the tape would be interesting, unlike the rest of the computer media.

Since we didn't have a tape reader, or even know what kind of tape we had, we needed to identify it first and then find an appropriate vendor. A few people had already looked at it and the best guess so far was that it was tape for an IBM mainframe. IBM has put a lot of their documentation online, so I looked up the specifications, measured the tape, and concluded it had to be something else. The width didn't match the tape types for any of the mainframe models I looked at, plus it wasn't clear how the tape strips would have been loaded into what looked like reel-based systems.

That got me thinking: why would a journalist in the 1970s and a politician without a close connection to the computer industry be working with tape for a mainframe? I'm sure I could imagine some espionage scenarios, but if the context was correct and the tape was related to an interview, wasn't it more likely that a different technology was involved? What equipment would have been in reach at the time that would have both worked on the scale of a personal meeting and stored its data in an IBM tape format?

So I did what anyone trained in historical methodologies would do: I watched "The Paperwork Explosion", a promotional film Jim Henson produced for IBM in 1967:

The star of that movie, besides the man on the farm who now spends most of his time thinking, is the IBM MT/ST, a form of typewriter that used magnetic tape. A check of the dimensions showed it was not the same magnetic tape format we had in the collection, but I felt like I was on the right track. Typewriters were at least personal office equipment in a way that mainframes were not. I would expect both journalists and politicians to have had regular access to them. That sent me down a path of looking at IBM office product catalogs and checking the tape-based equipment. None of the typewriters panned out.

There was one other technology in those catalogs that fit the social context: the audio recorder. It seemed so obvious in retrospect: is there anything more ordinary today than recording an interview? A few product name searches later and I was on someone's vintage technology website looking at the IBM 224 Dictating Unit. The tape pictured there is virtually identical to the tape in the collection.

I will admit not being entirely satisfied with an unofficial source, though I had no reason to doubt that page, so I looked for more official confirmation. I don't think I found a specification page for the IBM 224², but I did find some advertisements in Duke's digitized advertisements collection. Would it count as official confirmation if the right kind of tape was visible in any of the ads? I decided that would be good enough for me. I set about watching them.

It should come as no surprise that the ads hit the same themes ("time", "think") as the Paperwork Explosion. The IBM 224 will save you, a working professional, from having to interrupt your day to write out memos and notes. Or if you're doing something where you can't stop and write, it will save you from having to rely on your memory until you can get back to your desk. Though the 224 weighed over a pound and was not in any real sense a "wearable", it's striking how much overlap there is between the scenarios IBM imagined for their market and contemporary pitches for wearable technologies.

These ads also capture and represent the gendered nature of the workplaces IBM saw as their market. Most of the usage scenarios they present start with "this man": "this man is working on a game plan"; "this man is going to court"; "this man used to be handcuffed by paperwork." Even one of the ads that includes a scenario where a woman uses the 224 to record her own words ("this woman used to spend hours taking stock of things") still ends on the tag line: "small and compact, this new dictating unit fits a man whose job is bigger than his office."

So what happens to your thoughts after you've recorded them? Don't they still have to get to text? The assumption, of course, is that you, the busy professional, have a secretary who will do that. This is left unsaid in most of the ads, but one makes implicit reference to this work. This is also the moment where the tape itself appears on screen.

At the 35-second mark below, you can see what I assume to be an executive place a strip of tape on his secretary's desk while the narrator intones: "While you're using it [the IBM 224] to clear your desk of letters, memos, reports; or to put thoughts and ideas in order, your secretary handles other work for you. She's free to do a better job, and so are you."

This is also the ad that reaches the furthest beyond the office, promising that the new dictating unit will help restore your work-life balance, symbolized by a man flying a kite with his son, as the man himself once did when he was a child. Who knew a tape format could do all that?

At this point you might be wondering, what was on that politician's tape? After finding these ads, I felt like I'd done all the format research I could given the state of my expertise, and it seemed like I'd gotten close enough to refer it to an audio specialist. So I sent some of my links to the archivists in charge of the whole collection, who would be the ones to contact a vendor.

I changed jobs not too long after that, and maybe a year or so later I ran into one of my former colleagues at a conference and asked them if they knew what happened with the tape. It was sent out to a specialist who apparently recognized it without having to go through the IBM back catalog. I'm not sure if it was really tape for the IBM 224 or for some other model, but it was audio and they tried to read it. They weren't able to recover the content.

There were signs of digital format records in the collection I was processing in the form of data analysis and help screen printouts from the 1980s, but either the donor didn't keep any of the computer media or it didn't make it through appraisal. The collection did have a huge amount of AV. ↩
Did I mention that I did all of this over a period of a few hours sometime in 2013 or 2014? I have not re-done the research to look for new sources. ↩