[supplied title]

so, you want to "archive" your facebook account

Since my previous post about archiving and deleting my Facebook account is a narrative of my experience and offers no real advice, I thought I'd post a follow-up about how I'd approach doing the same thing today. I should be clear that because I don't have a Facebook account, I don't know exactly what steps I'd follow. But I think that's ok, as most of what I say below isn't technical.

Things to think about before you save anything

1. Think about what you actually want to save, and what format you want it to be in.

Do you want all of your posts? Your photos? Comments on your posts? The URLs of all the links you shared? All of the above? How much context do you need? Do you need to display your archive in a way that looks like a real Facebook page or could it look more like a simpler web page or even a text file?

Knowing where you want to end up before you start the process will help you decide how much effort you want to make and what tools you might need to use. I've been using Twitter's "archive" export for years and it's basically fine for my purposes. I get all of my tweets and my photos and that's all I want. Sure, I could take things to another level and gather up all of the replies to my tweets so that I could reconstruct the threads, but I'm not interested in doing that. So I just run the Twitter archive export every few months, save the zip file Twitter gives me, and that's it.

If the official Facebook export has all the data you want, even if it's not ideal from an archivist's perspective, that's completely fine. Using the official export will save you a lot of time and trouble. But if the official export is missing some data you want, you'll either need to look for a new strategy or revise your expectations so that the official download meets them. In my case, I decided it was worth it to take the time to run a web archiving tool so that I could save comment threads and shared links, two types of data left out of the official export.

2. Think about how you're going to manage what you download and how (or if) you plan to preserve it.

Depending on how much data is in your account and what approach you take to saving it, you could find yourself with a few (dozen) gigabytes of files. You may also need special tools to access them, particularly if you choose to use a web archiving approach. Generally speaking, the official account export will likely take up much less space than the files you would get from using a web archive tool.

Some of your data is likely to be sensitive, if only because it's personal. I'd seriously consider turning disk encryption on if you have that option on whatever computer system you use. I'd encrypt your backups (which you should be keeping!) too. At the very least, I would treat your personal archives as carefully as I'd treat whatever other personal data you may have on your computer.

3. Think carefully about whether to capture any data that you didn't create yourself, such as comments posted by other people. As many privacy experts say, the best way to protect other people's data is to not collect it at all.

As I mentioned above, I wanted to keep both my posts and the comments on anything that I posted. That meant saving comments that other people posted. I felt comfortable doing that at the time because saving those comments resembles saving incoming correspondence: I don't usually throw away letters and emails I get in response to things I write simply because other people wrote them. So it seemed fair to save things written directly in reply to me. As I write this blog post, though, I feel less comfortable about having done that.

But I drew the line at saving things other people posted on their own timelines. I wanted to preserve "my" archives, not surveil the whole set of timelines available to me as a logged in user. I did consider saving the comments that I wrote in response to other people's posts but felt like I couldn't disentangle them cleanly enough from the rest of the system to make that work. At least not without putting in significantly more effort than I was willing to put in. So I left those comments of mine behind.

In any case, my Facebook use was fairly innocuous by most standards, political and otherwise, and I don't think there was much in there to implicate others in anything other than having been on my "friends" list. If you've used the system to engage in activist work, for example, you're probably facing a whole set of privacy and security issues I never had to face. It might be best under the circumstances to carefully review your posts and be selective about what you keep for yourself.[1]

Some ways to approach the archiving process

If you've gotten this far, you've identified what you want to save and what you're ok with leaving behind, and you have some idea of how you're going to preserve whatever it is you capture from your account. Great! So how would you actually go about getting your data?

As I said above, I'd start with the official export. Since I don't have a Facebook account anymore, I don't know what the export looks like these days. Maybe it's fine for your purposes and if so, you might be able to stop there. As long as my account was active, I'd run an export a few times a year and save that. I'd recommend doing the same if you're not planning to delete your account. That way you won't have a big gap to cover if you later decide to delete it and want to archive it first.

But as I wrote earlier, over time the official export stopped being sufficient for me because it didn't include comments on my posts or links that I shared. That's what drove me to look into using dedicated web archiving tools. Since I haven't used these tools myself in a couple of years, I'm not going to go through detailed steps because I don't know exactly what those would be. Instead, I'll lay out the strategy I followed when I crawled my own account. Some of it probably still applies.

My goals were these: 1) get every post from my account, including posts where I shared links to other web pages, and 2) get the comments on those posts. I followed a two-step process: I got a list of every one of my posts, then visited each of those posts using a web archiving tool to capture the data.

1. Figure out a way to get a list of all of your posts.

For the entire time I used Facebook, the service lacked any dashboard or management interface that provided a simple list, with links, to all of my posts. Instead, I remember being subject to the lazy loading of the infinite scroll even for my own profile page. The result was that I could not be sure I found every one of my posts simply by scrolling through my timeline. Some posts were always skipped when I tried this. I don't know if that's still the case. Maybe it's easier to see all your posts in a list these days.

What I ended up doing was to use the official export of my account to compile the list I needed. The export didn't have everything I wanted but it did have all of my posts. I then pulled out the unique identifiers for each of those posts and turned that list into a list of the direct links to each post on the open web. At the time, the identifier was a part of every direct link URL. It's possible that it's easier to get these links now.
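As a rough sketch of that last step, assuming the identifiers have already been pulled out of the export into a list of strings, the Python below removes duplicates and fills each identifier into a URL template. The template, the example identifiers, and the function name are all placeholders invented for illustration. I don't know what Facebook's direct links look like today, so you'd need to look at a real link to one of your posts and adjust the template to match.

```python
from typing import Iterable, List

# Hypothetical URL pattern (an assumption, not Facebook's real format).
# Replace it with whatever a real direct link to one of your posts looks
# like, with "{post_id}" where the identifier goes.
URL_TEMPLATE = "https://example.com/your-username/posts/{post_id}"


def build_post_urls(post_ids: Iterable[str], template: str = URL_TEMPLATE) -> List[str]:
    """Return one URL per unique identifier, keeping first-seen order."""
    seen = set()
    urls = []
    for raw in post_ids:
        post_id = raw.strip()
        if not post_id or post_id in seen:
            continue  # skip blank entries and duplicates
        seen.add(post_id)
        urls.append(template.format(post_id=post_id))
    return urls


if __name__ == "__main__":
    # Made-up identifiers standing in for the ones pulled from an export.
    for url in build_post_urls(["1001", "1002", "1001", " 1003 "]):
        print(url)
```

Saving the resulting list to a text file, one URL per line, gives you something to work through during the capture step.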

2. Visit every post using your chosen web archiving tool.

Equipped with the list of URLs for my posts, I launched the web archiving tool I used, an early version of what has since become the suite of webrecorder tools, and visited each of my posts one by one. After loading each post, I scrolled down to capture the comments, clicking on the prompts to load additional comments when necessary. It took a full weekend for me to cover everything I'd posted over the previous two or three years. It might take longer if you have a particularly long-running or prolific account.

This process got tedious pretty quickly, and was made more challenging by the fact that Facebook kept logging me out and forcing me to log in again. I guess systematically visiting every one of my posts isn't the kind of engagement they favored. I would not underestimate the time it will take to capture your posts like this, and I'd be prepared for the possibility that you might get locked out for a while.
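Since interruptions like these are likely, it may help to keep a simple record of which posts you've already captured so you can pick up where you left off. The sketch below is one way to do that, not something I used at the time. It assumes you have your post URLs in a list and keeps a plain text log of finished ones. The file name and function names are hypothetical, and the actual capturing still happens by hand in the web archiving tool.

```python
import tempfile
from pathlib import Path
from typing import List


def remaining_urls(all_urls: List[str], done_file: Path) -> List[str]:
    """Return the URLs not yet recorded in done_file, in their original order."""
    done = set()
    if done_file.exists():
        lines = done_file.read_text(encoding="utf-8").splitlines()
        done = {line.strip() for line in lines if line.strip()}
    return [url for url in all_urls if url not in done]


def mark_done(url: str, done_file: Path) -> None:
    """Append a URL to done_file once its page has been captured."""
    with done_file.open("a", encoding="utf-8") as handle:
        handle.write(url + "\n")


if __name__ == "__main__":
    # Demo in a throwaway directory, using made-up URLs.
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "captured.txt"  # hypothetical log file name
        urls = [
            "https://example.com/posts/1001",
            "https://example.com/posts/1002",
            "https://example.com/posts/1003",
        ]
        mark_done(urls[0], log)  # pretend the first post has been captured
        for url in remaining_urls(urls, log):
            print("still to capture:", url)
```

After a forced logout or a break, running `remaining_urls` again against the same log tells you which posts are left.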

This brings me to an important point: if you're going to capture your account via web archiving, you should be thinking about the security of your login information. I took what was at the time the more complicated route of installing the webrecorder tools on my own computer so that nothing I did went through a third-party hosted service (except for Facebook itself). The web archive tools are a bit different today,[2] but the principle that you should think about the security implications of capturing your account still applies.

I changed my Facebook account password before capturing anything. When I was done, I changed my password again. There is a possibility that the web archiving process will capture your credentials while you save your posts, so I would recommend not only changing your password before and after capture, but also using a password manager to generate your new passwords. Unless every one of your Facebook posts is public, you're going to have to log in before you can see and capture your own posts. As a consequence, you're going to have to do something about your password.

That's pretty much it for my advice. The short summary is: think about what you want to get from your account, what you'll do with that data, and how you'll get it, then go out and (try to) get it. And think about the privacy and security of your account data and others' throughout the whole process.

Then, once you've archived your account, consider deleting everything you posted there, and then consider deleting the account itself.

  1. It's not clear how long Facebook keeps your data after you delete it. So even if you don't save your own copy, it might sit around on their servers for quite a while. 

  2. From a brief read of the webrecorder tools page, it looks to me like the capture and replay tools have gotten easier to install and use in the past few years. I would try out "archiveweb.page" to capture a few pages, then download that data for playback using "replayweb.page", just to see how well those tools work.