How to Archive a Website Programmatically with SavePageNow API
There’s a quiet kind of panic that sets in when a website starts changing - fast. You see a blog post disappear. A page title shifts. An image gets replaced. A URL starts redirecting somewhere else entirely. And you think: I should’ve saved it.
That’s where archive.org’s SavePageNow API comes in.
It’s the digital equivalent of slamming a snapshot into a time capsule. One URL, one request, and you’ve got a frozen copy - saved into the Wayback Machine for good (or at least as long as they’re around). Best part? You don’t have to be at your keyboard when it happens.
This guide walks through how to use the SavePageNow API to programmatically archive webpages - automatically, reliably, and respectfully. Whether you’re preserving evidence, backing up client pages, or archiving live changes in real time, this gives you control over what gets saved, and when.
And if you've ever used our own Smartial Wayback Machine Sniffer tool, you already know how valuable archived files and versions can be - especially when the originals go missing.
Why Archive Actively?
Most people assume archive.org just saves everything. It doesn’t. While automated crawlers work their way through the web regularly, the frequency is unpredictable. A popular news page might get captured ten times a day, but a small changelog buried on a subdomain? Maybe once a year - if ever.
Programmatic archiving lets you decide when to save, and more importantly, what to save. You’re not waiting for archive.org’s crawlers. You’re sending a clear, explicit instruction:
“This page matters. Save it. Now.”
It’s especially useful for:
OSINT professionals preserving fast-changing content (like social media bios or staff directories)
Web developers creating audit logs of deployments or changes
Researchers capturing controversial material before takedown
Anyone backing up dynamic content from sites they don’t control
And yes, it’s a great complement to Google Dorking, when you stumble across something that probably won’t stay public for long.
What Is the SavePageNow API?
The SavePageNow API is a public endpoint provided by archive.org that lets you send a URL to be archived - on demand. It behaves much like the “Save Page Now” button on their website, but via code.
The basic structure looks like this:
https://web.archive.org/save/{URL}
Send a request to that endpoint with your target URL in place of {URL}, and the Wayback Machine will begin archiving it. In most cases, it’ll return a Content-Location header that points to the archived snapshot.
You can call it from the command line, from a PHP or Python script, or even trigger it from a form submission or webhook.
Example: Simple Save Request Using PHP
Here’s how to send a save request using PHP’s file_get_contents() and stream_context_create():
<?php
// Page to archive, URL-encoded so it can be appended to the save endpoint
$url = "https://example.com/interesting-page";
$encoded = urlencode($url);
$archiveUrl = "https://web.archive.org/save/$encoded";

// Identify the client with a User-Agent header
$options = [ "http" => [ "method" => "GET", "header" => "User-Agent: SmartialBot/1.0\r\n" ] ];
$context = stream_context_create($options);
$response = file_get_contents($archiveUrl, false, $context);
// Stop early if the request failed; $http_response_header may be unset
if ($response === false || empty($http_response_header)) {
    exit("Save request failed\n");
}

// Optional: Parse headers for the archived URL
foreach ($http_response_header as $header) {
    if (stripos($header, "Content-Location:") === 0) {
        $savedPath = trim(substr($header, strlen("Content-Location:")));
        echo "Saved to: https://web.archive.org$savedPath\n";
    }
}
?>
This script sends a save request to archive.org and echoes the URL of the newly archived version - usually available within seconds or minutes.
You can automate this with a cron job, webhook trigger, or admin panel depending on your workflow.
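For readers who prefer Python, the request made by the PHP script above can be sketched roughly as follows. It mirrors that script, including URL-encoding the target and reading the Content-Location header. The function names, the ExampleArchiver/1.0 User-Agent, and the 120-second timeout are illustrative assumptions, not part of archive.org's documentation.

```python
import urllib.request
from typing import Mapping, Optional
from urllib.parse import quote

SAVE_ENDPOINT = "https://web.archive.org/save/"
WAYBACK_ROOT = "https://web.archive.org"


def build_save_url(target_url: str) -> str:
    """URL-encode the target and append it to the save endpoint (as the PHP script does)."""
    return SAVE_ENDPOINT + quote(target_url, safe="")


def snapshot_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return the snapshot URL named by a Content-Location header, or None if absent."""
    for name, value in headers.items():
        if name.lower() == "content-location":
            path = value.strip()
            # The header may hold a path (/web/...) or a full URL; handle both.
            return path if path.startswith("http") else WAYBACK_ROOT + path
    return None


def save_page(target_url: str, user_agent: str = "ExampleArchiver/1.0") -> Optional[str]:
    """Send the save request and return the snapshot URL if the response names one."""
    request = urllib.request.Request(
        build_save_url(target_url), headers={"User-Agent": user_agent}
    )
    # Assumed timeout: saving can take a while, so allow two minutes.
    with urllib.request.urlopen(request, timeout=120) as response:
        return snapshot_from_headers(dict(response.headers.items()))
```

Calling save_page("https://example.com/interesting-page") from a cron-driven script would send one save request and return the snapshot URL, or None when no Content-Location header comes back.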
Respectful Automation Matters
Archive.org is free, open, and available to all. But it’s not infinite. They’re hosting petabytes of data for everyone, and they ask users to archive responsibly.
Here are a few ground rules:
Don’t hammer the endpoint. One request per page every few minutes is fine. Hundreds per second is not.
Avoid archiving login pages or gated content. They won’t render anyway.
Stick to meaningful, public URLs. Pages with dynamic parameters or user-specific content rarely archive cleanly.
The point of SavePageNow isn’t mass scraping. It’s targeted preservation.
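One way to build the "don't hammer the endpoint" rule into a script is to pause between requests and skip duplicate URLs. The sketch below is a minimal illustration, not an official rate limit: the archive_politely name and the 10-second default pause are assumptions, and save stands for whatever function actually sends the save request.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def archive_politely(
    urls: Iterable[str],
    save: Callable[[str], Optional[str]],
    min_interval: float = 10.0,  # assumed pause in seconds; tune to your own workload
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[str]]:
    """Save each distinct URL once, pausing between consecutive requests."""
    results: Dict[str, Optional[str]] = {}
    for url in urls:
        if url in results:
            continue  # already requested in this run: one request per page
        if results:
            sleep(min_interval)  # wait before every request except the first
        results[url] = save(url)
    return results
```

Passing sleep as a parameter keeps the loop easy to test without waiting; in normal use the default time.sleep applies.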
Tracking Your Archive History
You can check if a URL has already been archived (and when) using the CDX API, or visually through archive.org’s calendar interface.
If you’ve saved something recently and don’t see it yet, give it a few minutes. Sometimes archiving is queued, especially during high-traffic periods.
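A check against the CDX API might look roughly like the sketch below. It assumes the public CDX search endpoint at web.archive.org/cdx/search/cdx with its url, output, and limit parameters, where a negative limit asks for the most recent captures and the JSON output starts with a header row. The function names and the default of five captures are illustrative.

```python
import json
import urllib.request
from typing import Dict, List
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(target_url: str, limit: int = 5) -> str:
    """Build a CDX query for the most recent captures of target_url."""
    # Assumption: a negative limit returns the last N captures rather than the first N.
    params = {"url": target_url, "output": "json", "limit": -abs(limit)}
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_rows(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Turn CDX JSON output (header row followed by captures) into dictionaries."""
    if not rows:
        return []
    header, *captures = rows
    return [dict(zip(header, capture)) for capture in captures]


def recent_captures(target_url: str, limit: int = 5) -> List[Dict[str, str]]:
    """Fetch and parse recent captures; an empty list means nothing is indexed yet."""
    with urllib.request.urlopen(build_cdx_query(target_url, limit), timeout=60) as response:
        body = response.read().decode("utf-8").strip()
    return parse_cdx_rows(json.loads(body)) if body else []
```

If recent_captures() returns an empty list right after a save, waiting a few minutes and asking again is usually more useful than sending a second save request.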
Want to make sure your archived copy includes embedded files (like images, CSS, or PDFs)? That’s where tools like our PDF recovery guide come into play - because even a well-timed archive can miss attachments if you’re not careful.
Bonus - Automating Alerts When Pages Change
One clever way to combine SavePageNow with other tools is to monitor a page for changes and trigger a save automatically when something shifts. You can do this by:
Checking the hash of the current page vs. the last saved version
Watching for visual layout changes or new keywords
Pairing with a service like UptimeRobot or a GitHub Action for known endpoints
That way, you're not just saving at random - you’re saving at the moment it matters.
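The hash comparison from the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions: SHA-256 is one reasonable choice of hash, the function names are hypothetical, and save stands for whatever function sends the actual save request.

```python
import hashlib
from typing import Callable, Optional


def content_hash(body: bytes) -> str:
    """Fingerprint the raw page body with SHA-256."""
    return hashlib.sha256(body).hexdigest()


def save_if_changed(
    url: str,
    body: bytes,
    last_hash: Optional[str],
    save: Callable[[str], object],
) -> str:
    """Call save(url) only when the body's hash differs from last_hash.

    Returns the current hash so the caller can store it for the next check.
    """
    current = content_hash(body)
    if current != last_hash:
        save(url)
    return current
```

Store the returned hash between runs (in a file or database, for example) and pass it back as last_hash on the next check. Pages that embed timestamps or session tokens will look changed on every fetch, so hashing only the relevant part of the page may work better there.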
And Don't Regret Not Saving It
You won’t always know what’s about to disappear.
But you’ll know it might.
In a world where digital content can vanish with a policy change, a DNS update, or a panicked PR team, the ability to preserve what you see in real time is invaluable.
Whether you're documenting history, preserving evidence, or just holding onto something fragile before it changes, SavePageNow is your friend. Quiet. Free. And always there - if you remember to use it.