web-archiving
Here are 109 public repositories matching this topic...
Serverless replay of web archives directly in the browser
Updated Jun 12, 2024 - TypeScript
The repository and website hosting the peer review process for new Programming Historian lessons
Updated Jun 12, 2024 - Jupyter Notebook
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
Updated Jun 12, 2024 - TypeScript
InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS
Updated Jun 12, 2024 - Python
Run a high-fidelity browser-based crawler in a single Docker container
Updated Jun 12, 2024 - TypeScript
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Updated Jun 10, 2024 - Python
A high-fidelity web archiving extension for Chrome and Chromium-based browsers
Updated Jun 10, 2024 - JavaScript
Makes saving pages in bulk to the Wayback Machine much easier
Updated Jun 9, 2024 - HTML
A suite of tools for mirroring and hoarding web pages you visit for later offline viewing, i.e. your own personal Wayback Machine. It can also archive HTTP POST requests and responses, as well as most other HTTP-level data, and it follows an "archive everything now, figure out what to do with it later" philosophy.
Updated Jun 7, 2024 - Python
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
Updated Jun 5, 2024 - Scala
CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
Updated Jun 5, 2024 - JavaScript
Ed course archiver and viewer
Updated Jun 2, 2024 - Python
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
Updated May 29, 2024 - Java
Streaming WARC/ARC library for fast web archive IO
Updated May 27, 2024 - Python
A Memento Aggregator CLI and Server in Go
Updated May 21, 2024 - Go
Official Python package for ArchiveBox, the self-hosted internet archiving solution.
Updated May 21, 2024