Identifying Dead Links And Replacing Them #409

Open
bonedaddy opened this issue Jun 7, 2020 · 9 comments
Labels
Engineering Changes our tools and data pipeline

Comments

@bonedaddy
Collaborator

bonedaddy commented Jun 7, 2020

For now most links are active and viewable; however, we will inevitably end up with dead links, such as those reported in #392.

While "dead", the data isn't lost, as it will be captured by my archiver tool. We need a method for:

  1. Identifying dead links
  2. Replacing/supplementing dead links with the backups on IPFS

I'm not sure what the best method is. I suppose I could have a central listing where I periodically post the new backup links?
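
As a starting point for step 1, here is a minimal sketch of a dead-link check using only the Python standard library. The function names (`fetch_status`, `is_dead`, `find_dead_links`) are hypothetical and not part of the archiver tool, and the policy of treating only 404, 410, and unreachable hosts as dead is an assumption. Some sites reject `HEAD` requests or return 200 for removed content, so a real check would likely need per-site handling.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Optional


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if the host is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def is_dead(status: Optional[int]) -> bool:
    """Assumed policy: unreachable, 404 (Not Found), and 410 (Gone) count as dead."""
    return status is None or status in (404, 410)


def find_dead_links(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[int]] = fetch_status,
) -> List[str]:
    """Return the subset of urls that look dead, preserving input order."""
    return [url for url in urls if is_dead(fetch(url))]
```

Passing `fetch` as a parameter keeps the policy testable without network access; a periodic job could call `find_dead_links` on every URL in the data set and report the results.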

@ghost

ghost commented Jun 8, 2020

I think we could use a torrent file so others can grab from your archive and create redundancy. Of course we need an ID system to ensure it's easy to grab from the file. This would also allow maintainers to re-upload.

We may want to use a combo of free services like streamable and image.fri so most folks can re-establish links, as well as an AWS solution that one of the maintainers could host from, as suggested in another thread. That would give us a mix of centralized and decentralized hosting.

@bonedaddy
Collaborator Author

bonedaddy commented Jun 8, 2020

IPFS is somewhat like torrents in the sense that people can "seed" the data. I have a WIP PR going (#286) that contains instructions on how to mirror the archive.

@ubershmekel ubershmekel added the Engineering Changes our tools and data pipeline label Jun 10, 2020
@ubershmekel
Collaborator

ubershmekel commented Jun 11, 2020

@bonedaddy I think we're going to need to have some of these backed-up media links inside the repo directly, especially when links die.

@Murkantilism
Collaborator

Murkantilism commented Sep 8, 2020

@bonedaddy @ubershmekel may I propose a simple square bracket tag for identifying dead links? I just found one:

image

@ubershmekel
Collaborator

I would make the language less morbid, but I agree. Perhaps something like

[original link that is now broken](https://example.com)

@Murkantilism
Collaborator

@ubershmekel ah perhaps I chose a bad example, I meant more like this, with a whitespace separator:

[Dead] [Photojournalist's account](https://twitter.com/bfeinzimer/status/1277014331968782339)

That preserves the original context if someone later tries to replace the link. And yeah, I'm fine with different language, something like [Broken] or [404].

@ubershmekel
Collaborator

@Murkantilism I misread your example. At the moment I would prefer to keep the markdown syntax to keep the parser simple and fit the existing data structure at https://raw.githubusercontent.com/2020PB/police-brutality/data_build/all-locations-v2.json

@Murkantilism
Collaborator

@ubershmekel ah good point! Maybe a pipe separator within the link markdown, something like this?

[Broken Link | Photojournalist's account](https://twitter.com/bfeinzimer/status/1277014331968782339)

Also, do we care about differentiating why a link is broken? For example, a deleted Twitter account versus a genuine 404 page.
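
To show how the pipe-separated label could stay compatible with a simple markdown link parser, here is a rough sketch. The `Broken Link | ` prefix comes from the example above; the helper names (`mark_broken`, `split_label`) and the regular expression are assumptions for illustration and are not taken from the project's actual parser.

```python
import re
from typing import Tuple

# Matches inline markdown links of the form [text](url).
LINK_RE = re.compile(r"\[(?P<text>[^\]]*)\]\((?P<url>[^)\s]+)\)")
BROKEN_PREFIX = "Broken Link | "


def mark_broken(markdown: str, dead_url: str) -> str:
    """Prefix the label of every link to dead_url, leaving other links untouched."""

    def repl(match: "re.Match[str]") -> str:
        text, url = match.group("text"), match.group("url")
        if url != dead_url or text.startswith(BROKEN_PREFIX):
            return match.group(0)  # Different link, or already tagged.
        return f"[{BROKEN_PREFIX}{text}]({url})"

    return LINK_RE.sub(repl, markdown)


def split_label(text: str) -> Tuple[bool, str]:
    """Return (is_broken, original_label) for a link label."""
    if text.startswith(BROKEN_PREFIX):
        return True, text[len(BROKEN_PREFIX):]
    return False, text
```

Because the tag lives inside the link text, the result is still a plain markdown link, and a consumer that wants the original label can recover it with `split_label`.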

@ubershmekel
Collaborator

ubershmekel commented Sep 9, 2020

@Murkantilism that looks fine by me.

On differentiating why a link is broken: I'd be fine with either option, though managing a nomenclature for such a system might be a bit much for a small project like ours.
