Scrape and add Non-webextension based extensions and themes for Thunderbird and SeaMonkey #58

Open
mattatobin opened this issue May 9, 2019 · 9 comments


@mattatobin

mattatobin commented May 9, 2019

It would be really nice if we could preserve the remaining extensions on addons.thunderbird.net, because either those projects will fail or, as time goes on, they will have to admit defeat and eventually purge them.

If not for the main extension, then perhaps for a light fork.

I could use my scraping techniques to get the data and XPI files for you if that would help. I know it would be a hell of a lot less, and given that I have the CAA IDs, I could exclude the ones you already have that support Phoenix.
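Excluding the IDs the CAA already covers boils down to a set difference between what the store lists and what the archive holds. A minimal sketch, assuming each ID list is a plain-text file with one add-on ID per line (the file layout and the helper names `read_ids` / `ids_to_scrape` are hypothetical):

```python
# Minimal sketch: drop add-on IDs the CAA already has before scraping.
# Assumption: each ID list is a plain-text file, one add-on ID per line.
from pathlib import Path


def read_ids(path):
    """Return the set of non-empty, stripped lines in an ID list file."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def ids_to_scrape(candidate_ids, caa_ids):
    """IDs seen on the store that the CAA does not already cover, sorted."""
    return sorted(set(candidate_ids) - set(caa_ids))


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    store = {"foo@example.com", "bar@example.com", "{00000000-0000-0000-0000-000000000001}"}
    caa = {"foo@example.com"}
    print(ids_to_scrape(store, caa))
    # ['bar@example.com', '{00000000-0000-0000-0000-000000000001}']
```

In practice the two sets would come from `read_ids("store-ids.txt")` and `read_ids("caa-ids.txt")`, both hypothetical file names.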

@JustOff
Owner

JustOff commented May 9, 2019

To be honest, I'm unlikely to have enough inspiration to do this in the foreseeable future, if at all. It would be better to let someone who is really interested create an independent project or use the full fork of the CAA for this purpose.

@mattatobin
Author

mattatobin commented May 9, 2019

Well, I will do a scrape and grab soon and store it away for someone who wants to implement it or create a fork.

@wyatt8740

> Well I will do a scrape and grab and store it away for someone who wants to implement or create a fork soon.

Let me know when this is done; I'd sort of like to take a stab at it.

@autoteelar

Did anyone ever scrape the themes, even if they aren't in the CAA archive? I only have the default theme. Any theme files that work specifically with Firefox 52 or Basilisk would be appreciated.

@mattatobin
Author

mattatobin commented Oct 18, 2022

Nope. The 2014 scrape included themes because they were contemporary to Pale Moon, and after that I didn't find it pressing to care about them, because Australis made theming more trouble than it was worth. I know for a fact no one else had enough foresight back then, and I bet few did it properly even at the 11th hour. My priority was filling in the XUL extension gap from my 2014 scrape, which was based ONLY on IDs collected over 48 hours for that purpose. My 2014 and 2018 scrapes, plus JustOff's, are now IN the CAA.

But the CAA has some problems, the first being that @JustOff has either abandoned you all because of geopolitical events or has not survived. Either way, unless we learn whether he is alive and will eventually come back, I feel it is important that everyone redouble their efforts to keep a copy of the CAA tarball, and that as many people as can should try to set up mirrors, so that no one can ever wipe out 20 years of history as if it didn't matter.

I should also try to put the web-accessible Phoenix Extension Archive on next year's calendar. I mean, I've only been intending to do it for what, 8 years or so?

@mattatobin
Author

mattatobin commented Oct 24, 2022

I FORGOT: I stashed my 2018 AMO scrape scripts in my scripts repo. These should still apply to ATBN, since it is Olympia as it was before the purge.

https://code.binaryoutcast.com/infrastructure/scripts/src/TRUNK/sniplets/php/amo-scrape

You will need modules/basicFunctions.php from a version of Phoebus from the same time period, most likely the last 1.x version. In fact, I am almost certain of it: https://repo.palemoon.org/MoonchildProductions/phoebus/src/branch/PHOEBUS_1.9/modules/basicFunctions.php

@autoteelar

Nice! Are those 2014 scrapes still available anywhere as well?

And yeah, I hope he comes back :)

@mattatobin
Author

mattatobin commented Oct 28, 2022

The 2014 scripts abused AUS to abuse the discover API. While that worked well for KNOWN IDs, unknown ones proved difficult. BUT with all the Pale Moon users we had a good sampling of the most POPULAR IDs, because they were the ones being requested. I sampled those over 48 hours and did the 2014 scrape, which DOES include themes. Also, some things from 2014 weren't ON AMO in 2018, which is why my earlier effort was important and why I eventually gave it to that crazy Ukrainian gnome to fill out, so there would be at least ONE functioning store, which IS the most complete Firefox extension archive on this planet.
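The "sample the popular IDs from update traffic" step amounts to counting how often each add-on ID shows up in logged update-check requests. A minimal sketch, assuming one request URL per log line with the add-on ID in an `id` query parameter; the log format, the host `aus.example.invalid`, and the helper name are hypothetical, not the actual 2014 scripts:

```python
# Minimal sketch: rank add-on IDs by how often they appear in logged
# update-check URLs. Assumption: one request URL per log line, with the
# add-on ID in an "id" query parameter (the log format is hypothetical).
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def count_requested_ids(log_lines):
    """Count every "id" query value across the logged request URLs."""
    counts = Counter()
    for line in log_lines:
        query = parse_qs(urlsplit(line.strip()).query)  # decodes %40 -> @
        for addon_id in query.get("id", []):
            counts[addon_id] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "https://aus.example.invalid/update?reqVersion=2&id=foo%40example.com&version=1.0",
        "https://aus.example.invalid/update?reqVersion=2&id=foo%40example.com&version=1.1",
        "https://aus.example.invalid/update?reqVersion=2&id=bar%40example.com&version=2.0",
    ]
    print(count_requested_ids(sample).most_common(2))
    # [('foo@example.com', 2), ('bar@example.com', 1)]
```

Collecting over a fixed window (48 hours in the 2014 case) and then taking `most_common(n)` gives the popularity-ordered ID list to query next.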

The 2018 scripts query the Olympia unified AMO API and are still applicable to ATBN. Though they are very ad hoc and run step by step by step, they did the job: they downloaded 19 thousand extensions in 2018, which were combined with the 2014 scrape and a pre-FTP-shutdown scrape, since at one point everything was fully available on ftp.mozilla.org back when it was still FTP. My AUS/discover scrape filled out the full FTP dump, which in turn filled out the 2018 scrape (JustOff's).

The API is (or WAS) documented too, so you can likely create a more elegant solution than my smattering of rushed ad hoc scripts written under the gun.

I would suggest, though, that your script sleep for 2 seconds between every request it makes to an AMO backend, because it DOES have abuse countermeasures. Since the remaining store on ATBN is a fuckton smaller, it won't take nearly as long once you get the process going.
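Once the API hands back add-on objects, the download step needs the XPI URLs out of them. A minimal sketch, assuming each object carries `current_version` → `files` → `url`, which is how I understand the AMO v3/v4 add-on schema; check real ATBN responses before relying on that shape. The helper name `xpi_urls` is hypothetical:

```python
# Minimal sketch: pull XPI download URLs out of Olympia-style add-on objects.
# Assumption: each object has current_version -> files -> url (AMO v3/v4-style
# schema); verify against actual ATBN responses.
def xpi_urls(addons):
    """Return every file URL found on each add-on's current version, in order."""
    urls = []
    for addon in addons:
        version = addon.get("current_version") or {}  # tolerate missing/null
        for file_info in version.get("files", []):
            url = file_info.get("url")
            if url:
                urls.append(url)
    return urls


if __name__ == "__main__":
    sample = [
        {"current_version": {"files": [{"url": "https://example.invalid/a-1.0.xpi"}]}},
        {"current_version": None},  # e.g. an add-on with no public version
    ]
    print(xpi_urls(sample))
    # ['https://example.invalid/a-1.0.xpi']
```

Missing or null versions are skipped rather than raising, so one odd listing doesn't stop a long run.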
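The 2-second rule fits naturally into a loop that walks the API's paginated listings. A minimal sketch, assuming each page is JSON with `results` and `next` keys (the DRF-style pagination the AMO v3/v4 API uses, as far as I know); the start URL is left to the caller, and `crawl` / `fetch_json` are hypothetical names:

```python
# Minimal sketch: follow an Olympia-style paginated listing, sleeping between
# requests. Assumption: each page is JSON with "results" and "next" keys.
import json
import time
import urllib.request

REQUEST_DELAY_SECONDS = 2.0  # pause between requests to stay under abuse limits


def fetch_json(url):
    """Fetch one URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def crawl(start_url, fetch=fetch_json, delay=REQUEST_DELAY_SECONDS, sleep=time.sleep):
    """Yield every item from every page, sleeping `delay` seconds between requests."""
    url = start_url
    first = True
    while url:
        if not first:
            sleep(delay)  # never hit the backend twice without a pause
        first = False
        page = fetch(url)
        yield from page.get("results", [])
        url = page.get("next")  # None on the last page ends the loop
```

`fetch` and `sleep` are parameters so the loop can be exercised offline with canned pages; in real use the defaults do the HTTP request and the actual pause.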

@mattatobin
Author

mattatobin commented Oct 28, 2022

Here are directory dumps from my 2014 archive.
firefox-extensions.txt
firefox-themes.txt
firefoxbutnotonamo.txt
nodata.txt
allnotfirefox.txt

I suppose I could provide allnotfirefox and nodata, and if you do the ATBN scrape, you can merge 2014 with ATBN and with the CAA, which has the 2014 scrape, the 2014 FTP dump, and the 2018 scrape. AND if I can find that old mirror with SOME of mozdev's shit on it again, you could have NOT JUST the most complete Firefox extension archive on the planet but the most complete AMO archive on the planet.
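Merging the 2014 dump, an ATBN scrape, and the CAA is mostly bookkeeping by add-on ID: record which sources hold each ID and spot what only one of them provides. A minimal sketch, assuming each listing has already been reduced to a set of IDs; the source labels and helper names are hypothetical:

```python
# Minimal sketch: merge archive listings keyed by add-on ID and record which
# sources hold each ID. Assumption: each listing is already a set of IDs.
def merge_listings(listings):
    """listings: source name -> iterable of IDs. Returns ID -> sorted source names."""
    merged = {}
    for source, ids in listings.items():
        for addon_id in ids:
            merged.setdefault(addon_id, set()).add(source)
    return {addon_id: sorted(sources) for addon_id, sources in sorted(merged.items())}


def unique_to(merged, source):
    """IDs that only the given source provides."""
    return [addon_id for addon_id, sources in merged.items() if sources == [source]]


if __name__ == "__main__":
    merged = merge_listings({"2014": {"a", "b"}, "atbn": {"b", "c"}, "caa": {"a"}})
    print(merged)  # {'a': ['2014', 'caa'], 'b': ['2014', 'atbn'], 'c': ['atbn']}
    print(unique_to(merged, "atbn"))  # ['c']
```

The IDs reported by `unique_to` are the ones that would be lost if that source disappeared, which is a reasonable order in which to prioritize mirroring.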
