docs: clarify AWS Lambda storage #2477

connorads · 2024-05-20T08:24:56Z

There is ephemeral storage in `/tmp`:
https://docs.aws.amazon.com/lambda/latest/api/API_EphemeralStorage.html

It could technically be used if desired:
CRAWLEE_STORAGE_DIR=/tmp/crawlee/storage
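
As a rough illustration of that setting, the sketch below checks whether a configured storage directory falls under Lambda's writable `/tmp`. `resolveStorageDir` and `isUnderTmp` are hypothetical helpers written for this example, not Crawlee APIs; the `./storage` fallback is assumed to mirror Crawlee's default storage location.

```javascript
// Hypothetical helper (not part of Crawlee): pick the storage directory
// the same way the environment variable is meant to override the default.
function resolveStorageDir(env = process.env) {
    // Assumption: './storage' stands in for Crawlee's default location.
    return env.CRAWLEE_STORAGE_DIR ?? './storage';
}

// Hypothetical helper: on Lambda, only paths under /tmp are writable.
function isUnderTmp(dir) {
    return dir === '/tmp' || dir.startsWith('/tmp/');
}

// With the variable from the PR description set, storage lands in /tmp.
const lambdaDir = resolveStorageDir({ CRAWLEE_STORAGE_DIR: '/tmp/crawlee/storage' });

// Without it, the relative default would resolve inside the deployed code
// directory, which is read-only on Lambda.
const defaultDir = resolveStorageDir({});

console.log(lambdaDir, isUnderTmp(lambdaDir));
console.log(defaultDir, isUnderTmp(defaultDir));
```

In a real deployment the variable would more likely be set in the Lambda function's environment configuration than in code.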

website/versioned_docs/version-3.10/deployment/aws-cheerio.md

B4nan

I am honestly not sure this adds any clarity; to me it's actually adding confusion (as now you say it's Crawlee that has read-only storage?). If you want to improve this, why not mention what you said in the PR description explicitly?

barjin · 2024-05-23T09:47:32Z

Which could technically be used if desired (CRAWLEE_STORAGE_DIR=/tmp/crawlee/storage)

This is only true to an extent - the ephemeral storage can be shared between different Lambda invocations, provided they run in the same execution environment (i.e. if you call the Lambdas one after another, AWS will repurpose the running Lambda environment). This might cause some very hard-to-debug issues (stuck shared state from the previous runs) - even though Crawlee should always purge the previous state, you can never be too cautious with these things :) This is especially important if you want to run multiple crawler instances in one Lambda.
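
A minimal sketch of the failure mode described above, using only Node's standard library. The `handler` function, the `purge` flag, and the demo directory are hypothetical stand-ins (a local temp directory plays the role of `/tmp/crawlee/storage`); calling the handler several times in one process imitates AWS reusing a warm execution environment.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical stand-in for CRAWLEE_STORAGE_DIR=/tmp/crawlee/storage.
const storageDir = path.join(os.tmpdir(), `crawlee-storage-demo-${process.pid}`);

// Hypothetical handler: writes one record per invocation and returns
// every file it can see in the shared directory afterwards.
function handler(invocationId, { purge }) {
    if (purge) {
        // Wipe leftovers from earlier invocations before doing any work.
        fs.rmSync(storageDir, { recursive: true, force: true });
    }
    fs.mkdirSync(storageDir, { recursive: true });
    fs.writeFileSync(
        path.join(storageDir, `${invocationId}.json`),
        JSON.stringify({ invocationId }),
    );
    return fs.readdirSync(storageDir).sort();
}

// Start from a clean slate so the demo is repeatable.
fs.rmSync(storageDir, { recursive: true, force: true });

const first = handler('run-1', { purge: false });
const second = handler('run-2', { purge: false }); // also sees run-1's file
const third = handler('run-3', { purge: true });   // sees only its own file

console.log(first, second, third);

// Clean up the demo directory.
fs.rmSync(storageDir, { recursive: true, force: true });
```

The second call observing the first call's file is the "stuck shared state" in miniature; purging at the start of each invocation avoids it only while a single crawler owns the directory.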

I agree w/ @B4nan that explaining all these whys and wherefores is rather counterproductive - I'd show the one and only way to do this rather than confusing the reader with (more or less) irrelevant details.

connorads · 2024-05-25T13:06:44Z

Thanks for your feedback @B4nan and @barjin

Sounds like you're saying we should use in-memory storage not because of the read-only Lambda filesystem, but because file storage in `/tmp` would introduce "statefulness" and potentially hard-to-debug issues. I've tried to update it to express that in 70a4fdd.

If you still think it's worse than before, feel free to edit it and/or close this pull request.

Clarify AWS Lambda storage

fab0e3c

connorads changed the title ~~Clarify AWS Lambda storage~~ docs: clarify AWS Lambda storage May 20, 2024

Try to add whitespace back

54d3217

connorads added 2 commits May 20, 2024 09:27

Trying to resolve diff

47d9398

Fix whitespace hopefully

d1c4ef4

connorads marked this pull request as ready for review May 20, 2024 08:40

B4nan reviewed May 22, 2024

B4nan added the t-tooling Issues with this label are in the ownership of the tooling team. label May 22, 2024

Put underlying reason for not using file storage

70a4fdd
