once upon a time i was in charge of integrating with Akamai, so almost 20 years later everything i know is colored by that model. we are now pushing parts of our app to AWS, and the TERABYTES of file storage actively used by our customers are a hot topic.

i'll try to keep my objectives simple:

- we currently host hundreds of thousands of files on our own hardware/web file-delivery servers
- we would like to keep using that going forward as the seed/pull-from location (the origin) to populate the CDN/edge servers. is that possible? many of those files may never be requested by end users, so it seems a waste to copy the entire server up to S3
- speaking of that, we have a process where hundreds or even thousands of those files are updated daily. S3 needs to be able to pull those; in the past we set a TTL of 24 hours or something similar to ensure stale files were removed promptly
- we also had a management tool where we could flush a given directory should an outdated file or problem arise where we needed to immediately remove access.
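for anyone landing here later, a rough sketch of what the 24-hour TTL and the "flush a directory" tool could look like, assuming CloudFront is the CDN sitting in front of the files. the helper names, the example directories and the timestamp-based caller reference are all made up for illustration; the only real AWS call is boto3's CloudFront `create_invalidation`, and it is imported lazily so the pure helpers run without any AWS setup. whether the edge honors the `max-age` value also depends on the min/max TTL settings of the cache behavior, so treat this as a starting point and not a finished tool.

```python
import time

ONE_DAY_SECONDS = 24 * 60 * 60


def cache_control_header(ttl_seconds=ONE_DAY_SECONDS):
    """Header value the origin could send so edge copies expire after the TTL."""
    return f"public, max-age={ttl_seconds}"


def directory_to_wildcard(directory):
    """Turn 'downloads/reports/' into the invalidation path '/downloads/reports/*'."""
    trimmed = directory.strip("/")
    return f"/{trimmed}/*" if trimmed else "/*"


def build_invalidation_batch(directories, caller_reference=None):
    """Build the InvalidationBatch structure expected by create_invalidation."""
    paths = [directory_to_wildcard(d) for d in directories]
    return {
        "Paths": {"Quantity": len(paths), "Items": paths},
        # CallerReference must be unique per request; a timestamp is an assumed, simple choice.
        "CallerReference": caller_reference or f"flush-{int(time.time())}",
    }


def flush_directories(distribution_id, directories):
    """Ask CloudFront to drop cached copies under the given directories."""
    import boto3  # lazy import: needs boto3 installed and AWS credentials configured

    client = boto3.client("cloudfront")
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(directories),
    )
```

usage would be something like `flush_directories("YOUR_DISTRIBUTION_ID", ["downloads/reports"])`, where the distribution ID is a placeholder for your own.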

does any of that make sense?

here's the tl;dr

- can S3 storage match an existing file/directory structure hosted on our hardware, and do we have to push the entire file server to S3 before it can manage and push those files to the edge servers?
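on the first half of the tl;dr: S3 object keys are flat strings, but they can contain `/`, so an existing directory layout can be mirrored one-to-one as key prefixes. a minimal sketch of that mapping is below; the function name, the `/srv/files` root and the optional prefix are hypothetical, and it assumes POSIX-style paths. on the second half, a CDN such as CloudFront can also pull from a custom HTTP origin on a cache miss, so mirroring everything to S3 first is one option and not necessarily a requirement.

```python
from pathlib import PurePosixPath


def local_path_to_key(root, file_path, prefix=""):
    """Map a file under `root` to an S3 key that keeps the same directory layout."""
    # relative_to raises ValueError if file_path is not under root
    relative = PurePosixPath(file_path).relative_to(root).as_posix()
    clean_prefix = prefix.strip("/")
    return f"{clean_prefix}/{relative}" if clean_prefix else relative


if __name__ == "__main__":
    print(local_path_to_key("/srv/files", "/srv/files/customers/acme/report.pdf"))
```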

This post was edited on 9/6/23 at 4:06 pm


Posted by LemmyLives

Texas

Member since Mar 2019

9996 posts

Posted on 9/6/23 at 5:37 pm to CAD703X

I am just an AWS CCP, so I'm spitballing. You are charging customers for the TB they're using, right? That makes their data migration their cost decision, not yours. If they are willing to pay to store TB of data on S3, they get to store TB of data on S3.

Doesn't AWS already have tools to move stuff from S3 to Glacier in a relatively automated fashion? You could push non-stale files into S3, and push stale files into Glacier, just in case. I don't know about the guts of Storage Gateway, but that could be an answer.
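The automated S3-to-Glacier move is usually done with an S3 lifecycle rule. Here is a rough sketch using boto3's `put_bucket_lifecycle_configuration`; the rule ID, the 90-day threshold and the helper names are assumptions for illustration only. Note that lifecycle transitions go by object age, not by when a file was last requested, so "stale" here just means "old".

```python
def build_glacier_lifecycle(prefix="", days_until_glacier=90, rule_id="archive-stale-files"):
    """Build a lifecycle configuration that transitions objects to Glacier by age."""
    return {
        "Rules": [
            {
                "ID": rule_id,  # hypothetical rule name
                "Filter": {"Prefix": prefix},  # empty prefix applies to the whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_until_glacier, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


def apply_lifecycle(bucket, configuration):
    """Attach the lifecycle configuration to a bucket."""
    import boto3  # lazy import: needs boto3 installed and AWS credentials configured

    s3 = boto3.client("s3")
    return s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=configuration
    )
```

Something like `apply_lifecycle("your-bucket-name", build_glacier_lifecycle(prefix="customers/"))` would apply it, with the bucket name and prefix swapped for real ones.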

This Stack Overflow question (from 8 years ago, never answered) might give you some threads to pull. It's over my head, but you might recognize something useful.


Posted by BabySam

Member since Oct 2010

1528 posts

Posted on 9/6/23 at 6:10 pm to CAD703X

Been almost a year since I was up to date on AWS stuff, but this sounds like the perfect use case: your hardware acts as a Storage Gateway and gets pushed up to AWS, with retention set or even tiered for frequent/infrequent access. All of which can be automated.
