Opened 3 years ago
Last modified 13 months ago
#2706 new task
Set up load balancing configuration for download.osgeo.org
Reported by: | robe | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | Sysadmin Contract 2024-I |
Component: | SysAdmin | Keywords: | |
Cc: |
Description
One of the things on my list was to set up some sort of CDN for download.osgeo.org.
We do have ftp.osuosl.org, which we can push traffic to. In theory we should be able to set this up in nginx, as detailed here:
https://nginx.org/en/docs/http/load_balancing.html
Though I'm not sure how well that works for balancing download traffic.
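For reference, the approach in the linked load_balancing page boils down to an `upstream` pool plus `proxy_pass`. The following is only a rough sketch: the backend host names, ports, and weights are placeholders, not anything taken from the actual OSGeo setup.

```nginx
# Sketch only: backend host names, ports and weights are placeholders,
# not the real OSGeo configuration.
upstream download_pool {
    # Round-robin is nginx's default method; weights skew the split.
    server backend-a.example:80 weight=1;
    server backend-b.example:80 weight=3;
}

server {
    listen 80;
    server_name download.osgeo.org;

    location / {
        # Requests are spread across the pool, but every response body
        # still travels back out through this proxying host.
        proxy_pass http://download_pool;
        proxy_set_header Host $host;
    }
}
```

Because the proxying host relays every byte to the client, this kind of balancing spreads backend load but does not by itself relieve a saturated outbound link on the host running nginx.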
Change History (5)
comment:1 by , 3 years ago
comment:2 by , 3 years ago
I just had an even crazier thought to this.
I think the speed between the servers is very fast. It's the push out of the network that is bounded.
That said, if I set up a round robin in DNS for download, and simply give osgeo3, osgeo4 and osgeo9 a redundant nginx config for download (have download accept all of those as proxies for it) pointing back to osgeo7, then that might work. I'm going to give that a try with bottle.download.osgeo.org.
This of course still has to be done at the DNS level (setting up the round robin), and will still require that folks use upload.osgeo.org for uploading, since that will be the only name with the ssh port open. Depending on which server you hit with download.osgeo.org, the ssh port might or might not be open.
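A minimal sketch of what such a redundant front-end config could look like on each proxy host follows. It assumes the origin is reachable as upload.osgeo.org over https; the scheme and the forwarded headers are assumptions for illustration, not the config actually deployed.

```nginx
# Sketch only: would sit on each front host (e.g. osgeo3, osgeo4, osgeo9).
# The https scheme and the header choices are assumptions.
server {
    listen 80;
    server_name download.osgeo.org bottle.download.osgeo.org;

    location / {
        # Pull content from the origin that fronts the osgeo7 download container.
        proxy_pass https://upload.osgeo.org;
        proxy_set_header Host upload.osgeo.org;
        proxy_ssl_server_name on;   # send SNI for the origin name

        # Pass the original client address along so the origin can log it.
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

With DNS round robin in front, each A record points at one of these hosts, so clients are served through whichever host they resolve to.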
comment:3 by , 3 years ago
Milestone: | Sysadmin Contract 2022-I → Sysadmin Contract 2022-II |
---|
I've started to work on this -- a lot of the notes are on #2705.
So I have set up a round-robin for download.osgeo.org and notified via project and discuss to use upload.osgeo.org for sftp. upload.osgeo.org will remain only connected to osgeo7-download.
I have download-cache.osgeo.org for testing, which consists of osgeo4 and osgeo9, both pulling directly from upload.osgeo.org.
I have download.osgeo.org, which consists of osgeo7 (pulling via download.lxd) and osgeo9 (pulling via upload.osgeo.org). Note that both ultimately go through the nginx on osgeo7, so nginx itself is not the cause of the slow downloads on osgeo7.
All osgeo9 does is proxy straight to upload.osgeo.org (nginx) -> osgeo7-download, and yet when it is active, speeds range anywhere from 6MB/s to 20MB/s.
How is this possible? My guess is that the connectivity between the hosts is at least 100GB/s, but the throughput out to the world is much lower, and since osgeo7's outbound network is heavily taxed, its outbound speed is crippled. osgeo9 only caches the current request, pulling at 100-1000GB/s from download, and since it is not taxed with as many requests it can push out much faster.
Putting this in place immediately ballooned osgeo9 traffic.
Here are stats from osgeo9:
osgeo9 vnstat output as of now. Note I turned it on 2 days ago, so the 2022-03 figure of 7.58 TiB covers just those 2 days. I think the traffic also includes copying from upload.osgeo.org, though (so really half of that).
Anyway it's huge and I can't believe how huge it is.
On osgeo9 as of now
vnstat output
                      rx      /      tx      /     total    /   estimated
 enp2s0f0:
       2022-02      5.44 GiB  /  425.10 GiB  /  430.54 GiB
       2022-03      3.60 TiB  /    3.98 TiB  /    7.58 TiB  /    8.42 TiB
     yesterday      1.27 TiB  /    1.24 TiB  /    2.51 TiB
         today      2.16 TiB  /    2.11 TiB  /    4.27 TiB  /    4.73 TiB
vnstat -d 5 #for last 5 days
# note late 3/26 is when I added it to round robin
 enp2s0f0  /  daily

         day         rx      |     tx      |    total    |   avg. rate
     ------------------------+-------------+-------------+---------------
     2022-03-24   191.36 MiB |   21.29 GiB |   21.48 GiB |    2.14 Mbit/s
     2022-03-25   471.22 MiB |   21.44 GiB |   21.90 GiB |    2.18 Mbit/s
     2022-03-26   160.08 GiB |  174.21 GiB |  334.29 GiB |   33.24 Mbit/s
     2022-03-27     1.27 TiB |    1.24 TiB |    2.51 TiB |  255.63 Mbit/s
     2022-03-28     2.23 TiB |    2.17 TiB |    4.40 TiB |  483.15 Mbit/s
     ------------------------+-------------+-------------+---------------
     estimated      2.40 TiB |    2.34 TiB |    4.75 TiB |
Now on osgeo7:
vnstat
                      rx      /      tx      /     total    /   estimated
 eno1:
       2022-02      1.54 TiB  /  104.72 TiB  /  106.26 TiB
       2022-03      1.76 TiB  /  115.49 TiB  /  117.25 TiB  /  130.25 TiB
     yesterday     27.14 GiB  /    2.95 TiB  /    2.97 TiB
         today     44.84 GiB  /    4.18 TiB  /    4.22 TiB  /    4.66 TiB
vnstat -d 5 #for last 5 days
 eno1  /  daily

         day         rx      |     tx      |    total    |   avg. rate
     ------------------------+-------------+-------------+---------------
     2022-03-24    75.43 GiB |    4.45 TiB |    4.52 TiB |  460.09 Mbit/s
     2022-03-25    70.75 GiB |    4.40 TiB |    4.47 TiB |  454.90 Mbit/s
     2022-03-26    46.99 GiB |    4.24 TiB |    4.29 TiB |  436.80 Mbit/s
     2022-03-27    27.14 GiB |    2.95 TiB |    2.97 TiB |  302.77 Mbit/s
     2022-03-28    45.75 GiB |    4.26 TiB |    4.31 TiB |  473.15 Mbit/s
     ------------------------+-------------+-------------+---------------
     estimated     49.35 GiB |    4.60 TiB |    4.65 TiB |
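As a sanity check on how the daily totals relate to the "avg. rate" column, here is the conversion worked out for the osgeo7 row of 2022-03-24 (4.52 TiB total). It assumes the average rate is computed from the rx+tx total spread over a full 24-hour day; the small difference from the reported 460.09 Mbit/s comes from the total being rounded to two decimals.

```latex
\[
\frac{4.52\ \text{TiB} \times 2^{40}\ \tfrac{\text{bytes}}{\text{TiB}} \times 8\ \tfrac{\text{bits}}{\text{byte}}}
     {86400\ \text{s}}
\approx 4.60 \times 10^{8}\ \tfrac{\text{bit}}{\text{s}}
\approx 460\ \text{Mbit/s}
\approx 57.5\ \text{MB/s}
\]
```

Put differently, each TiB per day corresponds to roughly 101.8 Mbit/s sustained, so osgeo7's 4.2-4.5 TiB/day of outbound traffic is a continuous load of well over 400 Mbit/s.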
So how do we solve this issue?
- Finish setting up osgeo8 to also act as a proxy. This one can be a true cache, since it has much more disk space than osgeo9, so it can do a full rsync of download. Short-term solution. One issue I am working out is that all the traffic coming through osgeo9 to osgeo7 is logged as osgeo9 on the download container. That is both good and bad: good in that it's easy to see how much traffic osgeo9 is picking up, but bad in that I don't have a single authoritative log (then again, we wouldn't anyway with a true round-robin). The osgeo9 logs show the true identity of the traffic it is handling.
- Curb traffic - I'm investigating nginx settings to limit each user to, say, 1 or 2 requests per second, or to limit bandwidth. I've been trying https://www.nginx.com/blog/rate-limiting-nginx/ but my settings seem to be ignored or are not working as expected. There is a lot of bot traffic hogging resources that we really don't need. I still need to break up the stats to find the low-hanging fruit that should just be killed off.
- Set up a true CDN for download around the world (future plan; this could be costly). Something like keycdn comes to mind, as someone suggested a while back, since they offer an open source plan - https://www.keycdn.com/open-source-cdn. Though given how much traffic this is, I suspect we'll quickly run out, or not be able to use download.osgeo.org as the name, which would make it worse than just adding some extra round-robin VMs on commercial cloud hosters (hetzner, atlantic, digital ocean come to mind). Keycdn commercial pricing is $0.01/GB per month for NA/Europe for over 100 TB/month, which would be the bulk of our traffic. Given we are doing about 105-130 TB a month, if my math is right that would be about $1050-1300/mth (105,000-130,000 GB x $0.01/GB) -- way too much.
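For the "curb traffic" option, a rough sketch of per-client limits in nginx is below. The zone names, rates, sizes, and document root are made-up values for illustration, not tested settings for download.osgeo.org.

```nginx
# Sketch only: zone names, rates and paths are illustrative assumptions.

# These two directives are only valid in the http {} context.
limit_req_zone  $binary_remote_addr zone=dl_req:10m  rate=2r/s;
limit_conn_zone $binary_remote_addr zone=dl_conn:10m;

server {
    listen 80;
    server_name download.osgeo.org;

    location / {
        root /var/www/download;          # assumed document root

        # At most 2 requests/second per client address, with a small burst.
        limit_req  zone=dl_req burst=10 nodelay;
        limit_req_status 429;

        # At most 4 simultaneous connections per client address.
        limit_conn dl_conn 4;
        limit_conn_status 429;

        # Throttle each connection to 2 MB/s after the first 50 MB.
        limit_rate_after 50m;
        limit_rate       2m;
    }
}
```

Two details that can make such limits look ignored, offered as possibilities rather than a diagnosis: `limit_rate` applies per connection, so a client opening several connections gets a multiple of it unless `limit_conn` is also set, and on a host that sits behind another proxy `$binary_remote_addr` is the proxy's address, so the key no longer distinguishes individual clients.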
comment:4 by , 2 years ago
Milestone: | Sysadmin Contract 2022-II → Sysadmin Contract 2023-I |
---|
pushing to next milestone since my contract funds have been used.
comment:5 by , 13 months ago
Milestone: | Sysadmin Contract 2023-I → Sysadmin Contract 2024-I |
---|
Moving my prior still open items to the next proposed Milestone
This, as I feared, doesn't work for our needs. I tested with bottle.download.osgeo.org, using the download backup on osgeo4.
osgeo4 upload is around 10 / 13 MB/s, much faster than osgeo7, perhaps because it isn't pounded on so much.
But after setting things up as described in load_balancing, I still ended up with the slow osgeo7 speed.
My nginx script for bottle.download.osgeo.org on osgeo7-nginx looked something like this: