22:11 range | Okay, let me start.
22:11 range | First: Thanks for attending.
22:11 range | Second: I'd like to do a small description of our setup, then what we are missing and maybe then what we want (or don't want).
22:12 range | We have a rather largish mirror setup. Last count was > 400 mirrors world wide, if I remember correctly.
22:12 range | The system we have in place at the moment copes, but doesn't really scale up.
22:12 poeml | not bad! I noticed that there joined a new one nearly once a day, for a while :-)
22:14 range | AFAIR the last real changes have been done several years ago. We have the mirrors in a DB, a script which checks mirrors for correctness of the data (which now takes a long time to run) and sorts the mirrors into files per country.
22:14 range | Another script which does the same for .isos - and mirmon, which checks for staleness of mirrors and disables/enables them in the DB.
22:15 range | So nothing that special so far. We do GeoIP handouts for countries/regions and hand out a mirrorlist of 10 (hopefully) nearby mirrors, which gets randomized.
22:16 range | The system clearly needs to be worked on. So the question is: Do we do further development on our own system or do we want to use a system which is actively developed :)
22:17 range | Some things we probably need: We have countries in which some mirror admins don't want their mirrors to be handed out to other countries, because crossborder traffic is expensive.
22:18 range | We have mirror maintainers who would like their mirrors to be the mirror for their network by default - we don't have the granularity for that.
22:18 z00dax6 | and we need finer grain control for places like the US - where just going by country isnt good enough
22:18 range | Looking at what happens in Fedora, it looks like we might also need support for metalinks.
22:18 z00dax6 | although this might just be a case of needing the right geo-ip db from the right place. but its a concern
22:18 range | What we clearly do *NOT* want is mirror maintainers to offer a "public" mirror, which just serves their network and nothing else.
22:19 range | I think that is about what I noted down today :)
22:20 adrianr | I can try to answer what fedora's mirrormanager can do
22:20 poeml | I would certainly recommend to join one of the two systems that are actively developed :-) It's a complex task, doing all this :)
22:20 warty9_andy | From my perspective I would rather you spend your efforts elsewhere than pushing forward with the current setup. Personally I would advocate mirrormanager (what fedora uses). If nothing else it pushes the admin onus for most things to the mirror admins vs. CentOS
22:20 range | A few of those concerns also have been raised on the mirror mailing list, as some of you might remember.
22:21 poeml | adrianr: let's hear
22:21 warty9_andy | (Sorry for the spelling, on my phone right now)
22:21 range | As we have people here from both projects, maybe we can do some pro/con?
22:22 adrianr | i think feature wise mirrorbrain and mirrormanager offer almost the same
22:22 adrianr | as a mirror admin I have worked with both and from that standpoint I cannot say which one is better or worse
22:22 range | I must admit, that I only took a deeper look at mirrorbrain.
22:22 adrianr | I am also running a mirrormanager setup for rpm fusion
22:23 adrianr | and helping in the fedora mirrormanager setup
22:23 z00dax6 | ok, so lets start at the top - how does mirrormanager check for remote mirror's consistency ?
22:23 warty9_andy | Mirrorbrain (in my experience) leaves the admin tasks solely on the side of the distro
22:24 adrianr | by cron all mirrors are scanned every 8 hours (in my setup) I do not know the frequency fedora uses, but that is just a cron script
22:24 adrianr | it takes a long time, probably just like the script you have right now
22:24 z00dax6 | adrianr: sure, but does it actually download all the content from the remote machine before accepting content ?
22:25 range | Do you have a guesstimate for that?
22:25 adrianr | in addition, mirrors can run a script which updates the database about the status, just after a rsync run for example
22:25 z00dax6 | ..and can multiple machines take that role on ? eg. we have .centos.org machines in 25 countries and we could ( potentially ) spread the load. does something in MirrorManager let us do that already ?
22:25 adrianr | z00dax6: download? no, it does not download all the data
22:26 adrianr | hmm, no, i do not think that part can be distributed
22:26 adrianr | at least the database needs to run on one host
22:26 adrianr | then you could run the crawlers on different machines
22:26 warty9_andy | I think it just verifies the file is present, not download or sha1 it or something
22:27 adrianr | and then they can connect to the database
22:27 adrianr | no, it just does HEADs
22:27 z00dax6 | yeah, that would be a major issue we need to address. eg. checking all .de based mirrors from a .de based .centos.org machine.
22:27 adrianr | let me look at my logs how long it needs to scan the mirrors
22:27 range | Or at least EU/US/South America and so on.
22:28 poeml | how much checking do you want?
22:28 warty9_andy | z00dax6: or have the machines be able to self report
22:28 adrianr | it needs over an hour to check a mirror in china from germany (where my crawler is running)
22:28 adrianr | and rpm fusion is small
22:28 z00dax6 | poeml: at the moment, we check ( download and verify ) all repomd content, and do head/ranges on isos + some rpms.
22:29 poeml | for security reasons or to just verify that the correct content is in place?
22:29 z00dax6 | warty9_andy: that needs to be taken with a pinch of salt i think, its important we check for exactly what the user sees. so post-rsync-script might be one dimension, but i feel a bit uncomfortable about it being the only dimension
22:30 z00dax6 | poeml: security as reason no.1, and to make sure that all mirrors handed out in a single mirrorlist= line look exactly the same
22:30 warty9_andy | True, but it is easier to self report, and then randomly spotcheck after
22:30 z00dax6 | warty9_andy: that is true.
22:30 warty9_andy | Scales better too
22:31 range | How does mirrorbrain do the checks?
22:31 z00dax6 | warty9_andy: do you think it will also increase the barrier a bit to people join'ing as mirrors if there was a script to run ?
22:31 poeml | z00dax6: That's easy to deceive. I strongly advise against doing that for security reason...
22:32 warty9_andy | Not really, I beat on mdomsch to make that self reporting script as server painless as possible
22:32 z00dax6 | poeml: its one of many layers involved. it has helped us catch broken setups more than once.
22:33 warty9_andy | Think there might be a fallback of server crawl if no self report, likely just verifying things are in place
22:33 range | poeml: Were you talking about a script running on the server side or what z00dax6 was talking about?
22:33 poeml | z00dax6: yes, it's fine to catch brokenness. I just wouldn't trust it for security.
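A minimal sketch of the "self report, then randomly spotcheck" idea raised above. It is not how MirrorManager or the CentOS scripts actually schedule crawls; the function name, the `spot_check_fraction` parameter and the mirror names are hypothetical. Mirrors that sent no report are always crawled, and a random sample of the self-reporting mirrors is crawled as well.

```python
import math
import random


def mirrors_to_crawl(mirrors, self_reported, spot_check_fraction=0.1, seed=None):
    """Pick the mirrors a crawler should visit on this run.

    mirrors: list of mirror names (hypothetical identifiers).
    self_reported: set of mirrors whose own script reported a fresh sync.
    """
    rng = random.Random(seed)
    # Mirrors that did not report are always crawled.
    silent = [m for m in mirrors if m not in self_reported]
    reported = [m for m in mirrors if m in self_reported]
    # Round up so at least one reporting mirror is checked when any exist.
    sample_size = min(len(reported), math.ceil(len(reported) * spot_check_fraction))
    return silent + rng.sample(reported, sample_size)


if __name__ == "__main__":
    all_mirrors = ["mirror-a", "mirror-b", "mirror-c", "mirror-d", "mirror-e"]
    fresh = {"mirror-b", "mirror-c", "mirror-d", "mirror-e"}
    print(mirrors_to_crawl(all_mirrors, fresh, spot_check_fraction=0.5, seed=1))
```

The sample only lowers crawl load; as z00dax6 notes, the crawler still has to check what the user actually sees, so the self report cannot be the only signal.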
22:34 adrianr | the current setup is the crawler scans every host even if it has already reported its status
22:34 poeml | range: I was referring to downloading content from mirrors to "verify" that they don't try to sneak in manipulated stuff.
22:34 range | Because we don't run one.
22:34 range | Ah.
22:34 range | Okay, I was unclear about that.
22:34 warty9_andy | poeml: that becomes a question of 1 trusting your mirrors and 2 trusting the gpg sigs in the rpm
22:34 range | poeml: How does mirror manager check the mirrors? (while we're at it)
22:35 z00dax6 | and repomd will contain signed data soon too
22:35 poeml | MirrorBrain crawls for filelists, which is (I'd say) as efficient as possible and mainly depends on the remote performance. This check is fast and doesn't download content.
22:35 poeml | It does range requests for files > 2G though, to verify that they are delivered via HTTP/FTP correctly.
22:35 range | So it mostly is a "are all files there" check?
22:35 z00dax6 | but with no verification of content ?
22:36 poeml | There is also support for downloading files and checking their md5 sum, to detect/debug brokenness.
22:36 z00dax6 | ok
22:36 range | z00dax6: Well, that is the same issue as above: Do we trust GPG?
22:36 poeml | "support" meaning it'd be some kind of one-liner in a shell/python script to do that for all mirrors
22:37 range | poeml: So random checks would be possible, I guess.
22:37 poeml | range: definitely.
22:38 poeml | warty9_andy: MirrorBrain survived an attack with a rogue mirror. Well :) ( http://www.usenix.org/publications/login/2009-02/openpdfs/samuel.pdf )
22:38 z00dax6 | and how easy/hard would it be to spread the number of machines doing checks ?
22:39 poeml | z00dax6: the checks are efficient enough that it was never necessary to spread them to more machines (in all setups that I know of). But it could certainly be done.
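The two checks poeml mentions (a range request on large files, and comparing a downloaded file against a known checksum) can be sketched roughly as follows. This is not MirrorBrain's code: the helper names are hypothetical, and probing the last kilobyte of the file is an assumption made for the example, since the log does not say which bytes are requested.

```python
import hashlib


def digest_matches(content: bytes, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare downloaded bytes against a known digest (md5, sha1, sha256, ...)."""
    actual = hashlib.new(algorithm, content).hexdigest()
    return actual == expected_hex.lower()


def tail_range_header(file_size: int, probe_bytes: int = 1024) -> dict:
    """Build an HTTP Range header asking for the last probe_bytes of a file.

    Assumption for this sketch: a mirror that returns the tail of a > 2G
    file correctly is handling large files properly.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    start = max(0, file_size - probe_bytes)
    return {"Range": f"bytes={start}-{file_size - 1}"}


if __name__ == "__main__":
    payload = b"example repomd payload"
    known = hashlib.sha256(payload).hexdigest()
    print(digest_matches(payload, known))         # digest of the same bytes
    print(tail_range_header(3 * 1024 ** 3))       # header for a 3 GiB file
```

As the discussion points out, a matching checksum from a crawler's download catches broken setups but is easy to deceive, so it complements rather than replaces GPG signatures.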
22:40 poeml | z00dax6: openSUSE is the largest deployment with 1.500.000 files in 70.000 directories
22:40 warty9_andy | poeml: will look when I'm back at lappy. MM should do ok, but that's based on gpg signed rpms, which is overall more robust
22:40 poeml | warty9_andy: MM is okay too, the only problem at the time was that Fedora clients would accept old metadata, which Matt partly worked around by adding Metalink support
22:41 range | That would be the next question (and again I must admit I haven't looked hard enough at 6beta): Do we need metalink support in 6?
22:41 warty9_andy | *nods*
22:41 poeml | z00dax6: not all mirrors have all the files, especially not, let's say, a Chinese mirror, but the scanning is not the problem... the bottleneck is getting the stuff there in the first place :-)
22:42 range | Ah yes.
22:42 range | How about partial content?
22:42 range | We have several mirrors which only carry 5 or which only carry i386/x86_64 as arches.
22:42 adrianr | that is supported in mirrormanager
22:42 z00dax6 | poeml: do you have some time estimates ?
22:43 adrianr | each mirror can decide what it carries, which releases, which arches
22:43 warty9_andy | poeml: working on the getting data there problem already (chasmd which I / kernel.org are working with them on)
22:44 poeml | z00dax6: how many files/directories do you have? That's the main determinant. I scan all 50 TDF mirrors in 10 seconds, but they have few files :-)
22:45 range | 70000 in 800 at the moment.
22:45 z00dax6 | prolly a few dozen dir's - files <= 50k
22:45 z00dax6 | for c5, something similar for c4/c3
22:46 range | The above is centos-withdvd
22:46 z00dax6 | ah, range has more exact numbers
22:46 poeml | let me check to find something comparable...
22:46 z00dax6 | and i dont see us dropping content checks in a hurry
22:47 z00dax6 | adrianr: what geoip stack does MM use ? is it the standard maxmind geoip-free db ?
22:48 warty9_andy | It uses maxmind right now
22:49 adrianr | yes, it is downloading this file once a week: http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz
22:49 range | adrianr: Is that the free one?
22:49 adrianr | it also uses ASNs
22:49 adrianr | yes, the free one
22:49 adrianr | and admins can enter netblocks for which they want to be the mirror
22:50 range | adrianr: Is that (netblocks) exclusive or does that only mean that their mirror always comes up for them but also for others?
22:50 adrianr | could be both
22:50 adrianr | if you mark your mirror as private then it is exclusive, else not
22:50 z00dax6 | and how do you work out with some level of surety that the guy really is an admin for that netblock ?
22:51 adrianr | private mirrors are not allowed to sync from tier 0/ tier 1 mirrors
22:51 adrianr | it is not enforced
22:51 adrianr | the netblock thing is not controlled
22:51 range | But then again most things are signed.
22:51 adrianr | you can basically add anything you want
22:52 adrianr | no, not true
22:52 adrianr | nothing larger than class b is allowed
22:52 adrianr | that can only be done by an admin
22:52 range | But up to /16 can be done by others?
22:52 adrianr | but if you want you could create lots of trouble
22:52 adrianr | you can only change your mirror
22:53 adrianr | in the webfrontend
22:53 range | Yeah, sure.
22:53 poeml | anyone can create a mirror, it doesn't take much :-)
22:53 range | Does it have to be in the netblock I say it is good for? :)
22:54 range | poeml: Which geoip db does mirrorbrain use?
22:54 adrianr | do not if it actually checks it, would make some sense
22:54 adrianr | I do not know *
22:54 z00dax6 | just for the sake of completeness, if userX marks mirrorA as the mirror for subnet blah, will that machine be the only machine exposed as mirror to the entire subnet ? or is that just one machine that is always included in the list of mirrors for that subnet ?
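The netblock rules adrianr describes can be sketched with Python's standard `ipaddress` module. This is an illustration, not MirrorManager code: the function names are hypothetical, only IPv4 is handled, and reading "private means exclusive" as "a private mirror is offered only to clients inside its own netblocks" is an assumption.

```python
import ipaddress

# "nothing larger than class b is allowed": a mirror admin may claim a /16
# or anything smaller; larger blocks need a project admin.
MIN_PREFIX_FOR_MIRROR_ADMIN = 16


def admin_may_claim(netblock: str) -> bool:
    """Return True if a mirror admin may enter this IPv4 netblock themselves."""
    network = ipaddress.ip_network(netblock, strict=False)
    return network.version == 4 and network.prefixlen >= MIN_PREFIX_FOR_MIRROR_ADMIN


def mirror_visible_to(client_ip: str, netblocks, private: bool) -> bool:
    """Decide whether a mirror may be offered to this client.

    Assumption: a private mirror is only offered inside its own netblocks,
    a public mirror is offered to everyone.
    """
    address = ipaddress.ip_address(client_ip)
    inside = any(address in ipaddress.ip_network(block, strict=False) for block in netblocks)
    return inside or not private


if __name__ == "__main__":
    print(admin_may_claim("10.1.0.0/16"), admin_may_claim("10.0.0.0/8"))
    print(mirror_visible_to("192.0.2.10", ["192.0.2.0/24"], private=True))
```

None of this answers z00dax6's question about proving that the person entering a netblock really administers it; the sketch only covers the size limit and the visibility rule.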
22:55 poeml | range: the same - the GeoLiteCity one. The GeoLite (smaller) one would suffice, because I don't use the geographical coordinates yet that the GeoLiteCity db offers, but the latter proved to be slightly more up to date sometimes.
22:55 range | Okay.
22:55 z00dax6 | interesting, that will have State level info for the US
22:55 poeml | range: I have a plan to use the coordinates as well, for some special cases - like US
22:55 adrianr | no, it will be the machine on top of the mirrorlist created for the clients in the subnet
22:55 range | I know it can do ASN and netblocks, too.
22:56 jthurman42 | MM's ability to let the mirror maintainer define specific blocks is very nice, but would have to be controlled in some manner.
22:56 range | Well, if people like traffic ...
22:56 adrianr | and then there will be the other mirrors
22:56 z00dax6 | adrianr: ok, thanks. that makes sense
22:57 warthog9 | jthurman42: it hasn't been a particular problem for fedora
22:57 warthog9 | (back on my laptop btw)
22:57 poeml | so, I checked for a comparable file tree and found that openSUSE's 'distribution' is not too far off. I scanned Adrian's mirror (from my small 1GHz server) and got:
22:57 jthurman42 | Block request submitted, whois lookups done, authorized parties emailed... you would think that would be all automated by MM.
22:57 poeml | Mon Oct 25 22:54:00 2010 ftp-stud.fht-esslingen.de: files in 'distribution' before scan: 47035
22:57 poeml | Mon Oct 25 22:56:10 2010 ftp-stud.fht-esslingen.de: scanned 47035 files (359/s) in 130s
22:58 jthurman42 | pulling more traffic to you only really DoSs yourself anyway... then connection times out and the next mirror is chosen
22:59 adrianr | poeml: can the backend of mirrorbrain only be accessed by the mirrorbrain admin or are there ways for the mirror admins to change some settings?
22:59 z00dax6 | adrianr: what settings would you ideally want to change as a mirror admin ?
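For a rough sense of scale, the scan figures poeml pasted above can be extrapolated to the 70000 files range quoted for centos-withdvd. The sketch assumes scan time grows linearly with file count and that the mirror and network path behave like the one in the test scan, which a far-away or slow mirror will not.

```python
def scan_rate(files: int, seconds: float) -> float:
    """Files scanned per second."""
    return files / seconds


def estimated_scan_seconds(files: int, rate: float) -> float:
    """Time to scan a tree at a given rate, assuming linear scaling."""
    return files / rate


if __name__ == "__main__":
    # From the pasted log: 47035 files in 130 s, reported as 359 files/s.
    print(round(scan_rate(47035, 130)))               # 362, close to the reported 359
    # Extrapolated to the 70000 files quoted for centos-withdvd.
    print(round(estimated_scan_seconds(70000, 359)))  # 195 seconds, a bit over 3 minutes
```

That is per mirror and from one well-connected crawler; adrianr's report of over an hour for a mirror in China from Germany shows how far the remote side can move the number.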
23:00 adrianr | netblocks, private mirror, all the URLs serving the content
23:00 poeml | adrianr: I had the plan since long, to make some parts accessible to mirror admins, but at first, I didn't have the time, because it's a one-man-show, and then in the long term it didn't prove necessary.
23:00 adrianr | this really depends on what you want for centos
23:00 poeml | adrianr: I see it useful mainly for a "stop redirection" button
23:01 adrianr | do you want the possibility to let the user control their setup or not
23:01 warthog9 | z00dax6: machines you control, urls, ips
23:01 range | adrianr: Not sure about that yet :)
23:01 z00dax6 | we were going down the route of allowing an 'admin' to request a specific token - they would need to place on the mirror, which would then allow them to make some changes in a web-ui
23:02 adrianr | i have written lots of nice emails with poeml to configure my mirror (which i did not mind)
23:02 poeml | range: for me, the time was always better spent actually communicating with the mirrors, than writing tools for them. My choice :-)
23:02 * warthog9 actually really likes the direct control he has over his mirrors in MM, probably saves mdomsch a lot of admin work
23:02 range | Yeah, that's what we do right now, but the process tends to lag :)
23:03 poeml | after doing the work over 4-5 years (all on my own), and meanwhile for others, too (OOo and TDF), I can't really say that it is a lot of work. But it certainly creates bottlenecks
23:03 z00dax6 | if there is a reasonable way to make sure that the person requesting changes is indeed the person who runs the mirrors - then yeah, there is clearly some value in webui
23:04 adrianr | for fedora it is a bit more complicated setup
23:04 z00dax6 | btw, it just struck me that when doing a release, the report-from-mirror script might be a nice way to handle the 500 perms on private directories
23:04 adrianr | MM gets the user information from FAS (Fedora account system)
23:04 poeml | I think, what MM does is just fine - it offers people a way to create a mirror, and it is later activated by some admin (who has less work then). But for me, creating a mirror also takes only a minute or two. I would have to verify the entered data anyway (IMO).
23:05 range | Well, mirrorbrain is certainly easier to maintain than what we have at the moment.
23:05 adrianr | with RPM Fusion it is much more simple, it only connects to a local database where I have to add the account manually
23:05 poeml | MM's integration with the Fedora Account System looks great, I'd love to have the same (or steal it) for MB
23:05 adrianr | the user also cannot change their passwords in RPM Fusion in contrast to fedora
23:06 adrianr | as a turbogears application it can theoretically use any compatible account backend
23:06 range | We don't have an account db at the moment, but moving over to a new system will be some work anyway :)
23:07 poeml | by the way, I didn't mention yet that I started a web frontend for MirrorBrain, too. I just never completed it. But it's working at least for the admins.
23:07 poeml | it's a simple Django app.
23:07 range | poeml: Yes, but I like the cli interface (as far as I have seen it yet, anyway).
23:08 poeml | the commandline tool is usable enough that one doesn't really wish for a web app. But for admins that only drop by every few weeks and can't remember anything, web stuff is easier. And it would be the basis for opening it to outside.
23:08 range | Yupp.
23:09 poeml | anyway, you guys will almost certainly like the commandline tool, because you would use it not only 3x per year :-)
23:09 z00dax6 | advantage with a cli tool is also that its easier to integrate with other systems
23:09 range | I know mirrormanager supports metalink within yum. Can mirrorbrain do that too? Again I must admit that I have no idea what it takes to do that.
23:10 range | I know MB has metalink support.
23:10 z00dax6 | but, as far as i can tell - there isnt much between the two systems at this time. except: (a) webui (b) mirror side script to check contents
23:10 poeml | MirrorBrain is the Metalink "master". I actually co-authored the RFC :-) It has the most complete Metalink support you will find on this planet ;-)
23:11 * range goes and stand in the corner.
23:11 range | * stands
23:11 poeml | I believe that MM's Metalink support is a bit rudimentary and wouldn't scale very well, but I would need to double chec
23:11 poeml | +k
23:12 range | Anything on torrent tracking?
23:12 poeml | MirrorBrain also has a hash database since recently, which can store hashes of all files (md5, sha1, sha256, btih, ...)
23:13 adrianr | as far as i know there is nothing with torrents in mm
23:13 poeml | and after implementing the hash database, I actually added a Torrent generator, so a Torrent can be generated for any file. Including fallback HTTP seeds, which are exactly those mirrors that are close to the client
23:13 range | I know that MB supports torrent links, but I am not sure how the tracker side works.
23:13 poeml | the only thing that's not in MirrorBrain is a tracker - but there are several trackers that can be used
23:13 range | (me lacking knowledge of that mostly, anyway)
23:14 z00dax6 | that would still need a torrent tracker and someplace to announce to
23:14 poeml | either in anonymous mode (then there's no further setup to do), or (if the tracker is accepting only files that it "knows"), the torrents need to be downloaded from MirrorBrain to be given to the tracker, which is easy to do
23:15 * warthog9 bites his tongue on bittorrent
23:15 poeml | the latter we do at the Document Foundation, where we run some weird networked tracker
23:15 poeml | warthog9: I read your paper ;-) (and I like it very much)
23:15 range | warthog9: I know, I know :)
23:15 poeml | warthog9: I hate this Torrent stuff, but some people want it badly, so I resign
23:16 warthog9 | poeml: I know
23:17 poeml | warthog9: on the TDF launch, there was more data served from my little 100 MBit mirror than via Torrent worldwide. Ridiculous.
23:17 range | TDF?
23:17 poeml | range: http://www.documentfoundation.org/
23:17 range | Ah, not tour de france :)
23:18 range | Does mirrorbrain support private mirrors and if so, does it also put restrictions on them like not being able to download from tier 0/1?
23:18 poeml | range: ;-) secretly launched a few weeks ago -- I joined shortly before, to build a mirror network in just 2 days...
23:19 poeml | yes
23:19 range | And: Is MB able to support incomplete (only one release/arch) mirrors?
23:19 poeml | MB supports the following restrictions: client must be in same region, in the same country, in the same AS, in the same network prefix
23:21 range | Oh, one more question.
23:21 poeml | what can be important in this context is that MB also supports to configure certain mirrors to be preferred as fallback mirrors for a certain region. For instance, sending south african clients to German mirrors (if they don't have a mirror, or theirs is down), instead of a country "next door".
23:21 range | We do that by manually "fixing" geo_cc.pm
23:22 poeml | with MB, it's explicit fallback behaviour, so that a local mirror still is preferred
23:22 poeml | partial mirroring is supported at the heart of MB - it doesn't make any assumptions about completeness, and works fully on file level
23:22 range | Hmm, okay.
23:22 poeml | I actually used to believe that MM works directory-wise, Adrian, do you know details?
23:22 poeml | range: with MB, you can have a mirror that mirrors only a single file. No problem.
23:23 warthog9 | MM I think works fine with partial mirrors as well
23:23 adrianr | MM works on directories, that is also my knowledge
23:23 warthog9 | effectively on a directory basis
23:23 range | I am still thinking about how to integrate our msync machines into a new setup.
23:23 z00dax6 | warthog9: adrianr: do you know off the top of your head how the mirror-side-script sends data over ? it looks to be a http/rpc call..is that right ?
23:24 adrianr | yes
23:24 adrianr | http/rpc
23:24 warthog9 | z00dax6: yeah http/rpc
23:24 z00dax6 | range: yes, we would need them to run with acl's that get passed in from the
23:24 range | As we clearly want to give everyone running a mirror the possibility to carry the dvds/large files, but then need to restrict those to only mirrors.
23:24 range | Yeah.
23:24 poeml | range: what's msync, and what's the issue with the DVDs? I never understood that, so far
23:24 range | Okay, I guess that can be done via puppet.
23:25 range | In the beginning too few machines for too many mirrors (those are the "masters" for the other mirrors in different parts of the world)
23:25 range | Now a confuzzled structure which isn't resolved easily :)
23:26 poeml | you mean, the structure of mirrors-to-mirrors syncing is complicated?
23:26 poeml | from tier1 to somewhere else?
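The client restrictions and the explicit fallback that poeml describes earlier in this stretch (same region, country, AS or network prefix; local mirrors preferred, then configured fallback countries) could be modelled along these lines. It is a sketch, not MirrorBrain's selection logic: the class and field names, the `fallback_countries` mapping and the comparison of prefixes as plain strings are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    region: str
    country: str
    asn: int
    prefix: str  # network prefix, compared as a plain string in this sketch


@dataclass(frozen=True)
class Mirror:
    name: str
    location: Location
    restrict_to: str = "none"  # "none", "region", "country", "asn" or "prefix"


def allowed(mirror: Mirror, client: Location) -> bool:
    """Apply the mirror's restriction: the client must share the named attribute."""
    if mirror.restrict_to == "none":
        return True
    return getattr(mirror.location, mirror.restrict_to) == getattr(client, mirror.restrict_to)


def select(mirrors, client: Location, fallback_countries: dict):
    """Prefer mirrors in the client's country, then configured fallbacks, then any."""
    usable = [m for m in mirrors if allowed(m, client)]
    local = [m for m in usable if m.location.country == client.country]
    if local:
        return local
    preferred = fallback_countries.get(client.country, [])
    fallback = [m for m in usable if m.location.country in preferred]
    return fallback or usable


if __name__ == "__main__":
    client = Location("af", "za", 64500, "203.0.113.0/24")
    mirrors = [
        Mirror("de1", Location("eu", "de", 64501, "192.0.2.0/24")),
        Mirror("ke1", Location("af", "ke", 64502, "198.51.100.0/24")),
    ]
    print([m.name for m in select(mirrors, client, {"za": ["de"]})])
```

Keeping the fallback as explicit configuration is what distinguishes it from the manual geo_cc.pm "fixing" range mentions: a local mirror, once it exists and is up, wins without anyone editing the mapping again.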
23:27 range | No. It isn't easy for us to resolve the dvd/no-dvd issue at the moment.
23:27 z00dax6 | poeml: there are a bunch of tiers before the tier0 ( seen by public )
23:27 poeml | sorry, I still don't understand what that means :-)
23:27 z00dax6 | poeml: and there are 2 kinds of public facing rsync targets, msync and msync-withdvd
23:27 range | We actually deliver two trees to different machines. Only "a chosen few" get to see the dvd.
23:27 z00dax6 | the nomral msync machines have no dvd images, the others do
23:28 z00dax6 | typos!
23:28 poeml | okay, and why do you offer two different trees?
23:28 range | Load issues at the beginning. Now a "historical" issue.
23:28 poeml | is that equivalent to offering smaller and larger rsync modules?
23:29 herrold | range: also, it permits a remote admin to not have to mess with rsync configs and so lowers the bar to being a mirror
23:29 z00dax6 | time/bandwidth -> load; we dont want non public mirrors pulling dvd's from us - plus the ability to seed up the mirror network quickly goes away very fast when you double the b/w that everyone needs
23:29 range | herrold: Yeah.
23:29 z00dax6 | yeah, so a bunch of issues
23:29 poeml | sounds as if it should be unified? Access control, and public modules, could be on the same machine, after all
23:30 range | Yes and yes.
23:30 z00dax6 | poeml: in many cases, they are
23:30 range | I am actually running out of questions at the moment :)
23:30 poeml | I have one :-)
23:30 range | Sure.
23:31 poeml | how many requests do you handle per second?
23:31 z00dax6 | i have one : we know that mirrorbrain might be able to do US states, could mirrormanager do something similar ?
23:31 range | poeml: Ummm.
23:31 z00dax6 | poeml: for what ?
23:32 range | poeml: Centrally we only hand out mirrorlists to the clients.
23:32 range | What they then do with that, is all theirs to care about.
23:32 poeml | z00dax6: so far, MB doesn't use the state info, and it doesn't use geographic coordinates, but I think it could make sense (I described some details here: http://mirrorbrain.org/issues/issue34 )
23:32 z00dax6 | right, at 5.4 releasetime we did about 3 million in the first 24 hrs of mirrorlist= traffic
23:33 poeml | z00dax6: it would be easy to implement US state, if it actually is a good idea (need to check)
23:33 * z00dax6 confirms
23:33 adrianr | I do not know if it can do states, probably the mm author could answer that
23:33 z00dax6 | adrianr: i guess it just boils down to the geo-ip db
23:34 adrianr | probably
23:34 poeml | anyway, you have so many mirrors (over 400!) that a fine mirror selection definitely pays off (which it doesn't if there's only 1 mirror in South America, you get the idea)
23:34 z00dax6 | poeml: 3million in 45 hrs
23:34 range | poeml: 1.1 mio in the last 12 hours on one of the two machines handing those out.
23:34 z00dax6 | ( from my notes during 5.4 release )
23:35 range | So I guess 4 a day.
23:35 poeml | so about 25 req/second, if I guesstimate correctly
23:35 range | 46.
23:36 poeml | that's quite relaxed, openSUSE gets 250-400 per second, which is handled easily on a very old box
23:36 adrianr | sorry, but i have to go
23:36 range | adrianr: Thanks for being here.
23:36 z00dax6 | adrianr: thanks for coming along.
23:36 poeml | adrianr: bye!
23:36 adrianr | bye
23:37 range | But mirrormanager already is on US state level? warthog9, do you know that?
23:37 poeml | the mirror lists that you serve would need to be generated by MB then, right?
23:37 range | poeml: Yes.
23:38 poeml | what could be more interesting than state level could be AS adjacency, btw. Enlarging the "radius" of an AS regarding the clients that it gets.
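The request-rate figures traded above are easy to check by converting each quoted total into requests per second. The only assumption in the sketch is that the second mirrorlist machine sees about the same traffic as the one range measured.

```python
def requests_per_second(requests: int, hours: float) -> float:
    """Average request rate for a total observed over a number of hours."""
    return requests / (hours * 3600)


if __name__ == "__main__":
    # z00dax6: about 3 million mirrorlist requests in the first 24 hours of 5.4.
    print(round(requests_per_second(3_000_000, 24)))      # 35
    # range: 1.1 million in 12 hours on one of two machines; doubled here on
    # the assumption that both machines see similar load.
    print(round(requests_per_second(2 * 1_100_000, 12)))  # 51
    # range: "So I guess 4 a day", i.e. 4 million per day.
    print(round(requests_per_second(4_000_000, 24)))      # 46
```

So range's "46" is the 4 million a day figure, and every estimate sits well below the 250-400 requests per second poeml quotes for openSUSE.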
23:38 range | A typical one looks like this:
23:38 range | http://mirrorlist.centos.org/?release=5.5&arch=x86_64&repo=os
23:39 herrold | poeml: crawl before walking
23:39 poeml | herrold: sorry, what do you mean?
23:39 herrold | poeml: enhancements such as proximity measures are fine, but netting it up initially is the first task
23:40 z00dax6 | poeml: what tool did it fail with ?
23:42 poeml | herrold: hm, I didn't get what you mean with "netting it up"
23:42 z00dax6 | so, is there a third tool other than mirrorbrain / mirror manager out there ?
23:43 range | poeml: But yeah, what does mirrorbrain give back to the client?
23:43 range | IOW: Are there mirrorbrain repositories which work with yum?
23:43 range | z00dax6: I have looked, but haven't really found anything.
23:44 poeml | there is not really a third tool, unless you count Bouncer -- which was used by OOo (migrated to MB earlier this year), but it is still used by Mozilla.
23:44 z00dax6 | range: same same
23:44 z00dax6 | poeml: there is the 'centos tool!'
23:44 range | :)
23:45 poeml | Bouncer's development is not completely dead, and sometimes still being worked on, but it has some properties that tie it into the concept that is followed by Mozilla (and earlier OOo) regarding "products" and "language packs"
23:45 z00dax6 | range: btw, i guess MB does support yum repo's since yum is in opensuse, and all *suse stuff is based off repomd/ type repos these days
23:46 range | I thought Suse uses zypper? :)
23:46 poeml | so files are handled in groups, so to speak, which are specific to their products. That makes it not really reusable. And what Bouncer does regarding mirror selection is not "state of the art" anyway. It's in PHP. Mozilla survives only because they run a massive farm of servers (20 or so).
23:47 range | Okay, I have a pro and con list (and a bit of the introduction) as notes here, which I will put up on the wiki.
23:47 poeml | so Mozilla is the only remaining user now.
23:47 z00dax6 | right, but its backed by repomd afaik
23:47 z00dax6 | poeml: ok
23:47 poeml | z00dax6: sorry about my ignorance about the centos tool :-) I simply never thought about it
23:47 poeml | z00dax6: do more marketing ;-)
23:48 range | That would mean do cleaning up first >:D
23:48 poeml | there is also Apache2::Geo::Mirror (used by CPAN) and closer.cgi, used by Apache, but both are comparably limited (I hope I can say that)
23:48 herrold | release time coherency testing would be a concern at new release time on our side for adding a candidate into the dispatcher
23:50 poeml | there is also a guy working on a tool called Cacheboy, basically a Squid derivative, which requires root access on all mirrors. Doesn't sound too attractive to me.
23:50 poeml | that's about it. Then there is CoDeeN, which could at least be integrated in a mirror network and then be useful. It's the only remaining of the academic attempts to implement content delivery networks, I would say.
23:50 poeml | I maintain a list at http://mirrorbrain.org/links/
23:51 poeml | regarding the question of repository support:
23:52 poeml | MB is opaque to what files it serves. It supports serving repositories - as files, just as it supports serving DVDs, images or anything.
23:53 poeml | openSUSE uses zypper, but offers repo-md metadata (among other types), and I am a long-time yum user with openSUSE (almost exclusively).
23:53 range | Okay, that needs more reading anyway.
23:53 poeml | repo-md as generated with createrepo, for instance.
23:53 range | But I've learned several things today, which gives me ideas on what to read :)
23:54 poeml | having said that, early I had the thought of generating some kind of mirror list. I designed my own text-based mirror list format, which looked _quite_ similar to your yum mirror lists -- it simply included other data, like country after the URLs.
23:55 z00dax6 | herrold: definitely high on things-to-resolve list
23:55 range | I am developing a splitting headache, so I cannot really concentrate anymore at the moment.
23:55 z00dax6 | guys, I need to head off.
23:55 range | Yeah.
23:55 range | Thanks again to everybody who was here today.
23:55 poeml | I looked around and found that Metalinks (which are the same in XML) are a good fit for this purpose, so instead of inventing my own stuff I joined the Metalink effort, and generating Metalinks is one of the things that MB does now.
23:55 warthog9 | (sorry got dragged off to do a presentation)
23:55 range | I put that stuff (my notes plus irc log) on the wiki tomorrow.
23:55 z00dax6 | poeml: thanks for coming by. I am sure we will be in touch
23:56 poeml | okay, sorry that you have to leave
23:56 warthog9 | awesome :-)
23:56 range | warthog9: You just came back at the right time, it seems :)
23:56 warthog9 | wheee
23:56 warthog9 | perfect timing
23:56 range | poeml: But most of us hang around here most of the time :)
23:56 poeml | anyway, generating text mirror lists is trivial... I deleted that old code a while ago ;-)
23:57 range | poeml: My problem is that I am kind of in information overload mode at the moment (plus the headache).
23:57 poeml | range: I'm also logged into this channel since a few months, I think ;-)
23:57 range | Yeah, but nothing ever happens :)
23:57 poeml | range: no problem. Relax and do something else (or nothing ;-)
23:57 poeml | range: all the best.
23:58 range | I think real work on that can begin after the 6 release.
23:58 poeml | me too. I'd be happy to enhance and extend MirrorBrain. If anything's missing, let's add it. For the good of everybody.
23:58 range | Great.
23:59 herrold | range: procedurally for self serve mirrors to be, something like http://bugs.centos.org/view.php?id=4443 is needed
00:00 range | Yes.
00:01 range | Maybe an ls -lR, too, like in the old times :)
00:02 poeml | herrold: maybe http://svn.mirrorbrain.org/viewvc/mirrorbrain/trunk/tools/rsyncinfo?revision=8120&view=markup could be useful, a tool that can assess directory sizes on mirrors (or stage servers) via rsync
00:02 herrold | we can do better with a minimum of datemarking and use it as an initial canary to check freshness
00:02 herrold | that to range ... dunno as to the poeml item
00:03 poeml | % rsyncinfo size rsync://ftp-stud.fht-esslingen.de/opensuse/distribution/
00:03 poeml | opensuse/distribution/ 111.49G
00:03 poeml | I used that to publish a nightly list of rsync module sizes for openSUSE
00:03 herrold | poeml: one reason for a recursed tree listing is people want to see what is in a directory of a huge number of files -- having a well known beacon to DL and grep simplifies life
00:04 herrold | secondary tools may be built on such, such as my 'Oracle'
00:04 poeml | herrold: I see, you are more interested in seeing the contents
00:04 herrold | poeml: I am interested in killing lots of birds that people carp about with a single rock
00:05 poeml | herrold: I was remembered to the frequent "how much space does it take to mirror XY" questions recently, where there didn't seem to be a good place to point people to
00:05 herrold | poeml: that bug was filed after one such
00:05 poeml | ah, my English
00:05 range | See you all around ...
00:06 poeml | range: see you, Ralph