#2001 closed defect (fixed)
Mailing list delay sending messages
Reported by: | Jeff McKenna | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | Unplanned |
Component: | SysAdmin/Mailman | Keywords: | |
Cc: | rouault, Mateusz Łoskot, strk, wildintellect, jef, neteler, sac@… |
Description
This was reported on the SAC list by Even (https://lists.osgeo.org/pipermail/sac/2017-September/008467.html), and I can confirm that today there is at least a 1 hour+ delay in messages. (I had tested this Friday and it wasn't an issue for me, but today it is).
Change History (56)
comment:1 by , 7 years ago
comment:2 by , 7 years ago
I am trying to restart the mailman service on osgeo6 VM:
sudo /etc/init.d/mailman restart
jmckenna is not in the sudoers file. This incident will be reported.
Can someone please add user 'jmckenna' to sudoers on osgeo6? (since during off-hours it is always me tackling these issues) Thanks!
comment:3 by , 7 years ago
I have alternatively tried:
sudo /usr/lib/mailman/bin/mailmanctl restart
jmckenna is not in the sudoers file. This incident will be reported.
comment:4 by , 7 years ago
Jeff, it seems you've nudged mailman: the queued posts have now arrived, both in my inbox and in the archives:
https://lists.osgeo.org/pipermail/sac/2017-September/thread.html
follow-up: 7 comment:5 by , 7 years ago
Copied here since mailman currently doesn't deliver it in time:
From: Markus Neteler <neteler@osgeo.org>
Date: Sun, Sep 24, 2017 at 6:17 PM
Subject: Re: [SAC] Long email distribution delays on OSGeo lists since yesterday
To: System Administration Committee Discussion/OSGeo <sac@lists.osgeo.org>
Yes, looks like that: http://webextra.osgeo.osuosl.org/munin/osgeo.org/osgeo6.osgeo.org/postfix_mailqueue.html
mailq | grep mailman-bounces@lists.osgeo.org | wc -l
91552
One issue is probably a single user/the related mail server (the same email address as last time) which currently has a backlog of
# email address not disclosed publicly:
mailq | grep OMITTED | wc -l
39172
The error: 451 Recipient is busy, please try again later
MartinS may apply the postfix trick from last time to relax the situation.
Best, Markus
comment:7 by , 7 years ago
Replying to neteler:
mailq | grep mailman-bounces@lists.osgeo.org | wc -l
91552
Now:
mailq | grep mailman-bounces@lists.osgeo.org | wc -l
69012
The situation seems to get better:
http://webextra.osgeo.osuosl.org/munin/osgeo.org/osgeo6.osgeo.org/postfix_mailqueue.html
comment:8 by , 7 years ago
This ticket is assigned to me but to be honest I'm quite clueless on how to help in alleviating this issue.
comment:9 by , 7 years ago
Should we change ownership of "Mailing List" component back to the SAC mailing list ?
comment:10 by , 7 years ago
Owner: | changed from | to
---|
For the record: things continue getting better:
# mailq | grep mailman-bounces@lists.osgeo.org | wc -l
6967
follow-up: 12 comment:11 by , 7 years ago
Today there are 1504 messages in queue, of which 1453 (97%) are bounces to a single address of Ari Jolma (user ajolma). Does anyone have an offline contact with that user ?
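A per-recipient breakdown like the one above can be produced from the queue listing itself. The following is a minimal sketch, assuming the default Postfix `mailq` layout (queue ID and sender starting in column 0, recipients on indented lines, deferral reasons in parentheses). The sample text and addresses are hypothetical, not taken from osgeo6. Note that it counts recipients, whereas the `grep | wc -l` commands in this ticket count matching lines.

```python
from collections import Counter

# Hypothetical sample in the shape of default Postfix `mailq` output.
SAMPLE_MAILQ = """\
-Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
0A1B2C3D4E     2841 Sun Sep 24 18:01:12  mailman-bounces@lists.osgeo.org
(host mx.example.net[192.0.2.10] said: 451 Recipient is busy, please try again later)
                                         busy.user@example.net

1F2E3D4C5B     3010 Sun Sep 24 18:03:40  mailman-bounces@lists.osgeo.org
(host mx.example.net[192.0.2.10] said: 451 Recipient is busy, please try again later)
                                         busy.user@example.net

2A3B4C5D6E     1975 Sun Sep 24 18:05:02  mailman-bounces@lists.osgeo.org
                                         someone@example.org

-- 8 Kbytes in 3 Requests.
"""


def recipients_per_address(mailq_text):
    """Count queued recipients per address in `mailq` output."""
    counts = Counter()
    for line in mailq_text.splitlines():
        # Recipient lines are indented; queue-ID lines, deferral reasons
        # and the summary line start in column 0 and are skipped here.
        if not line[:1].isspace():
            continue
        entry = line.strip()
        # Skip anything indented that is not an address.
        if not entry or entry.startswith("(") or "@" not in entry:
            continue
        counts[entry.lower()] += 1
    return counts


def top_recipient(counts):
    """Return (address, count, share of all queued recipients in percent)."""
    address, count = counts.most_common(1)[0]
    return address, count, round(100 * count / sum(counts.values()))


counts = recipients_per_address(SAMPLE_MAILQ)
print(top_recipient(counts))
```

On a live server the input would come from the real `mailq` output rather than the sample string.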
comment:12 by , 7 years ago
Replying to strk:
Today there are 1504 messages in queue, of which 1453 (97%) are bounces to a single address of Ari Jolma (user ajolma). Does anyone have an offline contact with that user ?
I do. Ari is a GDAL contributor. It is probably an old email address of his that was never unsubscribed. I'll email him about that.
follow-up: 17 comment:13 by , 7 years ago
It's a gmail.com address - and there are a lot more. And for some there is:
450-4.2.1 The user you are trying to contact is receiving mail at a rate that
450-4.2.1 prevents additional messages from being delivered. Please resend your
450-4.2.1 message at a later time. If the user is able to receive mail at that
450-4.2.1 time, your message will be delivered. For more information, please
450-4.2.1 visit
450 4.2.1 https://support.google.com/mail/?p=ReceivingRate 100si441760lfs.108 - gsmtp (in reply to RCPT TO command))
Who can add an SPF record to the osgeo.org DNS zone? Apparently DNS is not run by us but by PAIRNIC.COM.
comment:15 by , 7 years ago
Other "top" users:
- maillists at codeha dot us (not sure who that is - isn't codehaus dead?)
- maplabs at light42 dot com (aka darkblue_b)
comment:16 by , 7 years ago
Replying to ajolma:
So my gmail.com address is a problem? What could I do?
Not yours specifically, looks like we're just trying to send too much mail to gmail.
comment:17 by , 7 years ago
Replying to jef:
Sorry, actually above is a different message than yesterday - which pointed at "authentication" via SPF.
That's the one:
421-4.7.0 suspicious due to the very low reputation of the sending domain. To
421-4.7.0 best protect our users from spam, the message has been blocked.
421-4.7.0 Please visit
421 4.7.0 https://support.google.com/mail/answer/188131 for more information. y40si1011389pla.813 - gsmtp (in reply to end of DATA command)
comment:19 by , 7 years ago
Hm, is an SPF record for osgeo.org enough? Shouldn't we also have one for lists.osgeo.org (which is where the mail originates)?
Apparently yes: https://serverfault.com/questions/322949/do-spf-records-for-primary-domain-apply-to-subdomains
follow-up: 21 comment:20 by , 7 years ago
we could have all domains delegate to spf.osgeo.org to centralize control of SPF record. Who has access to DNS ?
comment:21 by , 7 years ago
Replying to strk:
we could have all domains delegate to spf.osgeo.org to centralize control of SPF record. Who has access to DNS ?
comment:22 by , 7 years ago
Anyone with access to Secure can access the DNS, also the Treasurer of the Board.
Can you please clarify what changes need to be made?
The SPF record for osgeo.org currently has
v=spf1 mx a:mail.osgeo.org a:lists.osgeo.org a:mail.osgeo.osuosl.org
Here is the syntax guide: http://www.openspf.org/SPF_Record_Syntax
comment:23 by , 7 years ago
I copied the @ SPF record to the lists and mail subdomains as TXT records. Apparently dedicated SPF records are deprecated and no longer recommended.
@ » "v=spf1 mx a:mail.osgeo.org a:lists.osgeo.org a:mail.osgeo.osuosl.org"
lists » "v=spf1 mx a:mail.osgeo.org a:lists.osgeo.org a:mail.osgeo.osuosl.org -all"
mail » "v=spf1 mx a:mail.osgeo.org a:lists.osgeo.org a:mail.osgeo.osuosl.org -all"
Several online validators show these are ok. Though I'm getting mixed answers from the documentation about the -all at the end. Let's see if gmail is happy now.
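To make the difference between the records with and without the trailing -all easier to see, here is a small sketch that splits an SPF record into its terms and reports the qualifier of the `all` mechanism. It is plain string handling, not a validator: it performs no DNS lookups and treats modifiers such as `redirect=` like ordinary terms. The function names are made up for this illustration. As far as I understand RFC 7208, a record with no `all` and no `redirect` yields "neutral" for hosts that match none of the listed mechanisms.

```python
from typing import List, Optional, Tuple

# Records as quoted in this ticket.
APEX = "v=spf1 mx a:mail.osgeo.org a:lists.osgeo.org a:mail.osgeo.osuosl.org"
LISTS = APEX + " -all"


def parse_spf(record: str) -> List[Tuple[str, str]]:
    """Split an SPF TXT record into (qualifier, term) pairs."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    parsed = []
    for term in terms[1:]:
        # A leading +, -, ~ or ? is the qualifier; "+" (pass) is the default.
        if term[0] in "+-~?":
            parsed.append((term[0], term[1:]))
        else:
            parsed.append(("+", term))
    return parsed


def all_qualifier(record: str) -> Optional[str]:
    """Return the qualifier of the `all` mechanism, or None if it is absent."""
    for qualifier, term in parse_spf(record):
        if term.lower() == "all":
            return qualifier
    return None


print(all_qualifier(APEX))   # no `all` term in the @ record
print(all_qualifier(LISTS))  # "-" means hard fail for everything else
```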
comment:25 by , 7 years ago
Today there seems to be a 2+ hour delay again for emails to OSGeo lists. The problem is back.
comment:26 by , 7 years ago
Cc: | added |
---|
Repeating last message (and adding people to the CC, to throw direct Trac notifications to interested members): problem is back today, with 2+ hour delay again. Likely if left alone this will be 7+ hours again by Monday.
comment:27 by , 7 years ago
update: several international SMS messages later....today is also a big deadline for the first stage of proposals for FOSS4G-2019. Well this caused chaos because the Conference-Dev list is now more than 2 hours delayed....
comment:28 by , 7 years ago
More than 45000 mails queued. Looks like Chinese spam from @qq.com. Cleaning...
http://webextra.osgeo.osuosl.org/munin/osgeo.org/osgeo6.osgeo.org/postfix_mailqueue.html
comment:30 by , 7 years ago
Also rejecting @163.com and @139.com now, and cleaned their mails out of the queue. "Just" 651 requests left in queue.
comment:31 by , 7 years ago
update: spammer is still trying to feed us mails from numerous IPs. postfix is rejecting them and in turn fail2ban bans the IPs.
~600 queued requests for quite a while now.
follow-up: 33 comment:32 by , 7 years ago
Jef: are these rejections automatic or are you doing that manually ?
comment:33 by , 7 years ago
Replying to strk:
Jef: are these rejections automatic or are you doing that manually ?
Postfix rejects those three domains automatically now that they have been added to /etc/postfix/access (and use of the access table has been set up).
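For readers unfamiliar with access tables, the idea can be sketched as a lookup from sender address or domain to an action. This is a deliberately simplified model, not Postfix's actual lookup code: the real access(5) lookup also tries parent domains and `user@` patterns, and the table must be compiled with `postmap` before a `hash:` lookup can use it. The table content below is reconstructed from the three domains named in this ticket.

```python
from typing import Dict, Optional

# Reconstructed from the domains named in this ticket (illustrative only).
ACCESS_TEXT = """\
# sender domains named in this ticket
qq.com   REJECT
163.com  REJECT
139.com  REJECT
"""


def load_access_table(text: str) -> Dict[str, str]:
    """Parse 'pattern action' lines, ignoring blank lines and # comments."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            table[parts[0].lower()] = parts[1].strip()
    return table


def action_for_sender(table: Dict[str, str], sender: str) -> Optional[str]:
    """Return the action for a sender, or None if no entry matches."""
    sender = sender.strip("<>").lower()
    domain = sender.rpartition("@")[2]
    # Simplified order: full address first, then the bare domain.
    for key in (sender, domain):
        if key in table:
            return table[key]
    return None


table = load_access_table(ACCESS_TEXT)
print(action_for_sender(table, "<123725849@qq.com>"))
print(action_for_sender(table, "someone@example.org"))
```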
follow-up: 35 comment:34 by , 7 years ago
The /etc directory on that server is under a local git, could you add the relevant configurations in that repository to better track changes ?
comment:35 by , 7 years ago
Replying to strk:
The /etc directory on that server is under a local git, could you add the relevant configurations in that repository to better track changes ?
done.
comment:36 by , 7 years ago
@qq.com has been posting as a non-member to OSGeo lists for a while now...thanks Jürgen!!!!!!!!!
comment:37 by , 7 years ago
The domains are now also blocked on projects (otherwise projects relays those mails to osgeo6, osgeo6's postfix rejects them, and fail2ban in turn blocks all mail from projects).
comment:38 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Delivery back to normal for a while now...
comment:39 by , 7 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
qq.com email addresses are spamming our -owner aliases now (see ticket# 1778 https://trac.osgeo.org/osgeo/ticket/1778). Can the fix from comment#28 be applied again? "@qq.com gets rejected in postfix now"
comment:40 by , 7 years ago
more info from the headers of one of the many @qq.com spams received today, showing how OSGeo postfix processes it:
from osgeo6.osgeo.osuosl.org (localhost [127.0.0.1]) by lists.osgeo.org (Postfix) with ESMTP id 23FB8605B830 for <jmckenna@gatewaygeomatics.com>; Tue, 5 Jun 2018 10:12:28 -0700 (PDT)
from plasticscrap.us (unknown [123.8.242.201]) by lists.osgeo.org (Postfix) with SMTP id CC8D960650B2 for <atlanticcanada-owner@lists.osgeo.org>; Tue, 5 Jun 2018 10:12:24 -0700 (PDT)
from plasticscrap.us (unknown (96.164.231.142]) by plasticscrap.us with SMTP id e2ef1992-d623-451a-a68c-94630c9eefdc; for <2498073052@qq.com>;Wed, 06 Jun 2018 01:12:34 +08:00
comment:41 by , 7 years ago
I notice a spike in queued emails recently on the OSGeo email server, which I think has to do with these @qq.com spam messages : http://webextra.osgeo.osuosl.org/munin/static/dynazoom.html?plugin_name=osgeo.org%2Fosgeo6.osgeo.org%2Fpostfix_mailqueue&start_iso8601=2018-05-01T06%3A26%3A12-0300&stop_iso8601=2018-06-18T12%3A26%3A12-0300&start_epoch=1525166772&stop_epoch=1529335572&lower_limit=&upper_limit=&size_x=800&size_y=400&cgiurl_graph=%2Fmunin-cgi%2Fmunin-cgi-graph
comment:43 by , 7 years ago
Perhaps related to, or the same as, the issue Jeff McKenna raises above: for a few days now I don't seem to receive copies of the emails I send to mailing lists (observed on different mailing lists, like gdal-dev, COG, mapserver-dev). I do receive answers from others to my posts, but I don't receive the copy of my own posts. Has any configuration change been made?
comment:44 by , 7 years ago
I've examined the logs closely and spent my whole day on this. (funding, anyone?)
The original issue reported in this ticket (spam from qq.com domain) still exists.
- Typical log message today showing successful emails sent to our list owners from the qq.com domain:
Jun 18 11:08:51 osgeo6 postfix/qmgr[23549]: 173A7600C6B7: from=<123725849@qq.com>, size=956, nrcpt=1 (queue active)
Jun 18 11:08:51 osgeo6 postfix/pipe[24762]: 173A7600C6B7: to=<mapguide-internals-owner@lists.osgeo.org>, relay=mailman, delay=0.69, delays=0.54/0/0/0.15, dsn=2.0.0, status=sent (delivered via mailman service)
- So I examined our postfix config files.
- /etc/postfix/access contains: qq.com REJECT
- so something wasn't right, because the qq.com domain is not being rejected
- I noticed that the config file /etc/postfix/main.cf was missing the important line:
smtpd_sender_restrictions = check_sender_access hash:/etc/postfix/access
- restarted service
- logs show that postfix now REJECTS the qq.com domain:
Jun 18 11:35:04 osgeo6 postfix/smtpd[17873]: NOQUEUE: reject: RCPT from unknown[114.228.74.19]: 554 5.7.1 <676479210@qq.com>: Sender address rejected: Access denied; from=<676479210@qq.com> to=<discuss-bounces@lists.osgeo.org> proto=SMTP helo=<mail.tofine.com>
- but that slows the queue, as postfix tries to send a rejection email to a broken qq.com sender. So I updated the access file to DISCARD instead, which allows postfix to crunch through the queue faster:
Jun 18 11:40:23 osgeo6 postfix/smtpd[20305]: NOQUEUE: discard: RCPT from unknown[125.121.117.70]: <491235343@qq.com>: Sender address triggers DISCARD action; from=<491235343@qq.com> to=<gdal-dev-owner@lists.osgeo.org> proto=SMTP helo=<chinarida.com.cn>
I am watching the logs being processed now. I hope this change helps!!!
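To check whether the rule keeps firing, the NOQUEUE lines can be tallied by action and sender domain. The sketch below does this for log text in the format shown above. The three sample lines are copied from this comment, while the regular expression is my own assumption about that format and may need adjusting for other Postfix versions or log setups.

```python
import re
from collections import Counter

# Log lines copied from this ticket.
SAMPLE_LOG = """\
Jun 18 11:08:51 osgeo6 postfix/pipe[24762]: 173A7600C6B7: to=<mapguide-internals-owner@lists.osgeo.org>, relay=mailman, delay=0.69, delays=0.54/0/0/0.15, dsn=2.0.0, status=sent (delivered via mailman service)
Jun 18 11:35:04 osgeo6 postfix/smtpd[17873]: NOQUEUE: reject: RCPT from unknown[114.228.74.19]: 554 5.7.1 <676479210@qq.com>: Sender address rejected: Access denied; from=<676479210@qq.com> to=<discuss-bounces@lists.osgeo.org> proto=SMTP helo=<mail.tofine.com>
Jun 18 11:40:23 osgeo6 postfix/smtpd[20305]: NOQUEUE: discard: RCPT from unknown[125.121.117.70]: <491235343@qq.com>: Sender address triggers DISCARD action; from=<491235343@qq.com> to=<gdal-dev-owner@lists.osgeo.org> proto=SMTP helo=<chinarida.com.cn>
"""

# Captures the action (reject, discard, ...) and the envelope sender.
NOQUEUE_RE = re.compile(r"NOQUEUE: (\w+): .*\bfrom=<([^>]*)>")


def tally_noqueue(log_text):
    """Count NOQUEUE events per (action, sender domain)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = NOQUEUE_RE.search(line)
        if not match:
            continue  # e.g. normal deliveries such as the first sample line
        action, sender = match.groups()
        domain = sender.rpartition("@")[2].lower()
        counts[(action, domain)] += 1
    return counts


noqueue_counts = tally_noqueue(SAMPLE_LOG)
for (action, domain), n in sorted(noqueue_counts.items()):
    print(action, domain, n)
```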
comment:47 by , 7 years ago
PS. we really need to fund this work. I also missed the England-Tunisia soccer game for this!!!!
comment:48 by , 7 years ago
We're actually funding some sysadmin work, there's a Milestone you can set to assign to the funded person (currently Martin).
I guess we could have more than one person at the time doing this, as it looks like a single one isn't enough.
comment:49 by , 5 years ago
Milestone: | → Sysadmin Contract 2019-II |
---|
It's been reported the problem is back -- I'm adding this to the sysadmin milestone, hopefully it'll get more visibility then :)
comment:50 by , 5 years ago
strk, do you have any particular lists in mind? I've definitely been getting mail from lists fine, but I haven't compared the arrival times with the times the messages were sent to the lists.
Unfortunately the mailing list server is one of the servers I know little about. I'll take a look at it this weekend.
comment:51 by , 5 years ago
okay I just got a response from SAC mailing list immediately after submitting this ticket, so maybe the issue is isolated to some domains?
comment:52 by , 5 years ago
It could be my setup, but I have the impression that mails sent to mantra-request arrive late (Kalxas and I often both reply because one doesn't see the other's reply in time).
comment:55 by , 5 years ago
Milestone: | Sysadmin Contract 2019-II → Unplanned |
---|---|
Resolution: | → fixed |
Status: | reopened → closed |
I don't think this is an issue anymore
comment:56 by , 5 years ago
We could suggest that mailing list administrators reduce mail size using content filtering (supported by Mailman, which can for example collapse multipart/alternative parts to a single alternative).
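To illustrate what collapsing alternatives does to a message, here is a rough sketch using only Python's standard `email` package. It is not Mailman's implementation. The function name and the sample message are made up for this example, and it simply keeps the first alternative of each multipart/alternative part.

```python
from email import message_from_string
from email.message import Message

CONTENT_HEADERS = ("Content-Type", "Content-Transfer-Encoding")

# Hypothetical message with a plain-text and an HTML alternative.
RAW = """\
From: poster@example.org
To: some-list@lists.example.org
Subject: test
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain; charset="utf-8"

Hello list.
--BOUNDARY
Content-Type: text/html; charset="utf-8"

<html><body><p>Hello list.</p></body></html>
--BOUNDARY--
"""


def collapse_alternatives(msg: Message) -> None:
    """Keep only the first alternative of each multipart/alternative, in place."""
    if not msg.is_multipart():
        return
    parts = msg.get_payload()
    if not parts:
        return
    if msg.get_content_type() == "multipart/alternative":
        first = parts[0]
        # Take over the content headers and payload of the first alternative.
        for name in CONTENT_HEADERS:
            del msg[name]
            if first[name] is not None:
                msg[name] = first[name]
        msg.set_payload(first.get_payload())
        collapse_alternatives(msg)  # the first alternative may be multipart too
        return
    for part in parts:
        collapse_alternatives(part)


msg = message_from_string(RAW)
size_before = len(msg.as_string())
collapse_alternatives(msg)
size_after = len(msg.as_string())
print(msg.get_content_type(), size_before, size_after)
```

Dropping the HTML alternative shrinks each stored and delivered copy, which is where the size reduction comes from.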
I responded to Even's post last night and, 12+ hours later, it still has not arrived to the list archives.
An hour ago, I sent a new post to the list; it has not been archived yet either.