Opened 7 years ago

Closed 5 years ago

Last modified 5 years ago

#2001 closed defect (fixed)

Mailing list delay sending messages

Reported by: Jeff McKenna
Owned by: sac@…
Priority: critical
Milestone: Unplanned
Component: SysAdmin/Mailman
Keywords:
Cc: rouault, Mateusz Łoskot, strk, wildintellect, jef, neteler, sac@…

Description

This was reported on the SAC list by Even (https://lists.osgeo.org/pipermail/sac/2017-September/008467.html), and I can confirm that today there is at least a 1 hour+ delay in messages. (I had tested this Friday and it wasn't an issue for me, but today it is).

Change History (56)

comment:1 by Mateusz Łoskot, 7 years ago

I responded to Even's post last night and, 12+ hours later, it still has not arrived in the list archives.

An hour ago I sent a new post to the list, and that one has not been archived yet either.

comment:2 by Jeff McKenna, 7 years ago

I am trying to restart the mailman service on osgeo6 VM:

  sudo /etc/init.d/mailman restart

     jmckenna is not in the sudoers file.  This incident will be reported.

Can someone please add user 'jmckenna' to sudoers on osgeo6? (since during off-hours it is always me tackling these issues) Thanks!

comment:3 by Jeff McKenna, 7 years ago

I have alternatively tried:

  sudo /usr/lib/mailman/bin/mailmanctl restart

    jmckenna is not in the sudoers file.  This incident will be reported.

comment:4 by Mateusz Łoskot, 7 years ago

Jeff, it seems you've nudged mailman: the queued posts have now arrived, both in my inbox and in the archives.

https://lists.osgeo.org/pipermail/sac/2017-September/thread.html

comment:5 by neteler, 7 years ago

Copied here since mailman currently doesn't deliver it in time:


From: Markus Neteler <neteler@osgeo.org>
Date: Sun, Sep 24, 2017 at 6:17 PM
Subject: Re: [SAC] Long email distribution delays on OSGeo lists since yesterday
To: System Administration Committee Discussion/OSGeo <sac@lists.osgeo.org>

Yes, looks like that: http://webextra.osgeo.osuosl.org/munin/osgeo.org/osgeo6.osgeo.org/postfix_mailqueue.html

mailq | grep mailman-bounces@lists.osgeo.org | wc -l
91552

One issue is probably a single user/the related mail server (the same email address as last time) which currently has a backlog of

# email address not disclosed publicly:
mailq | grep OMITTED | wc -l
39172

The error:
451 Recipient is busy, please try again later

MartinS may apply the postfix trick from last time to relax the situation.

Best, Markus

comment:6 by neteler, 7 years ago

Cc: sac@… added

Adding SAC in CC, otherwise hard to know for SAC members

in reply to:  5 comment:7 by neteler, 7 years ago

Replying to neteler:

mailq | grep mailman-bounces@lists.osgeo.org | wc -l
91552

Now:

mailq | grep mailman-bounces@lists.osgeo.org | wc -l
69012

The situation seems to get better:

http://webextra.osgeo.osuosl.org/munin/osgeo.org/osgeo6.osgeo.org/postfix_mailqueue.html

comment:8 by jsanz, 7 years ago

This ticket is assigned to me but to be honest I'm quite clueless on how to help in alleviating this issue.

comment:9 by strk, 7 years ago

Should we change ownership of "Mailing List" component back to the SAC mailing list ?

comment:10 by strk, 7 years ago

Owner: changed from jsanz to sac@…

For the record: things continue getting better:

# mailq | grep mailman-bounces@lists.osgeo.org | wc -l
6967

comment:11 by strk, 7 years ago

Today there are 1504 messages in queue, of which 1453 (97%) are bounces to a single address of Ari Jolma (user ajolma). Does anyone have an offline contact with that user ?
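
For anyone repeating this kind of triage, here is a minimal Python sketch of tallying queued messages per recipient and computing each recipient's share of the queue. It assumes the usual Postfix mailq listing layout (queue ID lines in column 0, deferral reasons in parentheses, recipients on indented lines). The function name and the sample listing are made up for illustration and are not taken from the server.

```python
from collections import Counter

# Hypothetical mailq excerpt; addresses and queue IDs are invented.
SAMPLE = """\
-Queue ID-  --Size-- ----Arrival Time---- -Sender/Recipient-------
3F1A2B4C5D     2841 Tue Sep 26 09:12:01  mailman-bounces@lists.example.org
(host mx.example.com[192.0.2.1] said: 451 Recipient is busy, please try again later (in reply to RCPT TO command))
                                         busy.user@example.com

4A2B3C5D6E     1930 Tue Sep 26 09:13:44  mailman-bounces@lists.example.org
(host mx.example.com[192.0.2.1] said: 451 Recipient is busy, please try again later (in reply to RCPT TO command))
                                         busy.user@example.com

5B3C4D6E7F     1502 Tue Sep 26 09:15:10  mailman-bounces@lists.example.org
                                         other.user@example.net
"""


def recipient_shares(mailq_text):
    """Return [(recipient, count, share_of_total)], most frequent first."""
    counts = Counter()
    for line in mailq_text.splitlines():
        stripped = line.strip()
        # Recipient lines are indented and hold a bare address; queue ID
        # lines start in column 0 and deferral reasons start with "(".
        is_recipient = (
            line[:1].isspace()
            and "@" in stripped
            and " " not in stripped
            and not stripped.startswith("(")
        )
        if is_recipient:
            counts[stripped.lower()] += 1
    total = sum(counts.values())
    return [(addr, n, n / total) for addr, n in counts.most_common()]


for addr, n, share in recipient_shares(SAMPLE):
    print(f"{n:6d}  {share:6.1%}  {addr}")
```

With the figures quoted in the comment above, 1453 of 1504 queued messages works out to roughly 97% of the queue.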

in reply to:  11 comment:12 by rouault, 7 years ago

Replying to strk:

Today there are 1504 messages in queue, of which 1453 (97%) are bounces to a single address of Ari Jolma (user ajolma). Does anyone have an offline contact with that user ?

I do. Ari is a GDAL contributor. It is probably an old email address of his that was never unsubscribed. I'll email him about that.

comment:13 by jef, 7 years ago

It's a gmail.com address - and there are a lot more. And for some there is:

450-4.2.1 The user you are trying to contact is receiving mail at a rate that
450-4.2.1 prevents additional messages from being delivered. Please resend your
450-4.2.1 message at a later time. If the user is able to receive mail at that
450-4.2.1 time, your message will be delivered. For more information, please
450-4.2.1 visit
450 4.2.1  https://support.google.com/mail/?p=ReceivingRate 100si441760lfs.108 - gsmtp (in reply to RCPT TO command))

Who can add an SPF record to the osgeo.org DNS zone? Apparently DNS is not run by us but by PAIRNIC.COM.

comment:14 by ajolma, 7 years ago

So my gmail.com address is a problem? What could I do?

comment:15 by jef, 7 years ago

Other "top" users:

  • maillists at codeha dot us (not sure who that is - isn't codehaus dead?)
  • maplabs at light42 dot com (aka darkblue_b)

in reply to:  14 comment:16 by jef, 7 years ago

Replying to ajolma:

So my gmail.com address is a problem? What could I do?

Not yours specifically, looks like we're just trying to send too much mail to gmail.

in reply to:  13 comment:17 by jef, 7 years ago

Replying to jef:

Sorry, the message above is actually a different one from yesterday's, which pointed at "authentication" via SPF.

That's the one:

421-4.7.0 suspicious due to the very low reputation of the sending domain. To
421-4.7.0 best protect our users from spam, the message has been blocked.
421-4.7.0 Please visit
421 4.7.0  https://support.google.com/mail/answer/188131 for more information. y40si1011389pla.813 - gsmtp (in reply to end of DATA command)
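
To tell the two Gmail responses apart when scanning a queue, the multi-line replies can be folded into one record. The following is a small Python sketch, assuming the standard SMTP convention that continuation lines use `code-` and the final line uses `code` followed by a space, and that 4xx codes are temporary failures that get retried. The function name `parse_smtp_reply` is hypothetical.

```python
import re

# code, separator ("-" = more lines follow, " " = last line),
# optional enhanced status code such as 4.7.0, then free text.
_LINE = re.compile(r"^(\d{3})([- ])(?:(\d\.\d{1,3}\.\d{1,3})\s+)?(.*)$")


def parse_smtp_reply(lines):
    """Fold a multi-line SMTP reply into a single dict."""
    code = enhanced = None
    parts = []
    for raw in lines:
        m = _LINE.match(raw.strip())
        if not m:
            raise ValueError(f"not an SMTP reply line: {raw!r}")
        code = code or m.group(1)
        enhanced = enhanced or m.group(3)
        if m.group(4):
            parts.append(m.group(4))
        if m.group(2) == " ":  # a space after the code marks the last line
            break
    if code is None:
        raise ValueError("empty reply")
    return {
        "code": code,
        "enhanced": enhanced,
        "temporary": code.startswith("4"),  # 4xx: sender should retry later
        "text": " ".join(parts),
    }


reply = parse_smtp_reply([
    "421-4.7.0 suspicious due to the very low reputation of the sending domain. To",
    "421-4.7.0 best protect our users from spam, the message has been blocked.",
    "421-4.7.0 Please visit",
    "421 4.7.0  https://support.google.com/mail/answer/188131 for more information. y40si1011389pla.813 - gsmtp",
])
print(reply["code"], reply["enhanced"], reply["temporary"])
```

Both replies quoted in this ticket (450 4.2.1 and 421 4.7.0) are temporary failures, which is why the affected messages stay in the queue and are retried instead of bouncing immediately.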

comment:18 by neteler, 7 years ago

Concerning SPF, see also #1454

comment:19 by jef, 7 years ago

Hm, is an SPF record for osgeo.org enough? Shouldn't we also have one for lists.osgeo.org (which is where the mail originates)?

Apparently yes: https://serverfault.com/questions/322949/do-spf-records-for-primary-domain-apply-to-subdomains

comment:20 by strk, 7 years ago

we could have all domains delegate to spf.osgeo.org to centralize control of SPF record. Who has access to DNS ?

in reply to:  20 comment:21 by jef, 7 years ago

Replying to strk:

we could have all domains delegate to spf.osgeo.org to centralize control of SPF record. Who has access to DNS ?

http://wiki.osgeo.org/wiki/SAC_DNS_Registry

comment:22 by wildintellect, 7 years ago

Anyone with access to Secure can access the DNS, also the Treasurer of the Board.

Can you please clarify what changes need to be made?

The SPF record for osgeo.org currently has

v=spf1 mx a:mail.osgeo.org a:lists.osgeo.org a:mail.osgeo.osuosl.org

Here is the syntax guide: http://www.openspf.org/SPF_Record_Syntax

comment:23 by wildintellect, 7 years ago

I copied the @ SPF record to the lists and mail subdomains as TXT records. Apparently dedicated SPF records are deprecated and no longer recommended.

@ » "v=spf1 mx a:mail.osgeo.org a:lists.osgeo.org a:mail.osgeo.osuosl.org"
lists » "v=spf1 mx a:mail.osgeo.org a:lists.osgeo.org a:mail.osgeo.osuosl.org -all"
mail » "v=spf1 mx a:mail.osgeo.org a:lists.osgeo.org a:mail.osgeo.osuosl.org -all"

Several online validators show these are ok. Though I'm getting mixed answers from the documentation about the -all at the end. Let's see if gmail is happy now.
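
On the question of the trailing `-all`: a rough Python sketch of how the records above break down into mechanisms, and of what a sender matching none of them would get. This is a simplified parser written for illustration (it ignores CIDR suffixes and macros and skips modifiers such as `redirect=`), and the function names are hypothetical. It relies on the SPF rule that a record with no `all` mechanism and no redirect falls through to a "neutral" result.

```python
import re

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}
_MODIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_.-]*=")


def parse_spf(record):
    """Split an SPF TXT record into (result, mechanism, value) tuples."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    mechanisms = []
    for term in terms[1:]:
        if _MODIFIER.match(term):
            continue  # modifiers (redirect=, exp=) are ignored in this sketch
        qualifier = "+"  # a mechanism without a qualifier means "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        mechanisms.append((QUALIFIERS[qualifier], name.lower(), value or None))
    return mechanisms


def default_result(record):
    """Result for a sender that matches none of the listed hosts."""
    for result, name, _ in parse_spf(record):
        if name == "all":
            return result
    return "neutral"  # no "all" mechanism behaves like a trailing "?all"


APEX = "v=spf1 mx a:mail.osgeo.org a:lists.osgeo.org a:mail.osgeo.osuosl.org"
LISTS = APEX + " -all"

print(default_result(APEX))   # neutral
print(default_result(LISTS))  # fail
```

Under these assumptions, the `@` record as quoted leaves unlisted senders at "neutral", while the `lists` and `mail` records tell receivers to fail them.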

comment:24 by strk, 7 years ago

I'd use ?all (neutral)

comment:25 by Jeff McKenna, 7 years ago

Today there seems to be a 2+ hour delay again for emails to OSGeo lists. The problem is back.

comment:26 by Jeff McKenna, 7 years ago

Cc: strk wildintellect jef neteler added

Repeating last message (and adding people to the CC, to throw direct Trac notifications to interested members): problem is back today, with 2+ hour delay again. Likely if left alone this will be 7+ hours again by Monday.

comment:27 by Jeff McKenna, 7 years ago

update: several international SMS messages later....today is also a big deadline for the first stage of proposals for FOSS4G-2019. Well this caused chaos because the Conference-Dev list is now more than 2 hours delayed....

comment:28 by jef, 7 years ago

More than 45000 mails queued. Looks like Chinese spam from @qq.com. Cleaning...

http://webextra.osgeo.osuosl.org/munin/osgeo.org/osgeo6.osgeo.org/postfix_mailqueue.html

comment:29 by jef, 7 years ago

Down to 1333. @qq.com gets rejected in postfix now.

comment:30 by jef, 7 years ago

Also rejecting @163.com and @139.com now, and cleaned them out of the queue. "Just" 651 requests left in queue.

comment:31 by jef, 7 years ago

update: spammer is still trying to feed us mails from numerous IPs. postfix is rejecting them and in turn fail2ban bans the IPs.

~600 queued requests for quite a while now.

comment:32 by strk, 7 years ago

Jef: are these rejections automatic or are you doing that manually ?

in reply to:  32 comment:33 by jef, 7 years ago

Replying to strk:

Jef: are these rejections automatic or are you doing that manually ?

Postfix rejects those three domains automatically after they were added to /etc/postfix/access (and use of the access table was set up).
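
As an illustration of what such a sender access table does, here is a simplified Python model of the lookup. It is only an approximation of the behaviour described in Postfix's access(5) manual (full address first, then the domain and its parent domains, then the local part), not Postfix's actual code. The table contents are assumed from this ticket (the REJECT action for 163.com and 139.com is a guess), and `sender_action` is a hypothetical name.

```python
# Assumed table contents; only "qq.com REJECT" is quoted in the ticket.
ACCESS = {
    "qq.com": "REJECT",
    "163.com": "REJECT",
    "139.com": "REJECT",
}


def sender_action(sender, access_table):
    """Return the action for an envelope sender, or None if nothing matches."""
    sender = sender.lower()
    localpart, _, domain = sender.rpartition("@")
    candidates = [sender, domain]
    # Parent domains: mail.qq.com -> qq.com -> com (simplified).
    labels = domain.split(".")
    for i in range(1, len(labels)):
        candidates.append(".".join(labels[i:]))
    candidates.append(localpart + "@")
    for key in candidates:
        if key in access_table:
            return access_table[key]
    return None  # no match: the next restriction in the list decides


print(sender_action("123725849@qq.com", ACCESS))
print(sender_action("jmckenna@gatewaygeomatics.com", ACCESS))
```

Note that a `hash:` table only takes effect in Postfix once it is compiled with `postmap` and referenced from a restriction list in main.cf.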

comment:34 by strk, 7 years ago

The /etc directory on that server is under a local git, could you add the relevant configurations in that repository to better track changes ?

in reply to:  34 comment:35 by jef, 7 years ago

Replying to strk:

The /etc directory on that server is under a local git, could you add the relevant configurations in that repository to better track changes ?

done.

comment:36 by Jeff McKenna, 7 years ago

@qq.com has been posting as a non-member to OSGeo lists for a while now...thanks Jürgen!!!!!!!!!

comment:37 by jef, 7 years ago

The domains are now also blocked on the projects host. Otherwise it relays their mail to osgeo6, osgeo6's postfix rejects it, and fail2ban in turn blocks all mail coming from projects.

comment:38 by jef, 7 years ago

Resolution: fixed
Status: new → closed

Delivery back to normal for a while now...

comment:39 by Jeff McKenna, 7 years ago

Resolution: fixed
Status: closed → reopened

qq.com email addresses are spamming our -owner aliases now (see ticket #1778, https://trac.osgeo.org/osgeo/ticket/1778). Can the fix from comment#28 be applied again? "@qq.com gets rejected in postfix now"

comment:40 by Jeff McKenna, 7 years ago

more info from the headers of one of the many @qq.com spams received today, showing how OSGeo postfix processes it:

from osgeo6.osgeo.osuosl.org (localhost [127.0.0.1]) by lists.osgeo.org (Postfix) with ESMTP id 23FB8605B830 for <jmckenna@gatewaygeomatics.com>; Tue,  5 Jun 2018 10:12:28 -0700 (PDT)

from plasticscrap.us (unknown [123.8.242.201]) by lists.osgeo.org (Postfix) with SMTP id CC8D960650B2 for <atlanticcanada-owner@lists.osgeo.org>; Tue,  5 Jun 2018 10:12:24 -0700 (PDT)

from plasticscrap.us (unknown (96.164.231.142]) by plasticscrap.us with SMTP id e2ef1992-d623-451a-a68c-94630c9eefdc; for <2498073052@qq.com>;Wed, 06 Jun 2018 01:12:34 +08:00

comment:42 by Jeff McKenna, 7 years ago

current queue: 29574

bounces: 6692

comment:43 by rouault, 7 years ago

Perhaps related to, or the same as, the issue Jeff McKenna raises above: for a few days now I don't seem to receive copies of the emails I send to mailing lists (observed on different mailing lists, like gdal-dev, COG, mapserver-dev). I do receive answers from others to my posts, but I don't receive the copy of my own posts. Has any configuration change been made?

comment:44 by Jeff McKenna, 7 years ago

I've examined the logs closely and spent my whole day on this. (funding, anyone?)

The original issue reported in this ticket (spam from qq.com domain) still exists.

  • Typical log message today showing successful emails sent to our list owners from the qq.com domain:
    Jun 18 11:08:51 osgeo6 postfix/qmgr[23549]: 173A7600C6B7: from=<123725849@qq.com>, size=956, nrcpt=1 (queue active)
    Jun 18 11:08:51 osgeo6 postfix/pipe[24762]: 173A7600C6B7: to=<mapguide-internals-owner@lists.osgeo.org>, relay=mailman, delay=0.69, delays=0.54/0/0/0.15, dsn=2.0.0, status=sent (delivered via mailman service)
  • So I examined our postfix config files.
  • /etc/postfix/access contains: qq.com REJECT
  • so something wasn't right, because the qq.com domain is not being rejected
  • I noticed that the config file /etc/postfix/main.cf was missing the important line:
    smtpd_sender_restrictions = check_sender_access hash:/etc/postfix/access
    
  • restarted service
  • logs say that postfix now REJECTS the qq.com domain:
    Jun 18 11:35:04 osgeo6 postfix/smtpd[17873]: NOQUEUE: reject: RCPT from unknown[114.228.74.19]: 554 5.7.1 <676479210@qq.com>: Sender address rejected: Access denied; from=<676479210@qq.com> to=<discuss-bounces@lists.osgeo.org> proto=SMTP helo=<mail.tofine.com>
    
  • but that slows the queue, as postfix tries to send a rejection email to a broken qq.com sender. So I updated the access file to DISCARD instead, which allows postfix to crunch through the queue faster:
    Jun 18 11:40:23 osgeo6 postfix/smtpd[20305]: NOQUEUE: discard: RCPT from unknown[125.121.117.70]: <491235343@qq.com>: Sender address triggers DISCARD action; from=<491235343@qq.com> to=<gdal-dev-owner@lists.osgeo.org> proto=SMTP helo=<chinarida.com.cn>
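
To check from the logs whether the rule is actually firing, the NOQUEUE lines shown in the steps above can be tallied by action and sender domain. The following is a minimal Python sketch that assumes the log format matches the lines quoted in this comment. The function name `count_blocked` is hypothetical.

```python
import re
from collections import Counter

# Matches smtpd lines such as "NOQUEUE: reject: ... from=<sender@domain>".
_NOQUEUE = re.compile(r"NOQUEUE: (reject|discard): .*?\bfrom=<([^>]*)>")


def count_blocked(log_lines):
    """Count rejected/discarded messages per (action, sender domain)."""
    counts = Counter()
    for line in log_lines:
        m = _NOQUEUE.search(line)
        if m:
            action, sender = m.groups()
            domain = sender.rpartition("@")[2].lower()
            counts[(action, domain)] += 1
    return counts


LOG = [
    "Jun 18 11:08:51 osgeo6 postfix/pipe[24762]: 173A7600C6B7: to=<mapguide-internals-owner@lists.osgeo.org>, relay=mailman, delay=0.69, delays=0.54/0/0/0.15, dsn=2.0.0, status=sent (delivered via mailman service)",
    "Jun 18 11:35:04 osgeo6 postfix/smtpd[17873]: NOQUEUE: reject: RCPT from unknown[114.228.74.19]: 554 5.7.1 <676479210@qq.com>: Sender address rejected: Access denied; from=<676479210@qq.com> to=<discuss-bounces@lists.osgeo.org> proto=SMTP helo=<mail.tofine.com>",
    "Jun 18 11:40:23 osgeo6 postfix/smtpd[20305]: NOQUEUE: discard: RCPT from unknown[125.121.117.70]: <491235343@qq.com>: Sender address triggers DISCARD action; from=<491235343@qq.com> to=<gdal-dev-owner@lists.osgeo.org> proto=SMTP helo=<chinarida.com.cn>",
]

for (action, domain), n in sorted(count_blocked(LOG).items()):
    print(f"{action:8s} {domain:12s} {n}")
```

If the count for a blocked domain stays at zero while "status=sent" lines for that domain keep appearing, the access table is probably not wired into main.cf, which is the situation described at the start of this comment.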
    

I am watching the logs being processed now. I hope this change helps!!!

comment:45 by Jeff McKenna, 7 years ago

I also have removed all @qq.com requests from the queue.

comment:46 by Jeff McKenna, 7 years ago

it's crunching away now!!!

comment:47 by Jeff McKenna, 7 years ago

PS. we really need to fund this work. I also missed the England-Tunisia soccer game for this!!!!

comment:48 by strk, 7 years ago

We're actually funding some sysadmin work, there's a Milestone you can set to assign to the funded person (currently Martin).

I guess we could have more than one person at a time doing this, as it looks like a single one isn't enough.

comment:49 by strk, 5 years ago

Milestone: Sysadmin Contract 2019-II

It's been reported the problem is back -- I'm adding this to the sysadmin milestone, hopefully it'll get more visibility then :)

comment:50 by robe, 5 years ago

strk, do you have any particular lists in mind? I've definitely been getting mail from lists fine, but I haven't compared the arrival times against the time each item was sent to the mailing lists.

Unfortunately the mailing list server is one of the servers I know little about. I'll take a look at it this weekend.

comment:51 by robe, 5 years ago

okay I just got a response from SAC mailing list immediately after submitting this ticket, so maybe the issue is isolated to some domains?

comment:52 by strk, 5 years ago

It could be my setup, but I have the impression mails sent to mantra-request arrive late (Kalxas and I often both reply because one doesn't see the other's reply in time).

comment:53 by wildintellect, 5 years ago

Is mantra a list or an alias?

in reply to:  53 comment:54 by jef, 5 years ago

Replying to wildintellect:

Is mantra a list or an alias?

alias

comment:55 by robe, 5 years ago

Milestone: Sysadmin Contract 2019-II → Unplanned
Resolution: fixed
Status: reopened → closed

I don't think this is an issue anymore

comment:56 by strk, 5 years ago

We could suggest that mailing list administrators reduce mail size using content filtering (Mailman supports this, for example collapsing multipart/alternative messages down to a single alternative).
