Opened 2 years ago

Last modified 5 weeks ago

#2001 reopened defect

Mailing list delay sending messages

Reported by: Jeff McKenna Owned by: sac@…
Priority: critical Milestone: Sysadmin Contract 2019-II
Component: Mailing Lists Keywords:
Cc: rouault, Mateusz Łoskot, strk, wildintellect, jef, neteler, sac@…

Description

This was reported on the SAC list by Even (https://lists.osgeo.org/pipermail/sac/2017-September/008467.html), and I can confirm that today messages are delayed by more than an hour. (I had tested this on Friday and it wasn't an issue for me, but today it is.)

Change History (54)

comment:1 Changed 2 years ago by Mateusz Łoskot

I responded to Even's post last night and, 12+ hours later, it still has not arrived in the list archives.

An hour ago I sent a new post to the list; it has not been archived yet either.

comment:2 Changed 2 years ago by Jeff McKenna

I am trying to restart the mailman service on osgeo6 VM:

  sudo /etc/init.d/mailman restart

     jmckenna is not in the sudoers file.  This incident will be reported.

Can someone please add user 'jmckenna' to sudoers on osgeo6? (since during off-hours it is always me tackling these issues) Thanks!

comment:3 Changed 2 years ago by Jeff McKenna

I have alternatively tried:

  sudo /usr/lib/mailman/bin/mailmanctl restart

    jmckenna is not in the sudoers file.  This incident will be reported.

comment:4 Changed 2 years ago by Mateusz Łoskot

Jeff, it seems you've nudged mailman: the queued posts have arrived, both in my inbox and in the archives.

https://lists.osgeo.org/pipermail/sac/2017-September/thread.html

comment:5 Changed 2 years ago by neteler

Copied here since mailman currently doesn't deliver it in time:


From: Markus Neteler <neteler@osgeo.org>
Date: Sun, Sep 24, 2017 at 6:17 PM
Subject: Re: [SAC] Long email distribution delays on OSGeo lists since yesterday
To: System Administration Committee Discussion/OSGeo <sac@lists.osgeo.org>

Yes, looks like that: http://webextra.osgeo.osuosl.org/munin/osgeo.org/osgeo6.osgeo.org/postfix_mailqueue.html

mailq | grep mailman-bounces@lists.osgeo.org | wc -l
91552

One issue is probably a single user and the related mail server (the same email address as last time), which currently has a backlog of:

# email address not disclosed publicly:
mailq | grep OMITTED | wc -l
39172

The error:
451 Recipient is busy, please try again later

MartinS may apply the postfix trick from last time to relax the situation.

Best, Markus
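The `mailq | grep ... | wc -l` counts in Markus's message check one address at a time. A minimal sketch of a per-sender summary of the whole queue follows; it assumes Postfix's classic `mailq` layout (one blank-line-separated entry per queued message, with queue ID, size, four date fields and the envelope sender on the first line), and the function name `top_senders` is hypothetical.

```bash
# Hypothetical helper: count queued messages per envelope sender.
# Assumes classic Postfix mailq output: one blank-line-separated entry per
# message, whose first line holds queue ID, size, 4 date fields and sender.
top_senders() {
  # Drop the "-Queue ID-" header, the "-- N Kbytes" trailer and the
  # "(deferral reason)" lines so that each paragraph is one queue entry.
  grep -v -e '^-' -e '^ *(' |
    awk 'BEGIN { RS = "" }                        # paragraph mode
         NF >= 8 && $2 ~ /^[0-9]+$/ { n[$7]++ }   # $7 = envelope sender
         END { for (s in n) print n[s], s }' |
    sort -rn
}
```

Typical use would be `mailq | top_senders | head -n 5`, which would show at a glance whether `mailman-bounces@lists.osgeo.org` dominates the queue.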

comment:6 Changed 2 years ago by neteler

Cc: sac@… added

Adding SAC to CC; otherwise this is hard for SAC members to notice.

comment:7 in reply to:  5 Changed 2 years ago by neteler

Replying to neteler:

mailq | grep mailman-bounces@lists.osgeo.org | wc -l
91552

Now:

mailq | grep mailman-bounces@lists.osgeo.org | wc -l
69012

The situation seems to be improving:

http://webextra.osgeo.osuosl.org/munin/osgeo.org/osgeo6.osgeo.org/postfix_mailqueue.html

comment:8 Changed 2 years ago by jsanz

This ticket is assigned to me but, to be honest, I'm quite clueless about how to help alleviate this issue.

comment:9 Changed 2 years ago by strk

Should we change ownership of the "Mailing Lists" component back to the SAC mailing list?

comment:10 Changed 2 years ago by strk

Owner: changed from jsanz to sac@…

For the record: things continue getting better:

# mailq | grep mailman-bounces@lists.osgeo.org | wc -l
6967

comment:11 Changed 2 years ago by strk

Today there are 1504 messages in the queue, of which 1453 (97%) are bounces to a single address of Ari Jolma (user ajolma). Does anyone have an offline contact for that user?
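strk's figure checks out: 1453 / 1504 ≈ 96.6%, i.e. 97% after rounding. A minimal sketch that derives such a share directly from the queue, assuming Postfix's classic `mailq` layout (one blank-line-separated entry per message; after the "(reason)" lines are removed, fields 8 onward are recipients); `recipient_share` is a hypothetical name and the address match is case-sensitive.

```bash
# Hypothetical helper: how many queued entries list ADDRESS as a recipient,
# and what share of all queued entries that is.
# Assumes classic Postfix mailq output (one entry per paragraph; fields 8..NF
# are recipients once the "(reason)" lines are removed).
recipient_share() {
  local address=$1
  grep -v -e '^-' -e '^ *(' |
    awk -v addr="$address" 'BEGIN { RS = "" }
      NF >= 8 && $2 ~ /^[0-9]+$/ {
        total++
        for (i = 8; i <= NF; i++)
          if ($i == addr) { hits++; break }
      }
      END {
        if (total > 0)
          printf "%d of %d (%.0f%%)\n", hits, total, 100 * hits / total
      }'
}
```

For example, `mailq | recipient_share someone@gmail.com` would print a line of the form `N of M (P%)`.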

comment:12 in reply to:  11 Changed 2 years ago by rouault

Replying to strk:

Today there are 1504 messages in queue, of which 1453 (97%) are bounces to a single address of Ari Jolma (user ajolma). Does anyone have an offline contact with that user ?

I do. Ari is a GDAL contributor. It's probably an old address of his that was never unsubscribed. I'll email him about it.

comment:13 Changed 2 years ago by jef

It's a gmail.com address, and there are a lot more. For some of them the error is:

450-4.2.1 The user you are trying to contact is receiving mail at a rate that
450-4.2.1 prevents additional messages from being delivered. Please resend your
450-4.2.1 message at a later time. If the user is able to receive mail at that
450-4.2.1 time, your message will be delivered. For more information, please
450-4.2.1 visit 450 4.2.1  https://support.google.com/mail/?p=ReceivingRate 100si441760lfs.108 - gsmtp (in reply to RCPT TO command))

Who can add an SPF record to the osgeo.org DNS zone? Apparently DNS is not run by us but by PAIRNIC.COM.

Sorry, the above is actually a different message from yesterday's, which pointed at "authentication" via SPF.

Last edited 2 years ago by jef

comment:14 Changed 2 years ago by ajolma

So my gmail.com address is a problem? What could I do?

comment:15 Changed 2 years ago by jef

Other "top" users:

  • maillists at codeha dot us (not sure who that is; isn't Codehaus dead?)
  • maplabs at light42 dot com (aka darkblue_b)
Last edited 2 years ago by jef

comment:16 in reply to:  14 Changed 2 years ago by jef

Replying to ajolma:

So my gmail.com address is a problem? What could I do?

Not yours specifically; it looks like we're simply trying to send too much mail to Gmail.
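To confirm that one receiving provider holds most of the backlog, the queue can be grouped by recipient domain. A minimal sketch, assuming Postfix's classic `mailq` layout (one blank-line-separated entry per message, sender in field 7, recipients after it once "(reason)" lines are dropped); `recipient_domains` is a hypothetical name, and it counts recipient entries rather than messages.

```bash
# Hypothetical helper: count queued recipients per recipient domain, to spot
# a single provider (e.g. gmail.com) holding most of the backlog.
# Assumes classic Postfix mailq output: one entry per paragraph, sender in $7,
# recipients in $8..$NF once "(reason)" lines are removed.
recipient_domains() {
  grep -v -e '^-' -e '^ *(' |
    awk 'BEGIN { RS = "" }
      NF >= 8 && $2 ~ /^[0-9]+$/ {
        for (i = 8; i <= NF; i++) {
          n = split($i, part, "@")
          if (n > 1) count[tolower(part[n])]++   # domains are case-insensitive
        }
      }
      END { for (d in count) print count[d], d }' |
    sort -rn
}
```

Usage would be `mailq | recipient_domains | head`; a large lead for gmail.com would match the rate-limit replies quoted above.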

comment:17 in reply to:  13 Changed 2 years ago by jef

Replying to jef:

Sorry, actually above is a different message than yesterday - which pointed at "authentication" via SPF.

That's the one:

421-4.7.0 suspicious due to the very low reputation of the sending domain. To
421-4.7.0 best protect our users from spam, the message has been blocked.
421-4.7.0 Please visit
421 4.7.0  https://support.google.com/mail/answer/188131 for more information. y40si1011389pla.813 - gsmtp (in reply to end of DATA command)
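The Gmail replies quoted in this ticket (450 4.2.1 rate limiting, 421 4.7.0 low sending-domain reputation) and the earlier "451 Recipient is busy" all show up in the queue as deferral reasons. A minimal sketch that tallies those reasons by SMTP reply code, assuming `mailq` prints each reason on one line starting with "(" and containing "said: <code>"; reasons without a remote reply (e.g. connection timeouts) are ignored, and `deferral_codes` is a hypothetical name.

```bash
# Hypothetical helper: tally deferred-delivery reasons in mailq output by the
# remote SMTP reply code (e.g. 421, 450, 451).
# Assumes each reason is a single line starting with "(" that contains
# "said: NNN"; reasons without a remote reply are not counted.
deferral_codes() {
  awk '/^ *\(/ && match($0, /said: [0-9][0-9][0-9]/) {
         count[substr($0, RSTART + 6, 3)]++   # skip "said: " (6 chars)
       }
       END { for (c in count) print count[c], c }' |
    sort -rn
}
```

Running `mailq | deferral_codes` before and after a DNS change would show whether the 421 reputation rejections are going away.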

comment:18 Changed 2 years ago by neteler

Concerning SPF, see also #1454

Last edited 2 years ago by jef

comment:19 Changed 2 years ago by jef

Hm, is an SPF record for osgeo.org enough? Shouldn't we also have one for lists.osgeo.org (which is where the mail originated)?

Apparently yes: https://serverfault.com/questions/322949/do-spf-records-for-primary-domain-apply-to-subdomains
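The serverfault answer follows from how the check works: SPF evaluates the domain of the envelope sender (MAIL FROM), or the HELO name when MAIL FROM is empty, and looks up a TXT record for exactly that name; parent domains are not consulted (RFC 7208, sections 2.3 and 2.4). The list mail here is sent with envelope senders such as mailman-bounces@lists.osgeo.org, so the lookup targets lists.osgeo.org. A minimal sketch of that selection step; `spf_domain` is a hypothetical name, and the HELO name used in the test is only an example.

```bash
# Hypothetical helper: which DNS name an SPF check would query for a message.
# RFC 7208: use the domain of the MAIL FROM address; if MAIL FROM is the null
# reverse-path "<>" (as for bounces), use the HELO/EHLO name instead.
# There is no parent-domain fallback: lists.osgeo.org never falls back to osgeo.org.
spf_domain() {
  local mail_from=$1 helo=$2
  mail_from=${mail_from#<}    # accept both "user@dom" and "<user@dom>"
  mail_from=${mail_from%>}
  if [ -z "$mail_from" ]; then
    printf '%s\n' "$helo"
  else
    printf '%s\n' "${mail_from##*@}"
  fi
}
```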

comment:20 Changed 2 years ago by strk

We could have all domains delegate to spf.osgeo.org to centralize control of the SPF record. Who has access to DNS?

comment:21 in reply to:  20 Changed 2 years ago by jef

Replying to strk:

we could have all domains delegate to spf.osgeo.org to centralize control of SPF record. Who has access to DNS ?

http://wiki.osgeo.org/wiki/SAC_DNS_Registry

comment:22 Changed 2 years ago by wildintellect

Anyone with access to Secure can access the DNS, as can the Treasurer of the Board.

Can you please clarify what changes need to be made?

The SPF record for osgeo.org currently contains:

v=spf1 mx a:mail.osgeo.org a:lists.osgeo.org a:mail.osgeo.osuosl.org

Here is the syntax guide: http://www.openspf.org/SPF_Record_Syntax

comment:23 Changed 2 years ago by wildintellect

I copied the @ SPF record to the lists and mail subdomains as TXT records. Apparently dedicated SPF records are deprecated and no longer recommended.

@ » "v=spf1 mx a:mail.osgeo.org a:lists.osgeo.org a:mail.osgeo.osuosl.org"
lists » "v=spf1 mx a:mail.osgeo.org a:lists.osgeo.org a:mail.osgeo.osuosl.org -all"
mail » "v=spf1 mx a:mail.osgeo.org a:lists.osgeo.org a:mail.osgeo.osuosl.org -all"

Several online validators show these are OK, though the documentation gives mixed answers about the -all at the end. Let's see if Gmail is happy now.

comment:24 Changed 2 years ago by strk

I'd use ?all (neutral)
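What the trailing term changes is the result for senders that match none of the listed mechanisms: `-all` gives fail, `~all` softfail, `?all` neutral, and a record with no `all` (like the @ record above) defaults to neutral unless it carries a `redirect=` modifier, which is also one way the delegation to a single spf.osgeo.org record suggested earlier could be expressed (RFC 7208, sections 4.7, 5.1 and 6.1). A minimal sketch that reports this fallback result; `spf_fallback_result` is a hypothetical name, and it assumes lowercase, space-separated terms as in the records quoted here.

```bash
# Hypothetical helper: the SPF result for a sender that matches none of the
# record's other mechanisms, i.e. the effect of the trailing "all" term.
# RFC 7208: no "all" and no "redirect=" means neutral; "redirect=" is
# ignored when the record contains "all". Assumes lowercase terms.
spf_fallback_result() {
  local -a terms
  local term redirect=""
  read -r -a terms <<< "$1"           # word-split without glob expansion
  [ "${terms[0]:-}" = "v=spf1" ] || { echo "not-spf"; return 1; }
  for term in "${terms[@]:1}"; do
    case $term in
      all | +all) echo "pass"; return ;;
      -all)       echo "fail"; return ;;
      "~all")     echo "softfail"; return ;;
      "?all")     echo "neutral"; return ;;
      redirect=*) redirect=${term#redirect=} ;;
    esac
  done
  if [ -n "$redirect" ]; then
    echo "redirect:$redirect"         # result comes from the target's record
  else
    echo "neutral"
  fi
}
```

Applied to the records above, the @ record yields neutral while the lists and mail records yield fail, so the two variants currently treat unlisted senders differently.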

comment:25 Changed 2 years ago by Jeff McKenna

Today there seems to be a 2+ hour delay again for emails to OSGeo lists. The problem is back.

comment:26 Changed 2 years ago by Jeff McKenna

Cc: strk wildintellect jef neteler added

Repeating the last message (and adding people to the CC, so that Trac notifications go directly to interested members): the problem is back today, with a 2+ hour delay again. If left alone, this will likely reach 7+ hours again by Monday.

comment:27 Changed 2 years ago by Jeff McKenna

Update: several international SMS messages later... Today is also a big deadline for the first stage of proposals for FOSS4G-2019, and this caused chaos because the Conference-Dev list is now delayed by more than 2 hours...

comment:28 Changed 2 years ago by jef

More than 45000 mails queued. Looks like Chinese spam from @qq.com. Cleaning...

http://webextra.osgeo.osuosl.org/munin/osgeo.org/osgeo6.osgeo.org/postfix_mailqueue.html

Last edited 2 years ago by jef

comment:29 Changed 2 years ago by jef

Down to 1333. @qq.com gets rejected in postfix now.

comment:30 Changed 2 years ago by jef

Also rejecting (and cleaned out) @163.com and @139.com. "Just" 651 requests left in the queue.

comment:31 Changed 2 years ago by jef

Update: the spammer is still trying to feed us mail from numerous IPs. Postfix rejects them, and fail2ban in turn bans the IPs.

~600 queued requests for quite a while now.

comment:32 Changed 2 years ago by strk

Jef: are these rejections automatic or are you doing them manually?

comment:33 in reply to:  32 Changed 2 years ago by jef

Replying to strk:

Jef: are these rejections automatic or are you doing that manually ?

Postfix rejects those three domains automatically now that they have been added to /etc/postfix/access (and use of the access table has been set up).

comment:34 Changed 2 years ago by strk

The /etc directory on that server is tracked in a local git repository; could you commit the relevant configuration there to better track changes?

comment:35 in reply to:  34 Changed 2 years ago by jef

Replying to strk:

The /etc directory on that server is under a local git, could you add the relevant configurations in that repository to better track changes ?

done.

comment:36 Changed 2 years ago by Jeff McKenna

@qq.com has been posting as a non-member to OSGeo lists for a while now...thanks Jürgen!!!!!!!!!

comment:37 Changed 2 years ago by jef

The domains are also blocked on projects (otherwise projects relays such mail to osgeo6, whose postfix rejects it, and fail2ban in turn blocks all mail from projects).

comment:38 Changed 2 years ago by jef

Resolution: fixed
Status: new → closed

Delivery back to normal for a while now...

comment:39 Changed 18 months ago by Jeff McKenna

Resolution: fixed
Status: closed → reopened

qq.com email addresses are now spamming our -owner aliases (see ticket #1778, https://trac.osgeo.org/osgeo/ticket/1778). Can the fix from comment #29 be applied again? "@qq.com gets rejected in postfix now"

comment:40 Changed 18 months ago by Jeff McKenna

More info from the headers of one of the many @qq.com spams received today, showing how OSGeo's postfix processes it:

from osgeo6.osgeo.osuosl.org (localhost [127.0.0.1]) by lists.osgeo.org (Postfix) with ESMTP id 23FB8605B830 for <jmckenna@gatewaygeomatics.com>; Tue,  5 Jun 2018 10:12:28 -0700 (PDT)

from plasticscrap.us (unknown [123.8.242.201]) by lists.osgeo.org (Postfix) with SMTP id CC8D960650B2 for <atlanticcanada-owner@lists.osgeo.org>; Tue,  5 Jun 2018 10:12:24 -0700 (PDT)

from plasticscrap.us (unknown (96.164.231.142]) by plasticscrap.us with SMTP id e2ef1992-d623-451a-a68c-94630c9eefdc; for <2498073052@qq.com>;Wed, 06 Jun 2018 01:12:34 +08:00
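Of the Received: headers quoted here (newest first), only those stamped "by lists.osgeo.org" were written by our own server; the last one was written by the sending side and can contain anything. A minimal sketch that extracts the connecting client IPs from hops added by a given host, skipping the internal loopback hand-off; `relay_client_ips` is a hypothetical name, and it assumes one unfolded header per input line, as quoted in this comment.

```bash
# Hypothetical helper: client IPs recorded in Received: headers that were
# added by OUR_HOST itself. Hops added further down the chain are written by
# the sender and may be forged. Assumes one (unfolded) header per input line.
relay_client_ips() {
  local our_host=$1
  grep -F "by $our_host " |                        # only hops stamped by our server
    sed -n 's/.*\[\([0-9.]*\)].*/\1/p' |           # "[123.8.242.201]" -> IP
    grep -v '^127\.'                               # drop the localhost hand-off
}
```

On the headers above this yields 123.8.242.201, the address that postfix and fail2ban actually see, rather than anything claimed further down the chain.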

comment:42 Changed 17 months ago by Jeff McKenna

current queue: 29574

bounces: 6692

comment:43 Changed 17 months ago by rouault

Perhaps related to, or the same as, the issue Jeff McKenna raises above: for a few days now I don't seem to receive copies of the emails I send to mailing lists (observed on different mailing lists, like gdal-dev, COG, mapserver-dev). I do receive answers from others to my posts, but I don't receive the copy of my own posts. Has any configuration change been made?

comment:44 Changed 17 months ago by Jeff McKenna

I've examined the logs closely and spent my whole day on this. (funding, anyone?)

The original issue reported in this ticket (spam from qq.com domain) still exists.

  • Typical log messages today, showing mail from the qq.com domain being successfully delivered to our list owners:
    Jun 18 11:08:51 osgeo6 postfix/qmgr[23549]: 173A7600C6B7: from=<123725849@qq.com>, size=956, nrcpt=1 (queue active)
    Jun 18 11:08:51 osgeo6 postfix/pipe[24762]: 173A7600C6B7: to=<mapguide-internals-owner@lists.osgeo.org>, relay=mailman, delay=0.69, delays=0.54/0/0/0.15, dsn=2.0.0, status=sent (delivered via mailman service)
  • So I examined our postfix config files.
  • /etc/postfix/access contains: qq.com REJECT
  • So something wasn't right, because the qq.com domain was not being rejected.
  • I noticed that the config file /etc/postfix/main.cf was missing the important line:
    smtpd_sender_restrictions = check_sender_access hash:/etc/postfix/access
  • Restarted the service.
  • The logs now show postfix REJECTING the qq.com domain:
    Jun 18 11:35:04 osgeo6 postfix/smtpd[17873]: NOQUEUE: reject: RCPT from unknown[114.228.74.19]: 554 5.7.1 <676479210@qq.com>: Sender address rejected: Access denied; from=<676479210@qq.com> to=<discuss-bounces@lists.osgeo.org> proto=SMTP helo=<mail.tofine.com>
  • But that slows the queue, as postfix tries to send a rejection email to a broken qq.com sender. So I updated the access file to DISCARD instead, which allows postfix to crunch faster:
    Jun 18 11:40:23 osgeo6 postfix/smtpd[20305]: NOQUEUE: discard: RCPT from unknown[125.121.117.70]: <491235343@qq.com>: Sender address triggers DISCARD action; from=<491235343@qq.com> to=<gdal-dev-owner@lists.osgeo.org> proto=SMTP helo=<chinarida.com.cn>

I am watching the logs being processed now. I hope this change helps!!!

Last edited 17 months ago by Jeff McKenna
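The fix in comment 44 boils down to an access(5) table keyed by sender domain plus one `smtpd_sender_restrictions` line in main.cf. A minimal sketch of scripting it; the function names are hypothetical, the table path is the one from this ticket, and note that `postconf -e` replaces any existing `smtpd_sender_restrictions` value rather than appending to it.

```bash
# Hypothetical helpers for the sender-domain block described in this ticket.

# Print one access(5) line per domain with the given action (REJECT, DISCARD, ...).
access_entries() {
  local action=$1 domain
  shift
  for domain in "$@"; do
    printf '%s\t%s\n' "$domain" "$action"
  done
}

# Rebuild the lookup table and enable it for envelope senders (run as root).
# Caution: postconf -e overwrites any existing smtpd_sender_restrictions value.
apply_access_table() {
  local table=${1:-/etc/postfix/access}
  postmap "hash:$table"
  postconf -e "smtpd_sender_restrictions = check_sender_access hash:$table"
  postfix reload
}
```

For example, after checking that the domains are not already listed (postmap warns about duplicate keys), one could run `access_entries DISCARD qq.com 163.com 139.com >> /etc/postfix/access` and then `apply_access_table`.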

comment:45 Changed 17 months ago by Jeff McKenna

I also have removed all @qq.com requests from the queue.
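Removing queued mail by sender domain is usually done with `postsuper -d -`, which reads queue IDs from standard input; the postsuper(1) manual page shows a similar mailq/awk pipeline. A minimal sketch, assuming Postfix's classic `mailq` layout (one blank-line-separated entry per message, envelope sender in field 7); `queue_ids_from_domain` is a hypothetical name.

```bash
# Hypothetical helper: print the queue IDs of entries whose envelope sender
# ends in @DOMAIN (case-insensitive), ready to pipe into "postsuper -d -".
# Assumes classic Postfix mailq output: one entry per paragraph, sender in $7.
queue_ids_from_domain() {
  local domain=$1
  grep -v -e '^-' -e '^ *(' |
    awk -v dom="@$domain" 'BEGIN { RS = ""; dom = tolower(dom) }
      NF >= 8 && $2 ~ /^[0-9]+$/ {
        sender = tolower($7)
        if (substr(sender, length(sender) - length(dom) + 1) == dom) {
          id = $1
          sub(/[*!]$/, "", id)   # strip active (*) / on-hold (!) markers
          print id
        }
      }'
}
```

As root, `mailq | queue_ids_from_domain qq.com | postsuper -d -` would then delete those entries; running it once without the final `postsuper` stage is a cheap way to review what would go.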

comment:46 Changed 17 months ago by Jeff McKenna

it's crunching away now!!!

comment:47 Changed 17 months ago by Jeff McKenna

PS. we really need to fund this work. I also missed the England-Tunisia soccer game for this!!!!

comment:48 Changed 17 months ago by strk

We're actually funding some sysadmin work; there's a Milestone you can set to assign tickets to the funded person (currently Martin).

I guess we could have more than one person at a time doing this, as it looks like a single one isn't enough.

comment:49 Changed 5 weeks ago by strk

Milestone: Sysadmin Contract 2019-II

It's been reported the problem is back -- I'm adding this to the sysadmin milestone, hopefully it'll get more visibility then :)

comment:50 Changed 5 weeks ago by robe

strk, do you have any particular lists in mind? I've definitely been getting mail from lists fine, but haven't reconciled it with the time the items were sent to the mailing lists.

Unfortunately the mailing list server is one of the servers I know little about. I'll take a look at it this weekend.

comment:51 Changed 5 weeks ago by robe

Okay, I just got a response from the SAC mailing list immediately after submitting this ticket, so maybe the issue is isolated to some domains?

comment:52 Changed 5 weeks ago by strk

It could be my setup, but I have the impression that mails sent to mantra-request arrive late (Kalxas and I often both reply because one of us doesn't see the other's reply in time).

comment:53 Changed 5 weeks ago by wildintellect

Is mantra a list or an alias?

comment:54 in reply to:  53 Changed 5 weeks ago by jef

Replying to wildintellect:

Is mantra a list or an alias?

alias
