NP_SpamBayes 1.1.0 done!

xiffy
Nucleus Guru
Posts: 1194
Joined: Wed Mar 27, 2002 6:37 pm
Location: Deventer

Postby xiffy » Fri Sep 29, 2006 12:23 pm

You can't do that through the plugin. The only thing you can do right now is copy and paste the log message and post it as a comment yourself. Letting comments through after examination is still on the todo list.
But since I don't get false positives, it's very low on my list right now.
verbaljam
Posts: 666
Joined: Wed Jul 31, 2002 4:58 pm
Location: Amsterdam, The Netherlands

Postby verbaljam » Sun Oct 01, 2006 9:44 pm

Some feedback, Xiffy:

The filter works excellently. There is only one thing: somehow certain trackback spam seems to pass the filter. If I remember correctly, it is supposed to block trackback spam too. In the past week, however, I received over 50 trackback spam messages. The trackback plugin blocks all trackbacks by default until I approve them, so it's not very serious.
I trained a lot of these messages as spam (and in the trackback admin there is a column that does indeed mark them as spam). Another strange thing is that all the trackback spam messages are targeted at one single posting.
Do you have an explanation for this? :?

Postby xiffy » Sun Oct 01, 2006 10:57 pm

It does; only, in its original version, trackback keeps its own 'blocked' state for each trackback. If you look at each entry you'll see that both 'blocked' and 'spam' show a yes or a '1'. I've modified trackback myself to block a trackback completely if the spam plugins find spam, because trackback still emailed me about blocked trackbacks and still placed them in the queue.
[edited!]

Code:

// 4. SPAM check (excerpt; the array is cut off here)
$spamcheck = array (
    'type'     => 'trackback',
    'id'       => $tb_id,
    'title'    => $title,
    'excerpt'  => $excerpt,
    'blogname' => $blog_name,
    'url'      => $url,
    'return'   => true,   // the line to change
    'live'     => true,

    /* Backwards compatibility with SpamCheck API 1 */
    'data'     => $url . ' ' . $title . ' ' . $excerpt . ' ' . $blog_name,

Change the 'return' line to:

Code:

    'return'   => false,

[/edited]

The last question is interesting, and you could find out why they target that particular item. It probably ranks high in a Google search for the keywords they are targeting. To learn which keywords those are, follow the links in the spam message and find out what gets promoted by those sites, search for Google AdWords, that kind of stuff.
This is where it all gets fishy and stinky: lots of cheap, rancid tricks such as forwarding, link hotels and splogs (spam blogs). But if you are curious enough you should be able to find out. Also look in your referrals for that item and see how other, real visitors arrive. The first one targeting your item will most likely have followed one of those searches.

Postby verbaljam » Mon Oct 02, 2006 12:33 pm

Thanks very much, Xiffy! I changed the trackback plugin. I guess all leaks are closed now (except for the NotifyMe plugin, which gets spammed with mail addresses, but I believe Admun is working on that...).
besonen
Posts: 2
Joined: Wed Oct 25, 2006 3:02 am

Postby besonen » Wed Oct 25, 2006 5:17 pm

is this plugin at all derived from the better known SpamBayes python project?:

http://spambayes.sourceforge.net/


-- david

Postby xiffy » Wed Oct 25, 2006 5:54 pm

Not to my knowledge; for me it all started here:
http://www.xhtml.net/php/PHPNaiveBayesianFilter (French warning)

Postby besonen » Thu Oct 26, 2006 7:33 pm

xiffy wrote:Not to my knowledge; for me it all started here:
http://www.xhtml.net/php/PHPNaiveBayesianFilter (French warning)


thanks for letting me know.

how well does this plugin work? when i was imagining that this plugin interfaced with spambayes.sf.net i was thinking that i could use the work done on this plugin to create similar plugins for wordpress and fudforum. spambayes.sf.net is a very well performing bayesian filter.


-- david

Postby xiffy » Thu Oct 26, 2006 8:02 pm

spambayes.sf.net is a very well performing bayesian filter.

I know, but because I wanted to deliver a package and not an interface, I did not use it. For me it works extremely well, but results may vary because the filter is built by the user from the ham and spam he gets. The Bayesian part is not too fancy, just some word scores. spambayes.sf.net is much more sophisticated, so I think it's a nice project ;-)
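
To make "just some word scores" concrete, here is a minimal sketch of a word-count Bayesian classifier in PHP. It is not the plugin's actual code: the class name WordScoreFilter, the tokenizer and the add-one smoothing are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch of a word-score Bayesian filter; NOT NP_SpamBayes's real code.
class WordScoreFilter
{
    // counts[category][word] = how often the word was seen in that category
    private $counts = array('ham' => array(), 'spam' => array());
    // number of trained messages per category
    private $docs = array('ham' => 0, 'spam' => 0);

    // Split text into lowercase alphanumeric tokens (an assumed tokenizer).
    public static function tokenize($text)
    {
        preg_match_all('/[a-z0-9]+/', strtolower($text), $m);
        return $m[0];
    }

    // Train a complete message as 'ham' or 'spam'.
    public function train($text, $category)
    {
        $this->docs[$category]++;
        foreach (self::tokenize($text) as $word) {
            if (!isset($this->counts[$category][$word])) {
                $this->counts[$category][$word] = 0;
            }
            $this->counts[$category][$word]++;
        }
    }

    // Log-score of a token list under one category, with add-one smoothing.
    private function logScore(array $words, $category)
    {
        $total = array_sum($this->counts[$category]);
        $vocab = count(array_unique(array_merge(
            array_keys($this->counts['ham']),
            array_keys($this->counts['spam'])
        )));
        $allDocs = $this->docs['ham'] + $this->docs['spam'];
        // Prior: smoothed share of trained messages in this category.
        $score = log(($this->docs[$category] + 1) / ($allDocs + 2));
        foreach ($words as $word) {
            $n = isset($this->counts[$category][$word]) ? $this->counts[$category][$word] : 0;
            // The extra +1 keeps the denominator non-zero for an untrained filter.
            $score += log(($n + 1) / ($total + $vocab + 1));
        }
        return $score;
    }

    // Return 'spam' or 'ham', whichever scores higher (ties count as ham).
    public function classify($text)
    {
        $words = self::tokenize($text);
        return $this->logScore($words, 'spam') > $this->logScore($words, 'ham') ? 'spam' : 'ham';
    }
}

$filter = new WordScoreFilter();
$filter->train('cheap alprazolam buy now', 'spam');
$filter->train('thanks for the great plugin', 'ham');
echo $filter->classify('buy cheap alprazolam'), "\n";
```

As the post says, results depend entirely on what each user trains, so a sketch like this only shows the mechanics, not the quality of any real filter.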
admun
Nucleus Guru
Posts: 4088
Joined: Mon Oct 20, 2003 2:57 am
Location: San Francisco, CA, USA

Postby admun » Thu Oct 26, 2006 9:03 pm

verbaljam wrote:Thanks very much, Xiffy! I changed the trackback plugin. I guess all leaks are closed now (except for the NotifyMe plugin, which gets spammed with mail addresses, but I believe Admun is working on that...).

NP_NotifyMe now has email authentication, so I just remove all never-authenticated subscriptions once in a while.

As for this small change: I recently hacked a small change into 2.0.3 that lets the user decide whether to block the spam directly or put it through manual review first... maybe I will post it sometime. :wink:

Postby admun » Tue Oct 31, 2006 4:08 pm

hey xiffy,

I need some ideas from you. I am thinking of adding a "Train and delete" function to NP_Trackback's blocked-trackback menu to battle the trackback spam that gets through. I am using NP_SpamBayes, but some of it still gets through because it uses plain, harmless words... with only a different URL.

Any idea what the best way is to call the plugin to train it directly? I see there is a form that does that... and I wonder whether there is another way (yeah... I am a bit too lazy to look at the code at the moment)

Any hints will be welcome. :wink:

cheers,

Postby xiffy » Thu Nov 02, 2006 10:13 am

admun,
just put up a link to
http://yourserver.com/nucleus/plugins/s ... alprazolam
where expression contains the url-encoded message that needs to be added to the database.
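
A sketch of how such a training link could be assembled in PHP. The base URL is elided in the post above, so it is passed in as a parameter here; the function name buildTrainUrl is hypothetical, and the parameter name 'expression' is only inferred from the wording of the post.

```php
<?php
// Hypothetical helper: builds a link that hands a message to a training script.
// $baseUrl must be the real training URL (elided above); 'expression' is an
// assumed parameter name, inferred from the post rather than from the plugin code.
function buildTrainUrl($baseUrl, $message, $paramName = 'expression')
{
    // http_build_query() URL-encodes the value (space becomes '+', '&' becomes '%26').
    $query = http_build_query(array($paramName => $message));
    // Append with '&' if the base URL already carries a query string.
    $separator = (strpos($baseUrl, '?') === false) ? '?' : '&';
    return $baseUrl . $separator . $query;
}

// Example with a placeholder URL (not the plugin's real path).
echo buildTrainUrl('http://example.com/train.php', 'buy cheap pills & more'), "\n";
```

When the link is written into an HTML page, the result should additionally go through htmlspecialchars().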

Postby admun » Thu Nov 02, 2006 5:17 pm

xiffy wrote:admun,
just put up a link to
http://yourserver.com/nucleus/plugins/s ... alprazolam
where expression contains the url-encoded message that needs to be added to the database.

thanx, I will give it a try.
Leng
Nucleus Guru
Posts: 2827
Joined: Sun Sep 19, 2004 2:34 am
Location: Australia

Postby Leng » Sun Nov 19, 2006 2:05 am

Bumping to let xiffy know I have installed this on the Nucleus FAQ site, because goodness knows I am sick of going through the FAQ email only to be greeted by thousands of spam. :?

Hopefully this will significantly cut down on it all!

Some other suggestions for when you have time:

1. Can we bring back the "Delete all on current page" option in the Spam Log? While the "Delete current filter" option is great, when there are a lot of spam log events, one loses track of where one is up to with training spam/ham very easily.

2. In line with the above, can we get "Page X of Y" in the spam log event page?

3. Option to change how many spam events are displayed in the log, similar to the default Nucleus blog interface with comments and items.

4. Can SpamBayes be modified to use the core Nucleus function on the admin item searching?

5. Modify the spam log event page to have check boxes so multiple events can be trained at once, with a "Select All" option. It takes a very long time if one has to click on "Train spam" for every single item!

6. Alternative option to suggestion #1 in combination with #5: allow "Select All", then "Train as spam and remove from event log" (and the corresponding "Train as ham and remove from event log"). Then it is easy to see what new events registered that have not been appropriately trained yet.
deborahlau.com | To-Do List
Questions? See the FAQ, read the docs, or browse our plugins!!

Postby Leng » Wed Dec 06, 2006 1:39 pm

Bumping to add another suggestion:

Can NP_SpamBayes be modified to log the messages sent via the membermailform? At the moment, in the FAQ, I am just getting blank event logs with no message body.

Postby xiffy » Wed Dec 06, 2006 2:31 pm

I'll look into this tomorrow. Then I'll post an alternative NP_SpamBayes.php which has ham logging enabled for a couple of other events as well, like trackback (a request from Admun a couple of weeks ago).
But I know that I had a lot of empty messages going through the member mail form as well. Probably some spammers testing the possibilities of misuse.

Postby Leng » Fri Dec 08, 2006 11:16 am

xiffy wrote:But I know that I had a lot of empty messages going through the member mail form as well. Probably some spammers testing the possibilities of misuse.

That is a possibility, however I know for sure some of those are valid messages as I was using the member mail form to send suggestions to the faq email. :D

Thanks for looking into it; I look forward to the new version!

Postby xiffy » Fri Dec 08, 2006 3:36 pm

Leng, now I remember. The membermailform does not call the SpamEvent as it is supposed to. I patched this on my local install and sent the patch to the developer list. You need an altered ACTION.php (included in the zip below).
The attached zip also contains a modded NP_SpamBayes.php that enables 'ham logging' on the external events (everything besides comment checking: referrer, membermail, trackback and so on); the other files are unaffected. For Admun: this does contain 'ham logging' for trackback, so the occasionally missed spam can be taught easily.
For the PHP hackers: the fun starts on line 142 in the NP_SpamBayes.php file.

Download zipfile

Postby xiffy » Fri Jan 05, 2007 9:12 pm

It's been a while, I know. But sometimes I like to think I'm busy, or I just like being lazy. Today I saw event 300,000 in my SpamBayes log. That's a lot. I'm now monitoring 4 different sites with this plugin and skimming through the log files, just to see what spammers are up to lately. Because the amount coming through is almost zero, I now have the guts again to go on holiday and leave the comments on 8) .

I remembered that a couple of tweaks had been suggested, and there are some things I still want myself.
1. Can we bring back the "Delete all on current page" option in the Spam Log? While the "Delete current filter" option is great, when there are a lot of spam log events, one loses track of where one is up to with training spam/ham very easily.

Will do, but with the footnote that I cannot guarantee it will work on all versions of MySQL. That's why I removed it in the first place, but you are right: that a few people can't use it is no reason to keep it from everyone.
The other 5 are noted as well, and I'll see what I can do. They don't sound too hard to build, and they all seem plausible to me for usability on high-traffic blogs. (I maintain one with 1000 or so legit comments a day, all ham, and finding one particular comment can be tricky, yes.)

After that I will add some sort of 'low-level' maintenance on the SpamBayes database, because when training messages, some words will end up in the wrong category. I will not change anything in that process, because training complete messages / events is very convenient. But it's nice to be able to correct the database for specific words, and to see which words score in a certain category.
like:
cd0kr | spam | 599
kan | ham | 592
drochu | spam | 481

with filter options on ham or spam, paging, and a filter on words beginning with user input.
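
The word listing described above (category filter, prefix filter, paging) could be sketched roughly like this. The function name filterWordScores and the row layout are hypothetical; the plugin would presumably query its database instead, while this sketch works on an in-memory array built from the three example rows.

```php
<?php
// Hypothetical helper; each row is array('word' => ..., 'category' => ..., 'score' => ...).
function filterWordScores(array $rows, $category = null, $prefix = '', $page = 1, $perPage = 25)
{
    $matches = array();
    foreach ($rows as $row) {
        // Optional filter on 'ham' or 'spam'.
        if ($category !== null && $row['category'] !== $category) {
            continue;
        }
        // Optional filter on words beginning with the user's input.
        if ($prefix !== '' && strpos($row['word'], $prefix) !== 0) {
            continue;
        }
        $matches[] = $row;
    }
    // Highest score first.
    usort($matches, function ($a, $b) {
        return $b['score'] - $a['score'];
    });
    // Paging: pages are 1-based.
    $offset = (max(1, (int) $page) - 1) * $perPage;
    return array_slice($matches, $offset, $perPage);
}

// Sample data taken from the example rows in the post.
$rows = array(
    array('word' => 'cd0kr',  'category' => 'spam', 'score' => 599),
    array('word' => 'kan',    'category' => 'ham',  'score' => 592),
    array('word' => 'drochu', 'category' => 'spam', 'score' => 481),
);

foreach (filterWordScores($rows, 'spam') as $row) {
    echo $row['word'], ' | ', $row['category'], ' | ', $row['score'], "\n";
}
```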

And I will try to get an explain function working where you can see which words scored as spam / ham words, with their respective scores attached. But I'm not sure how easy this will be to get working. It's on my own wishlist.
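
One possible shape for such an explain function, as a rough sketch only: the name explainMessage, the tokenizer and the table layout (word => category and score) are assumptions, not anything taken from the plugin.

```php
<?php
// Hypothetical sketch: $table maps word => array('category' => ..., 'score' => ...).
function explainMessage($text, array $table)
{
    // Assumed tokenizer: lowercase alphanumeric runs.
    preg_match_all('/[a-z0-9]+/', strtolower($text), $m);
    $explained = array();
    foreach (array_unique($m[0]) as $word) {
        // Only words known to the table can be explained.
        if (isset($table[$word])) {
            $explained[$word] = $table[$word];
        }
    }
    // Strongest evidence first; uasort() keeps the word keys.
    uasort($explained, function ($a, $b) {
        return $b['score'] - $a['score'];
    });
    return $explained;
}

// Word table built from the example rows in the post.
$table = array(
    'cd0kr'  => array('category' => 'spam', 'score' => 599),
    'kan'    => array('category' => 'ham',  'score' => 592),
    'drochu' => array('category' => 'spam', 'score' => 481),
);

foreach (explainMessage('Kan drochu help? cd0kr', $table) as $word => $info) {
    echo $word, ' | ', $info['category'], ' | ', $info['score'], "\n";
}
```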

Cheers all

Postby verbaljam » Sat Jan 06, 2007 2:34 pm

This is of course the best spam filter that has been made for Nucleus so far. No spam has made it to the comments since I started using it.

But may I suggest one small improvement? Every now and then a comment ends up as a false positive: it is ham but is considered spam. The only thing I can do now is train it as ham, and it often takes a lot of persuasion to convince SpamBayes that it's really ham, and not spam. ;-)
The problem is: once a comment has been wrongly marked as spam, there is no way to recover it. It would be nice to have an option that puts the comment back on the weblog.
Is that possible?

Postby xiffy » Sat Jan 06, 2007 3:11 pm

Added that option to the wishlist. For that I need to store some extra information along with the logged event, but it is not impossible. So expect this in one of the next releases...
