NP_SpamBayes 1.1.0 done!

xiffy
Nucleus Guru
Posts: 1194
Joined: Wed Mar 27, 2002 6:37 pm
Location: Deventer

Postby xiffy » Mon Jan 08, 2007 12:44 am

Well, the new version has been published.
* Items per page adjustable
* Search within filtered events
* Explain function for each event
* Batch action on selected events
* Select all / deselect all
* Promote to weblog

All of this works with the current implementation. It will not make old log events publishable; only newly captured events can be published.
No need to uninstall; just upload and enjoy.
[Note: top post edited to reflect the new URL: http://wakka.xiffy.nl/_media/np_spambay ... ache=cache ]
Last edited by xiffy on Wed Jan 10, 2007 12:17 am, edited 1 time in total.
verbaljam
Posts: 666
Joined: Wed Jul 31, 2002 4:58 pm
Location: Amsterdam, The Netherlands

Postby verbaljam » Mon Jan 08, 2007 1:27 pm

Great work. Two pieces of feedback so far:

1. When I select no items and choose "train ham" from the drop-down at the bottom of the page, I get this error:

Warning: Invalid argument supplied for foreach() in /home/virtual/site136/fst/var/www/html/nucleus/plugins/spambayes/index.php on line 261
--end of batch--


2. I cannot find the option to publish a false positive in the weblog. Or did I misunderstand something?

And could you explain what 'rescaled probability' means, please?
xiffy

Postby xiffy » Mon Jan 08, 2007 2:17 pm

verbaljam wrote:2. I cannot find the option to publish a false positive in the weblog. Or did I misunderstand something?

I think you missed something: it only works on events caught after the upgrade. Older events can't be promoted to the weblog as comments, because the itemid of the item they were originally posted to was not stored with the event, so there is no way to know where the comment should have gone.
New spam comments (and only comments) get a red 'publish' link in the action box.
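A minimal sketch of the rule described above, in Python rather than the plugin's PHP: an event can only be promoted when it is a comment and still carries the itemid it was posted to. The `LogEvent` class and its field names are hypothetical, not the plugin's actual log schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogEvent:
    # Hypothetical fields; the real plugin's log table may differ.
    event_type: str          # e.g. "comment"
    item_id: Optional[int]   # None for events logged before the upgrade


def is_publishable(event: LogEvent) -> bool:
    """Only comments that remember their original item can be published."""
    return event.event_type == "comment" and event.item_id is not None


old_event = LogEvent("comment", None)   # captured before 1.1.0: no itemid stored
new_event = LogEvent("comment", 42)     # captured after the upgrade
print(is_publishable(old_event), is_publishable(new_event))  # False True
```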

The other issue has been fixed; the warning will no longer be shown.

The update will be out tonight.
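For reference, PHP emits the "Invalid argument supplied for foreach()" warning when `foreach` is handed something that is not an array, such as null. One likely cause here (an assumption, not confirmed in the thread) is that browsers do not submit unchecked checkboxes, so with nothing selected the list of selected events is simply missing from the request. A Python sketch of the defensive pattern, using a hypothetical form field named `selected`:

```python
def run_batch(form: dict, action: str) -> list[str]:
    """Apply a batch action to the selected event ids, tolerating an empty selection."""
    # Unchecked checkboxes are not submitted, so the key may be absent entirely.
    selected = form.get("selected") or []
    if not isinstance(selected, list):
        selected = [selected]  # a single value instead of a list
    if not selected:
        return ["--end of batch--"]  # nothing to do, and no warning
    log = [f"{action}: event {event_id}" for event_id in selected]
    log.append("--end of batch--")
    return log


print(run_batch({}, "train ham"))                   # ['--end of batch--']
print(run_batch({"selected": ["7"]}, "train ham"))  # ['train ham: event 7', '--end of batch--']
```

Normalizing the input once at the top keeps the loop itself free of special cases.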
verbaljam

Postby verbaljam » Mon Jan 08, 2007 3:21 pm

You're right of course. I was too impatient and didn't realise that those were older items in the log. Thanks for all this good work!
roel
Nucleus Guru
Posts: 4469
Joined: Tue Apr 16, 2002 12:41 am
Location: Rotterdam, The Netherlands

Postby roel » Tue Jan 09, 2007 9:52 am

Is your question not solved yet?
xiffy

Postby xiffy » Wed Jan 10, 2007 12:20 am

Thanks, Roel.
Okay, 1.1.0 is final: NP_SpamBayes1.1.0.zip, with small bug fixes like the one verbaljam reported. Nothing spectacular. The next version, 1.2.0, will add SpamBayes maintenance at the word level, so you can adjust the scores of 'wrongly' trained words as you please.
I'll keep you posted.
Leng
Nucleus Guru
Posts: 2827
Joined: Sun Sep 19, 2004 2:34 am
Location: Australia

Postby Leng » Sat Jan 13, 2007 2:58 am

Just posting to say thanks for the awesome update, xiffy! The new improvements make using SpamBayes a lot easier.
deborahlau.com | To-Do List
Questions? See the FAQ, read the docs, or browse our plugins!!
cyblot
Nucleus Guru
Posts: 399
Joined: Tue Sep 16, 2003 8:49 pm
Location: Netherlands

Postby cyblot » Tue Jan 16, 2007 8:38 pm

Leng wrote:Just posting to say thanks for the awesome update, xiffy! The new improvements make using SpamBayes a lot easier.


Yes, I'll second that. The current version is awesome! :D Thanks a lot, it makes dealing with spam a little bit less annoying.
Blots of Info
http://www.golb.org
Leng

Postby Leng » Wed Jan 17, 2007 8:49 am

One small bug I've noticed: the "train all selected as spam/ham" option does not update the DB statistics at the bottom of the page with the new spam or ham words. The spam/ham word counts in the database are not updated until you click the small "Train ham/spam" link next to a logged item.
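One way to avoid this kind of drift, sketched in Python under the assumption that the statistics panel shows per-category word totals: have the batch action call the same single-item training routine, so both paths update the counts identically. `WordStore`, `train` and `train_batch` are hypothetical names, not the plugin's code, and whitespace splitting stands in for the real tokenizer.

```python
from collections import Counter


class WordStore:
    """Toy stand-in for the SpamBayes word table plus per-category totals."""

    def __init__(self) -> None:
        self.words = {"ham": Counter(), "spam": Counter()}
        self.totals = Counter()  # the "wordcount" figures shown on the stats panel

    def train(self, text: str, category: str) -> None:
        tokens = text.lower().split()
        self.words[category].update(tokens)
        self.totals[category] += len(tokens)  # keep the summary in step with the words

    def train_batch(self, texts: list[str], category: str) -> None:
        # Reuse train() instead of duplicating the insert logic,
        # so the totals cannot be forgotten on this path.
        for text in texts:
            self.train(text, category)


store = WordStore()
store.train_batch(["cheap pills now", "buy cheap pills"], "spam")
print(store.totals["spam"], store.words["spam"]["pills"])  # 6 2
```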
xiffy

Postby xiffy » Wed Jan 17, 2007 9:22 am

Well spotted; I noticed the same thing yesterday. I'll fix it this week.
MacFrog
Posts: 113
Joined: Thu Aug 28, 2003 3:54 am

Postby MacFrog » Wed Jan 17, 2007 7:08 pm

Great plug-in!


I think I'll make my database available as a "pre-loaded" set for anyone who wants it, since I seem to be getting 100% correct spam/ham hits after training it with a bunch of crap.


category   probability       wordcount
ham        0.978124226281    122468
spam       0.0218757737187   2739
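The probability column in these statistics appears to be each category's share of the total trained word count: 122468 / (122468 + 2739) ≈ 0.978124. That is an inference from the numbers, not something confirmed in the thread. A small Python check of that reading:

```python
def category_priors(wordcounts: dict[str, int]) -> dict[str, float]:
    """Each category's share of all trained words (assumed to be the 'probability' column)."""
    total = sum(wordcounts.values())
    return {category: count / total for category, count in wordcounts.items()}


priors = category_priors({"ham": 122468, "spam": 2739})
print(f"{priors['ham']:.12f} {priors['spam']:.13f}")
```

If this reading is right, a heavily ham-trained database gives the ham category a very large head start before any individual word is considered.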
wgroleau
Posts: 402
Joined: Sat Jun 10, 2006 4:20 pm
Location: Indiana / USA

Postby wgroleau » Sat Jan 20, 2007 8:03 pm

Is it enough to overwrite the old PHP file with the new one?
Or do I need to uninstall/reinstall?
Thanks
Wes Groleau
MacFrog

Postby MacFrog » Tue Jan 23, 2007 9:31 pm

OK, a problem:

Why is this marked as spam?

word     Ham  Spam
con       11     3
selka     11     0
wtb        4     0
pls        3     0
trumpd     2     0
2dex       2     0
4str       2     0
-4con      2     0
Rescaled probability: 0 1

It seems to me it should be marked as ham. Even after training it as ham and redoing the test, it still gets tagged as spam.

I'm submitting this to the plugin via the data tag.
xiffy

Postby xiffy » Tue Jan 23, 2007 10:05 pm

Testresult: Ham! [score:0.58920470815143]
Tested:con selka wtb pls trumpd 2dex 4str -4con

I'm missing one factor here: the spam and ham probabilities. My guess is that your Bayesian statistics would show a probability of 0.9 for ham and 0.1 for spam.
[upd]
Ah, I scrolled back a message and saw my guess was right. A low spam probability means that IF you see a spammy word, the weight of that word is considered high. Explain only shows the occurrences of the words; it does not show you the magic math on a per-word basis combined with the probabilities of both categories.
[/upd]
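To make the "magic math" concrete, here is a textbook multinomial naive Bayes sketch with add-one smoothing, in Python. It is not necessarily the formula NP_SpamBayes uses: the priors are assumed to be each category's share of trained words, and the toy counts are invented. It shows why raw occurrences mislead when the spam corpus is small. Each occurrence of 'con' is about ten times more likely under the spam model (4/33) than under the ham model (12/1003), even though its raw ham count (11) is higher than its spam count (3). The final step rescales the two scores so they sum to 1; whether that matches the plugin's "Rescaled probability" line is an assumption.

```python
import math


def log_joint(tokens, counts, total, prior, vocab_size):
    """log P(category) + sum of log P(token | category), with add-one smoothing."""
    score = math.log(prior)
    for token in tokens:
        score += math.log((counts.get(token, 0) + 1) / (total + vocab_size))
    return score


def classify(tokens, ham_counts, spam_counts):
    """Return (P(ham | tokens), P(spam | tokens)) under naive Bayes."""
    ham_total = sum(ham_counts.values())
    spam_total = sum(spam_counts.values())
    vocab_size = len(set(ham_counts) | set(spam_counts))
    ham_prior = ham_total / (ham_total + spam_total)
    ham = log_joint(tokens, ham_counts, ham_total, ham_prior, vocab_size)
    spam = log_joint(tokens, spam_counts, spam_total, 1 - ham_prior, vocab_size)
    # Rescale in log space so the two probabilities sum to 1 without underflow.
    top = max(ham, spam)
    p_ham, p_spam = math.exp(ham - top), math.exp(spam - top)
    return p_ham / (p_ham + p_spam), p_spam / (p_ham + p_spam)


# Invented toy counts: a large ham corpus and a tiny spam corpus.
ham_counts = {"con": 11, "filler": 989}   # 1000 ham words
spam_counts = {"con": 3, "junk": 27}      # 30 spam words

print(classify(["con"], ham_counts, spam_counts))
print(classify(["con", "con", "con"], ham_counts, spam_counts))
```

With these numbers a single 'con' still leans ham, because the large ham prior compensates, but three occurrences tip the result clearly to spam.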
verbaljam

Postby verbaljam » Wed Jan 24, 2007 2:52 pm

wgroleau wrote:Is it enough to overwrite the old PHP file with the new one?
Or do I need to uninstall/reinstall?
thanks


Just overwrite the old php-file with the new one. No uninstall needed.
MacFrog

Postby MacFrog » Thu Jan 25, 2007 1:51 am

xiffy wrote:
Testresult: Ham! [score:0.58920470815143]
Tested:con selka wtb pls trumpd 2dex 4str -4con

I'm missing one factor here: the spam and ham probabilities. My guess is that your Bayesian statistics would show a probability of 0.9 for ham and 0.1 for spam.
[upd]
Ah, I scrolled back a message and saw my guess was right. A low spam probability means that IF you see a spammy word, the weight of that word is considered high. Explain only shows the occurrences of the words; it does not show you the magic math on a per-word basis combined with the probabilities of both categories.
[/upd]

So you'd recommend training more spam to fix the problem?
xiffy

Postby xiffy » Thu Jan 25, 2007 2:29 am

Either that, or add 'con' to the ignore words list; that should solve the problem as well. (You can test this with the log: add the word to the ignore words list and run explain again. It will show the actual spam/ham level.)
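A sketch of what an ignore list does, assuming a simple lowercase whitespace tokenizer (the plugin's real tokenizer may split text differently): ignored words are dropped before scoring, so 'con' would no longer contribute to either category. `filter_tokens` is a hypothetical helper, not a plugin function.

```python
def filter_tokens(text: str, ignore_words: list[str]) -> list[str]:
    """Tokenize on whitespace and drop any token on the ignore list."""
    ignore = {word.lower() for word in ignore_words}
    return [token for token in text.lower().split() if token not in ignore]


tokens = filter_tokens("con selka wtb pls trumpd 2dex 4str -4con", ["con"])
print(tokens)  # ['selka', 'wtb', 'pls', 'trumpd', '2dex', '4str', '-4con']
```

Only exact matches are removed in this sketch: '-4con' survives because it is a different token.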
MacFrog

Postby MacFrog » Thu Jan 25, 2007 6:02 am

xiffy wrote:Either that, or add 'con' to the ignore words list; that should solve the problem as well. (You can test this with the log: add the word to the ignore words list and run explain again. It will show the actual spam/ham level.)


Out of curiosity, what are your spam and ham scores?
xiffy

Postby xiffy » Thu Jan 25, 2007 10:33 am

category   probability        wordcount
ham        0.59953037418573   94980
spam       0.40046962581427   63444
fishy
Posts: 21
Joined: Tue Nov 21, 2006 7:35 pm
Location: Beijing, China

SQL error during installation

Postby fishy » Thu Jan 25, 2007 3:56 pm

Hi, I'm on DreamHost, and when I install SpamBayes 1.1.0 it says:

mySQL error with query CREATE TABLE IF NOT EXISTS nucleus_nucleus_plug_sb_wf (word varchar(250) NOT NULL default '', catcode varchar(250) NOT NULL default '', wordcount bigint(20) NOT NULL default '0', PRIMARY KEY (word, catcode)): Specified key was too long; max key length is 1000 bytes

So this table wasn't created, and the plugin certainly won't work without it (right?)

I've installed an older version (1.0.x, I'm not sure about the exact version number) and it didn't have this problem.

Any suggestions?
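The arithmetic behind the error, as a hedged guess: the primary key spans two varchar(250) columns, 500 characters in total. If the table is created with a multibyte character set such as utf8, MySQL reserves up to 3 bytes per character for the key, giving 1500 bytes, which exceeds the 1000-byte limit in the message; with a single-byte set such as latin1 it would be 500 bytes. The host's default character set is an assumption, not something confirmed in the thread, and the sketch ignores the small per-column length prefix.

```python
MAX_KEY_BYTES = 1000  # limit quoted in the MySQL error message

# Worst-case bytes per character MySQL reserves for index keys.
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3}


def key_bytes(varchar_lengths: list[int], charset: str) -> int:
    """Approximate key size, ignoring the per-column length prefix."""
    return sum(varchar_lengths) * BYTES_PER_CHAR[charset]


def max_fitting_length(columns: int, charset: str) -> int:
    """Largest varchar(N), shared by every key column, that stays within the limit."""
    return MAX_KEY_BYTES // (columns * BYTES_PER_CHAR[charset])


for charset in ("latin1", "utf8"):
    size = key_bytes([250, 250], charset)  # PRIMARY KEY (word, catcode)
    verdict = "fits" if size <= MAX_KEY_BYTES else "too long"
    print(f"{charset}: {size} bytes, {verdict}")

print(max_fitting_length(2, "utf8"))  # 166
```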
