Whoosh Igo 0.7 | Coderz Product

whoosh-igo 0.7

Description:

whoosh-igo 0.7

About
Tokenizers for the Whoosh full-text search library, designed for Japanese text.
This package contains three tokenizers:

IgoTokenizer

requires igo-python (http://pypi.python.org/pypi/igo-python/) and its dictionary.

TinySegmenterTokenizer

requires TinySegmenter in Python (https://code.google.com/p/mhagiwara/source/browse/trunk/nltk/jpbook/tinysegmenter.py).

MeCabTokenizer

requires the MeCab Python binding (http://mecab.sourceforge.net/bindings.html).


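All three tokenizers do essentially the same job: wrap a segmenter and emit tokens annotated with character offsets (the offset fixes in 0.5 and 0.6 of the changelog concern exactly this bookkeeping). A minimal, library-independent sketch of the idea, with an illustrative `tokenize_with_offsets` helper and a toy whitespace segmenter standing in for Igo/TinySegmenter/MeCab (neither name is part of this package's API):

```python
def tokenize_with_offsets(segments, text):
    """Yield (token, start, end) for each segment, tracking char offsets.

    Hypothetical helper for illustration only: given the surface forms a
    segmenter produced, locate each one in the original text so downstream
    consumers (e.g. highlighters) know where every token came from.
    """
    pos = 0
    for seg in segments:
        start = text.index(seg, pos)  # find this segment at or after pos
        end = start + len(seg)
        yield seg, start, end
        pos = end  # never match earlier text again

# A toy whitespace "segmenter" stands in for a real morphological analyzer.
text = "full text search"
tokens = list(tokenize_with_offsets(text.split(), text))
print(tokens)  # [('full', 0, 4), ('text', 5, 9), ('search', 10, 16)]
```

Real Whoosh tokenizers yield `Token` objects rather than tuples, but the offset arithmetic is the same.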
How To Use
IgoTokenizer:

import igo.Tagger
import whooshjp
from whooshjp.IgoTokenizer import IgoTokenizer
from whoosh.fields import Schema, TEXT, ID

tk = IgoTokenizer(igo.Tagger.Tagger('ipadic'))
scm = Schema(title=TEXT(stored=True, analyzer=tk), path=ID(unique=True, stored=True), content=TEXT(analyzer=tk))
TinySegmenterTokenizer:

import tinysegmenter
import whooshjp
from whooshjp.TinySegmenterTokenizer import TinySegmenterTokenizer
from whoosh.fields import Schema, TEXT, ID

tk = TinySegmenterTokenizer(tinysegmenter.TinySegmenter())
scm = Schema(title=TEXT(stored=True, analyzer=tk), path=ID(unique=True, stored=True), content=TEXT(analyzer=tk))
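The changelog below also mentions a FeatureFilter (added in 0.3). Its exact API isn't shown on this page, but the underlying idea, dropping morphemes by the part-of-speech feature string that analyzers like Igo or MeCab attach to each token, can be sketched with a hypothetical `feature_filter` helper:

```python
def feature_filter(morphemes, stop_features=("助詞", "助動詞")):
    """Drop morphemes whose feature string starts with a stopped POS.

    Illustrative only, not the package's FeatureFilter API. `morphemes` is
    an iterable of (surface, feature) pairs such as ("検索", "名詞,サ変接続");
    particles (助詞) and auxiliary verbs (助動詞) are filtered out by default.
    """
    for surface, feature in morphemes:
        if not feature.startswith(stop_features):
            yield surface

# Example morphemes as a Japanese analyzer might emit them.
morphemes = [
    ("全文", "名詞,一般"),
    ("検索", "名詞,サ変接続"),
    ("の", "助詞,連体化"),
    ("ライブラリ", "名詞,一般"),
]
print(list(feature_filter(morphemes)))  # ['全文', '検索', 'ライブラリ']
```

In Whoosh terms such a filter would sit after the tokenizer in the analyzer chain, keeping noise words out of the index.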


Changelog for Japanese Tokenizers for Whoosh

2011-02-19 – 0.1

first release.


2011-02-21 – 0.2

add TinySegmenterTokenizer
change module name


2011-02-24 – 0.3

add FeatureFilter


2011-02-27 – 0.4

add MeCabTokenizer
add a mode that avoids pickling the igo tagger, to minimize index size


2011-04-17 – 0.5

correct char offsets


2011-04-17 – 0.6

correct char offsets (TinySegmenterTokenizer)


2012-04-14 – 0.7

rename package (WhooshJapaneseTokenizer to whooshjp)
no longer import submodules automatically
Python 3 compatibility (3.2, 3.3)
drop Python 2.5 support

License:

For personal and professional use. You cannot resell or redistribute these repositories in their original state.

