Modify Weka stemmer for Persian text

I want to use Weka for text classification of Persian text, but I have a problem.
The tokenizer, stoplist and stemmer for Persian are different from those for English, so I need to use my own stemmer, tokenizer and stoplist. In Weka's interface there is a way to use my own stoplist, but there is no way to change the stemmer and tokenizer.
Is there any way to change them without modifying Weka's source code?
I am new to Java and I don't know how I should modify Weka's source code.

I found my answer: it is impossible to do this without modifying Weka's source code, so I was forced to modify it.
I had a lot of trouble doing so because I am new to Java, so here are brief steps for modifying Weka's code, to help others.
First, set the Java environment variables as described at this link:
http://www.ntu.edu.sg/home/ehchua/programming/howto/Environment_Variables.html
Then install Ant, which is available at this link:
http://ant.apache.org/bindownload.cgi
Finally, watch this video to see how to modify Weka's code:
http://www.youtube.com/watch?v=buCpG7uV_v4

Related

Do I need to add updated phoneme sequences of words to the .dict file while adapting an AM using cmusphinx?

I am trying to adapt the en-us acoustic model with Indian English accent recordings. Since many words are pronounced differently in this accent, do I need to add the updated phoneme representations of those words? I am currently following this link: https://cmusphinx.github.io/wiki/tutorialadapt/#accumulating-observation-counts and it says nothing about updating the .dict file.
PS: Should I add new words directly to the dictionary?
There is an Indian English model in the downloads; you should use it instead. It comes with an Indian English dictionary.

Using Stanford Tregex in Python for German text

The accepted answer to "Using Stanford Tregex in Python" almost solves my problem, but I don't know how to set the language to German. Can anybody help me?
It should work if, when you start the server, you add -serverProperties StanfordCoreNLP-german.properties. Let me know if that doesn't work. Also make sure you have the German models jar on your CLASSPATH.
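As a rough sketch, the server start command with the German properties could be assembled from Python like this. The install directory, port and memory setting are assumptions for illustration, and build_server_command is a hypothetical helper name; only the -serverProperties flag and the properties file name come from the answer above.

```python
import os


def build_server_command(corenlp_dir, port=9000, memory="4g"):
    """Return the argument list for starting a CoreNLP server with German defaults."""
    # The wildcard puts every jar in the directory on the classpath,
    # which is where the German models jar is assumed to live.
    classpath = os.path.join(corenlp_dir, "*")
    return [
        "java", f"-mx{memory}", "-cp", classpath,
        "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
        "-serverProperties", "StanfordCoreNLP-german.properties",
        "-port", str(port),
    ]


if __name__ == "__main__":
    # Hypothetical install location; adjust to wherever CoreNLP is unpacked.
    print(" ".join(build_server_command("/opt/stanford-corenlp")))
```

The returned list can be passed to subprocess.Popen to launch the server before sending Tregex requests to it.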

Work on a defined slide with pptx python

I'm currently working on a way to generate a PowerPoint presentation from an Excel file. I decided to use Python because python-pptx allows working on .pptx files.
I have to use a standard file to which I will add some shapes and text, but only within the first slide.
I've read the python-pptx documentation but I didn't find a way to work on a given slide (the first slide of my standard file). I only found a way to add a slide and work on it.
Can someone explain how to do it?
If you don't understand my problem, tell me and I will try to rephrase it.
Thanks
N.B.: I'm French. I didn't find any French documentation on the web, so I had to search the English documentation. It's possible that I missed something about my problem when I read it. Sorry if you find it easily. I'm still working on my English :D
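Is this what you are looking for?
from pptx import Presentation

# open the existing file instead of creating a new presentation
prs = Presentation('existing-prs-file.pptx')
first_slide = prs.slides[0]  # slides are indexed from 0
# do something with the content of the slide
prs.save('new-file-name.pptx')
By the way, I just copied this from the docs.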

text to phonemes converter

I'm searching for a tool that converts text to phonemes (like text-to-speech software does).
I could program one myself, but it would not be free of errors and would take a lot of time!
So my question is:
is there a simple tool for converting e.g.
"hello" to "HH AH0 L OW1"
maybe some command-line tool, so I can capture the stdout?
I'm looking for the phonemes in 'Arpabet' style (see the 'hello' example).
espeak does something like that, but the output is not in Arpabet style and the phonemes are not separated by a delimiter.
If you had searched for Arpabet on the wiki you would have found your answer. The CMU guys have prepared scripts which convert most English words to their respective Arpabet phonetic breakdown.
If you want the phone sequence of a couple of words you can use their interface here. But if you want it for a big file, you might have to run their scripts on your own. They used to have a working page here, but it does not seem to be working now.
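For a big file, one simple approach is a plain lookup in a pronunciation dictionary. The sketch below is not the CMU scripts themselves; it assumes a dictionary in the CMUdict plain-text layout (word, whitespace, phonemes, with variants marked like "(1)") and uses a tiny inline sample in place of the real file. The function names are hypothetical.

```python
import re

# Tiny sample in the CMUdict plain-text layout: WORD, whitespace, phonemes.
# A real run would read the full dictionary file instead of this string.
SAMPLE_DICT = """\
;;; comment lines start with three semicolons
HELLO  HH AH0 L OW1
HELLO(1)  HH EH0 L OW1
WORLD  W ER1 L D
"""


def load_pronunciations(text):
    """Map each lowercase word to a list of its phoneme sequences."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue
        head, *phones = line.split()
        # Drop variant markers such as "(1)" so all variants share one key.
        word = re.sub(r"\(\d+\)$", "", head).lower()
        entries.setdefault(word, []).append(phones)
    return entries


def to_arpabet(sentence, entries):
    """Return the first listed pronunciation per word; unknown words map to None."""
    result = []
    for word in re.findall(r"[a-z']+", sentence.lower()):
        variants = entries.get(word)
        result.append(" ".join(variants[0]) if variants else None)
    return result


if __name__ == "__main__":
    lexicon = load_pronunciations(SAMPLE_DICT)
    print(to_arpabet("hello", lexicon))  # prints ['HH AH0 L OW1']
```

Words missing from the dictionary come back as None, so they can be collected and handled separately.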

Is there a b5paper Japanese style in LaTeX?

My thesis is written with the b5j document class option:
\documentclass[b5j,twoside,12pt]{report}
I have a paper that is appended at the end. However, it is written as an article with the b5paper option:
\documentclass[12pt,b5paper,twoside]{article}
How do I get the paper to follow the Japanese style? I haven't found any b5paperj option in the geometry package. :-/
You can build the paper that must be appended separately and include the resulting PDF in your document using the pdfpages package, for example with \includepdf[pages=-]{paper.pdf}, where paper.pdf is a placeholder file name. This way you don't have to reconcile both styles, and the package provides enough flexibility to make it look the way you want.

Resources