I'm new to Python, NLTK and NLP. I have written a simple grammar, but when I run the program it gives the error below. Please help me solve this error.
Grammar:-
S -> NP
NP -> PN|PRO|D[NUM=?n] N[NUM=?n]|D[NUM=?n] A N[NUM=?n]|D[NUM=?n] N[NUM=?n] PP|QP N[NUM=?n]|A N[NUM=?n]|D[NUM=?n] NOM PP|D[NUM=?n] NOM
PP -> P NP
D[NUM=sg] -> 'a'
D -> 'the'
N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair'
N[NUM=pl] -> 'dogs'|'cats'
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'
NOM -> A NOM|N[NUM=?n]
Code:-
import nltk
grammar = nltk.data.load('file:english_grammer.cfg')
rdparser = nltk.RecursiveDescentParser(grammar)
sent = "a dogs".split()
trees = rdparser.parse(sent)
for tree in trees: print (tree)
Error:-
ValueError: Expected a nonterminal, found: [NUM=?n] N[NUM=?n]|D[NUM=?n] A N[NUM=?n]|D[NUM=?n] N[NUM=?n] PP|QP N[NUM=?n]|A N[NUM=?n]|D[NUM=?n] NOM PP|D[NUM=?n] NOM
I don't think NLTK's plain CFG grammar reader can parse non-terminals written with square brackets, as in your grammar.
First let's try a CFG grammar without the square brackets:
from nltk.grammar import CFG
grammar_string = '''
S -> NP
PP -> P NP
D -> 'the'
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'
'''
grammar = CFG.fromstring(grammar_string)
print(grammar)
[out]:
Grammar with 18 productions (start state = S)
S -> NP
PP -> P NP
D -> 'the'
PN -> 'saumya'
PN -> 'dinesh'
PRO -> 'she'
PRO -> 'he'
PRO -> 'we'
A -> 'tall'
A -> 'naughty'
A -> 'long'
A -> 'three'
A -> 'black'
P -> 'with'
P -> 'in'
P -> 'from'
P -> 'at'
QP -> 'some'
Now let's put the square brackets in:
from nltk.grammar import CFG
grammar_string = '''
S -> NP
PP -> P NP
D -> 'the'
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'
N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair'
N[NUM=pl] -> 'dogs'|'cats'
'''
grammar = CFG.fromstring(grammar_string)
print(grammar)
[out]:
Traceback (most recent call last):
File "test.py", line 33, in <module>
grammar = CFG.fromstring(grammar_string)
File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 519, in fromstring
encoding=encoding)
File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 1273, in read_grammar
(linenum+1, line, e))
ValueError: Unable to parse line 10: N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair'
Expected an arrow
Going back to your grammar, it seems you're using the square brackets to mark non-terminals as constrained or unconstrained, so one solution would be to:
use an underscore suffix for the constrained non-terminals, and
add a rule that expands each unconstrained non-terminal into its constrained variants.
So your cfg rules will look as such:
from nltk.parse import RecursiveDescentParser
from nltk.grammar import CFG
grammar_string = '''
S -> NP
NP -> PN | PRO | D N | D A N | D N PP | QP N | A N | D NOM PP | D NOM
PP -> P NP
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'
D -> D_def | D_sg
D_def -> 'the'
D_sg -> 'a'
N -> N_sg | N_pl
N_sg -> 'boy'|'girl'|'room'|'garden'|'hair'
N_pl -> 'dogs'|'cats'
'''
grammar = CFG.fromstring(grammar_string)
rdparser = RecursiveDescentParser(grammar)
sent = "a dogs".split()
trees = rdparser.parse(sent)
for tree in trees:
    print(tree)
[out]:
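(S (NP (D (D_sg a)) (N (N_pl dogs))))
Note that this plain CFG still accepts "a dogs": the underscore names only label the categories, and rules such as NP -> D N do not enforce number agreement.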
It looks like you're trying to use NLTK's feature grammars, which do use the square bracket syntax to denote features and feature agreement. The NLTK parser for feature grammars is FeatureEarleyChartParser (as opposed to RecursiveDescentParser), and the grammar has to be read as a FeatureGrammar rather than a CFG.
From the NLTK documentation:
>>> from __future__ import print_function
>>> import nltk
>>> from nltk import grammar, parse
>>> g = """
... % start DP
... DP[AGR=?a] -> D[AGR=?a] N[AGR=?a]
... D[AGR=[NUM='sg', PERS=3]] -> 'this' | 'that'
... D[AGR=[NUM='pl', PERS=3]] -> 'these' | 'those'
... D[AGR=[NUM='pl', PERS=1]] -> 'we'
... D[AGR=[PERS=2]] -> 'you'
... N[AGR=[NUM='sg', GND='m']] -> 'boy'
... N[AGR=[NUM='pl', GND='m']] -> 'boys'
... N[AGR=[NUM='sg', GND='f']] -> 'girl'
... N[AGR=[NUM='pl', GND='f']] -> 'girls'
... N[AGR=[NUM='sg']] -> 'student'
... N[AGR=[NUM='pl']] -> 'students'
... """
>>> grammar = grammar.FeatureGrammar.fromstring(g)
>>> tokens = 'these girls'.split()
>>> parser = parse.FeatureEarleyChartParser(grammar)
>>> trees = parser.parse(tokens)
>>> for tree in trees: print(tree)
(DP[AGR=[GND='f', NUM='pl', PERS=3]]
  (D[AGR=[NUM='pl', PERS=3]] these)
  (N[AGR=[GND='f', NUM='pl']] girls))
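To make the idea of feature agreement a bit more concrete, here is a minimal pure-Python sketch of what a shared variable such as ?n amounts to: the features of the determiner and the noun must unify without a clash. This is only an illustration under simplifying assumptions (flat feature dictionaries, a tiny hand-written lexicon); the names unify, LEXICON and agrees are made up for this example and are not part of NLTK, whose own unification is more general.

```python
def unify(a, b):
    """Merge two flat feature dicts; return None if any feature clashes."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None  # e.g. NUM=sg against NUM=pl
        merged[key] = value
    return merged


# Hypothetical lexicon mirroring a few entries of the grammar in the question.
LEXICON = {
    "a": {"NUM": "sg"},
    "the": {},  # no NUM feature, so it is compatible with sg and pl
    "boy": {"NUM": "sg"},
    "dogs": {"NUM": "pl"},
}


def agrees(determiner, noun):
    """True if D[NUM=?n] N[NUM=?n] could be satisfied by this word pair."""
    return unify(LEXICON[determiner], LEXICON[noun]) is not None


for det, noun in [("a", "boy"), ("the", "dogs"), ("a", "dogs")]:
    print(det, noun, agrees(det, noun))
```

With a real feature grammar the parser does this check for you, which is why "a dogs" should be rejected there while the underscore-based CFG accepts it.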
Store the grammar in a file with the .fcfg extension and use load_parser from the nltk package.
e.g. english_grammer.fcfg
I used the following code to load it:
import nltk
from nltk import load_parser
chart = load_parser('file:english_grammer.fcfg')
sent = 'the girl gave the dog a bone'.split()
trees = chart.parse(sent)  # NLTK 3; older NLTK 2 releases used chart.nbest_parse(sent)
for tree in trees: print(tree)
That solved the issue for me.
I am trying to use the gnuplot package for Haskell (https://hackage.haskell.org/package/gnuplot) to build a 4D plot as described here (4D plot with gnuplot). But I can't figure out how to set the appropriate 3D graph type.
My problem is to draw a function like A = f(x,y,z), where A should be encoded by the color.
After a few days I found a solution that suits my purpose. Maybe someone will find it useful:
module PrintToGraph where
import qualified Graphics.Gnuplot.Advanced as GP
import qualified Graphics.Gnuplot.Frame as Frame
import qualified Graphics.Gnuplot.Frame.OptionSet as OptsSet
import qualified Graphics.Gnuplot.Plot.ThreeDimensional as Plot3D
import qualified Graphics.Gnuplot.Graph.ThreeDimensional as Graph3D
import qualified Graphics.Gnuplot.LineSpecification as LineSpec
import GHC.Exts (groupWith )
import qualified Graphics.Gnuplot.Value.Atom as Atom
import Graphics.Gnuplot.ColorSpecification ( paletteFrac )
import Data.Foldable ( Foldable(foldMap') )
import Data.List ( elemIndex )
import Data.Maybe ( fromJust )
defltOpts :: OptsSet.T (Graph3D.T Double Double Double)
defltOpts = OptsSet.key False OptsSet.deflt
waveFuncVis :: (Double -> (Double, Double, Double) -> Double) -> Double -> Double -> Frame.T (Graph3D.T Double Double Double)
waveFuncVis func depth precision =
  let x = Plot3D.linearScale 100 (-10, 10)
      testedRange = (groupWith (\(x,y,z) -> test func (x,y,z) depth precision) . filter (\(x,y,z) -> funcWrapper func x y z^2 >= precision)) [(x1,y1,z1) | x1<-x, y1<-x, z1<-x]
      range = [(x1,y1,z1) | x1<-x, y1<-x, z1<-x]
      calcColor :: [(Double,Double,Double)] -> Double
      calcColor array = fromIntegral (fromJust (elemIndex array testedRange)) / fromIntegral (length testedRange)
      linespec array = Graph3D.lineSpec $ LineSpec.lineColor (paletteFrac (calcColor array)) LineSpec.deflt
      graph array = linespec array <$> Plot3D.cloud Graph3D.points array
  in Frame.cons defltOpts $ foldMap' graph testedRange
test :: (Double -> (Double, Double, Double) -> Double)
     -> (Double, Double, Double) -> Double -> Double -> Integer
test func (x, y , z) depth precision
  | funcWrapper func x y z^2 >= precision = round $ funcWrapper func x y z^2 * depth
  | otherwise = 0
funcWrapper :: (Double -> (Double, Double, Double) -> Double) -> Double -> Double -> Double -> Double
funcWrapper func x' y' z' = func 1.0 (toR x' y' z', toTau x' y' z', toPhi x' y' z')
--2pz Hydrogen function
waveHfunc2pz :: Double -> (Double, Double, Double) -> Double
waveHfunc2pz z (r, tau, phi) = a * b * c* e
  where a,b,c,e :: Double
        a = 1.0/(4.0*sqrt (2.0*pi))
        b = (z/aBohr)**2.5
        c = pureTrig cos tau
        e = r*exp(-1.0 * (z*r/(2.0*aBohr)))
main :: IO ()
main = sequence_ [GP.plotDefault (waveFuncVis waveHfunc2pz 10000 0.0005)]
Briefly:
We throw away the points where the squared function value is less than precision. (I use filter in testedRange for this purpose.)
Thanks to groupWith we get a list of coordinate lists, [[(x,y,z)]]. Each sublist contains the coordinates that give the same function value (as scaled and rounded by test).
To colorize them, we convert each sublist's index to a Double and use it as the argument to paletteFrac.
As a result we get a cloud of colored dots, where each color corresponds to one function value.
Example picture for 2pz hydrogen atom.
I have provided some code below that demonstrates the basic concept of a project. I have modules that are set up as interfaces; I implement the interfaces to build modules. In the example below, I built an Alpha.
type Ticker = String
type Shares = Int
type Price = Float
data Insight = Down | Flat | Up deriving (Show, Eq, Ord)
type Target = Float
data Universe = Universe {generateUniverse :: [(Ticker, Price)] -> [(Ticker, Price)]}
data Alpha = Alpha {generateInsights :: [(Ticker, Price)] -> [(Ticker, Insight)]}
data Portfolio = Portfolio {generateTargets :: [(Ticker, Insight)] -> [(Ticker, Target)]}
data Execution = Execution {generateOrders :: [(Ticker, Price)] -> [(Ticker, Target)] -> [(Ticker, Shares)]}
convert :: (Ticker, Price) -> (Ticker, Insight)
convert (t, p)
  | p < 500 = (t, Down)
  | p == 500 = (t, Flat)
  | p > 500 = (t, Up)
split :: [(Ticker, Price)] -> [(Ticker, Insight)]
split xs = foldr (\tp acc -> (convert tp):acc) [] xs
splitAlpha :: Alpha
splitAlpha = Alpha {
  generateInsights = split
}
main :: IO ()
main = do
  let
    alpha = splitAlpha
  print (generateInsights alpha [("TSLA", 500.0), ("RKT", 10.0), ("AMC", 750)])
How can I compress my definition of splitAlpha so that there is not as much nesting in the definition of generateInsights? I have attempted the example below...
convert :: (Ticker, Price) -> (Ticker, Insight)
convert (t, p)
  | p < 500 = (t, Down)
  | p == 500 = (t, Flat)
  | p > 500 = (t, Up)
splitAlpha :: Alpha
splitAlpha = Alpha {
  generateInsights xs = foldr (\tp acc -> (convert tp):acc) [] xs
}
and received this error:
ghci> :cmd return $ unlines [":l itk", ":main"]
[1 of 1] Compiling Main ( itk.hs, interpreted )
itk.hs:23:20: error: parse error on input `xs'
|
23 | generateInsights xs = foldr (\tp acc -> (convert tp):acc) [] xs
| ^^
Failed, no modules loaded.
<interactive>:60:53: error:
* Variable not in scope: main :: IO a0
* Perhaps you meant `min' (imported from Prelude)
You can work with a lambda expression, so:
splitAlpha :: Alpha
splitAlpha = Alpha {
  generateInsights = \xs -> foldr (\tp acc -> (convert tp):acc) [] xs
}
In this specific case however, this is just a mapping function, so you can work with:
splitAlpha :: Alpha
splitAlpha = Alpha {
  generateInsights = map convert
}
As Willem Van Onsem wrote, this example is super easy because the whole thing boils down to generateInsights = map convert. But more generally, it wouldn't be so easy. Plain lambda syntax only works for single-clause functions with no guards:
splitAlpha = Alpha
  { generateInsights = \xs -> ...
  }
More generally, you can always use let to get a proper definition scope in which you can define any function locally with the full syntax available, without polluting any other namespace:
splitAlpha = Alpha
  { generateInsights
      = let gi xs = foldr (\tp acc -> (convert tp):acc) [] xs
        in gi
  }
I'm searching for a pattern in a list of string elements.
So far my code runs fine, but for some data it does not produce the required result.
Code
ss = '''
X A
B A
A C
A D
E A
A F
'''.strip()
lst = []
for r in ss.split('\n'):
    lst.append(r.split())
paths = []
for e in lst:
    # each row in source data
    pnew = []  # new path
    for p in paths:
        if e[0] in p:  # if start in existing path
            if p.index(e[0]) == len(p)-1:  # if end of path
                p.append(e[1])  # add to path
            else:
                pnew.append(p[:p.index(e[0])+1]+[e[1]])  # copy path then add
            break
    else:  # loop completed, not found
        paths.append(list(e))  # create new path
    if len(pnew):  # copied path
        paths.extend(pnew)  # add copied path
print('\n'.join([' -> '.join(e) for e in paths]))
What I'm getting is:
X -> A -> C
B -> A
X -> A -> D
E -> A
X -> A -> F
What my required result is:
B -> A -> C
X -> A -> D
E -> A -> F
X -> A -> C
B -> A -> D
B -> A -> F
X -> A -> F
I'm trying to get the pattern based on Cr & Dr (Cr & Dr are optional):
X A Cr
B A Cr
A C Dr
A D Dr
E A Cr
A F Dr
It's easier to handle this with pandas:
import pandas as pd
from io import StringIO
ss = '''
X A
B A
A C
A D
E A
A F
'''.strip()
df = pd.read_csv(StringIO(ss), sep=' ', names=['source', 'target'])
df = df.merge(df, how='inner', left_on='target', right_on='source')
df = df[['source_x', 'target_x', 'target_y']]
df.apply(lambda x: ' -> '.join(x), axis=1).sort_values()
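For readers who want to see what the self-merge above does without pandas, the same join can be sketched in plain Python: index every edge by its source, then extend each edge with every edge that starts where it ends. This is only a sketch under the assumption that the goal is all three-node paths; the function name two_step_paths and the hard-coded edge list are made up for this example.

```python
from collections import defaultdict

# The same (source, target) rows as in the question.
edges = [("X", "A"), ("B", "A"), ("A", "C"), ("A", "D"), ("E", "A"), ("A", "F")]


def two_step_paths(edges):
    """Return every (source, middle, target) chain, like an inner self-join."""
    successors = defaultdict(list)
    for source, target in edges:
        successors[source].append(target)
    paths = []
    for source, middle in edges:
        # Join: an edge ending at `middle` meets every edge starting at `middle`.
        for target in successors.get(middle, []):
            paths.append((source, middle, target))
    return sorted(paths)


paths = two_step_paths(edges)
for path in paths:
    print(" -> ".join(path))
```

For this input the join pairs each of X, B and E with each of C, D and F, so it yields nine paths, two more than the seven listed in the question.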
I'd like to create my own workspace viewer. How can I get a list of workspaces and their corresponding window titles?
I can't seem to find any relevant function for getting these values in the documentation.
It can be done via the following (the main function being workspacesGrouped :: X [(WorkspaceId, [String])]):
import XMonad.Util.XUtils
import XMonad
import XMonad.Core
import XMonad.Config.Prime
import XMonad.Util.Font
import XMonad.StackSet as W
import FileLogger
import Control.Monad
import Data.List
import Foreign.C.String
workspacesGrouped :: X [(WorkspaceId, [String])]
workspacesGrouped = do
  ws <- gets windowset
  let x = map (W.workspace) (W.current ws : W.visible ws)
  let y = (W.hidden ws)
  sequence $ fmap (\v -> fmap ((,) $ W.tag v) (getWorkspaceWindowTitles v)) $ x ++ y
getWorkspaceWindowTitles :: Workspace i l Window -> X [String]
getWorkspaceWindowTitles w = do
  withDisplay $ \d ->
    (liftIO $ forM
      (integrate' $ stack w)
      (\z -> getWindowTitle z d)
    )
getWindowTitle :: Window -> Display -> IO String
getWindowTitle w d = getTextProperty d w wM_NAME >>= (peekCString . tp_value)
I have a file of about 8 MB in which each line has 6 ints separated by a space.
My current method for parsing this is:
tuplify6 :: [a] -> (a, a, a, a, a, a)
tuplify6 [l, m, n, o, p, q] = (l, m, n, o, p, q)
toInts :: String -> (Int, Int, Int, Int, Int, Int)
toInts line =
  tuplify6 $ map read stringNumbers
  where stringNumbers = split " " line
and mapping toInts over
liftM lines . readFile
which returns a list of tuples. However, when I run this, it takes nearly 25 seconds to load the file and parse it. Is there any way I can speed this up? The file is just plain text.
You can speed it up by using ByteStrings, e.g.
module Main (main) where
import System.Environment (getArgs)
import qualified Data.ByteString.Lazy.Char8 as C
import Data.Char
main :: IO ()
main = do
    args <- getArgs
    mapM_ doFile args
doFile :: FilePath -> IO ()
doFile file = do
    bs <- C.readFile file
    let tups = buildTups 0 [] $ C.dropWhile (not . isDigit) bs
    print (length tups)
buildTups :: Int -> [Int] -> C.ByteString -> [(Int,Int,Int,Int,Int,Int)]
-- acc is built back to front, so reverse it to keep the file's order
buildTups 6 acc bs = tuplify6 (reverse acc) : buildTups 0 [] bs
buildTups k acc bs
    | C.null bs = if k == 0 then [] else error ("Bad file format " ++ show k)
    | otherwise = case C.readInt bs of
                    Just (i,rm) -> buildTups (k+1) (i:acc) $ C.dropWhile (not . isDigit) rm
                    Nothing -> error ("No Int found: " ++ show (C.take 100 bs))
tuplify6:: [a] -> (a, a, a, a, a, a)
tuplify6 [l, m, n, o, p, q] = (l, m, n, o, p, q)
runs pretty fast:
$ time ./fileParse IntList
200000
real 0m0.119s
user 0m0.115s
sys 0m0.003s
for an 8.1 MiB file.
On the other hand, using Strings and your conversion (with a couple of seqs to force evaluation) also took only 0.66s, so the bulk of the time seems to be spent not on parsing, but on working with the result.
Oops, I missed a seq, so the reads were not actually evaluated for the String version. With that fixed, String + read takes about four seconds, and a bit over one second with the custom Int parser from @Rotsor's comment
foldl' (\a c -> 10*a + fromEnum c - fromEnum '0') 0
so parsing apparently did take a significant amount of the time.