Structure output of 'systemctl -h' command into a list and/or dictionary - python-3.x

Before I spend hours trying to code this I thought I'd ask the experts to see if anyone has already accomplished this. I've tried to do some searching but to be honest I'm not sure how to search for what I'm looking for. So, I'll do my best to describe it here and maybe one of you can either explain a way to search for a solution to my problem or possibly even provide a solution!
I wish to gather the output of 'systemctl -h' and parse it into a Python3 list of dictionaries; with each list entry being a dictionary of the possible options listed in the Help output.
What is interesting/hard about this desired output is that there are a lot of caveats that need to be considered:
Some lines in the Help output aren't options (like the first line and blank lines).
Some lines are "headings" for a group of commands. <-- Using this as the 'section' keyword would be a "nice to have".
Some of the options have a short form ("-h") in addition to the long form ("--help"), while other commands don't follow that format at all.
An option is separated from its values/description by spacing or a newline: sometimes a single space, sometimes multiple spaces, and sometimes a newline followed by more spaces.
I think the resulting Python list of dictionaries should have something like this as its structure:
help_output = [
    {'section': '<section name>',
     'options': {
         'shortcode': '<-h>',
         'longcode': '<--help or list-unit-files>',
         'description': '<blah>'
     }
    },
]
Alas, I lack the scripting foo to deal with all these caveats in a "clean" way. So, instead of me hacking together some non-Pythonic garbage I would like to get some input from you all.
Thanks for your time and I hope you find my question/challenge worthy of answering. :-)
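One possible approach, sketched below, is to treat unindented lines ending in ':' as section headings, split indented lines on runs of two or more spaces, and fold any leftover lines into the previous description. The field names follow the structure above; the heuristics are assumptions about the usual systemctl help layout and would need tuning against real output.

import re
import subprocess

def parse_systemctl_help():
    text = subprocess.run(['systemctl', '-h'],
                          capture_output=True, text=True).stdout
    result, current = [], None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # Heading lines like "Unit Commands:" sit at column 0 and end with a colon.
        if stripped.endswith(':') and not line.startswith('  '):
            current = {'section': stripped.rstrip(':'), 'options': []}
            result.append(current)
            continue
        if current is None:
            continue  # usage/preamble lines before the first heading
        # Option lines separate the name(s) from the description with 2+ spaces.
        parts = re.split(r'\s{2,}', stripped, maxsplit=1)
        if len(parts) == 2:
            names, desc = parts
            short = next((n for n in names.split() if re.fullmatch(r'-\w', n)), '')
            longs = [n for n in names.split() if n != short]
            current['options'].append({'shortcode': short,
                                       'longcode': longs[0] if longs else '',
                                       'description': desc})
        elif current['options']:
            # Continuation line: glue it onto the previous description.
            current['options'][-1]['description'] += ' ' + stripped
    return result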

Related

Way to find a number at the end of a string in Smalltalk

I have different commands my program is reading in (i.e., print, count, min, max, etc.). These words can also include a number at the end of them (i.e., print3, count1, min2, max6, etc.). I'm trying to figure out a way to extract the command and the number so that I can use both in my code.
I'm struggling to figure out a way to find the last element in the string in order to extract it, in Smalltalk.
You didn't say which incarnation of Smalltalk you use, so I will explain what I would do in Pharo, which is the one I'm familiar with.
As someone who has been playing with Pharo for a few months at most, I can tell you the sheer number of classes and methods available can feel overpowering at first, but the environment actually makes it easy to find things. For example, when you know the exact input and output you want, but don't know whether a method already exists somewhere, or what its name is, the Finder lets you search by giving an example. You can open it from the World menu, as shown below:
By default it seeks selectors (method names) matching your input terms:
But this default is not what we need right now, so you must change the option in the upper right box to "Examples", and type in the search field an example of the input, followed by the output you want, separated by a ".". The input example I used was the string 'max6', followed by the desired result, the number 6. Pharo then gives a list of methods that match:
To find what would return the text part, you can run a new search, changing the example output from the number 6 to the string 'max':
Fortunately, there are several built-in methods matching the description of your problem.
There are more elegant ways, I suppose, but you can make use of the fact that String>>#asNumber only parses the part it can recognize. So you can do
'print31' reversed asNumber asString reversed asNumber
to give you 31. That only works if there actually is a number at the end.
This is one of those cases where we can presume the input data has a specific form, i.e., the only numbers appear at the end of the string, and you want all of those numbers. In that case it's not too hard to do, really, just:
numText := 'Kalahari78' select: [ :each | each isDigit ].
num := numText asInteger. "78"
To get the rest of the string without the digits, you can just use this:
'Kalahari78' withoutTrailingDigits. "Kalahari"
As some of the Pharo "OGs" pointed out, you can take a look at the String class (just type CMD-Return, type in String, hit Return) and you will find an amazing number of methods for all kinds of things. Usually you can get some ideas from those. But then there are times when you really just need an answer!

Python 3 Tips to Shorten Code for Assignment and Getting Around TextIO

I've been going through a course and trying to find ways to shorten my code. I had an assignment to open a text file, split it, add all of the unique values to a list, and finally sort it. I passed the assignment, but I have been trying to shorten it to learn ways to apply shortening concepts to future code. The main issue I keep running into is trying to turn the opened file into strings, and then into lists to append to and such, without read(). If I don't use read() I get back TextIO errors. I tried looking into it, but what I found involved importing os and doing some other funky stuff, which seems like it would take more time.
So if anyone would mind giving me tips to more effectively code this that are beginner friendly I would be appreciative.
romeo = open('romeo').read()
mylist = list()
for line in romeo.split():
    if line not in mylist:
        mylist.append(line)
mylist.sort()
print(mylist)
I saw that set() is pretty good for unique values, but then I don't think I can sort it, and trying to flip-flop between a list and a set seems wacky. I tried those swanky one-line for loop boys, but couldn't get them to work, like for line not in mylist: mylist.append(line). I know that's not how to do it or even close, but I don't know how to convey what I mean.
So to reiterate:
1. How do I get the same result without read() / get around TextIO?
2. How do I write this code in a more streamlined way?
I'm new to the site and coding, so hopefully I didn't trigger anyone.
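For what it's worth, here is a minimal sketch of both points, assuming the file is called 'romeo' as in the original: iterating over the file object line by line avoids read() entirely, a set handles the uniqueness, and sorted() accepts any iterable (including a set) and returns a sorted list.

unique_words = set()
with open('romeo') as handle:
    for line in handle:                  # iterating the file object avoids read()
        unique_words.update(line.split())
print(sorted(unique_words))              # sorted() turns the set into a sorted list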

Arbitrary lookaheads in PLY

I am trying to parse a config, which would translate to a structured form. This new form requires that comments within the original config be preserved. The parsing tool is PLY. I am running into an issue with my current approach, which I will describe in detail below, with links to code as well. The config file contains multiple config blocks, each of which is of the following format
<optional comments>
start_of_line request_stmts(one or more)
indent reply_stmts (zero or more)
include_stmts (type 3)(zero or more)
An example config file looks like this.
While I am able to partially parse the config file with the grammar below, I fail to accommodate comments which exist within a block.
For example, a block like this raises syntax errors, and any comments in a block of config fail to parse.
<optional comments>
start_of_line request_stmts(type 1)(one or more)
indent reply_stmts (type 2)(one or more)
<comments>
include_stmts (type 3)(one or more)(optional)
The parser.out mentions one shift/reduce conflict, which I think arises because once the reply_stmts are parsed, a comments section which follows could either mark the start of a new block or be comments within the subblock. The current grammar's parsing result for the example file:
[['# test comment ', '# more of this', '# does this make sense'], 'DEFAULT',
 [['x', '=', 'y']], [['y', '=', '1']],
 ['# Transmode', '# maybe something else', '# comment'],
 '/random/location/test.user']
As you might notice, the second config block completely misses the username, request_stmt, and reply_stmt sections.
What I have tried
I have tried moving the comments section around in the grammar, by specifying it before specific blocks or in the statement grammar. In the code link pasted above, the comments section has been specified in the overall statement grammar. Both of these approaches fail to parse comments within a config block.
username : comments username
         | username
include_stmt : comments includes
             | includes
I have two main questions:
Is there a mistake I am making in the implementation or understanding of LR parsing which, if solved, would let me achieve what I want?
Is there a better way to achieve the same goal than my current approach? (PLY-fu, a different parser, a different grammar)
P.S. I wasn't able to include the actual code in the question; it is mentioned in the comments.
You are correct that the problem is that when the parser sees a comment, it cannot know whether the comment belongs to the same section or whether the previous section is finished. In the former case, the parser needs to shift the comment, while in the latter case it needs to reduce the configuration section.
Since there could be any number of comments, the necessary lookahead could be arbitrarily large, in which case LR parsing wouldn't be possible. But a simple trick can reduce the lookahead to two tokens: just combine consecutive comments into a single token.
Any LR(k) grammar has an equivalent LR(1) grammar. In effect, the LR(1) grammar works by delaying all decisions for k-1 tokens, accumulating these tokens into the parser state. That's a massive increase in grammar size, but it's usually possible to achieve the same effect in other ways, and that's certainly the case here.
The basic idea is that any comment is (temporarily) accumulated into a list of comments. When a non-comment token is encountered, this temporary list is attached to that token.
This can be done either in the lexical scanner or in the parser actions, depending on your inclinations.
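As a rough illustration of doing it in the object the parser uses as its lexer, the wrapper below buffers COMMENT tokens and hands the accumulated list to the next real token. The token name COMMENT and the wrapper class are assumptions, not part of the original grammar; PLY's parser only needs an object with input() and token() methods.

class CommentFoldingLexer:
    """Buffers consecutive COMMENT tokens and attaches them to the
    next non-comment token as tok.comments."""

    def __init__(self, real_lexer):
        self.lexer = real_lexer

    def input(self, data):
        self.lexer.input(data)

    def token(self):
        pending = []
        while True:
            tok = self.lexer.token()
            if tok is None:
                return None              # EOF; trailing comments are dropped here
            if tok.type == 'COMMENT':
                pending.append(tok.value)
                continue
            tok.comments = pending       # hand the accumulated comments over
            return tok

# Usage (assuming 'lexer' and 'parser' were built with lex.lex() / yacc.yacc()):
# parser.parse(config_text, lexer=CommentFoldingLexer(lexer))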
Before attempting all that, you should make sure that retaining comments is really useful to your application. Comments are normally not relevant to the semantics of a program (or configuration file), and it would certainly be much simpler for the lexer to just drop comments into the bit-bucket. If your application will end up reformatting the input, then it will have to retain comments. But if it only needs to extract information from the configuration, putting a lot of effort into handling comments is hard to justify.

How to speed up search including special character alternatives and nested loops (Python/Django webapp)?

I have three loops nested in a Python/Django webapp backend. all_recommended_services has all the service info I need to go through. alternatives has the search criteria entered in the search bar, including all special-character alternatives (for example, u is substituted with ú, ö with ő, and so on). Finally, the loop for value in alternative: goes through all the search words individually, split on whitespace.
There are search keyword combinations which yield millions of alternatives, which totally kills the webapp. Is there an efficient way to speed this up? I tried to look into itertools.product to use cartesian, but it didn't really help me avoid more loops or speed up the process. Any help is much appreciated!
for service in all_recommended_services:
    county_str = get_county_by_id(all_counties, service['county_id'])
    for alternative in alternatives:
        something_found = False
        for value in alternative:
            something_found = search_in_service(service, value, county_str)
            if not something_found:
                break
        if something_found:
            if service not in recommended_services:
                recommended_services.append(service)
Since you are implementing search, I suggest the package django-haystack. It is easy to use and highly customizable to fit your needs. Since you didn't include more detail, I can't provide a more specific demo, but the documentation is comprehensive.
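For reference, the usual Haystack pattern is one SearchIndex per model plus queries through SearchQuerySet, letting the search backend handle matching (and, with a suitably configured backend, accent folding) instead of generating every special-character alternative by hand. The Service model and app name below are hypothetical stand-ins for whatever backs all_recommended_services:

# search_indexes.py -- minimal Haystack index; "Service" and "myapp" are
# hypothetical stand-ins for the model behind all_recommended_services.
from haystack import indexes
from myapp.models import Service

class ServiceIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)

    def get_model(self):
        return Service

    def index_queryset(self, using=None):
        return self.get_model().objects.all()

# In the view, the nested loops over generated alternatives are replaced by:
# from haystack.query import SearchQuerySet
# results = SearchQuerySet().models(Service).filter(content=query)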

How to make this Groovy string search code more efficient?

I'm using the following groovy code to search a file for a string, an account number. The file I'm reading is about 30MB and contains 80,000-120,000 lines. Is there a more efficient way to find a record in a file that contains the given AcctNum? I'm a novice, so I don't know which area to investigate, the toList() or the for-loop. Thanks!
AcctNum = 1234567890
if (testfile.exists())
{
    lines = testfile.readLines()
    words = lines.toList()
    for (word in words)
    {
        if (word.contains(AcctNum)) { done = true; match = 'YES'; break }
        chunks += 1
        if (done) { break }
    }
}
Sad to say, I don't even have Groovy installed on my current laptop - but I wouldn't expect you to have to call toList() at all. I'd also hope you could express the condition in a closure, but I'll have to refer to Groovy in Action to check...
Having said that, do you really need it split into lines? Could you just read the whole thing using getText() and then just use a single call to contains()?
EDIT: Okay, if you need to find the actual line containing the record, you do need to call readLines() but I don't think you need to call toList() afterwards. You should be able to just use:
for (line in lines)
{
    if (line.contains(AcctNum))
    {
        // Grab the results you need here
        break;
    }
}
When you say efficient you usually have to decide which direction you mean: whether it should run quickly, or use as few resources (memory, ...) as possible. Often the two lie on opposite sides and you have to pick a trade-off.
If you want the search to be memory-friendly, I'd suggest reading the file line by line instead of reading it all at once, which I suspect it currently does (I could be wrong there, but in other languages something like readLines reads the whole file into an array of strings).
If you want it to run quickly I'd suggest, as already mentioned, reading in the whole file at once and looking for the given pattern. Instead of just checking with contains you could use indexOf to get the position and then read the record as needed from that position.
I should have explained it better: if I find a record with the AcctNum, I extract other information from the record... so I thought I needed to split the file into multiple lines.
If you control the format of the file you are reading, the solution is to add an index.
In fact, this is how databases are able to locate records so quickly.
But for 30MB of data, I think a modern computer with a decent hard drive should do the trick, instead of overcomplicating the program.
