How can I convert a list that has entries like "MAX_PAYLOAD NVARCHAR(5)" to a dictionary that contains key value pairs like "MAX_PAYLOAD" : "NVARCHAR(5)" ?
Assuming that this string is representative of what you're going to see, you might be able to simply split on the space in the middle to create a tuple, then create a dictionary from those tuples.
lines = [
"MAX_PAYLOAD NVARCHAR(5)",
"MIN_PAYLOAD NVARCHAR(4)",
]
broken_lines = [line.split(maxsplit=1) for line in lines]
assert dict(broken_lines) == {'MAX_PAYLOAD': 'NVARCHAR(5)', 'MIN_PAYLOAD': 'NVARCHAR(4)'}
If you might have other whitespaces or something, figure out a different splitting function that'll work for you.
Related
I have the following list:
original_list = [('Anger', 'Envy'), ('Anger', 'Exasperation'), ('Joy', 'Zest'), ('Sadness', 'Suffering'), ('Joy', 'Optimism'), ('Surprise', 'Surprise'), ('Love', 'Affection')]
I am trying to create a random list comprising of the 2nd element of the tuples (of the above list) using the random method in such a way that duplicate values appearing as the first element are only considered once.
That is, the final list I am looking at, will be:
random_list = [Exasperation, Suffering, Optimism, Surprise, Affection]
So, in the new list random_list, strings Envy and Zest are eliminated (as they are appearin the the original list twice). And the process has to randomize the result, i.e. with each iteration would produce a different list of Five elements.
May I ask somebody to show me the way how may I do it?
You can use dictionary to filter the duplicates from original_list (shuffled before with random.sample):
import random
original_list = [
("Anger", "Envy"),
("Anger", "Exasperation"),
("Joy", "Zest"),
("Sadness", "Suffering"),
("Joy", "Optimism"),
("Surprise", "Surprise"),
("Love", "Affection"),
]
out = list(dict(random.sample(original_list, len(original_list))).values())
print(out)
Prints (for example):
['Optimism', 'Envy', 'Surprise', 'Suffering', 'Affection']
I am quite new to python.
And i want to only get a certain format from a bigger list, example:
Whats in the list:
/ABC/EF213
/ABC/EF
/ABC/12AC4
/ABC/212
However the only on i want listed are the ones with this format /###/##### while the rest gets discarded
You could use a generator expression or a for loop to check each element of the list to see if it matches a pattern. One way of doing this would be to check if the item matches a regex pattern.
As an example:
import re
original_list = ["Item I don't want", "/ABC/EF213", "/ABC/EF", "/ABC/12AC4", "/ABC/212", "123/456", "another useless item", "/ABC/EF"]
filtered_list = [item for item in original_list if re.fullmatch("\/\w+\/\w+", item) is not None]
print(filtered_list)
outputs
['/ABC/EF213', '/ABC/EF', '/ABC/12AC4', '/ABC/212', '/ABC/EF']
If you need help making regex patterns, there are many great websites such as regexr which can help you
Every String can be used as a list without any conversion. If the only format you want to check is /###/##### then you can simply make if commands like these:
for text in your_list:
if len(text) == 10 and text[0] == "/" and text[4] == "/" (and so on):
print(text)
Of course this would require a lot of if statements and would take a pretty long time. So I would recomend doing a faster and simpler scan. We could perform this one by, for example, splitting the texts, which would look something like this:
for text in your_list:
checkstring = text.split("/")
Now you have your text Split in parts, and you can simply check what lengths these new parts have with the len() command.
I'm very new in python (I usually write in php). I want to understand how to store information in an associative array, and if you can explain me whats the difference of "tuples", "arrays", "dictionary" and "list" will be wonderful (I tried to read different source but I still not caching it).
So This is my code:
#!/usr/bin/python3.4
import csv
import string
nidless_keys = dict()
nidless_keys = ['test_string1','test_string2'] #this contain the string to
# be searched in linesreader
data = {'type':[],'id':[]} #here I want to store my information
with open('path/to/csv/file.csv',newline="") as csvfile:
linesreader = csv.reader(csvfile,delimiter=',',quotechar="|")
for row in linesreader: #every line in this csv have a url like
#www.test.com/?test_string1&id=123456
current_row_string = str(row)
for needle in nidless_keys:
current_needle = str(needle)
if current_needle in current_row_string:
data[current_needle[current_row_string[-8:]]) += 1 # also I
#need to count per every id how much rows there are.
In conclusion:
my_data_stored = [current_needle][current_row_string[-8]]
current_row_string[-8] is a url which the last 8 digit of the url is an ID.
So the array should looks like this at the end of the script:
test_string1 = 123456 = 20
= 256468 = 15
test_string2 = 123155 = 10
Edit 1:
Which type I need here to store the information?
Can you tell me how to resolve this script?
It seems you want to count how many times an ID in combination with a test string occurs.
There can be multiple ID/count combinations associated with every test string.
This suggests that you should use a dictionary indexed by the test strings to store the results. In that dictionary I would suggest to store collections.Counter objects.
This way, you would have to add a special case when a key in the results dictionary isn't found to add an empty Counter. This is a common problem, so there is a specialized form of dictionary in the collections module called defaultdict.
import collections
import csv
# Using a tuple for the keys so it cannot be accidentally modified
keys = ('test_string1', 'test_string2')
result = collections.defaultdict(collections.Counter)
with open('path/to/csv/file.csv',newline="") as csvfile:
linesreader = csv.reader(csvfile,delimiter=',',quotechar="|")
for row in linesreader:
for key in keys:
if key in row:
id = row[-6:] # ID's are six digits in your example.
# The first index is into the dict, the second into the Counter.
result[key][id] += 1
There is an even easier way, by using regular expressions.
Since you seem to treat every row in a CSV file as a string, there is little need to use the CSV reader, so I'll just read the whole file as text.
import re
with open('path/to/csv/file.csv') as datafile:
text = datafile.read()
pattern = r'\?(.*)&id=(\d+)'
The pattern is a regular expression. This is a large topic in and of itself, so I'll only cover briefly what it does. (You might also want to check out the relevant HOWTO) At first glance it looks like complete gibberish, but it is actually a complete language.
In looks for two things in a line. Anything between ? and &id=, and a sequence of digits after &id=.
I'll be using IPython to give an example.
(If you don't know it, check out IPython. It is great for trying things and see if they work.)
In [1]: import re
In [2]: pattern = r'\?(.*)&id=(\d+)'
In [3]: text = """www.test.com/?test_string1&id=123456
....: www.test.com/?test_string1&id=123456
....: www.test.com/?test_string1&id=234567
....: www.test.com/?foo&id=234567
....: www.test.com/?foo&id=123456
....: www.test.com/?foo&id=1234
....: www.test.com/?foo&id=1234
....: www.test.com/?foo&id=1234"""
The text variable points to the string which is a mock-up for the contents of your CSV file.
I am assuming that:
every URL is on its own line
ID's are a sequence of digits.
If these assumptions are wrong, this won't work.
Using findall to extract every match of the pattern from the text.
In [4]: re.findall(pattern, test)
Out[4]:
[('test_string1', '123456'),
('test_string1', '123456'),
('test_string1', '234567'),
('foo', '234567'),
('foo', '123456'),
('foo', '1234'),
('foo', '1234'),
('foo', '1234')]
The findall function returns a list of 2-tuples (that is key, ID pairs). Now we just need to count those.
In [5]: import collections
In [6]: result = collections.defaultdict(collections.Counter)
In [7]: intermediate = re.findall(pattern, test)
Now we fill the result dict from the list of matches that is the intermediate result.
In [8]: for key, id in intermediate:
....: result[key][id] += 1
....:
In [9]: print(result)
defaultdict(<class 'collections.Counter'>, {'foo': Counter({'1234': 3, '123456': 1, '234567': 1}), 'test_string1': Counter({'123456': 2, '234567': 1})})
So the complete code would be:
import collections
import re
with open('path/to/csv/file.csv') as datafile:
text = datafile.read()
result = collections.defaultdict(collections.Counter)
pattern = r'\?(.*)&id=(\d+)'
intermediate = re.findall(pattern, test)
for key, id in intermediate:
result[key][id] += 1
This approach has two advantages.
You don't have to know the keys in advance.
ID's are not limited to six digits.
A brief summary of the python data types you mentioned:
A dictionary is an associative array, aka hashtable.
A list is a sequence of values.
An array is essentially the same as a list, but limited to basic datatypes. My impression is that they only exists for performance reasons, don't think I've ever used one. If performance is that critical to you, you probably don't want to use python in the first place.
A tuple is a fixed-length sequence of values (whereas lists and arrays can grow).
Lets take them one by one.
Lists:
List is a very naive kind of data structure similar to arrays in other languages in terms of the way we write them like:
['a','b','c']
This is a list in python , but seems very similar to array structure.
However there is a very large difference in the way lists are used in python and the usual arrays.
Lists are heterogenous in nature. This means that we can store any kind of data simultaneously inside it like:
ls = [1,2,'a','g',True]
As you can see, we have various kinds of data within a list and is a valid list.
However, one important thing about them is that we can access the list items using zero based indices. So we can write:
print ls[0],ls[3]
output: 1 g
Dictionary:
This datastructure is similar to a hash map data structure. It contains a (key,Value) pair. An empty dictionary looks like:
dc = {}
Now, to store a key,value pair, e.g., ('potato',3),(tomato,5), we can do as:
dc['potato'] = 3
dc['tomato'] = 5
and we saved the data in the dictionary dc.
The important thing is that we can even store another data structure element like a list within a dictionary like:
dc['list1'] = ls , where ls is the list defined above.
This shows the power of using dictionary.
In your case, you have difined a dictionary like this:
data = {'type':[],'id':[]}
This means that your dictionary will consist of only two keys and each key corresponds to a list, which are empty for now.
Talking a bit about your script, the expression :
current_row_string[-8:]
doesn't make a sense. The index should have been -6 instead of -8 that would give you the id part of the current row.
This part is the id and should have been stored in a variable say :
id = current_row_string[-6:]
Further action can be performed as seen the answer given by Roland.
I have a list like this:
[
'C:\\Users\\Rash\\Downloads\\Programs\\a.txt',
'C:\\Users\\Rash\\Downloads\\a.txt',
'C:\\Users\\Rash\\a.txt',
'C:\\Users\\ab.txt',
'C:\\Users\\aa.txt'
]
and I want to sort it based on two conditions:
Sort by no of "\" present in the string.
By Alphabets.
My end result should be this:
[
'C:\\Users\\aa.txt',
'C:\\Users\\ab.txt',
'C:\\Users\\Rash\\a.txt',
'C:\\Users\\Rash\\Downloads\\a.txt',
'C:\\Users\\Rash\\Downloads\\Programs\\a.txt'
]
I am just learning lambda function in python and wrote this code:
print(sorted(mylist, key=lambda x:x.count("\\")))
but this code only sorts by count of "\". It does not sort it by alphabets. The result is that I am seeing the key "'C:\Users\ab.txt'" before the key "'C:\Users\aa.txt'".
I could sort the list two times, but I want to do it in one line. What should I add in the lambda code ? Since I am new to this whole "lambda" thing, I cannot think of a way to do this. Thanks for replying!! :)
Return a sequence from the key function that contains the items you want to sort by.
key=lambda x: (x.count('\\'), x.split('\\'))
I am trying to write numeric data from a function return in the format:
[1,2,3,4,5][10,11]
[6,7,8,9,10][13,14]
...
The problem is that I want the text file to be in the format:
1,2,3,4,5[::tab::]10,11
6,7,8,9,10[::tab::]13,14
...
At the moment I am using
with open(outfilename,'a') as outfile:
outfile.writelines("%s%s\n"%(major,minor))
I would be thankful for help please. I am still totally confused by lists, reading and writing to text files, and achieving specific formatting of data. I always end up with unwanted [] or parentheses.
You can simply write strings using the elements from the list, joining each element with a ,, or the separator you want; then, print the lists using a \t between them.
with open(outfilename, 'at') as outfile:
major = ",".join(str(i) for i in major)
minor = ",".join(str(i) for i in minor)
outfile.writelines("%s\t%s\n" % (major, minor))
The magic behind the following line is very simple, actually:
major = ",".join(str(i) for i in major)
Python offers a feature named list comprehension, that allows you to perform an action over the elements of a list with a very clear and straightforward syntax.
For example, if you have a list l = [1, 2, 3, 4, 5], you can run over the list, converting the elements from integers to strings, calling the method str():
do _something_ with _element_ foreach _element_ in list _l_
str(element) for element in l
When you do this, each resultant string will be created at a time, and you have a generator in this statement:
(str(element) for element in l)
As the method join works by iterating on a list of elements, or receiving this elements from a generator, you can have:
delimiter = ","
delimiter.join(str(element) for element in l)
Hope it clarifies the operation above.