writing data to a file in a specific format using Python

writing data to a file in a specific format using Python - python-3.x

I am trying to write numeric data from a function return in the format:
[1,2,3,4,5][10,11]
[6,7,8,9,10][13,14]
...
The problem is that I want the text file to be in the format:
1,2,3,4,5[::tab::]10,11
6,7,8,9,10[::tab::]13,14
...
At the moment I am using
with open(outfilename,'a') as outfile:
outfile.writelines("%s%s\n"%(major,minor))
I would be thankful for help please. I am still totally confused by lists, reading and writing to text files, and achieving specific formatting of data. I always end up with unwanted [] or parentheses.

You can simply write strings using the elements from the list, joining each element with a ,, or the separator you want; then, print the lists using a \t between them.
with open(outfilename, 'at') as outfile:
major = ",".join(str(i) for i in major)
minor = ",".join(str(i) for i in minor)
outfile.writelines("%s\t%s\n" % (major, minor))
The magic behind the following line is very simple, actually:
major = ",".join(str(i) for i in major)
Python offers a feature named list comprehension, that allows you to perform an action over the elements of a list with a very clear and straightforward syntax.
For example, if you have a list l = [1, 2, 3, 4, 5], you can run over the list, converting the elements from integers to strings, calling the method str():
do _something_ with _element_ foreach _element_ in list _l_
str(element) for element in l
When you do this, each resultant string will be created at a time, and you have a generator in this statement:
(str(element) for element in l)
As the method join works by iterating on a list of elements, or receiving this elements from a generator, you can have:
delimiter = ","
delimiter.join(str(element) for element in l)
Hope it clarifies the operation above.

Related

Generate a list of strings from another list using python random and eliminate duplicates

I have the following list:
original_list = [('Anger', 'Envy'), ('Anger', 'Exasperation'), ('Joy', 'Zest'), ('Sadness', 'Suffering'), ('Joy', 'Optimism'), ('Surprise', 'Surprise'), ('Love', 'Affection')]
I am trying to create a random list comprising of the 2nd element of the tuples (of the above list) using the random method in such a way that duplicate values appearing as the first element are only considered once.
That is, the final list I am looking at, will be:
random_list = [Exasperation, Suffering, Optimism, Surprise, Affection]
So, in the new list random_list, strings Envy and Zest are eliminated (as they are appearin the the original list twice). And the process has to randomize the result, i.e. with each iteration would produce a different list of Five elements.
May I ask somebody to show me the way how may I do it?

You can use dictionary to filter the duplicates from original_list (shuffled before with random.sample):
import random
original_list = [
("Anger", "Envy"),
("Anger", "Exasperation"),
("Joy", "Zest"),
("Sadness", "Suffering"),
("Joy", "Optimism"),
("Surprise", "Surprise"),
("Love", "Affection"),
]
out = list(dict(random.sample(original_list, len(original_list))).values())
print(out)
Prints (for example):
['Optimism', 'Envy', 'Surprise', 'Suffering', 'Affection']

Python filter string

With the following command i can print the balance of my assets from my binance ac.
Command:
USDT_BAL = client.futures_account_balance(asset='USDT')
Return:
[{'accountAlias': 'sRuXXqTioCfWFz', 'asset': 'BNB', 'balance': '0.00000142', 'withdrawAvailable': '0.00000142', 'updateTime': 1621516315044}, {'accountAlias': 'sRuXXqTioCfWFz', 'asset': 'USDT', 'balance': '0.00000000', 'withdrawAvailable': '0.00000000', 'updateTime': 0}, {'accountAlias': 'sRuXXqTioCfWFz', 'asset': 'BUSD', 'balance': '0.00000000', 'withdrawAvailable': '0.00000000', 'updateTime': 0}]
It returns the balances of other assets, but i only need the balance of the USDT asset. How could I filter the USDT_BAL variable for it?

Expanding on my comment:
You have a list of dict. list access is done by iteration (for loops) or by indexes. my_list[0], etc..
dict access can, also done by iteration, but a big benefit is keyed access. my_dict['some_key'], etc..
Python has simplified ways to do common list and dict building commonly called "comprehensions".
So a list comprehension for something like:
my_list = []
for i in range(10):
my_list.append(i)
Could be written as
my_list = [i for i in range(10)]
What I gave you isn't necessarily a list comprehension but follows the same idea. It's called a "generator expression". The difference is it generates some output when you iterate over it but it's output as a whole isn't in the form of some built-in collection (list or dict).
The reason it makes sense in this context is:
I need to iterate over the list to find dict with the correct 'asset' key.
I expect there is only one occurrence of this so I care only about the first occurrence.
So to break it down you have a generator expression:
(i['balance'] for i in USDT_BAL if i['asset'] == 'USDT')
Which is roughly equivalent to.
def my_gen():
for i in USDT_BAL:
if i['asset'] == 'USDT':
yield i['balance']
Or if you're not familiar with generators and would like it as a list:
my_list = []
for i in USDT_BAL:
if i['asset'] == 'USDT':
my_list.append(i['balance'])
So now you can see we have a problem.
If we have it as a list comprehension it's in the form of a list with one element.
print(my_list) # ['0.00000000']
We could access it with my_list[0] but that looks ugly IMO but to each it's own.
So that's where the next function comes in.
According to the docs next calls the __next__ method on an iterator (which a generator is) and basically advances the generator.
So if our generator were to produce 1 then 2 then 3, calling next(my_gen) would produce 1 then calling it again would produce 2 and so on.
Since I expect this generator expression to only produce 1 item, I only call it once. Giving it a default of None means, if it's empty, rather than raising an error it will produce None.
So:
next((i['balance'] for i in USDT_BAL if i['asset'] == 'USDT'), None)
creates a generator that iterates over your list, only produces the 'balance' key of dicts who's 'asset' key equals 'USDT' and calls next on that generator with a default of None.

How to include the last element in the for loop? (Run the last element along with all other elements)

I have searched around, and have only found answers to questions like, "How to detect the last element?", or, "How to extract/change/skip to the last element?", which are not quite what I need.
I want to run a for loop of a list, from the first element to the last element of the list, instead of to the second last element. I don't want the last element to be any more special than all other elements; I just want it to be included while I run the for loop.
If there is a small piece of code that I can embed in the function, that would be great. I'm also fine with creating a separate function, then call that function when needed (I'm a newbie, so if you are going to do this approach, please also include how I would call another function within my function).
Here's my code (without running the last element):
**import sympy
from sympy import Q
k,l,m,n=sympy.symbols('k l m n')
Z=sympy.symbols('Z')
sol_list=[]
def findZ(k,l,m,n,Z):
max_loop=len(coeff_list)
for i in range(max_loop): # <- Here's the trouble-maker
k=coeff_list[i][0]
l=coeff_list[i][1]
m=coeff_list[i][2]
n=coeff_list[i][3]
expr=k*Z**3+l*Z**2+m*Z+n
sol=sympy.solve(expr,'Z')
for sol in sol:
if sympy.ask(Q.real(sol))==True:
sol_list.append(sol)
if len(sol_list)>1:
max(sol_list)
remove_dup(sol_list)
return sol_list**
coeff_list=[[1, -1.0000007185951756, 0.2959936217035236, -2.1269958855852493e-07], [1, -1.0000006956368632, 0.27738242112279193, -1.9295743735357098e-07], [1, -1.0000006741001182, 0.26047291179805043, -1.755848206487531e-07],[...]
If it helps, coeff_list is a list within a list, with the length of the coeff_list is 24. Each nested list contains 4 items.

List, tuples or dictionary, differences and usage, How can I store info in python

I'm very new in python (I usually write in php). I want to understand how to store information in an associative array, and if you can explain me whats the difference of "tuples", "arrays", "dictionary" and "list" will be wonderful (I tried to read different source but I still not caching it).
So This is my code:
#!/usr/bin/python3.4
import csv
import string
nidless_keys = dict()
nidless_keys = ['test_string1','test_string2'] #this contain the string to
# be searched in linesreader
data = {'type':[],'id':[]} #here I want to store my information
with open('path/to/csv/file.csv',newline="") as csvfile:
linesreader = csv.reader(csvfile,delimiter=',',quotechar="|")
for row in linesreader: #every line in this csv have a url like
#www.test.com/?test_string1&id=123456
current_row_string = str(row)
for needle in nidless_keys:
current_needle = str(needle)
if current_needle in current_row_string:
data[current_needle[current_row_string[-8:]]) += 1 # also I
#need to count per every id how much rows there are.
In conclusion:
my_data_stored = [current_needle][current_row_string[-8]]
current_row_string[-8] is a url which the last 8 digit of the url is an ID.
So the array should looks like this at the end of the script:
test_string1 = 123456 = 20
= 256468 = 15
test_string2 = 123155 = 10
Edit 1:
Which type I need here to store the information?
Can you tell me how to resolve this script?

It seems you want to count how many times an ID in combination with a test string occurs.
There can be multiple ID/count combinations associated with every test string.
This suggests that you should use a dictionary indexed by the test strings to store the results. In that dictionary I would suggest to store collections.Counter objects.
This way, you would have to add a special case when a key in the results dictionary isn't found to add an empty Counter. This is a common problem, so there is a specialized form of dictionary in the collections module called defaultdict.
import collections
import csv
# Using a tuple for the keys so it cannot be accidentally modified
keys = ('test_string1', 'test_string2')
result = collections.defaultdict(collections.Counter)
with open('path/to/csv/file.csv',newline="") as csvfile:
linesreader = csv.reader(csvfile,delimiter=',',quotechar="|")
for row in linesreader:
for key in keys:
if key in row:
id = row[-6:] # ID's are six digits in your example.
# The first index is into the dict, the second into the Counter.
result[key][id] += 1
There is an even easier way, by using regular expressions.
Since you seem to treat every row in a CSV file as a string, there is little need to use the CSV reader, so I'll just read the whole file as text.
import re
with open('path/to/csv/file.csv') as datafile:
text = datafile.read()
pattern = r'\?(.*)&id=(\d+)'
The pattern is a regular expression. This is a large topic in and of itself, so I'll only cover briefly what it does. (You might also want to check out the relevant HOWTO) At first glance it looks like complete gibberish, but it is actually a complete language.
In looks for two things in a line. Anything between ? and &id=, and a sequence of digits after &id=.
I'll be using IPython to give an example.
(If you don't know it, check out IPython. It is great for trying things and see if they work.)
In [1]: import re
In [2]: pattern = r'\?(.*)&id=(\d+)'
In [3]: text = """www.test.com/?test_string1&id=123456
....: www.test.com/?test_string1&id=123456
....: www.test.com/?test_string1&id=234567
....: www.test.com/?foo&id=234567
....: www.test.com/?foo&id=123456
....: www.test.com/?foo&id=1234
....: www.test.com/?foo&id=1234
....: www.test.com/?foo&id=1234"""
The text variable points to the string which is a mock-up for the contents of your CSV file.
I am assuming that:
every URL is on its own line
ID's are a sequence of digits.
If these assumptions are wrong, this won't work.
Using findall to extract every match of the pattern from the text.
In [4]: re.findall(pattern, test)
Out[4]:
[('test_string1', '123456'),
('test_string1', '123456'),
('test_string1', '234567'),
('foo', '234567'),
('foo', '123456'),
('foo', '1234'),
('foo', '1234'),
('foo', '1234')]
The findall function returns a list of 2-tuples (that is key, ID pairs). Now we just need to count those.
In [5]: import collections
In [6]: result = collections.defaultdict(collections.Counter)
In [7]: intermediate = re.findall(pattern, test)
Now we fill the result dict from the list of matches that is the intermediate result.
In [8]: for key, id in intermediate:
....: result[key][id] += 1
....:
In [9]: print(result)
defaultdict(<class 'collections.Counter'>, {'foo': Counter({'1234': 3, '123456': 1, '234567': 1}), 'test_string1': Counter({'123456': 2, '234567': 1})})
So the complete code would be:
import collections
import re
with open('path/to/csv/file.csv') as datafile:
text = datafile.read()
result = collections.defaultdict(collections.Counter)
pattern = r'\?(.*)&id=(\d+)'
intermediate = re.findall(pattern, test)
for key, id in intermediate:
result[key][id] += 1
This approach has two advantages.
You don't have to know the keys in advance.
ID's are not limited to six digits.

A brief summary of the python data types you mentioned:
A dictionary is an associative array, aka hashtable.
A list is a sequence of values.
An array is essentially the same as a list, but limited to basic datatypes. My impression is that they only exists for performance reasons, don't think I've ever used one. If performance is that critical to you, you probably don't want to use python in the first place.
A tuple is a fixed-length sequence of values (whereas lists and arrays can grow).

Lets take them one by one.
Lists:
List is a very naive kind of data structure similar to arrays in other languages in terms of the way we write them like:
['a','b','c']
This is a list in python , but seems very similar to array structure.
However there is a very large difference in the way lists are used in python and the usual arrays.
Lists are heterogenous in nature. This means that we can store any kind of data simultaneously inside it like:
ls = [1,2,'a','g',True]
As you can see, we have various kinds of data within a list and is a valid list.
However, one important thing about them is that we can access the list items using zero based indices. So we can write:
print ls[0],ls[3]
output: 1 g
Dictionary:
This datastructure is similar to a hash map data structure. It contains a (key,Value) pair. An empty dictionary looks like:
dc = {}
Now, to store a key,value pair, e.g., ('potato',3),(tomato,5), we can do as:
dc['potato'] = 3
dc['tomato'] = 5
and we saved the data in the dictionary dc.
The important thing is that we can even store another data structure element like a list within a dictionary like:
dc['list1'] = ls , where ls is the list defined above.
This shows the power of using dictionary.
In your case, you have difined a dictionary like this:
data = {'type':[],'id':[]}
This means that your dictionary will consist of only two keys and each key corresponds to a list, which are empty for now.
Talking a bit about your script, the expression :
current_row_string[-8:]
doesn't make a sense. The index should have been -6 instead of -8 that would give you the id part of the current row.
This part is the id and should have been stored in a variable say :
id = current_row_string[-6:]
Further action can be performed as seen the answer given by Roland.

Need help working with lists within lists

I'm taking a programming class and have our first assignment. I understand how it's supposed to work, but apparently I haven't hit upon the correct terms to search to get help (and the book is less than useless).
The assignment is to take a provided data set (names and numbers) and perform some manipulation and computation with it.
I'm able to get the names into a list, and know the general format of what commands I'm giving, but the specifics are evading me. I know that you refer to the numbers as names[0][1], names[1][1], etc, but not how to refer to just that record that is being changed. For example, we have to have the program check if a name begins with a letter that is Q or later; if it does, we double the number associated with that name.
This is what I have so far, with ??? indicating where I know something goes, but not sure what it's called to search for it.
It's homework, so I'm not really looking for answers, but guidance to figure out the right terms to search for my answers. I already found some stuff on the site (like the statistics functions), but just can't find everything the book doesn't even mention.
names = [("Jack",456),("Kayden",355),("Randy",765),("Lisa",635),("Devin",358),("LaWanda",452),("William",308),("Patrcia",256)]
length = len(names)
count = 0
while True
count < length:
if ??? > "Q" # checks if first letter of name is greater than Q
??? # doubles number associated with name
count += 1
print(names) # self-check
numberNames = names # creates new list
import statistics
mean = statistics.mean(???)
median = statistics.median(???)
print("Mean value: {0:.2f}".format(mean))
alphaNames = sorted(numberNames) # sorts names list by name and creates new list
print(alphaNames)

first of all you need to iter over your names list. To do so use for loop:
for person in names:
print(person)
But names are a list of tuples so you will need to get the person name by accessing the first item of the tuple. You do this just like you do with lists
name = person[0]
score = person[1]
Finally to get the ASCII code of a character, you use ord() function. That is going to be helpful to know if name starts with a Q or above.
print(ord('A'))
print(ord('Q'))
print(ord('R'))
This should be enough informations to get you started with.

I see a few parts to your question, so I'll try to separate them out in my response.
check if first letter of name is greater than Q
Hopefully this will help you with the syntax here. Like list, str also supports element access by index with the [] syntax.
$ names = [("Jack",456),("Kayden",355)]
$ names[0]
('Jack', 456)
$ names[0][0]
'Jack'
$ names[0][0][0]
'J'
$ names[0][0][0] < 'Q'
True
$ names[0][0][0] > 'Q'
False
double number associated with name
$ names[0][1]
456
$ names[0][1] * 2
912
"how to refer to just that record that is being changed"
We are trying to update the value associated with the name.
In theme with my previous code examples - that is, we want to update the value at index 1 of the tuple stored at index 0 in the list called names
However, tuples are immutable so we have to be a little tricky if we want to use the data structure you're using.
$ names = [("Jack",456), ("Kayden", 355)]
$ names[0]
('Jack', 456)
$ tpl = names[0]
$ tpl = (tpl[0], tpl[1] * 2)
$ tpl
('Jack', 912)
$ names[0] = tpl
$ names
[('Jack', 912), ('Kayden', 355)]
Do this for all tuples in the list
We need to do this for the whole list, it looks like you were onto that with your while loop. Your counter variable for indexing the list is named count so just use that to index a specific tuple, like: names[count][0] for the countth name or names[count][1] for the countth number.
using statistics for calculating mean and median
I recommend looking at the documentation for a module when you want to know how to use it. Here is an example for mean:
mean(data)
Return the sample arithmetic mean of data.
$ mean([1, 2, 3, 4, 4])
2.8
Hopefully these examples help you with the syntax for continuing your assignment, although this could turn into a long discussion.
The title of your post is "Need help working with lists within lists" ... well, your code example uses a list of tuples
$ names = [("Jack",456),("Kayden",355)]
$ type(names)
<class 'list'>
$ type(names[0])
<class 'tuple'>
$ names = [["Jack",456], ["Kayden", 355]]
$ type(names)
<class 'list'>
$ type(names[0])
<class 'list'>
notice the difference in the [] and ()
If you are free to structure the data however you like, then I would recommend using a dict (read: dictionary).

I know that you refer to the numbers as names[0][1], names[1][1], etc, but
not how to refer to just that record that is being changed. For
example, we have to have the program check if a name begins with a
letter that is Q or later; if it does, we double the number associated
with that name.
It's not entirely clear what else you have to do in this assignment, but regarding your concerns above, to reference the ith"record that is being changed" in your names list, simply use names[i]. So, if you want to access the first record in names, simply use names[0], since indexing in Python begins at zero.
Since each element in your list is a tuple (which can also be indexed), using constructs like names[0][0] and names[0][1] are ways to index the values within the tuple, as you pointed out.
I'm unsure why you're using while True if you're trying to iterate through each name and check whether it begins with "Q". It seems like a for loop would be better, unless your class hasn't gotten there yet.
As for checking whether the first letter is 'Q', str (string) objects are indexed similarly to lists and tuples. To access the first letter in a string, for example, see the following:
>>> my_string = 'Hello'
>>> my_string[0]
'H'
If you give more information, we can help guide you with the statistics piece, as well. But I would first suggest you get some background around mean and median (if you're unfamiliar).

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

writing data to a file in a specific format using Python - python-3.x

Related

Generate a list of strings from another list using python random and eliminate duplicates

Python filter string

How to include the last element in the for loop? (Run the last element along with all other elements)

List, tuples or dictionary, differences and usage, How can I store info in python

Need help working with lists within lists

Categories

Resources