Create JSONL with Python

I can't figure out how to create JSONL using Python 3.

import json

test = [{'a': 'b'}, {'a': 'b'}, {'a': 'b'}]
with open("data.json", 'w') as f:
    for item in test:
        json.dump(item, f)

with open("data.json") as f:
    for line in f:
        # only 1 line here!
        print(line)

This prints:

{"a": "b"}{"a": "b"}{"a": "b"}

I've tried the indent option of dump, but it appears to make no difference, and the separators option doesn't seem to fit this use case either. What am I missing here?

Use .write with a newline (\n).
Ex:

import json

test = [{'a': 'b'}, {'a': 'b'}, {'a': 'b'}]
with open("data.json", 'w') as f:
    for item in test:
        # json.dumps returns a string, so a newline can be appended
        f.write(json.dumps(item) + "\n")

with open("data.json") as f:
    for line in f:
        print(line)
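To read the file back as Python objects rather than raw strings, each line can be parsed on its own; a minimal sketch:

import json

# parse each JSONL line back into a dict
with open("data.json") as f:
    records = [json.loads(line) for line in f]

print(records)  # [{'a': 'b'}, {'a': 'b'}, {'a': 'b'}]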

Related

yaml dump a python dictionary without quotes

When I try to dump a Python dictionary to a YAML file, I get results with quotes around both keys and values.
For example, I create a Python dictionary:

import yaml

d = {'a': '1', 'b': '2'}
with open(attributes_file, "w") as fw:
    yaml.dump(d, fw)
I get this output:
'a': '1'
'b': '2'
How to remove quotes?
PyYAML quotes values like '1' because, unquoted, they would be parsed back as integers rather than strings. You must convert the values to int, like this:

d2 = {k: int(v) for k, v in d.items()}

and save the d2 dict.
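For illustration, a minimal sketch of the conversion, assuming PyYAML is installed (the output filename is made up):

import yaml

d = {'a': '1', 'b': '2'}
d2 = {k: int(v) for k, v in d.items()}

with open("attributes.yaml", "w") as fw:  # hypothetical filename
    yaml.dump(d2, fw)

# attributes.yaml now contains:
# a: 1
# b: 2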

How to not quote empty values with csv.QUOTE_NONNUMERIC?

I'm using a dictwriter as follows:
csv.DictWriter(output_file, keys, delimiter=';', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
which gives me the desired output when all keys have non-numeric values:
key1;key2;key3
"value1";"value2";"value3"
Now I have keys without values, and the DictWriter quotes the empty strings as well:
dic.update(key2=None)
{'key1':'value1', 'key2': None, 'key3':'value3'}
key1;key2;key3
"value1";"";"value3"
What I would like to have is:
key1;key2;key3
"value1";;"value3"
How is that possible? Any ideas?
I wasn't able to find a trivial solution, so I decided to open the existing file and replace those "" values:

# read the whole file, drop the empty quoted fields, write it back
with open(filename, 'r') as f:
    text = f.read().replace('""', '')

with open(filename, 'w') as f:
    f.write(text)
That gave me the desired output,
from:
key1;key2;key3
"value1";"";"value3"
to:
key1;key2;key3
"value1";;"value3"
All good... but is there a neater way of doing this?
Use csv.QUOTE_MINIMAL when the dictionary contains a None value and csv.QUOTE_NONNUMERIC otherwise:

import csv

d = {'key1': 'value1', 'key2': None, 'key3': 'value3'}

# QUOTE_MINIMAL leaves the empty field for None unquoted;
# QUOTE_NONNUMERIC would write it as ""
if any(v is None for v in d.values()):
    quoting = csv.QUOTE_MINIMAL
else:
    quoting = csv.QUOTE_NONNUMERIC

with open('test.csv', 'w', newline='') as f:
    w = csv.DictWriter(f, d.keys(), quoting=quoting)
    w.writeheader()
    w.writerow(d)
Output:
key1,key2,key3
value1,,value3
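If there are several rows, the same check can be applied once for the whole file. A sketch using the question's semicolon delimiter (the second row is made up for illustration):

import csv

rows = [{'key1': 'value1', 'key2': None, 'key3': 'value3'},
        {'key1': 'value4', 'key2': 'value5', 'key3': 'value6'}]  # second row invented

# fall back to QUOTE_MINIMAL only if any row contains a None value
has_none = any(v is None for row in rows for v in row.values())
quoting = csv.QUOTE_MINIMAL if has_none else csv.QUOTE_NONNUMERIC

with open('test.csv', 'w', newline='') as f:
    w = csv.DictWriter(f, rows[0].keys(), delimiter=';', quoting=quoting)
    w.writeheader()
    w.writerows(rows)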

How to rename columns with Pandas? Object has no attribute error when using rename

I am trying to write a simple program in Python 3 that takes all the xls files in a folder, converts all text to uppercase, combines the files into one, and saves it as an xlsx file. This all works. However, I also want to alter the names of the header row using rename. I can't get the code to rename anything; I get the following error message:
data.rename(columns={'A 1': 'A1',
AttributeError: 'ExcelFile' object has no attribute 'rename'
Can anyone help, please? Thanks.
This is my code so far:
all_data = pd.DataFrame()
for f in glob.glob(r'C:\Test\*.xls'):
    df = pd.read_excel(f)
    df = df.applymap(lambda s: s.upper() if type(s) == str else s)
    all_data = all_data.append(df, ignore_index=True)

writer = pd.ExcelWriter(r'C:\Test\alldata.xlsx', engine='xlsxwriter')
all_data.to_excel(writer)
writer.save()
print("All data in upload folder combined into one file")

files = glob.glob(r'C:\Test\*.xls')
for f in files:
    os.remove(f)

data = pd.ExcelFile(r'C:\Test\alldata.xlsx')
data.rename(columns={'A 1': 'A1',
                     'A 2': 'B1',
                     'A 3: 'C1',
                     }, inplace=True)
data.ExcelFile.save
Try read_excel and to_excel instead of pd.ExcelFile and data.ExcelFile.save, respectively; rename is a DataFrame method, and pd.ExcelFile returns an ExcelFile object, not a DataFrame. In the code you posted you have also forgotten the closing single quote after 'A 3'. Here is an example:
all_data = pd.DataFrame()
data = pd.read_excel('test.xlsx')
data.rename(columns={'A 1': 'A1',
                     'A 2': 'B1',
                     'A 3': 'C1',
                     }, inplace=True)
data.to_excel("output.xlsx")
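For reference, a sketch of the full pipeline from the question with the fix folded in. Two hedges: DataFrame.append is deprecated in recent pandas, so pd.concat is used instead, and the rename is applied to the combined DataFrame before writing:

import glob
import os
import pandas as pd

all_data = pd.DataFrame()
for f in glob.glob(r'C:\Test\*.xls'):
    df = pd.read_excel(f)
    df = df.applymap(lambda s: s.upper() if isinstance(s, str) else s)
    # pd.concat replaces the deprecated DataFrame.append
    all_data = pd.concat([all_data, df], ignore_index=True)

# rename() is a DataFrame method, so apply it before writing out
all_data.rename(columns={'A 1': 'A1',
                         'A 2': 'B1',
                         'A 3': 'C1',
                         }, inplace=True)
all_data.to_excel(r'C:\Test\alldata.xlsx', index=False)
print("All data in upload folder combined into one file")

for f in glob.glob(r'C:\Test\*.xls'):
    os.remove(f)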

python: pickle can't store a local function (Can't pickle local object)

In my class method I use itertools.count to generate unique IDs, which I want to store within the class:
from collections import defaultdict
from itertools import count
import pickle

class A:
    def __init__(self):
        self.models = {}

    def func1(self, data, names):
        for name, seq_list in zip(names, data):
            __counter = count(start=0)
            def _genid():
                return next(__counter)
            __ids = defaultdict(_genid)
            # Here we process seq_list and fill __ids
            # ...
            self.models[name] = [None, None, None, __ids]

    def save_models(self, obj, filename="models.bin"):
        try:
            with open(filename, 'wb') as f:
                pickle.dump(obj, f)
        except IOError as e:
            raise Exception("Failed to open file for writing: %s" % e)

...

data = [['A', 'BB', 'B', 'B', 'B'],
        [['C', 'D', 'E', 'B'], ['E', 'B'], ['C', 'D', 'E', 'B', 'B', 'F']],
        ['A', 'G'],
        [['AA', 'C', 'B', 'B'], ['F', 'D'], ['BB', 'E', 'E', 'F', 'B', 'A']]]
names = ['name1', 'name2', 'name3', 'name4']
aclass = A()
aclass.func1(data, names)
aclass.save_models(aclass)
At this point save_models() raises: AttributeError: Can't pickle local object 'A.func1.<locals>._genid'
So I have two questions:
- Is there a way to have pickle save my class as is?
- If not, is there an easier way to store the IDs in the class and still be able to pickle it?
Option 1: Avoid _genid and use a simple integer variable for counting, because the pickle module doesn't support local functions.
Option 2: Use dill for pickling (replace import pickle with import dill as pickle).
Reference: Can Python pickle lambda functions?
The class cannot be pickled as is, but it can easily be changed to make it pickleable. Use __counter.__next__ directly instead of the _genid local wrapper:
__ids = defaultdict(__counter.__next__)
All of defaultdict, the count object, and its __next__ method can be pickled, allowing the resulting object to be pickled as well.
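To illustrate, a minimal standalone sketch of the fix, independent of the class above:

from collections import defaultdict
from itertools import count
import pickle

counter = count(start=0)
ids = defaultdict(counter.__next__)   # no local wrapper function

ids['x']   # assigned ID 0
ids['y']   # assigned ID 1

blob = pickle.dumps(ids)              # no "Can't pickle local object" error
restored = pickle.loads(blob)
print(dict(restored))                 # {'x': 0, 'y': 1}
print(restored['z'])                  # the count's state is preserved: 2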

Extracting Rows by specific keyword in Python (Without using Pandas)

My csv file looks like this:
ID,Product,Price
1,Milk,20
2,Bottle,200
3,Mobile,258963
4,Milk,24
5,Mobile,10000
My code for extracting rows is as follows:

def search_data():
    fin = open('Products/data.csv')
    word = input()  # "Milk"
    found = {}
    for line in fin:
        if word in line:
            found[word] = line
    return found

search_data()
When I run the above code I get this output:

{'Milk': '1,Milk ,20\n'}

I want that, if I search for "Milk", I get all the rows that have "Milk" as the Product.
Note: do this in plain Python only, don't use Pandas.
The expected output should look like this:

[{"ID": "1", "Product": "Milk ", "Price": "20"}, {"ID": "4", "Product": "Milk ", "Price": "24"}]

Can anyone tell me where I am going wrong?
In your script, every assignment found[word] = line overwrites the value stored before it. A better approach is to load all the data first and then filter it:
If file.csv contains:
ID Product Price
1 Milk 20
2 Bottle 200
3 Mobile 10,000
4 Milk 24
5 Mobile 15,000
Then this script:
# load data:
with open('file.csv', 'r') as f_in:
    lines = [line.split() for line in map(str.strip, f_in) if line]
    data = [dict(zip(lines[0], l)) for l in lines[1:]]

# print only items with 'Product': 'Milk'
print([i for i in data if i['Product'] == 'Milk'])
Prints only items with Product == Milk:
[{'ID': '1', 'Product': 'Milk', 'Price': '20'}, {'ID': '4', 'Product': 'Milk', 'Price': '24'}]
EDIT: If your data are separated by commas (,), you can use the csv module to read it:
File.csv contains:
ID,Product,Price
1,Milk ,20
2,Bottle,200
3,Mobile,258963
4,Milk ,24
5,Mobile,10000
Then the script:
import csv

# load data:
with open('file.csv', 'r') as f_in:
    csvreader = csv.reader(f_in, delimiter=',', quotechar='"')
    lines = [line for line in csvreader if line]
    data = [dict(zip(lines[0], l)) for l in lines[1:]]

# print only items with 'Product': 'Milk'
print([i for i in data if i['Product'].strip() == 'Milk'])
Prints:
[{'ID': '1', 'Product': 'Milk ', 'Price': '20'}, {'ID': '4', 'Product': 'Milk ', 'Price': '24'}]
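For completeness, a sketch of the same search using csv.DictReader, which builds the header-to-value dicts for you (same file.csv as above):

import csv

def search_data(filename, word):
    # DictReader maps each row to {header: value} automatically
    with open(filename, newline='') as f:
        return [row for row in csv.DictReader(f)
                if row['Product'].strip() == word]

print(search_data('file.csv', 'Milk'))
# [{'ID': '1', 'Product': 'Milk ', 'Price': '20'},
#  {'ID': '4', 'Product': 'Milk ', 'Price': '24'}]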
