JSON to CSV with UTF8 encoding in Python - python-3.x

I'm working on sentiment analysis. After fetching Twitter data with Twython and saving it to a txt file in JSON format, I need to write it out in CSV format. I can do this, but special characters are not written correctly; for example, "Inclusão" comes out as "Inclus\xc3\xa3o".
Here is the code:
import json
from csv import writer

with open('data.txt') as data_file:
    data = json.load(data_file)

tweets = data['statuses']

# variables
times = [tweet['created_at'] for tweet in tweets]
users = [tweet['user']['name'] for tweet in tweets]
texts = [tweet['text'] for tweet in tweets]

# output file
out = open('tweets_file.csv', 'w')
print(out, 'created,user,text')
rows = zip(times, users, texts)
csv = writer(out)
for row in rows:
    values = [value.encode('utf8') for value in row]
    csv.writerow(values)
out.close()

I already solved the problem, thank you! The problem was that my text was already encoded and I was trying to encode it again.
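For reference, a minimal sketch of the corrected approach in Python 3 (assuming the same data.txt structure as above): open the output file with an explicit UTF-8 encoding and write the strings as-is, without calling .encode() on each value.

import json
from csv import writer

with open('data.txt', encoding='utf-8') as data_file:
    data = json.load(data_file)

tweets = data['statuses']
rows = [(t['created_at'], t['user']['name'], t['text']) for t in tweets]

# newline='' prevents blank lines on Windows; encoding='utf-8' keeps text like "Inclusão" intact
with open('tweets_file.csv', 'w', newline='', encoding='utf-8') as out:
    csv_out = writer(out)
    csv_out.writerow(['created', 'user', 'text'])   # header row
    csv_out.writerows(rows)                         # plain str values, no .encode() needed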

Related

How to extract specific numbers or text using regex in Python?

I have written code to extract the numbers and the company name from text extracted from a PDF file.
Sample PDF content:
#88876 - Sample1, GTRHEUSKYTH, -99WED,-0098B
#99945 - SAMPLE2, DJWHVDFWHEF, -8876D,-3445G
The above example is what my PDF file contains. I want to extract the App number that comes after the # (i.e. the five digits, 88876) and the App name that comes after the - (i.e. Sample1), and write that to an Excel file as separate columns, App_number and App_name.
Please refer to the code below, which I have tried.
import PyPDF2, re
import csv

for k in range(1, 100):
    pdfObj = open(r"C:\\Users\merge.pdf", 'rb')
    object = PyPDF2.PdfFileReader("C:\\Users\merge.pdf")
    pdfReader = PyPDF2.PdfFileReader(pdfObj)
    NumPages = object.getNumPages()
    pdfReader.numPages
    for i in range(0, NumPages):
        pdfPageObj = pdfReader.getPage(i)
        text = pdfPageObj.extractText()
        x = re.findall('(?<=#).[0-9]+', text)
        y = re.findall("(?<=\- )(.*?)(?=,)", text)
        print(x)
        print(y)
        with open("out.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerows(x)
Please share some suggestions.
Try this:
text = '#88876 - Sample1, GTRHEUSKYTH'
App_number = re.search('(?<=#).[0-9]+', text).group()
App_name = re.search("(?<=\- )(.*?)(?=,)", text).group()
The first regex captures the run of digits immediately after #; the second captures everything between "- " and the first comma.
Hope this helps.
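To get from there to the two-column output the question asks for, one possible approach (a sketch, using sample values taken from the question's PDF content; in practice x and y would come from re.findall and are assumed to line up one-to-one) is to zip the two lists and write the pairs with csv.writer:

import csv

# Sample values from the question; in the real script these come from re.findall
x = ['88876', '99945']        # App numbers
y = ['Sample1', 'SAMPLE2']    # App names

with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["App_number", "App_name"])   # header row
    writer.writerows(zip(x, y))                   # one (number, name) pair per row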

How to append to a list in Python by reading a csv file

I am trying to write a simple program that should give the following output when it reads a csv file containing several email ids.
email_id = ['emailid1#xyz.com','emailid2#xyz.com','emailid3#xyz.com'] #required format
but the problem is that the output I get looks like the following:
[['emailid1#xyz.com']]
[['emailid1#xyz.com'], ['emailid2#xyz.com']]
[['emailid1#xyz.com'], ['emailid2#xyz.com'], ['emailid3#xyz.com']] #getting this wrong format
Here is the code that I have written. Kindly suggest the correction that would give me the required format. Thanks in advance.
import csv

email_id = []
with open('contacts1.csv', 'r') as file:
    reader = csv.reader(file, delimiter=',')
    for row in reader:
        email_id.append(row)
        print(email_id)
NB: my csv contains only one column, which holds the email ids, and has no header. I also tried email_id.extend(row), but that did not work either.
You need to move your print outside the loop:
with open('contacts1.csv', 'r') as file:
    reader = csv.reader(file, delimiter=',')
    for row in reader:
        email_id.append(row)
print(sum(email_id, []))
The loop can also be like this (if you only need one column from the csv):
for row in reader:
    email_id.append(row[0])
print(email_id)
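An equivalent, more compact variant (a sketch, assuming the same single-column contacts1.csv with no header) builds the flat list with a comprehension:

import csv

with open('contacts1.csv', 'r', newline='') as file:
    email_id = [row[0] for row in csv.reader(file)]   # first (only) column of each row

print(email_id)  # e.g. ['emailid1#xyz.com', 'emailid2#xyz.com', 'emailid3#xyz.com']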

Save text in JSON format from Python Selenium

I am trying to scrape data from a webpage and save the scraped text in JSON format.
I have reached the step where I can gather the text I want, but I can't save it in the expected format. CSV or txt format would also be sufficient if possible.
Please help me save the scraped text as JSON. Here is the code I have so far:
for k in range(0, len(op3)):
    selectweek.select_by_index(k)
    table = driver.find_element_by_xpath("//table[@class='list-table']")
    for row in table.find_elements_by_xpath('//*[@id="dvFixtureInner"]/table/tbody/tr[2]/td[6]/a'):
        row.click()
        mainpage = driver.window_handles[0]
        print(mainpage)
        popup = driver.window_handles[1]
        driver.switch_to.window(popup)
        time.sleep(3)
        # Meta details of match
        team1 = driver.find_element_by_xpath('//*[@id="match-details"]/div/div[1]/div/div[2]/div[1]/div[1]/a')  # Data to save
        team2 = driver.find_element_by_xpath('//*[@id="match-details"]/div/div[1]/div/div[2]/div[3]/div[1]/a')  # Data to save
        ht = driver.find_element_by_xpath('//*[@id="dvHTScoreText"]')  # Data to save
        ft = driver.find_element_by_xpath('//*[@id="dvScoreText"]')  # Data to save
Create a dictionary and convert it to JSON using the json module:
import json

# .text extracts the visible string from each WebElement; the elements themselves are not JSON serializable
dictionary = {"team1": team1.text, "team2": team2.text, "ht": ht.text, "ft": ft.text}
json_dump = json.dumps(dictionary)
with open("YourFilePath", "w") as f:
    f.write(json_dump)
You can create a dictionary and add key-value pairs to it. I don't know the structure of the JSON you need, but this can give you an idea:
json_data = dict()
ht = 1
ft = 2
json_data["team1"] = {"ht": ht, "ft": ft}
print(json_data)
>>> {'team1': {'ht': 1, 'ft': 2}}
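A minimal sketch of how these pieces could slot into the question's loop (assuming team1, team2, ht, ft are the WebElements found inside the loop body, and "matches.json" is a hypothetical output path): collect one dict per match and dump the whole list once at the end.

import json

matches = []                       # create once, before the fixture loop

# ... inside the loop, after the four find_element_by_xpath calls:
matches.append({
    "team1": team1.text,           # .text gives each element's visible string
    "team2": team2.text,
    "ht": ht.text,
    "ft": ft.text,
})

# ... after the loop has visited every popup:
with open("matches.json", "w", encoding="utf-8") as f:
    json.dump(matches, f, ensure_ascii=False, indent=2)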

Jupyter Notebooks Python - Import text from text path as string

I'm able to import text as a string. I also understand read_csv.
with open('text.txt', 'r') as file:
    text = file.read().replace('\n', '')
My question is: if I have a data frame with many records and I have the text file location for each, how can I bulk import the text as strings into a new column?
Example data frame:
Filename,Text Path
File1,C:\Text\File1.txt
File2,C:\Text\File2.txt
File3,C:\Text\File3.txt
Example Result:
Filename,Text Path,Text
File1,C:\Text\File1.txt,This is some text.
File2,C:\Text\File2.txt,Other kinds of text.
File3,C:\Text\File3.txt,Even more text.
I'm not aware of any library that can do this directly. I think you need to step through each row of the dataframe and add the text to a new column. Assuming you are using pandas and your example dataframe is "df":
for i in range(len(df['Text Path'])):
    with open(df.loc[i, 'Text Path'], 'r') as file:
        df.loc[i, 'Text'] = file.read()
EDIT:
This could be a bit faster (apply a function to generate the new column):
def readtxt(f):
    with open(f, 'r') as file:
        return file.read()

df['Text'] = df['Text Path'].apply(readtxt)
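For context, a minimal end-to-end sketch of the apply approach (files.csv is a hypothetical file holding the Filename / Text Path table from the example, and the listed paths are assumed to exist):

import pandas as pd

def readtxt(f):
    # read one file and return its full contents as a single string
    with open(f, 'r', encoding='utf-8') as file:
        return file.read()

df = pd.read_csv('files.csv')                  # columns: Filename, Text Path
df['Text'] = df['Text Path'].apply(readtxt)    # new column with each file's text
print(df.head())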

Extract numbers and text from csv file with Python3.X

I am trying to extract data from a csv file with Python 3.6.
The data is a mix of numbers and text (URL addresses):
file_name = [-0.47, 39.63, http://example.com]
On multiple forums I found this kind of code:
data = numpy.genfromtxt(file_name, delimiter=',', skip_header=skiplines,)
But this works for numbers only; the URL addresses are read as NaN.
If I add dtype:
data = numpy.genfromtxt(file_name, delimiter=',', skip_header=skiplines, dtype=None)
The URL addresses are read correctly, but they get a "b" at the beginning of the address, such as:
b'http://example.com'
How can I remove that? How can I just have the simple string of text?
I also found this option:
file = open(file_path, "r")
csvReader = csv.reader(file)
for row in csvReader:
    variable = row[i]
    coordList.append(variable)
but it seems to have some issues with Python 3.
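No answer appears above, but two common ways to avoid the b'' prefix in Python 3 are sketched below (data.csv is a hypothetical file containing rows like the one shown; the numpy option assumes a version recent enough to accept the encoding argument, 1.14+):

import csv
import numpy

# Option 1: ask genfromtxt to decode the fields itself, so text columns come back as str
data = numpy.genfromtxt('data.csv', delimiter=',', dtype=None, encoding='utf-8')

# Option 2: the csv module in Python 3 already returns plain strings
rows = []
with open('data.csv', 'r', newline='') as f:
    for row in csv.reader(f):
        rows.append([field.strip() for field in row])   # e.g. ['-0.47', '39.63', 'http://example.com']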
