Python: Universal XML parser - python-3.x

I'm trying to write a simple Python 3 program that reads weather information from an XML web source, converts it into a Python-readable object (maybe a dictionary), and processes it (for example, plotting multiple observations on a graph).
The data source is the national weather service's (direct translation) XML file, at the links given in the code.
What differs from the typical XML parsing question on Stack Overflow is that there are repeated tags without any in-tag identifier (the <station> tags in my example) and some with one (the first line, <observations timestamp="14568.....">). I would also like to parse it straight from the website rather than from a local file, although I could of course create a temporary local file too.
What I have so far is a simple loading script that gives me strings containing the XML for both the forecast and the latest weather observations.
from urllib.request import urlopen

# Read 4-day forecast
forecast = urlopen("http://www.ilmateenistus.ee/ilma_andmed/xml/forecast.php").read().decode("iso-8859-1")
# Get current weather
observ = urlopen("http://www.ilmateenistus.ee/ilma_andmed/xml/observations.php").read().decode("iso-8859-1")
In short, I'm looking for as universal a way as possible to parse XML into a Python-readable object (such as a dictionary/JSON or a list) while preserving all of the information in the XML file.
P.S. I would prefer a standard Python 3 module such as xml, which I didn't understand.

Try the xmltodict package for simple conversion of an XML structure to a Python dict: https://github.com/martinblech/xmltodict
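Since the question asks for a standard-library option, here is a minimal sketch of the same idea using xml.etree.ElementTree. The helper name element_to_dict, the "@" and "#text" key conventions, and the sample XML (including its tag names) are assumptions made for illustration; they are not taken from the weather service's actual feed.

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Recursively convert an Element into nested dicts and lists."""
    node = {}
    # Attributes go under "@name" keys so they cannot clash with child tags.
    for name, value in elem.attrib.items():
        node["@" + name] = value
    # Children are grouped by tag; every tag maps to a list, so repeated
    # tags such as <station> and single tags have the same shape.
    for child in elem:
        node.setdefault(child.tag, []).append(element_to_dict(child))
    text = (elem.text or "").strip()
    if not node:
        # Leaf element without attributes or children: return its text.
        return text
    if text:
        node["#text"] = text
    return node

# Hypothetical sample standing in for the downloaded observations XML.
sample = """<observations timestamp="1456800000">
  <station><name>Tallinn</name><airtemperature>1.5</airtemperature></station>
  <station><name>Tartu</name><airtemperature>0.2</airtemperature></station>
</observations>"""

root = ET.fromstring(sample)
data = {root.tag: element_to_dict(root)}
print(data["observations"]["station"][1]["name"])
```

Mapping every child tag to a list keeps the result uniform, so code that consumes it never has to check whether a tag occurred once or several times. ET.fromstring also accepts bytes, so the raw result of urlopen(...).read() could probably be passed in without decoding it first.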

Related

Read temperature, humidity, etc. from GRIB2 files with ecCodes in Python 3

I am trying to use ecCodes in Python to get various weather information, such as temperature and humidity, out of GRIB2 files. I am using the GFS files. I would like to be able to extract the data as (lat, lon, alt, $data_point) and as a 2D array for each altitude.
I have tried the example programs located here: https://confluence.ecmwf.int/display/ECC/grib_iterator_bitmap
I can't figure out what I am looking at in the output of that program. When I load the messages using their keys, it is not obvious how to make a grid. When I load the grid, the data doesn't have labels I understand.
@craeft, have a look at https://github.com/ecmwf/cfgrib. cfgrib is the new standard for GRIB file handling in Python. It is easy to install and makes the files easy to access. Please install the latest version, because it supports GFS files.

Python 3: Read text file that is in list format

I have one large text file that contains data in the form of a list, and it's all on one line. See the example:
Text file contents: [{"input": "data1"}, {"input": "data2"}, {"input": "data2"}]
I am reading this file using Python 3, and when I use the read() method I get one large string. However, I want to convert this string to a list while keeping the same format as in the text file. Is there any way this can be achieved? Most posts talk about using the split method, which does not work in this case.
In JavaScript I generally use the stringify and parse methods for these kinds of conversions, but I am not able to find an equivalent in Python. Any help will be appreciated. Thank you.
You can load JSON from a file using Python's built-in json package.
>>> import json
>>> with open('foo.json') as f:
...     data = json.load(f)
...
>>> print(data)
[{'input': 'data1'}, {'input': 'data2'}, {'input': 'data2'}]
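For the JavaScript comparison in the question, json.loads and json.dumps play roughly the roles of JSON.parse and JSON.stringify when the data is already in a string (for example, the result of read()). A small sketch, with the sample text copied from the question:

```python
import json

# Sample text from the question; in practice this would come from f.read().
text = '[{"input": "data1"}, {"input": "data2"}, {"input": "data2"}]'

data = json.loads(text)        # string -> Python list of dicts (like JSON.parse)
round_trip = json.dumps(data)  # Python object -> string (like JSON.stringify)

print(type(data).__name__, data[0]["input"])
```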

Extracting title from pdf using pypdf2 not working

I'm trying to extract the title of PDF files using PyPDF2. The output is either None or a wrong title. I tried PDFMiner as well, with the same result. I tried three different PDF files. Is there a more accurate way to extract the title?
This is the code I used:
from PyPDF2 import PdfFileReader

def get_pdf_title(pdf_file_path):
    # Open in binary mode and read the title from the document metadata
    with open(pdf_file_path, "rb") as f:
        pdf_reader = PdfFileReader(f)
        return pdf_reader.getDocumentInfo().title

title = get_pdf_title('C:/PythonPrograms/Test.pdf')
print(title)
Your code is working, at least for me on Python 3.5.2. Check in the PDF's properties that it actually has a title.
A PDF's title is part of its metadata, which has to be set explicitly. It is not mandatory, and it is not tied to the document's content (other than by the will of the person writing it) or to its filename.
If you use your snippet on a file with no title, its output will be an empty string.

Receive CSV format in Strongloop

I'm trying to receive some data in CSV format, but from what I have read, StrongLoop only works with JSON data. Can I receive CSV and transform it to JSON to process the data?
Thanks.
This isn't a StrongLoop-specific question. It is a general Node.js and data question. As such, I will answer in a generic fashion, but the answer is applicable to StrongLoop.
You will need to use a library to convert the delimited file into a JavaScript object. There are many packages on npm for reading/parsing/transforming/etc. CSV files: search npm.
The package that I have used extensively is David's CSV parser.
These libraries will allow you to parse and transform CSV into JavaScript objects (JSON).
Beware, however, that most CSV files I have dealt with are not well-formed: they don't properly escape quotes, don't quote strings that contain delimiters, and so on.

How do I display the data contents of my XML file using PyQt4?

I am trying to build a tiny app that reads from an XML file and displays the contents in a widget. I don't know exactly which widget to use: QTextBrowser, QTextEdit, or QWebView. I can't seem to find a good explanation. Please help as much as you can. Before I get, I'm so new to Python and PyQt, and my programming ain't good at all.
I suggest you first parse the XML content into a DOM object, and then show whatever you want from that object in your widget. For the first part (detailed info here):
from xml.dom import minidom

dom = minidom.parse('my_xml.xml')
print(dom.toxml())  # .toxml() creates a string from the dom object

def print_some_info(node):
    print('node representation: {0}'.format(node))
    print('.nodeName: ' + node.nodeName)
    print('.nodeValue: {0}'.format(node.nodeValue))
    for child in node.childNodes:
        print_some_info(child)  # recurse into each child node

print_some_info(dom)  # start the walk at the document node
(using e.g. an xml example in file 'my_xml.xml' from here)
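For the second part, one option is to flatten the DOM into a plain-text outline and hand that string to a text widget. The sketch below covers only the string-building step, using the standard library; the helper name outline and the sample XML are assumptions for illustration.

```python
from xml.dom import minidom

def outline(node, depth=0):
    """Return a list of indented lines for element names and text content."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.nodeName)
        depth += 1  # children of an element are indented one level deeper
    elif node.nodeType == node.TEXT_NODE and node.nodeValue.strip():
        lines.append("  " * depth + node.nodeValue.strip())
    for child in node.childNodes:
        lines.extend(outline(child, depth))
    return lines

# Hypothetical sample; minidom.parse('my_xml.xml') would work the same way.
dom = minidom.parseString("<note><to>Ann</to><body>Hello</body></note>")
text = "\n".join(outline(dom))
print(text)
```

The resulting string could then be shown in the widget, for example with QTextEdit's setPlainText(text), assuming a plain read-only text view is enough for the app.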
