Getting data from Excel into a Shelve (Python3)

I have come across a Python module for reading Apple serial numbers and giving back the product type. Please see Mac Model Shelf.
It's very powerful and has a vast catalogue of product types, with some entries going well back into the '90s. It is slightly overkill for my purposes, and the results it gives back tend to be a little vague (for new Macs at least): no processor type, speed, etc.
I have decided to re-write it slightly with my own database of serial numbers and corresponding machine types.
E.g "W88010010P2" will give back "Black Macbook 2008 2GHz - PG 1001", for the sake of the example. The 'PG' stands for Product Group, the reference code I use to find identical macs in my Filemaker database.
import shelve

databaseOfMacs = shelve.open("macmodelshelfNEW")

inputSerial = "W88010010P2"
modelCodeIsolatedFromSerial = ""

# Extracting the model code: the last 3 characters for older Macs, 4 for newer.
# Newer Macs have 12-character serials, older ones just 11.
if len(inputSerial) == 12:
    modelCodeIsolatedFromSerial = inputSerial[-4:]
elif len(inputSerial) == 11:
    modelCodeIsolatedFromSerial = inputSerial[-3:]

# Setting a key-value pair, for the sake of the example
databaseOfMacs['0P2'] = "Black Macbook 2008 2GHz - PG 1001"

model = databaseOfMacs[modelCodeIsolatedFromSerial]
print(model)
This will produce the following output...
Black Macbook 2008 2GHz - PG 1001
Process finished with exit code 0
Adding key-value pairs inside the script is not practical. I have started to build up an Excel (xlsx) file of the key-value pairs of model codes and their actual descriptions. Just two columns.
A       B
0P2     Black Macbook 2008 2GHz - PG 1001
G8WP    MacBook Pro 2.5GHz i7 (Retina, 15-inch, Mid 2015) - PG 786
I have searched online and on SO but cannot find a clean way of getting this data into the shelve file. The suggested solutions import the values into a dictionary and then, after the fact, import the dictionary into the shelve. I am getting stuck on just the first part: errant '0's are getting into the dictionary when I follow the approach from the Stack Overflow question "python creating dictionary from excel".
If this could be condensed into a single step it would change everything!
Thanks for any help which is given.
UPDATE
I think I have figured it out myself... at least the part about getting the Excel data into a shelve...
import pandas as pd
import shelve
excelDict = pd.read_excel('serials_and_models.xlsx', header=None, index_col=0, squeeze=True).to_dict()
excelMacDataBaseShelve = shelve.open("excelMacDataBaseShelve")
excelMacDataBaseShelve.update(excelDict)
# to verify all is well
for key in excelMacDataBaseShelve:
    print(key, excelMacDataBaseShelve[key])
The wonderful thing about shelves is that I can just update the Excel file as I go, and when I need to retrieve some data via the Python script it will always be up to date.
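Putting the two pieces together, the whole lookup can then be wrapped in one small helper. This is just a rough sketch reusing the serial-extraction logic and the shelve file name from above; the .get() call is my addition so that an unknown model code returns None instead of raising a KeyError:
import shelve

def lookup_mac_model(serial):
    # Model code: last 4 characters of a 12-character serial,
    # last 3 characters of an 11-character serial.
    if len(serial) == 12:
        model_code = serial[-4:]
    elif len(serial) == 11:
        model_code = serial[-3:]
    else:
        return None

    with shelve.open("excelMacDataBaseShelve") as db:
        return db.get(model_code)  # None if the code is not in the shelve

print(lookup_mac_model("W88010010P2"))  # e.g. Black Macbook 2008 2GHz - PG 1001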
If anybody can point out something I've done wrong or could perhaps improve, please leave a comment!!

Related

Daily anomalies using climate data operator (CDO) in Cygwin (Windows 10)

I'm a complete beginner with Cygwin and CDO, both of which have been installed on Windows 10. I'm working with 3 variables from the ERA5-Land hourly dataset: 2m temperature, total precipitation and runoff. Some facts about these variables:
All three variables are in netCDF format.
2m temperature: hourly values, with units in Kelvin.
Total precipitation and runoff: hourly values, with units of depth in metres.
I want to obtain daily anomalies for 2017 relative to a 30-year period (1981-2010). This post gave me a general idea of what to do, but I'm not quite sure how to replicate it. Intuitively, I think these would be the steps:
Convert units according to each var (e.g. K to C for 2m temperature, metres to mm for total precipitation)
Convert data from hourly to daily values
Obtain mean values for 2017 data and 1981-2010 data
Subtract: 30-year mean values minus 2017 mean values
Download the file containing 2017 anomalies
I'm not sure about the order of these procedures.
What would the commands look like in the Cygwin terminal?
Before you start, I would strongly recommend abandoning Cygwin and installing the Windows Subsystem for Linux (i.e. not a parallel boot). If you do a quick search you will see that it is very easy to install Ubuntu directly within Windows itself; that way you can open a Linux terminal and easily install anything you want with sudo apt install, e.g.
sudo apt install cdo
Once you have done that, to answer some of your questions:
Convert units according to each var (e.g. K to C for 2m temperature, metres to mm for total precipitation)
e.g. to convert temperature:
cdo subc,273.15 in.nc out.nc
Similar for rain, using mulc [recall that this doesn't change the "units" metadata; you need to use nco for that]
Convert data from hourly to daily values
for instantaneous fields like temperature
cdo daymean in.nc daymean.nc
for flux field (like rain)
cdo daymean -shifttime,-1hour in.nc raindaymean.nc
Obtain mean values for 2017 data and 1981-2010 data.
cdo selyear,2017 -yearmean in.nc year2017_anom.nc
Subtract: 30-year mean values minus 2017 mean value
Erm, usually you want to do this the other way round, no? 2017 minus the long-term mean, so you can see if it is warmer or cooler?
cdo sub year2017_anom.nc -timmean alldata_daymean.nc
Download the file containing 2017 anomalies
I don't understand this question; haven't you already downloaded the hourly data from the CDS platform? This question only makes sense if you are using the CDS toolbox, which doesn't seem to be the case. Anyway, for the downloading step, if this is not clear you can take a look at my video on this topic on my YouTube channel here: https://www.youtube.com/watch?v=AXG97K6NYD8&t=469s
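To tie the steps together for a single variable (2m temperature, with placeholder file names), the chain of commands from this answer would be roughly:
cdo subc,273.15 t2m_hourly.nc t2m_degC.nc                           # K -> degC
cdo daymean t2m_degC.nc t2m_daymean.nc                              # hourly -> daily means
cdo selyear,2017 -yearmean t2m_daymean.nc t2m_2017_mean.nc          # 2017 mean
cdo sub t2m_2017_mean.nc -timmean t2m_daymean.nc t2m_2017_anom.nc   # 2017 minus long-term mean
For a strict 1981-2010 baseline, the long-term mean in the last step could be computed as -timmean -selyear,1981/2010 t2m_daymean.nc instead.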

Can't find migration biosphere-2-3-categories

I import an Excel database:
imp = bw.ExcelImporter(os.path.join("myfile.xls"))
And then apply strategies to it:
imp.apply_strategies()
But this issue arises:
AssertionError: Can't find migration biosphere-2-3-categories
I would like to understand what's happening, and what actually is this "migration biosphere-2-3"?
My exchanges in my excel file concern only biosphere-3... but I suppose it doesn't have much to do with it.
Running the function bw2io.bw2setup calls the function bw2io.migrations.create_core_migrations, which installs sets of metadata used to translate from one nomenclature system to another. If this migration data can't be found, you need to import and call create_core_migrations yourself.
You can see the actual changes in the names of elementary flow categories from ecoinvent version 2 to version 3 here.
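A minimal sketch of that fix in code, assuming the file name from the question and a hypothetical project name:
import brightway2 as bw
import bw2io

bw.projects.set_current("my_project")  # hypothetical project name

# Install the core migration data; bw2io.bw2setup() would also create it
# (along with the biosphere3 database and default LCIA methods).
bw2io.migrations.create_core_migrations()

imp = bw.ExcelImporter("myfile.xls")
imp.apply_strategies()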

Addressing Reliable Output in Newspaper3k

Current Behavior:
In attempting to use the news-aggregator package Newspaper3k, I am unable to produce consistent/reliable output.
System/Environment Setup:
Windows 10
Miniconda3 4.5.12
Python 3.7.1
Newspaper3k 0.2.8
Steps (Code) to Reproduce:
import newspaper
cnn_paper = newspaper.build('http://cnn.com')
print(cnn_paper.size())
Expected Behavior/Output (varies based on current links posted on cnn):
A consistent number of posted links from cnn across consecutive print-output runs.
Actual Behavior/Output
Running the code the first time produces a different number of links than code run immediately after.
1st Run Print output: 94 (as of time of posting this question)
2nd Run Print output: 0
3rd Run Print output: 18
4th Run Print output: 7
Printing the actual links will vary the same way as the above link count print. I have tried using a number of different news sources, and the same unexpected variance results. Do I need to change my User-Agent Header? Is this a detection issue? How do I produce reliable results?
Any help would be much appreciated.
Thanks.
My issue was resolved by a better understanding of the default caching described under heading 6.1.3 'Article caching' in the user documentation.
Apart from my general ignorance, my confusion came from the fact that the Read the Docs documentation listed the caching function as a TODO, as can be seen here.
Upon better scrutiny, I discovered:
By default, newspaper caches all previously extracted articles and eliminates any article which it has already extracted. This feature exists to prevent duplicate articles and to increase extraction speed.
The return value of cbs_paper.size() changes from 1030 to 2 because when we first crawled cbs we found 1030 articles. However, on our second crawl, we eliminate all articles which have already been crawled. This means 2 new articles have been published since our first extraction.
You may opt out of this feature with the memoize_articles parameter.
You may also pass in the lower level ``Config`` objects as covered in the advanced section.
>>> import newspaper
>>> cbs_paper = newspaper.build('http://cbs.com', memoize_articles=False)
>>> cbs_paper.size()
1030
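Applied to the cnn example from the question, switching the cache off looks something like this (a small untested sketch; the count will still vary as CNN itself adds and removes links):
import newspaper

# memoize_articles=False disables the article cache, so every build()
# re-extracts the full set of article links currently on the site.
cnn_paper = newspaper.build('http://cnn.com', memoize_articles=False)
print(cnn_paper.size())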

Reading a grib2 message into an Iris cube

I am currently exploring the notion of using iris in a project to read forecast grib2 files using python.
My aim is to load/convert a grib message into an iris cube based on a grib message key having a specific value.
I have experimented with iris-grib, which uses gribapi. Using iris-grib I have not been able to find the key in the grib2 file, although the key is visible with 'grib_ls -w ...' via the CLI.
gribapi does the job, but I am not sure how to interface it with iris (which is what, I assume, iris-grib is for).
I was wondering if anyone knew of a way to get a message into an iris cube based on a grib message key having a specific value. Thank you
You can get at anything that the gribapi understands through the low-level grib interface in iris-grib, which is the iris_grib.GribMessage class.
Typically you would use for msg in GribMessage.messages_from_filename(xxx): and then access it like e.g. msg.sections[4]['productDefinitionTemplateNumber']; msg.sections[4]['parameterNumber'] and so on.
You can use this to identify required messages, and then convert to cubes with iris_grib.load_pairs_from_fields().
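A rough sketch of that pattern, with a placeholder file name and an arbitrary section-4 key/value as the filter (the GribMessage import path may differ slightly between iris-grib versions, and load_pairs_from_fields, as the name suggests, yields (cube, message) pairs):
import iris_grib
from iris_grib import GribMessage

# Stream the raw GRIB2 messages from the file.
messages = GribMessage.messages_from_filename("forecast.grib2")

# Keep only the messages whose key has the value we care about
# ('parameterNumber' == 8 is just an illustrative choice).
wanted = (msg for msg in messages
          if msg.sections[4]['parameterNumber'] == 8)

# Convert the selected messages into (cube, message) pairs.
for cube, message in iris_grib.load_pairs_from_fields(wanted):
    print(cube)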
However, iris-grib only knows how to translate specific encodings into cubes: it is quite strict about exactly what it recognises, and will fail on anything else. So if your data uses any unrecognised templates or data encodings it will definitely fail to load.
I'm just anticipating that you may have something unusual here, so that might be an issue?
You can possibly check your expected message contents against the translation code at iris_grib:_load_convert.py, starting at the convert() routine.
To get an Iris cube out of something not yet supported, you would either:
(a) extend the translation rules (i.e. a GitHub PR), or
(b) sometimes you can modify the message so that it looks like something that can be recognised.
Failing that, you can
(c) simply build an Iris cube yourself from the data found in your GribMessage: that can be a little simpler than using 'gribapi' directly (possibly not, depending on detail).
If you have a problem like that, you should definitely raise it as an issue on the GitHub project (iris-grib issues) and we will try to help.
P.S. as you have registered a Python3 interest, you may want to be aware that the newer "ecCodes" replacement for gribapi should shortly be available, making Python3 support for grib data possible at last.
However, the Python3 version is still in beta and we are presently experiencing some problems with it, now raised with ECMWF, so it is still almost-but-not-quite achievable.

How to compare data between a database and a guide which are differently structured?

A rather complicated problem in data exchange between a database and a printed book:
The organisation I work for has a MySQL database of all social-profit organisations in Brussels, Belgium. At the same time there is a booklet created in InDesign, which was developed at a different time and with different people than the database, and consequently has a different structure.
Every year a new book is published and the data needs to be compared manually because of this difference in structure. The book changes the way it displays entries according to the needs of a chapter. It would help to have a cross-platform search-and-change tool, ideally working not with one keyword but with all the relevant data for an entry in the book.
An example of an entry in the booklet:
BESCHUTTE WERKPLAATS BOUCHOUT
Neromstraat 26 • 1861 Wolvertem • Tel 02-272 42 80 • Fax 02-269 85 03 • Gsm 0484-101 484 • E-mail info@bwbouchout.be • Website www.bwbouchout.be • Werkdagen: 8u - 16u30, vrijdag tot 14u45.
Personen met een fysieke en/of verstandelijke handicap. Ook psychiatrische patiënten en mensen met een meervoudige handicap.
Capaciteit: 180 tewerkstellingsplaatsen.
One problem: the mobile phone number is written in a different format than in the database. The database would say 0484 10 14 84; the book says 0484-101 484.
The opening times are formulated completely differently, but some of it is similar.
Are there tools which would make life easier? Tools that could find similar data, something like a 'similar data finder' for Excel, but cross-platform and with more possibilities? I believe most data-exchange programs work very much 'one way, the same for every entry'. Is there a program which is more flexible?
For clarity: I need to compare the data, not to generate the data out of the database.
It could mean saving a lot of time, money and eyestrain. Thanks,
Erik Willekens
Erik,
The specific problem of comparing two telephone numbers which are formatted differently is relatively easy to overcome by stripping all non-numeric characters.
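For example, a quick Python sketch of that normalisation, using the two numbers from the question:
import re

def normalise_phone(value):
    # Drop everything that is not a digit, so formatting differences disappear.
    return re.sub(r'\D', '', value)

print(normalise_phone('0484 10 14 84') == normalise_phone('0484-101 484'))  # True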
However, I don't think that's really what you are trying to achieve. I believe you're attempting to check whether the booklet data differs from the database data while disregarding certain formatting.
Realistically this isn't possible without some very well-defined rules on the formatting. For instance, formatting of the organisation name is probably very significant, whereas telephone number formatting is not.
Instead you should be tracking changes within the database and then manually check the booklet.
One possible solution is to store the booklet details for each record in your database alongside the correctly formatted ones. This allows you to perform a manual conversion once for the entire booklet and then each subsequent year lets you just compare the new booklet values to the old booklet values stored in the DB.
An example might make this clearer. Imagine you had this very simple record:
Org Name     Booklet Org Name    GSM              Booklet GSM
---------    ----------------    --------------   -------------
BESCHUTTE    BESCHUTTE WERKP     0484 10 14 84    0484-101 484
When you get next year's booklet, then as long as the GSM number in the new booklet still says 0484-101 484 you won't have to worry about converting it to your database format and then checking to see if it has changed.
This would not be a good approach if a large proportion of the details in the booklet changed each year.
