I'm processing time series from some instruments, and the numbers came in different patterns.
Sometimes I have the time as year, month, day, hour, etc., and I create the datetime array/list directly. When I print one element of the array (or list), I get something like this
datetime.datetime(2018, 4, 6, 12, 0, 0)
2018-04-06 12:00:00
And when I use it with matplotlib.pcolormesh, it works.
However, now I have the time as 'seconds since 1970/1/1'. My first try was doing this
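Roughly like this (a sketch; assume time_sec is the NumPy array of raw epoch seconds):

import numpy as np
import matplotlib.dates as mdates

# time_sec: seconds since 1970-01-01 (hypothetical values)
time_sec = np.array([1522995200.0, 1522998800.0, 1523002400.0])

# epoch seconds -> matplotlib date numbers -> datetime objects
# (note: mdates.epoch2num is deprecated in Matplotlib >= 3.5)
time_dt = mdates.num2date(mdates.epoch2num(time_sec))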
And I can use the array time_dt with plt.plot(), but when I use it with plt.pcolormesh() I got
TypeError: Incompatible X, Y inputs to pcolormesh; see help(pcolormesh)
After checking and re-checking everything, the only difference was the way I create the datetime array using matplotlib.dates, I guess. When I create a time list using this
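Something along these lines (again a sketch, reusing time_sec from above):

import datetime

# build plain (naive) datetime objects directly from the epoch seconds
time_dt2 = [datetime.datetime(1970, 1, 1) + datetime.timedelta(seconds=float(s)) for s in time_sec]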
everything goes fine! In the present case, I can go this way. But in other cases I can't create the time array and must convert from the files from variable sources... I need to understand what's going on.
What's happening? What am I missing?
Working further with this data, I found another strange behavior!
I have the time_dt, created with matplotlib.dates.num2date()
and I have the time_dt2, created just with datetime.datetime()
Making a plot for the same variable with the two time arrays, I got this
So, it seems that time_dt and time_dt2 are about the same.
When I do this
This is my expected result from the beginning. But... when I do this
The pcolormesh is gone, no error! Quite weird...
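One concrete difference I can point at (a sketch, not a definitive diagnosis): matplotlib.dates.num2date() returns timezone-aware datetime objects, while datetime.datetime() builds naive ones, so the two arrays are not quite the same kind of object:

import datetime
import matplotlib.dates as mdates

d1 = mdates.num2date(mdates.date2num(datetime.datetime(2018, 4, 6, 12)))
d2 = datetime.datetime(2018, 4, 6, 12)

print(d1.tzinfo)  # UTC  -> aware datetime from num2date
print(d2.tzinfo)  # None -> naive datetime from datetime.datetime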
I have a timestamp embedded in some JSON data as a string, for ease of inspection and modification. An example looks like this:
"debug_time": 1670238819.9747384,
"last_saved": "2022-12-05 11:13:39.974725 UTC",
When loaded back in, I need to convert it back to a float for comparison against time.time() and similar values. However, I can't seem to find the magic incantation to restore the correct value.
In restoring the JSON data, I attempt to convert the string to a float via strptime() like this:
loaded_time = datetime.datetime.strptime(obj.last_saved, '%Y-%m-%d %H:%M:%S.%f %Z')
This does restore the timestamp to a valid datetime object; however, calling .tzname() returns None, and my attempts to use loaded_time.replace(tzinfo=zoneinfo.ZoneInfo('UTC')) have not yielded any useful results.
In short, emitting loaded_time.timestamp() yields 1670267619.974725, which is 8 hours ahead of what it should be. I've tried using .astimezone(), in various permutations, but can't find a way to correctly have it convert to the client's local time.
I even tried to hard-code my own time zone, 'US/Pacific', but it stubbornly refuses to give me that original debug_time value back.
This doesn't seem like it should be a difficult problem, but clearly I'm misunderstanding something about how python 3's time handling works. Any ideas are welcome!
Thank you for your time!
You have to use the built-in replace function, like
.strftime("%Y-%m-%dT%H:%M:%S.%f%Z").replace("UTC", "")
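For the parsing side, a minimal sketch of the round trip, assuming the stored string is always UTC. The key point is that %Z matches the literal "UTC" but leaves the datetime naive, so .timestamp() interprets it in local time; attaching the timezone explicitly removes the 8-hour offset:

import datetime

s = "2022-12-05 11:13:39.974725 UTC"

# strptime accepts "UTC" for %Z but returns a *naive* datetime (tzname() is None)
naive = datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f %Z")

# attach UTC explicitly before converting back to an epoch float
aware = naive.replace(tzinfo=datetime.timezone.utc)
print(aware.timestamp())  # 1670238819.974725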
I'm studying how to work with data right now, so I'm following along with a tutorial on working with time series data. Among the first things the author does is call read_csv on the file path and then use squeeze=True to read it as a Series. Unfortunately (and as you probably know), squeeze has been deprecated from read_csv.
I've been reading the documentation to figure out how to read a CSV as a Series, and everything I try fails. The documentation itself says to use pd.read_csv('filename').squeeze('columns'), but when I check the type afterward, it is always still a DataFrame.
I've looked up various other methods online, but none of them seem to work. I'm doing this on a Jupyter Notebook using Python3 (which the tutorial uses as well).
If anyone has any insights into why I cannot change the type in this way, I would appreciate it. I'm not sure if I've misunderstood the tutorial altogether or if I'm not understanding the documentation.
I do literally type .squeeze("columns") when I write this out, because when I write a column name or index instead, it fails completely. Am I doing that correctly? Is this the correct method, or am I missing a better one?
Thanks for the help!
shampoo = pd.read_csv('shampoo_with_exog.csv', index_col=[0], parse_dates=True).squeeze("columns")
I would start with this...
# Change the stuff between the '' to the entire file path of where your CSV is located.
df = pd.read_csv(r'c:\user\documents\shampoo_with_exog.csv')
To start, this will name your DataFrame df, which is kind of the unspoken industry standard, the same as pd for pandas.
Additionally, this lets you use a "raw" string (the r prefix), which makes it easier to insert Windows directories into your Python code.
Once you are able to run this successfully, you can simply put df in a separate cell in Jupyter; this will show you what the data from your CSV looks like. Once you have done all that, you can start manipulating your data. While you can use the fancy options in pd.read_csv(), I mostly just load the data and manipulate it from the code itself. Obviously there are reasons not to stop at a plain pd.read_csv(), but as you progress you can start adding things here and there. I almost never use squeeze, although I'm sure there will be those here who comment on how "essential" it is for whatever the specific case might be.
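For what it's worth, a minimal sketch of when .squeeze("columns") actually collapses to a Series (assuming shampoo_with_exog.csv has exactly one data column besides the index):

import pandas as pd

# with the first column consumed as the index, one data column remains,
# so squeeze("columns") collapses the DataFrame into a Series
s = pd.read_csv('shampoo_with_exog.csv', index_col=[0], parse_dates=True).squeeze("columns")
print(type(s))  # <class 'pandas.core.series.Series'>

# without index_col, two or more columns remain and
# squeeze("columns") returns the DataFrame unchanged
df = pd.read_csv('shampoo_with_exog.csv')
print(type(df.squeeze("columns")))  # <class 'pandas.core.frame.DataFrame'>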
Long time listener, first time caller. I am new to Python, about three days in, and I cannot figure out for the life of me what is happening in this particular instance.
I brought in an XLSX file as a DataFrame called dfInvoice. I want to use groupby on two columns (indexes?), but something funky is happening, I think. I can't see my new grouped DataFrame with the code below.
uniqueLocation = dfInvoice.groupby(['Location ID','Location'])
When I call uniqueLocation, all that is returned is this:
<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x000001B9A1C61198>
I have two questions from here.
1) What the heck is going on? I followed these steps almost identically to this tutorial (https://www.geeksforgeeks.org/python-pandas-dataframe-groupby).
2) What should I refer to this string of text between the angle brackets as? I didn't know how to search for what was happening because I don't exactly understand what this return value is.
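For context, a small sketch of the behavior (column names taken from the snippet above, data invented): groupby returns a lazy DataFrameGroupBy object, and a new frame only appears once you apply an aggregation to it:

import pandas as pd

dfInvoice = pd.DataFrame({
    'Location ID': [1, 1, 2],
    'Location': ['North', 'North', 'South'],
    'Amount': [10.0, 20.0, 5.0],
})

uniqueLocation = dfInvoice.groupby(['Location ID', 'Location'])
print(uniqueLocation)  # <pandas...DataFrameGroupBy object at 0x...>, nothing computed yet

# an aggregation materializes a result
print(uniqueLocation['Amount'].sum())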
I want to use only the first email if multiple are added. I made a function that looks for ','; if it finds one, it displays a message and returns the first email.
For some strange reason, it seems to loop through the DataFrame twice when using 'applymap', because it prints the message twice.
When I use the 'apply' function on the Series, it prints the message once, as expected. Any idea why this discrepancy?
From the documentation, version 0.25.0, I quote:
Notes
In the current implementation applymap calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first column/row.
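A minimal sketch reproducing the effect (hypothetical function and data, matching the note above):

import pandas as pd

def first_email(value):
    # keep only the first address when several are comma-separated
    if isinstance(value, str) and ',' in value:
        print('multiple emails found:', value)
        return value.split(',')[0].strip()
    return value

df = pd.DataFrame({'email': ['a@x.com, b@y.com']})

df.applymap(first_email)        # message printed twice: func is called twice on the first column
df['email'].apply(first_email)  # message printed once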
An external R library uses the as.character function to convert objects of different classes to strings. That works fine except for my date objects (classes "POSIXlt" and "POSIXt"): normally the output is like "2010-11-04 10:43:00" (which is perfect), but whenever the time is 00:00:00 (midnight) the time component is omitted and only the date component is shown, like "2010-11-04". For further processing I need a consistent output format, so the time component should be displayed in every case.
I can't simply use the format function because the external library makes the call. So I thought that overwriting the as.character function for the classes "POSIXlt" and "POSIXt" could be a solution, but I don't know how. Other ideas are welcome :)
You can overwrite the as.character method for POSIXct objects simply by creating your own.
# a fixed format string keeps the time component even at midnight
as.character.POSIXct <- function(x, format = "%Y-%m-%d %H:%M:%S", ...)
  format(x, format = format, ...)
In this case though, there is no existing as.character.POSIXct so you're not actually overwriting anything. You are, however, overriding the default as.character.POSIXt method, which is what would be called in the absence of a POSIXct method.