requests.get() returns a response whose content is of type bytes. It looks like:
b'{"Close":8506.25,"DownTicks":164,"DownVolume":207,"High":8508.25,"Low":8495.00,"Open":8496.75,"Status":13,"TimeStamp":"\\/Date(1583530800000)\\/","TotalTicks":371,"TotalVolume":469,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":207,"UpVolume":262,"OpenInterest":0}\r\n{"Close":8503.00,"DownTicks":152,"DownVolume":203,"High":8509.50,"Low":8502.00,"Open":8506.00,"Status":13,"TimeStamp":"\\/Date(1583531100000)\\/","TotalTicks":282,"TotalVolume":345,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":130,"UpVolume":142,"OpenInterest":0}\r\n{"Close":8494.00,"DownTicks":160,"DownVolume":206,"High":8505.75,"Low":8492.75,"Open":8503.25,"Status":13,"TimeStamp":"\\/Date(1583531400000)\\/","TotalTicks":275,"TotalVolume":346,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":115,"UpVolume":140,"OpenInterest":0}\r\n{"Close":8499.00,"DownTicks":136,"DownVolume":192,"High":8500.25,"Low":8492.25,"Open":8493.75,"Status":13,"TimeStamp":"\\/Date(1583531700000)\\/","TotalTicks":299,"TotalVolume":402,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":163,"UpVolume":210,"OpenInterest":0}\r\n{"Close":8501.75,"DownTicks":176,"DownVolume":314,"High":8508.25,"Low":8495.75,"Open":8498.50,"Status":536870941,"TimeStamp":"\\/Date(1583532000000)\\/","TotalTicks":340,"TotalVolume":510,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":164,"UpVolume":196,"OpenInterest":0}\r\nEND'
Please note that while the actual string is much longer, it is always a long string of shorter strings separated by '\r\n', ignoring the final word "END". You can see how similarly-structured these short strings are:
for i in response.text.split('\r\n')[:-1]: print(i, '\n\n')
{"Close":8506.25,"DownTicks":164,"DownVolume":207,"High":8508.25,"Low":8495.00,"Open":8496.75,"Status":13,"TimeStamp":"\/Date(1583530800000)\/","TotalTicks":371,"TotalVolume":469,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":207,"UpVolume":262,"OpenInterest":0}
{"Close":8503.00,"DownTicks":152,"DownVolume":203,"High":8509.50,"Low":8502.00,"Open":8506.00,"Status":13,"TimeStamp":"\/Date(1583531100000)\/","TotalTicks":282,"TotalVolume":345,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":130,"UpVolume":142,"OpenInterest":0}
{"Close":8494.00,"DownTicks":160,"DownVolume":206,"High":8505.75,"Low":8492.75,"Open":8503.25,"Status":13,"TimeStamp":"\/Date(1583531400000)\/","TotalTicks":275,"TotalVolume":346,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":115,"UpVolume":140,"OpenInterest":0}
{"Close":8499.00,"DownTicks":136,"DownVolume":192,"High":8500.25,"Low":8492.25,"Open":8493.75,"Status":13,"TimeStamp":"\/Date(1583531700000)\/","TotalTicks":299,"TotalVolume":402,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":163,"UpVolume":210,"OpenInterest":0}
{"Close":8501.75,"DownTicks":176,"DownVolume":314,"High":8508.25,"Low":8495.75,"Open":8498.50,"Status":536870941,"TimeStamp":"\/Date(1583532000000)\/","TotalTicks":340,"TotalVolume":510,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":164,"UpVolume":196,"OpenInterest":0}
Goal: parse a few of the fields and save them in a dataframe with the "TimeStamp" field as the dataframe's index.
What I have done:
import ast
import datetime
import pandas as pd

response_text = response.text
df = pd.DataFrame(columns=['o', 'h', 'l', 'c', 'v'])
for i in response_text.split('\r\n')[:-1]:
    i_dict = ast.literal_eval(i)
    # extract the epoch milliseconds from "\/Date(1583530800000)\/"
    epoch_in_milliseconds = int(i_dict['TimeStamp'].split('(')[1].split(')')[0])
    time_stamp = datetime.datetime.fromtimestamp(float(epoch_in_milliseconds) / 1000.)
    o = i_dict['Open']
    h = i_dict['High']
    l = i_dict['Low']
    c = i_dict['Close']
    v = i_dict['TotalVolume']
    temp_df = pd.DataFrame({'o': o, 'h': h, 'l': l, 'c': c, 'v': v}, index=[time_stamp])
    df = df.append(temp_df)
which gets me:
In [546]: df
Out[546]:
o h l c v
2020-03-06 16:40:00 8496.75000 8508.25000 8495.00000 8506.25000 469
2020-03-06 16:45:00 8506.00000 8509.50000 8502.00000 8503.00000 345
2020-03-06 16:50:00 8503.25000 8505.75000 8492.75000 8494.00000 346
2020-03-06 16:55:00 8493.75000 8500.25000 8492.25000 8499.00000 402
2020-03-06 17:00:00 8498.50000 8508.25000 8495.75000 8501.75000 510
which is exactly what I need.
Issue: this method feels clumsy to me, like patch-work, and prone to breaking due to possible slight differences in the response text.
Is there any more robust and faster way of extracting this information from the original bytes? (When the server response is in JSON format, I have none of this headache)
This is somewhat cleaner, I believe:
ts = """
b'{"Close":8506.25,"DownTicks":164,"DownVolume":207,"High":8508.25,"Low":8495.00,"Open":8496.75,"Status":13,"TimeStamp":"\\/Date(1583530800000)\\/","TotalTicks":371,"TotalVolume":469,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":207,"UpVolume":262,"OpenInterest":0}\r\n{"Close":8503.00,"DownTicks":152,"DownVolume":203,"High":8509.50,"Low":8502.00,"Open":8506.00,"Status":13,"TimeStamp":"\\/Date(1583531100000)\\/","TotalTicks":282,"TotalVolume":345,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":130,"UpVolume":142,"OpenInterest":0}\r\n{"Close":8494.00,"DownTicks":160,"DownVolume":206,"High":8505.75,"Low":8492.75,"Open":8503.25,"Status":13,"TimeStamp":"\\/Date(1583531400000)\\/","TotalTicks":275,"TotalVolume":346,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":115,"UpVolume":140,"OpenInterest":0}\r\n{"Close":8499.00,"DownTicks":136,"DownVolume":192,"High":8500.25,"Low":8492.25,"Open":8493.75,"Status":13,"TimeStamp":"\\/Date(1583531700000)\\/","TotalTicks":299,"TotalVolume":402,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":163,"UpVolume":210,"OpenInterest":0}\r\n{"Close":8501.75,"DownTicks":176,"DownVolume":314,"High":8508.25,"Low":8495.75,"Open":8498.50,"Status":536870941,"TimeStamp":"\\/Date(1583532000000)\\/","TotalTicks":340,"TotalVolume":510,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":164,"UpVolume":196,"OpenInterest":0}\r\nEND'
"""
import pandas as pd
from datetime import datetime
import json

data = []
tss = ts.replace("b'", "").replace("\r\nEND'", "")
tss2 = tss.strip().split("\r\n")
for t in tss2:
    item = json.loads(t)
    # pull the epoch milliseconds out of "\/Date(...)\/"
    epo = int(item['TimeStamp'].split('(')[1].split(')')[0])
    eims = datetime.fromtimestamp(epo / 1000)
    item.update(TimeStamp=eims)
    data.append(item)
pd.DataFrame(data)
Output:
Close DownTicks DownVolume High Low Open Status TimeStamp TotalTicks TotalVolume UnchangedTicks UnchangedVolume UpTicks UpVolume OpenInterest
0 8506.25 164 207 8508.25 8495.00 8496.75 13 2020-03-06 16:40:00 371 469 0 0 207 262 0
1 8503.00 152 203 8509.50 8502.00 8506.00 13 2020-03-06 16:45:00 282 345 0 0 130 142 0
etc. You can drop unwanted columns, change column names and so on.
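For example, a minimal sketch of trimming the frame down to just the fields from the question, with TimeStamp as the index (the short column names are only an illustrative choice):
df = pd.DataFrame(data)
# keep only the OHLCV fields and index the rows by timestamp
df = df.set_index('TimeStamp')[['Open', 'High', 'Low', 'Close', 'TotalVolume']]
df.columns = ['o', 'h', 'l', 'c', 'v']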
That's almost JSON format. Or more precisely, it's a series of lines, each of which contains a JSON-formatted object. (Except for the last one.) So that suggests that the optimal solution in some way uses the json module.
json.load doesn't handle files consisting of a series of lines, nor does it directly handle individual strings (much less bytes). However, you're not limited to json.load. You can construct a JSONDecoder object, which does include methods to parse from strings (but not from bytes), and you can use the decode method of the bytes object to construct a string from the input. (You need to know the encoding to do that, but I strongly suspect that in this case all the characters are ascii, so either 'ascii' or the default UTF-8 encoding will work.)
Unless your input is gigabytes, you can just use the strategy in your question: split the input into lines, discard the END line, and pass the rest into a JSONDecoder:
import json

decoder = json.JSONDecoder()
# Using splitlines seemed more robust than counting on a specific line-end
for line in response_text.decode().splitlines():
    # Alternative: use a try/except around the call to decoder.decode
    if line == 'END':
        break
    line_dict = decoder.decode(line)
    # Handle the TimeStamp member and create the dataframe item
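To finish that sketch, here is one possible way (assuming response_text holds the raw bytes, and reusing the timestamp handling from the question) to turn the decoded lines into a dataframe indexed by TimeStamp:
import json
from datetime import datetime
import pandas as pd

decoder = json.JSONDecoder()
rows = []
for line in response_text.decode().splitlines():
    if line == 'END':
        break
    line_dict = decoder.decode(line)
    # "\/Date(1583530800000)\/" -> epoch milliseconds -> datetime
    epoch_ms = int(line_dict['TimeStamp'].split('(')[1].split(')')[0])
    line_dict['TimeStamp'] = datetime.fromtimestamp(epoch_ms / 1000)
    rows.append(line_dict)
df = pd.DataFrame(rows).set_index('TimeStamp')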
I am reading large files in Fortran that contain mixed string/numeric data such as:
114 MIDSIDE 0 0 O0002 436 437 584 438
115 SURFACE M00002 0 0 359 561 560 356
412236 SOLID M00002 O00001 0 86157 82419 82418 79009
Currently, each line is read as a string and then post-processed to identify the proper terms. I was wondering if there is any way to read each line as an integer followed by four strings separated by space, and then some more integers; i.e. similar to '(I10,4(A6,X),4I10)' format, but without any information on the size of each string.
Does not work (charr is empty, iarr(2:5)=0):
INTEGER IARR(5)
CHARACTER*30 CHARR(4)
C open the file with ID=1
READ(1,*)IARR(1),(CHARR(I),I=1,4),(IARR(I),I=2,5)
Works (only for the last line in the data example):
INTEGER IARR(5)
CHARACTER*30 CHARR(4)
C open the file with ID=1
READ(1,'(I10,4(A7,X),4I10)')IARR(1),(CHARR(I),I=1,4),(IARR(I),I=2,5)
The issue is I don't know a priori what the size of each string would be.
I actually found out the f77rtl flag was used to compile the project, and when I removed the flag, the issue was resolved. So the list-directed input format works just fine.
Each line contains a special timestamp, the caller number, the receiver number, the duration of the call in seconds, and the rate per minute in cents at which this call was charged, all separated by ";". The file contains thousands of calls and looks like this. I created a list instead of a dictionary to access the elements, but I'm not sure how to count the number of calls originating from the phone in question.
timestamp;caller;receiver;duration;rate per minute
1419121426;7808907654;7807890123;184;0.34
1419122593;7803214567;7801236789;46;0.37
1419122890;7808907654;7809876543;225;0.31
1419122967;7801234567;7808907654;419;0.34
1419123462;7804922860;7809876543;782;0.29
1419123914;7804321098;7801234567;919;0.34
1419125766;7807890123;7808907654;176;0.41
1419127316;7809876543;7804321098;471;0.31
The desired output looks something like this:
|Phone number  || # |Duration | Due    |
+--------------++---+---------+--------+
|(780) 123 4567||384|55h07m53s|$ 876.97|
|(780) 123 6789||132|17h53m19s|$ 288.81|
|(780) 321 4567||363|49h52m12s|$ 827.48|
|(780) 432 1098||112|16h05m09s|$ 259.66|
|(780) 492 2860||502|69h27m48s|$1160.52|
|(780) 789 0123||259|35h56m10s|$ 596.94|
|(780) 876 5432||129|17h22m32s|$ 288.56|
|(780) 890 7654||245|33h48m46s|$ 539.41|
|(780) 987 6543||374|52h50m11s|$ 883.72|
list = [i.strip().split(";") for i in open("calls.txt", "r")]
print(list)
I have a very simple solution for your issue:
First of all, use with when opening the file - it's a handy shortcut and it provides the same functionality as wrapping the call in try...finally. Consider this:
lines = []
with open("test.txt", "r") as f:
    for line in f.readlines():
        lines.append(line.strip().split(";"))
print(lines)
counters = {}
# you browse through lists and later through numbers inside lists
for line in lines:
    for number in line:
        # very basic way to count occurrences
        if number not in counters:
            counters[number] = 1
        else:
            counters[number] += 1
# in this condition you can tell what number of digits you accept
# (note: the 10-digit timestamps and the header words also pass this length
#  check, so filter by column position if you only want phone numbers)
counters = {elem: counters[elem] for elem in counters.keys() if len(elem) > 5}
print(counters)
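From there, a small sketch of looking up one specific number (using a number that appears in the sample data); keep in mind that counters built this way count appearances as either caller or receiver:
target = "7808907654"
print(target, "appears", counters.get(target, 0), "times")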
This should get you started
import csv
import collections

Call = collections.namedtuple("Call", "duration rate time")

calls = {}
with open('path/to/file') as infile:
    reader = csv.reader(infile, delimiter=';')
    next(reader)  # skip the header row
    for time, nofrom, noto, dur, rate in reader:
        # group calls by caller, then by receiver
        calls.setdefault(nofrom, {}).setdefault(noto, []).append(Call(dur, rate, time))

for nofrom, logs in calls.items():
    for noto, callist in logs.items():
        print(nofrom, "called", noto, len(callist), "times")
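If all you need is the number of calls originating from one phone, a small follow-up sketch using the same calls structure (again with a number taken from the sample data):
number = "7808907654"
total = sum(len(callist) for callist in calls.get(number, {}).values())
print(number, "made", total, "calls")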