Python3: How to increment a string value within a "for" loop - string

I have a tabular text file (named "xfile"). An example of its contents is attached below.
Scaffold2_1 WP_017805071.1 26.71 161 97
Scaffold2_1 WP_006995572.1 26.36 129 83
Scaffold2_1 WP_005723576.1 26.92 130 81
Scaffold3_1 WP_009894856.1 25.77 245 43
Scaffold8_1 WP_017805071.1 38.31 248 145
Scaffold8_1 WP_006995572.1 38.55 249 140
Scaffold8_1 WP_005723576.1 34.88 258 139
Scaffold9_1 WP_005645255.1 42.54 446 144
Note how each line begins with Scaffold(y)_1, with y being a number. I have written the following code to print each line beginning with the following terms, Scaffold2 and Scaffold8.
with open("xfile", 'r') as data:
    for line in data.readlines():
        if "Scaffold2" in line:
            a = line
            print(a)
        elif "Scaffold8" in line:
            b = line
            print(b)
I was wondering, is there a way you would recommend incrementing the (y) portion of Scaffold() in the if and elif statements?
The idea would be to allow the script to search for each line containing "Scaffold(y)" and storing each line with a specific number (y) in its own variable to be then printed. This would obviously be much faster than entering in each number manually.

You can try this; it is quite a bit easier than using regex. If this isn't what you expect, let me know and I'll change the code.
with open("xfile") as data:
    for line in data.readlines():
        if line[0:8] == "Scaffold" and line[8].isdigit():
            print(line)
I'm just checking the 9th character of each line (index 8). If it's a digit, I print the line. As you said, I print whenever your "y" is a digit. I'm not incrementing anything myself, because the for loop already steps through the lines for you.

OK, it seems like you want to get something in a format like:
entries = {y1: ['Scaffold(y1)_...', 'Scaffold(y1)_...'], y2: ['Scaffold(y2)_...', 'Scaffold(y2)_...'], ...}
Then you can do something like this (I assume all of your lines start in the same manner as you have shown, so the y value is always at index 8 of the string; note that this only covers single-digit y values):
entries = dict()
with open("xfile") as data:
    for line in data.readlines():
        line = line.rstrip("\n")  # drop the trailing newline before storing
        if line[8] not in entries:
            entries[line[8]] = [line]
        else:
            entries[line[8]].append(line)
print(entries)
This way you will have a dictionary in the format I have shown you above. Output:
{'2': ['Scaffold2_1 WP_017805071.1 26.71 161 97', 'Scaffold2_1 WP_006995572.1 26.36 129 83', 'Scaffold2_1 WP_005723576.1 26.92 130 81'], '3': ['Scaffold3_1 WP_009894856.1 25.77 245 43'], '8': ['Scaffold8_1 WP_017805071.1 38.31 248 145', 'Scaffold8_1 WP_006995572.1 38.55 249 140', 'Scaffold8_1 WP_005723576.1 34.88 258 139'], '9': ['Scaffold9_1 WP_005645255.1 42.54 446 144']}
EDIT: To be honest, I still don't fully understand why you would need that, though.
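If the scaffold numbers can have more than one digit, indexing a single character is no longer enough. A minimal sketch of the same grouping idea using a regular expression follows. The names `SCAFFOLD_RE` and `group_by_scaffold` are made up for illustration, and the `Scaffold12_1` row is a hypothetical line added only to show a two-digit number; it is not from the original file.

```python
import re
from collections import defaultdict

# Captures the digits between "Scaffold" and the following underscore.
SCAFFOLD_RE = re.compile(r"^Scaffold(\d+)_")


def group_by_scaffold(lines):
    """Group lines by the integer that follows 'Scaffold' at the line start."""
    groups = defaultdict(list)
    for line in lines:
        match = SCAFFOLD_RE.match(line)
        if match:  # lines that do not start with Scaffold<digits>_ are skipped
            groups[int(match.group(1))].append(line.rstrip("\n"))
    return dict(groups)


sample = [
    "Scaffold2_1 WP_017805071.1 26.71 161 97\n",
    "Scaffold2_1 WP_006995572.1 26.36 129 83\n",
    "Scaffold3_1 WP_009894856.1 25.77 245 43\n",
    "Scaffold12_1 WP_000000000.1 30.00 100 50\n",  # hypothetical two-digit example
]

groups = group_by_scaffold(sample)
for number in sorted(groups):
    print(number, groups[number])
```

With a real file, passing the open file object (for example `group_by_scaffold(open("xfile"))` inside a `with` block) should work the same way, since the function only iterates over lines.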

Related

Parsing a http response that is in bytes format

requests.get() receives a response which is of the type bytes. It looks like:
b'{"Close":8506.25,"DownTicks":164,"DownVolume":207,"High":8508.25,"Low":8495.00,"Open":8496.75,"Status":13,"TimeStamp":"\\/Date(1583530800000)\\/","TotalTicks":371,"TotalVolume":469,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":207,"UpVolume":262,"OpenInterest":0}\r\n{"Close":8503.00,"DownTicks":152,"DownVolume":203,"High":8509.50,"Low":8502.00,"Open":8506.00,"Status":13,"TimeStamp":"\\/Date(1583531100000)\\/","TotalTicks":282,"TotalVolume":345,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":130,"UpVolume":142,"OpenInterest":0}\r\n{"Close":8494.00,"DownTicks":160,"DownVolume":206,"High":8505.75,"Low":8492.75,"Open":8503.25,"Status":13,"TimeStamp":"\\/Date(1583531400000)\\/","TotalTicks":275,"TotalVolume":346,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":115,"UpVolume":140,"OpenInterest":0}\r\n{"Close":8499.00,"DownTicks":136,"DownVolume":192,"High":8500.25,"Low":8492.25,"Open":8493.75,"Status":13,"TimeStamp":"\\/Date(1583531700000)\\/","TotalTicks":299,"TotalVolume":402,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":163,"UpVolume":210,"OpenInterest":0}\r\n{"Close":8501.75,"DownTicks":176,"DownVolume":314,"High":8508.25,"Low":8495.75,"Open":8498.50,"Status":536870941,"TimeStamp":"\\/Date(1583532000000)\\/","TotalTicks":340,"TotalVolume":510,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":164,"UpVolume":196,"OpenInterest":0}\r\nEND'
Please note that while the actual string is much longer, it is always a long string of shorter strings separated by '\r\n', ignoring the final word "END". You can see how similarly-structured these short strings are:
for i in response.text.split('\r\n')[:-1]: print(i, '\n\n')
{"Close":8506.25,"DownTicks":164,"DownVolume":207,"High":8508.25,"Low":8495.00,"Open":8496.75,"Status":13,"TimeStamp":"\/Date(1583530800000)\/","TotalTicks":371,"TotalVolume":469,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":207,"UpVolume":262,"OpenInterest":0}
{"Close":8503.00,"DownTicks":152,"DownVolume":203,"High":8509.50,"Low":8502.00,"Open":8506.00,"Status":13,"TimeStamp":"\/Date(1583531100000)\/","TotalTicks":282,"TotalVolume":345,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":130,"UpVolume":142,"OpenInterest":0}
{"Close":8494.00,"DownTicks":160,"DownVolume":206,"High":8505.75,"Low":8492.75,"Open":8503.25,"Status":13,"TimeStamp":"\/Date(1583531400000)\/","TotalTicks":275,"TotalVolume":346,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":115,"UpVolume":140,"OpenInterest":0}
{"Close":8499.00,"DownTicks":136,"DownVolume":192,"High":8500.25,"Low":8492.25,"Open":8493.75,"Status":13,"TimeStamp":"\/Date(1583531700000)\/","TotalTicks":299,"TotalVolume":402,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":163,"UpVolume":210,"OpenInterest":0}
{"Close":8501.75,"DownTicks":176,"DownVolume":314,"High":8508.25,"Low":8495.75,"Open":8498.50,"Status":536870941,"TimeStamp":"\/Date(1583532000000)\/","TotalTicks":340,"TotalVolume":510,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":164,"UpVolume":196,"OpenInterest":0}
Goal: parse a few of the fields and save them in a dataframe, with the field "TimeStamp" as the dataframe's index.
What I have done:
import ast
import datetime
import pandas as pd

response_text = response.text
df = pd.DataFrame(columns = [ 'o', 'h', 'l', 'c', 'v'])
for i in response_text.split('\r\n')[:-1]:
    i_dict = ast.literal_eval(i)
    # TimeStamp looks like "\/Date(1583530800000)\/": the epoch in milliseconds
    epoch_in_milliseconds = int(i_dict['TimeStamp'].split('(')[1].split(')')[0])
    time_stamp = datetime.datetime.fromtimestamp(float(epoch_in_milliseconds)/1000.)
    o = i_dict['Open']
    h = i_dict['High']
    l = i_dict['Low']
    c = i_dict['Close']
    v = i_dict['TotalVolume']
    temp_df = pd.DataFrame({'o':o, 'h':h, 'l':l, 'c':c, 'v':v}, index = [time_stamp])
    # DataFrame.append was removed in pandas 2.0; there, use pd.concat([df, temp_df])
    df = df.append(temp_df)
which gets me:
In [546]: df
Out[546]:
                              o           h           l           c    v
2020-03-06 16:40:00  8496.75000  8508.25000  8495.00000  8506.25000  469
2020-03-06 16:45:00  8506.00000  8509.50000  8502.00000  8503.00000  345
2020-03-06 16:50:00  8503.25000  8505.75000  8492.75000  8494.00000  346
2020-03-06 16:55:00  8493.75000  8500.25000  8492.25000  8499.00000  402
2020-03-06 17:00:00  8498.50000  8508.25000  8495.75000  8501.75000  510
which is exactly what I need.
Issue: this method feels clumsy to me, like patchwork, and prone to breaking if the response text differs even slightly.
Is there any more robust and faster way of extracting this information from the original bytes? (When the server response is in JSON format, I have none of this headache)
This is somewhat cleaner, I believe:
ts = """
b'{"Close":8506.25,"DownTicks":164,"DownVolume":207,"High":8508.25,"Low":8495.00,"Open":8496.75,"Status":13,"TimeStamp":"\\/Date(1583530800000)\\/","TotalTicks":371,"TotalVolume":469,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":207,"UpVolume":262,"OpenInterest":0}\r\n{"Close":8503.00,"DownTicks":152,"DownVolume":203,"High":8509.50,"Low":8502.00,"Open":8506.00,"Status":13,"TimeStamp":"\\/Date(1583531100000)\\/","TotalTicks":282,"TotalVolume":345,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":130,"UpVolume":142,"OpenInterest":0}\r\n{"Close":8494.00,"DownTicks":160,"DownVolume":206,"High":8505.75,"Low":8492.75,"Open":8503.25,"Status":13,"TimeStamp":"\\/Date(1583531400000)\\/","TotalTicks":275,"TotalVolume":346,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":115,"UpVolume":140,"OpenInterest":0}\r\n{"Close":8499.00,"DownTicks":136,"DownVolume":192,"High":8500.25,"Low":8492.25,"Open":8493.75,"Status":13,"TimeStamp":"\\/Date(1583531700000)\\/","TotalTicks":299,"TotalVolume":402,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":163,"UpVolume":210,"OpenInterest":0}\r\n{"Close":8501.75,"DownTicks":176,"DownVolume":314,"High":8508.25,"Low":8495.75,"Open":8498.50,"Status":536870941,"TimeStamp":"\\/Date(1583532000000)\\/","TotalTicks":340,"TotalVolume":510,"UnchangedTicks":0,"UnchangedVolume":0,"UpTicks":164,"UpVolume":196,"OpenInterest":0}\r\nEND'
"""
import pandas as pd
from datetime import datetime
import json
data = []
tss = ts.replace("b'","").replace("\r\nEND'","")
tss2 = tss.strip().split("\r\n")
for t in tss2:
    item = json.loads(t)
    epo = int(item['TimeStamp'].split('(')[1].split(')')[0])
    eims = datetime.fromtimestamp(epo/1000)
    item.update(TimeStamp=eims)
    data.append(item)
pd.DataFrame(data)
Output:
Close DownTicks DownVolume High Low Open Status TimeStamp TotalTicks TotalVolume UnchangedTicks UnchangedVolume UpTicks UpVolume OpenInterest
0 8506.25 164 207 8508.25 8495.00 8496.75 13 2020-03-06 16:40:00 371 469 0 0 207 262 0
1 8503.00 152 203 8509.50 8502.00 8506.00 13 2020-03-06 16:45:00 282 345 0 0 130 142 0
etc. You can drop unwanted columns, change column names and so on.
That's almost JSON format. Or more precisely, it's a series of lines, each of which contains a JSON-formatted object. (Except for the last one.) So that suggests that the optimal solution in some way uses the json module.
json.load reads a whole file-like object as a single JSON document, so it doesn't handle a file consisting of a series of lines, and it doesn't take individual strings (much less bytes). However, you're not limited to json.load. json.loads parses one string at a time, or you can construct a JSONDecoder object, whose decode method parses from strings (but not from bytes). To get a string from the input, use the decode method of the bytes object. (You need to know the encoding to do that, but I strongly suspect that in this case all the characters are ASCII, so either 'ascii' or the default UTF-8 encoding will work.)
Unless your input is gigabytes, you can just use the strategy in your question: split the input into lines, discard the END line, and pass the rest into a JSONDecoder:
import json
decoder = json.JSONDecoder()
# Using splitlines seemed more robust than counting on a specific line-end
# response.content is the raw bytes of the requests response
for line in response.content.decode().splitlines():
    # Alternative: use a try/except around the call to decoder.decode
    if line == 'END': break
    line_dict = decoder.decode(line)
    # Handle the Timestamp member and create the dataframe item
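For reference, the line-by-line strategy described above could be filled in roughly like this. It is only a sketch: `parse_bars`, `_DATE_RE` and the trimmed `sample` payload are names made up for illustration, it uses the standard library only, and it converts the timestamp to UTC rather than local time (an assumption; the output shown in the question uses local time).

```python
import json
import re
from datetime import datetime, timezone

# Matches the millisecond epoch inside "/Date(1583530800000)/"
# (json decoding turns the escaped "\/" into a plain "/").
_DATE_RE = re.compile(r"/Date\((\d+)\)/")


def parse_bars(raw: bytes) -> list:
    """Parse newline-delimited JSON objects terminated by an END line."""
    rows = []
    for line in raw.decode("utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "END":
            break
        item = json.loads(line)
        match = _DATE_RE.search(item["TimeStamp"])
        if match is None:
            raise ValueError(f"unexpected TimeStamp format: {item['TimeStamp']!r}")
        item["TimeStamp"] = datetime.fromtimestamp(
            int(match.group(1)) / 1000, tz=timezone.utc
        )
        rows.append(item)
    return rows


# Trimmed sample with only a few of the fields from the question's payload.
sample = (
    b'{"Close":8506.25,"Open":8496.75,"TimeStamp":"\\/Date(1583530800000)\\/","TotalVolume":469}\r\n'
    b'{"Close":8503.00,"Open":8506.00,"TimeStamp":"\\/Date(1583531100000)\\/","TotalVolume":345}\r\n'
    b'END'
)

rows = parse_bars(sample)
for row in rows:
    print(row["TimeStamp"].isoformat(), row["Open"], row["Close"], row["TotalVolume"])
```

With pandas installed, `pd.DataFrame(rows).set_index("TimeStamp")` should then give a frame indexed by timestamp, from which the unwanted columns can be dropped.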

Parsing error when reading a specific Pajek (NET) file with Networkx into Jupyter

I am trying to read this Pajek file in Google Colab's version of Jupyter, and I get an error when executing the following very simple code:
J = nx.MultiDiGraph()
J=nx.read_pajek("/content/data/graphdatasets/jazz.net")
print(nx.info(J))
The error is the following:
/usr/local/lib/python3.6/dist-packages/networkx/readwrite/pajek.py in parse_pajek(lines)
211 except AttributeError:
212 splitline = shlex.split(str(l))
--> 213 id, label = splitline[0:2]
214 labels.append(label)
215 G.add_node(label)
ValueError: not enough values to unpack (expected 2, got 1)
With pip show networkx, I see that I'm running Networkx version: 2.3. Am I doing something wrong in the code?
Update: Pasting below the file's first few lines:
*Vertices 198
*Arcs
*Edges
1 8 1
1 24 1
1 35 1
1 42 1
1 46 1
1 60 1
1 74 1
1 78 1
According to the Pajek format definition, the first two lines of your file do not follow the standard. After *Vertices n, n lines with details about the vertices are expected. In addition, having both *Arcs and *Edges is redundant: NetworkX builds a MultiDiGraph for an edge list that starts with *arcs and a MultiGraph for one that starts with *edges (see current code). To resolve your problem, you only need to delete the first two lines of your .net file.
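Instead of editing the file by hand, the same clean-up could be done in code. The sketch below generalizes "delete the first two lines" to "drop every section header that has no data lines under it"; that generalization, and the helper name `drop_empty_sections`, are assumptions made for illustration, not part of NetworkX.

```python
def drop_empty_sections(lines):
    """Remove '*Section' header lines that are not followed by any data line."""
    # Normalize: strip newlines and ignore blank lines.
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    kept = []
    for i, ln in enumerate(lines):
        is_header = ln.startswith("*")
        next_is_header_or_end = i + 1 == len(lines) or lines[i + 1].startswith("*")
        if is_header and next_is_header_or_end:
            continue  # header with nothing under it, e.g. "*Vertices 198" here
        kept.append(ln)
    return kept


# First lines of the file from the question.
sample = ["*Vertices 198", "*Arcs", "*Edges", "1 8 1", "1 24 1", "1 35 1"]

cleaned = drop_empty_sections(sample)
print(cleaned)
```

The cleaned lines could then be handed to `nx.parse_pajek(cleaned)`, which accepts an iterable of lines. Because the *Vertices line is dropped, any vertex that appears in no edge will not be part of the resulting graph.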

Reading strings from file separated by space in Fortran

I am reading large files in Fortran that contain mixed string/numeric data such as:
114 MIDSIDE 0 0 O0002 436 437 584 438
115 SURFACE M00002 0 0 359 561 560 356
412236 SOLID M00002 O00001 0 86157 82419 82418 79009
Currently, each line is read as a string and then post-processed to identify the proper terms. I was wondering if there is any way to read each line as an integer followed by four strings separated by space, and then some more integers; i.e. similar to '(I10,4(A6,X),4I10)' format, but without any information on the size of each string.
Does not work (charr is empty, iarr(2:5)=0):
      INTEGER IARR(5)
      CHARACTER*30 CHARR(4)
C     open the file with ID=1
      READ(1,*)IARR(1),(CHARR(I),I=1,4),(IARR(I),I=2,5)
Works (only for the last line in the data example):
      INTEGER IARR(5)
      CHARACTER*30 CHARR(4)
C     open the file with ID=1
      READ(1,'(I10,4(A7,X),4I10)')IARR(1),(CHARR(I),I=1,4),
     &     (IARR(I),I=2,5)
The issue is I don't know a-priori what would be the size of each string.
I actually found out the f77rtl flag was used to compile the project, and when I removed the flag, the issue was resolved. So the list-directed input format works just fine.

python3: Counting repeated occurrence in a list

Each line contains a special timestamp, the caller number, the receiver number, the duration of the call in seconds and the rate per minute in cents at which this call was charged, all separated by ";". The file contains thousands of calls and looks like this. I created a list instead of a dictionary to access the elements, but I'm not sure how to count the number of calls originating from the phone in question.
timestamp;caller;receiver;duration;rate per minute
1419121426;7808907654;7807890123;184;0.34
1419122593;7803214567;7801236789;46;0.37
1419122890;7808907654;7809876543;225;0.31
1419122967;7801234567;7808907654;419;0.34
1419123462;7804922860;7809876543;782;0.29
1419123914;7804321098;7801234567;919;0.34
1419125766;7807890123;7808907654;176;0.41
1419127316;7809876543;7804321098;471;0.31
| Phone number || # |Duration | Due    |
+--------------++---+---------+--------+
|(780) 123 4567||384|55h07m53s|$ 876.97|
|(780) 123 6789||132|17h53m19s|$ 288.81|
|(780) 321 4567||363|49h52m12s|$ 827.48|
|(780) 432 1098||112|16h05m09s|$ 259.66|
|(780) 492 2860||502|69h27m48s|$1160.52|
|(780) 789 0123||259|35h56m10s|$ 596.94|
|(780) 876 5432||129|17h22m32s|$ 288.56|
|(780) 890 7654||245|33h48m46s|$ 539.41|
|(780) 987 6543||374|52h50m11s|$ 883.72|
list =[i.strip().split(";") for i in open("calls.txt", "r")]
print(list)
I have a very simple solution for your issue:
First of all, use with when opening the file. It's a handy shortcut that gives you the same cleanup as wrapping the code in try...finally: the file is closed even if an error occurs. Consider this:
lines = []
with open("test.txt", "r") as f:
    for line in f.readlines():
        lines.append(line.strip().split(";"))
print(lines)
counters = {}
# you browse through lists and later through numbers inside lists
for line in lines:
    for number in line:
        # very basic way to count occurrences
        if number not in counters:
            counters[number] = 1
        else:
            counters[number] += 1
# in this condition you can tell what number of digits you accept
counters = {elem: counters[elem] for elem in counters.keys() if len(elem) > 5}
print(counters)
This should get you started
import csv
import collections
Call = collections.namedtuple("Call", "duration rate time")
calls = {}
with open('path/to/file') as infile:
    reader = csv.reader(infile, delimiter=';')  # fields are separated by ";"
    next(reader)  # skip the header line
    for time, nofrom, noto, dur, rate in reader:
        # setdefault stores the new dict/list; a plain .get() would discard it
        calls.setdefault(nofrom, {}).setdefault(noto, []).append(Call(dur, rate, time))
for nofrom, logs in calls.items():
    for noto, callist in logs.items():
        print(nofrom, "called", noto, len(callist), "times")
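To get closer to the per-phone summary in the question, the calls can be aggregated by caller only. The following is a rough sketch: `summarize_calls` is a hypothetical helper, the sample is the first three data lines from the question, and the amount due is left out on purpose because the billing rule (for example, how partial minutes are charged) is not stated.

```python
import csv
import io
from collections import defaultdict


def summarize_calls(rows):
    """Count calls and total seconds per caller from (time, caller, receiver, duration, rate) rows."""
    totals = defaultdict(lambda: {"calls": 0, "seconds": 0})
    for _timestamp, caller, _receiver, duration, _rate in rows:
        totals[caller]["calls"] += 1
        totals[caller]["seconds"] += int(duration)
    return dict(totals)


sample = """timestamp;caller;receiver;duration;rate per minute
1419121426;7808907654;7807890123;184;0.34
1419122593;7803214567;7801236789;46;0.37
1419122890;7808907654;7809876543;225;0.31
"""

reader = csv.reader(io.StringIO(sample), delimiter=";")
next(reader)  # skip the header line
totals = summarize_calls(reader)

for caller in sorted(totals):
    hours, rest = divmod(totals[caller]["seconds"], 3600)
    minutes, seconds = divmod(rest, 60)
    print(f"{caller} | {totals[caller]['calls']:3d} | {hours:02d}h{minutes:02d}m{seconds:02d}s")
```

Keying on the caller column alone avoids also counting receivers and timestamps, which happens when every field of every line is counted.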

HaxeFlixel findPath

I've got a little problem with the findPath function.
I'm working with a mix of two tutorials (one from Haxeflixel, and the second from haxecoder) and the second one uses CSV instead of Ogmo and uses a findPath when we click.
But when I click, the game crashes, nice.
This is my code: http://hastebin.com/xunidubiyi.avrasm
The problem is on the 86th line (split into multiple lines to fit the width):
var nodes:Array<FlxPoint> = _mWalls.findPath(
    FlxPoint.get(
        _player.x + 16 / 2,
        _player.y + 16 / 2
    ),
    FlxPoint.get(
        tileCoordX * 16 + 16 / 2,
        tileCoordY * 16 + 16 / 2
    )
);
Output:
Invalid field access : allowCollisions
Called from flixel.tile.FlxTilemap::computePathDistance line 1806
Called from flixel.tile.FlxTilemap::findPath line 796
Called from PlayState::update line 87
Called from flixel.FlxState::tryUpdate line 155
Called from flixel.FlxGame::update line 700
Called from flixel.FlxGame::step line 648
Called from flixel.FlxGame::onEnterFrame line 493
Called from openfl._legacy.events.EventDispatcher::dispatchEvent line 98
Called from openfl._legacy.display.DisplayObject::__dispatchEvent line 182
Called from openfl._legacy.display.DisplayObject::__broadcast line 161
Called from openfl._legacy.display.DisplayObjectContainer::__broadcast line 286
Called from openfl._legacy.display.Stage::__render line 1103
Called from openfl._legacy.display.Stage::__checkRender line 351
Called from openfl._legacy.display.Stage::__pollTimers line 1084
Called from openfl._legacy.display.Stage::__doProcessStageEvent line 430
Done(1)
Does anybody know what went wrong and why?
