Loading files using JLD2 in Julia when they were saved with commas - io

I saved my data in Julia using the command below:
JLD2.@save "myfile.jld2" a, b, c
I now understand I should have used
JLD2.@save "myfile.jld2" a b c
Is there still any way to access the data in the myfile.jld2 file? Right now, if I run
JLD2.@load "myfile.jld2"
I get
1-element Array{Symbol,1}:
 Symbol("(a, b, c)")
and not the data in a, b, c.

Sure you can. Just use "(a, b, c)" as the data identifier, since that is the key the whole tuple was stored under.
Setup:
using JLD2
a,b,c = 255,"some nice text",6666.0
JLD2.@save "file.jld2" a, b, c
Identifying and reading the data:
julia> f=jldopen("file.jld2","r")
JLDFile C:\Users\pszufe\file.jld2 (read-only)
 └─🔢 (a, b, c)
julia> keys(f)
1-element Array{String,1}:
"(a, b, c)"
julia> read(f,keys(f)[1])
(255, "some nice text", 6666.0)

Related

Converting Multiple string inputs to int

I'm trying to convert a, b, c, d to integers, but after I've tried doing this they still come up as strings. I've tried using a loop instead of map, but that didn't work either.
inputs = input()
split_input = inputs.split()
a, b, c, d = split_input
split_input = list(map(int, split_input))
Just swap the last 2 lines:
split_input = list(map(int, split_input))
a, b, c, d = split_input
Unless you need split_input later on, you don't need the list conversion at all:
split_input = map(int, split_input)
a, b, c, d = split_input
# OR in fact simply
a, b, c, d = map(int, split_input)
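To make the ordering point concrete, here is a minimal sketch that wraps the conversion in a small helper. The name parse_ints, the expected parameter, and the sample string are all made up for illustration, and the string stands in for input() so the snippet runs without a terminal.

```python
def parse_ints(text, expected=4):
    """Split text on whitespace and convert every piece to int."""
    parts = text.split()
    if len(parts) != expected:
        raise ValueError(f"expected {expected} values, got {len(parts)}")
    # Convert first, so callers only ever unpack integers.
    return [int(p) for p in parts]


# Hypothetical input line; in the question this would come from input().
a, b, c, d = parse_ints("3 14 15 92")
print(type(a).__name__, a + b + c + d)  # int 124
```

The key point is the same as in the answer: unpack only after the conversion, otherwise a, b, c, d keep pointing at the original strings.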

getting error message: name 'a' is not defined

def sum(a, b, c, d):
    result = 0
    result = result+a+b+c+d
    return result
def length():
    return 4
def mean(a, b, c, d):
    return float(sum(a, b, c, d))/length()
print(sum(a, b, c, d), length(), mean(a, b, c, d))
I am getting the error message: name 'a' is not defined.
If you don't define a variable before using it, you will get these name errors. Python evaluates the arguments of a call from left to right, so the error names the first undefined variable it reaches. For example, if you switched the arguments around when calling these functions:
print(sum(b, a, c, d), length(), mean(a, b, c, d))
you would get name 'b' is not defined instead, because the Python interpreter doesn't know what value the variable b holds.
You need to tell the interpreter what the values of these variables are, for example a=10, b=2, and so on.
Saying a variable is undefined means that you need to define it (assign it a value) before using it. For example, a=1 defines a; b, c and d need values in the same way.
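The advice above could look like this in practice. This is only a sketch: the values 10, 2, 5, 3 are invented, and the function is renamed to total (my choice, not the asker's) so it no longer shadows Python's built-in sum.

```python
def total(a, b, c, d):
    # Named total rather than sum to avoid shadowing the built-in sum().
    return a + b + c + d


def length():
    return 4


def mean(a, b, c, d):
    return total(a, b, c, d) / length()


# Bind the names at module level before they are used as arguments.
a, b, c, d = 10, 2, 5, 3
print(total(a, b, c, d), length(), mean(a, b, c, d))  # 20 4 5.0
```

The parameters a, b, c, d inside each function are separate from the module-level names; the NameError came from the call site, where nothing had been assigned yet.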

Looping a list of lists, while accessing each elements easily

I apologise in advance if this has been answered before, I didn't know what to search for.
Say, I want to iterate through a list of lists that looks like this:
x = [[a, b, c], [a, b, c], ...]
I figured out I can do this to easily access the lists inside that structure:
for [a, b, c] in x:
    doSomethingToElements(a,b,c)
What I want to do is:
for [a, b, c] as wholeList in x:
    doSomethingToElements(a,b,c)
    doSomethingToWholeLists(wholeList)
However, that syntax is invalid. Is there an equivalent way to do it that is correct and valid?
Or should I do it with enumerate() as stated here?
EDIT: Working through to make enumerate() work, I realise I can do this:
for idx, [a, b, c] in enumerate(x):
    doSomethingToElements(a,b,c)
    doSomethingToWholeLists(x[idx])
But feel free to post more elegant solutions, or is it elegant enough that it doesn't matter?
There are two options.
The first is to iterate over the elements and the whole list together using zip; the second is to iterate over the inner lists and unpack each one inside the loop.
x = [[1, 2, 3], [4, 5, 6]]
for (a, b, c), z in zip(x, x):
    print(a, b, c, z)
for z in x:
    a, b, c = z
    print(a, b, c, z)
There is not really any syntax similar to that suggestion. Your best bet would be splat-unpacking:
for wholeList in x:
    doSomethingToElements(*wholeList)
    doSomethingToWholeLists(wholeList)
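For anyone who wants to run the splat-unpacking version, here is a small self-contained sketch. The two helper functions are hypothetical stand-ins for doSomethingToElements and doSomethingToWholeLists, and the data is invented.

```python
def do_something_to_elements(a, b, c):
    # Stand-in: works on the three unpacked values.
    return a + b + c


def do_something_to_whole_list(whole_list):
    # Stand-in: works on the inner list as one object.
    return len(whole_list)


x = [[1, 2, 3], [4, 5, 6]]
results = []
for whole_list in x:
    # *whole_list spreads the inner list over the three parameters;
    # a TypeError is raised if an inner list does not have exactly 3 items.
    element_result = do_something_to_elements(*whole_list)
    list_result = do_something_to_whole_list(whole_list)
    results.append((element_result, list_result))

print(results)  # [(6, 3), (15, 3)]
```

If the loop body needs the names a, b, c themselves, an explicit a, b, c = whole_list as the first line of the loop does the same job without enumerate().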

Search for sequence of items in a list

Is there an easy way to search for a sequence of strings in a list? For example:
testlist = [a,b,c,d,e,f,g,a,b,c,d,j,k,j]
and I want to search for the sequence a, b, c and get back the index where it starts. To clarify, the sequence I want to search for spans more than one element of the list. For some context: I have a list of data blocks and I want to find out how big each block is, so I am searching for a recurring string in the list.
There are many good string search algorithms: KMP, Boyer-Moore, Rabin-Karp. You can use the builtin str.index function on ''.join(L) if you are dealing with characters (in CPython, str.index uses a fast search based on a mix of Boyer-Moore and Horspool: https://github.com/python/cpython/blob/3.7/Objects/stringlib/fastsearch.h).
But in most cases, the naive algorithm is good enough. Check every index of the haystack to find the needle:
>>> a, b, c, d, e, f, g, j, k = [object() for _ in range(9)]
>>> haystack = [a, b, c, d, e, f, g, a, b, c, d, j, k, j]
>>> needle = [a, b, c]
>>> for i in range(len(haystack)-len(needle)+1):
...     if haystack[i:i+len(needle)] == needle:
...         print(i)
...
0
7
The complexity is O(|haystack|*|needle|).
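The same naive scan can be wrapped in a reusable function. This is a sketch, not a library API: find_sublist is a made-up name, and the list uses single-character strings in place of the question's a, b, c placeholders. The last two lines are my guess at how the asker might get block sizes from the start indices.

```python
def find_sublist(haystack, needle):
    """Return every index at which needle occurs as a contiguous run."""
    n = len(needle)
    if n == 0:
        return []
    # Compare a slice of the haystack against the needle at each position.
    return [i for i in range(len(haystack) - n + 1)
            if haystack[i:i + n] == needle]


testlist = list("abcdefgabcdjkj")
starts = find_sublist(testlist, list("abc"))
print(starts)  # [0, 7]

# Distance between consecutive matches, i.e. the size of each block
# that is followed by another occurrence of the marker sequence.
sizes = [later - earlier for earlier, later in zip(starts, starts[1:])]
print(sizes)  # [7]
```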

Most efficient way to remove duplicates

I have a log file that I need to remove duplicate entries from. Each line in the file consists of three parts separated by commas, let's call them A, B and C respectively.
Two entries are duplicates if and only if their A's and C's are equal. If duplicates are found, the one with the greatest B shall remain.
The real log file has a large number of lines, the following serves only as a simplified example:
Log file (input):
hostA, 1507300700.0, xyz
hostB, 1507300700.0, abc
hostB, 1507300800.0, xyz
hostA, 1507300800.0, xyz
hostA, 1507300900.0, xyz
Log file after duplicates have been removed (output):
hostB, 1507300700.0, abc
hostB, 1507300800.0, xyz
hostA, 1507300900.0, xyz
I've tried reading in the file as two lists, then comparing them along the lines of:
for i in full_log_list_a:
    for j in full_log_list_b:
        if i[0] == j[0] and i[2] == j[2] and i[1] > j[1]:
            print(', '.join(i[0]), file=open(new_file, 'a'))
I've also tried a few other things, but whatever I do it ends up iterating over the list too many times and creating a bunch of repeat entries, or it fails to find ONLY the item with the greatest B. I know there's probably an obvious answer, but I'm stuck. Can someone please point me in the right direction?
I think a dict is what you're looking for, instead of lists.
As you read the log file you add entries to the dict, where each entry consists of a key (A, C) and a value B. If a key already exists, you compare B with the value mapped to the key, and remap the key if necessary (i.e. if B is greater than the value currently mapped to the key).
Example (do use better names for variables a, b and c):
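log_file_entries = {}
with open(log_file, 'r') as f:
    for line in f:
        # strip the trailing newline so that c compares cleanly
        a, b_str, c = line.strip().split(', ')
        # the sample values look like 1507300700.0, so parse as float, not int
        b = float(b_str)
        if (a, c) in log_file_entries:
            if b < log_file_entries[(a, c)]:
                continue
        log_file_entries[(a, c)] = b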
It's one loop. Since the required operations on dicts are (typically) constant in time, i.e. O(1), the overall time complexity will be O(n), much better than your nested loops' time complexity of O(n²).
When you later rewrite the file, you can just loop over the dict like so:
with open(new_file, 'a') as f:
    for (a, c), b in log_file_entries.items():
        print('{0}, {1}, {2}'.format(a, b, c), file=f)
Apologies if any code or terms are incorrect, I haven't touched Python in a while.
(P.S. In your example code you use two lists, whereas you could have used the same list in both loops.)
UPDATE
If you want the value of a key to contain every part of a line in the log file, you could rewrite the above code like so:
log_file_entries = {}
with open(log_file, 'r') as f:
    for line in f:
        a, b_str, c = line.strip().split(', ')
        b = float(b_str)
        if (a, c) in log_file_entries:
            if b < log_file_entries[(a, c)][1]:
                continue
        log_file_entries[(a, c)] = (a, b, c)
with open(new_file, 'a') as f:
    for entry in log_file_entries.values():
        # b is a number, so convert every field to str before joining
        print(', '.join(map(str, entry)), file=f)
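To check the dict approach against the sample from the question without touching any files, here is a sketch that works on a list of lines. The function name dedupe is hypothetical, B is parsed as a float because the sample values contain a decimal point, and the original line text is stored so the output is not affected by number formatting.

```python
def dedupe(lines):
    """Keep, for each (A, C) pair, the line with the greatest B."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        a, b_str, c = line.split(', ')
        b = float(b_str)
        key = (a, c)
        # Store the line only if the key is new or this B is greater.
        if key not in best or b > best[key][0]:
            best[key] = (b, line)
    return [line for _, line in best.values()]


sample = [
    "hostA, 1507300700.0, xyz",
    "hostB, 1507300700.0, abc",
    "hostB, 1507300800.0, xyz",
    "hostA, 1507300800.0, xyz",
    "hostA, 1507300900.0, xyz",
]
for line in dedupe(sample):
    print(line)
```

The surviving lines are the same three as in the question's expected output, but they come out in the order in which each (A, C) key first appeared (dicts keep insertion order in Python 3.7+), so the hostA line is printed first. Sort the result if a particular order matters.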
