Wolfram Mathematica import data from multiple files - string

I have a lot of files, each of which contains data.
I can happily import one file into Mathematica, but there are more than 500 files.
This is how I do it:
Import["~/math/third_ks/mixed_matrices/1.dat", "Table"];
aaaa = %
(*OUTPUT - some data, I can access them!*)
All I want is to do this in a loop (which I can write), but I cannot work out how to change the file name 1.dat on each iteration.
I tried the following: I generated some of the possible names and wrote them to a separate file.
Import["~/math/third_ks/mixed_matrices/generate_name_of_files.dat", "Table"];
aaaa = %
Output: {{"~/math/third_ks/mixed_matrices/0.dat"}, \
{"~/math/third_ks/mixed_matrices/1.dat"}, ......
All I want to do is Table[a = Import[aaaa[[i]]], {i, 1, 500}]
But the function Import accepts only String objects as file names/paths.

You can use FileNames to collect the names of the data files you want to import, with the usual wildcards.
And then just map the Import statement over the list of filenames.
data will then contain a list comprising the data from each file as a separate element.
data = Import[#, "Table"] & /@ FileNames["~/math/third_ks/mixed_matrices/*.dat"];

It's a bit hard to work out what is going on without the file of filenames. However, I think you might be able to solve your problem by using Flatten on the list of filenames to make it a vector of String objects that can be passed to Import. Currently your list is an n*1 matrix, where each row is a List containing a String, not a vector of Strings.
Incidentally, you could use Map (/@) instead of Table in this instance.

Thank you for your response.
It so happened that I received two solutions at the same time.
I think it would be unfair to leave out the second one.
aaaa = "~/math/third_ks/mixed_matrices/" <> ToString[#] <> ".dat" & /@ Range[0, 116];
(*This generates a list of strings.
Output:
{"~/math/third_ks/mixed_matrices/0.dat", \
"~/math/third_ks/mixed_matrices/1.dat", \
"~/math/third_ks/mixed_matrices/2.dat", ..... etc., up to 116*)
Table[Import[aaaa[[i]], "Table"], {i, 1, 117}];
(*and it just imports data from file*)
bbbb = %; (*here we have all data, voila!*)
Incidentally, it's not my solution.
It was suggested by a friend of mine:
https://stackoverflow.com/users/1243244/light-keeper

Related

Reading values from a file and outputting each number, largest/smallest numbers, sum, and average of numbers from the file

The issue that I am having is that I am able to read the information from the files, but when I try to convert them from a string to an integer, I get an error. I also have issues where the min/max prints as the entire file's contents.
I have tried using if/then statements as well as using different variables for each line in the file.
file=input("Which file do you want to get the data from?")
f=open('data3.txt','r')
sent='-999'
line=f.readline().rstrip('\n')
while len(line)>0:
    lines=f.read().strip('\n')
    value=int(lines)
    if value>value:
        max=value
        print(max)
    else:
        min=value
        print(min)
total=sum(lines)
print(total)
I expect the code to find the min/max of the numbers in the file as well as the sum and average of the numbers in the file. The results from the file being processed in the code, then have to be written to a different file. My results have consisted in various errors reading that Python is unable to convert from a str to an int as well as printing the entire file's contents instead of the expected results.
Does the following work?
# Read the file contents into a list of lines; the file is closed afterwards.
with open('fileToRead.txt') as f:
    lines = list(f)
# Convert each line to an int (int() ignores the trailing newline).
intLines = [int(i) for i in lines]
maxValue = max(intLines)
minvalue = min(intLines)
sumValue = sum(intLines)
print("MaxValue : {0}".format(maxValue))
print("MinValue : {0}".format(minvalue))
print("Sum : {0}".format(sumValue))
print("Average : {0}".format(sumValue / len(intLines)))
This is how my fileToRead.txt is laid out (a deliberately simple example):
10
20
30
40
5
1
I read the file contents into a list. Then I create a new list containing all the values as ints (this can be merged with the previous step as a small refactoring). Once I have the list of ints, it's easier to calculate the max and min on it.
Note that some of the variables are not named properly. Also, reading the whole file in one go (as I have done here) might be a bad idea if the file is very large. In that case, read it line by line, parse the ints, and add them to a list of ints. Once you are done reading, close the file. You can then do your calculations on the list of ints you have obtained.
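The line-by-line idea described above could be sketched roughly as follows. This is only one way to do it: instead of building a list, it keeps running totals, so memory use stays flat. The function name summarize and the demo file names are made up for illustration, and the file is assumed to hold one integer per line.

```python
import os
import tempfile

def summarize(path):
    """Return (min, max, total, average) of the ints in path, one per line."""
    count = 0
    total = 0
    smallest = None
    largest = None
    with open(path) as f:            # the file is closed when the block ends
        for line in f:               # reads one line at a time
            line = line.strip()
            if not line:
                continue             # skip blank lines
            value = int(line)
            count += 1
            total += value
            if smallest is None or value < smallest:
                smallest = value
            if largest is None or value > largest:
                largest = value
    if count == 0:
        raise ValueError("no numbers found in " + path)
    return smallest, largest, total, total / count

# Demo on a temporary copy of the sample data shown above.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "fileToRead.txt")
    with open(src, "w") as f:
        f.write("10\n20\n30\n40\n5\n1\n")
    result = summarize(src)
    # Write the results to a different file, as the question asks.
    with open(os.path.join(tmp, "results.txt"), "w") as out:
        out.write("min={0} max={1} sum={2} average={3}\n".format(*result))

print(result)
```

If you do need the individual numbers afterwards, append each value to a list inside the loop instead of (or as well as) updating the running totals.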
Please let me know if this resolves your query.
Thanks

Strings containing " - " always break onto newline with ruamel.yaml

I'm fairly new to YAML, within a Python 3.7 project, and decided to use ruamel.yaml to get me started. I intend to use it to store metadata associated with some video files.
I am creating YAML files with the following code:
data[filename] = [{'video': video_path},
{'key_frame': frame_path},
{'processed': get_timestamp()}]
yaml.dump(data, file_handle)
The created YAML file looks like this:
video.mp4:
- video: /Users/xyz/video.mp4
- key_frame: /Users/xyz/imgOutput/frame
    - Trigger.jpg
- processed: '2018-07-26 17:09:06'
The issue is that the key_frame is a file called "frame - Trigger.jpg". However, the line always breaks at the " - " (i.e. space-dash-space) in the filename. The result looks very wrong as a human-readable file. In fact, it is processed correctly when it's read back in (using yaml.load) and treated as a single string filename, as it should be. It's just the formatting in the YAML file that's wrong.
Any thoughts on the cause? Is this expected behaviour? I've tried many different ways of quoting the string in case that's it (which doesn't make a difference - even quoted it will split over the line), but fundamentally it does work, from a code sense - but as YAML's big selling point is human-readable files, it'd be nice to understand what's causing it and how to fix it.
In YAML plain scalars (i.e. the ones without single or double quotes) can be wrapped to an indented newline on whitespace. That is what's happening.
To reproduce this is difficult as your question is quite incomplete, but some things can be easily seen from the output:
data is a dict
filename, video_path, and frame_path are defined as strings.
file_handle is probably some file stream opened for writing.
Others are less easily deduced:
get_timestamp() doesn't return a datetime.datetime() instance as one would expect from its name, but a string representation thereof. To prevent this string from being interpreted as a timestamp, it has to be quoted.
you are using the default YAML() instance (which equals typ='rt'), as the non-default ones would write the leaf mappings in flow style ( - {video: /Users/xyz/video.mp4}, etc.)
With that and the appropriate imports you can make a functioning program:
import datetime
import sys
import ruamel.yaml
yaml = ruamel.yaml.YAML(typ='rt')
def get_timestamp():
    return datetime.datetime(2018, 7, 26, 17, 9, 6).isoformat(sep=' ', timespec='seconds')
data = {}
filename = 'video.mp4'
video_path = '/Users/xyz/video.mp4'
frame_path = '/Users/xyz/imgOutput/frame - Trigger.jpg'
file_handle = sys.stdout
data[filename] = [{'video': video_path},
                  {'key_frame': frame_path},
                  {'processed': get_timestamp()}]
yaml.dump(data, file_handle)
and this outputs:
video.mp4:
- video: /Users/xyz/video.mp4
- key_frame: /Users/xyz/imgOutput/frame - Trigger.jpg
- processed: '2018-07-26 17:09:06'
So we forgot something and that is:
yaml.width = 24 # range from 24-38 inclusive
with that you get your output:
video.mp4:
- video: /Users/xyz/video.mp4
- key_frame: /Users/xyz/imgOutput/frame
    - Trigger.jpg
- processed: '2018-07-26 17:09:06'
so just remove the yaml.width = line and you should be all set.
Next time please provide a minimal, but complete, functioning program that actually produces the output.
My guess is that your frame_path is much longer than you show here, and that you don't have a user xyz. That causes you to get over the default width (defined in the emitter to be 80) and the plain scalar to wrap. Just set yaml.width = 4096 or whatever is necessary for your scalar length and nesting depth.
When in doubt whether the YAML output is correct, read it back in (using YAML(typ='safe').load(input_stream)); it should produce the original data.
Can you try str(frame_path)?
data[filename] = [{'video': video_path},
{'key_frame': str(frame_path)},
{'processed': get_timestamp()}]
There is nothing special about the dash. If the string is longer than a certain threshold it will break at the first space after that. The examples you gave do not reproduce this behaviour for me, but longer strings do.
The generated YAML is valid. Any string, if quoted or not, can be broken up to several lines.
Maybe you can adjust the threshold in ruamel. I can't find anything in the documentation, though.
(See also my article Strings in YAML)

batch file extract numbers from text file with little information

This is related to my other two posts. I'm extracting text from a text file and analyzing it, and I've run into some problems. For a while I've been using a method that sets all the text between two other strings as a variable, but here is my situation. I need to extract the speed (numbers) from the string below: "etc...,query":{"ping":47855},"cmts":...etc. The problem is that the text cmts sometimes changes to something else, so really I need to extract all the numbers from this:
,query":{"ping":47855},"
One more thing that makes this difficult is that the characters }," are all over the file. Thank you for helping me! -Lucas EDG Programmer.
Here's the full file:
{"_id":53291,"ip":"158.69.22.95","domain":"jectile.com","port":25565,"url":"","date_add":1453897770,"status":1,"scan":1,"uptime":99.53,"last_update":1485436105,"geo":{"country":"US","country_name":"United States","city":"Lake Forest"},"info":{"name":" Jectile | jectile.com [1.8-1.11]\n Shoota (Call of Duty) \/ Zambies (Zombie Survival)","type":"FML","version":"1.10","plugins":[],"players":18,"max_players":420,"players_list":[],"map":"world","software":"BungeeCord 1.8.x, 1.9.x, 1.10.x, 1.11.x","avg_player_day":24.458333,"avg_load_day":5.8234,"platform":"MINECRAFT","icon":true},"counter":{"online":47871,"offline":228,"players":{"date":"2017-01-26","total":0},"last_offline":0,"query":{"ping":47855},"cmts":1},"rating":{"main":19.24,"difference":-0.64,"content_up":0.15,"K":0},"last":{"offline":1485415702,"online":1485436105},"chart":{"14:30":14,"14:40":16,"14:50":15,"15:00":18,"15:10":12,"15:20":13,"15:30":9,"15:40":9,"15:50":11,"16:00":12,"16:10":11,"16:20":11,"16:30":18,"16:40":25,"16:50":23,"17:00":27,"17:10":27,"17:20":23,"17:30":24,"17:40":26,"17:50":33,"18:00":31,"18:10":31,"18:20":32,"18:30":37,"18:40":38,"18:50":39,"19:00":38,"19:10":34,"19:20":33,"19:30":40,"19:40":36,"19:50":37,"20:00":38,"20:10":36,"20:20":38,"20:30":37,"20:40":37,"20:50":37,"21:00":34,"21:10":32,"21:20":33,"21:30":33,"21:40":29,"21:50":28,"22:00":26,"22:10":21,"22:20":24,"22:30":29,"22:40":22,"22:50":23,"23:00":27,"23:10":24,"23:20":26,"23:30":25,"23:40":28,"23:50":27,"00:00":32,"00:10":29,"00:20":33,"00:30":32,"00:40":31,"00:50":33,"01:00":40,"01:10":40,"01:20":40,"01:30":41,"01:40":45,"01:50":48,"02:00":43,"02:10":45,"02:20":46,"02:30":46,"02:40":43,"02:50":42,"03:00":39,"03:10":36,"03:20":44,"03:30":34,"03:40":0,"03:50":32,"04:00":35,"04:10":35,"04:20":33,"04:30":43,"04:40":37,"04:50":26,"05:00":31,"05:10":31,"05:20":27,"05:30":25,"05:40":26,"05:50":18,"06:00":13,"06:10":15,"06:20":17,"06:30":18,"06:40":17,"06:50":15,"07:00":16,"07:10":17,"07:20":16,"07:30":16,"07:40":18,"07:50":19,"08:00"
:14,"08:10":12,"08:20":12,"08:30":13,"08:40":17,"08:50":20,"09:00":18,"09:10":0,"09:20":0,"09:30":27,"09:40":18,"09:50":20,"10:00":15,"10:10":13,"10:20":12,"10:30":10,"10:40":10,"10:50":11,"11:00":13,"11:10":13,"11:20":16,"11:30":19,"11:40":17,"11:50":13,"12:00":10,"12:10":11,"12:20":12,"12:30":16,"12:40":15,"12:50":16,"13:00":14,"13:10":10,"13:20":13,"13:30":16,"13:40":16,"13:50":17,"14:00":20,"14:10":16,"14:20":16},"query":"ping","max_stat":{"max_online":{"date":1470764061,"players":129}},"status_query":"ok"}
By the way, the reason things change is because it looks at info from different servers
Very similar to the answer I gave to your first question:
@Echo Off
Set/P var=<some.json
Set var=%var:*:{"ping":=%
Set var=%var:},=&:%
Echo=%var%
Timeout -1
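If a batch file is not a hard requirement, the same value can be read with a JSON parser instead of string substitution, which does not care what key follows "query" or how often }," appears. The following is only a sketch in Python: the sample string is an abbreviated stand-in for the file in the question, and it assumes the real file is valid JSON with the same counter -> query -> ping nesting. The name extract_ping is made up for illustration.

```python
import json

# Abbreviated stand-in for the server file shown in the question
# (assumption: the real file keeps the same nesting).
sample = (
    '{"_id":53291,"counter":{"online":47871,"offline":228,'
    '"last_offline":0,"query":{"ping":47855},"cmts":1},'
    '"query":"ping","status_query":"ok"}'
)

def extract_ping(text):
    """Return the number stored under counter -> query -> ping."""
    doc = json.loads(text)
    return doc["counter"]["query"]["ping"]

ping = extract_ping(sample)
print(ping)
```

Note that the file also has a top-level "query" key holding a string, so the lookup has to go through "counter" first.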

Allocating matrix / structure data instead of string name to variable

I have a script that opens a folder and does some processing on the data present. Say, there's a file "XYZ.tif".
Inside this tif file, there are two groups of datasets, which show up in the workspace as
data.ch1eXYZ
and
data.ch3eXYZ
If I want to continue with the 2nd set, I can use
A=data.ch3eXYZ
However, XYZ usually is much longer and varies per file, whereas data.ch3e is consistent.
Therefore I tried
A=strcat('data.ch3e','origfilename');
where origfilename of course is XYZ, which has (automatically) been extracted before.
However, that gives me a string A, since I practically typed
A='data.ch3eXYZ'
instead of the matrix that data.ch3eXYZ actually is.
I think it's just a problem with ()'s, []'s, or {}'s, but I can't seem to figure it out.
Thanks in advance!
If you know the string, dynamic field references should help you here, and they are far better than eval.
Slightly modified example from the linked blog post:
fldnm = 'fred';
s.fred = 18;
y = s.(fldnm)
Returns:
y =
18
So for your case:
test = data.(['ch3e' origfilename]);
Should be sufficient
Edit: Link to the documentation

Start loop at specific line of text file in groovy

I am using Groovy and I am trying to alter a text file at a specific line without looping through all of the previous lines. Is there a way to specify the line of a text file that you want to alter?
For instance
Text file is:
1
2
3
4
5
6
I would like to say
Line(3) = p
and have it change the text file to:
1
2
p
4
5
6
I DO NOT want to loop through the lines to change the value, i.e. I do not want to use an .eachLine {line ->...} method.
Thank you in advance, I really appreciate it!
I don't think you can skip lines and traverse like this. You could skip ahead using RandomAccessFile in Java, but you would have to specify the offset in bytes rather than in lines.
Try using readLines() on the file. It stores all the lines in a list. To change the content of line n, change the item at index n-1 in the list and then join the list items.
Something like this will do:
// We can call this the DefaultFileHandler
file = new File("input.txt") // assumed path, point this at your own file
lineNumberToModify = 3
textToInsert = "p"
line(lineNumberToModify, textToInsert)
def line(num, text) {
    def list = file.readLines()
    list[num - 1] = text // list indices start at 0
    file.setText(list.join("\n"))
}
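For comparison, the same read-everything, change one item, write-back idea can be sketched in Python. The function name replace_line and the demo file name are made up for illustration, and like the Groovy version this assumes the file fits comfortably in memory.

```python
import os
import tempfile

def replace_line(path, line_number, text):
    """Replace the 1-based line `line_number` of `path` with `text`."""
    with open(path) as f:
        lines = f.read().splitlines()
    lines[line_number - 1] = text      # list indices are 0-based
    with open(path, "w") as f:
        f.write("\n".join(lines))

# Demo on a temporary file holding the six lines from the question.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "numbers.txt")
    with open(path, "w") as f:
        f.write("1\n2\n3\n4\n5\n6")
    replace_line(path, 3, "p")
    with open(path) as f:
        result = f.read()

print(result)
```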
EDIT: For extremely large files, it is better to have a custom implementation, maybe something along the lines of what Tim Yates suggested in the comment on your question.
The readLines() approach above can easily process up to 100000 lines of text in less than a second. So you can do something like this:
if(file size < 10 MB)
    use DefaultFileHandler()
else
    use CustomFileHandler()
//CustomFileHandler
- Split the large file into buckets of acceptable size.
- Ex: Bucket 1(1-100000 lines), Bucket 2(100001-200000 lines), etc.
- if (lineNumberToModify falls in bucket range)
    insert into line in the bucket
There is no hard and fast rule for how you implement your CustomFileHandler, as it depends entirely on the use case. If you need to do the above operation multiple times on the same file, you can do the complete bucket split first, store the buckets in memory, and use them for the following operations. If it is a one-time operation, you can avoid touching all the buckets up front, deal only with the one you need, and process the others later on demand.
Even within the buckets you can add your own logic to speed up the job. Say you want to insert into line 99999 of a bucket holding lines 1-100000; you can exploit Groovy's methods and closures to their fullest. Note that the line below only changes an in-memory list, so the result still has to be written back to the file:
file.readLines().reverse()[1] = "some text"
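For files too large to hold in memory, one possible CustomFileHandler is sketched below in Python. To be clear, this is not the bucket scheme described above but a simpler swapped-in technique: it streams the file into a temporary copy, changes only the target line on the way through, and then swaps the copy into place. It still visits every line once, but it never holds more than one line in memory. The function name replace_line_streaming and the demo file name are made up for illustration.

```python
import os
import tempfile

def replace_line_streaming(path, line_number, text):
    """Replace 1-based line `line_number`, holding one line in memory at a time."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as dst, open(path) as src:
        for number, line in enumerate(src, start=1):
            if number == line_number:
                # Keep the original line ending, if the line had one.
                ending = "\n" if line.endswith("\n") else ""
                dst.write(text + ending)
            else:
                dst.write(line)
    os.replace(tmp_path, path)   # swap the rewritten copy into place

# Demo on a small temporary file; a real use case would be a much larger file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.txt")
    with open(path, "w") as f:
        f.write("1\n2\n3\n4\n5\n6\n")
    replace_line_streaming(path, 3, "p")
    with open(path) as f:
        streamed = f.read()

print(streamed)
```

Writing to a temporary file in the same directory and renaming it at the end means a crash halfway through leaves the original file untouched.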

Resources