filelist.txt contains a list of files:
/path/file1.json
/path/file2.json
/path/fileN.json
Is there a (simple) MATLAB command that will accept filelist.txt and read each file as a string and store each string into a cell array?
Just use readtable, asking it to read each line in full.
>> tbl = readtable('filelist.txt','ReadVariableNames',false,'Delimiter','\n');
>> tbl.Properties.VariableNames = {'filenames'}
tbl =
3×1 table
filenames
__________________
'/path/file1.json'
'/path/file2.json'
'/path/fileN.json'
Then access the elements in a loop
for idx = 1:height(tbl)
this_filename = tbl.filenames{idx};
end
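The loop above only retrieves the file names; the question also asks for each file's contents as a string. A minimal sketch of that step, assuming the built-in fileread is acceptable for reading whole files. The temporary folder, the demo file names, and the variables listFile, filenames and contents are made up for illustration and are not part of the original question:

```matlab
% Set up two small demo files and a list file in a temporary folder
% (hypothetical names, for illustration only).
demoDir = tempname;
mkdir(demoDir);
demoFiles = {fullfile(demoDir, 'a.json'), fullfile(demoDir, 'b.json')};
demoText  = {'{"id": 1}', '{"id": 2}'};
for k = 1:numel(demoFiles)
    fid = fopen(demoFiles{k}, 'w');
    fprintf(fid, '%s', demoText{k});
    fclose(fid);
end
listFile = fullfile(demoDir, 'filelist.txt');
fid = fopen(listFile, 'w');
fprintf(fid, '%s\n', demoFiles{:});
fclose(fid);

% Read the list: one file name per line.
filenames = regexp(strtrim(fileread(listFile)), '\r?\n', 'split');

% Read each listed file in full and store it in one cell.
contents = cell(size(filenames));
for idx = 1:numel(filenames)
    contents{idx} = fileread(filenames{idx});
end
```

With the table from the answer, the same loop would presumably run over tbl.filenames instead of filenames.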
This problem is a bit too specific for a standard function. However, it is easily doable with a combination of two functions:
First, you have to open the file:
fid = fopen('filelist.txt');
Next you can read line by line with:
line_ex = fgetl(fid)
fgetl keeps track of its position in the file: each call reads the next line, so calling it again returns the second line, and so on. See the fgetl documentation for more information.
The whole code might look like this:
% Open file
fid = fopen('testabc');
numberOfLines = 3;
% Preallocate cell array
line = cell(numberOfLines, 1);
% Read one line after the other and save it in a cell array
for i = 1:numberOfLines
line{i} = fgetl(fid);
end
% Close file
fclose(fid);
If the number of lines is not known in advance, replace the for loop with a while loop:
i=0;
while ~feof(fid)
i = i + 1;
line{i} = fgetl(fid);
end
Alternative to while loop: Retrieve the number of lines and use in Caduceus' for-loop:
% Open file
fid = fopen('testabc');
numberOfLines = numlinestextfile('testabc'); % function defined below
% Preallocate cell array
line = cell(numberOfLines, 1);
% Read one line after the other and save it in a cell array
for i = 1:numberOfLines
line{i} = fgetl(fid);
end
% Close file
fclose(fid);
Custom function:
function [lineCount] = numlinestextfile(filename)
%numlinestextfile: returns line-count of filename
% Detailed explanation goes here
if (~ispc) % Tested on OSX
    evalstring = ['wc -l ', filename];
    % [status, cmdout]= system('wc -l filenameOfInterest.txt');
    [status, cmdout]= system(evalstring);
    if(status==0) % system returns 0 on success
        scanCell = textscan(cmdout,'%u %s');
        lineCount = scanCell{1};
    else
        fprintf(1,'Failed to find line count of %s\n',filename);
        lineCount = -1;
    end
else % For Windows-based systems
    [status, cmdout] = system(['find /c /v "" ', filename]);
    if(status==0)
        scanCell = textscan(cmdout,'%s %s %u');
        lineCount = scanCell{3};
        disp(['Found ', num2str(lineCount), ' lines in the file']);
    else
        disp('Unable to determine number of lines in the file');
        lineCount = -1;
    end
end
I have a file with strings of a known length, but no separator.
% What should be the result
vals = arrayfun(@(x) ['Foobar ', num2str(x)], 1:100000, 'UniformOutput', false);
% what the file looks like when read in
strs = cell2mat(vals);
strlens = cellfun(@length, vals);
The most straightforward approach is quite slow:
out = cell(1, length(strlens));
for i=1:length(strlens)
out{i} = fread(f, strlens(i), '*char');
end % 5.7s
Reading everything in and splitting it up afterwards is a lot faster:
strs = fread(f, sum(strlens), '*char');
out = cell(1, length(strlens));
slices = [0, cumsum(strlens)];
for i=1:length(strlens)
out{i} = strs(slices(i)+1:slices(i+1));
end % 1.6s
With a mex function I can get down to 0.6s, so there's still a lot of room for improvement. Can I get comparable performance with pure Matlab (R2016a)?
Edit: the seemingly perfect mat2cell function doesn't help:
out = mat2cell(strs, 1, strlens); % 2.49s
Your last approach – reading everything at once and splitting it up afterwards – looks pretty optimal to me, and is how I do stuff like this.
For me, it runs in about 80 ms when the file is on a local SSD, in both R2016b and R2019a, on Mac.
function out = scratch_split_strings(strlens)
%
% Example:
% in_strs = arrayfun(@(x) ['Foobar ', num2str(x)], 1:100000, 'UniformOutput', false);
% strlens = cellfun(@length, in_strs);
% big_str = cat(2, in_strs{:});
% fid = fopen('text.txt', 'w'); fprintf(fid, '%s', big_str); fclose(fid);
% scratch_split_strings(strlens);
t0 = tic;
fid = fopen('text.txt');
txt = fread(fid, sum(strlens), '*char');
fclose(fid);
fprintf('Read time: %0.3f s\n', toc(t0));
str = txt;
t0 = tic;
out = cell(1, length(strlens));
slices = [0, cumsum(strlens)];
for i = 1:length(strlens)
out{i} = str(slices(i)+1:slices(i+1))';
end
fprintf('Munge time: %0.3f s\n', toc(t0));
end
>> scratch_split_strings(strlens);
Read time: 0.002 s
Munge time: 0.075 s
Have you stuck it in the profiler to see what's taking up your time here?
As far as I know, there is no faster way to split up a single primitive array into variable-length subarrays with native M-code. You're doing it right.
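To make the read-then-split technique concrete, here is a small in-memory sketch of the same cumsum-based slicing. The three sample strings are made up; everything else follows the variable names used in the question:

```matlab
% Three sample strings of different lengths (illustrative data only).
vals = {'Foobar 1', 'Foobar 22', 'Foobar 333'};
strlens = cellfun(@length, vals);

% What the file content looks like once read in: one long char row.
strs = [vals{:}];

% slices(i) is the offset just before string i; slices(i+1) is its last index.
slices = [0, cumsum(strlens)];
out = cell(1, numel(strlens));
for i = 1:numel(strlens)
    out{i} = strs(slices(i)+1:slices(i+1));
end

% The split recovers the original strings.
assert(isequal(out, vals));
```

Note that fread returns a column vector, which is why the answer's version transposes each slice; the in-memory string here is already a row.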
How would I attempt this?
I'm trying to create something that would remove all quotes (" ") in a Lua file, but I have had no luck so far. It might be because I'm a newbie at Lua.
I'm using this from GitHub.
function from_base64(to_decode)
local padded = to_decode:gsub("%s", "")
local unpadded = padded:gsub("=", "")
local bit_pattern = ''
local decoded = ''
for i = 1, string.len(unpadded) do
local char = string.sub(to_decode, i, i)
local offset, _ = string.find(index_table, char)
if offset == nil then
error("Invalid character '" .. char .. "' found.")
end
bit_pattern = bit_pattern .. string.sub(to_binary(offset-1), 3)
end
for i = 1, string.len(bit_pattern), 8 do
local byte = string.sub(bit_pattern, i, i+7)
decoded = decoded .. string.char(from_binary(byte))
end
local padding_length = padded:len()-unpadded:len()
if (padding_length == 1 or padding_length == 2) then
decoded = decoded:sub(1,-2)
end
return decoded
end
I'm trying to create something that would remove all quotes (" ") in a Lua file
-- read contents of file into memory
local file = io.open(filename)
local text = file:read('*a')
file:close()
-- remove all double-quotes from the contents
text = text:gsub('"','')
-- write contents back to the file
local file = io.open(filename, 'w+')
file:write(text)
file:close()
I'm having issues writing strings to binary in Lua. There is an existing example and I tried modifying it. Take a look:
function StringToBinary()
local file = io.open("file.bin", "wb")
local t = {}
local u = {}
local str = "Hello World"
file:write("string len = " ..#str ..'\n')
math.randomseed(os.time())
for i=1, #str do
t[i] = string.byte(str[i])
file:write(t[i].." ");
end
file:write("\n")
for i=1, #str do
u[i] = math.random(0,255)
file:write(u[i].." ");
end
file:write("\n"..string.char(unpack(t)))
file:write("\n"..string.char(unpack(u)))
file:close()
end
file:write(t[i].." ") and file:write(u[i].." ") both write their tables as integer values. However, with my last two writes, unpack(t) displays the original text, while unpack(u) displays the binaries.
It's probably string.byte(str[i]) that is mistaken. What should I replace it with? Am I missing something?
t[i] = string.byte(str[i])
is wrong, it should be:
t[i] = string.byte(str, i)
I have a fairly large text file and am trying to search for a particular term so that I can start a process after that point, but this doesn't seem to be working for me:
fileID = fopen(resfile,'r');
line = 0;
while 1
tline = fgetl(fileID);
line = line + 1;
if ischar(tline)
startRow = strfind(tline, 'OptimetricsResult');
if isfinite(startRow) == 1;
break
end
end
end
The answer I get is 9, but my text file:
$begin '$base_index$'
$begin 'properties'
all_levels=000000000000
time(year=000000002013, month=000000000006, day=000000000020, hour=000000000008, min=000000000033, sec=000000000033)
version=000000000000
$end 'properties'
$begin '$base_index$'
$index$(pos=000000492036, lin=000000009689, lvl=000000000000)
$end '$base_index$'
definitely doesn't have that in the first 9 rows?
If I ctrl+F the file, I know that OptimetricsResult only appears once, and that it's 6792 lines down
Any suggestions?
Thanks
I think your script actually works, and you were just looking at the wrong variable. I assume that the answer you get is startRow = 9 and not line = 9. Check the variable line. By the way, note that you're not checking for end-of-file, so your while loop might run indefinitely if the file doesn't contain your search string.
An alternative approach (which is much simpler in my humble opinion) would be reading all lines at once (each one stored as a separate string) with textscan, and then applying regexp or strfind:
%// Read lines from input file
fid = fopen(filename, 'r');
C = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
%// Search a specific string and find all rows containing matches
C = strfind(C{1}, 'OptimetricsResult');
rows = find(~cellfun('isempty', C));
I can't reproduce your problem.
Are you sure you've properly closed the file before re-running this script? If not, the internal line counter in fgetl does not get reset, so you get false results. Just issue a fclose all on the MATLAB command prompt, and add a fclose(fileID); after the loop, and test again.
In any case, I suggest modifying your infinite-loop (with all sorts of pitfalls) to the following finite loop:
haystack = fopen(resfile,'r');
needle = 'OptimetricsResult';
line = 0;
found = false;
while ~feof(haystack)
tline = fgetl(haystack);
line = line + 1;
if ischar(tline) && ~isempty(strfind(tline, needle))
found = true;
break;
end
end
if ~found
line = NaN;
end
fclose(haystack);
line
You could of course also leave the searching to more specialized tools, which come free with most operating systems:
haystack = 'resfile.txt';
needle = 'OptimetricsResult';
if ispc % Windows
[~,lines] = system(['find /n "' needle '" ' haystack]);
elseif isunix % Mac, Linux
[~,lines] = system(['grep -n "' needle '" ' haystack]);
else
error('Unknown operating system!');
end
You'd have to do a bit more parsing to extract the line number from lines, but I trust this will be no issue.
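One possible way to do that parsing, as a rough sketch. It assumes that grep -n prefixes each match with the line number and a colon, and that find /n prefixes it with the line number in square brackets; the sample output string below is made up rather than captured from a real run:

```matlab
% Example of what grep -n might return for one match (made-up sample).
lines = sprintf('6792:$begin ''OptimetricsResult''\n');

% Capture the leading number of every output line; this accepts
% both the "123:" (grep -n) and the "[123]" (find /n) prefix.
tok = regexp(lines, '^\[?(\d+)[\]:]', 'tokens', 'lineanchors');

% Each token is a 1x1 cell holding the digits; convert them to numbers.
lineNumbers = cellfun(@(t) str2double(t{1}), tok);
```

If the search string is not found, tok is empty and lineNumbers comes back empty as well, which can serve as the "not found" signal.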
I have strings of 32 chars in a file (multiple lines).
What I want to do is to make a new file and put them there by making columns of 4 chars each.
For example I have:
00000000000FDAD000DFD00ASD00
00000000000FDAD000DFD00ASD00
00000000000FDAD000DFD00ASD00
....
and in the new file, I want them to appear like this:
0000 0000 000F DAD0 00DF D00A SD00
0000 0000 000F DAD0 00DF D00A SD00
Can anybody help me? I have been working on this for hours now and I can't find the solution.
First, open the input file and read the lines as strings:
infid = fopen(infilename, 'r');
C = textscan(infid, '%s', 'delimiter', '');
fclose(infid);
Then use regexprep to split the string into space-delimited groups of 4 characters:
C = regexprep(C{:}, '(.{4})(?!$)', '$1 ');
Lastly, write the modified lines to the output file:
outfid = fopen(outfilename, 'w');
fprintf(outfid, '%s\n', C{:});
fclose(outfid);
Note that this solution is robust enough to work on lines of variable length.
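As a quick check of the regexprep step on its own, here is a small sketch that applies the same pattern to in-memory strings. The first sample line is taken from the question; the second, shorter one is made up to illustrate the variable-length case:

```matlab
% One line from the question plus a shorter, made-up line.
C = {'00000000000FDAD000DFD00ASD00'; 'ABCDEF'};

% Insert a space after every group of 4 characters,
% except when the group ends exactly at the end of the line.
out = regexprep(C, '(.{4})(?!$)', '$1 ');

% out{1} is '0000 0000 000F DAD0 00DF D00A SD00'
% out{2} is 'ABCD EF'
```

The negative lookahead (?!$) is what prevents a trailing space after the last full group.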
Import
fid = fopen('test.txt');
txt = textscan(fid,'%s');
fclose(fid);
Transform into an M-by-28 char array, then transpose and reshape so that each column holds one 4-char block. Next, append a row of blanks at the bottom and reshape back. Store each line in a cell.
txt = reshape(char(txt{:})',4,[]);
txt = cellstr(reshape([txt; repmat(' ',1,size(txt,2))],35,[])')
Write each cell/line to new file
fid = fopen('test2.txt','w');
fprintf(fid,'%s\r\n',txt{:});
fclose(fid);
Here's one way to do it in Matlab:
% read in file
fid = fopen('data10.txt');
data = textscan(fid,'%s');
fclose(fid);
% save new file
s = size(data{1});
newFid = fopen('newFile.txt','wt');
for t = 1:s(1) % format and save each row
line = data{1}{t};
newLine = '';
index = 1;
for k = 1:7 % seven sets of 4 characters
count = 0;
while count < 4
newLine(end + 1) = line(index);
index = index + 1;
count = count + 1;
end
newLine(end + 1) = ' ';
end
fprintf(newFid, '%s\n', newLine);
end
fclose(newFid);