Search text file in MATLAB for a string to begin process - string

I have a fairly large text file and am trying to search for a particular term so that I can start a process after that point, but this doesn't seem to be working for me:
fileID = fopen(resfile,'r');
line = 0;
while 1
tline = fgetl(fileID);
line = line + 1;
if ischar(tline)
startRow = strfind(tline, 'OptimetricsResult');
if isfinite(startRow) == 1;
break
end
end
end
The answer I get is 9, but my text file:
$begin '$base_index$'
$begin 'properties'
all_levels=000000000000
time(year=000000002013, month=000000000006, day=000000000020, hour=000000000008, min=000000000033, sec=000000000033)
version=000000000000
$end 'properties'
$begin '$base_index$'
$index$(pos=000000492036, lin=000000009689, lvl=000000000000)
$end '$base_index$'
definitely doesn't have that in the first 9 rows?
If I ctrl+F the file, I know that OptimetricsResult only appears once, and that it's 6792 lines down
Any suggestions?
Thanks

I think your script actually works, and you were just looking at the wrong variable. I assume that the answer you get is startRow = 9 and not line = 9: strfind returns the character position of the match within the line, not the line number. Check the variable line. By the way, note that you're not checking for end-of-file, so your while loop will run indefinitely if the file doesn't contain your search string.
An alternative approach (which is much simpler in my humble opinion) would be to read all lines at once (each one stored as a separate string) with textscan, and then apply regexp or strfind:
%// Read lines from input file
fid = fopen(filename, 'r');
C = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
%// Search a specific string and find all rows containing matches
C = strfind(C{1}, 'OptimetricsResult');
rows = find(~cellfun('isempty', C));
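Since the goal in the question is to start a process after the matching line, here is a minimal sketch of that last step. The cell array fileLines is a made-up stand-in for C{1}, and the fprintf call is only a placeholder for whatever processing is actually needed:

```matlab
% Hypothetical stand-in for C{1}: one cell per line of the file.
fileLines = {'$begin ''properties'''; ...
             'version=000000000000'; ...
             '$begin ''OptimetricsResult'''; ...
             'value=1'; ...
             'value=2'};

% Same search as above: indices of the lines that contain the term.
matches = strfind(fileLines, 'OptimetricsResult');
rows = find(~cellfun('isempty', matches));

% Keep only the lines after the first match and process those.
if isempty(rows)
    remaining = {};
else
    remaining = fileLines(rows(1)+1:end);
end
for k = 1:numel(remaining)
    fprintf('%s\n', remaining{k});   % placeholder for the real processing
end
```

If the term can occur more than once, rows holds every matching line number, so rows(1) picks the first occurrence.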

I can't reproduce your problem.
Are you sure you've properly closed the file before re-running this script? If not, the internal line counter in fgetl does not get reset, so you get false results. Just issue a fclose all on the MATLAB command prompt, and add a fclose(fileID); after the loop, and test again.
In any case, I suggest modifying your infinite-loop (with all sorts of pitfalls) to the following finite loop:
haystack = fopen(resfile,'r');
needle = 'OptimetricsResult';
line = 0;
found = false;
while ~feof(haystack)
tline = fgetl(haystack);
line = line + 1;
if ischar(tline) && ~isempty(strfind(tline, needle))
found = true;
break;
end
end
if ~found
    line = NaN;
end
fclose(haystack);
line
You could of course also leave the searching to more specialized tools, which come free with most operating systems:
haystack = 'resfile.txt';
needle = 'OptimetricsResult';
if ispc % Windows
[~,lines] = system(['find /n "' needle '" ' haystack]);
elseif isunix % Mac, Linux
[~,lines] = system(['grep -n "' needle '" ' haystack]);
else
error('Unknown operating system!');
end
You'd have to do a bit more parsing to extract the line number from lines, but I trust this will be no issue.
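A rough sketch of that parsing step, assuming grep -n prefixes each match with "number:" and find /n prefixes it with "[number]". The two sample strings below are hypothetical stand-ins for the lines output of the system calls; the real text depends on the file and the operating system:

```matlab
% Hypothetical sample outputs of the two commands.
grepOut = sprintf('6792:$begin ''OptimetricsResult''\n');
findOut = sprintf('\n---------- RESFILE.TXT\n[6792]$begin ''OptimetricsResult''\n');

% grep -n: a line number followed by a colon at the start of each line.
tok = regexp(grepOut, '^(\d+):', 'tokens', 'lineanchors');
grepLines = cellfun(@(t) str2double(t{1}), tok);

% find /n: a line number in square brackets at the start of each line.
tok = regexp(findOut, '^\[(\d+)\]', 'tokens', 'lineanchors');
findLines = cellfun(@(t) str2double(t{1}), tok);
```

Both results are numeric vectors with one entry per matching line, and they are empty when nothing matched.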

Related

MATLAB: How load a list of filenames from .txt file

filelist.txt contains a list of files:
/path/file1.json
/path/file2.json
/path/fileN.json
Is there a (simple) MATLAB command that will accept filelist.txt and read each file as a string and store each string into a cell array?
Just use readtable, asking it to read each line in full.
>> tbl = readtable('filelist.txt','ReadVariableNames',false,'Delimiter','\n');
>> tbl.Properties.VariableNames = {'filenames'}
tbl =
3×1 table
filenames
__________________
'/path/file1.json'
'/path/file2.json'
'/path/fileN.json'
Then access the elements in a loop
for idx = 1:height(tbl)
this_filename = tbl.filenames{idx};
end
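The question also asks for each listed file to be read as a string. A minimal sketch of that step, assuming every line of the list is a readable file path and using fileread for the contents. The temporary directory, the file names and the JSON contents are made up so that the sketch runs on its own:

```matlab
% Throwaway setup so the sketch is self-contained (hypothetical names/contents).
workDir = tempname();
mkdir(workDir);
listFile = fullfile(workDir, 'filelist.txt');
fid = fopen(listFile, 'w');
for k = 1:2
    name = fullfile(workDir, sprintf('file%d.json', k));
    f = fopen(name, 'w');
    fprintf(f, '{"id": %d}', k);
    fclose(f);
    fprintf(fid, '%s\n', name);
end
fclose(fid);

% Read the list of file names, one per line.
fid = fopen(listFile, 'r');
names = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
names = names{1};

% Read each listed file in full, one string per cell.
contents = cell(size(names));
for idx = 1:numel(names)
    contents{idx} = fileread(names{idx});
end
```

With a real list, only the last two parts are needed, with listFile set to 'filelist.txt'.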
This problem is a bit too specific for a standard function. However, it is easily doable with a combination of two functions:
First, you have to open the file:
fid = fopen('filelist.txt');
Next you can read line by line with:
line_ex = fgetl(fid)
This function keeps track of its position in the file. If you call it again, it reads the second line, and so on. See the fgetl documentation for more information.
The whole code might look like this:
% Open file
fid = fopen('testabc');
numberOfLines = 3;
% Preallocate cell array
line = cell(numberOfLines, 1);
% Read one line after the other and save it in a cell array
for i = 1:numberOfLines
line{i} = fgetl(fid);
end
% Close file
fclose(fid);
If the number of lines is not known in advance, replace the for loop with a while loop:
i = 0;
while ~feof(fid)
    i = i + 1;
    line{i} = fgetl(fid);
end
Alternative to the while loop: retrieve the number of lines and use it in Caduceus' for-loop:
% Open file
fid = fopen('testabc');
numberOfLines = numlinestextfile('testabc'); % function defined below
% Preallocate cell array
line = cell(numberOfLines, 1);
% Read one line after the other and save it in a cell array
for i = 1:numberOfLines
line{i} = fgetl(fid);
end
% Close file
fclose(fid);
Custom function:
function [lineCount] = numlinestextfile(filename)
%numlinestextfile: returns line-count of filename
% Returns -1 if the line count cannot be determined.
if (~ispc) % Tested on OSX
    evalstring = ['wc -l ', filename];
    % [status, cmdout]= system('wc -l filenameOfInterest.txt');
    [status, cmdout] = system(evalstring);
    if (status == 0) % system returns 0 on success
        scanCell = textscan(cmdout,'%u %s');
        lineCount = scanCell{1};
    else
        fprintf(1,'Failed to find line count of %s\n',filename);
        lineCount = -1;
    end
else % For Windows-based systems
    [status, cmdout] = system(['find /c /v "" ', filename]);
    if (status == 0)
        scanCell = textscan(cmdout,'%s %s %u');
        lineCount = scanCell{3};
        disp(['Found ', num2str(lineCount), ' lines in the file']);
    else
        disp('Unable to determine number of lines in the file');
        lineCount = -1;
    end
end

Insert character into string in a range where character X exists at index - Swift 2.0

Please could somebody help me.
I am trying to add a new line (\n) into an existing string.
Let's say the string is 20+ characters long. I want to find a space " " in the range of 15 to 20, then insert a new line (\n) just after the index where the space is.
I hope that makes sense.
The code I have for this so far is as follows:
var newString = string
newString[newString.startIndex..< newString.startIndex.advancedBy(16)]
/* let startIndex = newString.startIndex.advancedBy(16)
let endIndex = newString.endIndex
let newRange = startIndex ..< endIndex
print("start index = \(newRange)")*/
let range: Range<String.Index> = newString.rangeOfString(" ")!
let index: Int = newString.startIndex.distanceTo(range.startIndex)
newString.insert("\n", atIndex: newString.startIndex.advancedBy(index))
label.text = newString
If I try the following
let newIndex = name.startIndex.advancedBy(19).distanceTo(range.endIndex)
I get the error message
fatal error: can not increment endIndex
I've got a feeling I'm on the right track, but the above will insert a new line at the index where a space first appears in the string, and not between the indices of e.g. 15 and 20.
Thanks for your help in advance
Thomas
The following finds the first space in the range 15..<END_OF_YOUR_STRING, and replaces it with a new line (\n). In your question you stated you explicitly wanted to look for a space in range 15...20, and also insert a new line after the space. Below I have assumed that you actually want:
To replace the space by a new line, since you'll otherwise have a trailing space on the line following the line break.
To search for the first space starting at index 15, but continuing until you find one (otherwise: if you find no space within range 15...20, no line break should be inserted?).
Both of these deviations from your question can be quite easily reverted, so tell me if you'd prefer me to follow your instructions to the letter (rather than including my own reasoning), and I'll update this answer.
Solution as follows:
var foo = "This is my somewhat long test string"
let bar = 15 /* find first space " " starting from index 'bar' */
if let replaceAtIndex = foo[foo.startIndex.advancedBy(bar)..<foo.endIndex]
.rangeOfString(" ")?.startIndex.advancedBy(bar) {
foo = foo.stringByReplacingCharactersInRange(
replaceAtIndex...replaceAtIndex, withString: "\n")
}
print(foo)
/* This is my somewhat
long test string */
Note that there is an off-by-one difference between finding a space in the index range 15 to 20 and between the 15th and 20th characters (the latter is the index range 14...19). Above, we search for the first space starting at the 16th character (index 15).
Thanks to dfri for pointing me in the right direction. Although his answer was correct, for my problem specifically I've provided the following code to help:
if string.characters.count >= 20
{
let bar = 15 // where to split the code to begin looking for the character
let beginString = string.substringWithRange(string.startIndex..<string.startIndex.advancedBy(bar))
var endString = string[string.startIndex.advancedBy(bar)..<string.endIndex]
if endString.containsString(" ")
{
let range = endString.rangeOfString(" ")
if let i = range
{
endString.insert("\n", atIndex:i.startIndex.advancedBy(1) )
let newString = "\(beginString)\(endString)"
label.text = newString
}
}
}

Writing strings to binary in Lua

I'm having issues writing strings to binary in Lua. There is an existing example and I tried modifying it. Take a look:
function StringToBinary()
local file = io.open("file.bin", "wb")
local t = {}
local u = {}
local str = "Hello World"
file:write("string len = " ..#str ..'\n')
math.randomseed(os.time())
for i=1, #str do
t[i] = string.byte(str[i])
file:write(t[i].." ");
end
file:write("\n")
for i=1, #str do
u[i] = math.random(0,255)
file:write(u[i].." ");
end
file:write("\n"..string.char(unpack(t)))
file:write("\n"..string.char(unpack(u)))
file:close()
end
file:write(t[i].." ") and file:write(u[i].." ") write both tables with integer value. However with my last two writes: unpack(t) displays the original text, while unpack(u) displays the binaries.
It's probably string.byte(str[i]) that is mistaken. What should I replace it with? Am I missing something?
t[i] = string.byte(str[i])
is wrong: str[i] does not index into the string in Lua (it evaluates to nil), so it should be:
t[i] = string.byte(str, i)

Separate chars of a file in matlab

I have strings of 32 chars in a file (multiple lines).
What I want to do is to make a new file and put them there by making columns of 4 chars each.
For example I have:
00000000000FDAD000DFD00ASD00
00000000000FDAD000DFD00ASD00
00000000000FDAD000DFD00ASD00
....
and in the new file, I want them to appear like this:
0000 0000 000F DAD0 00DF D00A SD00
0000 0000 000F DAD0 00DF D00A SD00
Can anybody help me? I have been working on this for hours now and I can't find the solution.
First, open the input file and read the lines as strings:
infid = fopen(infilename, 'r');
C = textscan(infid, '%s', 'delimiter', '');
fclose(infid);
Then use regexprep to split the string into space-delimited groups of 4 characters:
C = regexprep(C{:}, '(.{4})(?!$)', '$1 ');
Lastly, write the modified lines to the output file:
outfid = fopen(outfilename, 'w');
fprintf(outfid, '%s\n', C{:});
fclose(outfid);
Note that this solution is robust enough to work on lines of variable length.
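To see the pattern at work without any file I/O, here is a small sketch that applies the same regexprep call to the sample line from the question and to a shorter, made-up line:

```matlab
% Sample line from the question (28 characters).
s = '00000000000FDAD000DFD00ASD00';
out = regexprep(s, '(.{4})(?!$)', '$1 ');
% out is '0000 0000 000F DAD0 00DF D00A SD00'

% Hypothetical shorter line whose length is not a multiple of 4.
short = regexprep('0000000', '(.{4})(?!$)', '$1 ');
% short is '0000 000'
```

The negative lookahead (?!$) keeps the last group of four from getting a trailing space, and a leftover group of fewer than four characters is simply left as it is.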
Import
fid = fopen('test.txt');
txt = textscan(fid,'%s');
fclose(fid);
Transform into an M-by-28 char array, transpose and reshape to have a 4-char block in each column. Then add a row of blanks at the bottom and reshape back. Store each line in a cell.
txt = reshape(char(txt{:})',4,[]);
txt = cellstr(reshape([txt; repmat(' ',1,size(txt,2))],35,[])')
Write each cell/line to new file
fid = fopen('test2.txt','w');
fprintf(fid,'%s\r\n',txt{:});
fclose(fid);
Here's one way to do it in Matlab:
% read in file
fid = fopen('data10.txt');
data = textscan(fid,'%s');
fclose(fid);
% save new file
s = size(data{1});
newFid = fopen('newFile.txt','wt');
for t = 1:s(1) % format and save each row
line = data{1}{t};
newLine = '';
index = 1;
for k = 1:7 % seven sets of 4 characters
count = 0;
while count < 4
newLine(end + 1) = line(index);
index = index + 1;
count = count + 1;
end
newLine(end + 1) = ' ';
end
fprintf(newFid, '%s\n', newLine);
end
fclose(newFid);

Append texts tab-delimited text file column wise in C#

I have a tab-delimited text file that is many GBs in size. The task is to append header texts to each column. Currently I use StreamReader to read the file line by line and append the headers to each column, which takes a lot of time. Is there a way to make it faster? I was wondering whether there is a way to process the file column-wise. One way would be to import the file into a database table and then bcp out the data after appending the headers. Is there a better way, perhaps by calling PowerShell or awk/sed from C# code?
Code is as follows :
StreamReader sr = new StreamReader(FilePath, System.Text.Encoding.Default);
string mainLine = sr.ReadLine();
string[] fileHeaders = mainLine.Split(new string[] { "\t" }, StringSplitOptions.None);
string newLine = "";
System.IO.StreamWriter outFileSw = new System.IO.StreamWriter(outFile);
while (!sr.EndOfStream)
{
mainLine = sr.ReadLine();
string[] originalLine = mainLine.Split(new string[] { "\t" }, StringSplitOptions.None);
newLine = "";
for (int i = 0; i < fileHeaders.Length; i++)
{
if(fileHeaders[i].Trim() != "")
newLine = newLine + fileHeaders[i].Trim() + "=" + originalLine[i].Trim() + "&";
}
outFileSw.WriteLine(newLine.Remove(newLine.Length - 1));
}
Nothing else operating on just text files is going to be significantly faster - fundamentally you've got to read the whole of the input file, and you've got to create a whole new output file, as you can't "insert" text for each column.
Using a database would almost certainly be a better idea in general, but adding a column could still end up being a relatively slow business.
You can improve how you're dealing with each line, however. In this code:
for (int i = 0; i < fileHeaders.Length; i++)
{
if(fileHeaders[i].Trim() != "")
newLine = newLine + fileHeaders[i].Trim() + "=" + originalLine[i].Trim() + "&";
}
... you're using string concatenation in a loop, which will be slow if there's a large number of columns. Using a StringBuilder is very likely to be more efficient. Additionally, there's no need to call Trim() on every string in fileHeaders on every line. You can just work out which columns you want once, trim the header appropriately, and filter that way.

Resources