Having some issues with Perl Splitting and Merging Functions - linux

First and foremost, I'm not familiar with Perl at all. I've been studying C++ primarily for the last 1/2 year. I'm in a class now that that is teaching Linux commands, and we have short little topics on languages used in Linux, including Perl, which is totally throwing me for a loop (no pun intended). I have a text file that contains a bunch of random numbers separated by spaces and tabs, maybe even newlines, that gets read into the program via a filehandle. I'm supposed to write 2 lines of code that split the lines of numbers and merge them into one array, inside of a foreach loop. I'm not looking for an answer, just a nudge in the right direction. I've been trying different things for multiple hours and feel totally silly I can't get it, I'm totally lost with the syntax. Its just a bit odd not working inside a compiler and out of my comfort zone working outside of C++. I really appreciate it. I've included a few photos. Basically, the code we are writing it just to store the numbers and the rest of the program will determine the smallest number and sum of all numbers. Mine is currently incorrect because I'm not sure what to do. In the output photo, it will display all the numbers being entered in via the text file, so you can see them.

Several things to fix here. First of all, please don't post screenshots of your sample data or code, as it makes it impossible to copy and paste to test your code or data. Post your code/data by indenting it with four spaces and a newline preceding the code block.
Add use strict; in your script. This should be lesson 0 in your class. After that add my to all variable declarations.
To populate #all_numbers with contents of each line's numbers, without using push, you can use something like this:
foreach my $line (#output_lines)
{
my #numbers = split /\s/, $line;
#all_numbers = (#all_numbers, #numbers);
}

You say you're "not looking for an answer," so here's your nudge:
You're almost there. You split each line well (using split/\s/) and store the numeric values in #all_numbers. However, notice that each time around in the loop, you replace (using the assignment, #all_numbers = ...) the whole contents of #all_numbers with the numbers you found in the current line. Effectively, you're throwing away everything you've stored from the previous lines.
Instead, you want to add to #all_numbers, not replace #all_numbers. Have a look at the push() function for how to do this.
NB: Your split() call is fine, but it's more customary to use split(' ', $line) in this case. (See split(): you can use a single space, ' ', instead of the pattern, /\s/, when you want to split on any whitespace.)

I hope you need to store the all splitting element into array, so you looking for push function.
foreach $line (#input_lines)
{
push(#all_numbers,split(/\s/,$line));
}
Your problem is, in every iteration, the splitted value is over written in an array not to append together. For example,
#array = qw(one two three);
#array = qw(five four seven);
print "#array";
output is five four seven not the one two three five four seven because this is reinitialize with a new values. You want to append the new values in the array in before or after use unshift or push
for example
#array = qw(one two three);
push(#array,qw(five four seven));

Another way:
my #all_numbers = map { split ' ', $_ } #output_lines;
See http://perldoc.perl.org/functions/map.html

Related

I want to join two strings, but separate the two strings by a space

I want to process files quickly through a program called star, but I have many files and want to pre-format the input from my files to save time. The format required is-
sample1read1.fq, sample2read1.fq \space\ sample1read2.fq, sample2read2.fq
[EDIT]: My files look like this:
trimmed_Sample_RX.fq where X can either be 1 or 2.
Star wants me to load all of the R1's together, separated by a comma, then a space and then all of my R2's together separated by a comma. To tackle this problem I have attempted to use the join command in python:
def identifier(x):
return(x[-10:])
read1= list(sorted(fnmatch.filter(os.listdir(PATH),'*_R1.fq'), key= identifier))
read1.append(' ')
read2= list(sorted(fnmatch.filter(os.listdir(PATH),'*_R2.fq'), key= identifier))
first_half= ','.join(read1)
second_half= ','.join(read2)
star_input= first_half + second_half
print(star_input)= 'trimmed_sample1R1.fq,trimmed_sample2R1.fq, , trimmed_sample1R2.fq,trimmed_sample2R2.fq'
I attempt to add a space to the end of my file list read1. Then I turn everything into a string and attempt to join the two strings together, but that space I added into my first half pops up in the concatenation as a comma
'trimmed_sample1R1.fq,trimmed_sample2R1.fq, , trimmed_sample1R2.f,trimmed_sample2R2.fq'
If I remove the step where I append a blank space and then concatenate the two strings I get the following
'trimmed_sample1R1.fq,trimmed_sample2R1.fqtrimmed_sample1R2.fq,trimmed_sample2R2.fq'
So now the comma is gone, but I also lose the space.
Thanks.
I think I found a workaround, but if there is a better way to do this please point it out to me. Basically I kept my code the same, but I change the last step. Now the flow looks like this:
first_half=','.join(read1)
second_half=','.join(read2)
star_input= '{} {}'.format(first_half, second_half)
print(star_input)
'trimmed_sample1R1.fq,trimmed_sample2R1 trimmed_sample1R2.fq,trimmedsample2R2.fq'

Comparing strings in python 2.7

This is my code:
for films in filmlist:
with codecs.open('peliculas.txt', encoding='utf8', mode='r') as lfile:
filmsDone = lfile.read()
filmsDoneList = filmsDone.split(',')
if films not in filmsDoneList:
with codecs.open('peliculas.txt', encoding='utf8', mode='a+') as lfile:
lfile.write(films.strip() + ',')
It will never recognize the last item of the list.
I have printed filmsDoneList and the last item in PyCharm looks like this: u'X Men.Primera Generacion'. I have printed films and they looks like this: X Men.Primera Generacion'
So I have no idea where is the problem. Thanks in advance.
#Rafa, for you to better understand what I meant in the comments, I had to write an entire answer in order for me to attach codes and screenshots.
Let's say the peliculas.txt file has the following format:
You can import such file in python according the following 3 commands:
fileIN=open('peliculas.txt','r')
filmsDoneList=fileIN.readlines()
fileIN.close()
So you basically open the file, import each line thanks to readlines() and then close the file because its contents are available in filmsDoneList. The latter has the following contents (in PyCharm):
Obviously this list is quite long and does not fit in my screen, but you get the point.
You can now get rid of that annoying newline tag '\r\n' by means of the following loop:
for id in range(len(filmsDoneList)):
filmsDoneList[id]=filmsDoneList[id].strip()
and now filmsDoneList has the form:
much better now, innit?
Now, let's say you want to add the following films:
newFilms=['The Exorcist','Back to the Future','Aliens','Back to the Future']
To make your code more robust, I have added Back to the Future twice. Basically you can get rid of duplicates in newFilms by means of the set() function. This will convert newFilms in a set with duplicates removed, but we will convert it back to a list thanks to this command:
newFilms=list(set(newFilms))
and now newFilms has the form:
Now that everything has been sorted, it's time to check if items in newFilms already are in filmsDoneList which, recall, is the contents of peliculas.txt.
Reopen peliculas.txt as follows:
fileOUT=open('peliculas.txt','a')
the 'a' tag means "append", so basically everything you write will be added to the file without removing anything from it.
And the main loop goes:
for film in newFilms:
if film in filmsDoneList:
pass
else:
fileOUT.write(film+'\n')
the pass means "do nothing". The write commands also appends the newline tag to the movie title: this will keep the previous format of 1 title per line. At the end of this loop you might as well close fileOUT.
The resulting peliculas.txt is
and, as you can see, Back to the Future was in newFilms but wasn't appended to the end of this file because already was in it. As instead, The Exorcist and Aliens have been appended to this file, at the bottom.
If your file has titles separated by commas, this approach is still valid. However you must add
filmsDoneList=filmsDoneList[0].split(',')
after the first for loop. Also in the write function (in the last for loop) you might want to replace the newline value with a comma.
This approach is cleaner, I reckon will also fix the problem you've been having and avoids continuous open/close files in a loop. Hope this helps!

How to compare Strings and put it into another program?

i´ve got small problem and before I spend even more time in trying to solve it i´d like to know if what I want to do is even possible ( and maybe input on how to do it^^).
My problem:
I want to take some text and then split it into different strings at every whitespace (for example "Hello my name is whatever" into "Hello" "my" "name" "is" "whatever").
Then I want to set every string with it´s own variable so that I get something alike to a= "Hello" b= "my" and so on. Then I want to compare the strings with other strings (the idea is to get addresses from applications without having to search through them so I thought I could copy a telephone book to define names and so on) and set matching input to variables like Firstname , LastName and street.
Then, and here comes the "I´d like to know if it´s possible" part I want it to put it into our database, this means I want it to copy the string into a text field and then to go to the next field via tab. I´ve done something like this before with AutoIT but i´ve got no idea how to tell AutoIT whats inside the strings so I guess it must be done through the programm itself.
I´ve got a little bit of experience with c++, python and BATCH files so it would be nice if anyone could tell me if this can even be done using those languages (and I fear C++ can do it and I´m just to stupid to do so).
Thanks in advance.
Splitting a string is very simple, there is usually a built in method called .split() which will help you, the method varies from language to language.
When you've done a split, it will be assigned to an array, you can then use an index to get the variables, for example you'd have:
var str = "Hello, my name is Bob";
var split = str.split(" ");
print split[0]; // is "Hello,"
print split[1]; // is "my" etc
You can also use JSON to return data so you could have an output like
print split["LastName"];
What you're asking for is defiantly possible.
Some links that could be useful:
Split a string in C++?
https://code.google.com/p/cpp-json/

need guidance with basic function creation in MATLAB

I have to write a MATLAB function with the following description:
function counts = letterStatistics(filename, allowedChar, N)
This function is supposed to open a text file specified by filename and read its entire contents. The contents will be parsed such that any character that isn’t in allowedChar is removed. Finally it will return a count of all N-symbol combinations in the parsed text. This function should be stored in a file name “letterStatistics.m” and I made a list of some commands and things of how the function should be organized according to my professors' lecture notes:
Begin the function by setting the default value of N to 1 in case:
a. The user specifies a 0 or negative value of N.
b. The user doesn’t pass the argument N into the function, i.e., counts = letterStatistics(filename, allowedChar)
Using the fopen function, open the file filename for reading in text mode.
Using the function fscanf, read in all the contents of the opened file into a string variable.
I know there exists a MATLAB function to turn all letters in a string to lower case. Since my analysis will disregard case, I have to use this function on the string of text.
Parse this string variable as follows (use logical indexing or regular expressions – do not use for loops):
a. We want to remove all newline characters without this occurring:
e.g.
In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since.
In my younger and more vulnerableyears my father gave me some advicethat I’ve been turning over in my mindever since.
Replace all newline characters (special character \n) with a single space: ' '.
b. We will treat hyphenated words as two separate words, hence do the same for hyphens '-'.
c. Remove any character that is not in allowedChar. Hint: use regexprep with an empty string '' as an argument for replace.
d. Any sequence of two or more blank spaces should be replaced by a single blank space.
Use the provided permsRep function, to create a matrix of all possible N-symbol combinations of the symbols in allowedChar.
Using the strfind function, count all the N-symbol combinations in the parsed text into an array counts. Do not loop through each character in your parsed text as you would in a C program.
Close the opened file using fclose.
HERE IS MY QUESTION: so as you can see i have made this list of what the function is, what it should do, and using which commands (fclose etc.). the trouble is that I'm aware that closing the file involves use of 'fclose' but other than that I'm not sure how to execute #8. Same goes for the whole function creation. I have a vague idea of how to create a function using what commands but I'm unable to produce the actual code.. how should I begin? Any guidance/hints would seriously be appreciated because I'm having programmers' block and am unable to start!
I think that you are new to matlab, so the documentation may be complicated. The root of the problem is the basic understanding of file I/O (input/output) I guess. So the thing is that when you open the file using fopen, matlab returns a pointer to that file, which is generally called a file ID. When you call fclose you want matlab to understand that you want to close that file. So what you have to do is to use fclose with the correct file ID.
fid = open('test.txt');
fprintf(fid,'This is a test.\n');
fclose(fid);
fid = 0; % Optional, this will make it clear that the file is not open,
% but it is not necessary since matlab will send a not open message anyway
Regarding the function creation the syntax is something like this:
function out = myFcn(x,y)
z = x*y;
fprintf('z=%.0f\n',z); % Print value of z in the command window
out = z>0;
This is a function that checks if two numbers are positive and returns true they are. If not it returns false. This may not be the best way to do this test, but it works as example I guess.
Please comment if this is not what you want to know.

How to make this Groovy string search code more efficient?

I'm using the following groovy code to search a file for a string, an account number. The file I'm reading is about 30MB and contains 80,000-120,000 lines. Is there a more efficient way to find a record in a file that contains the given AcctNum? I'm a novice, so I don't know which area to investigate, the toList() or the for-loop. Thanks!
AcctNum = 1234567890
if (testfile.exists())
{
lines = testfile.readLines()
words = lines.toList()
for (word in words)
{
if (word.contains(AcctNum)) { done = true; match = 'YES' ; break }
chunks += 1
if (done) { break }
}
}
Sad to say, I don't even have Groovy installed on my current laptop - but I wouldn't expect you to have to call toList() at all. I'd also hope you could express the condition in a closure, but I'll have to refer to Groovy in Action to check...
Having said that, do you really need it split into lines? Could you just read the whole thing using getText() and then just use a single call to contains()?
EDIT: Okay, if you need to find the actual line containing the record, you do need to call readLines() but I don't think you need to call toList() afterwards. You should be able to just use:
for (line in lines)
{
if (line.contains(AcctNum))
{
// Grab the results you need here
break;
}
}
When you say efficient you usually have to decide which direction you mean: whether it should run quickly, or use as few resources (memory, ...) as possible. Often both lie on opposite sites and you have to pick a trade-off.
If you want to search memory-friendly I'd suggest reading the file line-by-line instead of reading it at once which I suspect it does (I would be wrong there, but in other languages something like readLines reads the whole file into an array of strings).
If you want it to run quickly I'd suggest, as already mentioned, reading in the whole file at once and looking for the given pattern. Instead of just checking with contains you could use indexOf to get the position and then read the record as needed from that position.
I should have explained it better, if I find a record with the AcctNum, I extract out other information on the record...so I thought I needed to split the file into multiple lines.
if you control the format of the file you are reading, the solution is to add in an index.
In fact, this is how databases are able to locate records so quickly.
But for 30MB of data, i think a modern computer with a decent harddrive should do the trick, instead of over complicating the program.

Resources