I've accumulated a list of over 10,000 text files in Octave. I've got a function which cleans up the contents of each files, normalizing things in various ways (lowercase, reducing repeated whitespace, etc). I'd like to distill from all these files a list of words that appear in at least 100 files. I'm not at all familiar with Octave data structs or cell arrays or the Octave sorting functions, and was hoping someone could help me understand how to:
initialize an appropriate data structure (word_appearances) to count how many emails contain a particular word
loop thru the unique words that appear in an email string and increment for each of those words the count I'm tracking in word_appearances -- ideally we'd ignore words less than two chars in length and also exclude a short list of stop_words.
reduce word_appearances to only contain words that appear some number of times, e.g, min_appearances=100 times.
sort the words in word_appearances alphabetically and either export this as a .MAT file or as a CSV file like so:
1 aardvark
2 albatross
etc.
I currently have this code to loop through my files one by one:
for i = 1:total_files
filename = char(files{i}(1));
printf("processing %s\n", filename);
file_contents = jPreProcessFile(readFile(filename))
endfor
Note that the file_contents that comes back is pretty clean -- usually just a bunch of words, some repeated, separated by single spaces like so:
email market if done right is both highli effect and amazingli cost effect ok it can be downright profit if done right imagin send your sale letter to on million two million ten million or more prospect dure the night then wake up to sale order ring phone and an inbox full of freshli qualifi lead we ve been in the email busi for over seven year now our list ar larg current and deliver we have more server and bandwidth than we current need that s why we re send you thi note we d like to help you make your email market program more robust and profit pleas give us permiss to call you with a propos custom tailor to your busi just fill out thi form to get start name email address phone number url or web address i think you ll be delight with the bottom line result we ll deliv to your compani click here emailaddr thank you remov request click here emailaddr licatdcknjhyynfrwgwyyeuwbnrqcbh
Obviously, I need to create the word_appearances data structure such that each element in it specifies a word and how many files have contained that word so far. My primary point of confusion is what sort of data structure word_appearances should be, how I would search this data structure to see if some new word is already in it, and if found, increment its count, otherwise add a new element to word_appearances with count=1.
Octave has containers.Map to hold key-value pairs. This is the simple usage:
% initialize map
m = containers.Map('KeyType', 'char', 'ValueType', 'int32');
% check if it has a word
word = 'hello';
if !m.isKey(word)
m(word) = 1;
endif
% increment existing values
m(word) += 1;
This is one way to extract most frequent words from a map like the one above:
counts = m.values;
[sorted_counts, indices] = sort(cell2mat(counts));
top10_indices = indices(end:-1:end-9);
top10_words = m.keys(top10_indices);
I must warn you though, Octave may be pretty slow at this task, considering that you have thousands of files. Use it only if running time isn't that important for you.
So I have to create code that validates (ask username and password) whether a password:
Contains at least 1 number
Contains at least 1 capital letter
Contains at least 1 small letter
Contains at least 1 special symbol
and again ask the username and password (the previous one that we entered) if enter the wrong one after 3rd attempt it will print account blocked!
I am not sure about the contains statements. I know you can just assign a variable += 1 every time they enter an incorrect or invalid password until that variable hits a certain number (3). Then it would print out a statement that says something along the lines of "Blocked".
I am a beginner in python and want to know how to take just the user specified number of inputs in one single line and store each input in a variable.
For example:
Suppose I have 3 test cases and have to pass 4 integers separated by a white space for each such test case.
The input should look like this:
3
1 0 4 3
2 5 -1 4
3 7 1 9
I know about the split() method that helps you to separate integers with a space in between. But since I need to input only 4 integers, I need to know how to write the code so that the computer would take only 4 integers for each test case, and then the input line should automatically move, asking the user for input for the next test case.
Other than that, the other thing I am looking for is how to store each integer for each test case in some variable so I can access each one later.
For the first part, if you would like to store input in a variable, you would do the following...
(var_name) = input()
Or if you want to treat your input as an integer, and you are sure it is an integer, you would want to do this
(var_name) = int(input())
Then you could access the input by calling up the var_name.
Hope that helped :D
Trying to write a custom data validation formula that would only allow values in the following format: 2-digit year (this can be just 2 numbers), dash ("-"), then a 1 or 2 letter character(s) (would prefer upper case, but would settle for lower case), another dash ("-"), and then a 5-digit number. So the final value looks like: 17-FL-12345 ...or 16-G-00008...
I actually have a but more, but if I could get the above working, that would be terrific. I don't know if there's a way, but it would be great if additionally I could use custom formatting to get the dashes to appear when they are not entered, i.e., user enters "17FL12345" and it gets automatically formatted to "17-FL-12345". Finally, again, this isn't a deal breaker either, but it would also be great if the last 5 digits would add any leading zero's, i.e., the user enters 17-G-8 (or just 17G8) and it gets formatted to 17-G-00008.
Can't use VBA unfortunately. Some potential solutions to similar questions I've viewed include:
https://www.mrexcel.com/forum/excel-questions/615799-data-validation-mixed-numeric-text-formula-only.html
Data VAlidation - Text Length & Character Type
Excel : Data Validation, how to force the user to enter a string that is 2 char long?
Try this:
=AND(ISNUMBER(VALUE(LEFT(A1,2))),MID(A1,3,1)="-",OR(ISNUMBER(FIND(MID(A1,4,1),$C$1)),AND(ISNUMBER(FIND(MID(A1,4,1),$C$1)),ISNUMBER(FIND(MID(A1,5,1),$C$1)))),MID(A1,LEN(A1)-5,1)="-",ISNUMBER(VALUE(RIGHT(A1,5))),OR(LEN(A1)=11,LEN(A1)=10),LEN(A1)-LEN(SUBSTITUTE(A1,"-",""))=2,LEN(A1)-LEN(SUBSTITUTE(A1,"+",""))=0,LEN(A1)-LEN(SUBSTITUTE(A1," ",""))=0)
Assuming, you want to validate A1. I inserted the letters in C1.
Edit:
I edited the original function, to be more secure and left out the Isnumber part and rather went digit by digit.
If you want exceed the 255 limit, you have to slice the function up.
I created 5 functions.
=AND(ISNUMBER(FIND(LEFT(A1),$C$2)),ISNUMBER(FIND(MID(A1,2,1),$C$2)))
=MID(A1,3,1)="-"
=IF(LEN(A1)=10,AND(ISNUMBER(FIND(MID(A1,4,1),$C$1)),MID(A1,5,1)="-"),IF(LEN(A1)=11,AND(ISNUMBER(FIND(MID(A1,4,1),$C$1)),ISNUMBER(FIND(MID(A1,5,1),$C$1)))))
=IF(LEN(A1)=10,MID(A1,5,1)="-",IF(LEN(A1)=11,MID(A1,6,1)="-"))
=IF(LEN(A1)=10,AND(ISNUMBER(FIND(MID(A1,6,1),$C$2)),ISNUMBER(FIND(MID(A1,7,1),$C$2)),ISNUMBER(FIND(MID(A1,8,1),$C$2)),ISNUMBER(FIND(MID(A1,9,1),$C$2)),ISNUMBER(FIND(MID(A1,10,1),$C$2))),IF(LEN(A1)=11,AND(ISNUMBER(FIND(MID(A1,7,1),$C$2)),ISNUMBER(FIND(MID(A1,8,1),$C$2)),ISNUMBER(FIND(MID(A1,9,1),$C$2)),ISNUMBER(FIND(MID(A1,10,1),$C$2)),ISNUMBER(FIND(MID(A1,11,1),$C$2)))))
Set up data validation as on the picture:
In my country any bank account number has a secondary key which named SHEBA, and there is formula to calculate the SHEBA from account number.
For example if my account number is 801-800-125954-1, the SHEBA of this number is IR0008010080000125954001.
As you can see, changing the account number to SHEBA has been done by putting a handful of zero between account number's digits (however It's not always so simple).
So, I want write formula in Excel that can put zero - or any other digit - between our Number of customer accounts.
I mean, write function in which it's input is a number and output is same number plus some another digit between number
If you can split your string after each '-', you can pad each one with the appropriatz number of '0'.
Se "Add leading zeroes/0's to existing Excel values to certain length" (using TEXT)
801 would become "IR"+TEXT("800","000000")
Since you only posted one number and one SHEBA, it is a bit hard to establish a pattern, especially if, as you say "It's not always so simple".
You can use one formula to get from your number to your SHEBA
="IR000"&SUBSTITUTE(A1,"-","00")