File handling (.conf file) in perl - linux

I need to read this .conf file. The file cannot be parsed by modules such as Config::Tiny, Config::INI, Config::Simple, etc.
Here is the .conf file; let's say the file name is conference.conf:
[ConferenceId] #section
1000 #value
2000
3000
4000
[RadioExt]
1000=102 #parameter and value
2000=202
3000=302
4000=402
What I want is for the Perl script to read only the values, not the section names, and print them out to the user. I'm still new at Perl; I've only been learning it for a week, and this kind of task makes it hard for me to make progress with reading, writing, and appending to configuration files.
I also want the values in [ConferenceId] to be treated as global: when a value in [ConferenceId] changes, the corresponding parameters in [RadioExt] should change as well. For example,
[ConferenceId]
1100 #the values have been changed
2100
3100
4100
[RadioExt]
1100=102 #parameters also changed
2100=202
3100=302
4100=402
Can anybody help me with this? I know it is a big favor, but I really need it so that I can learn more about reading, writing, and appending to configuration files. Thanks.

The real answer to this is to use Config::Tiny.
However, since this is a learning exercise assigned by your teacher, I will point you at perlfaq5: How do I change, delete, or insert a line in a file, or append to the beginning of a file?. That entry demonstrates all of the standard ways to manipulate a file.
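To give you a feel for it, the read-modify-write pattern those FAQ entries describe boils down to something like this (a minimal sketch; the particular substitution is only illustrative):
use strict;
use warnings;

# Read the whole file into memory, edit it, then write it back out
# (the pattern perlfaq5 recommends for small files).
my $file = 'conference.conf';

open my $in, '<', $file or die "Cannot open $file: $!";
my @lines = <$in>;
close $in;

# Illustrative edit only: change the value 1000 to 1100
# (this also touches a matching 1000=... key, if present).
s/^1000\b/1100/ for @lines;

open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} @lines;
close $out;
For appending you would instead open the file with '>>' and simply print the new lines.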
If it weren't also a module, I'd recommend using the core library Tie::File for this problem, but that's probably not your teacher's intent.
So my final recommendation is to take a look at the source for Config::Tiny. It's likely outside your skill set for now, but ideally you should be able to read that entire file by the end of your course, and this problem does not take a complicated bit of code. Looking at how others have solved problems can be one of the best ways to learn, especially if you're able to recognize which are the better modules.
Update
Config::Tiny alone will not be able to parse your file, because it's not in strict INI format. The fact that one of your sections has raw values without keys means it won't work with any of the standard modules.
Below is an example of how to parse your file using regular expressions. It probably should be enhanced with additional error checking to make sure key/value pairs aren't mixed with array values, but this should get you started:
use strict;
use warnings;

my %hash;
my $section;

while (<DATA>) {
    chomp;
    next if /^\s*$/;

    # Begin Section
    if (/^\s*\[(.*)\]\s*$/) {
        $section = $1;

    # Hash Key & Value
    } elsif (/^(.*?)=(.*)/) {
        $hash{$section}{$1} = $2;

    # Array
    } else {
        push @{$hash{$section}}, $_;
    }
}

use Data::Dump;
dd \%hash;
__DATA__
[ConferenceId]
1000
2000
3000
4000
[RadioExt]
1000=102
2000=202
3000=302
4000=402
Outputs:
{
  ConferenceId => [1000, 2000, 3000, 4000],
  RadioExt     => { 1000 => 102, 2000 => 202, 3000 => 302, 4000 => 402 },
}
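Once the file is parsed into %hash, the rest of the question (print only the values, and keep [RadioExt] in step when the [ConferenceId] values change) becomes plain hash manipulation. A minimal sketch, assuming the structure shown in the dump above and a hypothetical offset of +100:
use strict;
use warnings;

# The structure produced by the parsing loop above.
my %hash = (
    ConferenceId => [1000, 2000, 3000, 4000],
    RadioExt     => { 1000 => 102, 2000 => 202, 3000 => 302, 4000 => 402 },
);

# Print only the values, never the section names.
print "$_\n" for @{ $hash{ConferenceId} };
print "$_=$hash{RadioExt}{$_}\n" for sort keys %{ $hash{RadioExt} };

# Shift every ConferenceId (here by a hypothetical +100) and keep
# the RadioExt keys in sync with the new ids.
my $offset = 100;
my %new_radio;
for my $old ( @{ $hash{ConferenceId} } ) {
    $new_radio{ $old + $offset } = $hash{RadioExt}{$old};
}
$hash{ConferenceId} = [ map { $_ + $offset } @{ $hash{ConferenceId} } ];
$hash{RadioExt}     = \%new_radio;
Writing the result back to conference.conf is then just a matter of printing each [section] header followed by its values, in the same format as the input.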

Even though the file extension is .conf, you should be able to read it like any other text file.
You can try this:
use strict;
use warnings;

my $file = "<your filename here>";
open my $fh, '<', $file or die "Cannot open $file: $!";
while (my $line = <$fh>) {
    # here you can write your logic
}
close $fh;
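Building on that loop, a sketch of the specific filtering asked for (print only the values, never the section names) might look like the following; it assumes the conference.conf layout shown in the question, including the trailing # comments:
use strict;
use warnings;

my $file = 'conference.conf';
open my $fh, '<', $file or die "Cannot open $file: $!";

while (my $line = <$fh>) {
    chomp $line;
    $line =~ s/\s*#.*$//;               # drop trailing comments
    next if $line =~ /^\s*$/;           # skip blank lines
    next if $line =~ /^\s*\[.*\]\s*$/;  # skip [Section] headers
    if ($line =~ /^(.+?)=(.+)$/) {      # parameter=value lines
        print "parameter $1 => value $2\n";
    }
    else {                              # bare value lines
        print "value $line\n";
    }
}
close $fh;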

How to slip gather-take in lazy manner into map?

I need to construct the following flow:
Accept list of file names
Extract multiple lines from those files
Process those lines
However I have no idea how to properly inject gather-take into map:
sub MAIN ( *@file-names ) {
    @file-names.map( { slip parse-file( $_ ) } ).map( { process-line( $_ ) } );
}

sub parse-file ( $file-name ) {
    return gather for $file-name.IO.lines -> $line {
        take $line if $line ~~ /a/; # dummy example logic
    }
}

sub process-line ( $line ) {
    say $line; # dummy example logic
}
This code works but leaks memory like crazy. I assume slip makes gather-take eager? Or does slip not mark Seq items as consumed? Is there a way to slip a gather-take result into map in a lazy manner?
BTW: My intent is to parallelize each step with race later - so for example I would have 2 files parsed at the same time producing lines for 10 line processors. Generally speaking, I'm trying to figure out the easiest way of composing such cascade flows. I've tried Channels to connect each processing step, but they have no built-in pushback. If you have any other patterns for such flows, then comments are more than welcome.
EDIT 1:
I think my code is correct, and the memory leak is not caused by bad logic but rather by a bug in the Slip class. I've created issue https://github.com/rakudo/rakudo/issues/5138 which is currently open. I'll post an update once it is resolved.
EDIT 2:
No, my code was not correct :) See my post below for the answer.
I believe that you are mistaken about the cause of the non-laziness in your code – in general, using slip should not typically make code eager. And, indeed, when I run the slightly modified version of your code shown below:
sub MAIN () {
    my @file-names = "tmp-file000".."tmp-file009";
    spurt $_, ('a'..'z').join("\n") for @file-names;
    my $parsed = @file-names.map( { slip parse-file( $_ ) } );
    say "Reached line $?LINE";
    $parsed.map( { process-line( $_ ) } );
}

sub parse-file ( $file-name ) {
    say "processing $file-name...";
    gather for $file-name.IO.lines -> $line {
        take $line if $line ~~ /a/; # dummy example logic
    }
}

sub process-line ( $line ) {
    say $line; # dummy example logic
}
I get the output that shows Raku processing the files lazily (note that it does not call parse-file until it needs to pass new values to process-line):
Reached line 8
processing tmp-file000...
a
processing tmp-file001...
a
processing tmp-file002...
a
processing tmp-file003...
a
processing tmp-file004...
a
processing tmp-file005...
a
processing tmp-file006...
a
processing tmp-file007...
a
processing tmp-file008...
a
processing tmp-file009...
a
Since I don't have the rest of your code, I'm not sure what is triggering the non-lazy behavior you're observing. In general, if you have code that is being eagerly evaluated when you want it to be lazy, though, the .lazy method and/or the lazy statement prefixes are good tools.
Finally, a couple of minor notes about the code you posted that aren't relevant to your question but that might be helpful:
All Raku functions return their final expression, so the return statement in parse-file isn't necessary (and it's actually slightly slower/non-idiomatic).
A big part of the power of gather/take is that they can cross function boundaries. That is, you can have a parse-file function that takes different lines without needing to have the gather statement inside parse-file – you just need to call parse-file within the scope of a gather block. This feels like it might be helpful in solving the problem you're working on, though it's hard to be sure without more info.
First of all, I had a big misconception. I thought that all lines produced by parse-file had to be slipped into the map block like this:
@file-names.map( produce all lines here ).map( process all lines here );
And a Slip is a List that keeps all of its elements. That is why I had the big memory leak.
The solution is to create the gather-take sequence inside map but consume it outside of map:
@file-names.map( { parse-file( $_ ) } ).flat.map( { process-line( $_ ) } );
So now it is:
@file-names.map( construct sequence here ).(get items from sequence here).map( process all lines here );

Perl: Multidimensional arrays and "experimental push" error

I'm a junior Perl programmer and have very little experience with multidimensional arrays. I'm not even sure they are the proper data structure for this project.
I have an array of clients that is read in from a file:
my @clientlist = grep(/[A-Z]\w+$/, readdir(DIR));
It produces a list like:
$VAR1 = [
          'AA14A',
          'BB12R',
          'CC34M'
        ];
Each client has some unknown number of elements read from another file that correspond to the client name like:
__U_AA14A_0001, __U_AA14A_0002, __U_AA14A_0003
__U_BB12R_0001, __U_BB12R_0002, __U_BB12R_0003
When I try to assign the corresponding element to the client name:
my @allclients;
my $header = $string;
my $i = 0; # index in array

foreach my $client (@clientlist) {
    push @allclients{$client}{$i}, $header;
    $i += 1;
}
it prints:
Useless use of push with no values at ./convert_domains.pl line 97.
Global symbol "%allclients" requires explicit package name (did you forget to declare
"my %allclients"?) at ./convert_domains.pl line 97.
Experimental push on scalar is now forbidden at ./convert_domains.pl line 97, near "}
{"
syntax error at ./convert_domains.pl line 97, near "}{"
I've also tried numerous variations to the push() function, but they all return some variation of the above.
I'm trying to build something like:
AA14A, __U_AA14A_0001, __U_AA14A_0002, __U_AA14A_0003
BB12R, __U_BB12R_0001, __U_BB12R_0002, __U_BB12R_0003
so I can iterate through it and print out the individual elements.
My main question is how to properly build and access a multidimensional array of strings. I've also read the perllol perldoc, but I can't get it to work:
https://perldoc.perl.org/perllol
First of all,
my @allclients;
should be
my %allclients;
because you want an associative array (i.e. an array-like structure keyed by strings) and thus a hash.
Also,
push @allclients{$client}{$i}, ...;
should be
$allclients{$client}[$i] = ...;
or
push @{ $allclients{$client} }, ...;
or
push $allclients{$client}->@*, ...;
You want to add to the array referenced by $allclients{$client}, so @{ $allclients{$client} } or $allclients{$client}->@*. See Perl Dereferencing Syntax.
Yes, you never explicitly created any of the multiple arrays and the references to them, but that's not a problem thanks to autovivification.
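Putting that together, here is a sketch of building the hash of arrays and printing it in the format shown in the question; the sample data is hypothetical, standing in for what the two files actually provide:
use strict;
use warnings;

# Hypothetical sample data standing in for what is read from the files.
my @clientlist = qw(AA14A BB12R);
my @elements   = qw(__U_AA14A_0001 __U_AA14A_0002 __U_AA14A_0003
                    __U_BB12R_0001 __U_BB12R_0002 __U_BB12R_0003);

my %allclients;
for my $client (@clientlist) {
    # Autovivification creates $allclients{$client} (an array ref) on first push.
    push @{ $allclients{$client} }, grep { /\Q$client\E/ } @elements;
}

# One line per client: the client name followed by its elements.
for my $client (sort keys %allclients) {
    print join(', ', $client, @{ $allclients{$client} }), "\n";
}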

what's the proper way to allow users to provide a string "mangler" as a regex/proc/expr/

In my Tcl/Tk project, I need to allow my users to mangle a string in a well-defined way.
The idea is, to allow people to declare a "string mangling" proc/expr/function/... in a configuration file, which then gets applied to the strings in question.
I'm a bit worried about how to properly implement that.
Possibilities I have considered so far:
regular expressions
That was my first thought, but there are two caveats:
search/replace with regular expressions in Tcl seems to be awkward. At least with regsub I need to pass the match and replacement parts separately (as opposed to how e.g. sed allows me to pass a single complicated string that does everything for me); there are sed implementations for Tcl, but they look naive and might break sooner rather than later
also, regexes can be awkward by themselves; using them to mangle complicated strings is often more complicated than it should be
procs?
Since the target platform is Tcl anyhow, why not use the power of Tcl to do string mangling?
The "function" should take a single input and produce a single output. Ideally the user should be nudged into doing it right (e.g. not being able to define a proc that requires two arguments), and it should be (nigh) impossible to create side effects (like changing the state of the application).
A simplistic approach would be to use proc mymangler s $body (with $body being the string defined by the user), but there are so many things that can go wrong:
$body assuming a different arg-name (e.g. $x instead of $s)
$body not returning anything
$body changing variables,... in the environment
Expressions look more like it (they always return something and don't allow modifying the environment easily), but I cannot make them work on strings, and there's no way to pass a variable without agreeing on its name.
So, the best I've come up with so far is:
set userfun {return $s}  ;# user-defined string
proc mymangler s ${userfun}
set output [mymangler $input]
Are there better ways to achieve user-defined string-manglers in Tcl?
You can use apply -- the user provides a 2-element list: the second element is the "proc body", the code that does the mangling; the first element is the variable name that will hold the string, and this variable is used in the body.
For example:
set userfun {{str} {string reverse $str}}
set input "some string"
set result [apply $userfun $input] ;# => "gnirts emos"
Of course, the code you get from the user can be arbitrary Tcl code. You can run it in a safe interpreter:
set userfun {{str} {exec some malicious code; return [string reverse $str]}}
try {
    set interp [safe::interpCreate]
    set result [$interp eval [list apply $userfun $input]]
    puts "mangled string is: $result"
    safe::interpDelete $interp
} on error e {
    error "Error: $e"
}
results in
Error: invalid command name "exec"
Notes:
a standard Tcl command is used, apply
the user must specify the variable name used in the body.
this scheme does protect the environment:
set userfun {{str} {set ::env(SOME_VAR) "safe slave"; return $str$str}}
set env(SOME_VAR) "main"
puts $env(SOME_VAR)
try {
    set interp [safe::interpCreate]
    set result [$interp eval [list apply $userfun $input]]
    puts "mangled string is: $result"
    safe::interpDelete $interp
} on error e {
    error "Error: $e"
}
puts $env(SOME_VAR)
outputs
main
mangled string is: some stringsome string
main
if the user does not return a value, then the mangled string is simply the empty string.
The "simplistic" approach is like foreach in that it requires the user to supply a variable name and a script to evaluate that uses that variable, and is a good approach. If you don't want it affecting the rest of the program, run it in a separate interpreter:
set x 0

proc mymangler {name body} {
    set i [interp create -safe]
    set s "some string to change"
    try {
        # Build the lambda used by apply here instead of making
        # the user do it.
        $i eval [list apply [list $name $body] $s]
    } on error e {
        return $e
    } finally {
        interp delete $i
    }
}

puts [mymangler s { set x 1; string toupper $s }]
puts $x
outputs
SOME STRING TO CHANGE
0
If the person calling this says to use s as a variable and then uses something else in the body, it's on them. Same with providing a script that doesn't return anything.
I'd generally allow the user to specify a command prefix as a Tcl list (most simple command names are trivially suitable for this), which you would then apply to the argument by doing:
set mangled [{*}$commandPrefix $valueToMangle]
This lets people provide pretty much anything they want, especially as they can use apply and a lambda term to mangle things as required. Of course, if you're in a procedure then you're probably actually better off doing:
set mangled [uplevel 1 [list {*}$commandPrefix $valueToMangle]]
so that you're running in the caller's context (change 1 to #0 to use the global context instead) which can help protect your procedure against accidental changes and make using upvar within the mangler easier.
If the source of the mangling prefix is untrusted (what that means depends greatly on your application and deployment) then you can run the mangling code in a separate interpreter:
# Make the safe evaluation context; this is *expensive*
set context [interp create -safe]
# You might want to let them define extra procedures too
# interp invokehidden $context source /the/users/file.tcl
# Use the context
try {
    set mangled [interp eval $context [list {*}$commandPrefix $valueToMangle]]
} on error {msg} {
    # User supplied something bad; error message in $msg
}
There's various ways to support users specifying the transformation, but if you can expose the fact that you're working with Tcl to them then that's probably easiest and most flexible.

PowerShell on CSV file - looking for string depending on string

I need your help with PowerShell programming on a CSV file.
I've done some searching but cannot find what I'm looking for (or perhaps I don't know the technical terms). Basically, I have an Excel workbook with a large amount of data (roughly 38 columns x 350,000 rows), and there are a couple of formulas that take hours to calculate.
I was first wondering whether PowerShell could speed up the calculation a bit compared to Excel. The calculations taking most of my time are in fact not that complex (at least at first glance). My data is more or less constructed like this:
Ref Title
----- --------------------------
A/001 "free_text"
A/002 "free_text A/001 free_text"
... ...
A/005 "free_text A/004 free_text"
A/006 "free_text"
B/001 "free_text"
B/002 "free_text"
C/001 "free_text"
C/002 "free_text"
...
C/050 "free_text C/047 free_text"
... ...
C/103 "free_text"
D/001 "free_text"
D/002 "free_text D/001 free_text"
... ....
Basically the data is as follows:
the Ref field contains unique values, in {letter}/{incremental value} format.
In some rows, the Title field may reference one of the Ref values. For example, in line 2 the Title refers to the A/001 Ref; in the last row the Title refers to the D/001 Ref, etc.
There is no logical pattern defining when a ref may be referenced in a title. It is random.
However, what I'm 100% sure of is the following:
The Ref called in the Title always belongs to the same {letter} block. For example, the string 'C/047' in the Title field can only be found in the block where the Ref {letter} is C.
The Ref called in the Title is always located 'after' (in a lower row than) the Ref it refers to. In other words, I cannot have a line with the following pattern:
Ref Title
------------ -----------------------------------------
{letter/i} {free_text {letter/j} free_text} with j<i
→ This is not possible.
→ j is always > i
I've used these characteristics in Excel to minimize my lookup arrays. But it still takes an hour to calculate everything.
I've therefore looked into PowerShell and started to 'play' a bit with the CSV, looping with ForEach-Object in the hope of getting quicker results. Up to now I've basically ended up looping twice over my CSV file.
$CSV1 = Import-Csv myfile.csv
$CSV2 = Import-Csv myfile.csv

$CSV1 | ForEach-Object {
    # find Title
    $TitSearch = $_.$Ref
    $CSV2 | ForEach-Object {
        if ($_.$Title -eq $TitSearch) {
            myinstructions
        }
    }
}
It works but it's really really really long. So I then tried the following instead of using the $CSV2 | ForEach...:
$CSV | where {$_.$Title -eq $TitleSearch} | % $Ref
In either case, it's too slow and not efficient at all. Additionally, with these two solutions I'm not using the characteristics above, which could reduce the lookup array, and as already stated I seem to end up looping over the CSV file twice from beginning to end.
Questions:
Is there a leaner way to do this?
Am I wasting my time with PowerShell?
I thought about creating one file per Ref {letter} block (1 file for block A, 1 for B, etc.). However, I have about 50,000 blocks to create. Or I could create them one by one, carry out the analysis, put the results in a new file, and delete them. Would that be quicker?
Note: this is for work, to be used by other colleagues, and Excel and PowerShell are really the only software we may use. I know VBA, but OK... In the end I'm curious about whether and how this can be solved in a simple manner using PowerShell.
As far as I can see, your base algorithm does N^2 iterations (~120 billion). There is a standard way to make it efficient: build a hashtable first. A hashtable is a key/value store where lookup is pretty much instantaneous, so the algorithm's time complexity becomes ~N.
PowerShell has a built-in data type for that. In your case the key would be ref and the value an array of cell data (assuming your table is something like: ref, title, col1, ..., colN).
$hash = @{}
foreach ($row in $table) { $hash.Add($row.ref, @($row.title, $row.col1, ...)) }
# it will take 350K steps to generate it

# then you can iterate over it again
foreach ($key in $hash.Keys) {
    $key                              # access the current ref
    $rowData = $hash.$key             # access the current row's elements (by index)
    $refRowData = $hash[$rowData[$j]] # look up other rows, assuming the lookup reference is in some column
}
That's the general idea of how to solve the time issue. To be honest, I don't believe you need to reinvent the wheel and code it yourself. What you need is a relational database. Since you have Excel, you should have MS Access too. Just import your data there, make ref and title indexes, and then all you need is a self join. MS Access isn't great, but I'm sure it will handle 350K rows just fine.
Ideally you'd get a database on some corporate MSSQL server (open a ticket, talk to your manager, etc.). It will calculate all of that in seconds, and then you can link the output to a spreadsheet as well.

How can I import data from text files into Excel?

I have multiple folders, and there are multiple txt files inside these folders. I need to extract data (just a single value: value ---> 554) from a particular type of txt file in each folder (individual_values.txt):
No 100 Value 555 level match 0.443 top level 0.443 bottom 4343
There will be many folders with the same txt file name but different values. Can all these values be copied to Excel one below the other?
I have to extract a value from the txt file I mentioned above. It's the same text file with the same name, located inside different folders. All I want to do is extract this value from all the text files and paste them into Excel or a txt file, one below the other, one per row.
E.g. the above is one text file; here I have to get the value 555, and similarly the other different values:
555
666
666
776
Yes.
(You might want to clarify your question.)
Your question isn't very clear; I imagine you want to know how this can be done.
You probably need to write a script that traverses the folders, reads the individual files, parses them for the value you want, and generates a Comma Separated Values (CSV) file. CSV files can easily be imported to Excel.
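If Perl is an option (one of the answers below uses it), a rough sketch of such a script could look like this; the top-level directory name is hypothetical, and it assumes the file is always called individual_values.txt and that the number follows the word "Value" as in the sample line:
use strict;
use warnings;
use File::Find;

my $root = 'top_folder';   # hypothetical directory holding all the sub-folders
my @values;

# Walk every sub-folder looking for individual_values.txt and pull out
# the number that follows the word "Value".
find(sub {
    return unless $_ eq 'individual_values.txt';
    open my $fh, '<', $_ or die "Cannot open $File::Find::name: $!";
    while (my $line = <$fh>) {
        push @values, $1 if $line =~ /\bValue\s+(\d+)/i;
    }
    close $fh;
}, $root);

# One value per line; this file can be opened in Excel or pasted in directly.
open my $out, '>', 'values.csv' or die "Cannot open values.csv: $!";
print {$out} "$_\n" for @values;
close $out;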
There are two or three basic methods you can use to get data into an Excel spreadsheet.
You can use OLE wrappers to manipulate Excel.
You can write the file in a binary form.
You can use Excel's import methods to take delimited text in as a spreadsheet.
I chose the latter way, because 1) it is the simplest, and 2) your problem is so loosely stated that it does not require a more complex approach. The solution below outputs a tab-delimited text file that Excel can easily import.
In Perl:
use IO::File;

my @field_names = split m|/|, 'No/Value/level match/top level/bottom';
#'  # <-- catch runaway quote

my $input = IO::File->new( '<data.txt' );
die 'Could not open data.txt for input!' unless $input;

my @data_rows;
while ( my $line = <$input> ) {
    my %fields = $line =~ /(level match|top level|bottom|Value|No)\s+(\d+\S*)/g;
    push @data_rows, \%fields if exists $fields{Value};
}
$input->close();

my $tab_file = IO::File->new( '>data.tab' );
die 'Could not open data.tab for output!' unless $tab_file;

$tab_file->print( join( "\t", @field_names ), "\n" );
foreach my $data_ref ( @data_rows ) {
    $tab_file->print( join( "\t", @$data_ref{@field_names} ), "\n" );
}
$tab_file->close();
NOTE: Excel's text processing is really quite neat. Try opening the text below (replacing the \t with actual tabs) -- or even copying and pasting it:
1\t2\t3\t=SUM(A1:C1)
I chose C# because I thought it would be fun to use a recursive lambda. This will create a CSV file containing the matches to the regex pattern.
string root_path = @"c:\Temp\test";
string match_filename = "test.txt";

Func<string, string, StringBuilder, StringBuilder> getdata = null;
getdata = (path, filename, content) => {
    Directory.GetFiles(path)
        .Where(f => Path.GetFileName(f)
            .Equals(filename, StringComparison.OrdinalIgnoreCase))
        .Select(f => File.ReadAllText(f))
        .Select(c => Regex.Match(c, @"value[\s\t]*(\d+)",
            RegexOptions.IgnoreCase))
        .Where(m => m.Success)
        .Select(m => m.Groups[1].Value)
        .ToList()
        .ForEach(m => content.AppendLine(m));

    Directory.GetDirectories(path)
        .ToList()
        .ForEach(d => getdata(d, filename, content));

    return content;
};

File.WriteAllText(
    Path.Combine(root_path, "data.csv"),
    getdata(root_path, match_filename, new StringBuilder()).ToString());
No.
just making sure you have a 50/50 chance of getting the right answer
(assuming it was a question answerable by Yes and No) hehehe
File_not_found
Gotta have all three binary states for the response.
