xgettext - How to extract strings split by nulls - string

There's an issue in extracting strings (with xgettext) from gst-plugins-base where a string has null delimiters -
static const gchar genres[] =
"Blues\000Classic Rock\000Country\000Dance\000Disco\000Funk\000Grunge\000"
"Hip-Hop\000Jazz\000Metal\000New Age\000Oldies\000Other\000Pop\000R&B\000"
"Rap\000Reggae\000Rock\000Techno\000Industrial\000Alternative\000Ska\000"
"Death Metal\000Pranks\000Soundtrack\000Euro-Techno\000Ambient\000Trip-Hop"
"\000Vocal\000Jazz+Funk\000Fusion\000Trance\000Classical\000Instrumental\000"
"Acid\000House\000Game\000Sound Clip\000Gospel\000Noise\000Alternative Rock"
"\000Bass\000Soul\000Punk\000Space\000Meditative\000Instrumental Pop\000"
"Instrumental Rock\000Ethnic\000Gothic\000Darkwave\000Techno-Industrial\000"
"Electronic\000Pop-Folk\000Eurodance\000Dream\000Southern Rock\000Comedy"
"\000Cult\000Gangsta\000Top 40\000Christian Rap\000Pop/Funk\000Jungle\000"
"Native American\000Cabaret\000New Wave\000Psychedelic\000Rave\000Showtunes"
"\000Trailer\000Lo-Fi\000Tribal\000Acid Punk\000Acid Jazz\000Polka\000"
"Retro\000Musical\000Rock & Roll\000Hard Rock\000Folk\000Folk/Rock\000"
"National Folk\000Swing\000Bebob\000Latin\000Revival\000Celtic\000Bluegrass"
"\000Avantgarde\000Gothic Rock\000Progressive Rock\000Psychedelic Rock\000"
"Symphonic Rock\000Slow Rock\000Big Band\000Chorus\000Easy Listening\000"
"Acoustic\000Humour\000Speech\000Chanson\000Opera\000Chamber Music\000"
"Sonata\000Symphony\000Booty Bass\000Primus\000Porn Groove\000Satire\000"
"Slow Jam\000Club\000Tango\000Samba\000Folklore\000Ballad\000Power Ballad\000"
"Rhythmic Soul\000Freestyle\000Duet\000Punk Rock\000Drum Solo\000A Capella"
"\000Euro-House\000Dance Hall\000Goa\000Drum & Bass\000Club-House\000"
"Hardcore\000Terror\000Indie\000BritPop\000Negerpunk\000Polsk Punk\000"
"Beat\000Christian Gangsta Rap\000Heavy Metal\000Black Metal\000"
"Crossover\000Contemporary Christian\000Christian Rock\000Merengue\000"
"Salsa\000Thrash Metal\000Anime\000Jpop\000Synthpop";
I'm using xgettext-0.21 to extract the strings -
xgettext -a --no-wrap ./gst-libs/gst/tag/gstid3tag.c -o -
I'm getting only one of the strings -
#: gst-libs/gst/tag/gstid3tag.c:51
msgid "Blues"
msgstr ""
While I should get also "Classic Rock", "Country", "Dance", etc...
Is there any other way to extract those strings? Maybe some other tool or by using specific flags with the xgettext command?

There is no way to extract this string with xgettext and that is by design. And even if there was a way, there are no tools available to edit po file with entries containing null bytes.
The solution is to assemble the string with the null bytes at runtime or compile time. The latter would require a helper script that generates the source file containing the genre list.
An example in Perl:
#! /usr/bin/env perl
use strict;
# Stub gettext that just returns the argument.
sub gettext {
shift;
}
my $genres = join '\\000', (
gettext('Blues'),
gettext('Classic Rock'),
gettext('Country'),
gettext('Dance'),
);
print <<EOF;
static const gchar genres[] = "$genres";
EOF
Running the script will produce the required snippet in C. And feeding it as an additional source file to xgettext will add all genres to your po file:
$ xgettext --omit-header -o - genres.pl
#: genres.pl:11
msgid "Blues"
msgstr ""
#: genres.pl:12
msgid "Classic Rock"
msgstr ""
#: genres.pl:13
msgid "Country"
msgstr ""
#: genres.pl:14
msgid "Dance"
msgstr ""
You can do that, of course, in every other language that xgettext supports, not just in Perl. Pick the one that is easiest to integrate into your build system.
Just using a different delimiter (for example "Blues:Classic Rock:...") not only has escaping issues but would also result in a po file that is awkward to translate.

Related

How to use python to split lines of a txt file into a list with variables

This is the example of two lines from the sample.txt file
2021-06-12 16:40:49,225 INFO:URL: http://localhost:8000/page
2021-06-14 16:56:46,488 INFO:URL: http://localhost:8000/gpage
Result for each line:
['2021-06-14','16:56:46','488','INFO','URL','http://localhost:8000/gpage']
How can we get this result without using regular expressions?
first of all replace text "INFO:URL:" with "INFO URL " then replace "," with space and split text by space
a="2021-06-14 16:56:46,488 INFO:URL: http://localhost:8000/gpage"
aa=a.replace("INFO:URL:","INFO URL ").replace(","," ").split()
print(aa)
it gives
['2021-06-14', '16:56:46', '488', 'INFO', 'URL', 'http://localhost:8000/gpage']

Minimal self-compiling to .pdf Rmarkdown file

I need to compose a simple rmarkdown file, with text, code and the results of executed code included in a resulting PDF file. I would prefer if the source file is executable and self sifficient, voiding the need for a makefile.
This is the best I have been able to achieve, and it is far from good:
#!/usr/bin/env Rscript
library(knitr)
pandoc('hw_ch4.rmd', format='latex')
# TODO: how to NOT print the above commands to the resulting .pdf?
# TODO: how to avoid putting everyting from here on in ""s?
# TODO: how to avoid mentioning the file name above?
# TODO: how to render special symbols, such as tilde, miu, sigma?
# Unicode character (U+3BC) not set up for use with LaTeX.
# See the inputenc package documentation for explanation.
# nano hw_ch4.rmd && ./hw_ch4.rmd && evince hw_ch4.pdf
"
4E1. In the model definition below, which line is the likelihood?
A: y_i is the likelihood, based on the expectation and deviation.
4M1. For the model definition below, simulate observed heights from the prior (not the posterior).
A:
```{r}
points <- 10
rnorm(points, mean=rnorm(points, 0, 10), sd=runif(points, 0, 10))
```
4M3. Translate the map model formula below into a mathematical model definition.
A:
```{r}
flist <- alist(
y tilda dnorm( mu , sigma ),
miu tilda dnorm( 0 , 10 ),
sigma tilda dunif( 0 , 10 )
)
```
"
Result:
What I eventually came to use is the following header. At first it sounded neat, but later I realized
+ is indeed easy to compile in one step
- this is code duplication
- mixing executable script and presentation data in one file is a security risk.
Code:
#!/usr/bin/env Rscript
#<!---
library(rmarkdown)
argv <- commandArgs(trailingOnly=FALSE)
fname <- sub("--file=", "", argv[grep("--file=", argv)])
render(fname, output_format="pdf_document")
quit(status=0)
#-->
---
title:
author:
date: "compiled on: `r Sys.time()`"
---
The quit() line is supposed to guarantee that the rest of the file is treated as data. The <!--- and --> comments are to render the executable code as comments in the data interpretation. They are, in turn, hidden by the #s from the shell.

How can I save input to a file?

I use gforth running on linux boxes.
For one of my mini-applications I want to register a formatted text output from a few different user inputs.
Here is the INPUT$ I use:
: INPUT$
pad swap accept pad swap ;
I think this is correct. I tested it this way:
cr ." enter something : " 4 INPUT$ CR
enter something : toto
ok
cr ." enter something : " 8 INPUT$ CR
enter something : titi
ok
.S <4> 140296186274576 4 140296186274576 4 ok
My file definition:
256 Constant max-line
Create line-buffer max-line 2 + allot
//prepare file for Write permissions :
s" foo.out" w/o create-file throw Value fd-out
: close-output ( -- ) fd-out close-file throw ;
The end goal is to build very small files as:
data1;data2;data3
data4;data5;data6
where each data is the user input (asked 3times to insert text & a second wave of 3 inputs)
I did not find documentation about how I can use text inputs to build my file.
How can I call my stack data to copy them to the text file format? (using type will only echo texts to my terminal)
I think you are looking for the Forth write-file and write-line words, which are documented here: https://www.complang.tuwien.ac.at/forth/gforth/Docs-html/General-files.html
write-file ( c-addr u fileid -– ior )
write-line ( c-addr u fileid –- ior )
Pass the address and length of your text buffer, and the file ID (fd-out in your example) to write text to the file. The ior result will be zero on success.

Lua string.gsub with Multiple Patterns

I am working on renaming the Movie titles that has unwanted letters. The string.gsub can replace a string with "" nil value but I have around 200 string patterns that need to be replaces with "".
Right now I have to string.gsub for every pattern. I was thinking is there is a way to put all the string patterns in to single string.gsub line. I have searched around the web for the solution but still didn't got anything.
The movie title is like this B.A.Pass 2013 Hindi 720p DvDRip CROPPED AAC x264 RickyKT
and I want to remove the extra characters like 2013, Hindi, 720p, DvDRip, CROPPED, AAC, x264, RickyKT.
You can pass to string.gsub a table as the third argument like this:
local movie = "B.A.Pass 2013 Hindi 720p DvDRip CROPPED AAC x264 RickyKT"
movie = movie:gsub("%S+", {["2013"] = "", ["Hindi"] = "", ["720p"] = "",
["DvDRip"] = "", ["CROPPED"] = "", ["AAC"] = "",
["x264"] = "", ["RickyKT"] = ""})
print(movie)
Put all of the patterns in a table and then enumerate the table, calling string.gsub() for each pattern:
str = "B.A.Pass 2013 Hindi 720p DvDRip CROPPED AAC x264 RickyKT"
patterns = {"pattern1", "pattern2", "pattern3"}
for i,v in ipairs(patterns) do
str = string.gsub(str, v, "")
end
This will require many invocations of string.gsub(), but the code should be much more maintainable than having a lot of string.gsub() calls.
To avoid to write keys and values on a table for every new entry, i'd write a function to handle a numerically indexed table (the patterns being the values).
This way I dont need to write {["pattern_n"] = ""} for every new pattern.
Ex:
PATTERNS = {"2013", "Hindi", "720p", "DvDRip", "CROPPED", "AAC", "x264", "RickyKT"}
function replace(match)
local ret = nil
for i, v in ipairs(PATTERNS) do
if v:find(match) then
ret = ""
end
end
return ret
end
local movie = "B.A.Pass 2013 Hindi 720p DvDRip CROPPED AAC x264 RickyKT"
movie = movie:gsub("%S+", replace)
print(movie)
You could do it in a simple function, that way you do not need to write the code each time per string, or just put string.gsub, and the replacement value for the string you need
Function:
local large_name = "B.A.Pass 2013 Hindi 720p DvDRip CROPPED AAC x264 RickyKT"
function clean_name(str)
local v = string.gsub(str, "(.-)%s([%(%[']?%d%d%d?%d?[%)%]]?)%s*(.*)", "%1")
return v
end
print(clean_name(large_name))
Only string.gsub for value
local large_name = "B.A.Pass 2013 Hindi 720p DvDRip CROPPED AAC x264 RickyKT"
local clean_name = string.gsub(large_name, "(.-)%s([%(%[']?%d%d%d?%d?[%)%]]?)%s*(.*)", "%1")
print(clean_name)
The replacement pattern places the first value (name of the movie) separated by a space and prints it, also identifies the year as the second value, to avoid error in the titles, so it is not necessary to place all the values ​​that can exist within the name of the movie and will avoid many false positives
I add a testing function to test different movie names
local testing = {"Whiplash 2014 [1080p]",
"Anon (2018) [WEBRip] [1080p] [YTS.AM]",
"Maze Runner The Death Cure 2018 [WEBRip] [1080p] [YTS.AM]",
"12 Strong [2018] [WEBRip] [1080p] [YTS.AM]",
"Kingsman The Secret Service (2014) [1080p]",
"The Equalizer [2014] [1080p]",
"Annihilation 2018 [WEBRip] [1080p] [YTS.AM]",
"The Shawshank Redemption '94",
"Assassin's Creed 2016 HC 720p HDRip 850 MB - iExTV",
"Captain Marvel (2019) [WEBRip] [1080p] [YTS.AM]",}
for k,v in pairs(testing) do
local result = string.gsub(v, "(.-)%s([%(%[']?%d%d%d?%d?[%)%]]?)%s*(.*)", "%1")
print(result)
end
Output:
Whiplash
Anon
Maze Runner The Death Cure
12 Strong
Kingsman The Secret Service
The Equalizer
Annihilation
The Shawshank Redemption
Assassin's Creed
Captain Marvel

Search in directory of files based on keywords from another file

Perl Newbie here and looking for some help.
I have a directory of files and a "keywords" file which has the attributes to search for and the attribute type.
For example:
Keywords.txt
Attribute1 boolean
Attribute2 boolean
Attribute3 search_and_extract
Attribute4 chunk
For each file in the directory, I have to:
lookup the keywords.txt
search based on Attribute type
something like the below.
IF attribute_type = boolean THEN
search for attribute;
set found = Y if attribute found;
ELSIF attribute_type = search_and_extract THEN
extract string where attribute is Found
ELSIF attribute_type = chunk THEN
extract the complete chunk of paragraph where attribute is found.
This is what I have so far and I'm sure there is a more efficient way to do this.
I'm hoping someone can guide me in the right direction to do the above.
Thanks & regards,
SiMa
# Reads attributes from config file
# First set boolean attributes. IF keyword is found in text,
# variable flag is set to Y else N
# End Code: For each text file in directory loop.
# Run the below for each document.
use strict;
use warnings;
# open Doc
open(DOC_FILE,'Final_CLP.txt');
while(<DOC_FILE>) {
chomp;
# open the file
open(FILE,'attribute_config.txt');
while (<FILE>) {
chomp;
($attribute,$attribute_type) = split("\t");
$is_boolean = ($attribute_type eq "boolean") ? "N" : "Y";
# For each boolean attribute, check if the keyword exists
# in the file and return Y or N
if ($is_boolean eq "Y") {
print "Yes\n";
# search for keyword in doc and assign values
}
print "Attribute: $attribute\n";
print "Attribute_Type: $attribute_type\n";
print "is_boolean: $is_boolean\n";
print "-----------\n";
}
close(FILE);
}
close(DOC_FILE);
exit;
It is a good idea to start your specs/question with a story ("I have a ..."). But
such a story - whether true or made up, because you can't disclose the truth -
should give
a vivid picture of the situation/problem/task
the reason(s) why all the work must be done
definitions for uncommon(ly used)terms
So I'd start with: I'm working in a prison and have to scan the emails
of the inmates for
names (like "Al Capone") mentioned anywhere in the text; the director
wants to read those mails in toto
order lines (like "weapon: AK 4711 quantity: 14"); the ordnance
officer wants those info to calculate the amount of ammunition and
rack space needed
paragraphs containing 'family'-keywords like "wife", "child", ...;
the parson wants to prepare her sermons efficiently
Taken for itself, each of the terms "keyword" (~running text) and
"attribute" (~structured text) of may be 'clear', but if both are applied
to "the X I have to search for", things get mushy. Instead of general ("chunk")
and technical ("string") terms, you should use 'real-world' (line) and
specific (paragraph) words. Samples of your input:
From: Robin Hood
To: Scarface
Hi Scarface,
tell Al Capone to send a car to the prison gate on sunday.
For the riot we need:
weapon: AK 4711 quantity: 14
knife: Bowie quantity: 8
Tell my wife in Folsom to send some money to my son in
Alcatraz.
Regards
Robin
and your expected output:
--- Robin.txt ----
keywords:
Al Capone: Yes
Billy the Kid: No
Scarface: Yes
order lines:
knife:
knife: Bowie quantity: 8
machine gun:
stinger rocket:
weapon:
weapon: AK 4711 quantity: 14
social relations paragaphs:
Tell my wife in Folsom to send some money to my son in
Alcatraz.
Pseudo code should begin at the top level. If you start with
for each file in folder
load search list
process current file('s content) using search list
it's obvious that
load search list
for each file in folder
process current file using search list
would be much better.
Based on this story, examples, and top level plan, I would try to come
up with proof of concept code for a simplified version of the "process
current file('s content) using search list" task:
given file/text to search in and list of keywords/attributes
print file name
print "keywords:"
for each boolean item
print boolean item text
if found anywhere in whole text
print "Yes"
else
print "No"
print "order line:"
for each line item
print line item text
if found anywhere in whole text
print whole line
print "social relations paragaphs:"
for each paragraph
for each social relation item
if found
print paragraph
no need to check for other items
first implementation attempt:
use Modern::Perl;
#use English qw(-no_match_vars);
use English;
exit step_00();
sub step_00 {
# given file/text to search in
my $whole_text = <<"EOT";
From: Robin Hood
To: Scarface
Hi Scarface,
tell Al Capone to send a car to the prison gate on sunday.
For the riot we need:
weapon: AK 4711 quantity: 14
knife: Bowie quantity: 8
Tell my wife in Folsom to send some money to my son in
Alcatraz.
Regards
Robin
EOT
# print file name
say "--- Robin.txt ---";
# print "keywords:"
say "keywords:";
# for each boolean item
for my $bi ("Al Capone", "Billy the Kid", "Scarface") {
# print boolean item text
printf " %s: ", $bi;
# if found anywhere in whole text
if ($whole_text =~ /$bi/) {
# print "Yes"
say "Yes";
# else
} else {
# print "No"
say "No";
}
}
# print "order line:"
say "order lines:";
# for each line item
for my $li ("knife", "machine gun", "stinger rocket", "weapon") {
# print line item text
# if found anywhere in whole text
if ($whole_text =~ /^$li.*$/m) {
# print whole line
say " ", $MATCH;
}
}
# print "social relations paragaphs:"
say "social relations paragaphs:";
# for each paragraph
for my $para (split /\n\n/, $whole_text) {
# for each social relation item
for my $sr ("wife", "son", "husband") {
# if found
if ($para =~ /$sr/) {
## if ($para =~ /\b$sr\b/) {
# print paragraph
say $para;
# no need to check for other items
last;
}
}
}
return 0;
}
output:
perl 16953439.pl
--- Robin.txt ---
keywords:
Al Capone: Yes
Billy the Kid: No
Scarface: Yes
order lines:
knife: Bowie quantity: 8
weapon: AK 4711 quantity: 14
social relations paragaphs:
tell Al Capone to send a car to the prison gate on sunday.
Tell my wife in Folsom to send some money to my son in
Alcatraz.
Such (premature) code helps you to
clarify your specs (Should not-found keywords go into the output?
Is your search list really flat or should it be structured/grouped?)
check your assumptions about how to do things (Should the order line
search be done on the array of lines of thw whole text?)
identify topics for further research/rtfm (eg. regex (prison!))
plan your next steps (folder loop, read input file)
(in addition, people in the know will point out all my bad practices,
so you can avoid them from the start)
Good luck!

Resources