I have C macros like:
#define MY_MACRO() \
xxxxx xxxxxxx \
xxxxxx xxxxxxx \
xx xxxxxxxxxx xxxxxx x xxxxxxx xxxxxxxx \
xxxxx \
xxxxxx xxxxxxx \
xxxxxx x xxxxxxxxx xx xxxxxxxxx x \
xxxxxxxxxx xxxxxxxxxxxxxx \
x xxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxx \
xxxxxx x xxxxxx x xxxxxxxx x xxxxxx x xxxxxxxxx \
x \
xxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \
xxxxxxxxxxxxxxxxxxxxxxxxx xxx \
xxxxx \
xxxxxx xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx x \
x xx xx xxxxxxx
And I'm trying to pad them to:
#define MY_MACRO() \
xxxxx xxxxxxx \
xxxxxx xxxxxxx \
xx xxxxxxxxxx xxxxxx x xxxxxxx xxxxxxxx \
xxxxx \
xxxxxx xxxxxxx \
xxxxxx x xxxxxxxxx xx xxxxxxxxx x \
xxxxxxxxxx xxxxxxxxxxxxxx \
x xxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxx \
xxxxxx x xxxxxx x xxxxxxxx x xxxxxx x xxxxxxxxx \
x \
xxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \
xxxxxxxxxxxxxxxxxxxxxxxxx xxx \
xxxxx \
xxxxxx xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx x \
x xx xx xxxxxxx
The lines should be 80 chars total.
'<,'>s/\(.*[^\s]\)\s*\\\s*$/\=printf('%-79s\', submatch(1))
mostly does it, except for lines such as:
xxxxxx xxxxxxx \
that exceed the 80-character limit to start with.
What am I doing wrong?
The printf() width specifier only adds padding to extend the string to the given width. If the string is already longer than the width, it does not cut anything off.
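For example, a quick check you can run inside Vim (the trailing | only marks where the padded string ends):
:echo printf('%-10s|', 'short')
" prints: short     |            padded out to 10 characters
:echo printf('%-10s|', 'longer than ten')
" prints: longer than ten|       the longer string is left untouched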
Your regular expression attempts to limit the match by excluding trailing whitespace from the capture group. Unfortunately, [^\s] does not do that: you cannot use atoms like \s or \w inside a collection. Either use the corresponding character class ([^[:space:]]) or, if available, the negated atom \S. With this fix, your substitution works just fine:
'<,'>s/\(.*\S\)\s*\\\s*$/\=printf('%-79s\', submatch(1))
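If you also want to hard-truncate lines that are already longer than the limit (which will cut characters out of the macro body, so use with care), a sketch of a variant that clips the captured text to 78 characters before padding could look like this:
'<,'>s/\(.*\S\)\s*\\\s*$/\=printf('%-79s\', strpart(submatch(1), 0, 78))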
Spark version: Apache Spark 3.3.0
I'm using Spark Structured Streaming to read and process comma-delimited CSV files. However, some records get dropped from the target table; I'm assuming this happens because a column value contains a very large number of characters.
Edit: the read DataFrame contains the data correctly, but when the DataFrame is written, a few rows get removed.
The following comma-delimited row is one that gets dropped automatically, and there are more like it.
a500o0000008bugAAA,FALSE,KMI000004704,Key Medical Insight,0050o00000WuoSBAAZ,2020-04-02T10:17:02.000Z,0019000000R3GVDAA3,4/2/2020,"<XXXX XXXXX=XXXXX-XXXXX: XXX-XXXX;>XXXXXXXXXX XXXX XXXXXXX XXXX XX XXXXXXXXXXX XXXXXXXX XXXXXX XXXXX-XX. XXXXXX XXX XXX XXXXXXXXX XXX XXXXX XXXXXXXXXX XXXX XXXXXXX XXXXXXXXXXX XXX XXXX XXXXXXXX XX XXXXXXXXXX XXXXX XXXXXXXXXXXX XXXXX XX XXXXX XXXXX. XXXXXXXX XXXX XXXX XX XXXXX XXXXX XXXXXXXXXX XXXXXX XXXXX-XX XXXXX XXXXXXXX XXXXXX XXXXXXXXXX XXX XXXXXXX XXXX XXXXXXX XXXXXX XX XXXX XXXXXX XXXXXXXXXX XXX XXXXXXX XXXXX XXX. </XXXX><XXX><XXXX XXXXX=XXXXX-XXXXX: XXX-XXXX;><XX></XXXX></XXX><XXX><XXXX XXXXX=XXXXX-XXXXX: XXX-XXXX;>XXXXXXXXX XXXX XXXXXXX&#XX;X XX.X XXXXX XXXXXXXXX (XXX-XXXXXXXX) XXXXX XXX XXXX XXXXXXXX XXXX X XXXXXXXX - XXXX XXXXXXXX XX XXXXXX XXX (XXXXXX XXXX XXXXX XXX XXXXX) XX XXXX-XX-XXXX XXXXX XXX XXXXXXX XXXXXXX XXXXXXXXX XXX XXXXXXXXXX (XXXX XXX XXXXXXXX XXXXX).
XXXXXXX XXXXXXXXX XXXX XXXXXXXX XX XXXXXX XXX XXXXXX XXXX XXXX XX XXXXX XXXXXXX X.X. &XXXX;XXXXXXXX XXXXX XXXXXXXX XXXXX XXXXXXXXXX XXXX XXXX XXXX XXXXXXX XXXXXXXXX&XXXX;. </XXXX><XXX><XXXX XXXXX=XXXXX-XXXXX: XXX-XXXX;><XX></XXXX></XXX><XXX><XXXX XXXXX=XXXXX-XXXXX: XXX-XXXX;>XXXXX XXXXXXXXXX: XXXXXXX XXXXXXX XX XX XXXXX XXXXXXXXXXX XXXXX XXXXXXXXXX XX XXXX XXXXXXXXXX XX XXXXXXXXXX XXXXXXXX XXXXX XXX XXXXXXX XX XXXXXXXXX XXXXXXXXXX XXXXXXXXXXXX XX XXXX XXXXXXXXXX XXXXX XX XXXXXXXX XXXXXXXX XXXX XXX XX X XXXXX XXXX XXXX XXX XXXXXX XXXXXXXXXX XX XXXXXXXXXX XXXXX XXXXXXXXX. XX XXXXXXXX XXXX XXXXXXXX XXXXXXXXXXXX XXXXXXXXX XXX XXXXX XXXXXXXXX XXXXXX XXXXX-XX XXXXXX XXXXXXX XX XX XXXXX XXXX XXXXXXXXX XXXXXXXX XXX XXXXXXXXXX/XXXXXXX - XXXXXX XX XXX.</XXXX></XXX></XXX>",a040o00002QZ26mAAD,Submitted_vod,Local Data/ Fact/Observation,Key Opinion Leader,Hematology
Note: the field with the "XXXX" values is a single field containing lots of spaces, special characters, etc.
I read the CSV files with the following code:
def read_stream(container_read_path, file_format, delimeter, spark, header):
    spark.conf.set("spark.sql.streaming.schemaInference", True)
    source_data = (
        spark.readStream.format(file_format)
        .option("header", header)
        .option("sep", delimeter)
        .option("escape", "\"")
        .option("multiline", True)
        .option("recursiveFileLookup", "true")
        .load(f"{container_read_path}")
    )
    return source_data
The following code is used to save the structured streaming data:
def write_stream(dataframe, database_name, table_name, checkpoint_path, partition_cols,
                 header, file_format='parquet'):
    (dataframe
     .writeStream
     .format(file_format)
     .trigger(once=True)
     .option("checkpointLocation", f'{checkpoint_path}')
     .foreachBatch(lambda df, epochId: write_raw_file(df, epochId, database_name,
                                                      table_name, partition_cols,
                                                      header, file_format))
     .start()
     )

def write_raw_file(df, epochId, database_name, table_name, partition_cols, header,
                   file_format):
    file_format = 'csv' if file_format == 'text' else file_format
    header = "true" if file_format == 'csv' else "false"
    (df.write
     .mode("append")
     .option("header", header)
     .partitionBy(partition_cols)
     .format(file_format)
     .saveAsTable(f"{database_name}.{table_name}")
     )
I resolved the issue by adding .option("multiLine", True) to the write DataFrame.
def write_raw_file(df, epochId, database_name, table_name, partition_cols,
                   header, file_format):
    file_format = 'csv' if file_format == 'text' else file_format
    header = "true" if file_format == 'csv' else "false"
    (df.write
     .mode("append")
     .option("header", header)
     .option("multiline", True)
     .partitionBy(partition_cols)
     .format(file_format)
     .saveAsTable(f"{database_name}.{table_name}")
     )
What is happening is that the column with the "XXXX" value contains lots of spaces, special characters, and multiple lines, and Spark ignores rows that span multiple lines. I had to add .option("multiLine", True) to both the read DataFrame and the write DataFrame to get the desired result.
There is missing information in the documentation of the Spark CSV reader; it would be worth highlighting that .option("multiLine", True) can be used on the write DataFrame as well.
I have this:
______0______
/ \
__7__ __3__
/ \ / \
0 4 9 8
/ \ / \ / \ / \
7 7 0 4 6 0 3 2
______another______
/ \
__block__ __just__
/ \ / \
aside the first one
I'd like this:
______0______ ______another______
/ \ / \
__7__ __3__ __block__ __just__
/ \ / \ / \ / \
0 4 9 8 aside the first one
/ \ / \ / \ / \
7 7 0 4 6 0 3 2
Is there some kind of "multiline block" copy/cut and paste in Vim?
There is no magical way to achieve that but it is doable with :help visual-block and some planning.
First, the naive approach, with visual-block mode:
Put the cursor on the first column of the line containing another.
Press <C-v> to enter visual-block mode, then jjjj to expand the block downward, and $ to expand it to the end of each line.
Cut it with d.
Move the cursor to the end of the line containing 0 and press p to put what you just cut.
The horror:
______0______ ______another______
/ / \ \
__7__ _ __block__ __just__ _3__
/ \ / / \ / \ \
0 4 9 aside the first one 8
/ \ / \ / \ / \
7 7 0 4 6 0 3 2
The problem is that putting "block text" (for lack of a better word) is done "in place", without adding padding or assuming anything about the user's intent.
In order to put that "block text" at the right position, you need to add some padding yourself:
Put the cursor on the first column of the line containing another.
Press <C-v> to enter visual-block mode, then jjjj to expand the block downward, and $ to expand it to the end of each line.
Cut it with d.
Move the cursor to the end of the line containing 0, append as many spaces as necessary with A<Space><Space><Space>, and press p to put what you just cut.
Much better:
______0______ ______another______
/ \ / \
__7__ __3__ __block__ __just__
/ \ / \ / \ / \
0 4 9 8 aside the first one
/ \ / \ / \ / \
7 7 0 4 6 0 3 2
NOTE: :help 'virtualedit' can be of use, too, but it can require special care, so I prefer the simplicity of adding the padding manually.
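For example, a minimal sketch of that approach:
set virtualedit=all
With this set, the cursor can be moved into the "virtual" space after the end of a short line, so you can place it where the block should land and press p without typing the padding spaces yourself; reset the option afterwards if you do not want it permanently.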
I would like to write a script to check a log file: if any line in the log contains a string from include_list.txt but does not contain any string from exclude_list.txt, then send an alert mail to the administrator. In the example below, line 2 contains the string "string 4" (which is in include_list.txt) but does not contain anything from exclude_list.txt, so only line 2 should be shown in the alert mail.
Could you advise how to write this script? Many thanks.
vi exclude_list.txt
string 1
string 2
string 3
vi include_list.txt
string 4
string 5
string 6
For example, the log file contains:
xxx string 4 xxxstring 2
xxx string 4 xxxxxxxxxx
xxx xxxxxxx xxxstring 3
You can use grep piped with another grep for this:
grep -iFf includes file.log | grep -iFf excludes
xxx string 4 xxxstring 2
If you want to match the 2nd line, which doesn't have a corresponding entry in excludes, then use grep -v after the pipe:
grep -iFf includes file.log | grep -ivFf excludes
xxx string 4 xxxxxxxxxx
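To turn that into an alert mail, a minimal wrapper could look like the sketch below (it assumes a mail command such as mailx is installed; file.log, include_list.txt, exclude_list.txt and the recipient address are placeholders for your own names):
#!/bin/sh
# keep only the lines that match the include list but none of the exclude list
matches=$(grep -iFf include_list.txt file.log | grep -ivFf exclude_list.txt)
# send the mail only when something was found
if [ -n "$matches" ]; then
    printf '%s\n' "$matches" | mail -s "Log alert" admin@example.com
fi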
For example, I have a text file like the following:
Jul 11 xxxx xxxx start xxxxx
....
....
....
Jul 11 xxxx xxxx stop xxxxx
Jul 11 xxxx xxxx start xxxxx
....
....
....
Jul 11 xxxx xxxx stop xxxxx
....
Now I want to split the above text file into different files based on "start" and "stop", like this:
/***text1.txt******/
Jul 11 xxxx xxxx start xxxxx
....
....
....
Jul 11 xxxx xxxx stop xxxxx
/***text2.txt******/
Jul 11 xxxx xxxx start xxxxx
....
....
....
Jul 11 xxxx xxxx stop xxxxx
How can I do that? Thanks.
This will do it:
$ awk '{if ($0 ~ /start/) a++} {print >> "file"a}' file
Explanation
{if ($0 ~ /start/) a++} looks for lines containing the word start. If found, it increments the variable a, which is 0 by default.
{print >> "file"} would append $0 (that is, the whole line) to a file called file in the current directory.
{print >> "file"a} appends the line to a file whose name is "file" followed by the value of a. Since a becomes 1 at the first start line, 2 at the second, and so on, the lines go to file1, file2, ... (a spelled-out, commented version follows below).
Test
$ cat a
Jul 11 xxxx xxxx start xxxxx
....
....
....
Jul 11 xxxx xxxx stop xxxxx
Jul 11 xxxx xxxx start xxxxx
here begins 2nd file
....
....
....
Jul 11 xxxx xxxx stop xxxxx
$ awk '{if ($0 ~ /start/) {a++}} {print >> "file"a}' a
$ cat file1
Jul 11 xxxx xxxx start xxxxx
....
....
....
Jul 11 xxxx xxxx stop xxxxx
$ cat file2
Jul 11 xxxx xxxx start xxxxx
here begins 2nd file
....
....
....
Jul 11 xxxx xxxx stop xxxxx
With awk, the if ($0 ~ /.../) test is implicit when the pattern is written in front of the action. The variant below also uses a flag b so that lines outside a start/stop block are not written to any file:
awk 'BEGIN{a=0;b=0} /start/{a++;b=1} (b==1){print >> "file"a} /stop/{b=0}' input_file.txt
I have been searching everywhere for an online tool doing a simple but crucial thing:
Truncating each line in a document after x number of characters (output: each line having a maximum length of xx characters).
I would be very happy if someone could give me the URL of such a tool!
Example:
Document has 3 lines
It should truncate each line after 20 characters
INPUT:
Xxx xxx xxxxxx. Xxxxxx xxxxx, xxx xxx xxxx.
Xx xxx xxx xxxx xxxxxx xx.
Xxxxxxx.
OUTPUT:
Xxx xxx xxxxxx. Xxxx
Xx xxx xxx xxxx xxxx
Xxxxxxx.
What I want to find is: 1) an online tool where I paste the original text into one box, 2) enter the desired maximum number of characters per line, 3) click a button, and 4) find the result in another box (or the same box).
Thanks for any help!
In OS X Terminal:
If you have the input data in a file at /path/to/file.txt:
$ cut -c 1-20 /path/to/file.txt
Xxx xxx xxxxxx. Xxxx
Xx xxx xxx xxxx xxxx
Xxxxxxx.
If you wish to enter the data interactively:
$ cut -c 1-20 <<EOF
> Xxx xxx xxxxxx. Xxxxxx xxxxx, xxx xxx xxxx.
> Xx xxx xxx xxxx xxxxxx xx.
> Xxxxxxx.
> EOF
Xxx xxx xxxxxx. Xxxx
Xx xxx xxx xxxx xxxx
Xxxxxxx.