Merge two files in Linux and ignore any repetition

Can anyone provide a shell script for Linux that merges two files and saves the result in a third file? If any lines are common to both files, they should be saved only once. Please ask if you need any more details. Thanks in advance!

Simplest way:
cat one two | sort -u > third
But this is probably not what you want, since sort -u also reorders the lines.
You mentioned merging in your question: what do you mean by that? If it's not as simple as I assumed in my code above, provide sample files and tell us what you want to achieve.
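If the original line order matters, one possible alternative is to let awk drop repeated lines while keeping first occurrences in place. This is only a sketch: the file names one, two and third follow the command above, and the sample contents are made up for illustration.

```shell
#!/bin/sh
# Sketch: merge two files, keeping the first occurrence of each line
# and preserving input order (sort -u would reorder the lines).
# The sample data below is invented for the demonstration.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

printf '%s\n' banana apple cherry > one
printf '%s\n' apple date banana > two

# seen[$0]++ is 0 (false) the first time a line appears, so the
# negated expression is true only for first occurrences.
awk '!seen[$0]++' one two > third

cat third
```

The trade-off is memory: awk keeps every distinct line in an array, which is fine for ordinary files but worth keeping in mind for very large inputs.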

Related

Identifying matching lines across multiple files in Linux

I'm working on a script to help me identify matching lines across multiple files. I've had luck with the comm command:
comm -12 <(sort file1.txt) <(sort file2.txt)
This works and can be daisy-chained to expand to X number of files. My problem is that anything more than two files starts becoming very unreadable. I'm wondering if there is a sleeker option available or if I should dig in and write a script to do the work for me. It'd be a simple loop and maybe a prompt to collect file names.
I have a working solution; I'm just trying to see whether there is a better way that doesn't involve reinventing the wheel. Thoughts appreciated.
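The "simple loop" mentioned above could look roughly like the sketch below. It intersects the files one at a time with comm -12, using temporary files instead of nested process substitution. The function name common_lines and the sample files are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch: print the lines common to every file given as an argument.
# common_lines is a hypothetical helper name, not a standard tool.
common_lines() {
    acc=$(mktemp) || return 1    # running intersection
    tmp=$(mktemp) || return 1    # sorted copy of the current file
    next=$(mktemp) || return 1   # result of the current comm step
    sort -u "$1" > "$acc"
    shift
    for f in "$@"; do
        sort -u "$f" > "$tmp"
        # -12 suppresses lines unique to either input, leaving the overlap.
        comm -12 "$acc" "$tmp" > "$next"
        mv "$next" "$acc"
    done
    cat "$acc"
    rm -f "$acc" "$tmp" "$next"
}

# Made-up sample data.
demo=$(mktemp -d) || exit 1
printf '%s\n' delta alpha charlie bravo > "$demo/file1.txt"
printf '%s\n' bravo charlie delta echo > "$demo/file2.txt"
printf '%s\n' charlie foxtrot delta > "$demo/file3.txt"

result=$(common_lines "$demo/file1.txt" "$demo/file2.txt" "$demo/file3.txt")
printf '%s\n' "$result"
```

Because each step only compares the running intersection with one more file, the call stays readable no matter how many file names are passed.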

Merge, sort, maintain line order

This probably sounds contradictory, so let me explain. I have a number of log files that use log4j to write to different files and rotate. What I want to do is merge them into fewer files.
How I started to go about doing this:
- use awk to concat multi-line entries into one line into a separate file.
- cat awk output files to 1 file.
- sort the cat file
- awk to separate the concatenated lines.
But I see that the sort is putting entries with the same second/ms in a different order than they appeared in their original output file. It may not be a HUGE deal. But, I don't like it. Any ideas for how I go about doing what I want (maintaining their original line order while sorting)? I would rather not write my own program and would like to use native linux utils if possible. But, I am open to the "best" way of doing this (Perl, Python, etc..).
I thought about cat'ing the output files from highest to lowest (log4j rotate files) so that I wouldn't have to sort. But that only solves the problem for files writing to the same log file (file1.0.log, file1.1.log, etc..). But this doesn't help when needing to merge file2 with file1.
Thank you,
Gregg
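What you are talking about is "stable" sorting. There is a -s option on sort that should give you what you want: lines whose sort keys compare equal keep their input order. Combine it with -k so that only the timestamp is used as the key; otherwise the whole line is compared.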
Stability in sorting algorithms
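As a rough illustration of the -s suggestion, the sketch below merges two small logs on their timestamp only. It assumes the first two whitespace-separated fields hold the date and time and that multi-line entries have already been joined, as described in the question. The log contents are invented.

```shell
#!/bin/sh
# Sketch: stable merge of log files by timestamp (fields 1 and 2).
# The file names and log lines are made up for the demonstration.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1

cat > file1.log <<'EOF'
2024-01-01 10:00:00,001 zeta
2024-01-01 10:00:00,001 alpha
2024-01-01 10:00:00,003 third
EOF

cat > file2.log <<'EOF'
2024-01-01 10:00:00,001 beta
2024-01-01 10:00:00,002 second
EOF

# -k1,2 restricts the key to the timestamp; -s keeps lines with equal
# keys in input order, so "zeta" stays ahead of "alpha".
LC_ALL=C sort -s -k1,2 file1.log file2.log > merged.log

cat merged.log
```

Without -s, sort falls back to comparing whole lines when keys tie, which is what shuffles entries that share a millisecond.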

Ensure Linux (suse) Programs Level across multiple servers with cksum

We have a GOLD image that new servers are imaged from when they are created.
Over time some of these servers have become out of sync due to poorly managed rollouts.
I would like to scan the bin folders on all of these servers and compare them with what the GOLD image has, writing the result to an output file. (I.e., if a file is different, flag it one way; if it is the same, say Same; if it is missing, say Missing; if it is there but not on gold, say Addition?)
I was going to accomplish this as shown below.
On the Gold image, run the following example:
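# Glob /bin directly: names parsed from ls have no path, so cksum would
# only find them when run from inside /bin.
for x in /bin/*
do
    cksum "$x" >> /data/OnGold.lst
done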
Distribute this file to all of the servers, along with another script that does the same thing with a different log name.
After that script executes, another script will diff the two files and report on the differences based on the cksum, or on files that are missing from or additional to OnGold.lst.
(This is where I could use some advice on the best way to achieve this. Does anyone know of open source tools that could accomplish the same thing? I'm pretty sure diff would do the trick, as it will tell me whether items are missing or added, but I don't know how to turn that into a report format.)
Any help would be greatly appreciated.
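One way the reporting step might be approached is with awk rather than diff, since cksum prints a checksum, a size and a file name on each line. The sketch below is only an illustration under some assumptions: compare_lists is a hypothetical helper name, file names contain no whitespace, and the two directories are small made-up stand-ins for the gold and server bin folders.

```shell
#!/bin/sh
# Sketch: classify entries of a server cksum list against a gold list.
# compare_lists is a hypothetical helper; it assumes cksum's
# "checksum size name" layout and file names without whitespace.
compare_lists() {
    awk '
        # First file (gold list): remember checksum and size per name.
        NR == FNR { gold[$3] = $1 " " $2; next }
        # Second file (server list): classify each entry.
        {
            seen[$3] = 1
            if (!($3 in gold))                  print "Addition", $3
            else if (gold[$3] == ($1 " " $2))   print "Same", $3
            else                                print "Different", $3
        }
        # Anything on gold that never showed up on the server.
        END {
            for (name in gold)
                if (!(name in seen)) print "Missing", name
        }
    ' "$1" "$2" | LC_ALL=C sort -k2
}

# Made-up stand-ins for the gold and server bin folders.
demo=$(mktemp -d) || exit 1
mkdir "$demo/gold" "$demo/server"
echo v1 > "$demo/gold/tool_a"; echo v1 > "$demo/server/tool_a"
echo v1 > "$demo/gold/tool_b"; echo v2 > "$demo/server/tool_b"
echo v1 > "$demo/gold/tool_c"
echo v1 > "$demo/server/tool_d"

(cd "$demo/gold" && cksum *) > "$demo/OnGold.lst"
(cd "$demo/server" && cksum *) > "$demo/OnServer.lst"

report=$(compare_lists "$demo/OnGold.lst" "$demo/OnServer.lst")
printf '%s\n' "$report"
```

Running cksum from inside each directory keeps the recorded names relative, so the same file lines up in both lists regardless of where the folder lives on each server.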

Picking Certain Documentation with DOXYGEN

I would like to achieve the almost exact opposite of what can be
performed with command \internal. There exists a huge doxygen
documentation for a project already, but now I would like to pick out
a few blocks (functions, constants etc.) to create a very small manual
only containing the important stuff.
Instead of marking 99% of the comments as \internal it would be nice
to have a command like \external for the 1% of comments that need to
be exported in my case.
Something like disabling the "default section" (everything, which is
not part of a section) would work too, of course. Then I could use
ENABLED_SECTIONS...
Unfortunately the comments in question do not reside in one file only.
Furthermore those files contain a lot of other comments, which should
not be exported.
I already thought about moving those comments into separate header files
that could be included in the original position, but this would mean
restructuring a lot and tearing files apart.
Does anybody have an idea how to solve my problem?
Thanks in advance,
Nico
I think ENABLED_SECTIONS is the way forward, but there's a couple of things that might reduce the workload.
The first is to create a separate doxyfile for your particular requirement, then you can customise that without upsetting any master one.
In that new doxyfile explicitly list, in the INPUT file list, only those files that contain content that you need. Chances are that it's currently set to pull in whole folder trees - edit that to cherry pick the individual files; not forgetting files that you may need to define the 'structure' of the document.
After that use ENABLED_SECTIONS with corresponding \if <SECTION_NAME> ... \endif (or \cond <SECTION_NAME> ... \endcond) markers to refine the selection to units smaller than a file.

Something like .htaccess in Linux

I have a directory with a lot of files (more than 4,000,000). All filenames follow the same pattern:
PREFIX-XXXXXX-YY.ext
where
XXXXXX contains letters and digits
YY contains digits
ext is the file extension (.txt, .jpg)
The directory structure alone takes 12MB, so listing or searching this directory takes a long time. I split the contents of this directory into subdirectories based on the filename, specifically the first character of XXXXXX in the pattern above.
e.g.
main_directory/A/PREFIX-AXXXXX-YY.ext
main_directory/B/PREFIX-BXXXXX-YY.ext
main_directory/1/PREFIX-1XXXXX-YY.ext
Is there an easy way in Linux to make a rule so that when I type a command, for example
test:/home/usr/admin # ls main_directory/PREFIX-AXXXXX-*
I get a list of filenames from the main_directory/A/ directory? This rule MUST work only for main_directory.
You can't have this at the file-system layer, not without creating links and circling back to your original problem. I can think of two easy ways out.
Take 1: scripting
You could write a short script to rewrite the names for you.
Suppose you had a rewrite script that took PREFIX-AXXXX-* and output main_directory/A/PREFIX-AXXXX-*. You could then change your ls line to:
$ ls `rewrite PREFIX-AXXXXX-*`
This can be easily accomplished with sed, awk or any other on-the-fly text transformation tool.
Shell programs are composable for a reason! :)
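A minimal sketch of such a rewrite helper, using sed, might look like this. It assumes every name starts with the literal text PREFIX- and that the subdirectory is the first character after it, as in the question. The function body and the sample files are illustrative, not a finished tool.

```shell
#!/bin/sh
# Sketch: rewrite is the hypothetical helper named above. It assumes
# names start with the literal "PREFIX-" and that the subdirectory is
# the first character following it.
rewrite() {
    for pattern in "$@"; do
        # \1 is "PREFIX-", \2 is the character that picks the subdirectory.
        printf '%s\n' "$pattern" |
            sed 's|^\(PREFIX-\)\(.\)|main_directory/\2/\1\2|'
    done
}

# Made-up sample tree.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
mkdir -p main_directory/A main_directory/B
touch main_directory/A/PREFIX-A12345-01.txt \
      main_directory/A/PREFIX-A99999-02.jpg \
      main_directory/B/PREFIX-B12345-01.txt

# Quote the pattern so rewrite receives it literally; leave the command
# substitution unquoted so the shell expands the rewritten glob.
listing=$(ls $(rewrite 'PREFIX-A*'))
printf '%s\n' "$listing"
```

The quoting matters: an unquoted pattern only reaches rewrite intact when nothing in the current directory happens to match it.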
Take 2: embed a faster file-system
You could do away with the restructuring and rewriting names by using a faster file-system, mounted in your main directory. XFS sounds good for this. It should remove your performance concerns without further ado.
This requires a deeper understanding of what's going on to be effective for day-to-day usage, however.
Edit: Here's an article on how to create virtual user-space file-systems.
Edit 2: actually no, I don't think XFS would cut it. Maybe another file-system, though.
