read the first line of a text file with JQ - linux

Trying to see how I can read the first line of a text file using jq.
I have a text file with a bunch of ids (newfile.txt):
5584157003
5584158003
5584159003
5584160003
I'd like to be able to just read the first line with jq.
I tried doing this
cat newfile.txt | jq '.[0]'
But I get this error:
jq: error (at <stdin>:482): Cannot index number with number
I'd like to be able to read line by line so that I can eventually run a loop with each ID and do stuff with it. Any ideas?

Use the -R argument (aka --raw-input) to tell jq that it's receiving input as strings rather than JSON, and use input to read only a single item at a time. Thus:
jq -Rn input <yourfile
...will output:
"5584157003"
If you want to convert it to a number, that's what tonumber is for:
jq -Rn 'input | tonumber' <yourfile
...which will output:
5584157003
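If you eventually want every line rather than just the first, the plural inputs streams them all, one result per line; a small sketch extending the same idea:
jq -Rn 'inputs | tonumber' <yourfile
...which will output each id as a number on its own line.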

Is there a way to retrieve a specific line number? For example, line 3?
If no transformation needs to be done, then sed would probably be the simplest, most efficient approach; if a simple transformation is required, then besides sed, awk is worth considering, and jq might also be worth considering under certain circumstances.
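For instance, a minimal sed sketch for printing just line 3 and then stopping:
sed -n '3{p;q}' newfile.txt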
In particular, if efficiency is a consideration, then it would make sense to use jq's nth filter, along the lines of:
jq --argjson n 3 -nR 'nth($n - 1; inputs)' newfile.txt
This approach will avoid reading lines beyond the specified one.
(nth counts from 0.)
You might also want to use jq's -r option.
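For example, combining -r with the above strips the quotes from the output:
jq -rnR --argjson n 3 'nth($n - 1; inputs)' newfile.txt
...which prints 5584159003 rather than "5584159003".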

What I am ultimately trying to do is fetch each line from this file one by one and make an API call using bash.
For a straightforward task like that, you could simply use bash's read, along the lines of:
while IFS= read -r line ; do ... done < newfile.txt
If any kind of transformation of the input lines needs to be done, however, jq might be appropriate, e.g. if the lines must be URL-encoded. This could be done using inputs in conjunction with jq's -n and -R command-line options, along the lines of:
while IFS= read -r line ; do
...
done < <(jq -Rrn 'inputs|@uri' newfile.txt)
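Putting the pieces together, a rough sketch of the whole loop might look like this (the endpoint URL is a placeholder, not something from the original question):
while IFS= read -r id ; do
    curl -s "https://api.example.com/items/$id"   # hypothetical endpoint
done < <(jq -Rrn 'inputs|@uri' newfile.txt)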

Related

How do I add the first 2 letters of every line in a file to a list using bash?

I have a file ($ScriptName). I want the first 2 characters of every line to be in a list (Starters). I am using a bash script.
How would I do this?
I have declared my array like this:
array=() #Empty array
Using guidance from this: https://opensource.com/article/18/5/you-dont-know-bash-intro-bash-arrays
I am using Manjaro 19 and the latest kernel.
To get the first two characters from each line, you can use
cut -c1,2 "$ScriptName"
-c1,2 means "output characters in positions 1 and 2"
I'm not sure what you mean by a "list". If you just want to create a file with the results, use redirection:
cut -c1,2 "$ScriptName" > Starters
If you want to populate an array, just use
while IFS= read -r starter ; do Starters+=("$starter") ; done < <(cut -c1,2 "$ScriptName")
Moreover, if you're interested in letters rather than characters, you can use sed to strip non-letters from each line and then apply the solution shown above:
sed 's/[^[:alpha:]]//g' "$ScriptName" | cut -c1,2
Try this Shellcheck-clean (except for a missing initialization of ScriptName) pure Bash code:
Starters=()
while IFS= read -r line || [[ -n $line ]]; do
Starters+=( "${line:0:2}" )
done < "$ScriptName"
See Arrays [Bash Hackers Wiki] for information about using arrays in Bash.
See BashFAQ/001 (How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?) for information about reading files line-by-line in Bash.
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)), particularly the bit about "range notation", for an explanation of ${line:0:2}.
The mapfile bash built-in command combined with cut makes it simple. Note that mapfile must read from a redirection rather than sit at the end of a pipeline, or the array gets filled in a subshell and lost; -t strips the trailing newlines:
mapfile -t Starters < <(cut -c1,2 "$ScriptName")
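To verify the result, declare -p will print the array's contents:
declare -p Starters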

text file contains lines of bizarre characters - want to fix

I'm an inexperienced programmer grappling with a new problem in a large text file which contains data I am trying to process. Here's a screen capture of what I'm looking at (using 'less' - I am on a linux server):
https://drive.google.com/file/d/0B4VAqfRxlxGpaW53THBNeGh5N2c/view?usp=sharing
Bioinformaticians will recognize this file as a "fastq" file containing DNA sequence data. The top half of the screenshot contains data in its expected format (which I admit contains some "bizarre" characters, but that is not the issue). However, the bottom half (with many characters shaded in white) is completely messed up. If I were to scroll down the file, it eventually returns to normal text after about 500 lines. I want to fix it because it is breaking downstream operations I am trying to perform (which complain about precisely this position in the file).
Is there a way to grep for and remove the shaded lines? Or can I fix this problem by somehow changing the encoding on the offending lines?
Thanks
If you are lucky, you can use
strings file > file2
If that doesn't help, try it another way.
Determine the line length of the correct lines (I think the first two lines differ from each other).
head -1 file | wc -c
head -2 file | tail -1 | wc -c
Note that wc also counts the line ending, so subtract 1 from both lengths.
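As an aside, a quick way to survey which line lengths actually occur in the file is:
awk '{ print length }' file | sort -n | uniq -c
This prints a count next to each distinct length, so the "correct" lengths stand out.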
Then try to read the file one line at a time. Use a case statement so you do not have to write a lot of if-else constructions comparing the length to the expected lengths. In the code below I accept the lengths 20, 100 and 330.
Redirect the output to another file outside the loop (redirecting inside the loop would overwrite the file on every line).
while IFS= read -r line; do
    case ${#line} in
        20|100|330) echo "$line" ;;
    esac
done < file > file2
A totally different approach would be to filter out the wrong lines with sed, awk or grep, but that would require knowing which characters you will and won't accept.
If you are lucky, all the ugly lines will have a character in common, like '<' or maybe '#'. In that case you can use egrep:
egrep -v "<|#" file > file2
Based on inspection of the screenshot:
sed -r 's/<[[:alnum:]]{2}>//g;s/\^.//g;s/ESC\^*C*//g' file
To make the changes in place and keep a backup file with a .bak extension, do:
sed -r -i.bak 's/<[[:alnum:]]{2}>//g;s/\^.//g;s/ESC\^*C*//g' file

How do I insert the results of several commands on a file as part of my sed stream?

I use DJing software on linux (xwax) which uses a 'scanning' script (visible here) that compiles all the music files available to the software and outputs a string which contains a path to the filename and then the title of the mp3. For example, if it scans path-to-mp3/Artist - Test.mp3, it will spit out a string like so:
path-to-mp3/Artist - Test.mp3[tab]Artist - Test
I have tagged all my mp3s with BPM information via the id3v2 tool and have a commandline method for extracting that information as follows:
id3v2 -l name-of-mp3.mp3 | grep TBPM | cut -d: -f2
That spits out JUST the numerical BPM to me. What I'd like to do is prepend the BPM number from the above command as part of the xwax scanning script, but I'm not sure how to insert that command in the midst of the script. What I'd want it to generate is:
path-to-mp3/Artist - Test.mp3[tab][bpm]Artist - Test
Any ideas?
It's not clear to me where in that script you want to insert the BPM number, but the idea is this:
To embed the output of one command into the arguments of another, you can use the "command substitution" notation `...` or $(...). For example, this:
rm $(echo abcd)
runs the command echo abcd and substitutes its output (abcd) into the overall command; so that's equivalent to just rm abcd. It will remove the file named abcd.
The above doesn't work inside single-quotes. If you want, you can just put it outside quotes, as I did in the above example; but it's generally safer to put it inside double-quotes (so as to prevent some unwanted postprocessing). Either of these:
rm "$(echo abcd)"
rm "a$(echo bc)d"
will remove the file named abcd.
In your case, you need to embed the command substitution into the middle of an argument that's mostly single-quoted. You can do that by simply putting the single-quoted strings and double-quoted strings right next to each other with no space in between, so that Bash will combine them into a single argument. (This also works with unquoted strings.) For example, either of these:
rm a"$(echo bc)"d
rm 'a'"$(echo bc)"'d'
will remove the file named abcd.
Edited to add: O.K., I think I understand what you're trying to do. You have a command that either (1) outputs all the files in a specified directory (and any subdirectories and so on), one per line, or (2) outputs the contents of a file that is itself a list of files, one per line. So in either case, it's outputting a list of files, one per line. And you're piping that list into this command:
sed -n '
{
# /[<num>[.]] <artist> - <title>.ext
s:/\([0-9]\+.\? \+\)\?\([^/]*\) \+- \+\([^/]*\)\.[A-Z0-9]*$:\0\t\2\t\3:pi
t
# /<artist> - <album>[/(Disc|Side) <name>]/[<ABnum>[.]] <title>.ext
s:/\([^/]*\) \+- \+\([^/]*\)\(/\(disc\|side\) [0-9A-Z][^/]*\)\?/\([A-H]\?[A0-9]\?[0-9].\? \+\)\?\([^/]*\)\.[A-Z0-9]*$:\0\t\1\t\6:pi
t
# /[<ABnum>[.]] <name>.ext
s:/\([A-H]\?[A0-9]\?[0-9].\? \+\)\?\([^/]*\)\.[A-Z0-9]*$:\0\t\t\2:pi
}
'
which runs a sed script over that list. What you want is for all of the replacement-strings to change from \0\t... to \0\tBPM\t..., where BPM is the BPM number computed from your command. Right? And you need to compute that BPM number separately for each file, so instead of relying on sed's implicit line-by-line looping, you need to handle the looping yourself, and process one line at a time. Right?
So, you should change the above command to this:
while read -r LINE ; do # loop over the lines, saving each one as "$LINE"
BPM=$(id3v2 -l "$LINE" | grep TBPM | cut -d: -f2) # save BPM as "$BPM"
sed -n '
{
# /[<num>[.]] <artist> - <title>.ext
s:/\([0-9]\+.\? \+\)\?\([^/]*\) \+- \+\([^/]*\)\.[A-Z0-9]*$:\0\t'"$BPM"'\t\2\t\3:pi
t
# /<artist> - <album>[/(Disc|Side) <name>]/[<ABnum>[.]] <title>.ext
s:/\([^/]*\) \+- \+\([^/]*\)\(/\(disc\|side\) [0-9A-Z][^/]*\)\?/\([A-H]\?[A0-9]\?[0-9].\? \+\)\?\([^/]*\)\.[A-Z0-9]*$:\0\t'"$BPM"'\t\1\t\6:pi
t
# /[<ABnum>[.]] <name>.ext
s:/\([A-H]\?[A0-9]\?[0-9].\? \+\)\?\([^/]*\)\.[A-Z0-9]*$:\0\t'"$BPM"'\t\t\2:pi
}
' <<<"$LINE" # take $LINE as input, rather than reading more lines
done
(where the only change to the sed script itself was to insert '"$BPM"'\t in a few places to switch from single-quoting to double-quoting, then insert the BPM, then switch back to single-quoting and add a tab).
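As an aside, if the cut field ever carries stray whitespace, a hedged alternative is to let grep pull out just the trailing digits (assuming the BPM number is the last thing on the TBPM line):
BPM=$(id3v2 -l "$LINE" | grep TBPM | grep -oE '[0-9]+$')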

How do I convert the script from using the read command to the cut command?

Here is the test sample:
test_catalog,test_title,test_type,test_artist
And I can use the following script to split the text above on the commas and set the variables respectively:
IFS=","
read cdcatnum cdtitle cdtype cdac < $temp_file
(PS: $temp_file is the path of the test sample.)
And if I want to replace the read with cut, any ideas?
There are many solutions:
line=$(head -1 "$temp_file")
echo "$line" | cut -d, ...
or
cut -d, ... <<< "$line"
or you can tell Bash to copy the line into an array (note that set -A is ksh syntax; in Bash you assign with =( ... )):
IFS=,
ARRAY=( $(head -1 "$temp_file") )
# use it
echo "${ARRAY[0]}" # test_catalog
echo "${ARRAY[1]}" # test_title
...
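Incidentally, newer Bash can also read the line straight into an array, though this still relies on read:
IFS=, read -r -a ARRAY < "$temp_file"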
I prefer the array solution because it gives you a distinct data type and clearly communicates your intent. The echo/cut solution is also somewhat slower.
[EDIT] On the other hand, the read command splits the line into individual variables, which gives each value a name. Which is more readable: ${ARRAY[0]} or $cdcatnum?
If you move columns around, you will just need to rearrange the arguments to the read command - if you use arrays, you will have to update all the array indices which you will get wrong.
Also, read makes it much simpler to process the whole file:
while read cdcatnum cdtitle cdtype cdac ; do
....
done < "$temp_file"
man cut ?
But seriously, if you have something that works, why do you want to change it?
Personally, I'd probably use awk or perl to manipulate CSV files in linux.

Is there a way to put the following logic into a grep command?

For example, suppose I have the following piece of data:
ABC,3,4
,,ExtraInfo
,,MoreInfo
XYZ,6,7
,,XyzInfo
,,MoreXyz
ABC,1,2
,,ABCInfo
,,MoreABC
It's trivial to get grep to extract the ABC lines. However, what if I also want to grab the following lines to produce this output:
ABC,3,4
,,ExtraInfo
,,MoreInfo
ABC,1,2
,,ABCInfo
,,MoreABC
Can this be done using grep and standard shell scripting?
Edit: Just to clarify there could be a variable number of lines in between. The logic would be to keep printing while the first column of the CSV is empty.
grep -A 2 'your regex' will output the two lines following each matching line.
Update:
Since you specified that it could be any number of lines, this is not possible, as grep matches on a single line; see the following questions:
How can I search for a multiline pattern in a file?
Regex (grep) for multi-line search needed
Why can't i match the pattern in this case?
Selecting text spanning multiple lines using grep and regular expressions
You can use this, although it's a bit hackity due to the grep at the end of the pipeline muting out anything that does not start with 'A' or ',':
$ sed -n '/^ABC/,/^[^,]/p' yourfile.txt| grep -v '^[^A,]'
Edit: A less hackity way is to use awk:
$ awk '/^ABC/ { want = 1 } !/^ABC/ && !/^,/ { want = 0 } { if (want) print }' f.txt
You can understand what it does if you read out loud the pattern and the thing in the braces.
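Spelled out with comments, the same program reads (reformatted, identical logic):
awk '
    /^ABC/           { want = 1 }  # an ABC record starts a block we want
    !/^ABC/ && !/^,/ { want = 0 }  # any other new record ends the block
    want                           # print lines while inside a wanted block
' f.txt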
The manpage has explanations of the options; the one you want to look at is -A, under Context Line Control.
