Diffing The Line Numbers

hi guys

I am trying to find the "size" of a "block" of data in LARGE data files; the example below, test_data.txt, is very simplified. By "size" I mean the difference in line numbers between the start of one block and the start of the next, and the "size" will be constant throughout the file. So, given:

1234 6.600000 4321
1234 8.500000 4321
1234 1.800000 4321
1234 2.300000 4321
1234 8.500000 4321
1234 2.800000 4321

If I define a block as starting whenever I find 8.500000 in the second column, then in the example the block size would be 3, because 8.500000 occurs on the 5th line and on the 2nd. Right now I am using

Code:
 grep -n "8.500000" test_data.txt | cut -f1 -d:

and/or

Code:
 awk '/8.500000/ {print FNR}' test_data.txt

Obviously I don't remember how to tag text as code?

btw, the grep command is much much faster

Both of these commands give an entire list of line numbers (a long list for files greater than a gig), which I then have to subtract one from another to come up with 3 in the example. Not that I'm opposed to doing math, but I would think awk or grep should be able to do this for me.

ideas?

tabby
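
One way to get the number directly is to let awk remember the line number of the previous match and stop at the second one. This is only a minimal sketch: it assumes the marker is a value in the second column and that the block size really is constant, so the first two matches are enough. The function name block_size is made up for illustration.

```shell
# block_size FILE VALUE
# Print the distance in lines between the first two lines whose
# second column equals VALUE, then stop reading the file.
block_size() {
    awk -v val="$2" '
        $2 == val {
            if (prev) { print FNR - prev; exit }   # second match: report and quit
            prev = FNR                             # first match: remember its line number
        }
    ' "$1"
}
```

With the example data, `block_size test_data.txt 8.500000` prints 3. Because awk exits at the second match, it should not need to read the rest of a multi-gigabyte file. If grep is still faster on your data, GNU grep has `-m 2` to stop after two matching lines, so `grep -n -m 2 "8.500000" test_data.txt | cut -f1 -d:` yields just the two numbers to subtract.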


Similar Content



Extract Info And Find/count Strings From Blocks Inside Text File

Hello

I have a text file which has blocks like
Code:
dir1/dir2/dir3/name_run_number1:
line1_run_number1_part1
line2_run_number1_part2
line3_run_number1_part3...

Each block is separated by a blank line, and there is a ":" at the end of the "header" of each one, while every line in a block carries the same "number1" after "run_".
What I want to do is, for each block, extract the "number1" as shown in the first line, and then for the lines below count from 1 to 20 and give a message if a "partX" line is missing. Any bash or Python would be fine.

Thanks
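
A possible awk sketch for this, under some assumptions that are not confirmed by the post: every header line ends with ":", the run number is whatever follows the last "_run_" on the header, and body lines end in "_part" followed by digits. The function name check_parts and the message wording are made up for illustration.

```shell
# check_parts FILE [MAX]
# For each block, print one message per part number in 1..MAX (default 20)
# that has no matching "_partN" line.
check_parts() {
    awk -v max="${2:-20}" '
        # Report every part number from 1 to max not seen in the current block.
        function report(   i) {
            if (run == "") return
            for (i = 1; i <= max; i++)
                if (!(i in seen))
                    printf "run %s: part%d is missing\n", run, i
            split("", seen)      # portable way to empty the array
            run = ""
        }
        /:$/ {                   # header line: start a new block
            report()
            run = $0
            sub(/:$/, "", run)
            sub(/^.*_run_/, "", run)
            next
        }
        /_part[0-9]+$/ {         # body line: record its part number
            n = $0
            sub(/^.*_part/, "", n)
            seen[n + 0] = 1
        }
        END { report() }
    ' "$1"
}
```

Blank lines between blocks are simply ignored here, since a new header is what closes the previous block.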

Help With Applying Passing Parameters

I need to complete this exercise, but my code has some issues.
Here is the problem:
Create a script that can accept ANY amount of numbers from the command line. Process the numbers one at a time, where numbers greater than 10 print “large”, numbers less than or equal to 10 print “small”
E.g. process 5 10 15 would print
small
small
large

And here is my code so far:
if [ $@ -le "10" ]
then
echo "smaller"
else
echo "bigger"
shift
fi
if [ $@ -le "10" ]
then
echo "smaller"
else
echo "bigger"
shift
fi
if [ $@ -le "10" ]
then
echo "smaller"
else
echo "bigger"
shift
fi
if [ $@ -le "10" ]
then
echo "smaller"
else
echo "bigger"
shift
fi

any help would be greatly appreciated
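
The main snag in the code above is that `$@` expands to all the arguments at once, so with three numbers the test becomes `[ 5 10 15 -le 10 ]`, which `[` rejects; the `shift` also only runs in the else branch. A loop over `"$@"` handles any number of arguments without repeating the if block. A minimal sketch, with the function name classify made up for illustration and assuming every argument is an integer:

```shell
# classify NUMBER...
# Print "large" for each argument greater than 10, "small" otherwise.
classify() {
    for n in "$@"; do
        if [ "$n" -gt 10 ]; then
            echo "large"
        else
            echo "small"
        fi
    done
}
```

In a script, the body of the function can be used as is, since `"$@"` there refers to the script's own command-line arguments.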

Grep: Find Files That Do Not Have Multiple Different Strings

Hi all,

I'm trying to identify files that do not have matches for certain strings. FYI, these are files of DNA sequences and I'm trying to find those that are NOT sampled for any species by my group of interest (e.g., genes that are specific to that group of organisms).

I tried this code but it's actually yielding a list of files that DO match for my regexp.
Code:
for FILENAME in *.fas
do
grep -q -L ">PBAH" $FILENAME && grep -q -L ">SKOW" $FILENAME && grep -q -L ">CGRA" $FILENAME && echo $FILENAME
done

Basically I want to somehow go through and find files that do not contain ">PBAH", ">SKOW" or ">CGRA". Any assistance would be greatly appreciated!

Best,
Kevin
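
Part of the trouble is that `-q` makes grep print nothing, so the `-L` listing is suppressed and only the exit status is left to drive the `&&` chain. One way around this is to skip `-L` and test the exit status of a single grep that looks for any of the three patterns. A minimal sketch, with the function name files_without made up for illustration:

```shell
# files_without FILE...
# Print the names of the files that contain none of the three headers.
files_without() {
    for f in "$@"; do
        # grep -q succeeds if at least one pattern matches somewhere in the file
        grep -q -e '>PBAH' -e '>SKOW' -e '>CGRA' "$f" || printf '%s\n' "$f"
    done
}
```

Usage would be `files_without *.fas`. With a grep that supports `-L`, `grep -L -e '>PBAH' -e '>SKOW' -e '>CGRA' *.fas` (without `-q`) should list the same files.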

Awk Or Sed Help

Hi All,
How can I replace a particular word using sed or awk?
Code:
$ BUGZILLAURL="https://mylocalserver.com/bugzilla"
$ PROJECTNAME="mybugs"
$ echo "$BUGZILLAURL/$PROJECTNAME" 
https://mylocalserver.com/bugzilla/mybugs

There is a urlbase line in the data/params file which has an empty value; please see the 2nd line of the output of the command below.
Code:
$ grep -i "urlbase" data/params 
           'docs_urlbase' => 'docs/%lang%/html/',
           'urlbase' => '',
           'webdotbase' => 'http://www.research.att.com/~north/cgi-bin/webdot.cgi/%urlbase%',

Or, searching only for that line, which I want to become 'urlbase' => 'https://mylocalserver.com/bugzilla/mybugs',
Code:
$ grep -i "'urlbase'" data/params 
           'urlbase' => '',

So I would appreciate your help replacing the empty '' with 'https://mylocalserver.com/bugzilla/mybugs' using sed or awk.

Thanks.
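
A sed substitution anchored on the `'urlbase'` key could do this. The sketch below assumes the line looks exactly like the grep output above (leading whitespace, then `'urlbase' => ''`), and that the new URL contains no `|` or `&` characters, since `|` is used as the sed delimiter so that the slashes in the URL need no escaping. The function name set_urlbase is made up for illustration.

```shell
# set_urlbase FILE URL
# Print FILE to stdout with the empty 'urlbase' value replaced by URL.
set_urlbase() {
    # The ^ anchor plus the leading quote keeps 'docs_urlbase' and %urlbase% untouched.
    sed "s|^\([[:space:]]*'urlbase' => \)''|\1'$2'|" "$1"
}
```

For example, `set_urlbase data/params "$BUGZILLAURL/$PROJECTNAME" > params.new` writes the edited copy to a new file, which can be checked before it replaces the original.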

How To Run A Command In Another Command?

I am not sure how to ask this, sorry.

If I had a command like this

Code:
# grep -a ": " md5list.txt | cut -f2,3 -d


How can I run the command basename for each line of the output?

basename {(grep -a ": " md5list.txt | cut -f2,3 -d )}


EDIT: A little more clarity on what I'm doing:
I didn't realize that 'md5sums' was a link to a nicely formatted page, so I copied all the packages here and put them into a text file. I decided to write a script that puts all of these into that format.

So basically, even though I have already run md5sum -c 'md5sum-list', I still want to finish this small project because I am learning a ton.
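
One common pattern for running a command once per line of another command's output is to pipe into a `while read` loop. A minimal sketch, with the function name basename_each made up for illustration:

```shell
# basename_each
# Read paths from standard input, one per line, and print the basename of each.
basename_each() {
    while IFS= read -r path; do
        basename "$path"
    done
}
```

The grep and cut pipeline from the question would then simply be piped into it, as in `grep ... | cut ... | basename_each`. `IFS=` and `-r` keep leading spaces and backslashes in each line intact.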

Can No Longer Mount Data Dvds

For whatever reason, when I try to mount a data DVD I get the following message:

Unable to mount [disk name]
Error mounting: mount exited with exit code 1: helper failed with:
mount: mount point /media/cdrom does not exist

If I put a DVD video in, I can play it in any of my video apps, but I can't mount those in a file browser either. I have no idea what has caused this. I haven't backed up data to disks in a long time; I only did so today because I'm running low on space on one of my drives.

I was told elsewhere that I needed to alter a file: /etc/udev/rules.d/70-persistent-cd.rules

But I have no idea how to do this, and what I found there does not look like what was shown. This is what my file looks like:

Code:
# This file maintains persistent names for CD/DVD reader and writer devices.
# See udev(7) for syntax.
#
# Entries are automatically added by the 75-cd-aliases-generator.rules
# file; however you are also free to add your own entries provided you
# add the ENV{GENERATED}=1 flag to your own rules as well.
# TSSTcorp_CDDVDW_SH-S223C (pci-0000:00:1f.2-scsi-1:0:1:0)
SUBSYSTEM=="block", ENV{ID_CDROM}=="?*", ENV{ID_PATH}=="pci-0000:00:1f.2-scsi-1:0:1:0", SYMLINK+="cdrom", ENV{GENERATED}="1"
SUBSYSTEM=="block", ENV{ID_CDROM}=="?*", ENV{ID_PATH}=="pci-0000:00:1f.2-scsi-1:0:1:0", SYMLINK+="cdrw", ENV{GENERATED}="1"
SUBSYSTEM=="block", ENV{ID_CDROM}=="?*", ENV{ID_PATH}=="pci-0000:00:1f.2-scsi-1:0:1:0", SYMLINK+="dvd", ENV{GENERATED}="1"
SUBSYSTEM=="block", ENV{ID_CDROM}=="?*", ENV{ID_PATH}=="pci-0000:00:1f.2-scsi-1:0:1:0", SYMLINK+="dvdrw", ENV{GENERATED}="1"

Any suggestions would be appreciated.

Need Help In Bash Scripting

I have two files which have exactly the same number of lines.
I want the first line of the first file to be the filename of a new file, and the content of this new file to be the first line of the second file.
Then the second line of the first file should be the filename of another new file, whose content should be the second line of the second file.
The same goes for the third line of each file, and so on.
I am trying to do it using a for loop, but I am not able to combine two for loops.
This is what I have done:
Code:
IFS=$'\n'
var=$(sed 's/\"http\(.*\)\/\(.*\).wav\"\,\".*/\2/g' 1797.csv) # filenames of all files
var2=$(sed 's/\"http\(.*\)\/\(.*\).wav\"\,\"\(.*\)\"$/\3/g' 1797.csv) # contents of all files
for j in $var;
do
#Here I do not know how to use $var2
done

Please help.
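
Instead of nesting two for loops, the two lists can be read in lockstep with one `while` loop that reads from two file descriptors. A minimal sketch, assuming the filenames and the contents have first been written to two plain files, one entry per line; the function name write_files and the use of descriptors 3 and 4 are choices made for illustration:

```shell
# write_files NAMES_FILE CONTENTS_FILE OUTDIR
# Line N of NAMES_FILE becomes a filename in OUTDIR,
# and line N of CONTENTS_FILE becomes that file's content.
write_files() {
    while IFS= read -r name <&3 && IFS= read -r content <&4; do
        printf '%s\n' "$content" > "$3/$name"
    done 3< "$1" 4< "$2"
}
```

The loop stops as soon as either file runs out of lines. The two sed commands from the question could each be redirected to a file first and those files passed in here.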

Executing Command From File (with Tail) Probably Misquotes?

When working with a virtual terminal, I often find it easier to edit a file and execute it than to construct a regular expression etc. to inject the right UUID etc. into the command. I have run into an error while doing this. I suspect it stems from quote mishandling or improper escape sequences. (I ran it directly from the command line earlier, forgot a quotation mark, and it gave a similar bad result.)
The program in question was efibootmgr. I had a file vaguely similar to this one, named efiboot.Hz:
Code:
efibootmgr -c -g -L "Debian (EFI stub)" -l '\EFI\debian\vmlinuz' -u 'root=UUID=$UUID ro quiet rootfstype=ext4 add_efi_memmap initrd=\\EFI\\debian\\initrd.img'
efibootmgr -c -d /dev/sdb -L "Debian Linux" -l '\EFI\debian\vmlinuz' -u 'root=UUID=1234-ffff-789 ro quiet rootfstype=ext4 add_efi_memmap initrd=\\EFI\\debian\\initrd.img'

Then I executed:
Code:
`tail -n 1 efiboot.Hz`

efibootmgr -v revealed the previous command produced a garbled name and boot options, and most importantly it didn't boot. Manually writing the last line on the terminal did produce the desired effect. I thought I checked the output from tail before putting the back-ticks.
What did I do wrong?
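
A likely explanation: the text produced by a command substitution is only split into words on whitespace (and glob-expanded); the quotes and backslashes inside it are not interpreted again, so they reach the program as literal characters and `"Debian (EFI stub)"` is split into several arguments. To have the line parsed the way the shell parses typed input, it can be passed through `eval`. A minimal sketch, with the function name run_last_line made up for illustration:

```shell
# run_last_line FILE
# Execute the last line of FILE as a shell command, with normal quote handling.
run_last_line() {
    eval "$(tail -n 1 "$1")"
}
```

Note that `eval` runs whatever is on that line, so this is only sensible for a file you wrote yourself. Separately, `$UUID` inside single quotes, as on the first line of the file, is never expanded, with or without `eval`.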

Removing Multiple Lines From Cell Data In A .csv File

I am trying to process some .csv files with Linux as follows:

Some fields have data with newline characters embedded, like so:

"Bob Smith
531 Pennsylvania Avenue
Washington, DC"

(I verified the existence of the " via Wordpad. The file is too large to easily edit in Wordpad to get all the data for each row on a single line).

What Linux command would I use on the files to get the data in each cell on one line?

I have tried:

1. awk -v RS="" '{gsub (/\n/,"")}1' file > newfile

but the cell data was still being read in as if "531 Pennsylvania Avenue" was a brand new row in the CSV file.

2. Command 1 followed by awk -v RS="" '{gsub (/\r/,"")}1' newfile > finalFile

but that resulted in all of the data in the file being put onto a single line.

3. awk -v RS="" '{gsub (/\r\n/,"")}1' file > newFile

But that result was the same as attempt number 2.

How can I preprocess the file so that:

"Bob Smith
531 Pennsylvania Avenue
Washington, DC"

is read as a single field on a single line as part of the row it should be associated with, like

"Bob Smith 531 Pennsylvania Avenue Washington, DC"
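
For background, `RS=""` puts awk in paragraph mode, where a record runs up to the next blank line, so removing every newline in a record glues whole rows together. A different approach is to read line by line and keep appending lines while the number of double quotes seen so far in the row is odd, which means a quoted field is still open. A minimal sketch, assuming quotes inside fields are doubled ("") as in standard CSV, so they do not upset the count; the function name join_quoted is made up for illustration:

```shell
# join_quoted [FILE...]
# Join physical lines that belong to one CSV row, replacing each
# embedded line break with a space. Reads stdin if no FILE is given.
join_quoted() {
    awk '{
        sub(/\r$/, "")                          # drop a trailing carriage return
        line = (buf == "" ? $0 : buf " " $0)    # append to any unfinished row
        n = gsub(/"/, "&", line)                # count the double quotes so far
        if (n % 2) { buf = line }               # odd: a quoted field is still open
        else       { print line; buf = "" }     # even: the row is complete
    }
    END { if (buf != "") print buf }' "$@"
}
```

Usage would be `join_quoted file > newfile`. The carriage return is stripped because the attempts above suggest the file may have Windows line endings; that is a guess.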

How Can I Print A Specific Range Of Numbers From A File

hello,

I am trying to make a table from some files. I used this to count how many "RD_" fields I have in my file:
Code:
grep -o 'RD_' $f|grep -c 'RD_'
For example, I got 5 "RD_" fields; now I want to print 5 fields from another file, starting from the 2nd field. I did it manually like this:
Code:
awk 'NR==1{print"{"$2","$3","$4","$5","$6","0.0000",""0.0000""}"","}' $file
I want to make these work together and be a bit more automatic, something like:
Code:
awk 'NR==1{print"{"$2"to "$5"," apend zeros to make it total 7 fields"}"","}' $file 


Your comments would be appreciated.
Thanks a lot.
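
One way to make this automatic is to pass the count into awk with `-v` and build the row in a loop, padding with "0.0000" up to seven entries. A minimal sketch, assuming the padding value should be the literal text 0.0000 and the total is always 7; the function name format_row is made up for illustration:

```shell
# format_row COUNT FILE
# From the first line of FILE, print COUNT fields starting at field 2,
# padded with 0.0000 to seven entries, in the form {a,b,...},
format_row() {
    awk -v n="$1" -v total=7 'NR == 1 {
        out = ""
        for (i = 1; i <= total; i++) {
            val = (i <= n) ? $(i + 1) : "0.0000"   # real field or padding
            out = out (i > 1 ? "," : "") val
        }
        print "{" out "},"
        exit                                       # only the first line is needed
    }' "$2"
}
```

The two steps can then be combined as `format_row "$(grep -o 'RD_' "$f" | grep -c 'RD_')" "$file"`.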