How to merge two .txt files in Unix based on one common column
Thanks for adding your own attempts to solve the problem - it makes troubleshooting a lot easier.
This answer is a bit convoluted, but here is a potential solution (GNU join):
join -t $'\t' -1 2 -2 1 <(head -n 1 File1.txt && tail -n +2 File1.txt | sort -k2,2 ) <(head -n 1 File2.txt && tail -n +2 File2.txt | sort -k1,1)
#Sam_ID Sub_ID v1 code V3 V4
#2253734 1878372 SAMN06396112 20481 NA DNA
#2275341 1884646 SAMN06432785 20483 NA DNA
#2277481 1860945 SAMN06407597 20488 NA DNA
Explanation:
- join uses a single character as a separator, so you can't use "\t", but you can use $'\t' (as far as I know)
- the -1 2 and -2 1 options mean "for the first file, use the second field" and "for the second file, use the first field" when combining the files
- in each process substitution ( <() ), sort the file by the Sam_ID column but exclude the header from the sort (per Is there a way to ignore header lines in a UNIX sort?)
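For shells without <() support, the same header-preserving sort can be sketched with temporary files. This is only an illustration: the sample rows are reconstructed from the output shown above, and the helper name sort_body and the .sorted file names are made up for this example.

```shell
#!/bin/sh
# Sketch: join two tab-separated files on Sam_ID while keeping header lines on top.
# The sample data, sort_body and the *.sorted names are assumptions for illustration.
LC_ALL=C; export LC_ALL            # same collation for sort and join
tab=$(printf '\t')                 # portable stand-in for $'\t'
workdir=$(mktemp -d)

printf 'Sub_ID\tSam_ID\tv1\n1884646\t2275341\tSAMN06432785\n1878372\t2253734\tSAMN06396112\n' > "$workdir/File1.txt"
printf 'Sam_ID\tcode\tV3\tV4\n2275341\t20483\tNA\tDNA\n2253734\t20481\tNA\tDNA\n' > "$workdir/File2.txt"

# Print the header unchanged, then the remaining lines sorted on one field.
sort_body() {  # usage: sort_body FILE KEY_FIELD
    head -n 1 "$1"
    tail -n +2 "$1" | sort -t "$tab" -k "$2,$2"
}

sort_body "$workdir/File1.txt" 2 > "$workdir/File1.sorted"
sort_body "$workdir/File2.txt" 1 > "$workdir/File2.sorted"

# Join field 2 of the first file with field 1 of the second file.
merged=$(join -t "$tab" -1 2 -2 1 "$workdir/File1.sorted" "$workdir/File2.sorted")
printf '%s\n' "$merged"

rm -r "$workdir"
```

The header lines pair up with each other because both contain Sam_ID in the join field, which is why the header survives as the first output line.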
Edit
To specify the order of the columns in the output (to put the Sub_ID before the Sam_ID), you can use the -o option, e.g.
join -t $'\t' -1 2 -2 1 -o 1.1,1.2,1.3,2.2,2.3,2.4 <(head -n 1 File1.txt && tail -n +2 File1.txt | sort -k2,2 ) <(head -n 1 File2.txt && tail -n +2 File2.txt | sort -k1,1)
#Sub_ID Sam_ID v1 code V3 V4
#1878372 2253734 SAMN06396112 20481 NA DNA
#1884646 2275341 SAMN06432785 20483 NA DNA
#1860945 2277481 SAMN06407597 20488 NA DNA
How to merge two files based on one column and print both matching and non-matching?
Assuming your real files are sorted like your samples are:
$ join -o 0,1.2,2.2 -e0 -a1 -a2 tmptest1.txt tmptest2.txt
aaa 231 222
bbb 132 0
ccc 111 0
ddd 0 132
If not sorted and using bash, zsh, ksh93 or another shell that understands <(command) process substitution:
join -o 0,1.2,2.2 -e0 -a1 -a2 <(sort tmptest1.txt) <(sort tmptest2.txt)
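As a self-contained check of what the flags do, here is a small sketch that runs in plain sh. The input rows are an assumption, reconstructed from the output shown above and deliberately left unsorted; sorted1.txt and sorted2.txt are scratch names chosen for this example.

```shell
#!/bin/sh
# Sketch: full outer join on column 1, filling missing values with 0.
# Input rows are reconstructed from the sample output (an assumption).
LC_ALL=C; export LC_ALL
workdir=$(mktemp -d)

printf 'ccc 111\naaa 231\nbbb 132\n' > "$workdir/tmptest1.txt"
printf 'ddd 132\naaa 222\n' > "$workdir/tmptest2.txt"

# join needs both inputs sorted on the join field.
sort -k1,1 "$workdir/tmptest1.txt" > "$workdir/sorted1.txt"
sort -k1,1 "$workdir/tmptest2.txt" > "$workdir/sorted2.txt"

# -a 1 -a 2 : also print unpairable lines from both files
# -o 0,1.2,2.2 : join field, then column 2 of each file
# -e 0 : text to use for fields that are missing
outer=$(join -o 0,1.2,2.2 -e 0 -a 1 -a 2 "$workdir/sorted1.txt" "$workdir/sorted2.txt")
printf '%s\n' "$outer"

rm -r "$workdir"
```

Note that -e only takes effect for fields listed in -o, which is why the two options appear together.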
Merging two files by a single column in unix
Check out join(1). In your case, you don't even need any flags:
$ join file_b file_a
subjectid prob_disease name age
12 0.009 Jane 16
24 0.738 Kristen 90
15 0.392 Clarke 78
23 1.2E-5 Joann 31
Merging two files based on 1st matching columns using awk command
Could you please try the following (tested with the provided samples only).
awk '
BEGIN{
FS=OFS=","
}
FNR>1 && FNR==NR{
a[$1]=$2 OFS $3
next
}
FNR>1{
print $1,$2,$3,a[$1]?a[$1]:","
}
' Test2.txt Test1.txt
Explanation: the same code again, with a comment on each line.
awk '
BEGIN{ ##BEGIN section: executed once, before any Input_file is read.
FS=OFS="," ##Setting both FS and OFS to a comma.
} ##Closing BEGIN section here.
FNR>1 && FNR==NR{ ##FNR==NR is TRUE only while the 1st Input_file is being read; FNR>1 skips its 1st (header) line.
a[$1]=$2 OFS $3 ##Creating an array named a whose index is $1 and whose value is $2 OFS $3.
next ##next skips all further statements for the current line.
}
FNR>1{ ##FNR>1 runs for all lines of the 2nd Input_file except its 1st line.
print $1,$2,$3,a[$1]?a[$1]:"," ##Printing $1, $2, $3 and the value of array a at index $1; if that value is empty, print a comma instead.
}
' Test2.txt Test1.txt ##Mentioning Input_file names here.
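A small variation, offered only as a sketch: testing membership with ($1 in a) avoids creating empty array entries for keys that were never stored. The two CSV files below are hypothetical, since the original samples are not shown here.

```shell
#!/bin/sh
# Sketch: left join of Test1.txt against Test2.txt on column 1 using awk.
# File contents are hypothetical sample data, not the asker's real files.
workdir=$(mktemp -d)

printf 'id,name,age\n1,Ann,30\n2,Bob,41\n' > "$workdir/Test1.txt"
printf 'id,city,zip\n1,Oslo,0150\n' > "$workdir/Test2.txt"

joined=$(awk '
BEGIN { FS = OFS = "," }
FNR == 1  { next }                        # skip the header line of each file
FNR == NR { a[$1] = $2 OFS $3; next }     # first file: remember columns 2 and 3 per key
{ print $1, $2, $3, (($1 in a) ? a[$1] : ",") }   # second file: append stored value or empty columns
' "$workdir/Test2.txt" "$workdir/Test1.txt")
printf '%s\n' "$joined"

rm -r "$workdir"
```

With this data the key 2 has no match in Test2.txt, so its line ends in two empty fields.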
Merge two files based on two common columns, and replace the blank to 0
Could you please try the following, written and tested with the shown samples only, in GNU awk.
awk '
FNR==NR{
a[$1 OFS $2]=$NF
next
}
{
if(($1 OFS $2) in a){
d[$1 OFS $2]
$(NF+1)=a[$1 OFS $2]
}
else{
$(NF+1)=0
}
print
}
END{
for(i in a){
if(!(i in d)){
print i,"0",a[i]
}
}
}
' Input_file2 Input_file1 | sort -k1
Output will be as follows.
chr1 1000001 135 377
chr1 5500002 0 320
chr2 1000002 57 0
chr2 4400002 117 0
chr6 1000003 172 432
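The same idea can also be written with awk's built-in composite subscripts (a[$1,$2], which joins the parts with SUBSEP) instead of concatenating with OFS. The following is a sketch under assumptions: the input rows are reconstructed from the output above, and the array name seen is chosen for this example.

```shell
#!/bin/sh
# Sketch: merge on two key columns, filling the missing side with 0.
# Input rows are reconstructed from the sample output (an assumption).
LC_ALL=C; export LC_ALL
workdir=$(mktemp -d)

printf 'chr1 1000001 135\nchr2 1000002 57\nchr2 4400002 117\nchr6 1000003 172\n' > "$workdir/Input_file1"
printf 'chr1 1000001 377\nchr1 5500002 320\nchr6 1000003 432\n' > "$workdir/Input_file2"

merged=$(awk '
FNR == NR { a[$1, $2] = $NF; next }       # first file read: store last column per key pair
{
    if (($1, $2) in a) { seen[$1, $2]; print $0, a[$1, $2] }   # key present in both files
    else               { print $0, 0 }                         # key only in Input_file1
}
END {
    for (key in a)                        # keys that appeared only in Input_file2
        if (!(key in seen)) { split(key, k, SUBSEP); print k[1], k[2], 0, a[key] }
}
' "$workdir/Input_file2" "$workdir/Input_file1" | sort -k1,1 -k2,2n)
printf '%s\n' "$merged"

rm -r "$workdir"
```

Sorting with -k1,1 -k2,2n orders the second key numerically, which matters once positions have different digit counts.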
Unix: How to combine separate columns into one column
You can put string literals inside awk's print command.
Here's an example:
$ cat a
1 2 3 [AUTORESTART] Mar 17 21:21:32 GMT 2022
$ cat a | awk '{print $4 "," $6 " " $7 " " $8 " " $9 " " $10}'
[AUTORESTART],17 21:21:32 GMT 2022
You can see that I print the 4th column, then a literal comma, then the 6th column, then a literal space, and so on until the 10th column (the sample line shown has only nine fields, so $10 is empty here).
You can then redirect it to a CSV file:
$ cat a | awk '{print $4 "," $6 " " $7 " " $8 " " $9 " " $10}' > mycsv.csv
Merge Two files of columns but insert columns of second file into columns of first file
You can use a loop in awk, for example:
paste file_A file_B | awk '{
half = NF/2;
for(i = 1; i < half; i++)
{
printf("%s %s ", $i, $(i+half));
}
printf("%s %s\n", $half, $NF);
}'
or
paste file_A file_B | awk '{
i = 1; j = NF/2 + 1;
while(j < NF)
{
printf("%s %s ", $i, $j);
i++; j++;
}
printf("%s %s\n", $i, $j);
}'
The code assumes that the number of columns in awk's input is even.
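If that assumption might not hold, one option is to check NF before interleaving. This is only a sketch with hypothetical three-column sample files; lines with an odd column count are reported on stderr and skipped.

```shell
#!/bin/sh
# Sketch: interleave the columns of file_A and file_B, guarding against odd NF.
# File contents are hypothetical sample data.
workdir=$(mktemp -d)

printf 'a1 a2 a3\nc1 c2 c3\n' > "$workdir/file_A"
printf 'b1 b2 b3\nd1 d2 d3\n' > "$workdir/file_B"

interleaved=$(paste "$workdir/file_A" "$workdir/file_B" | awk '
NF % 2 {                                  # odd column count: the halves cannot be paired
    printf "line %d: odd number of columns (%d)\n", NR, NF > "/dev/stderr"
    next
}
{
    half = NF / 2
    line = ""
    for (i = 1; i <= half; i++)           # column i of file_A, then column i of file_B
        line = line (i > 1 ? " " : "") $i " " $(i + half)
    print line
}')
printf '%s\n' "$interleaved"

rm -r "$workdir"
```

Building the line in a variable and printing it once also avoids handling the last pair separately.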