Merging Two Files by a Single Column in Unix

How to merge two .txt files in Unix based on one common column

Thanks for adding your own attempts to solve the problem - it makes troubleshooting a lot easier.

This answer is a bit convoluted, but here is a potential solution (GNU join):

join -t $'\t' -1 2 -2 1 <(head -n 1 File1.txt && tail -n +2 File1.txt | sort -k2,2 ) <(head -n 1 File2.txt && tail -n +2 File2.txt | sort -k1,1)

#Sam_ID Sub_ID  v1  code    V3  V4
#2253734    1878372 SAMN06396112    20481   NA  DNA
#2275341    1884646 SAMN06432785    20483   NA  DNA
#2277481    1860945 SAMN06407597    20488   NA  DNA

Explanation:

join's -t option takes a single character, so the two-character string "\t" won't work. In bash, $'\t' expands to a literal tab character, which join accepts.
-1 2 and -2 1 mean "for the first file, join on the second field" and "for the second file, join on the first field".
In each process substitution (<(...)), the file is sorted on its Sam_ID column (field 2 of File1.txt, field 1 of File2.txt) while the header line is kept out of the sort (per "Is there a way to ignore header lines in a UNIX sort?").
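To try the header-preserving join end to end, here is a small sketch. The sample rows are reconstructed from the output above (they are not the real files), it assumes GNU coreutils for join's --header option, and it writes the sorted pieces to temporary files instead of using <(...) so it also runs under a plain POSIX sh. The helper names sort_body and merge_by_sam_id are hypothetical.

```shell
#!/bin/sh
# Sketch of the header-preserving join; sample rows are reconstructed from the
# output above. Needs GNU join (--header); temp files replace <(...) so it also
# runs under a plain POSIX sh.
set -eu
tab=$(printf '\t')
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'Sub_ID\tSam_ID\tv1\n1860945\t2277481\tSAMN06407597\n1878372\t2253734\tSAMN06396112\n1884646\t2275341\tSAMN06432785\n' > "$dir/File1.txt"
printf 'Sam_ID\tcode\tV3\tV4\n2275341\t20483\tNA\tDNA\n2277481\t20488\tNA\tDNA\n2253734\t20481\tNA\tDNA\n' > "$dir/File2.txt"

# Hypothetical helper: print the header, then the data lines sorted on column $2.
sort_body() {
  head -n 1 "$1"
  tail -n +2 "$1" | LC_ALL=C sort -t "$tab" -k "$2,$2"
}

# Hypothetical helper: join File1 field 2 with File2 field 1 (Sam_ID).
merge_by_sam_id() {
  sort_body "$dir/File1.txt" 2 > "$dir/f1.sorted"
  sort_body "$dir/File2.txt" 1 > "$dir/f2.sorted"
  # --header pairs the first lines as headers instead of order-checking them.
  LC_ALL=C join --header -t "$tab" -1 2 -2 1 "$dir/f1.sorted" "$dir/f2.sorted"
}

merge_by_sam_id
```

With --header, join pairs the two first lines as headers and does not check their sort order, so the header's position relative to the sorted data no longer matters. LC_ALL=C is set on both sort and join because join needs its input sorted in the same collation order it compares with.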

Edit

To specify the order of the columns in the output (to put the Sub_ID before the Sam_ID), you can use the -o option, e.g.

join -t $'\t' -1 2 -2 1 -o 1.1,1.2,1.3,2.2,2.3,2.4 <(head -n 1 File1.txt && tail -n +2 File1.txt | sort -k2,2 ) <(head -n 1 File2.txt && tail -n +2 File2.txt | sort -k1,1)

#Sub_ID Sam_ID  v1  code    V3  V4
#1878372    2253734 SAMN06396112    20481   NA  DNA
#1884646    2275341 SAMN06432785    20483   NA  DNA
#1860945    2277481 SAMN06407597    20488   NA  DNA

How to merge two files based on one column and print both matching and non-matching?

Assuming your real files are sorted like your samples are:

$ join -o 0,1.2,2.2 -e0 -a1 -a2 tmptest1.txt tmptest2.txt
aaa 231 222
bbb 132 0
ccc 111 0
ddd 0 132

If the files are not sorted and you are using bash, zsh, ksh93, or another shell that supports <(command) process substitution:

join -o 0,1.2,2.2 -e0 -a1 -a2 <(sort tmptest1.txt) <(sort tmptest2.txt)
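A self-contained sketch of the outer join, with file contents inferred from the output above and tmptest2.txt deliberately left unsorted. It also shows join's -v 1 option, which prints only the lines of the first file that have no match. The function names outer_join and only_in_first are hypothetical.

```shell
#!/bin/sh
# Sketch: outer join and "non-matching only" with join(1).
# File contents are inferred from the output above; tmptest2.txt is unsorted on purpose.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'aaa 231\nbbb 132\nccc 111\n' > "$dir/tmptest1.txt"
printf 'ddd 132\naaa 222\n' > "$dir/tmptest2.txt"

# join needs both inputs sorted in the same collation order it compares with,
# hence LC_ALL=C for both sort and join.
LC_ALL=C sort "$dir/tmptest1.txt" > "$dir/s1"
LC_ALL=C sort "$dir/tmptest2.txt" > "$dir/s2"

# -a1 -a2: also print unpaired lines from file 1 and file 2;
# -e0: use 0 for the missing field; -o 0,1.2,2.2: key, file 1 value, file 2 value.
outer_join() {
  LC_ALL=C join -o 0,1.2,2.2 -e0 -a1 -a2 "$dir/s1" "$dir/s2"
}

# -v 1: print only the lines of file 1 that have no partner in file 2.
only_in_first() {
  LC_ALL=C join -v 1 "$dir/s1" "$dir/s2"
}

outer_join
only_in_first
```

Here outer_join reproduces the four lines shown above, and only_in_first prints just the bbb and ccc lines.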

Merging two files by a single column in unix

Check out join(1). In your case, you don't even need any flags (join does expect both files to be sorted on the join field, so sort them first if they aren't):

$ join file_b file_a
subjectid prob_disease name age
12 0.009 Jane 16
24 0.738 Kristen 90
15 0.392 Clarke 78
23 1.2E-5 Joann 31
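join pairs lines by merging two sorted streams. If you'd rather keep file_b's original row order without sorting, one alternative (an awk lookup table, not join) is sketched below. The file contents are reconstructed from the output above, file_a's row order is made up to show that it doesn't matter, and merge_keep_order is a hypothetical name.

```shell
#!/bin/sh
# Sketch: awk lookup instead of join, preserving file_b's row order.
# Inputs are reconstructed from the output above; file_a's order is hypothetical.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'subjectid prob_disease\n12 0.009\n24 0.738\n15 0.392\n23 1.2E-5\n' > "$dir/file_b"
printf 'subjectid name age\n23 Joann 31\n15 Clarke 78\n12 Jane 16\n24 Kristen 90\n' > "$dir/file_a"

merge_keep_order() {
  # Pass 1 (file_a): store "name age" keyed by subjectid.
  # Pass 2 (file_b): print each line with its file_a fields appended.
  awk 'NR == FNR { k = $1; $1 = ""; a[k] = substr($0, 2); next }
       $1 in a   { print $0, a[$1] }' "$dir/file_a" "$dir/file_b"
}

merge_keep_order
```

Unlike join, this silently drops ids that appear only in file_b; the header lines pair up because both start with subjectid.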

Merging two files based on 1st matching columns using awk command

Could you please try the following (tested with the provided samples only).

awk '
BEGIN{
  FS=OFS=","
}
FNR>1 && FNR==NR{
  a[$1]=$2 OFS $3
  next
}
FNR>1{
  print $1,$2,$3,a[$1]?a[$1]:","
}
'  Test2.txt Test1.txt

Explanation: here is the same code with comments.

awk '
BEGIN{                              ##Starting BEGIN section from here, which will be executed before reading Input_file(s).
  FS=OFS=","                        ##Setting FS and OFS value as comma here.
}                                   ##Closing BEGIN section here.
FNR>1 && FNR==NR{                   ##FNR==NR is TRUE only while the 1st Input_file (Test2.txt) is being read; FNR>1 skips its header line.
  a[$1]=$2 OFS $3                   ##Creating an array named a whose index is $1 and whose value is $2 OFS $3.
  next                              ##next skips all further statements for this line.
}
FNR>1{                              ##Checking condition FNR>1, which is TRUE for every line of the 2nd Input_file (Test1.txt) except its header.
  print $1,$2,$3,a[$1]?a[$1]:","    ##Printing $1 $2 $3 followed by a[$1]; if a[$1] is empty (no match), print a comma instead so the line keeps the same number of fields.
}
'  Test2.txt Test1.txt              ##Mentioning Input_file names here.
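Since the question's files aren't shown here, the sketch below runs the same program on two small, hypothetical comma-separated files. One change from the code above: the lookup is written as ($1 in a), which tests for the key without creating an empty array element. merge_csv is a hypothetical name.

```shell
#!/bin/sh
# Sketch: the comma-separated lookup above, run on hypothetical sample files.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'id,x,y\n1,a,b\n3,e,f\n' > "$dir/Test2.txt"     # hypothetical lookup file
printf 'id,p,q\n1,p1,q1\n2,p2,q2\n' > "$dir/Test1.txt" # hypothetical main file

merge_csv() {
  awk '
    BEGIN { FS = OFS = "," }
    # 1st file (Test2.txt), header skipped: remember "$2,$3" under key $1.
    FNR > 1 && FNR == NR { a[$1] = $2 OFS $3; next }
    # 2nd file (Test1.txt), header skipped: append the match, or "," (two empty
    # fields) when $1 has no entry; "in" checks without creating the element.
    FNR > 1 { print $1, $2, $3, (($1 in a) ? a[$1] : ",") }
  ' "$dir/Test2.txt" "$dir/Test1.txt"
}

merge_csv
```

With these inputs, the matched row becomes 1,p1,q1,a,b and the unmatched one becomes 2,p2,q2,, so both lines keep five fields.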

Merge two files based on two common columns, and replace blanks with 0

Could you please try the following, written and tested in GNU awk with the shown samples only.

awk '
FNR==NR{
  a[$1 OFS $2]=$NF
  next
}
{
  if(($1 OFS $2) in a){
    d[$1 OFS $2]
    $(NF+1)=a[$1 OFS $2]
  }
  else{
    $(NF+1)=0
  }
  print
}
END{
  for(i in a){
    if(!(i in d)){
      print i,"0",a[i]
    }
  }
}
' Input_file2  Input_file1 | sort -k1

Output will be as follows.

chr1 1000001 135 377
chr1 5500002 0 320
chr2 1000002 57 0
chr2 4400002 117 0
chr6 1000003 172 432
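A small harness for the program above. The two input files are reconstructed from the printed output (assuming three space-separated columns per file), the awk body is the same logic in condensed form with a helper variable key, and LC_ALL=C sort keeps the order independent of the locale. merge_fill_zero is a hypothetical name.

```shell
#!/bin/sh
# Sketch: run the two-key merge on inputs reconstructed from the output above.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'chr1 1000001 135\nchr2 1000002 57\nchr2 4400002 117\nchr6 1000003 172\n' > "$dir/Input_file1"
printf 'chr1 1000001 377\nchr1 5500002 320\nchr6 1000003 432\n' > "$dir/Input_file2"

merge_fill_zero() {
  awk '
    FNR == NR { a[$1 OFS $2] = $NF; next }   # Input_file2: key -> value
    {                                        # Input_file1: append match or 0
      key = $1 OFS $2
      if (key in a) { d[key]; $(NF + 1) = a[key] } else { $(NF + 1) = 0 }
      print
    }
    END {                                    # keys found only in Input_file2
      for (k in a) if (!(k in d)) print k, "0", a[k]
    }
  ' "$dir/Input_file2" "$dir/Input_file1" | LC_ALL=C sort
}

merge_fill_zero
```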

Unix: How to combine separate columns into one column

You can put string literals inside an awk print statement.

Here's an example:

$ cat a
1 2 3 [AUTORESTART] Mar 17 21:21:32 GMT 2022
$ awk '{print $4 "," $6 " " $7 " " $8 " " $9}' a
[AUTORESTART],17 21:21:32 GMT 2022

You can see that I print the 4th column, then a literal comma, then the 6th column, then a literal space, and so on up to the 9th (last) column.

You can then redirect the output to a CSV file:

$ awk '{print $4 "," $6 " " $7 " " $8 " " $9}' a > mycsv.csv
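If the part you want always runs from the 6th field to the end of the line, a loop avoids listing every field by hand. This sketch assumes that layout (taken from the example above) and uses a hypothetical helper named to_csv.

```shell
#!/bin/sh
# Sketch: field 4, a comma, then fields 6..NF joined by single spaces.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '1 2 3 [AUTORESTART] Mar 17 21:21:32 GMT 2022\n' > "$dir/a"

to_csv() {
  awk '{
    out = $4 ","
    for (i = 6; i <= NF; i++) {   # the start field (6) is an assumption
      out = out $i
      if (i < NF) out = out " "   # no trailing space after the last field
    }
    print out
  }' "$1"
}

to_csv "$dir/a" > "$dir/mycsv.csv"
cat "$dir/mycsv.csv"
```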

Merge two files of columns, inserting the columns of the second file between the columns of the first

You can use a loop in awk, for example:

paste file_A file_B | awk '{ 
    half = NF/2; 
    for(i = 1; i < half; i++)
    {
        printf("%s %s ", $i, $(i+half));
    }
    printf("%s %s\n", $half, $NF);
}'

Or, equivalently, with a while loop:

paste file_A file_B | awk '{ 
    i = 1; j = NF/2 + 1;
    while(j < NF)
    {
        printf("%s %s ", $i, $j);
        i++; j++;
    }
    printf("%s %s\n", $i, $j);
}'

The code assumes that the number of columns in awk's input is even, i.e. that file_A and file_B have the same number of columns on every line.
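A sketch that runs the for-loop approach on two hypothetical 3-column files and rejects lines with an odd field count instead of silently mis-pairing them. The guard and the function name interleave are additions, not part of the answer above.

```shell
#!/bin/sh
# Sketch: interleave columns of two files, failing on an odd field count.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'a1 a2 a3\nc1 c2 c3\n' > "$dir/file_A"   # hypothetical input
printf 'b1 b2 b3\nd1 d2 d3\n' > "$dir/file_B"   # hypothetical input

interleave() {
  paste "$1" "$2" | awk '
    NF % 2 { printf("line %d: odd field count %d\n", NR, NF) > "/dev/stderr"; exit 1 }
    {
      half = NF / 2
      line = $1 " " $(1 + half)
      for (i = 2; i <= half; i++)
        line = line " " $i " " $(i + half)
      print line
    }'
}

interleave "$dir/file_A" "$dir/file_B"
```

Running it prints a1 b1 a2 b2 a3 b3 and c1 d1 c2 d2 c3 d3; if the two files have different column counts, awk reports the offending line on stderr and exits with status 1.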