KnownBugs

Known Bugs

  • join.pl: A comment on Alex's entry below. join.pl performs a left inner join, not an outer join (see the description of join.pl). This is not a bug: it is written this way so that the result file has the same key order as the first file supplied, which is usually what we need when adding gene annotations or attaching data to a list of genes. We give the list of genes as the first file, and join.pl pulls out only the data for those genes, in the order they are given. join.pl also accepts a flag (-o or -ob) that tells it to perform a left outer join. If you want every key mentioned in either file to be represented in the result, use a pipe that extracts the unique keys and passes them through two calls of join.pl. Using Alex's example below, where the two files are test and test2, you just do: cat test test2 | cut -f 1 | sort -u | join.pl -ob - test | join.pl -ob - test2. Note that there is currently no implementation of a full outer join, where the cross-product of all matching entries is returned. For example, if test had the entries 'a\tb1\na\tb2' and test2 was the same as below, a full outer join should return four result rows: 'a\tb1\tb', 'a\tb1\tc', 'a\tb2\tb', and 'a\tb2\tc'. At this point, as Alex mentions below, the left (inner or outer) join uses only the last matching row in test2 and returns two result rows: 'a\tb1\tc' and 'a\tb2\tc'. If anyone wants to implement a full outer join, please be my guest.
  • join.pl: Unlike join, join.pl does NOT require the input files to be sorted. Its behavior also differs slightly from join's. For example, join.pl matches only the last row it encounters for a key, not all matching rows like regular join. To see this behavior, try: echo 'a' > test ; printf 'a\tb\na\tc\n' > test2. Running join.pl test test2 will not report the "a b" match. It is possible that this is intended behavior, or that it can be changed with a command-line option, but if so, I couldn't figure it out. (Reported by Alex.)
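
The difference between the two behaviors discussed above can be sketched in a few lines of Python. This is only an illustration of the semantics as described in these notes, not join.pl's actual code: the function names (parse, last_match_left_join, all_matches_join) are hypothetical, the -o and -ob flags are not modeled, and the inputs are the example rows from the entries above.

```python
from collections import defaultdict


def parse(text):
    """Split tab-delimited text into rows; the first field of each row is the key."""
    return [line.split("\t") for line in text.splitlines() if line]


def last_match_left_join(left, right):
    """Left inner join that keeps the order of `left`.

    Models the behavior described above: for each key, only the LAST
    row seen in `right` is used. (Assumption: this mirrors the notes,
    not join.pl's source.)
    """
    lookup = {}
    for key, *rest in right:
        lookup[key] = rest  # a later row with the same key overwrites the earlier one
    return [[key, *rest, *lookup[key]] for key, *rest in left if key in lookup]


def all_matches_join(left, right):
    """Join that emits one row per matching (left, right) pair, in `left` order."""
    lookup = defaultdict(list)
    for key, *rest in right:
        lookup[key].append(rest)  # keep every row for the key
    return [
        [key, *rest, *match]
        for key, *rest in left
        for match in lookup.get(key, [])
    ]


test = parse("a\tb1\na\tb2")
test2 = parse("a\tb\na\tc")

for row in last_match_left_join(test, test2):
    print("\t".join(row))  # two rows, both paired with 'c'

for row in all_matches_join(test, test2):
    print("\t".join(row))  # four rows: every pairing of b1/b2 with b/c
```

The only difference between the two functions is whether the lookup table keeps one row per key or a list of rows per key, which suggests that a cross-product mode could be a fairly small change if someone wants to take it on.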