Frame matching
Note
I’d like to point out that I did not invent the algorithm presented below. I reimplemented the algorithm that was part of the original Munipack software. According to the original source code, Filip Hroch is its author, but I haven’t found any paper describing this method. If you know of one, please let me know.
Up to this point, the source frames are processed independently. To make a light curve of a particular object, we need to trace that object throughout the set of frames. This step is called matching in the C-Munipack software. It takes two or more lists of stellar objects (result of photometry) and determines which objects on one list are the same physical object on the rest of the lists.
The matching algorithm implemented in the C-Munipack software is an extension of the algorithm published by Valdes [valdes95]. The main difference is that it builds polygons of n \geq 3 vertices instead of triangles. The text follows the terminology of Valdes wherever possible. Matching algorithms based on the similarity of triangles were also described and implemented by Stetson [stetson89] and Groth [groth86].
The algorithm works on two catalogs of objects; each object is described by its coordinates in some local coordinate system and by its brightness. We assume that the coordinate systems of the catalogs may be shifted, scaled, rotated or even inverted with respect to each other. To make matters more difficult, the catalogs may overlap only partially; objects from one catalog need not be present in the second catalog and vice versa. Problems with the separation of close objects may cause objects present in one catalog to be missing from the other. In addition, due to random noise and field distortions, the positions of objects are known only imprecisely.
The matching algorithm works in three subsequent phases. In the first phase, possible transformations between the coordinate systems of the two catalogs are found. Then the best transformation is chosen, and finally a relation between the objects from the two catalogs is established.
Although the method in its basic form works on a pair of catalogs only, its extension to an arbitrary number of frames is obvious. One frame from the set of source frames is chosen as a reference catalog and all other source frames are matched against the reference one. The alternative is to use any other catalog of the same field and match the source frames against it.
Similar triangles
A catalog of objects can be seen as a large number of triangles, because a triangle can be formed from every combination of three objects. If the coordinate systems of two catalogs are shifted, scaled, rotated or inverted (flipped) with respect to each other, and the two catalogs show at least partially the same set of physical objects, there must be many triangles of similar shape in both catalogs, because these operations do not change a triangle's shape. Not all triangles from one catalog have counterparts in the other one. Also, due to the noise component in position measurements and field distortion, the corresponding triangles are never exactly the same.
Following Stetson [stetson89], the shape of a triangle is described by two shape indices which form a point (u, v) in a two-dimensional triangle space, defined as:
(1) u = \frac{\text{length of second longest side}}{\text{length of longest side}}
and
(2) v = \frac{\text{length of shortest side}}{\text{length of longest side}}
It can be shown, using the law of sines, that if a triangle is obtained from an original triangle by shifting, rotation, scaling, inversion or any combination thereof, it projects to the same point in the triangle space. The Euclidean distance in the triangle space is used to measure the similarity of two triangles. If it is less than some value \epsilon, the triangles match.
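The shape indices and the similarity test can be sketched as follows; function and parameter names are illustrative, not taken from the C-Munipack source:

```python
import math

def shape_indices(p1, p2, p3):
    """Project a triangle (three 2-D points) into the (u, v) triangle space:
    u = second longest side / longest side, v = shortest side / longest side."""
    a = math.dist(p1, p2)
    b = math.dist(p2, p3)
    c = math.dist(p3, p1)
    longest, second, shortest = sorted((a, b, c), reverse=True)
    return second / longest, shortest / longest

def triangles_match(t1, t2, eps=0.005):
    """Two triangles match when their Euclidean distance in the
    (u, v) triangle space is less than eps."""
    u1, v1 = shape_indices(*t1)
    u2, v2 = shape_indices(*t2)
    return math.hypot(u1 - u2, v1 - v2) < eps
```

Because the indices are ratios of side lengths, a shifted, rotated, scaled or flipped copy of a triangle maps to the same (u, v) point.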
Building polygons
The algorithm implemented in the C-Munipack software follows the method described by Valdes. It sorts the objects by brightness in decreasing order and takes up to N_{obj} objects from both catalogs. But it does not stop when it finds a pair of matching triangles. It continues to build a pair of polygons of up to N vertices from them, taking one of the sides and looking for two objects that would form another pair of similar triangles. On success, it adds the objects to the lists to make a quadrilateral, and so on, up to a polygon with N sides. The value N is a configurable parameter. A constant value \epsilon = 0.005 is used to check whether two triangles match.
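The polygon-growing idea can be sketched like this. This is a simplified, hypothetical version: it extends the polygons from their last side only and omits the distance-based pruning described in the next section, which the real implementation uses to reject accidental shape matches:

```python
import itertools
import math

def shape(p1, p2, p3):
    """(u, v) shape indices of a triangle, as defined above."""
    s = sorted((math.dist(p1, p2), math.dist(p2, p3), math.dist(p3, p1)),
               reverse=True)
    return s[1] / s[0], s[2] / s[0]

def grow_polygon(poly1, poly2, stars1, stars2, n_max, eps=0.005):
    """Extend a pair of matched polygons (parallel lists of vertices) up to
    n_max vertices by finding a star pair (c, c2) that forms a similar
    triangle with the last side of each polygon."""
    while len(poly1) < n_max:
        a, b = poly1[-2], poly1[-1]
        a2, b2 = poly2[-2], poly2[-1]
        for c, c2 in itertools.product(stars1, stars2):
            if c in poly1 or c2 in poly2:
                continue
            u1, v1 = shape(a, b, c)
            u2, v2 = shape(a2, b2, c2)
            if math.hypot(u1 - u2, v1 - v2) < eps:
                poly1.append(c)
                poly2.append(c2)
                break
        else:
            break  # no further extension found
    return poly1, poly2
```

Starting from one matched triangle pair, each successful extension adds one vertex to both polygons, so a quadrilateral, pentagon, and so on are built incrementally.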
Reducing the number of false matches
The following optimization reduces the number of false matches. When a pair of objects (A, B) and (A', B') is taken, their distances d = \left|AB\right| and d' = \left|A'B'\right| are computed. The ratio d : d' is used as the scale between the two frames. The midpoints P of AB and P' of A'B' are also determined. For any object C from the first catalog, its distance to the point P is divided by d : d' to get the expected distance of the desired counterpart C' to the point P'. It is then possible to restrict the search to objects whose distance to P' does not differ much from the expected value. The quicksort algorithm is used to sort objects by distance from the point P', which allows the search to stop once the distance exceeds the upper limit.
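The distance-restricted search can be sketched as follows. The sketch uses a binary search over the sorted distances instead of the early-exit scan mentioned in the text; the names are illustrative:

```python
import bisect
import math

def candidates_near(stars2, p_prime, expected, tolerance):
    """Return stars from the second catalog whose distance to P' lies
    within expected +/- tolerance.  Sorting by distance lets a binary
    search skip every object outside the window."""
    by_dist = sorted((math.dist(s, p_prime), s) for s in stars2)
    dists = [d for d, _ in by_dist]
    lo = bisect.bisect_left(dists, expected - tolerance)
    hi = bisect.bisect_right(dists, expected + tolerance)
    return [s for _, s in by_dist[lo:hi]]
```

Only the few objects inside the distance window need to be tested for triangle similarity, instead of the whole catalog.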
From polygon to transformation
By means of the algorithm described in the previous section, two similar polygons P_1 and P_2 were found, each of which has N vertices. The next step is to find the best-fitting transformation between coordinates of vertices of P_1 and P_2. The scale factor is estimated first. Next, the inversion is determined. Finally, the linear least squares (LLSQ) method is applied to find the values of the rest of the parameters.
The scale factor s between the two polygons P_1 and P_2 is estimated as the robust mean of the ratios between corresponding vertex distances of the two polygons. The mean value is determined using the robust mean algorithm (see Robust mean).
The inversion is represented by the value m = 1 if mirroring does not occur (the local coordinate systems are of the same handedness) or m = -1 if it does. To determine the value m, the first two vertices of the polygons P_1 and P_2 are used to form oriented lines, and we check whether the third vertices of P_1 and P_2 lie on the same side of these lines. The test can be performed in 3-D space by comparing the sign of the z-coordinate of the cross product of the vectors \vec{AB} and \vec{AC} with the sign of the z-coordinate of the cross product of the vectors \vec{A'B'} and \vec{A'C'}. The value m = 1 if the signs are equal, m = -1 otherwise.
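The cross-product test amounts to a few lines; in 2-D only the z-component of the cross product is nonzero, so it can be computed directly:

```python
def cross_z(a, b, c):
    """z-component of the cross product AB x AC for 2-D points."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def mirror_flag(tri1, tri2):
    """m = 1 when both vertex triples wind the same way, m = -1 otherwise."""
    s1 = cross_z(*tri1)
    s2 = cross_z(*tri2)
    return 1 if (s1 > 0) == (s2 > 0) else -1
```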
The rotation and offset between the two frames are estimated by determining coefficients A, B, X_0 and Y_0 by means of the LLSQ method.
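As an illustration, the four parameters have a closed-form least-squares solution under the model x' = A x + B y + X_0, y' = -B x + A y + Y_0 (a rotation plus offset; this model form and the sign convention are assumptions for the sketch, and the mirrored case would flip a sign). Centering the coordinates decouples the offset from the rotation terms:

```python
def fit_rotation_offset(src, dst):
    """Least-squares fit of x' = A*x + B*y + X0, y' = -B*x + A*y + Y0
    from paired points (a hypothetical closed-form solution of the
    normal equations for this 4-parameter model)."""
    n = len(src)
    mx = sum(p[0] for p in src) / n
    my = sum(p[1] for p in src) / n
    mu = sum(p[0] for p in dst) / n
    mv = sum(p[1] for p in dst) / n
    num_a = num_b = den = 0.0
    for (x, y), (u, v) in zip(src, dst):
        xc, yc, uc, vc = x - mx, y - my, u - mu, v - mv
        den += xc * xc + yc * yc
        num_a += xc * uc + yc * vc   # projection onto the identity part
        num_b += yc * uc - xc * vc   # projection onto the rotation part
    A = num_a / den
    B = num_b / den
    # recover the offset from the means
    X0 = mu - A * mx - B * my
    Y0 = mv + B * mx - A * my
    return A, B, X0, Y0
```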
The first estimation of the transformation matrix M_0 is created by combining the scale factor, the value m, and the results of the LLSQ fitting. The variance \sigma_0^2 of the solution is determined from the minimized sum of squares S:
(3) \sigma_0^2 = \frac{S}{2N - 4} = \frac{1}{2}\frac{S}{N-2}
The denominator, 2N-4, is the number of statistical degrees of freedom; 2N is the number of equations (each vertex gives two equations) and there are four free parameters. The variance is used to determine the allowed displacement between the expected and observed position of a star.
Refining the transformation
The transformation found using polygon vertices only is further refined in a second stage by including the stars from the full sets S and S' (the objects of the first and second catalog). As stated above, not all stars from S and S' necessarily have corresponding objects on the other frame, so it is necessary to use a robust approach and rule such stars out.
For each star from the set S, its coordinates are transformed into the second frame using the matrix M_0 (the first estimation). Then the star from the set S' with the minimum distance to the computed location is found, and if its distance is less than the maximum displacement t_0, the two stars are included in the second estimation. The value t_0 is determined from the variance \sigma_0^2:
(4) t_0 = \sigma_0 \cdot \text{clip} \cdot \sqrt{6}
The stars from the set S that were successfully identified in S’ are used in the LLSQ method (same as in the first iteration) in order to determine a second estimation of the transformation matrix M_1. The scale and mirror value are kept unchanged. Also, the new value of sample variance \sigma_1^2 is determined.
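The identification step of the refinement can be sketched as a nearest-neighbour search under the current transformation; a brute-force scan is used here for brevity, and the names are illustrative:

```python
import math

def match_pairs(stars1, stars2, transform, t0):
    """Pair each star from S with its nearest neighbour in S' after
    applying the current transformation, keeping only pairs closer than
    the maximum displacement t0.  The retained pairs would then be fed
    back into the least-squares fit to obtain the refined matrix M_1."""
    pairs = []
    for s in stars1:
        p = transform(s)
        best = min(stars2, key=lambda q: math.dist(p, q))
        if math.dist(p, best) < t0:
            pairs.append((s, best))
    return pairs
```

Stars with no counterpart within t_0 simply drop out of the fit, which is what makes the second iteration robust against unmatched objects.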
The number of identified stars and the sum of squares from the second iteration are used to select the best-fitting solution.
Vote array
The vote array is organized as a map whose keys are unordered sets of polygon vertices; for each key it stores the number of matches, the stars that were identified in the second stage, the sum of squares, and the transformation matrix. When a new candidate is identified, the program compares its set of polygon vertices with the existing keys in the vote array. If there is a matching item, its match counter is incremented by one. If not, a new record is added to the vote array with its match counter initialized to one.
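A minimal sketch of the vote array, using frozensets of vertices as the unordered keys (the record layout here is hypothetical and reduced to the match counter plus an opaque payload):

```python
def vote(vote_array, vertices, payload):
    """Register a candidate solution.  vote_array maps frozensets of
    polygon vertices to records of the form [matches, payload]."""
    key = frozenset(vertices)
    if key in vote_array:
        vote_array[key][0] += 1
    else:
        vote_array[key] = [1, payload]

def best_candidate(vote_array):
    """Pick the record with the most matches; on a tie, max() returns the
    first maximal record (dicts preserve insertion order in Python 3.7+)."""
    return max(vote_array.values(), key=lambda r: r[0])
```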
The selection of the best-fitting solution is based on the number of matches. If the vote array contains multiple records with the maximum match count, the first one wins.
Cross-reference table
The result of the matching algorithm applied to a set of source frames is a cross-reference table which represents the relations between objects detected on the source frames. The table should allow the measurements for a particular physical object to be found efficiently. In the C-Munipack software, each object detected on a single source frame is assigned a unique integer at the final stage of the photometry step; these identification numbers are frame-wise.
The cross-reference table is a map whose keys are pairs of a frame identifier and a frame-wise object identifier, and whose values are global object identifiers. If the matching is done against one of the source frames, the object identifiers on that frame are used as the global identifiers. If a catalog file is used, the object identifiers in the catalog are used as the global identifiers.
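The mapping described above can be sketched directly as a dictionary; the input layout assumed here (per-frame lists of matched identifier pairs) is illustrative:

```python
def build_xref(matches):
    """Build the cross-reference table.  `matches` is assumed to map a
    frame id to a list of (frame-wise object id, global object id) pairs."""
    xref = {}
    for frame_id, pairs in matches.items():
        for obj_id, global_id in pairs:
            xref[(frame_id, obj_id)] = global_id
    return xref

def light_curve_keys(xref, global_id):
    """All (frame id, frame-wise object id) measurements that belong to
    one physical object -- the lookup a light curve needs."""
    return [k for k, g in xref.items() if g == global_id]
```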
The table is stored in a distributed form: each object on each frame has an attribute that records whether the object was found on the reference frame or in the catalog file and, if so, what its global identifier is.