Editing Distance Clusters

Normalize

Heatmap

Threshold

A greedy string classification algorithm

This algorithm attempts to classify strings by their editing distance. It first computes the case-insensitive editing distances between all input strings in O(n²ab), where a and b the maximum string lengths. Normalized editing distances are stored in a square matrix, M. Normalized, in this context, means that the editing distance is divided by the length of the longer of the two strings. The value M_i,j = d means that the normalized editing distance between elements i and j is equal to d. The diagonal of this matrix should consist of all zeros.

Next, the algorithm sorts the columns of this distance matrix by the simple invariant that the element immediately to the right of the diagonal is the least element on that row right of the diagonal. Symbolically, the invariant states that for matrix M of size m×n, M_i,i+1 ≤ M_i,j for all i+1 < j ≤ n. This "sorting" is basically a selection sort and occurs in Θ(n²).

Finally, the algorithm assigns clusters by the normalized editing distance. This is a simple comparison. The threshold for differentiating clusters configurable by the user. If M_i,i+1 > threshold (more different) then the elements i and i+1 are classified in different clusters.

My goal with all of this is to automatically classify Syslog messages. This application looks promising! Try pasting the below test inputs into the text area above to see what interesting clusters you can come up with.

Test inputs and recommended thresholds

"Nice" clusters (0.50)

Even nicer clusters (.40)

Low variation (.45)

High variation (.75)

Very high variation (.80)

Long input (notice Grover Cleveland) (.50)

Interface state change Syslog messages (.20)

Router adjacency change Syslog messages (.20)

*Mar 15 20:30:22.475: %SPANTREE-2-PVSTSIM_FAIL: Blocking designated port Fa0/1: Inconsitent superior PVST BPDU received on VLAN 10, claiming root 24586:0009.b752.d780
*Mar 15 20:30:22.479: %SPANTREE-2-PVSTSIM_FAIL: Blocking designated port Fa0/2: Inconsitent superior PVST BPDU received on VLAN 10, claiming root 24586:0009.b752.d780
*Mar 15 20:30:22.479: %SPANTREE-2-PVSTSIM_FAIL: Blocking designated port Fa0/3: Inconsitent superior PVST BPDU received on VLAN 10, claiming root 24586:0009.b752.d780
*Mar 15 20:30:22.479: %SPANTREE-2-PVSTSIM_FAIL: Blocking designated port Fa0/4: Inconsitent superior PVST BPDU received on VLAN 10, claiming root 24586:0009.b752.d780
*Mar 15 20:30:22.667: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan1, changed state to down
*Mar 15 20:30:22.667: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan50, changed state to down
*Mar 15 20:30:22.671: %DUAL-5-NBRCHANGE: EIGRP-IPv4:(1) 1: Neighbor 172.16.50.50 (Vlan50) is down: interface down
*Mar 15 20:30:22.671: %DUAL-5-NBRCHANGE: EIGRP-IPv4:(1) 1: Neighbor 172.16.50.101 (Vlan50) is down: interface down
*Mar 15 20:30:22.683: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan10, changed state to down
*Mar 15 20:30:22.687: %DUAL-5-NBRCHANGE: EIGRP-IPv4:(1) 1: Neighbor 192.168.1.101 (Vlan10) is down: interface down
*Mar 15 20:30:24.387: %SYS-5-CONFIG_I: Configured from console by console
*Mar 15 20:30:42.479: %SPANTREE-2-PVSTSIM_OK: PVST Simulation inconsistency cleared on port FastEthernet0/2.
*Mar 15 20:30:52.651: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan1, changed state to up
*Mar 15 20:30:52.651: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan10, changed state to up
*Mar 15 20:30:52.679: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan50, changed state to up
*Mar 15 20:30:54.043: %DUAL-5-NBRCHANGE: EIGRP-IPv4:(1) 1: Neighbor 172.16.50.50 (Vlan50) is up: new adjacency
2d16h: %SYS-5-CONFIG_I: Configured from console by console
2d16h: %SPANTREE-2-PVSTSIM_FAIL: Blocking root port Fa0/5: Inconsitent inferior PVST BPDU received on VLAN 20, claiming root 24596:0009.b752.d780
2d16h: %DUAL-5-NBRCHANGE: EIGRP-IPv4:(1) 1: Neighbor 172.16.50.50 (Vlan50) is up: new adjacency
2d16h: %DUAL-5-NBRCHANGE: EIGRP-IPv4:(1) 1: Neighbor 192.168.1.100 (Vlan10) is up: new adjacency
2d16h: %DUAL-5-NBRCHANGE: EIGRP-IPv4:(1) 1: Neighbor 172.16.50.100 (Vlan50) is up: new adjacency

Spanning tree state change Syslog messages (.20)

Jan  3 19:49:29:I:MSTP: MST 1 Port 1/1/9:1  - Bridge TC Event
Jan  3 19:49:12:I:MSTP: MST 0 Port 1/1/9:1 - DISCARDING
Jan  3 19:49:29:I:MSTP: MST 0 Port 1/1/9:1 - LEARNING
Jan  3 19:49:29:I:MSTP: MST 0 Port 1/1/9:1 - FORWARDING
Jan  3 19:49:12:I:MRP: Interface ethernet 1/1/9:1 of ring 51  Vlan 51, changing to disabled
Jan  3 19:49:29:I:MSTP: MST 0 Port 1/1/11:1  - Bridge TC Event
Jan  3 19:49:29:I:MRP: Interface ethernet 1/1/9:1 of ring 51  Vlan 51, changing to preforwarding
Jan  3 19:50:43:I:MSTP: MST 0 Port 1/1/11:1  - Bridge TC Event
Jan  3 19:49:12:I:System: Interface ethernet 1/1/9:3, state down
Jan  3 19:49:29:I:MSTP: MST 1 Port 1/1/9:1 - FORWARDING
Jan  3 19:49:29:I:MSTP: MST 0 Port 1/1/9:1  - Bridge TC Event
Jan  3 19:49:29:I:System: Logical link on dynamic lag interface ethernet 1/1/9:4 is up.
Jan  3 19:49:29:I:System: Logical link on dynamic lag interface ethernet 1/1/9:1 is up.
Jan  3 20:23:10:I:Security: startup-config was changed by operator from console
Jan  3 19:49:29:I:System: Logical link on dynamic lag interface ethernet 1/1/9:3 is up.
Jan  3 19:49:12:I:System: Logical link on dynamic lag interface ethernet 1/1/9:4 is down.
Jan  3 19:49:29:I:Trunk: Group (1/1/9:1, 1/1/9:2, 1/1/9:3, 1/1/9:4) created by 802.3ad link-aggregation module.
Jan  3 19:49:29:I:System: Interface ethernet 1/1/9:2, state up
Jan  3 19:49:12:I:System: Logical link on dynamic lag interface ethernet 1/1/9:3 is down.
Jan  3 19:49:29:I:System: Interface ethernet 1/1/9:3, state up
Jan  3 19:49:29:I:MSTP: MST 0 Port 1/1/9:1 - DISCARDING
Jan  3 19:49:29:I:MSTP: MST 1 Port 1/1/9:1 - LEARNING
Jan  3 19:49:12:I:MSTP: MST 1 Port 1/1/9:1 - DISCARDING
Jan  3 19:49:12:I:Trunk: Group (1/1/9:1, 1/1/9:2, 1/1/9:3, 1/1/9:4) removed by 802.3ad link-aggregation module.
Jan  3 19:49:12:I:System: Logical link on dynamic lag interface ethernet 1/1/9:1 is down.
Jan  3 19:49:29:I:System: Logical link on dynamic lag interface ethernet 1/1/9:2 is up.
Jan  3 19:49:29:I:MSTP: MST 1 Port 1/1/11:1  - Bridge TC Event
Jan  3 19:49:12:I:System: Interface ethernet 1/1/9:4, state down
Jan  3 19:49:12:I:System: Logical link on dynamic lag interface ethernet 1/1/9:2 is down.
Jan  3 20:17:25:I:Security: startup-config was changed by operator from console
Jan  3 19:49:29:I:MSTP: MST 1 Port 1/1/9:1 - DISCARDING
Jan  3 19:50:43:I:MSTP: MST 1 Port 1/1/11:1  - Bridge TC Event
Jan  3 19:49:29:I:System: Interface ethernet 1/1/9:4, state up
Jan  3 19:49:29:I:System: Interface ethernet 1/1/9:1, state up
Jan  3 19:49:12:I:System: Interface ethernet 1/1/9:2, state down
Jan  3 19:49:29:I:MRP: Interface ethernet 1/1/9:1 of ring 51  Vlan 51, changing to forwarding
Jan  3 19:49:12:I:System: Interface ethernet 1/1/9:1, state down