Trying to group values?

Rhinn

I have some data like this:
[code]
1 2
3 4
5 9
2 6
3 7
[/code]
and am looking for an output like this (a group ID followed by the members of that group):
[code]
1: 1 2 6
2: 3 4 7
3: 5 9
[/code]
The first row is there because 1 is "connected" to 2 and 2 is connected to 6. The second row is there because 3 is connected to 4 and 3 is connected to 7.

This looked to me like a graph traversal, but the final order does not matter, so I was wondering if someone can suggest a simpler solution that I can use on a large dataset (billions of entries).

From the comments:
  • The problem is to find the set of disjoint sub-graphs (the connected components) given a set of edges.
  • The edges are not directed; the line '1 2' means that 1 is connected to 2 and 2 is connected to 1.
  • The '1:' in the sample output could be 'A:' without changing the meaning of the answer.
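Since only group membership matters and not traversal order, one common way to do this is a disjoint-set (union-find) structure: every edge merges the groups of its two endpoints, and because the edges are undirected, "a b" and "b a" have the same effect. The sketch below is not taken from any answer in this thread; `find`, `union`, and `group_values` are hypothetical names, and it assumes the values are integers as in the sample. It avoids recursion so long chains of connections don't hit Python's recursion limit.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find(parent: Dict[int, int], x: int) -> int:
    """Return the representative (root) of the group containing x."""
    root = x
    while parent[root] != root:
        root = parent[root]
    # Path compression: point every node on the walked path straight at the root.
    while parent[x] != root:
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root


def union(parent: Dict[int, int], size: Dict[int, int], a: int, b: int) -> None:
    """Merge the groups containing a and b (edges are undirected)."""
    ra, rb = find(parent, a), find(parent, b)
    if ra == rb:
        return
    # Union by size: hang the smaller tree under the larger one to keep trees shallow.
    if size[ra] < size[rb]:
        ra, rb = rb, ra
    parent[rb] = ra
    size[ra] += size[rb]


def group_values(edges: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Hypothetical helper: return the connected groups of values as sorted lists."""
    parent: Dict[int, int] = {}
    size: Dict[int, int] = {}
    for a, b in edges:
        for v in (a, b):
            if v not in parent:  # first sighting: the value starts as its own group
                parent[v] = v
                size[v] = 1
        union(parent, size, a, b)
    groups: Dict[int, List[int]] = defaultdict(list)
    for v in list(parent):
        groups[find(parent, v)].append(v)
    # Sorting only makes the output deterministic; the order itself does not matter.
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    edges = [(1, 2), (3, 4), (5, 9), (2, 6), (3, 7)]
    for gid, members in enumerate(group_values(edges), start=1):
        print(f"{gid}: {' '.join(map(str, members))}")
```

Run on the sample data, this prints `1: 1 2 6`, `2: 3 4 7` and `3: 5 9`, matching the requested output. Each edge costs close to constant amortized time, so the whole pass is roughly linear in the number of edges.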
EDIT 1: The problem looks solved now. Thanks to everyone for their help. I need some more help picking the best solution that can be used on billions of such entries.

EDIT 2: Test input file:
[code]
1 27
1 134
1 137
1 161
1 171
1 275
1 309
1 413
1 464
1 627
1 744
2 135
2 398
2 437
2 548
2 594
2 717
2 738
2 783
2 798
2 912
5 74
5 223
7 53
7 65
7 122
7 237
7 314
7 701
7 730
7 755
7 821
7 875
7 884
7 898
7 900
7 930
8 115
9 207
9 305
9 342
9 364
9 493
9 600
9 676
9 830
9 941
10 164
10 283
10 380
10 423
10 468
10 577
11 72
11 132
11 276
11 306
11 401
11 515
11 599
12 95
12 126
12 294
13 64
13 172
13 528
14 396
15 35
15 66
15 210
15 226
15 360
15 588
17 263
17 415
17 474
17 648
17 986
21 543
21 771
22 47
23 70
23 203
23 427
23 590
24 286
24 565
25 175
26 678
27 137
27 161
27 171
27 275
27 309
27 413
27 464
27 627
27 684
27 744
29 787
[/code]
Benchmarks: I tried out everything, and the version posted by TokenMacGuy is the fastest on the sample dataset I tried. That dataset has about 1 million entries, and it took about 6 seconds on a dual quad-core 2.4 GHz machine. I haven't had a chance to run it on the entire dataset yet, but I will post the benchmark as soon as it is available.
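For input files like the test file above (one whitespace-separated pair per line) and much larger ones, the edges can be streamed from standard input instead of first being loaded into a list. This is a sketch under stated assumptions, not the TokenMacGuy version benchmarked above: `DisjointSet`, `read_edges`, and `format_groups` are hypothetical names, the input is assumed to hold integer pairs, and blank lines are skipped.

```python
import sys
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


class DisjointSet:
    """Union-find over integer ids, with path halving and union by size."""

    def __init__(self) -> None:
        self.parent: Dict[int, int] = {}
        self.size: Dict[int, int] = {}

    def find(self, x: int) -> int:
        # Unseen ids are registered lazily as singleton groups.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            return x
        # Path halving: point each visited node at its grandparent while walking up.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def read_edges(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (a, b) pairs from lines such as '1 27', skipping blank lines."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield int(fields[0]), int(fields[1])


def format_groups(ds: DisjointSet) -> Iterator[str]:
    """Yield output lines in the 'id: member member ...' format from the question."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for v in list(ds.parent):
        groups[ds.find(v)].append(v)
    for gid, members in enumerate(sorted(sorted(m) for m in groups.values()), start=1):
        yield f"{gid}: {' '.join(map(str, members))}"


def main() -> None:
    ds = DisjointSet()
    # sys.stdin is consumed line by line, so the edge list itself is never held in memory.
    for a, b in read_edges(sys.stdin):
        ds.union(a, b)
    for line in format_groups(ds):
        print(line)


if __name__ == "__main__":
    main()
```

Usage would be something like `python group_values.py < edges.txt` (both file names hypothetical). Only the parent and size maps stay in memory, but they still grow with the number of distinct values; at billions of entries a pure-Python dict will probably not fit in RAM on a single machine, so that scale likely needs a more compact representation (for example, arrays indexed by integer ID) or an external or distributed approach.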
 