I would like to remove orphans (lines without a matching pair) from the current array:

J01171 01/11/2012 08:03:34 J01171 Bath Rd Ipswich Reconnected  
J01171 01/11/2012 08:04:34 J01171 Bath Rd Ipswich Isolated by OTU Fault   
J01171 01/11/2012 08:04:47 J01171 Bath Rd Ipswich Reconnected   
J01171 02/01/2013 15:46:22 J01171 Bath Rd Ipswich Isolated by OTU Fault   
J01171 02/01/2013 15:46:36 J01171 Bath Rd Ipswich Reconnected  
J01171 01/02/2013 18:12:43 J01171 Bath Rd Ipswich Isolated by OTU Fault  
J01171 01/02/2013 18:42:32 J01171 Bath Rd Ipswich Reconnected   
J01181 10/12/2012 13:13:13 J01181 BathRd LeighRd Isolated by Fault    
J01181 10/12/2012 15:30:01 J01181 BathRd LeighRd Reconnected     
J01181 09/02/2013 00:43:00 J01181 BathRd LeighRd Isolated by OTU Fault     
J01181 09/02/2013 00:47:57 J01181 BathRd LeighRd Reconnected   
J01181 09/02/2013 00:49:00 J01181 BathRd LeighRd Isolated by OTU Fault

After removing orphans, the output should be like this:

J01171 01/11/2012 08:04:34 J01171 Bath Rd Ipswich Isolated by OTU Fault   
J01171 01/11/2012 08:04:47 J01171 Bath Rd Ipswich Reconnected   
J01171 02/01/2013 15:46:22 J01171 Bath Rd Ipswich Isolated by OTU Fault   
J01171 02/01/2013 15:46:36 J01171 Bath Rd Ipswich Reconnected  
J01171 01/02/2013 18:12:43 J01171 Bath Rd Ipswich Isolated by OTU Fault  
J01171 01/02/2013 18:42:32 J01171 Bath Rd Ipswich Reconnected   
J01181 10/12/2012 13:13:13 J01181 BathRd LeighRd Isolated by Fault    
J01181 10/12/2012 15:30:01 J01181 BathRd LeighRd Reconnected     
J01181 09/02/2013 00:43:00 J01181 BathRd LeighRd Isolated by OTU Fault     
J01181 09/02/2013 00:47:57 J01181 BathRd LeighRd Reconnected

In a sorted array, all elements should come in 'Isolated - Reconnected' pairs for every asset code. However, the array contains orphans for some asset codes: at the top there is a non-matching 'Reconnected' (because its 'Isolated' pair was left in another log file), and at the bottom there is a non-matching 'Isolated' (because its 'Reconnected' pair will be in a future log file). My task is to get rid of all orphans. I have shown only 2 asset codes here, but in reality there are hundreds (possibly thousands) of asset codes and about half a million elements in the array, and therefore hundreds of orphans.

Orphans may also occur in the middle of an asset code's entries. For example, three 'Isolated' lines may follow each other in the middle of any given asset code. I need to remove the 'Isolated' lines that follow the first one, because the first one has not been paired yet. For instance,

X00000 dd/mm/yyyy hh:mm:ss X00000 qwerty Isolated    
X00000 dd/mm/yyyy hh:mm:ss X00000 qwerty Isolated [NEEDS TO BE REMOVED]     
X00000 dd/mm/yyyy hh:mm:ss X00000 qwerty Isolated  [NEEDS TO BE REMOVED]    
X00000 dd/mm/yyyy hh:mm:ss X00000 qwerty Reconnected    
J00000 dd/mm/yyyy hh:mm:ss X00000 qwerty Isolated    
J00000 dd/mm/yyyy hh:mm:ss X00000 qwerty Isolated [NEEDS TO BE REMOVED]        
J00000 dd/mm/yyyy hh:mm:ss X00000 qwerty Reconnected    
J00000 dd/mm/yyyy hh:mm:ss X00000 qwerty Reconnected  [NEEDS TO BE REMOVED]

Any ideas to deal with this problem? Thanks in advance.

  • In what way is it not working? Commented Jun 11, 2013 at 14:52
  • @Paulpro doesn't remove all orphans Commented Jun 11, 2013 at 15:07
  • Which orphans doesn't it remove? Just random orphans in the file? Ones at the beginning or the end? Are you sure that your file always has "Isolated" followed by "Reconnected"? Or could you have two (or more) "Isolated" in a row? Commented Jun 11, 2013 at 15:09
  • Also, what does removeA actually do? Because if it's removing elements from the array as you are iterating over it, it could really cause problems. Commented Jun 11, 2013 at 15:13
  • @MattBurland it does remove elements. Why do you think it might cause problems? removeA = function(arr) { var what, a = arguments, L = a.length, ax; while (L > 1 && arr.length) { what = a[--L]; while ((ax = $.inArray(what, arr)) !== -1) { arr.splice(ax, 1); } } return arr; } Commented Jun 11, 2013 at 15:17

2 Answers

I think this does what you want:

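var isolated = {};  // asset code -> index in buf of its pending "Isolated" line

var result = data.reduce(function(buf, line) {
    var m = line.match(/(^[A-Z]\d{5}).*?(Reconnected|Isolated)/);
    if (!m) return buf;  // skip lines that do not look like log entries
    var asset = m[1], event = m[2];

    if (event == "Reconnected" && asset in isolated) {
        // pair found: unmark the pending "Isolated" line and keep both
        buf[isolated[asset]] = buf[isolated[asset]].substr(1);
        delete isolated[asset];
        buf.push(line);
    } else if (event == "Isolated") {
        // mark as pending with a leading "?" until its pair shows up
        isolated[asset] = buf.push("?" + line) - 1;
    }
    return buf;
}, []).filter(function(line) {
    // drop lines that are still marked as pending
    return line.charAt(0) != "?";
});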

The idea is to mark each "isolated" line as pending and to unmark it once a matching "reconnected" line is found; lines that are still marked at the end are filtered out. Note that this code does NOT require "isolated" to be immediately followed by "reconnected" and can process logs with mixed outputs from different assets.
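The question asks for the first of several consecutive "Isolated" lines to be kept, whereas the snippet above pairs a "Reconnected" with the most recent "Isolated" for that asset. A minimal sketch of a variant that keeps the first one is shown here. The function name removeOrphans and the choice to silently drop lines that do not parse are assumptions, not part of the original answer.

```javascript
// Hypothetical helper (not from the answer): keeps the FIRST "Isolated" of a
// run and pairs it with the next "Reconnected" for the same asset code.
function removeOrphans(lines) {
    var pending = {};   // asset code -> index in `kept` of its open "Isolated"
    var kept = [];      // candidate lines, set to null when discarded later

    lines.forEach(function (line) {
        var m = line.match(/^([A-Z]\d{5}).*?\b(Reconnected|Isolated)\b/);
        if (!m) return;                     // assumption: drop unparseable lines
        var asset = m[1], event = m[2];

        if (event === "Isolated") {
            if (!(asset in pending)) {      // ignore repeats while one is open
                pending[asset] = kept.push(line) - 1;
            }
        } else if (asset in pending) {      // "Reconnected" closing an open pair
            delete pending[asset];
            kept.push(line);
        }
        // a "Reconnected" with no open "Isolated" is an orphan and is skipped
    });

    // Any "Isolated" still open at the end never got its pair: discard it.
    Object.keys(pending).forEach(function (asset) {
        kept[pending[asset]] = null;
    });
    return kept.filter(function (line) { return line !== null; });
}
```

This makes a single pass over the array plus one filter, so it should remain practical for an array of half a million lines.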

Complete fiddle: http://jsfiddle.net/evY8B/


18 Comments

thanks, @thg435. Is 'result' variable an array? I try to deal with it as an array but browser doesn't execute my javascript.
@KananFarzali: yes, result is an array that contains only valid lines. It should be equal to the original array excluding all orphaned lines.
how shall I replace my loop with your function then? I try but it doesn't work. jsfiddle.net/kanan88/tbsGz
@KananFarzali: I don't understand what you're trying to achieve.
This is the content of a file. I get the content, split it into an array of the lines, and after deleting all orphans I want them joined back into the content. BTW, I saw you are using /^[A-Z]\d{5}/ to match asset code, while I am using /[A-Z]{1}\d{5}/. Which one is more strict? Not a guru in regular expressions. Thank you.
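As a rough illustration of the two points raised in the last comment: the only difference between the two patterns is the ^ anchor ({1} is redundant), so the anchored one is stricter because it accepts the asset code only at the very start of the line. The sample line with the "note: " prefix and the helper name filterContent are made up for this sketch.

```javascript
// Unanchored: matches the first "letter + 5 digits" found anywhere in the line.
var loose = /[A-Z]{1}\d{5}/;   // {1} is redundant; same as /[A-Z]\d{5}/
// Anchored: the asset code must be the very first thing on the line.
var strict = /^[A-Z]\d{5}/;

// Made-up line that does not start with an asset code.
var line = "note: J01171 01/11/2012 08:03:34 J01171 Bath Rd Ipswich Reconnected";
var looseMatch = loose.exec(line);    // finds "J01171" in the middle of the line
var strictMatch = strict.exec(line);  // null: the line does not start with a code

// Hypothetical round trip: file content -> lines -> kept lines -> content.
// keepLine is any predicate that returns true for lines to keep.
function filterContent(content, keepLine) {
    return content.split(/\r?\n/).filter(keepLine).join("\n");
}
```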

This fiddle seems to work as expected:

var orph_arr = [
    "J01171 01/11/2012 08:03:34 J01171 Bath Rd Ipswich Reconnected",
    "J01179 01/11/2012 08:04:34 J01179 Bath Rd Ipswich Isolated by OTU Fault",  // Note: this guy is an orphan according to the OP's definition
    "J01171 01/11/2012 08:04:34 J01171 Bath Rd Ipswich Isolated by OTU Fault",
    "J01171 01/11/2012 08:04:47 J01171 Bath Rd Ipswich Reconnected",
    "J01171 02/01/2013 15:46:22 J01171 Bath Rd Ipswich Isolated by OTU Fault",
    "J01171 02/01/2013 15:46:36 J01171 Bath Rd Ipswich Reconnected",
    "J01171 01/02/2013 18:12:43 J01171 Bath Rd Ipswich Isolated by OTU Fault",
    "J01171 01/02/2013 18:42:32 J01171 Bath Rd Ipswich Reconnected",
    "J01181 10/12/2012 13:13:13 J01181 BathRd LeighRd Isolated by Fault",
    "J01181 10/12/2012 15:30:01 J01181 BathRd LeighRd Reconnected",
    "J01181 09/02/2013 00:43:00 J01181 BathRd LeighRd Isolated by OTU Fault",
    "J01181 09/02/2013 00:47:57 J01181 BathRd LeighRd Reconnected",
    "J01181 09/02/2013 00:49:00 J01181 BathRd LeighRd Isolated by OTU Fault"];



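// Compare each line with the one after it. This only detects orphans that
// sit on a boundary between two different asset codes.
for (var i = 0; i < orph_arr.length - 1; i++) {
    var asset1 = orph_arr[i].match(/[A-Z]{1}\d{5}/);
    var asset2 = orph_arr[i + 1].match(/[A-Z]{1}\d{5}/);
    var isolated1 = orph_arr[i].match(/\b(Isolated)\b/gi);
    var isolated2 = orph_arr[i + 1].match(/\b(Isolated)\b/gi);
    var reconnected1 = orph_arr[i].match(/\b(Reconnected)\b/gi);
    var reconnected2 = orph_arr[i + 1].match(/\b(Reconnected)\b/gi);

    // "Reconnected" followed by another asset's "Reconnected": the second is an orphan
    if ((asset1[0] !== asset2[0]) && (reconnected1) && (reconnected2)) {
        orph_arr[i + 1] = "REMOVED";
    }
    // "Isolated" followed by another asset's "Isolated": the first is an orphan
    if ((asset1[0] !== asset2[0]) && (isolated1) && (isolated2)) {
        orph_arr[i] = "REMOVED";
    }
}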

console.dir(orph_arr);

I simplified it by just replacing the orphans rather than removing them, but obviously they could be removed instead (a simple orph_arr.splice(i,1); should do it), although I think it's generally better if you don't remove them from a list while you are iterating through it. It tends to mess up your indexes.

Here's a fiddle that actually removes rather than replaces. Note how it's important to set your array index back or you will miss two consecutive orphans.
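The linked fiddle is not reproduced here, so the following is only a sketch of the index adjustment described above, not the fiddle's actual code. The helper names removeInPlace and dropMarked, and the isOrphan callback, are hypothetical.

```javascript
// Hypothetical helper: removes every element for which isOrphan(arr, i)
// returns true, editing the array in place.
function removeInPlace(arr, isOrphan) {
    for (var i = 0; i < arr.length; i++) {
        if (isOrphan(arr, i)) {
            arr.splice(i, 1); // later elements shift left by one...
            i--;              // ...so step back to re-check the same index
        }
    }
    return arr;
}

// Simpler alternative when orphans were only marked as "REMOVED":
// build a new array instead of splicing during iteration.
function dropMarked(arr) {
    return arr.filter(function (line) { return line !== "REMOVED"; });
}
```

Without the i-- step, the element that slides into position i after a splice is never examined, which is how the second of two consecutive orphans gets missed.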

4 Comments

I actually found out that orphans for some asset codes may come in the middle as well. It means I need a function which will remove all orphans, wherever they occur.
@KananFarzali: Well that's why I asked! You need to define the problem before you can solve it.
well, the problem is the same: to remove all orphans from everywhere, not only from the top and bottom of asset codes.
@KananFarzali: Well, no it isn't. Your definition of an orphan seems to shift every time you post. You need an exact definition of what makes a line an orphan. Clearly it isn't an orphan just because an "isolated" isn't immediately followed by a "reconnect", but thg435's solution works perfectly and yet you say it doesn't.
