The following assumes the input text is in a file called 'test.log' and that you want a solution you can pipe the input into and out of (i.e. cat test.log is used instead of passing the file name to sed as its input).
Using a placeholder value:
When you want a regular expression to act on everything that closely resembles a pattern you want to keep, it is often easier to first change the text you want to keep into a placeholder value that is easily distinguished from the patterns you do want to act upon:
cat test.log | sed -e "s/Q/Qz/g" -e "s/<e>123<\/e>/Qa/g" -e "s/<e>[^<]*<\/e>//g" -e "s/Qa/<e>123<\/e>/g" -e "s/Qz/Q/g" -e "s/<[^e]>[^<]*<\/[^e]>;\?//g" -e "s///g" -e "s///g" -e "s///g" -e "s///g" -e "s///g"
The trick is realizing that the data does not have to keep its original form during the intermediate steps. Only the output matters. Thus, the transformations of the data are:
Input (Added a line where there is no <e>123</e> at all in the tags. It is a case that we probably need to handle):
2014-09-01 12: 01: 01.899;some app logs
2014-09-01 12: 01: 02,045;some app logs2;<a><b><c><d><e>111</e></d><d><e>123</e></d><d><e>222</e></d><d><e>333</e></d></c></b></a>;some app logs3
2014-09-01 12: 01: 03,625;some app logs4
2014-09-01 12: 01: 04,045;some app logs5;<a><b><c><d><e>111</e></d><d><e>222</e></d><d><e>333</e></d></c></b></a>;some app logs6
Intermediary form 1 (just exists line by line within sed): Same as input because no "Q" existed in test data.
Intermediary form 2 (within sed): change text we want to keep to placeholder:
2014-09-01 12: 01: 01.899;some app logs
2014-09-01 12: 01: 02,045;some app logs2;<a><b><c><d><e>111</e></d><d>Qa</d><d><e>222</e></d><d><e>333</e></d></c></b></a>;some app logs3
2014-09-01 12: 01: 03,625;some app logs4
2014-09-01 12: 01: 04,045;some app logs5;<a><b><c><d><e>111</e></d><d><e>222</e></d><d><e>333</e></d></c></b></a>;some app logs6
Intermediary form 3 (remove <e></e> tags which don't contain 123):
2014-09-01 12: 01: 01.899;some app logs
2014-09-01 12: 01: 02,045;some app logs2;<a><b><c><d></d><d>Qa</d><d></d><d></d></c></b></a>;some app logs3
2014-09-01 12: 01: 03,625;some app logs4
2014-09-01 12: 01: 04,045;some app logs5;<a><b><c><d></d><d></d><d></d></c></b></a>;some app logs6
Intermediary form 4 (substitute <e>123</e> back from placeholder):
2014-09-01 12: 01: 01.899;some app logs
2014-09-01 12: 01: 02,045;some app logs2;<a><b><c><d></d><d><e>123</e></d><d></d><d></d></c></b></a>;some app logs3
2014-09-01 12: 01: 03,625;some app logs4
2014-09-01 12: 01: 04,045;some app logs5;<a><b><c><d></d><d></d><d></d></c></b></a>;some app logs6
Intermediary form 5 (revert the clearing of the placeholder): same as form 4, as there is no "Q".
Output (after substitutions to remove empty tags):
2014-09-01 12: 01: 01.899;some app logs
2014-09-01 12: 01: 02,045;some app logs2;<a><b><c><d><e>123</e></d></c></b></a>;some app logs3
2014-09-01 12: 01: 03,625;some app logs4
2014-09-01 12: 01: 04,045;some app logs5;some app logs6
It was assumed that we should not leave a "some app logs5;;some app logs6" but "some app logs5;some app logs6". If that is not the case, the regular expression can be adjusted.
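The repeated -e "s///g" expressions in the command above are not leftovers: when the pattern of an s command is empty, sed reuses the last regular expression it applied, so each repetition strips one more level of tags that have become empty. A minimal sketch of that behaviour, assuming GNU sed (for \?); the function name strip_empty_tags and the sample line are only illustrative:

```shell
# Hypothetical helper: remove single-letter tag pairs (other than <e>) that
# contain no nested tag, one nesting level per expression. The empty pattern
# in 's///g' makes sed reuse the previous regular expression.
strip_empty_tags() {
    sed -e 's/<[^e]>[^<]*<\/[^e]>;\?//g' -e 's///g' -e 's///g' -e 's///g'
}

# Four nesting levels (<d>, <c>, <b>, <a>) need four passes.
printf '%s\n' 'logs5;<a><b><c><d></d><d></d></c></b></a>;logs6' | strip_empty_tags
# prints: logs5;logs6
```

If the tags can nest more deeply than that, more -e 's///g' expressions are needed. Dropping the ;\? from the first expression would leave the "logs5;;logs6" form instead.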
Issues when using a placeholder
If your placeholder is not unique, changing back from the placeholder corrupts the data. To have a guaranteed-unique placeholder in unknown input data you have to spend one substitution to clear out any existing uses of the placeholder and another substitution to revert that clearing. To clear it, you can use a substitution such as:
sed -e "s/Q/Qz/g"
After this, the only two-letter combination starting with Q that can occur in the text is "Qz". That leaves a large number of unique two-letter placeholders (e.g. "Qa", "Qb", "Qc", "QA", etc.). When you are done using them, you can change back to your text by reversing the substitution:
sed -e "s/Qz/Q/g"
With multiple unique placeholders available, you can use them to represent multiple different strings. While the placeholders are in use, keep in mind in every operation that matches text that the initial clearing has been performed: any "Q" in the original text now appears as "Qz".
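To make the corruption concrete, here is a small sketch that runs the same substitutions with and without the clearing step on a line that happens to contain "Qa". The function names and the sample line are made up for illustration:

```shell
# Hypothetical helpers comparing a cleared placeholder with an uncleared one.
# Both keep <e>123</e> and delete every other <e>...</e> element.
keep_123_cleared() {
    sed -e 's/Q/Qz/g' \
        -e 's/<e>123<\/e>/Qa/g' \
        -e 's/<e>[^<]*<\/e>//g' \
        -e 's/Qa/<e>123<\/e>/g' \
        -e 's/Qz/Q/g'
}
keep_123_uncleared() {
    sed -e 's/<e>123<\/e>/Qa/g' \
        -e 's/<e>[^<]*<\/e>//g' \
        -e 's/Qa/<e>123<\/e>/g'
}

line='IRQa fired;<d><e>123</e></d><d><e>5</e></d>'
printf '%s\n' "$line" | keep_123_cleared
# prints: IRQa fired;<d><e>123</e></d><d></d>
printf '%s\n' "$line" | keep_123_uncleared
# prints: IR<e>123</e> fired;<d><e>123</e></d><d></d>
```

In the second case the "Qa" that was already in the log text is turned into <e>123</e> when the placeholder is substituted back.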
In some instances, if you know the characteristics of your input data, you can choose a placeholder which will never occur in that data. This can save you the CPU cost of the two substitution operations and the potential additional memory that clearing out the two-character placeholder can cost. However, with log files one of the things that you are looking for is corruption, so using a short placeholder that you are only assuming is not in the data is a bad idea.
If you do not know exactly which characters your input contains, but you do know some of its characteristics, you can choose to save those two substitutions by using a placeholder which is very unlikely to exist in your input but is not guaranteed to be unique. This does introduce some risk. In that case, the more complex the string you use for your placeholder, and the less it resembles possible input, the lower the risk that it already exists in your input.
For this example, the text "lOnG3Rep5LacEN2eV7E9rE4xIST" is very unlikely to exist in the input log file even if it was corrupted.
The following assumes the input text is in a file called 'test.log' for convenience. Also, it assumes that "lOnG3Rep5LacEN2eV7E9rE4xIST" will not exist in the input. What is actually used for the intermediary string can, of course, be anything you want which will be unique:
cat test.log | sed -e "s/<e>123<\/e>/lOnG3Rep5LacEN2eV7E9rE4xIST/g" -e "s/<e>[^<]*<\/e>//g" -e "s/lOnG3Rep5LacEN2eV7E9rE4xIST/<e>123<\/e>/g" -e "s/<[^e]>[^<]*<\/[^e]>;\?//g" -e "s///g" -e "s///g" -e "s///g" -e "s///g" -e "s///g"
Choosing to use a placeholder that you have not guaranteed does not exist in the input data is a risk. You should not do so unless you understand the risk and have chosen to accept it. It is much more reasonable to accept such risk when the output is going to be immediately reviewed by a human who would catch any such problems.
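One way to reduce that risk, at the cost of reading the input twice, is to check for the placeholder before running the substitutions. This is only a sketch: the function name keep_123_checked is made up, and it takes a file name rather than a pipe because the input has to be read twice:

```shell
placeholder='lOnG3Rep5LacEN2eV7E9rE4xIST'

# Hypothetical wrapper: refuse to run if the placeholder already occurs in the file.
keep_123_checked() {
    # grep -F treats the placeholder as a fixed string; -q only sets the exit status.
    if grep -qF -e "$placeholder" "$1"; then
        echo "placeholder already present in $1" >&2
        return 1
    fi
    sed -e "s/<e>123<\/e>/$placeholder/g" \
        -e 's/<e>[^<]*<\/e>//g' \
        -e "s/$placeholder/<e>123<\/e>/g" "$1"
}

# Usage on a throwaway file (the sample line is illustrative only).
tmp=$(mktemp)
printf '%s\n' 'a;<d><e>123</e></d><d><e>9</e></d>' > "$tmp"
keep_123_checked "$tmp"
# prints: a;<d><e>123</e></d><d></d>
rm -f "$tmp"
```

The empty-tag cleanup expressions from the command above can be appended to the sed call unchanged.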
Thanks go to Ed Morton who reminded me that I had gotten into the habit of accepting that risk without enough consideration.
Using a regular expression to define what something is not:
Character by character:
Because the pattern "123" is quite simple and exact, it is relatively easy to define a regular expression that matches everything except that string. Note that this becomes much more complex with a more complex pattern that you are attempting to exclude from matching:
cat test.log | sed -e "s/<e>\(\|[^1<][^<]*\|1\|1[^2<][^<]*\|12\|12[^3<][^<]*\|123[^<][^<]*\)<\/e>//g" -e "s/<[^e]>[^<]*<\/[^e]>;\?//g" -e "s///g" -e "s///g" -e "s///g" -e "s///g" -e "s///g"
This builds up a regular expression from sub-patterns, each one character longer than the last, that together match everything which is not the pattern you want to keep.
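The sketch below exercises such an alternation on a few values. It includes branches for "1", "12" and for longer values that begin with "123", so that only an exact "123" survives. It assumes GNU sed (for \| and the empty alternative); the function name drop_not_123 and the sample values are illustrative:

```shell
# Hypothetical helper: delete every <e>...</e> whose content is not exactly 123.
# Branches: empty | not starting with 1 | exactly 1 | 1 then not 2 |
#           exactly 12 | 12 then not 3 | 123 followed by at least one more character.
drop_not_123() {
    sed -e 's/<e>\(\|[^1<][^<]*\|1\|1[^2<][^<]*\|12\|12[^3<][^<]*\|123[^<][^<]*\)<\/e>//g'
}

printf '%s\n' '<e></e>;<e>1</e>;<e>12</e>;<e>123</e>;<e>1234</e>;<e>13</e>;<e>223</e>' | drop_not_123
# prints: ;;;<e>123</e>;;;
```

Each extra character in the pattern to keep adds two more branches, which is why this approach gets unwieldy quickly.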
Negative look ahead/look behind:
Many implementations of regular expression syntax provide a negative look-ahead or look-behind operator. These can be used to generate more complex matches of "not this string".
cat test.log | awk '{print "<" $0 }' RS="<" | awk '{print $0 ">"}' RS=">" | sed '/^\s*$/d' | sed '/<[^\/][^>]*>/ {x; s/.*//; x}; {H; g;}; /<[^>/]*>123<\/[^>]*>/ ! d; /<\/[^>]*$/ {p; x; s/.*//; x;}'
bash tools: Perl? Python? xpath? sed and awk only? gawk?