
I have a big string. The original text looks something like this:

'Lorem ipsum dolor sit amet, consectetur adipiscing  <a href="hxxp://www.youtube.com/watch?v=VIDEO_1">hxxp://www.youtube.com/watch?v=VIDEO_1</a>
Sed lacinia purus turpis. Curabitur in nisi urna, vitae aliquet
Vestibulum ante ipsum primis in faucibus orci luctus hxxp://www.youtube.com/watch?v=VIDEO_2</a>'

Notice that one video link (VIDEO_2) has a closing </a> without an opening <a>. Such problematic links can appear anywhere in the original text, and there can be any number of them.

I want to remove those unnecessary </a> tags. How can I detect and delete them?

I am using Delphi XE4. Any help would be appreciated.

  • What is a video 'that has a closing </a> without opening <a>'? Commented May 26, 2013 at 12:34
  • I edited the question; it's clearer now. Commented May 26, 2013 at 12:36
  • Do you want to do this in the source code editor or at runtime? Commented May 26, 2013 at 12:42
  • Of course at runtime, that is why I posted "Delphi". Commented May 26, 2013 at 12:44
  • Either use an HTML parser that will forgive such nonsense, or fix the HTML. Commented May 26, 2013 at 16:36

2 Answers


I believe the following code works efficiently:

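// Requires SysUtils (UpperCase, CharInSet, Exception).
function RemoveLonelyClosingATags(const S: string): string;
var
  Level: Integer;         // number of currently open <a> tags
  i: Integer;             // read position in S
  ActualLength: Integer;  // write position in Result
begin
  Level := 0;
  // Allocate once and trim at the end instead of calling Delete repeatedly
  SetLength(Result, Length(S));
  ActualLength := 0;
  i := 1;
  while i <= Length(S) do
  begin
    if (S[i] = '<') and (UpperCase(Copy(S, i, 4)) = '</A>') then
    begin
      if Level = 0 then
      begin
        // Closing tag without a matching opening tag: skip it
        Inc(i, 4);
        Continue;
      end
      else
        Dec(Level);
    end;

    Inc(ActualLength);
    Result[ActualLength] := S[i];
    // "<a" counts as an opening tag only when followed by whitespace or '>',
    // so tags such as <abbr> or <area> are not mistaken for <a>
    if (S[i] = '<') and (i + 2 <= Length(S)) and (UpCase(S[i + 1]) = 'A')
      and CharInSet(S[i + 2], [' ', '>', #9, #10, #13]) then
    begin
      Inc(Level);
      if Level > 1 then
        raise Exception.Create('Nested A tags detected.');
    end;
    Inc(i);
  end;
  SetLength(Result, ActualLength);
end;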
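The question also asks how to detect the stray tags. As a minimal sketch (not part of this answer; FindLonelyClosingATags and the demo program are hypothetical names), the same level-counting idea can report the 1-based position of each lonely </a> without modifying the string. It assumes the input is simple HTML where '<' does not appear inside comments, scripts or attribute values, and it treats "<a" as an opening tag only when it is followed by whitespace or '>', so tags such as <abbr> are not counted.

```pascal
program LonelyTagDetect;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  TIntArray = array of Integer;

// Hypothetical helper: returns the 1-based positions of every "</a>"
// (case-insensitive) that has no open "<a ...>" before it.
function FindLonelyClosingATags(const S: string): TIntArray;
var
  Level, i, Count: Integer;
begin
  Level := 0;   // number of currently open <a> tags
  Count := 0;   // number of stray closing tags found so far
  SetLength(Result, 0);
  i := 1;
  while i <= Length(S) do
  begin
    if (S[i] = '<') and SameText(Copy(S, i, 4), '</a>') then
    begin
      if Level = 0 then
      begin
        // Stray closing tag: record where it starts (grow by one entry)
        Inc(Count);
        SetLength(Result, Count);
        Result[Count - 1] := i;
      end
      else
        Dec(Level);   // properly paired closing tag
      Inc(i, 4);
      Continue;
    end;
    // "<a" followed by whitespace or '>' opens a link
    if (S[i] = '<') and (i + 2 <= Length(S)) and (UpCase(S[i + 1]) = 'A')
      and CharInSet(S[i + 2], [' ', '>', #9, #10, #13]) then
      Inc(Level);
    Inc(i);
  end;
end;

var
  Positions: TIntArray;
  P: Integer;
begin
  Positions := FindLonelyClosingATags('<a href="x">x</a> y</a> <A>z</A></a>');
  for P in Positions do
    WriteLn(P);
end.
```

On the demo string it prints 20 and 33: the stray </a> after "y" and the final one, while the </A> that closes <A> is treated as a normal pair.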

A general function:

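// Requires StrUtils (PosEx). Matching is case-sensitive.
// Removes every endTag that is not preceded by a beginTag.
Function TagStripper(inString: String; beginTag : String; endTag: String): String;
Var
  index : Integer;    // position where the next search starts
  startTag : Integer; // position of the next beginTag (or closeTag if none)
  closeTag : Integer; // position of the next endTag (0 if none)
Begin
  index := 1;
  While (index > 0) Do
    Begin
      closeTag := PosEx(endTag, inString, index);
      startTag := PosEx(beginTag, inString, index);
      // No opening tag left: any remaining closing tag is lonely
      If startTag = 0 Then
        startTag := closeTag;
      index := closeTag;
      If (closeTag <= startTag) And (index > 0) Then
        // Lonely closing tag: delete it and search again from the same position
        Delete(inString, closeTag, Length(endTag))
      Else
        If closeTag > 0 Then
          // Properly paired: continue searching after this closing tag
          index := index + Length(endTag);
    End;
  Result := inString;
End;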

Essentially, it looks for the next opening tag and the next closing tag. If the closing tag comes before the opening one (or no opening tag is left), it deletes the closing tag. The search start point (index) is then rebased to wherever it found the closing tag. In your example, beginTag and endTag would be

'<a' and '</a>'.

The result when running it is:

Lorem ipsum dolor sit amet, consectetur adipiscing  <a href="hxxp://www.youtube.com/watch?v=VIDEO_1">hxxp://www.youtube.com/watch?v=VIDEO_1</a>
Sed lacinia purus turpis. Curabitur in nisi urna, vitae aliquet
Vestibulum ante ipsum primis in faucibus orci luctus hxxp://www.youtube.com/watch?v=VIDEO_2

2 Comments

Good, but of course the Delete is non-optimal in terms of performance. (That's why I use the ActualLength pattern.)
Oh, there's lots of debate between myself and others on Embarcadero's forum about this subject :-). Let's just say I believe the focus should be on readability first, with optimization taking place only if performance is observed to be unacceptable. That said, I had no problem understanding how your method worked.
