
I am having trouble removing part of a string inside a text file with PHP.

I have a big file and I need to remove part of one line in it.

The problem is that the line is not always the same: it keeps the same format, but the numbers change. Here is an example:

< /td >This is the line< /td >and this< /td >is < /td >the < /td >part< /td >want to remove< /td >Name< /td > after it keeps going < /td > a loong way < /td >

I would like to remove everything from the < /td > after the word "this" up to the < /td > after "Name".

I was wondering whether there is a way to make PHP delete backwards from Name to the Nth occurrence of < /td >, something like:

Delete from Name until the 4th appearance of < /td >

I hope someone can help me.

Both answers below work for the sample text, but they don't work for my real code. So here is part of the real code:

... < /td >< /tr >< tr >< td onmouseover="dm.v(this,1);" onmouseout="dm.u(this);" id="mnFE0BBC45_i8" onclick="dm.ItClk(this,\'\');cmn.href(\'indexall.php\',\'\');" class="mn31BBMainMenuItemTD" >< table border="0" cellspacing="0" cellpadding="0" >< tr >< td class="mn31BBIconTD" > < font class="MG_Icons" > &#xe 746;< /font >< /td >< td class="mn31BBTitleTD" id="mnFE0BBC45_i8-tl" >Other_Name< /td >< td class="mn31BBArrowTD" > < /td >< /tr >< /table >< /td >< /tr >< tr >< td onmouseover="dm.v(this,1);" onmouseout="dm.u(this);" id="mnFE0BBC45_i3" onclick="dm.ItClk(this,\'\');cmn.href(\'index.php\',\'\');" class="mn31BBMainMenuItemTD" >< table border="0" cellspacing="0" cellpadding="0" >< tr >< td class="mn31BBIconTD" >< font class="MG_Icons" >&#xe 746;< /font >< /td >< td class="mn31BBTitleTD" id="mnFE0BBC45_i3-tl" >Name< /td >< td class="mn31BBArrowTD" > < /td >< /tr >< /table >< /td >< /tr >< tr >< td onmouseover="dm.v(this,1);" onmouseout="dm.u(this);" id="mnFE0BBC45_i5" onclick="dm.ItClk(this,\'\');cmn.href(\'indexd2.php\',\'\');" class...

This is only a small part of the code (it is a JavaScript menu). I added spaces inside all the tags (< tr >) so that they are visible here.

The text I want to delete is:

< /td >< td class="mn31BBArrowTD" > < /td >< /tr >< /table >< /td >< /tr >< tr >< td onmouseover="dm.v(this,1);" onmouseout="dm.u(this);" id="mnFE0BBC45_i3" onclick="dm.ItClk(this,\'\');cmn.href(\'index.php\',\'\');" class="mn31BBMainMenuItemTD" >< table border="0" cellspacing="0" cellpadding="0" >< tr >< td class="mn31BBIconTD" >< font class="MG_Icons" >&#xe 746;< /font >< /td >< td class="mn31BBTitleTD" id="mnFE0BBC45_i3-tl" >Name

Neither mnFE0BBC45_i3-tl nor mnFE0BBC45_i3 is always the same; the number changes depending on the Name.

That is why I want to delete everything from Name back to the 4th occurrence of < /td >.

  • The code above is invalid HTML (<td> needs an opening and closing tag). Is this intentional? Commented Jan 20, 2015 at 10:22
  • Will the word 'Name' be there in every text file? And how long is the text? Commented Jan 20, 2015 at 10:23
  • It is intentional; it is only an example. In the real file each < /td > has its corresponding < td >. Commented Jan 20, 2015 at 10:24
  • The length of the text varies depending on the names of the variables in the middle. That is why I want to delete the text based on the occurrences of < /td >. Commented Jan 20, 2015 at 10:25
  • So you know what the Name is? Commented Jan 20, 2015 at 10:26

2 Answers


I misread the requirement at first; here is a corrected version that looks for the appropriate matches before "Name".

Between the other occurrences of "< /td >" I only allow alphanumeric characters and spaces. It may be necessary to add more characters to this class ([[:alnum:]\ ]+), such as a dash or an underscore.

<?php
$txt = '< /td >This is the line< /td >and this< /td >is the part< /td >want to remove< /td >Name< /td > after it keeps going < /td > a loong way < /td >';

$replacement = preg_replace('/([[:alnum:]\ ]+<\s*\/td\s*>){2}Name<\s*\/td\s*>/', '', $txt);
echo "$replacement \n";
?>

Output:

< /td >This is the line< /td >and this< /td > after it keeps going < /td > a loong way < /td >
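
The character class above only allows letters, digits and spaces between the closing tags, so it cannot cross real markup with attributes. A possible variation, sketched here under assumptions, lets any text sit between the tags as long as it does not contain another closing td. The helper name removeBackToNthTd and the sample string are made up for illustration, and the name is assumed to appear directly after a ">" and directly before a closing td.

```php
<?php
// Sketch: delete from the $n-th closing td before ">$name" through the name itself.
// removeBackToNthTd() is a hypothetical helper, not part of the original answer.
function removeBackToNthTd(string $html, string $name, int $n): string
{
    $td = '<\s*/td\s*>';                       // closing td, optional spaces inside
    $segment = $td . '(?:(?!' . $td . ').)*?'; // one closing td plus text up to the next one
    $pattern = '~(?:' . $segment . '){' . $n . '}'
        . '(?<=>' . preg_quote($name, '~') . ')' // last segment must end in ">Name"
        . '(?=' . $td . ')~s';                   // and be followed by a closing td
    // preg_replace() returns null on a PCRE error; fall back to the input then.
    return preg_replace($pattern, '', $html) ?? $html;
}

$sample = '<td>a</td><td>b</td><td>c</td><td>d</td><td>Name</td><td>e</td>';
echo removeBackToNthTd($sample, 'Name', 4), "\n"; // prints <td>a</td><td>e</td>
```

Because the lookbehind requires ">" right before the name, an entry such as Other_Name is left alone. On a very large file the pattern may run into PCRE backtracking limits, in which case a plain string search is probably the safer route.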

Edit:

Here is a little Perl script that does what you want:

#!/usr/bin/perl
#

use strict;
use warnings;

open(my $fh, "<", "input.txt")
                   or die "cannot open < input.txt: $!";
my $content = do { local $/; <$fh> };   # slurp the whole file
close($fh);

my $anchor = ">Name<";
my $position = 0;
# find occurrences of anchor in the text
while ( ($position = index($content, $anchor, $position)) != -1 ) {
    print "anchor $anchor is at $position \n";
    # go backwards to the starttag of the anchor (has to be a td element)
    my $starttag_position = rindex($content, "< td", $position);
    print "starttag of anchor is at $starttag_position \n";
    my $start = $starttag_position;
    # look backwards to closing tds
    for (my $i = 0; $i < 4; $i++) {
        $start = rindex($content, "< /td >", $start - 1);
        if ($start == -1) {
            die("fewer than 4 closing tds found before $anchor");
        }
    }
    print "first td is at $start \n";
    # delete the text in between
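    substr($content, $start, $starttag_position - $start, "");
    # the anchor moved left by the deleted length; continue just past it
    $position = $start + ($position - $starttag_position) + length($anchor);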
}

open(my $fout, ">", "input.new")
                   or die "cannot open > input.new: $!";
print $fout $content;
close $fout;
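
Since the question asks for PHP, here is a rough PHP counterpart of the same backward search, offered as a sketch rather than a drop-in solution. It differs from the Perl script in one respect: it deletes through the end of the name, which is what the question lists as the text to delete. The function name deleteBackFromName and the sample strings are hypothetical.

```php
<?php
// Sketch: for every ">$name<" in $content, delete from the $n-th preceding
// closing tag through the end of the name. Hypothetical helper.
function deleteBackFromName(string $content, string $name, int $n, string $closeTag = '< /td >'): string
{
    $anchor = '>' . $name . '<';
    $offset = 0;
    while (($pos = strpos($content, $anchor, $offset)) !== false) {
        $nameEnd = $pos + 1 + strlen($name); // index just past the name
        $start = $pos;
        for ($i = 0; $i < $n && $start !== false; $i++) {
            // Look for the last closing tag that starts before the previous hit.
            $start = strrpos(substr($content, 0, $start), $closeTag);
        }
        if ($start === false) {
            // Fewer than $n closing tags before this anchor: leave it untouched.
            $offset = $nameEnd;
            continue;
        }
        $content = substr_replace($content, '', $start, $nameEnd - $start);
        $offset = $start; // what followed the name now begins at $start
    }
    return $content;
}

$line = 'a< /td >b< /td >c< /td >d< /td >< td >Name< /td >e';
echo deleteBackFromName($line, 'Name', 4), "\n"; // prints a< /td >e
```

For a file, the string would typically come from file_get_contents() and be written back with file_put_contents(). The real file presumably has no spaces inside the tags, so '</td>' would be passed as $closeTag there. The substr() call copies the string on every step, which is simple but may be slow on a very large file.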

5 Comments

Can you maybe try with the new code I wrote in the question? I am not able to make it work.
Okay, that's a different story. Could you post more examples of what you want to delete? Otherwise this may take many more attempts.
Can we use the class name "mn31BBArrowTD" somehow?
This is the command I use to create that part of the line: < td class="mn31BBArrowTD" >&nbsp;< /td >< /tr >< /table >< /td >< /tr >< tr >< td onmouseover="dm.v(this,1);" onmouseout="dm.u(this);" id="mnFE0BBC45_i'.$num.'" onclick="dm.ItClk(this,'.$e.''.$e.');cmn.href('.$e.'index'.$name.'.php'.$e.','.$e.''.$e.');" class="mn31BBMainMenuItemTD" >< table border="0" cellspacing="0" cellpadding="0" >< tr >< td class="mn31BBIconTD" >< font class="MG_Icons" >&#xe746;< /font >< /td >< td class="mn31BBTitleTD" id="mnFE0BBC45_i'.$num.'-tl" >'.$name.'< /td > The variable $name is known to me, but $num is not.
The variable $e is just a "\".
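
Given that the generating string is known except for the number, one option is to turn that string into a pattern in which the unknown number becomes \d+. The following is only a sketch: templateToPattern, the {NUM} placeholder and the shortened template are invented for illustration, and the real template would have to match the file byte for byte, including the backslashes.

```php
<?php
// Sketch: build a regex from a known template where the unknown number
// (the $num mentioned in the comment) is replaced by \d+.
// templateToPattern() and the {NUM} placeholder are hypothetical names.
function templateToPattern(string $template, string $placeholder = '{NUM}'): string
{
    $literalParts = explode($placeholder, $template);
    $quoted = array_map(
        static function (string $part): string {
            return preg_quote($part, '~'); // escape regex metacharacters and the delimiter
        },
        $literalParts
    );
    return '~' . implode('\d+', $quoted) . '~';
}

// Shortened, made-up template; the real one would be the full generating string.
$name = 'Name';
$template = '</td><td id="mn_i{NUM}" class="item"><td id="mn_i{NUM}-tl">' . $name;
$html = '<td>Other_Name</td><td id="mn_i3" class="item"><td id="mn_i3-tl">Name</td>';

echo preg_replace(templateToPattern($template), '', $html), "\n"; // prints <td>Other_Name</td>
```

This avoids counting closing tags altogether, at the cost of breaking as soon as the generated markup changes.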

Try this:

Algorithm: 1) find the position of Name; 2) walk backwards from there to the 3rd preceding < /td >; 3) cut out the substring between those two positions.

$text_string = '< /td >This is the line< /td >and this< /td >is the part< /td >want to remove< /td >Name< /td > after it keeps going < /td > a loong way < /td >';
$needle = '< /td >';
$name_pos = strpos($text_string, 'Name');
// Walk backwards from "Name" to the 3rd preceding '< /td >'.
$start = $name_pos;
for ($i = 0; $i < 3; $i++) {
    $start = strrpos(substr($text_string, 0, $start), $needle);
}
// Remove everything from that tag through the end of "Name".
$result = substr_replace($text_string, '', $start, $name_pos + strlen('Name') - $start);
var_dump($result);

Output:

string(94) "< /td >This is the line< /td >and this< /td > after it keeps going < /td > a loong way < /td >"

1 Comment

Can you maybe try with the new code I wrote in the question? I am not able to make it work.
