Creating break lines before desired line length keeping chunk prefix

Question

I want to wrap lines that belong to a chunk (lines beginning with #', written as #\x27 in the script below) when they exceed 100 columns, starting each continuation line with #'.

My solution does not work when the file contains several chunks.

Example file:

#' chunk line
#' big chunk line to split big chunk line to split big chunk line to split big chunk line to split big chunk line to split
#' ruler90123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
#'
not chunk line do nothing

big do nothing line big do nothing line big do nothing line big do nothing line big do nothing line big do nothing line big do nothing line

#' chunk line
#' big chunk line to split big chunk line to split big chunk line to split big chunk line to split big chunk line to split
#' ruler90123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
#'
not chunk line do nothing

big do nothing line big do nothing line big do nothing line big do nothing line big do nothing line big do nothing line big do nothing line

My attempt (works only if a single chunk is present):

perl -0777 -pe '
  s{#\x27.*#\x27}{                          q{ gets lines from #\x27 to #\x27 (chunk) };
    ($r = $&) =~ s/\n!\n#\x27//g;           q{ removes breaks except followed by #\x27 }; 
    $r =~ s/\G.{0,100}(\s|.$)\K/\n#\x27 /g; q{ before column 100 adds break + #\x27 };
    $r =~ s/#\x27 #\x27/#\x27/g;            q{ removes duplicated #\x27 };
    $r =~ s/\n\n/\n/g;                      q{ removes duplicated breaks };
    $r
  }gse' < chunks.txt

Expected output (this block, twice):

#' chunk line
#' big chunk line to split big chunk line to split big chunk line to split big chunk line to split
#' big chunk line to split
#' ruler90123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
#'
not chunk line do nothing

big do nothing line big do nothing line big do nothing line big do nothing line big do nothing line big do nothing line big do nothing line

Workaround in R

psum <- function(...,na.rm=FALSE) {
  rowSums(do.call(cbind,list(...)),na.rm=na.rm)
}    

gblines<-readLines("chunks.txt")

newgblines<-character()
i<-1
j<-1
repeat {
  newgblines[j] <- gblines[i]
    if (grepl("^#\'",newgblines[j] ) & nchar( newgblines[j] ) > 100 ) { # select lines with more than 100 and beginning in #'
      repeat{
        greps<-gregexpr(pattern ="\\s",newgblines[j])[[1]] # get position of spaces
        lenG<-length(greps)
        sums<-psum(-greps , rep(100,lenG ) )               # calculate which space is closest to col. 100
        index <- which(sums>0)
        minSums<- min(sums[index])
        index2<-which(sums==minSums)                       # index of space in greps
        cutpoint<-greps[index2]
        nchar2<-nchar(newgblines[j])                       # number of chars. in line
        strFirst <-substr(newgblines[j],1,cutpoint)        # cut before col. 100
        strSecond<-substr(newgblines[j],cutpoint+1,nchar2) # segment after col. 100
        newgblines[j]<-strFirst
        j<-j+1
        newgblines[j]<-paste0("#\' ",strSecond)
        if (nchar(strSecond)<=100 ){
          break
        }
      } # 
    } #  if
  i <- i+1
  j <- j+1
  if (i>length(gblines) ){
    break
  }
}
newgblines

You've proposed a solution in R but you've tagged with perl. Do I understand you want a perl solution?
– Chris Davies, Commented Jun 3, 2020 at 23:25

Ferroao · Accepted Answer · 2020-06-04 14:34:14Z

3

You were almost there.

Make these two changes:

change
```
s{#\x27.*#\x27}{
```
to
```
s{#\x27.*?#\x27$}{
```
and change
```
}gse' < fileName
```
to
```
}mesg' < fileName
```

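Basically, you were doing a greedy search-and-replace, whereas what you need is a block-oriented search-and-replace operation.

The block end is the #' marker that has a newline immediately to its right, which is what the added $ anchor (together with the m flag) enforces. The .*? regex is the non-greedy version of .*, so each match stops at the first such marker instead of the last one in the file.

More details are in the Perl docs.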

edited Jun 4, 2020 at 14:34 by Ferroao
answered Jun 4, 2020 at 8:33 by Rakesh Sharma

Ferroao · Accepted Answer · 2022-11-06 12:28:38Z

An alternative, more general answer that does not rely on chunks ending in #'. It is not perfect, but it works better:

perl -0777 -pe '
q{ 4 manual entries };
  $max_length = 100;
  $line_filter_pattern = "#\x27 ";                  
  $prefix_pattern = "#\x27 ";         
  $break_point = " ";                               q{ character in which to break lines };
  
  $linebreak_prefix = "\n$prefix_pattern";          q{ \n is linebreak };
  $lp_length = length($linebreak_prefix);

q{act in lines with prefix pattern };

  s{$line_filter_pattern.*?$}{
    ($r2 = $r = $&);

q{    check if splitting makes changes };
      $r2 =~ s/\G.{0,$max_length}($break_point|.$)\K/$linebreak_prefix/gs;
      if(length($r2) > length($r) + $lp_length) {

q{      add breaks and prefixes in a loop way };
        $r = $r2;
        $r =~ s/$linebreak_prefix$//g;
      }
  $r }gsem' < input.file > output.file
