I have a block of text read from a PDF document using the iTextSharp library (method: GetResultantText()).
Assume the text is laid out in paragraphs:
*"Paragraph One.
Paragraph Two. ...
Paragraph n "*
Is there a way to use the C# StringBuilder object, or perhaps an alternative approach, to hold the text while retaining its formatting (carriage returns, paragraph breaks, etc.) and then store the value in a varchar field in SQL Server 2008?
Ultimately I intend to store the text in a varchar field and would like to retain the line feeds and carriage returns [basic formatting metadata]; otherwise the extracted text is a single block that isn't readable when rendered.
I reckon invoking the ToString() method on a StringBuilder object removes all intermediate formatting characters in the text except the terminating [newline character].
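One way to test that assumption in isolation is a small sketch like the one below. The class name, the BuildSample helper and the sample strings are made up for illustration and are not part of my extraction code; it only mimics how the loop appends to the buffer.

```csharp
using System;
using System.Text;

class StringBuilderNewlineCheck
{
    // Builds a small multi-paragraph string the same way the extraction loop does.
    public static string BuildSample()
    {
        var buffer = new StringBuilder();
        buffer.AppendLine("Paragraph One.");
        buffer.AppendLine("Paragraph Two.");
        buffer.Append("Paragraph n");
        return buffer.ToString();
    }

    static void Main()
    {
        string text = BuildSample();

        // Count the line breaks that are still present after ToString().
        int breaks = text.Split(new[] { Environment.NewLine }, StringSplitOptions.None).Length - 1;
        Console.WriteLine("Line breaks kept: " + breaks);

        // Make the control characters visible so they can be inspected.
        Console.WriteLine(text.Replace("\r", "\\r").Replace("\n", "\\n"));
    }
}
```

If the reported count is above zero, the line breaks are still in the string after ToString(), which would point at a later step (storage or rendering) rather than at StringBuilder.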
SimpleTextExtractionStrategy strategy;
//StreamWriter writer = new StreamWriter("c:\\pdfOutput.txt");
for (int i = 1; i <= reader.NumberOfPages; i++)
{
    try
    {
        // Extract the text of page i and append it, followed by a line break.
        strategy = parser.ProcessContent(i, new SimpleTextExtractionStrategy());
        buffer.AppendLine(strategy.GetResultantText());
        //writer.WriteLine(strategy.GetResultantText());
    }
    catch (IndexOutOfRangeException e)
    {
        // Report the page that failed instead of swallowing the error silently.
        Console.WriteLine("Page " + i + " skipped: " + e.Message);
    }
}
Console.WriteLine("* End: Text Extraction Process ...");
// Convert the buffer once and return the result.
pdfText = buffer.ToString();
return pdfText;
If I uncomment the StreamWriter lines and write to a text file, the formatting is retained. However, if I save the resulting text into the entity defined below, all I get is a single block of text:
[System.Data.Linq.Mapping.Table(Name = "ReportsText")]
public class ReportsText
{
    [Column(IsDbGenerated = true, AutoSync = AutoSync.OnInsert)]
    public int ID { get; set; }

    [Column(IsPrimaryKey = true, AutoSync = AutoSync.OnInsert)]
    public String image { get; set; }

    [Column]
    public String announcement { get; set; }
}
So pdfText is intended to be stored in the announcement field. Cheers.