I want to remove some rogue HTML from a DB field that is supposed to contain a simple filename. Example of ok field:

myfile.pdf

Example of not ok field:

myfile2.pdf<input type="hidden" id="gwProxy" />...

Does anyone know a query I can run that can remove the HTML part but leave the filename? i.e. remove everything from the first < character onwards.

Let's assume the field is called myattachment, is defined as a varchar(250), and belongs to a table called mytable in a MySQL database.


Background info (not necessary to read):

The field in our database is supposed to contain filenames; however, due to an issue (documented here), some of the fields now contain a filename followed by rogue HTML. We have fixed the root issue and now need to fix the corrupt fields. In the past I have replaced text using this kind of query:

UPDATE mytable SET myattachment = replace(myattachment, 'JPG', 'jpg') WHERE myattachment LIKE '%JPG';
  • How many rows does the table have? If not over 1 million, I suggest extracting the id and myattachment columns to a file, using a text editor with regular expressions to do the replace, and importing the column back. Commented Sep 9, 2010 at 17:48
  • Thanks for your suggestion. I did consider fixing the issue using a PHP script to read the rows, fix the corruption and write them back. But then I thought there must be a MySQL query I can run that will be quicker? Commented Sep 9, 2010 at 17:56

1 Answer

This query seems to work OK. Can anyone see any issues with it?

UPDATE mytable
   SET myattachment = SUBSTRING_INDEX(myattachment, '<', 1) 
 WHERE `myattachment` LIKE '%<%';

For documentation on SUBSTRING_INDEX, see the MySQL manual page.
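
Before running the UPDATE, it may be worth previewing what it would change. The following is only a sketch against the mytable and myattachment names from the question; the column aliases current_value, cleaned_value, and remaining_rows are made up for illustration. SUBSTRING_INDEX(str, '<', 1) returns everything to the left of the first '<', so a value such as myfile2.pdf<input type="hidden" id="gwProxy" />... should come back as myfile2.pdf.

```sql
-- Dry run: show each affected row next to the value the UPDATE would write.
-- Nothing is modified by this statement.
SELECT myattachment                          AS current_value,
       SUBSTRING_INDEX(myattachment, '<', 1) AS cleaned_value
  FROM mytable
 WHERE myattachment LIKE '%<%';

-- After running the UPDATE, this count should be 0 if every row was cleaned.
SELECT COUNT(*) AS remaining_rows
  FROM mytable
 WHERE myattachment LIKE '%<%';
```

When the delimiter is absent, SUBSTRING_INDEX returns the string unchanged, so the WHERE clause is not needed for correctness; it just limits the UPDATE to the rows that actually contain a '<'.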
