Software

This section holds some of the software I’ve written in various languages (mostly PHP and JavaScript). All software is licensed under the GPL.

Word HTML Cleaner 1.1

While developing various websites, I have often needed to move large amounts of text from Word documents into web pages. Converting it by hand takes too long, and the HTML that Word outputs is just plain awful. So I wrote this little JavaScript to strip all the junk tags and attributes from Word HTML and to convert its plain-text lists into proper HTML lists.
View/Download
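
To give a rough idea of the tag and attribute stripping, here is a minimal sketch. It is not the code of the actual script: the names `ALLOWED_TAGS`, `ALLOWED_ATTRS` and `cleanWordHtml` and the contents of both lists are assumptions made up for illustration. It lowercases tag and attribute names, keeps only the ones on the lists, and leaves empty cells in place. It does not attempt the list conversion and it will trip over attribute values that contain `>` or nested quotes.

```javascript
// Hypothetical whitelists, chosen only for this illustration.
const ALLOWED_TAGS = ["p", "b", "i", "ul", "ol", "li", "table", "tr", "td", "a"];
const ALLOWED_ATTRS = ["href", "colspan", "rowspan"];

function cleanWordHtml(html) {
  return html
    // Drop HTML comments, which is where Word hides its conditional markup.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Rewrite every tag: unknown tags vanish, known tags are rebuilt cleanly.
    .replace(/<(\/?)([a-z][a-z0-9:]*)\b([^>]*)>/gi, (match, slash, name, rest) => {
      const tag = name.toLowerCase();
      if (!ALLOWED_TAGS.includes(tag)) return ""; // e.g. span, font, o:p
      if (slash) return `</${tag}>`;

      // Keep only whitelisted attributes, with lowercase names and double quotes.
      let attrs = "";
      const attrPattern = /([a-z][a-z0-9:-]*)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)/gi;
      let m;
      while ((m = attrPattern.exec(rest)) !== null) {
        const attr = m[1].toLowerCase();
        if (ALLOWED_ATTRS.includes(attr)) {
          const value = m[2].replace(/^["']|["']$/g, "");
          attrs += ` ${attr}="${value}"`;
        }
      }
      return `<${tag}${attrs}>`;
    });
}

const dirty = "<P CLASS=MsoNormal STYLE='margin:0'>Hello <SPAN LANG=EN-GB>world</SPAN><o:p></o:p></P>";
console.log(cleanWordHtml(dirty)); // <p>Hello world</p>
```

A whitelist seemed the safer choice for a sketch like this, since Word keeps inventing new junk tags and a blacklist would always lag behind.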

XHR Chat

This is a demonstration chat script I wrote while experimenting with XMLHttpRequest. It never refreshes the page, keeps an unlimited post log, and checks for new posts once a second. It degrades gracefully if the browser doesn’t support XMLHttpRequest, as long as JavaScript is still enabled.
View - Download
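
The once-a-second check can be sketched roughly like this. It is an illustration under assumptions, not the chat script itself: the `posts.php` URL, its `since` parameter, the JSON response format, the `chat-log` element id and the function names are all hypothetical. The idea shown is to remember the newest post id and ask the server only for later posts, so the page never has to reload.

```javascript
// Remembers the newest post id so each poll only asks for later posts.
function createPoller(requestPosts, showPosts) {
  let lastId = 0;
  return function poll() {
    requestPosts(lastId, (posts) => {
      if (posts.length === 0) return; // nothing new this second
      lastId = posts[posts.length - 1].id;
      showPosts(posts);
    });
  };
}

// Browser transport. "posts.php" and its JSON array of
// {id, author, text} objects are hypothetical.
function xhrRequestPosts(sinceId, done) {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", "posts.php?since=" + encodeURIComponent(sinceId));
  xhr.onreadystatechange = () => {
    if (xhr.readyState === 4 && xhr.status === 200) {
      done(JSON.parse(xhr.responseText));
    }
  };
  xhr.send();
}

// Start polling only where XMLHttpRequest exists; elsewhere the page
// would have to fall back to some other way of fetching posts.
if (typeof XMLHttpRequest !== "undefined" && typeof document !== "undefined") {
  const log = document.getElementById("chat-log"); // hypothetical element id
  const poll = createPoller(xhrRequestPosts, (posts) => {
    for (const post of posts) {
      const line = document.createElement("p");
      line.textContent = post.author + ": " + post.text;
      log.appendChild(line);
    }
  });
  setInterval(poll, 1000); // check for new posts once a second
}
```

Keeping the polling logic separate from the transport means it can be tried out with a fake `requestPosts` function, without a server or a browser.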

9 Responses

  1. Patrick - November 20th, 2006 at 2:40 pm

    Man, thank you thank you thank you!

    the Word Cleaner rocks my world!

  2. Greg - January 11th, 2007 at 2:33 pm

    I love the Word HTML Cleaner! It’s an absolutely wonderful script you’ve written. It’s exactly what I’ve been looking for… for many months in fact!

    I have only one suggestion: is it possible for the script to retain all empty tags? The script seems to remove any and all empty cells, so all subsequent cells shift over in place of the missing cell(s).

    If this is something that could be worked into the script, or if you have any suggestions on how to implement it, I’d be forever grateful!!

    Thanks again for having written this!!

  3. Connor - January 12th, 2007 at 2:13 pm

    Glad you find it so useful Greg!

    The problem you mentioned has now been fixed.

  4. Greg - February 2nd, 2007 at 12:26 pm

    Thank you for applying the fix so quickly! Much appreciated. So far so good! If I could make one more suggestion, though: can the script be set up so that it’s case insensitive? As of now, if the HTML code has any uppercase tags or attributes, all of the tags are removed. Now, I’m all for clean code, don’t get me wrong, but that’s just too clean :) Anyway, although I’ve got a script that will convert all tags to lowercase, all of the attributes are left in uppercase and will be removed if not specified in the arrays (as uppercase). If case insensitivity could be integrated into your current script, that would be terrific! Thank you!

  5. Connor - March 25th, 2007 at 8:39 pm

    It took me a really long time, but I’ve implemented case insensitivity now. It will convert all tags and attribute names to lower case, but it won’t remove them just because they’re upper case.

  6. Matthew - May 26th, 2007 at 4:48 pm

    Just wanted to let you know that I use the html cleaner for documents on my website. Thanks!

  7. Rekcor - June 18th, 2007 at 8:04 am

    Thank you for your script, it is great!

    I had a small problem however: not all of Word’s special characters are converted, because they are in fact just normal letters, but in Word’s Symbol font. E.g. a for alpha.

    These can be replaced using regular expressions.

  8. Andrys - December 3rd, 2007 at 5:22 pm

    I’ve spent a long time looking for a WORD html cleaner and all of them, including Tidy, gave me a universe of woes, including bloated code and strange interpretations, making me just clean it up manually instead whenever I received a huge zillion-worded WORD (Microsoft tries to live up to that name) html doc to post.

    Yours did exactly what I wanted RIGHT AWAY. I thought something must be wrong, but nope, you’d really cleaned all that gunk while leaving the meat intact just as laid out. MANY thanks!

    (Tidy refused to even try, by the way.)

  9. Greg - March 4th, 2008 at 11:06 am

    Connor, this code is proving to be very helpful time and time again. I thank you for development of this script and providing it through your website. In the version I’m currently using I’ve included some additional coding that allows for an interface with various options. In my desire to go a little deeper with these options I was wondering at what point in the script could a function be inserted which would allow me to manipulate the various arrays and variables. I’ve tried inserting one at various points in the script, but I wasn’t able to position it at the correct location. I hope what I’m requesting makes sense. If you have any suggestions, I’d greatly appreciate them! Thank you!
