Posts tagged Meta

Hacking Wordpress, Day Two

Thus far, my move to Wordpress has been an adventure. Here are a few lessons learned.

First off, I was very excited about the features of Wordpress, most of all the API and the rich-text WYSIWYG editor in the backend. I’ve done a lot of work on Small Axe’s backend, but it’s still nothing compared to Wordpress.

When I imported my stuff, it worked well, but the “slugs” (URL-friendly post titles) did not convert the way I needed. Wordpress regenerated them as its own properly escaped slugs. The problem was that my slugs needed to stay intact, because I didn’t want all of my old links to break.

Understanding the way Wordpress works is really tough for a WP newbie, because the code is so spread out, yet compact; voluminous, yet digestible. Start with index.php, move on to wp-blog-header.php, then into wp-settings.php, and then you hit the massive list of files in the wp-includes directory. You’ll dig all over, chasing includes within includes within includes. I finally found a great article that tries to explain the Wordpress slug architecture. It’s fairly complex, and much of it lives in /wp-includes/query.php. However, my problem was very specific.

Many of my post slugs had periods in them. The period does not interfere with the URL, but Wordpress doesn’t like periods, and somewhere in the massive beast they were being stripped out. So I had to find the file that “gets” posts. Lo and behold, there is a function called “get_posts” that lives in /wp-includes/query.php. I kept poking around. If you keep digging, you’ll eventually find yourself in /wp-includes/formatting.php. And there it is.

Post slugs get sanitized (like virtually all input in Wordpress, which is strictly sanitized) by a function called sanitize_title_with_dashes(), which generates the slug. To allow dots in your slugs, just replace lines 366 and 367 (in Wordpress 2.6.0) with this:

$title = preg_replace('/&.+?;/', '', $title); // kill entities
$title = preg_replace('/[^%a-z0-9 _.-]/', '', $title); // the "." in this class keeps periods

With that change, periods are no longer stripped from your slugs. Of course, I don’t recommend actually using periods; I just wanted them to work when fetching old posts created before I knew any better.
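To see what the patched regexes do to a title, here is a rough shell approximation of the final steps of sanitize_title_with_dashes(): lowercase, drop HTML entities, keep only the allowed characters (now including the period), and turn spaces into single dashes. The "slugify" function is hypothetical and not Wordpress code; the real PHP function also performs steps this sketch skips, such as tag stripping and UTF-8 handling.

```shell
#!/usr/bin/env bash
# Hypothetical approximation of the patched slug rules, not Wordpress code.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E \
        -e 's/&[^;]+;//g' \
        -e 's/[^%a-z0-9 _.-]//g' \
        -e 's/ +/-/g' \
        -e 's/-+/-/g' \
        -e 's/^-+|-+$//g'
  # 1) drop entities like &amp;  2) keep only [%a-z0-9 _.-] (the '.' is the patch)
  # 3) spaces become dashes  4) collapse repeated dashes  5) trim edge dashes
}
```

For example, "Hello World v1.0" becomes "hello-world-v1.0"; without the period in the character class it would come out as "hello-world-v10".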

After that adventure, I have to tell you, I’m really loving Wordpress. There are some incredible plugins that have extended its functionality in amazing ways for me. So far, so good.

Enter firsttube.com 10

Here it is, the tenth and largest ever revision of firsttube.com.  After a long decision-making process, I decided to migrate to Wordpress.  There are a number of reasons why I did this, but here are a few.

First of all, Wordpress is actively developed… a lot. Small Axe is a lot of fun, but it’s a hobby, and although I enjoy it, it was a lot of work. On top of that, certain features, such as a quality API, were a challenge I simply never had enough time to implement. With Wordpress, I can post from my iPhone. Or Flickr. Or Digg. Etc.

I was also able to preserve my permalink structure with this code. I’ve developed a decent standing in Google, one I fear this migration may have destroyed, but it was important that my links be maintained.

Themes with Wordpress are a snap. Honestly, changing the look and feel is cake once a theme is uploaded.

The migration to Wordpress was PAINFUL! Importing the posts was easy enough, since it can be done via RSS, so I quickly edited my RSS script to output my whole blog. Boom! Done. But getting comments in was a lot of work. I think I’ve covered all of them, though, and that’s exciting.

One problem is that my “tags” came over as “categories” and my “topics” were entirely discarded. Had I been a little smarter about Wordpress, I could easily have fixed that. But it doesn’t really bother me, so it will stay reversed. Dammit.

I’m still considering whether I want to host my own comments or push them into Disqus. I really like Disqus, but I’m not entirely comfortable pushing my blog content to a third-party service.

Overall, I’m pretty happy with this iteration and the theme I’ve adopted. The HTML needs some work, and there are a few pages and site features I still haven’t properly migrated, but all in all, I’m feeling good about this move. It gives me more time to focus on other PHP projects, Wordpress add-ons, and OSNews. I’ll keep you all posted, and now that I’ve upgraded to a new codebase, I promise to update more often. In the meantime, enjoy firsttube.com 10.

A Cleaner, Simpler firsttube.com

I’ve been pretty liberal about completely redesigning my website for some time now. I built this site sometime in August of 2000, using my own HTML. All dynamic content was achieved… well… faked… by re-uploading static HTML files. Version 2.0, a major overhaul, arrived shortly thereafter, and version 3.0 moved the site entirely to PHP. The site thrived as a Phish music archive, and when I moved away from that, I retired what was then version 4.0; several versions followed until this one, version 9. But soon I will begin designing firsttube.com version 10, and it will be a chore, as I intend to modify most of the tables in my underlying database. Many features I wish I had implemented long ago, such as subscribing to threads and letting users enter a website instead of exposing their email address, are long overdue and virtually omnipresent in other weblogs.

I’ve even tossed around using another blog engine and just migrating my data, but then, where would I play?

My primary goal for firsttube.com 10, though, will be a radically simpler and more attractive interface. I like some Web 2.0 mainstays. Expect larger text, brighter colors, AJAX where appropriate, and simplicity. My new comments page, which I’ve been playing with, is already stripped down and yet still kind of overwhelming. So it’s back to the drawing board, it appears. Stay tuned for more updates than necessary.

Offline: The Silly Script Disaster

I have several websites. The way my web host has them set up, like many hosts who use cPanel, is that one site is a “master” and the others essentially exist as directories within that site. My master site is smallaxesolutions.com, which is the “company” under which I sometimes do my web design and network support business.

One of the things I (used to) do as Small Axe Solutions was publish the core code of the engine that powers firsttube.com, Small Axe. Small Axe was versioned as 0.1, then 0.2, then 0.3. At that point, I had added several features to firsttube.com that I had yet to merge upstream into Small Axe. So, I created a build system so I could slowly integrate the changes. It worked like this: I had a directory called “build_source” which contained my current code. Of course, it had all kinds of problems out of the box, like config files that pointed to placeholder locations such as /path/to/your/blog/. It had no valid database connection info. The flatfiles were unwritable. So the code was (usually) solid, but it couldn’t actually run as-is.

Meanwhile, another directory called “demo” was waiting silently.

Lastly, a third directory, outside the web root, called “static” was sitting with pre-built config files, db connection files, and some other stuff.

Then it was just a matter of a simple shell script. The script did the following: it deleted everything in the “demo” directory, then copied all of the files in the “build_source” directory into the demo directory. It deleted the config file and overwrote it with a copy from the “static” directory, and did the same for the db connection and a few other files. That left the demo directory as a live, fully functional build of the current code. Then it zipped everything in the “build_source” directory and put it into my downloads section. This script ran every 30 minutes, probably for two years. I only chose 30 minutes because, from a development standpoint, it made sense to see the updates quickly. I stopped working on that version some time ago, but never got around to disabling or changing the script.

Fast forward to a few weeks ago: I was cleaning out a bunch of old directories. Within 5 minutes, EVERYTHING was gone: my mail, *all* of my sites, my temp files, everything in my home directory that wasn’t a hidden file preceded with a dot. I didn’t realize this for several hours, but then I restored from a backup, and within 45 minutes, everything was gone again! Oh noes!

I immediately began researching security and disabling all of my upload scripts. Something was wrong, I thought. I searched high and low. But, as you’ve probably guessed, I didn’t find anything wrong, because there was nothing wrong. In my cleanup, I had decided to delete the “demo” folder. The first line of my shell script was “cd /home/adam/public_html/build_source”. The second, scary line was “rm -rf *”. Since there was no “build_source” folder, the first line flat out failed, leaving the script in /home/adam. Then, unfortunately, it ran rm -rf * in the root of my home directory. Killer!
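The root cause is an unchecked "cd" followed by a relative "rm -rf *". Here is a minimal sketch of a safer rebuild job, assuming bash. It is not the original script: the function name "rebuild_demo" and its four path arguments are hypothetical, and it writes a tar.gz archive in place of the original zip. It refuses to run if any directory is missing, and it only ever deletes by absolute path.

```shell
#!/usr/bin/env bash
# Hypothetical safer rebuild job; paths are passed in, not hard-coded.
rebuild_demo() {
  local src=$1 demo=$2 static=$3 archive=$4 dir

  # Check every directory up front. An unchecked `cd` that fails leaves the
  # script in its starting directory, where a relative `rm -rf *` is lethal.
  for dir in "$src" "$demo" "$static"; do
    if [ ! -d "$dir" ]; then
      echo "rebuild_demo: missing directory: $dir" >&2
      return 1
    fi
  done

  # Empty the demo directory by absolute path, hidden files included.
  # ${demo:?} aborts the script outright if the variable is ever empty.
  find "${demo:?}" -mindepth 1 -delete || return 1

  # Copy the current code, then overlay the pre-built config and db files.
  cp -R "$src"/. "$demo"/ || return 1
  cp -R "$static"/. "$demo"/ || return 1

  # Package the source for the downloads section.
  tar -czf "$archive" -C "$src" . || return 1
}
```

Explicit "|| return 1" checks are used instead of relying on "set -e", because bash ignores "set -e" inside a function whose status is being tested (for example, in an "if").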

It took me some time to swallow my own stupidity. All I had to do to prevent this disaster was comment out the cron job. But alas, I dropped the ball. We’re back online now, and a little smarter.

A Little About Code Names

Throughout the internet, you’ll find a slew of geeks who refer to their projects by “code name.” Realistically, this isn’t GI Joe, so there’s no real reason to need a code name for your projects, right? I’m here to argue otherwise.

Since I’m involved in several web endeavors, there is always a lot of development code on my computers. When I start something like a firsttube.com redesign or something much larger, like an OSNews redesign, it doesn’t make sense to have a hundred folders called “osnewsv4” or some such littered about. I used to date the folders, but osnewsv4-tuesday doesn’t help. And something like osnewsv4-20071017 doesn’t help much either.

Now it gets even more complex: what if I build something and then decide to approach it differently? How will I know which folder is the one that contains relevant code? Enter codenames!

When I knew I was going to build a brand spankin’ new version of OSNews, I knew it would eventually be called version 4, so it made no sense to start calling the first code off my fingers “v4.” As it turns out, there were almost 10 versions of “OSNews version 4” before we accepted a codebase. The first ones were very different in both look and feel and code. So, for my own organizational purposes, I use code names. All that matters is which codebase eventually gets promoted to the “version 4” title.

So, here is a list of the codenames I’ve used on my projects in the past, going back as far as I can remember:

I used to maintain an open source weblog called Flip, which later became Small Axe. Although Flip 2.0 may have had a codename, I can’t remember or find any reference to it. Flip 2.1 was called Lobster. Flip 2.2 was called Shark, although I never released that code, largely because before I finished it, I released Flip 3.0, Turtle. Flip 3.1 was to be called Jackrabbit, but again, I never released it. Flip 4.0 earned the codename Blueberry, but it was merged into the first release of Small Axe. We’ll get back to Small Axe in a minute. The Flip codenames were entirely random and meant nothing, except that I wanted the 2.x and 3.x families to be animals, and for 4.x, a complete rewrite, I decided to use fruits. That never materialized.

A large part of why versions of Flip went entirely unreleased is that the app became big and tough to handle. As a result, I stripped out the core of it and released “Flip Lite,” which was called “Red Squirrel.” There was a running joke in college about a “blue raccoon,” so “red squirrel” was a silent tribute. When Flip Lite 2 came about, it was called “Rivet Boy.” Here’s why I called it “Rivet Boy.”

Small Axe Weblog took over where Flip left off (I really need to get around to updating it, since I’ve probably worked up to v0.7 by now!), but the roadmap, along with the codenames, is listed here. They are codenamed after the Japanese Iron Chefs and their popular guests.

firsttube.com itself had codenames, some of the time. firsttube.com 3 was “Milky”. 3.1 was Crossbow because it was built to be cross-platform. 3.2 was Scoop Face, because it was inspired by Scoop. 3.3 was “Semi-Scoop”, for much the same reasons. 3.3.1 was “Flip”, because it was the first version to use code from the Flip project. 4.0 was lazily called “Lobster” because it was running Flip 2.1. 5.0 was “Linkfarm”, because it was, for the few weeks it lived, a link farm. 6.0 may or may not have actually had a codename when I built it, but it was listed in one directory as “Wikitube”, because it ran phpwiki software. I merged it and my weblog for version 7.0, which, along with 8.0, didn’t earn codenames. The recently released firsttube.com 9.0 was called “Chalkboard,” because at one point, I thought the header looked like a chalkboard. Obviously, it doesn’t anymore.

On to OSNews: again, these codenames are mine and mine only. They are neither “official” nor even known to the rest of the staff, since I only used them while developing code. The now-defunct OSNews Meta Blog actually ran on Small Axe, so it lived in a folder called “Small Axe.” We renamed it “meta blog” literally days before making it live.

The OSNews Staff Blog used to be called ftblogroller, and I actually still have the very first working version on my company’s intranet test server. The funny thing is, I chronicled it long ago on firsttube.com. That was the engine of the OSNews Staff Blog. It also powers the OSGalaxy site, where I refer to it as “Galaxy.” I never actually got around to packaging it.

Jobs.OSNews, an experiment that everyone liked but nobody used, was called Meadow, only because it was green.

OSNews v4 had a few codenames on my computer. “NEW” was one of them, as was “TCO,” which was an acronym for “three column OSNews.” The one that eventually earned the title version 4 was Blueprint, because I threw everything away and literally started from scratch. Even the queries that fetch data were rewritten to be as efficient as possible.

Two projects in the works: “Timber” is the codename of a module that does native OSNews polling. Why Timber? A poll takes a tally, tally like tally ho, like timber ho! I didn’t say they made sense or were funny, I just said I used them.

Another project that has had several lives already is the iPhone-optimized OSNews site. I have gone through several versions of this code as well. Recently, I tossed aside “iui-osnews” and “knox” to really work on project “McBragg.” Commander McBragg was the general in the Underdog cartoons. I seemed to remember him going on several safaris, so I stole his name for my code. McBragg’s JavaScript framework and CSS are not finished yet, but the underlying PHP appears to be sound, so I expect to finish it within the next few weeks.

As you can see, having codenames can help a developer understand what code he’s looking at. It would not help me at all to see a folder called “firsttube.com-20060722” because I wouldn’t know what version of firsttube.com it was, or whether the code was even used on the live site. But certainly, if I saw a subfolder in my OSNews directory called “mcbragg,” I’d know it has relevant code. I think there’s something to be said for categorizing your code that way; plus, it’s kinda cool to have codenames. Yeah, I said it.

Trackback Spam Gateway

It’s over. My referrer experiment is over… at least, in its current form. Today, I roll out firsttube.com referrer gateway version 1.0. That makes it sound fancy, but it’s not. Basically, it’s PHP to prevent trackback spam.

Traffic at firsttube.com has grown steadily, for some reason, and the logs reveal why: we get a TON of traffic from search engines, and the most popular search terms are surprising. Sensitive readers beware; here are the terms that most frequently drive people here:

cumtube, red-tube, uporn, adult youtube, milf, gay tube, tube 8 and many more equally odd terms.

You know why? Because, in a shrewd move that search engines seem to love, I display links back to my referrers, thinking they are trackbacks. But when a referrer is not Google, Yahoo, Live.com, or OSNews, it’s most often spam. Why? Because not only are we using the name “tube” in our title, but with each erroneous entry, we tell the search engine it’s a good thing by back-linking to that search. In short, I’m perpetuating the problem. As a result, dozens of spammers have begun issuing hundreds of basic GET requests to place their sites in my referrer lists.

Some time ago, I began the battle by adding rel="nofollow" to all outgoing links not added via the admin section. But alas, that wasn’t good enough; the spammers didn’t care. So I implemented a pre-check, whereby referrers are matched, via regular expressions, against a list of known crap. As of today, there are 36 terms that I actively filter. In time, this will become performance-intensive, if it isn’t already.
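A regex pre-check like this can be sketched in shell, assuming the blocklist is kept as one extended regex per line. The three patterns are hypothetical stand-ins taken from spam sources named elsewhere on this page (the real list of 36 terms is not published), and "is_spam_referrer" is an illustrative name. Passing the whole list to a single grep call, rather than looping over the terms, also helps with the performance concern.

```shell
#!/usr/bin/env bash
# Hypothetical blocklist, one extended regex per line.
# Keep blank lines out: an empty pattern would match every referrer.
SPAM_PATTERNS='mp3
musicforum\.org
lyrics'

# Succeeds (exit status 0) when the referrer matches any blocked pattern.
is_spam_referrer() {
  # grep -e accepts a newline-separated pattern list, so every pattern is
  # checked in one grep call; -i ignores case, -q suppresses output.
  printf '%s\n' "$1" | grep -Eiq -e "$SPAM_PATTERNS"
}
```

Usage: "is_spam_referrer 'http://mp3.example.com/page' && echo blocked" prints "blocked", while a Google search referrer passes through.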

Thus, a gateway. Now, *all* referring traffic goes into a temp table, and each entry must be approved. I wrote a nice tool to batch import, batch delete, or even approve based on certain filters, such as domain or term. As it matures over time, I will “whitelist” certain domains that can post directly to the referrer table. In the meantime, I need to decide whether to filter referrers with obscene, unrelated terms or just leave them and let the magic run its course; after all, these are not “spam,” they are simply organic mistakes. An argument could be made that seeing which terms and sites around the internet drive traffic to a site is interesting, and is in fact the main reason to post referrers at all.
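The routing decision behind such a gateway can be sketched as follows: whitelisted domains go straight to the referrer table, and everything else waits in the temp table for approval. This is an illustration under assumptions, not the firsttube.com code: a plain-text whitelist stands in for the database, "referrer_host" and "route_referrer" are hypothetical names, and the domains are the legitimate sources this page names (Google, Yahoo, Live.com, OSNews).

```shell
#!/usr/bin/env bash
# Hypothetical whitelist: domains allowed to skip the approval queue.
WHITELIST='google.com
yahoo.com
live.com
osnews.com'

# Print the lowercased host of a URL (scheme, port, path, query removed).
referrer_host() {
  printf '%s\n' "$1" \
    | sed -E -e 's|^[A-Za-z][A-Za-z0-9+.-]*://||' -e 's|[/?#:].*$||' \
    | tr '[:upper:]' '[:lower:]'
}

# Print "approved" for a whitelisted domain or any of its subdomains,
# otherwise "pending" (the entry would wait in the temp table).
route_referrer() {
  local host domain
  host=$(referrer_host "$1")
  while IFS= read -r domain; do
    [ -n "$domain" ] || continue
    case $host in
      "$domain" | *."$domain")
        echo approved
        return 0
        ;;
    esac
  done <<EOF
$WHITELIST
EOF
  echo pending
}
```

Matching the exact domain or "*.domain" (rather than a plain substring) keeps a look-alike such as "evilgoogle.com" in the pending queue.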

Anyway, spammers, take note: I gotcher number! Stop referrer spamming me! That means you, you stupid lyrics sites!

HAXX0RED

So, I updated firsttube.com to “revision 9” on Friday, and when I went to show someone last night, imagine my surprise when I found the whole thing hosed. The site was missing entire chunks: random, non-sequential directories were simply gone.

I’ll spare you the details: I got hacked. Someone either brute-forced their way into the admin site (which is now pretty locked down, until I figure this all out) or brute-forced their way into SSH, and uploaded several malicious PHP scripts. They are scary; I actually have them intact in a backup from a few days ago. How much has been revealed? My MySQL passwords? It’s impossible to tell. Virtually everything will need scrubbing.

In the meantime, excuse any wonkiness until all is repaired. The good news is this finally forces me to finish work on the new administrative area I’ve been playing with.

firsttube.com revision 9

Here it is: firsttube.com, once again, entirely redesigned. Unlike previous redesigns, this one is interface-only; there are no backend modifications.

Enjoy.

Trackback Spam, Again

Once again, I am dealing with trackback spam, aka referrer spam. Since firsttube.com records the pages that refer hits to us, I’ve had to deal with jerks who issue HTTP requests so that they get a link back. Too bad they don’t realize that every referrer gets a rel="nofollow" attribute (more here).

So, I had to issue these SQL statements to the database today:

DELETE FROM user_agent_table
WHERE referrer LIKE 'http://mp3%' OR referrer LIKE '%mp3.com%';

DELETE FROM user_agent_table
WHERE referrer LIKE '%musicforum.org%';

Musicforum.org has some asshole posting all sorts of links that pass a GET variable with a firsttube.com URL in it, which appears to do nothing other than ping the page. So, effective immediately, we run a regex validator on referrers and will be doing more frequent cleanups.
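The post does not say what the validator checks, so the following is only one plausible shape for it, assuming two rules: a referrer must look like an ordinary absolute http(s) URL, and it must not carry a firsttube.com URL inside its query string (the musicforum.org trick described above). "SITE_HOST" and "is_valid_referrer" are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical: this site's own host name.
SITE_HOST='firsttube.com'

# Succeeds only for referrers that look like a plain absolute http(s) URL
# and do not smuggle our own URL in as a GET variable.
is_valid_referrer() {
  local ref=$1
  # Scheme, dotted host with an alphabetic TLD, optional port, then a
  # path, query, fragment, or the end of the string.
  printf '%s\n' "$ref" \
    | grep -Eq '^https?://[A-Za-z0-9.-]+\.[A-Za-z]{2,}(:[0-9]+)?([/?#]|$)' \
    || return 1
  # Anything after a "?" that mentions our host is treated as a ping link.
  case $ref in
    *\?*"$SITE_HOST"*) return 1 ;;
  esac
  return 0
}
```

Ordinary links to the site itself still pass, because the host appears before the "?" rather than inside the query string.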

Hear that spammers? Take your crap elsewhere.

Changes to firsttube.com

I am very excited about how portable Small Axe, the engine that powers firsttube.com, has become. I am going to be upgrading the site in the next few weeks. You won’t see a ton of new stuff, but it will be much more powerful and configurable for me.

One place I have made some changes is in the RSS and Atom feeds. Although I advertise my feeds at FeedBurner, our source feeds are at firsttube.com/feed and firsttube.com/feed/atom. They have received some stylesheet love and are much more readable by the human eye.