2009/10/30

Windows 7: Disable builtin DHCP server for “Internal network” in Virtual PC

I recently installed Windows 7 and I have been waiting for the final release of XP Mode and Virtual PC which occurred last 22nd of October. I previously had (in Windows Vista and using Virtual PC 2007) a virtual domain, composed of virtual machines such as:

  • server2003: a domain controller and DHCP server, with fixed IP address, connected to the “internal network” of Virtual PC.
  • isa2006: with two interfaces (dual homed), one connected to the physical host network adapter (for connecting to the internet), the other one connected to the “internal network”. Both IPs are manually set.
  • sql2008: the database server for the tests with this virtual domain, IP address assigned dinamically through DHCP.
  • vs2008xp: a Windows XP with Visual Studio 2008, belonging to the domain for testing and developing, IP configured through DHCP (that should be handled by server2003).

With such a testing environment, all traffic that should go to/from the internet passes though isa2006. If isa2006 is not running (for instance) the virtual domain is isolated and the virtual machines can only see themselves (members of the domain).

This was the scenario that I had configured in my old Vista using Virtual PC 2007 and wanted to reuse the .vhd files so that I do not need to rebuild the playground from scratch again.

It was quite simple, I just recreated every single virtual machine using the wizard, and when asked for the hard disk, I selected ‘the existing one’ instead creating an empty one. Then, when the machine was first started, I reinstalled the Virtual Machine Additions (now called Integration Components), and after a couple of restarts everthing seemed to be working… but it only seemed.

Then I realized that sql2008 and vs2008xp (both were configured to use dynamic IPs using DHCP) cannot browse the internet, nor ping any other server in the domain. They were using the “Internal network”, but their IP addresses were not assigned by the DHCP running in server2003, since they were not in the expected range/mask.

After Gooling for a while I learned that Virtual PC has its own builtin DHCP server and it seems it is (incorrectly) enabled for the “Internal network”. Fortunately there is a fix for it:

  1. Turn off or hibernate all your running Virtual Machines.
  2. From the Task manager, kill vpc.exe if it does not exit on its own.
  3. Edit "%localappdata%\microsoft\Windows Virtual PC\options.xml"
  4. Search for the “Internal network” section, and then inside the <dhcp> section, disable it: <enabled type="boolean">false</enabled> and save the file. You can keep a backup of the original xml file just in case.
  5. Turn your VMs and verify everything runs as expected.

2009/10/11

URL Canonicalization with 301 redirects for ASP.NET

There are lots of pages talking about the benefits of canonicalization (c14n for short). It is a common agreement that it is just a set of rules in order to have our pages indexed in the most standardized, simplified and optimal way as possible. This would allow us to recollect our PageRank instead of having it spread among all the possible combinations of writing an URL for a particular page. In this post we will cover some canonicalization cases and their implementations for our IIS server running ASP.NET.

These different cases include:

  • Secure versus non secure versions of a page: Are http://www.example.com and https://www.example.com the same?
  • Upper and lowercase characters in the URL: Are ~/Default.aspx, ~/default.aspx and ~/DeFaUlT.aspx the same page?
  • www versus non-www domain: Do http://example.com and http://www.example.com return the same contents?
  • Parameters in the QueryString: Should ~/page.aspx?a=123&b=987 and ~/page.aspx?b=987&a=123 be considered the same? Are we handling bogus parameters? What happens if someone links us with a parameter that is not expected/used such as ~/page.aspx?useless=33 ?
  • Percent encoding: Do ~/page.aspx?p=d%0Fa and ~/page.aspx?p=d%0fa return the same page?

If your answer is yes in all cases, you must keep on reading. If you only answer yes in some cases, this post will be interesting for you anyway; you could skip those points that do not apply in your scenario by just commenting some lines of code, or modify them to match your needs. Sample VS2008 website project with full VB source code is available for downloading.

In our sample code we will be following these assumptions:

  • We prefer non-secure version over secure version, except for some particular (secure) paths: If we receive an https request from a non-authenticated user for a page that should not be served a secure, we will do a 301 redirect to the same requested URL but without the secure ‘s’.
  • We will prefer lowercase for all the the URLs: If we receive a request that contains any uppercase char (parameter names and their values are not considered), we will do a permanent 301 redirect to the lowercase variant for the URL being requested.
  • www vs. non-www should be handled by creating a new website in IIS for the non-www version and placing there a 301 redirect to the www version. This case is not covered by our code in ASP.NET since it only needs some IIS configuration work.
  • The parameters must be alphabetically ordered: If we receive a request for ~/page.aspx?b=987&a=123, we will do permanent redirect to ~/page.aspx?a=126&b=987, since the alphabetic sort a is before b. Regarding lower and uppercase variants either in the name of the parameter or the value itself, we will consider them as being different pages, in other words, no redirecting will be done if the name of a QueryString is found in upper/mixed/lowercase. The same would apply for the value of those parameters: ~/page.aspx?a=3T, ~/page.aspx?A=3T and ~/page.aspx?a=3t will be considered as different pages, no redirection will be done. In pages that accept parameters extra coding must be done to check that no other than the allowed parameters are used.
  • We will prefer percent encoded characters in their uppercase variant, for that reason %2f for instance will be redirected to %2F whenever they appear in the value of any parameter. This way we follow RFC 3986 that states:
    Although host is case-insensitive, producers and normalizers should use lowercase for registered names and hexadecimal addresses for the sake of uniformity, while only using uppercase letters for percent-encodings.

<link rel=”canonical” …>

Last february 2009 Google announced through their Google Webmaster Central Blog a way for you to explicitly declare your preferred canonical version for every page (see Specify your canonical ). By simply adding a <link> tag inside the <head> section of your pages, you can tell spiders the way you prefer them to index your content, the canonical preferred way. This helps to concentrate the GoogleJuice to that particular canonical URL from any other URL version or URL variation pointing to it in this way (the link rel=canonical way). This very same method was later adopted by Ask.com, Microsoft Live Search and Yahoo!, so it can be considered a de facto standard.

We will adopt this relative new feature in our sample code. Most of the times we will be using permanent 301 redirects, but there might be cases where you may not want to do a redirect and simply return the requested page as is (with no redirection) and return the canonical URL as a hint for Search Engines. Whenever we receive a request for a page, including bogus parameters in the query string, we will handle the request as a normal one but we will discard the useless parameters when calculating the link rel=canonical version of the page.

In particular, if you are using Google Adwords, your landing pages will be hit with an additional parameter called gclid that is used for Adwords auto-tagging. We do not want to handle those requests differently, nor treat them as errors in any way. We will only discard the unknown variables when creating the rel=canonical URL for any request.

Related links.

Internet Information Services IIS optimization
Are upper- and lower-case URLs the same page?
Google Indexing my Home Page as https://. 
http:// and https:// - Duplicate Content? 
SEO advice: url canonicalization

Q: What about https and http versions? I have a site is indexed for https, in place of http. I am sure this too is a form of canonical URIs and how do you suggest we go about it?
A: Google can crawl https just fine, but I might lean toward doing a 301 redirect to the http version (assuming that e.g. the browser doesn’t support cookies, which Googlebot doesn’t).

Specify your canonical

Keywords.

canonicalization, seo, optimization, link, rel, canonical, c14n, asp.net, http vs. https, uppercase vs. lowercase