Harnessing the Internet to Deliver a True One to One Web Experience

By Ben Ziskind
February 24, 1999

The major advantage of the Internet over other communication media is the potential for one-to-one relationships between each customer and a company. As web server technology has matured over the years, it has become possible to customize content for each user based on his or her preferences. In addition, many e-commerce sites have started to build databases that can match basic patterns in user choices and make recommendations to users on the site. However, there are many problems with these technological approaches that limit their effectiveness.

Current websites place a significant burden on users, forcing them to provide information about their interests, something many users don't want to do, either because it is too inconvenient or because of concerns over privacy. For example, CNN provides a personalized news service. Creating a profile can require checking off over one hundred check boxes across more than 14 different pages.

On the web today, sites are still unable to market to each individual user in a way that truly matches their interests and needs. Sites make broad generalizations based on huge volumes of sales data that don't accurately describe an individual person. Amazon.com tracks relationships in the buying trends of its users and makes recommendations based upon that data. While a user is viewing a CD by Tori Amos, the site recommends three other Tori Amos CDs that are often bought by people who buy the first CD.

It would be far more useful if web sites were able to understand a specific user's interests and automatically provide the content that is relevant to them. Instead of requiring you to provide your interests, a website could learn the type of content that interests you and automatically place that information in more prominent locations on the page. Taking the music example further, if a web site knew where you lived, it could tell you when the artist is touring in your area. From an advertising standpoint, this technology could be used to target advertisements better and make them more relevant to each user.

It is possible to harvest much of this information simply by knowing a user's zip code, and watching the pages they visit on a given website. Few companies are attempting to use this information, in real time, to push useful, relevant and personalized content to their viewing audience and establish lasting relationships. This can be accomplished without forcing the user to turn over significant amounts of personal data, and while keeping their identity private.

Technology is emerging among many companies that allows sites to begin to manage and use the data they have been accumulating in log files for years. Much of this technology revolves around assigning a unique identifier to each person as they move around any particular web site. This is often accomplished by assigning a cookie to track users between sessions. Sites often pass data appended to the URL in the browser, or use session data, to track users during each session on a given site. There are limits to what these technologies can currently accomplish. The security constraints on browsers prevent one website from gaining information on what the user did on any other site. Obviously, we don't want what we do online known to everyone. Yet the ability to share this information can be extremely beneficial to both the individual and the site they are visiting.
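
As a rough illustration of cookie-based identification, here is a minimal Python sketch. The cookie name visitor_id, the one-year lifetime and the function identify_visitor are assumptions made for this example, not part of any particular product. It only shows the idea: reuse the identifier the browser sends back, or issue a random one that carries no personal data.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name, chosen for this sketch


def identify_visitor(cookie_header):
    """Return (visitor_id, set_cookie_value or None) for one request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if COOKIE_NAME in cookie:
        # Returning visitor: reuse the identifier the browser sent back.
        return cookie[COOKIE_NAME].value, None

    # New visitor: issue a random identifier that carries no personal data.
    visitor_id = uuid.uuid4().hex
    reply = SimpleCookie()
    reply[COOKIE_NAME] = visitor_id
    reply[COOKIE_NAME]["path"] = "/"
    # Assumed lifetime of one year so the ID survives between sessions.
    reply[COOKIE_NAME]["max-age"] = 365 * 24 * 60 * 60
    return visitor_id, reply[COOKIE_NAME].OutputString()
```

A server would send the second return value in a Set-Cookie header on the first visit and receive the same identifier in the Cookie header on later requests.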

When a user first visits a web site, even a site able to generate personalized, relevant content would need the user to spend a considerable amount of time clicking around before it could provide useful information to that individual. However, if web sites were able to share information about their visitors with each other, there could be significant benefits to both parties involved. When a user first enters a news site, that site has no information about their interests. If that same user had just finished reading a lot of articles about California on a travel site, that information could be used to provide relevant content on the news site. The news site could place California weather information and articles relevant to someone traveling to California in a more prominent location on the front page. This in no way infringes on the user's privacy, but provides them with relevant information in a quicker, more efficient manner.
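
The news site example can be sketched in a few lines of Python. Everything here is hypothetical: the story dictionaries, the topic names and the rank_front_page function are assumptions made for illustration, and the profile is simply a mapping from topic to a weight between 0 and 1 received from another site.

```python
def rank_front_page(stories, shared_profile):
    """Order stories so those matching the visitor's interests come first.

    stories: list of dicts with "title" and "topics" keys (assumed shape).
    shared_profile: dict mapping a topic name to an interest weight.
    """
    def relevance(story):
        # A story is as relevant as its best-matching topic.
        weights = (shared_profile.get(topic, 0.0) for topic in story["topics"])
        return max(weights, default=0.0)

    # sorted() is stable, so stories with equal relevance keep editorial order.
    return sorted(stories, key=relevance, reverse=True)


stories = [
    {"title": "Budget vote delayed", "topics": ["Politics"]},
    {"title": "Playoff preview", "topics": ["Sports"]},
    {"title": "Storms ease along the California coast",
     "topics": ["Weather", "Travel/California"]},
]
profile_from_travel_site = {"Travel/California": 0.8}
front_page = rank_front_page(stories, profile_from_travel_site)
```

With an empty profile the function returns the stories in their original order, so a visitor the network knows nothing about sees the default front page.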

In order for a personalization scheme to be truly effective between different sites, the sites that plan on sharing profiles would need to come to an agreement on what interests they want to track. A simple solution is to generate a list of thousands of standardized interests and then categorize each page by how closely it is related to each topic. A generic category list might be something similar to the first two or three levels of Yahoo's site categories. As an individual user moves through the pages on a site, the topics of these pages are registered in a database associated with that user's profile. By comparing the number of pages a user has visited on any given topic to the total number of pages the user has visited, it is possible to determine how important that topic is to that user.
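
The ratio described above is easy to express in code. The following Python sketch assumes each page has already been tagged with a set of topics from the shared category list. The class name InterestProfile and the topic strings are made up for the example, and a real system might weight each topic by how strongly the page relates to it rather than counting it as simply present or absent.

```python
from collections import Counter


class InterestProfile:
    """Tracks how often one anonymous user views pages on each topic."""

    def __init__(self):
        self.topic_hits = Counter()  # pages viewed per topic
        self.total_pages = 0         # all pages viewed

    def record_page(self, topics):
        """Register one page view; topics is the page's category tags."""
        self.total_pages += 1
        self.topic_hits.update(set(topics))  # count each topic once per page

    def score(self, topic):
        """Share of the user's page views that touched this topic (0 to 1)."""
        if self.total_pages == 0:
            return 0.0
        return self.topic_hits[topic] / self.total_pages


profile = InterestProfile()
profile.record_page(["Travel/California", "Weather"])
profile.record_page(["Travel/California"])
profile.record_page(["Travel/California", "Recreation/Hiking"])
profile.record_page(["Business/Markets"])
print(profile.score("Travel/California"))  # 3 of 4 pages -> 0.75
```

A score near 1 means almost everything the user reads touches that topic, while a score near 0 suggests a passing visit.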

A lot of the potential with this technology revolves around sites sharing user profiles to more quickly identify the interests of a new user. An architecture and protocol would be needed to transfer this sort of profile information between different websites. Fortunately, many companies already involved with the Internet have realized this. In October of 1998, the Information and Content Exchange (ICE) Protocol was submitted to the W3C to accomplish this exact task as an extension to the current web protocol (HTTP) employing XML. Members of Vignette Corp, Adobe, Sun, Microsoft, Ziff-Davis, CNET and National Semiconductor authored the document, submitted with support from over 80 companies. While this document has yet to be approved by the W3C, companies are already beginning to implement parts of this protocol.

[Figure: What happens when a web browser first hits the website.]

In order for multiple sites to share information, all the sites must accurately recognize individuals who are already registered with the network, and a unique ID must be assigned to each new user no matter which site in the network they visit first. To accomplish this, it is necessary to have a primary Identification Server that is responsible for assigning new IDs to users and aggregating the information on a user's traffic among all the sites they visit in the network. When a user visits a site that is part of the profile sharing network for the first time, the web site they are visiting sees that the user doesn't have an ID assigned and redirects the browser to the Identification Server. The Identification Server assigns an ID and sets a cookie in the browser. The Identification Server then redirects the user back to the content server, carrying this unique ID. The content server takes the unique ID and creates a cookie that only its site can access. When a user hits a new site that is a member of the network, the process is nearly identical, but the Identification Server is able to recognize the user and sends the existing ID back with the client instead of creating a new one.
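
The redirect sequence can be modeled without any networking. The Python sketch below is a simulation under stated assumptions: the Browser class stands in for a real browser's per-domain cookie storage, a direct method call stands in for the two HTTP redirects, and the domain names and the network_id cookie name are invented for the example.

```python
import uuid


class Browser:
    """Keeps one cookie jar per domain, mirroring browser security rules."""

    def __init__(self):
        self._jars = {}

    def cookies_for(self, domain):
        return self._jars.setdefault(domain, {})


class IdentificationServer:
    DOMAIN = "id.example"  # hypothetical domain for this sketch

    def identify(self, browser):
        """Return the browser's network ID, assigning one on first contact."""
        jar = browser.cookies_for(self.DOMAIN)
        if "network_id" not in jar:
            jar["network_id"] = uuid.uuid4().hex  # new anonymous ID
        return jar["network_id"]


class ContentSite:
    def __init__(self, domain, id_server):
        self.domain = domain
        self.id_server = id_server

    def handle_visit(self, browser):
        """Return the visitor's network ID, asking the ID server if needed."""
        jar = browser.cookies_for(self.domain)
        if "network_id" not in jar:
            # Stands in for: redirect to the Identification Server, which
            # redirects back to this site carrying the ID.
            jar["network_id"] = self.id_server.identify(browser)
        return jar["network_id"]


id_server = IdentificationServer()
travel = ContentSite("travel.example", id_server)
news = ContentSite("news.example", id_server)
visitor = Browser()
first_id = travel.handle_visit(visitor)   # new user: ID is created
second_id = news.handle_visit(visitor)    # known user: ID is reused
```

Each content site stores its own copy of the ID, so after the first visit it never needs to contact the Identification Server again for that browser.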

Now that each user can be identified as they navigate between different sites that are members of the Identification Network, the next step is to track each user's interests as they move from page to page. In addition to the content each visitor reads, other useful information would include each visitor's zip code and birthdate. These two pieces of information create the opportunity to link each visitor to a wealth of demographic data that allows site creators to better understand their visitors while still keeping each visitor completely anonymous as an individual. In addition, knowing the location of a user allows sites to author content relevant to people living in that area of the country or world.

But how is this profile data shared with all the other content sites?

[Figure: What happens when a web browser hits the website.]

When a user hits a web page, the hit is recorded in the standard log file, but it is also stored in a user session database and sent to the Identification Server to be propagated to all the other web servers in the network. To increase the performance of the system, each Content Site has a user session database that keeps track of what each user is doing on the site for real-time dynamic page generation. Long-term information on the user is kept in the Content Site's data mart. The page generation engine accesses both of these data sources and builds each page based on this information. Because the content server is responsible for its own real-time processing, the Identification Server does not need to process every hit from every content site in real time. It therefore makes sense to have a messaging queue capture the records of hits from each content site and update the Identification Server database. As users are likely to browse only one content website at a time, the Identification Server can propagate its aggregate information on each user to all the data marts as necessary.
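
A minimal Python sketch of this data flow follows. It is an in-memory model only: the standard library queue stands in for a real messaging system, plain dictionaries stand in for the session database, the Identification Server database and the data marts, and all function and site names are assumptions made for illustration.

```python
import queue
from collections import Counter, defaultdict


def record_hit(hit_queue, session_db, site, user_id, topic):
    """Content site: update local session data now, notify the network later."""
    session_db[user_id].append(topic)        # available for real-time pages
    hit_queue.put((site, user_id, topic))    # deferred, not processed here


def drain_hits(hit_queue, aggregate):
    """Identification Server: fold queued hits into per-user topic counts."""
    processed = 0
    while True:
        try:
            _site, user_id, topic = hit_queue.get_nowait()
        except queue.Empty:
            break
        aggregate[user_id][topic] += 1
        processed += 1
    return processed


def propagate(aggregate, data_marts):
    """Copy each user's aggregate profile into every site's data mart."""
    for mart in data_marts.values():
        for user_id, topics in aggregate.items():
            mart[user_id] = dict(topics)


hits = queue.Queue()
travel_sessions = defaultdict(list)
record_hit(hits, travel_sessions, "travel.example", "u1", "Travel/California")
record_hit(hits, travel_sessions, "travel.example", "u1", "Travel/California")
record_hit(hits, travel_sessions, "travel.example", "u1", "Weather")

aggregate = defaultdict(Counter)
handled = drain_hits(hits, aggregate)
marts = {"travel.example": {}, "news.example": {}}
propagate(aggregate, marts)
```

The point of the queue is that record_hit returns immediately, so page generation on the content site never waits on the Identification Server. The drain and propagate steps can run on whatever schedule the network chooses.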

Needless to say, this type of technology scares a lot of people. Privacy advocates argue that such systems could be abused and made to harm the individuals they track. However, there are a number of ways such concerns can be addressed. First, the suggested design uses cookies to track and identify users. As a result, users can choose to become aware of which sites are tracking their behavior, and can choose not to let a site track them. Even users who choose to be profiled by this system are safe and anonymous. In its most basic form, sites do not track any information that could be used to link a web browser to a real-world individual. As a result, there is no way information recorded by the system could be sold or provided for anything other than its purpose. Another possibility to further ease the concerns of privacy advocates would be to allow web users to see their own profiles, somewhat like getting a credit report. Users could also be allowed to modify their own profiles, if they want, so that the profiles reflect their interests more accurately. The bottom line is that for most people, time is money, and the advantages of getting the information they want quickly outweigh the minimal loss in privacy when a site begins to accurately track and use information on the pages they visit. Recent studies have shown that most web users are willing to give up profiling and contact information to sites in exchange for free content.

Another question raised is why any site would want to join a network that shares profile data. There are actually many good reasons. Particularly for smaller sites, joining such a network gives them the ability to compete as equals against far larger competitors through the sharing of profile data. In addition, smaller, specialized websites could join networks where they sell profile data to larger sites. This technology is also becoming necessary for large companies that own multiple web properties and want to leverage their size by combining the underlying infrastructure of their sites.

When it comes to information privacy, privacy advocates often jump to conclusions about how any new technology will transform America into the Big Brother state of 1984. The fact of the matter is that no harm is caused to individuals browsing websites using this technology. Profiling technology can give companies an opportunity to create more revenue streams and better relationships with customers while adding value to the customer's experience.