Importing Mechanical Turk Results into Magento

 

One of the most surprising challenges in retail ecommerce is how difficult it is to generate good, customer-facing product descriptions. But savvy web developers can use Amazon's Mechanical Turk service and Magento's API to rapidly import product information.

Product information is a challenge for a number of reasons.

First, many — if not most — manufacturers don't supply product descriptions suitable for publication on a website. As an example, a denim vest from a work wear maker might have an official description of V01BRN DUCK VST XLT. While this may describe the product, it really isn't something that you would want to show to customers.

Next, even if a manufacturer does supply a good ecommerce product description, it supplies that same description to every business that sells its products. Identical text on many competing sites can hurt how search engines index and rank any one of them.

While it may not be the best solution for complex products, Amazon's Mechanical Turk may be able to help with the writing process. The service assigns quick tasks to a great number of workers. Recently, I was able to generate a few hundred usable product descriptions in just a couple of hours on the Turk.

Mechanical Turk Results CSV

In the Turk assignments, called HITs (Human Intelligence Tasks), I asked workers to create a product title, a product description, and a set of bullets outlining the product's features or specifications.

With the HITs complete, Mechanical Turk provided me with a comma-separated values (CSV) file containing all of the information I had provided about each product (SKU, UPC, manufacturer, and the like) plus the three requested values: a product title, a product description, and bullet points. The three requested values arrived combined in a single field, separated by the pipe character (|).

As mentioned above, I had a few hundred of these, and I intended to add a few hundred new products each week, so entering them manually was out of the question. Magento's import feature could have ingested the CSV, but it would have required a lot of field mapping, so I decided instead to write a script to manage the import via Magento's API.

The first thing the script needs to do is open the CSV file and arrange the values from each row in an array. The script accepts the file path to the CSV as a GET value in the URL, for example http://somedomain.com/script.php?file=pathtofile.

if( isset($_GET['file']) ) {

	/* set variables */
	$file_path = $_GET['file'];

	/* open the CSV file for reading */
	$handle = fopen($file_path, "r");
	$row_num = 1;
	$row_array = array();
	$row_keys = array();

	/* create an array from each row */
	if( $handle !== FALSE ){
		while (($row = fgetcsv($handle, 2000, ",")) !== FALSE) {
			if( $row_num == 1){
				/* the first row holds the column headers */
				$row_keys[] = $row;
			}else{
				/* key each value by its column header */
				$temp_array = array_combine( $row_keys[0], $row);

				/* parse the Answer.comment column, splitting on the | character */
				$boom = explode("|", $temp_array['Answer.comment']);
				$temp_array['Answer.comment'] = $boom;

				$row_array[] = $temp_array;
			}
			$row_num++;
		}
		/* release the file once every row has been read */
		fclose($handle);
	}
}

Notice that the script first checks for a file path. If that path is present, it is opened for reading.

$handle = fopen($file_path, "r");
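Because the path comes straight from the URL, anyone who can reach the script can ask it to open any file the web server can read. Here is a minimal sketch of one way to narrow that, assuming the results files all live in a single directory. The helper name resolveCsvPath and the directory are hypothetical and not part of the original script.

```php
<?php
/**
 * Resolve a user-supplied file name to a CSV file inside $baseDir.
 * Returns the full path, or null if the name is unsafe or the file is missing.
 * Hypothetical helper; not part of the original script.
 */
function resolveCsvPath(string $requested, string $baseDir): ?string
{
    // basename() drops any directory components such as "../"
    $name = basename($requested);

    // only accept files with a .csv extension
    if (strtolower(pathinfo($name, PATHINFO_EXTENSION)) !== 'csv') {
        return null;
    }

    $path = rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR . $name;

    // the file must exist and be readable
    return is_readable($path) ? $path : null;
}
```

With something like this in place, the script would open the returned path instead of the raw GET value, and stop when the helper returns null.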

The script creates a couple of arrays and a variable that will serve as a counter.

$row_num = 1;
$row_array = array();
$row_keys = array();

A while loop is used to add the information from the CSV file to an array. The first row of the CSV file contains column headers like UPC, SKU, and Answer.comment, which is the column containing the Turk-generated product title, description, and bullet points. The values from this row are used as keys for the associative array built from every later row.

while (($row = fgetcsv($handle, 2000, ",")) !== FALSE) {
	if( $row_num == 1){
		$row_keys[] = $row;
	}else{
		$temp_array = array_combine( $row_keys[0], $row);
				
		/* parse the Answer.comment column, splitting on the | character */
		$boom = explode("|", $temp_array['Answer.comment']);
		$temp_array['Answer.comment'] = $boom;
		
				
		$row_array[] = $temp_array;
	}
	$row_num++;
}
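The same loop can be packaged as a function, which makes it easier to reuse for each week's file. This is only a sketch with two changes that are my assumptions rather than the original behavior: a length of 0 lifts the 2000-character line limit, since long descriptions could exceed it, and rows whose column count does not match the header are skipped instead of being passed to array_combine. The name readCsvRows is hypothetical.

```php
<?php
/**
 * Read a CSV file into a list of associative arrays keyed by the header row.
 * Hypothetical helper; a sketch, not the original script.
 */
function readCsvRows(string $path): array
{
    if (!is_readable($path)) {
        return array();
    }

    $handle = fopen($path, 'r');
    if ($handle === false) {
        return array();
    }

    $rows = array();
    $keys = null;

    // a length of 0 means no limit on the length of a line
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($keys === null) {
            // the first row holds the column headers
            $keys = $row;
            continue;
        }
        // skip blank or malformed rows that do not match the header
        if (count($row) !== count($keys)) {
            continue;
        }
        $rows[] = array_combine($keys, $row);
    }

    fclose($handle);
    return $rows;
}
```

The Answer.comment value is left as a plain string here, so splitting it on the pipe character remains a separate step.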

As mentioned above, the product title, description, and bullet points are stored in a single value with the key Answer.comment. The values are separated by a pipe character, and I used PHP's explode function to split these into an array.

$boom = explode("|", $temp_array['Answer.comment']);
$temp_array['Answer.comment'] = $boom;
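Workers do not always follow instructions, so an answer may arrive with a part missing or with extra pipes. The sketch below shows one way to guard against that, assuming the order is always title, description, bullets, and assuming that any pipes beyond the second belong to the bullets. The function name splitAnswer and the named keys are hypothetical; the script in this article uses numeric indexes instead.

```php
<?php
/**
 * Split a "title|description|bullets" answer into named parts.
 * Returns null when any of the three parts is missing or empty.
 * Hypothetical helper; not part of the original script.
 */
function splitAnswer(string $answer): ?array
{
    // a limit of 3 keeps any further pipes inside the third part
    $parts = array_map('trim', explode('|', $answer, 3));

    if (count($parts) !== 3 || in_array('', $parts, true)) {
        return null;
    }

    return array(
        'title'       => $parts[0],
        'description' => $parts[1],
        'bullets'     => $parts[2],
    );
}
```

Rows for which this returns null could be set aside for review rather than imported.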

I now have all of the Turk data stored in an array named $row_array, including the Answer.comment array for each row of data.

Placing the Products in Magento

To connect to the Magento API, you need to create web services users and roles in the Magento administration panel under System > Web Services. Once these have been created and the proper permissions set, it is possible to connect to the API via a PHP script.

/* connect to Magento */
$proxy = new SoapClient('http://somedomain.com/index.php/api/soap/?wsdl');
$sessionId = $proxy->login('some_username','some_password');	

If you are having issues with connecting to the API, you may wish to check out the Magento API documentation.
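When the connection does fail, PHP's SoapClient reports the problem as a SoapFault, and catching it gives a readable message instead of a halted script. The following is a sketch only: the function name openMagentoSession is hypothetical, the URL and credentials are placeholders, and it assumes the PHP SOAP extension is installed.

```php
<?php
/**
 * Log in to the Magento SOAP API.
 * Returns the client and session id, or null if connecting or logging in fails.
 * Hypothetical helper; the URL and credentials passed in are placeholders.
 */
function openMagentoSession(string $wsdlUrl, string $username, string $apiKey): ?array
{
    try {
        // 'exceptions' => true makes SOAP errors throw SoapFault
        $proxy = new SoapClient($wsdlUrl, array('exceptions' => true));
        $sessionId = $proxy->login($username, $apiKey);
    } catch (SoapFault $fault) {
        // for example, an unreachable WSDL or a rejected login
        error_log('Magento API connection failed: ' . $fault->getMessage());
        return null;
    }

    return array('proxy' => $proxy, 'sessionId' => $sessionId);
}
```

A caller would check for null before going on, then use the returned 'proxy' and 'sessionId' values exactly as $proxy and $sessionId are used in the rest of this article.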

Before the script starts loading products and product descriptions into Magento, I wanted it to check that the product (SKU) did not already exist in Magento. A duplicate was a real possibility, since the site in question has many thousands of products that are loaded from a number of sources.

/* test to learn if the product is already in Magento */
foreach( $row_array as $product_array){
	$skuNo = $product_array["Input.sku"];
	$result = $proxy->call($sessionId, 'product_stock.list', array('sku'=>$skuNo));
		
	if( !isset($result[0]["product_id"]) ){
		/* the product is not in Magento, create it */

The call method requires the session id that was created when the script authenticated with the Magento API, the name of the API method to call, and an array of arguments for that method. In this case, the script asks for the stock record that matches the SKU. If the SKU is in Magento, the result includes its product_id. If it is not in Magento, nothing is returned.
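The check can be isolated in a small function so the main loop reads more clearly. This sketch makes the same product_stock.list call as the script and only wraps the test on product_id. The name skuExists is hypothetical, and $proxy is assumed to be any object with a compatible call method, such as the SoapClient created earlier.

```php
<?php
/**
 * Report whether a SKU is already present in Magento.
 * Hypothetical helper wrapping the same call the script makes.
 */
function skuExists($proxy, string $sessionId, string $sku): bool
{
    $result = $proxy->call($sessionId, 'product_stock.list', array('sku' => $sku));

    // a matching product comes back as the first item, carrying its product_id
    return is_array($result) && isset($result[0]['product_id']);
}
```

The loop body then becomes a simple test such as if( !skuExists($proxy, $sessionId, $skuNo) ), followed by the code that creates the product.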

If the product does not yet exist in Magento, the script builds an array, adding some required attribute fields and some custom attribute fields to the data retrieved from the Turk's CSV file.

$newProductData = array(
	'name' => $product_array['Answer.comment'][0],
	'websites' => array(1),
	'short_description' => $product_array['Answer.comment'][2],
	'description' => $product_array['Answer.comment'][1],
	'status' => 1,
	'weight' => 1,
	'tax_class_id' => 1,
	'categories' => array(274),
	'price' => 0.00,
	'brand' => $product_array['Input.manufacturer'],
	'manufacturer' => 'KINCO',
	'upc' => $product_array['Input.upc'],
	'part_or_style' => $product_array['Input.partno'],
	'vendor_description' => $product_array['Input.name'],
	'meta_description' => $product_array['Answer.comment'][1]
				
);

The Magento API's product.create method, used via call(), creates the new product in Magento. Here call() takes three parameters: the session id, the product.create method name, and an array of arguments for product.create. Those arguments are the product type (simple in the example), a product attribute set id number, the product's SKU, and the array of product attributes.

$proxy->call($sessionId, 'product.create', array('simple', 4, $skuNo, $newProductData ));

Once this code runs, the new products with their Mechanical Turk-generated title, description, and bullet points are added to Magento.
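With a few hundred products per batch, a single rejected product would otherwise stop the import partway through. The sketch below shows one way to keep going and report what happened. It assumes the product arrays have been collected into one array keyed by SKU, and it assumes product.create returns the new product's id, which is my understanding of the API rather than something the script above relies on. The name createProducts is hypothetical.

```php
<?php
/**
 * Create each product and collect successes and failures.
 * $products maps SKU => attribute array (the $newProductData shape).
 * Hypothetical helper; a sketch, not the original script.
 */
function createProducts($proxy, string $sessionId, array $products, int $attributeSetId = 4): array
{
    $report = array('created' => array(), 'failed' => array());

    foreach ($products as $sku => $data) {
        // PHP turns all-digit array keys into integers, so cast back to a string
        $skuNo = (string) $sku;

        try {
            $report['created'][$sku] = $proxy->call(
                $sessionId,
                'product.create',
                array('simple', $attributeSetId, $skuNo, $data)
            );
        } catch (Exception $e) {
            // SoapFault extends Exception, so API faults are recorded here
            $report['failed'][$sku] = $e->getMessage();
        }
    }

    return $report;
}
```

After a run, the 'failed' entries list the SKUs that need attention, each with the message returned for it.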
