Forum
Perch Runway Add On
Hi all.
I have a Perch add-on that consumes XML data via a POST to a URL on my site and then inserts the data into a collection.
The insertion process is generating a very large number of records in the perch2_collection_index table. The add-on was developed by another developer, and as a front-end developer who only dips in and out of traditional back-end work, please bear with me. Caveat to account for anything glaringly obvious, and apologies for the long-winded question :-)
There are 1,151 rows in the collection_items table for the specific collection we are using (the collection is called Jobs) and 789,533 rows in the collection_index table for that collection. That's roughly 685 collection_index rows per collection item (assuming an even distribution), which I'm sure is not correct.
Each inserted job should have a unique jobID, but when I query the collection_index table using a unique job ID as the indexValue (the related indexKey is "jobID") I get 25 rows back. Of those 25, five have sequential itemRev values of 1 to 5; the other 20 have an itemRev of 0.
All 25 rows have an itemID of 121, which is the collection item ID, and each has a unique indexID.
I'm not sure how the collection_index table is used, but it seems to have a row for each field in the itemJSON column of collection_items, so I'm assuming the 20 rows with an itemRev of 0 are duplicates.
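For reference, a query along these lines is how I've been looking at the distribution (table and column names are as they appear in my database; I'm assuming the default perch2_ prefix):

```sql
-- Break down index rows by revision for one collection item
-- (itemID 121 is the item discussed above; perch2_ prefix assumed)
SELECT itemRev, COUNT(*) AS rows_for_rev
FROM perch2_collection_index
WHERE itemID = 121
GROUP BY itemRev
ORDER BY itemRev;
```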
I appreciate that third-party add-ons are not supported, and that this add-on is not using an official API to insert into the collection (as there is no such thing), but if anyone has any thoughts they would be greatly appreciated. Any ideas on the best approach to cleaning up the database would also be welcome.
The code for the add-on is as follows:
<?php
// include_once('../../../core/inc/api.php');
// include_once('../../../core/apps/content/runtime.php');
include(PERCH_PATH.'/addons/apps/getunified_job_importer/class/CustomJobs.php');

class ImportApp {

    public function __construct() {
        // instantiate the API
        $this->API  = new PerchAPI(1.0, 'getunified_job_importer');
        $this->Lang = $this->API->get('Lang');

        $this->localLog       = './logs/job_import.log';
        $this->collectionName = 'Jobs';
        $this->collectionID   = 7; // should do a db call to get this based on the name...
        $this->collection     = new CustomJobs($this->API, $this->collectionID, $this->collectionName);
    }

    public function local_log($message) {
        $message = date('Y-m-d H:i:s')." ".$message."\n";
        file_put_contents($this->localLog, $message, FILE_APPEND);
    }

    public function start($fileData=null) {
        // pass a filename, or xml data
        if (!isset($fileData) || $fileData == null) {
            echo 'Cannot start importer as there is no filename or data passed<br>';
            die();
        }

        // if a filename was passed, load the data from it
        if (is_file($fileData)) {
            $fileData = file_get_contents($fileData);
        }

        // the xml is one big string at this point
        $dom = new DOMDocument();
        $dom->preserveWhiteSpace = false;
        $dom->loadXML($fileData);

        $jobs = $dom->getElementsByTagName('job');

        foreach ($jobs as $job) {
            $this->_processJob($job);
        }
    }

    private function _processJob($newJob) {
        $newJobData = array();

        foreach ($newJob->childNodes as $node) {
            $newJobData[$node->nodeName] = $node->nodeValue;
        }

        switch ($newJobData['command']) {
            case 'Add':
                $this->_addRecord($newJobData);
                break;
            case 'Delete':
                $this->_deleteRecord($newJobData);
                break;
        }
    }

    private function _addRecord($data) {
        // find or create; if found, delete and re-create (an update)
        // echo 'Add new record: '.$data['jobid'].'<br>';
        $job = $this->collection->getJob($data['jobid']);

        if ($job) {
            // echo 'update';
            $this->collection->delete($job);
            $this->_createRecord($data);
        } else {
            if ($this->_createRecord($data)) {
                // echo 'Job add success';
            } else {
                // echo 'Job add failed';
            }
        }
    }

    private function _deleteRecord($data) {
        $job = $this->collection->getJob($data['jobid']);

        if ($job) {
            $this->collection->delete($job);
        }
    }

    private function _createRecord($data) {
        $actualJobType = perch_collection('Job Types', array(
            'skip-template' => true,
            'filter'        => 'jobTypeID',
            'match'         => 'eq',
            'value'         => $data['JobTypeID'],
            'count'         => 1
        ));

        $actualSector = perch_collection('Sectors', array(
            'skip-template' => true,
            'filter' => array(
                array(
                    'filter' => 'SectorID',
                    'match'  => 'eq',
                    'value'  => $data['SectorID'],
                ),
                array(
                    'filter' => 'SubSectorID',
                    'match'  => 'eq',
                    'value'  => $data['SubSectorID'],
                ),
            ),
            'match' => 'or'
        ));

        $realSectorID    = 0;
        $realSubSectorID = 0;

        foreach ($actualSector as $sector) {
            if ($data['SectorID'] === $sector['SectorID'] && $sector['SubSectorID'] == 0) {
                $realSectorID = (int)$sector['_id'];
            }
            if ($data['SectorID'] === $sector['SectorID'] && $sector['SubSectorID'] === $data['SubSectorID']) {
                $realSubSectorID = (int)$sector['_id'];
            }
        }

        $actualRegion = perch_collection('Regions', array(
            'skip-template' => true,
            'filter'        => 'RegionID',
            'match'         => 'eq',
            'value'         => $data['RegionID'],
            'count'         => 1
        ));

        $createdOK = $this->collection->create(array(
            'jobID'               => $data['jobid'],
            'reference'           => $data['Reference'],
            'title'               => $data['Title'],
            'jobSlug'             => $this->collection->getSlug($data['Title']),
            'jobDesc'             => $data['Summary'],
            'regionID'            => array($actualRegion[0]['_id']),
            'sectorID'            => array($realSectorID),
            'subSectorID'         => array($realSubSectorID),
            'jobTypeID'           => array($actualJobType[0]['_id']),
            'salary'              => $data['Salary'],
            'consultantID'        => $data['ConsultantID'],
            'consultantTelephone' => $data['ConsultantTelephone'],
            'contactEmail'        => $data['ContactEmail'],
            'expiryDate'          => $data['ExpiryDate-year'].'-'.$data['ExpiryDate-month'].'-'.$data['ExpiryDate-day'],
            'status'              => $data['Status']
        ));

        return $createdOK;
    }
}
and the CustomJobs class is as follows:
class CustomJobs {

    private $hasError = false;
    // collection name
    protected $collectionName = "";
    // collection id
    public $collectionID = 0;
    // string of searchable words for the job
    public $itemSearch = '';
    // jobs array
    private $_jobs = array();
    // single job
    private $_job = array();

    function __construct($API, $collectionID, $collectionName) {
        $this->collectionID    = $collectionID;
        $this->collectionName  = $collectionName;
        $this->collectionItems = new PerchContent_CollectionItems($API);
    }

    public function isError() {
        return $this->hasError;
    }

    public function jobExists($jobID) {
        return ($this->getJob($jobID)) ? true : false;
    }

    public function getJob($jobID) {
        $this->_job = perch_collection($this->collectionName, array(
            'raw'           => true,
            'skip-template' => true,
            'filter'        => 'jobID',
            'match'         => 'eq',
            'value'         => $jobID,
            'count'         => 1
        ));
        return (count($this->_job) > 0) ? $this->_job[0] : false;
    }

    public function delete($job) {
        $Item = $this->collectionItems->find_item($this->collectionID, $job['_id']);
        $Item->delete();
        return true;
    }

    public function getAll() {
        $this->_jobs = perch_collection($this->collectionName, array(
            'raw'           => true,
            'skip-template' => true,
        ));
        $this->count = count($this->_jobs);
    }

    public function deleteAll() {
        // loop through all jobs
        foreach ($this->_jobs as $job) {
            // check for an ID (this can be missing if the job was not inserted correctly)
            if (isset($job['_id'])) { // the _id key comes from the JSON, not a field
                // set details for the item
                $details = array(
                    'collectionID' => $this->collectionID,
                    'itemRowID'    => false,
                    'itemID'       => $job['_id']
                );
                // instantiate the collection item, passing the details
                $jobItem = new PerchContent_CollectionItem($details);
                // delete it
                $jobItem->delete();
            }
        }
        // fetch everything again
        $this->getAll();
        // what's the count?
        if ($this->count > 0) {
            // echo "Could not delete everything. ".$this->count." items remaining.";
        }
    }

    public function slugExists($slug) {
        $this->_job = perch_collection($this->collectionName, array(
            'raw'           => true,
            'skip-template' => true,
            'filter'        => 'slug',
            'match'         => 'eq',
            'value'         => $slug,
            'count'         => 1,
        ));
        return (count($this->_job) == 0) ? false : true;
    }

    public function getSlug($slug) {
        $slug        = PerchUtil::urlify($slug);
        $testingSlug = $slug;
        $i = 0;
        // append -0, -1, -2... until the slug is unique
        while ($this->slugExists($testingSlug) == true) {
            $testingSlug = $slug."-".$i;
            $i++;
        }
        return $testingSlug;
    }

    public function setSearch($arr) {
        // build the search string
        $this->itemSearch = $arr['title']." ".$arr['reference']." ".$arr['jobDesc'];
    }

    public function create(array $arr) {
        // get the next collection item id
        $itemID = (string) $this->collectionItems->get_next_id();
        // set the collection item id
        $arr['_id'] = $itemID;
        // set the search string
        $this->setSearch($arr);
        // set details with the collection id
        $details = array('collectionID' => $this->collectionID);
        // instantiate the collection
        $collection = new PerchContent_Collection($details);
        // add a new collection item
        $newItem = $collection->add_new_item();
        if (!$newItem) {
            $this->hasError = true;
        }
        // store the item in an accessible var
        $item = $collection->get_items_for_editing($itemID);
        // add the item row to the details var
        $details['itemRowID'] = $item[0]['itemRowID'];
        // instantiate the collection item, passing the details
        $collectionItem = new PerchContent_CollectionItem($details);
        // update this collection item
        $collectionItem->update(array(
            'itemJSON'      => PerchUtil::json_safe_encode($arr),
            'itemSearch'    => strip_tags($this->itemSearch),
            'itemUpdatedBy' => 1,
        ));
        // publish the item
        $collectionItem->publish();
        // index the item
        $collectionItem->index();
        // return true if hasError is false (no errors)
        return !$this->hasError;
    }
}
Many thanks
Lee
What's the state of your collection_revisions table? The indexed items are per revision, not per item. The default undo buffer is 8 revisions, so if you've not changed that, you're looking at 685 / 8 ≈ 86 rows per revision. That's not implausible for larger templates.
The purpose of the collection_index table is literally to create an index that can be queried with SQL. Your data has a completely flexible schema determined by the template. That doesn't map onto a traditional relational database like MySQL, so we store the data as a document (as a document-store database would) and also spin out an index so the data can be filtered with traditional SQL. The increased complexity is the trade-off for the flexibility of being able to store any data without defining it first.
Thanks for the explanation Drew, I've done some more digging.
I've noticed that if I add a job via the CMS, it creates a timestamped row in the collection_items table with an itemRev of 1 and an itemID, but with the itemJSON and search fields empty. At the same time, it creates a second row with an itemRev of 2 and the populated itemJSON; that itemJSON contains an "_id" value matching the corresponding itemID.
However, a job inserted via the add-on generates only a single row; this row contains the itemJSON but has no "_id" value in it.
If I then go into the CMS and save that inserted job, it generates another collection_items row with an itemRev of 2 (as expected), and the "_id" is present in the itemJSON.
The code for the add-on can be found above. I've walked through it, and create() in CustomJobs seems to be the method that creates the collection item. Is there anything glaringly obvious as to why this would not set the _id value in the itemJSON, and does the lack of this value have the potential to cause any issues?
Further to this, the third-party API requirements (for the service that generates the XML this add-on consumes) state that "The interface must be multi-threaded / thread safe as we may make concurrent transactions". Forgive my ignorance; I'm at the limit of my skills here, but does this represent any issue for Runway when inserting collection items via an add-on in the manner described above?
Many thanks
I don't think the _id value is necessarily an issue. What issue are you trying to debug here?
The reason the control panel creates an initial empty revision is that it's needed to load the empty edit view. It's a workflow issue, nothing more.
Generally web apps are fine with concurrency - there should be no issue here.
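If you want to see how many revisions each item is actually carrying, a query along these lines can show it (table and column names assumed from the thread above, with the default perch2_ prefix):

```sql
-- Revision rows stored per collection item; items with far more rows
-- than the undo buffer size (default 8) would stand out here.
SELECT itemID, COUNT(*) AS revision_rows, MAX(itemRev) AS latest_rev
FROM perch2_collection_items
GROUP BY itemID
ORDER BY revision_rows DESC
LIMIT 20;
```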
The problem seems to be that I have a bunch of duplicate rows in the collection_index table, and I'm not sure how they got there.
For example if I run the following query:
I get 1,406 rows. Generating a job via the CMS (or via the add-on during my testing) generates 39 index rows per revision (one per template field, I think), and the revision numbers do not appear to be zero-indexed, so I'm a little lost as to where these records came from. There are currently 1,549,589 rows in the collection_index table as a whole, which seems excessive to me.
Is there any way to turn off the revision buffer, or to reduce the default?
Do you have any suggestions for the best approach to cleaning duplicate rows out of the collection_index table?
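In case it helps frame an answer, the sort of thing I had in mind is deleting index rows whose (itemID, itemRev) pair no longer exists in collection_items. A sketch of what I mean is below (default perch2_ prefix assumed, and I'd take a full backup before running anything like this). Is that a safe approach?

```sql
-- Sketch only: remove index rows with no matching item revision.
-- Backup the database first; table names assume the default perch2_ prefix.
DELETE idx
FROM perch2_collection_index AS idx
LEFT JOIN perch2_collection_items AS items
       ON items.itemID  = idx.itemID
      AND items.itemRev = idx.itemRev
WHERE items.itemID IS NULL;
```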
Many thanks
Hi Drew,
Further to the issue of the large volume of records in the DB (which I have now cleaned up manually): the add-on seems to be inserting collection items correctly, but it is not setting the item search value when inserting them. Is this likely to cause an issue beyond the item not being searchable within the CMS?
The method in the CustomJobs class for this is below:
This is called via another class, ImportApp, like so:
No, that's not likely to create an issue.