Thread tagged as: Question, Configuration

Duplicate database content

On the last two Perch websites I've built, I noticed a lot of duplicate database content in perch2_content_index and perch2_content_items. Both run Perch 2.7.4.

I understand that revisions are saved to the perch2_content_items table and you can limit those with the PERCH_UNDO_BUFFER config constant.

What I don't understand is the massive amount of data in the perch2_content_index table. For example, on a single-page Perch mini site I've built, this table has almost 6000 rows, most of them duplicate content.

I've found a topic* describing the same 'issue', with some info on an undocumented 'no-index="true"' attribute that can limit the amount of duplicate content in the perch2_content_index table. But I still can't understand what it is indexing and why.

I'm not using search on the last two sites, so I've deselected the 'Include in search results' setting on every region. Still, there's a lot of duplicate content in perch2_content_index.

So basically my questions are: 1. Why is there so much duplicate content in perch2_content_index? 2. How can it be prevented, and where does the no-index="true" attribute work (i.e. is it specifically for repeaters)?

Benjamin Verkleij 0 points

  • 6 years ago
Drew McLellan 2638 points
Perch Support

You've not said what problem you're encountering with your site - how does the problem manifest itself?

Hi Drew,

I'm sorry, and maybe it's me, but I find your answer a bit strange. I explain a situation, tag it with Question and Configuration, ask two direct questions, and you reply by asking me what the problem is?

There's no immediate problem. I just wish to find out why a one-page site with 5 regions needs 5892 rows in the perch2_content_index table, because none of the regions should be indexed for search and the content itself lives in perch2_content_items. My second question was whether I can prevent that table from growing so fast with the no-index attribute mentioned in the linked topic.

Drew McLellan 2638 points
Perch Support

Ok, sorry, I was worried I was missing a conclusion like "... and my site has stopped working!". If not, that's great.

The index table has nothing to do with search, so turning off search indexing won't have any impact. It's for filtering - e.g. show all products priced between X and Y. Perch is schemaless when it comes to your content - it's much more like a document store database. To make that content fit into a relational database system like MySQL, we have to do some tricks like creating our own indexes of where everything is.
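To make the idea of a separate index for filtering concrete, here is a minimal sketch in Python with SQLite. It is not Perch's actual schema or code: the table names (items, content_index), the column names and the sample products are all made up for illustration. Each item is stored as one JSON blob, and every field is also copied into a key/value index table so that a query like "priced between X and Y" can be answered without parsing every blob.

```python
import json
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical items and index tables."""
    conn = sqlite3.connect(":memory:")
    # Schemaless storage: one JSON document per item (hypothetical layout).
    conn.execute("CREATE TABLE items (item_id INTEGER PRIMARY KEY, item_json TEXT)")
    # Index table: one row per field of every item (hypothetical layout).
    conn.execute(
        "CREATE TABLE content_index (item_id INTEGER, index_key TEXT, index_value TEXT)"
    )
    products = [
        {"title": "Mug", "price": 8},
        {"title": "Teapot", "price": 25},
        {"title": "Kettle", "price": 60},
    ]
    for product in products:
        cur = conn.execute(
            "INSERT INTO items (item_json) VALUES (?)", (json.dumps(product),)
        )
        item_id = cur.lastrowid
        for key, value in product.items():
            conn.execute(
                "INSERT INTO content_index VALUES (?, ?, ?)",
                (item_id, key, str(value)),
            )
    return conn


def titles_priced_between(conn, low, high):
    """Filter items through the index table instead of scanning the JSON blobs."""
    rows = conn.execute(
        """
        SELECT i.item_json
        FROM items AS i
        JOIN content_index AS idx ON idx.item_id = i.item_id
        WHERE idx.index_key = 'price'
          AND CAST(idx.index_value AS REAL) BETWEEN ? AND ?
        ORDER BY i.item_id
        """,
        (low, high),
    ).fetchall()
    return [json.loads(row[0])["title"] for row in rows]


conn = build_demo_db()
print(titles_priced_between(conn, 10, 50))
```

Note that three items with two fields each already produce six index rows, which is why an index of this kind has many more rows than there are pieces of content.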

A table of 6,000 items is nothing to worry about at all. It's tiny in terms of MySQL's capability. Once you're up to 6 million rows you might want to give it some thought, but even then it isn't necessarily a problem. The content index won't grow to that size though. Most of its growth is front-loaded: it'll reach a certain size quite quickly, but not grow rapidly after that.

I wouldn't worry about trying to limit the size of the index unless it becomes a problem for some reason.

Thanks for elaborating on the issue. I know 6000 rows is not a lot in terms of MySQL's capability, but in relation to the very small site it felt like a lot of data for so little content. But now I see that the indexing also remembers all revisions, which is why it's building rows so fast on a repeater region with about 40 images. Like you said, it won't keep growing as fast once the revision limit is reached.

Explains a lot, really appreciated!
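For anyone wanting a rough feel for the front-loaded growth discussed in this thread, here is a back-of-the-envelope sketch in Python. It is only a model, not how Perch actually counts rows: it assumes every kept revision holds one index row per field of every repeater item, and that index rows for revisions beyond the undo buffer are removed. The figure of 40 items comes from the post above; 3 fields per item and an undo buffer of 10 are made-up values.

```python
def index_rows_after_saves(saves, items, fields_per_item, undo_buffer):
    """Estimate index rows for one region under the assumptions stated above."""
    # Only revisions still inside the undo buffer are assumed to keep index rows.
    kept_revisions = min(saves, undo_buffer)
    return kept_revisions * items * fields_per_item


for saves in (1, 5, 10, 20, 40):
    rows = index_rows_after_saves(saves, items=40, fields_per_item=3, undo_buffer=10)
    print(saves, rows)
```

Under these assumptions the count climbs by 120 rows per save until the tenth save, then stays at 1200 however many more times the region is saved.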