You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org
Fundraising/Data and Integrated Processes/Acoustic Integration
We export data to acoustic once a day from our CiviCRM database. We also import data back in on a more frequent schedule.
(You may hear Acoustic referred to one of it's many previous brandings - such as Silverpop, WCA, Watson Campaign Automation, IBM ). Our supplier in our relationship is Trilogy and our contact there is Brian Sisolak. Brian is a Phab user and can be added to tasks).
List of Scheduled jobs & api
|Job title||Job type||Job name||API||Options used||Notes|
|Get recipient data from Silverpop||Import||omnimail_recipient_load||Omnirecipient.load||mailing_prefix: sp (prepends 'sp' to mailing_idenfier)
limit :50000 (batch limit)
|Throttle is a check to abort if db load is high - based on the number of contributions imported in the specified seconds.|
|Get mailings from Silverpop||Import||omnimail_mailing_load||Omnimailing.load||mail_provider:Silverpop
Implicitly, start_date:1 week ago
|The statistics against the mailings change a little over time as people take a while before they open them (for example). The default job settings load the last week's mailings (over & over again until they age out). We run less frequent jobs to get the last 4 & 8 weeks to pull in any late breaking opens|
|Get group member data from Silverpop||Import||omnimail_groupmember_load||Omnigroupmember.load||group_id:310 - civicrm group contacts are added to
group_identifier:18468760 - Acoustic group we fetch from
limit :10000 (batch limit)
|Fetchs group members from Acoustic and puts them into CiviCRM. The Acoustic group is a search for contact without a contact ID and they are loaded to CiviCRM. We then update the contact ID back to acoustic in our exports and the contact falls out of the group.|
|Set on hold for emails we know not to be emailable due to bounce types from Silverpop||Import actions||omnimail_recipient_process_onhold|
|Unsubscribe contacts who have oped out in Silverpop from CiviCRM||Import actions||omnimail_recipient_process_unsubscribes||Omnirecipient.process_unsubscribes||mail_provider:Silverpop
|Omnimail forget me job delete in Silverpop||Forget me||omnimail_recipient_process_forgetme||Omnirecipient.process_forgetme||mail_provider:Silverpop||Processes requests to forget contacts by sending them to Acoustic|
|Check forget me||Forget me||omnimail_check_progress||OmnimailJobProgress.check||version=4||This (v4) api checks to ensure that Acoustic has reported successful completion of any forgetme requests it has been sent. An email is sent to fr-tech if it finds forget me jobs that have not cleared in a reasonable period of time.|
|Export||silverpop_daily||Calls silverpop_emails_build_export_files then silverpop_emails_upload_files||This is the scheduled daily job|
|Silverpop emails - Build export files||Export||silverpop_emails_build_export_files||Used ad hoc, not scheduled - Builds and exports the csv to be uploaded|
|Export||silverpop_emails_export_only||Used ad hoc, not scheduled - Only builds the silverpop DB & view|
|Silverpop emails - Upload files||Export||silverpop_emails_upload_files||Used ad hoc, not scheduled - Only exports the files|
Exporting data to Acoustic
The export scripts upload 4 separate files to Acoustic which are imported into Acoustic using mappings that Katie H manages within Acoustic. Note that when the files are uploaded Acoustic declares how many 'duplicates' there are. Until early 2020 this number was how many rows duplicated existing DB entries. A change early 2020 means it refers to how many duplicates are in the csv file
The files are
1) DatabaseUpdate-*.csv - this file holds detailed information about contacts who have recently been updated. The update is cumulative - ie. we add/update contacts but don't delete them. We use the field civicrm_contact.modified_date to determine which contacts to include in the update.
2) Unsubscribes-*.csv - used to update the master suppression list.
3) Optout-*.csv - this file is identical to the unsubscribes file. It is used separately from the Acoustic end - updating the main database with opt out information whereas the Unsubscribes csv is used to update the Master Suppression list
Database Update export
This is our main export job. It runs once a day and sends a complete dataset for any contacts within it's upload set. The job only every updates or creates - it NEVER DELETES. Any deleting is done through a separate manual process by Katie.
How the script works
At a high level the script builds a table of contacts who have been modified within the last 7 days and then builds out various tables of information about them. It creates a final view to be exported from these tables. The reason for the tables is to store calculated values to keep each query somewhat manageable.
Some notes on the modified_date & it's limits are in this phab.
The followin files are called by the script
- update_silverpop_staging.sql - builds the 'world' of contacts modified in the last 7 days
- update_table.sql - builds the rest of the details about the contacts and generates the final view
- silverpop_update_world - emails connected to contacts update in the last 7 days
- silverpop_export_staging - key details about each contact
- silverpop_email_map - mapping of email to the highest email id - used for merging details for contacts with the same email
- silverpop_export_stat, silverpop_export_latest, silverpop_export_highest, silverpop_endowment_latest, silverpop_endowment_highest , silverpop_has_recur- calculation tables
- silverpop_export - final collated table of details
- silverpop_export_view - view formatted for export
- silverpop_export_view_full - as per the view but not restricted by modified date
- silverpop_missing_countries - best guess at a contact's country from contribution_tracking (fallback for properly stored country) See phab
- silverpop_countrylangs - best guess at the language to use based on a hard-coded list of countries to languages (fallback for properly stored language) See phab
Who is included in the update?
The upload set is contacts modifed within the configured date range (currently 7 days).
The upload set EXCLUDES contacts who are calculated to be opted out - this means that for opted out contacts or not-opted int contacts their aggregate details and optout details are not updated - which can be confusing. I can only speculate that the reason for this is to reduce the number of non-emailable contacts in Acoustic. Or perhaps it's just one of those things that seemed like a good idea once. Only updates to the silverpop_export table have this restriction - all other tables should be updated with the latest details for these contacts.
When we export we treat all contact records with the same email address as the same contact and export fields that hold aggregate values. ie. if there are 2 contacts with the email address 'email@example.com' then the exported most recent donation date would be the most recent for either contact. The total donated would be the total across both contacts. This is partly because there was a time we didn't have deduping in CiviCRM at all and partly to compensate for gaps in our deduping efforts.
What is included in the update?
The date for the export is compiled into the view `silverpop_export_view` before being exported into a csv. This lives in the silverpop database on the staging server.
|Exported Field||CiviCRM data source||Notes|
|civicrm_email.email WHERE is_primary = 1|
|gender||civicrm_contact_gender_id||Options mapped to Male|Female|Transgender|
|country||civicrm_address.country_id WHERE is_primary = 1||If empty an attempt will be made to get it from contribution_tracking|
|state||civicrm_address.state_province_id WHERE is_primary = 1|
|postal_code||civicrm_address.postal_code WHERE is_primary = 1|
|employer_name||civicrm_contact.organization_name||This is filtered out if not provided by the contact - see notes on matching gifts fields|
|employer_id||civicrm_contact.employer_id||This is filtered out if not provided by the contact - see notes on matching gifts fields|
|IsoLang||civicrm_contact.preferred_language||First 2 letters - eg. en, es, de. If empty a country lookup table is used|
|latest_optin_response||civicrm.civicrm_value_1_communication_4.opt_in||This custom field is exported as 'Yes', 'No' or '' - the blank value indicates the Opt in form has not been presented to them.|
|TS_birth_date||civicrm_contact.birth_date||US format - ie '21/05/1980'|
|TS_estimated_net_worth||civicrm_value_1_prospect_5.estimated_net_worth_144||Fields are transformed to show the labels not db values|
|both_funds_donation_count||Calculated field based on wmf_donor|
|AF_donation_count||Number of donations made to WMF (Annual Fund) by the contact|
|matching_gifts_provider_info_url||civicrm_value_matching_gift.provider_info_url||The civicrm_value_matching_gift table holds the details for the employer, as accessed from the HEP database. We export the value based on using civicrm_contact.employer id on the individual record.
civicrm_value_relationship_metadata.provided_by_donor - a custom data table that extends the relationship in CiviCRM
CiviCRM Communication Fields
On Contact Record
|preferred_communication_method||Whether contact prefers email/phone/mail/SMS/Fax(!). We don't really use this field except possibly for high touch (major gifts) contacts||email, phone||n/a|
|do_not_email||Do not email this contact at any address - there are also separate fields for do_not_phone and do_not_sms
||yes/no||If true set opted_out at acoustic, if false, ignore|
|preferred_language||What language to use with this contact||spanish||IsoLang - we export a 2 character code - eg. 'es'|
|email_greeting_id/email_greeting_custom||These fields allow the contact to specify how they want to be greeted in an email, the ids reference a standard set of options or it can be customized. Generally this is used for high touch contacts where we want to be more deliberate in how to address them||Dear Space Martian||n/a|
|is_opt_out||This is for when the user has opted out of bulk mails - in our DB this would be via SIlverpop or the unsubscribe page ds use. CiviCRM will not permit CiviMails to be sent to users with this flag set. We pass this field to silverpop as 'opted_out' - which is true if is_opt_out OR civicrm_email.on_hold OR do_not_solicit is TRUE||yes/no||If true set opted_out at acoustic, if false, ignore|
On Email Record
|on_hold||Do not email this address as it has bounced or been undeliverable in the past||yes/no|
|is_bulk_mailing||Can send buik mail here||yes/no|
|is_primary||Is this the email address we should use for the contact?||yes/no|
Communication Custom Group - Related to Contact
|opt_in||Whether a contact has explicitly opted in to our mailing lists||yes/no|
|do_not_solicit||Generally entered by Major Gifts||yes/no|
Unsubscribes and opt out export
Acoustic automatically (and non-optionally) excludes 'opted out' and 'suppressed' contacts from mailings. It stored these in 3 places....
- Global suppression list - this is a list of emails who have opted out of all emails coming from Acoustic. We cannot see this list as it covers more than just our organization
- Master suppression list (found here). This list holds contacts that have opted out directly to Acoustic and contacts who we have uploaded to acoustic as opted out. The list cannot be searched from within the main search criteria screens in Acoustic as it is organisation-wide whereas the other screens only cover 1 db. To get around this we also update....
- Opted out setting - this cannot be seen within Acoustic on the contact record but the contact will be missing from search results unless specified on the search screen (see image)
We upload the same contact list to theopt out list and to the main WMF Acoustic DB in 2 separate jobs. This is picked up by 2 separate jobs in Acoustic - 1 adds to the Master Suppression list and the other updates the opt out list. The reason for maintaining these 2 lists is searchabilitity in Acoustic
Who do we opt out?
The nightly uploads include a calculation of which mails are opted_out based on if any of the following fields indicating opt_out in any of the contact records sharing the contact name.
Contacts can become opted out by the following ways
- Filling in a donation form with an opt-in option and not selecting it - the contact record is set to opt_in = 0 on import & on merge the latest is kept
- An action we import from Silverpop causing them to put on hold - see Recipient data
- An action we import from Silverpop causing them to be unsubscribed - see Recipient data
- Donor services manually unsubscribing them based on donor feedback via the Unsubscribe form (our url is civicrm/a/#/email/unsubscribe) - results in is_opt_out being set
- Major gifts marking them do_not_solicit
- The donor accessing our unsubscribe form & their desire not to be subscribed being processed through our queue.
How we generate the list
We basically compile a table of every email that has ever existed in our database (from log_civicrm_email) and then remove all the contacts we have previously determined are not opted out and have not declined to be opted in. We also delete all emails associate with civicrm user accounts.
This results in the deliberate suppressing of some emails that we might NOT want to opt out long term such as
- Non-primary addresses
- Previous email addresses
- Addresses dropped during contact merge
- Addresses that are opted out based on Acoustic suppression that is then lifted (for example they used to suppress all Comcast emails)
- Addresses that chose not to opt in - but later may do so.
How do opted out contacts get opted in again
It's not clear that they do.
We only send Acoustic updates as to who should be opted out. These are imported to BOTH the master suppression list and the opted out list as per above. There is an occassional manual process by Katie to purge the master suppression list but uploading them to opted_out is relatively new and it does not appear that we have any process to update the opted out setting.
Note that our table (silverpop_excluded) only holds updates - to do a full update we need to run the sql directly in mysql without the restrictive modified date and then run the export only job to regenerate the 2 csvs holding ALL contacts who should be opted out. However, the deletion would also need to be figured out
https://engage4.silverpop.com/lists.do?action=listSummary&listId=9574332 (master suppression list)
Matching Gifts Export
We have a CiviCRM extension to pull data about corporate matching gift policies from SSBInfo and store it as Organization records with custom fields. These are exported to Acoustic as follows
|CiviCRM field||Acoustic export field name||notes|
|entity_id||employer_id||Within CiviCRM this field links the custom data we store for matching gifts for this organisation to the contact record|
|name_from_matching_gift_db||employer_name||Name as specified by matching gifts provider|
|matching_gifts_provider_info_url||matching_gifts_provider_info_url||Matching gifts provider info URL (for SSB: https://javamatch.matchinggifts.com/search/companyprofile/wikimedia_iframe/<id> from json: online_resources.id)|
|online_form_url||online_form_url||Online form URL|
|minimum_gift_matched_usd||minimum_gift_matched_usd||Minimum (USD) gift matched|
|match_policy_last_updated||match_policy_last_updated||Matching gifts info last updated date?|
Other matching gift fields
- Matching gifts provider ID
- List of subsidiaries in a text blob (JSON? delimited list? what delimiter?)
- Flag to suppress companies from the frontend dropdown if they come in on the search but turn out to NOT actually match our donations.
Importing back from Acoustic
We retrieve the following types of data back into CiviCRM from Acoustic
- Mailing data - details & content for the emails sent through Acoustic
- Remind me later contacts - contact details of people recorded as remind me later in Acoustic
- Recipient data - details about all actions taken regarding the emails (including being send an email, opens, blocks etc)
Mailing data is the details of the emails sent from Acoustic. They only store this information for 450 days so we need to store it in CiviCRM if we want to see what was sent more than 18 months-ish ago. Bringing this data into CiviCRM also allows people dealing with donors to see what emails a donor has been sent.
We retrieve 2 types of data - content and statistics. The former is saved in the table civicrm_mailing and the latter is saved in the table civicrm_mailing_stats.
We store the following data in the civicrm_mailing table:
|name||The unique name in WCA or WCA reference (the WCA reference is the same value as stored in hash)||20190125_UnitedStates(US)_English(en)_TYCampaign_R2-5-FromGratefulInternet (1)|
|from_name||From name in email||Wikimedia Foundation|
|from_email||Email in from firstname.lastname@example.org|
|replyto_email||Reply to email@example.com|
|subject||Subject line||Our gift to you|
|body_text||Text version of content||Unwrap 15 gorgeous image......|
|body_html||Html version of content||<!DOCTYPE html>....|
|hash||WCA unique mailing reference||sp58317985|
|created_date||Date the mailing was created on||2019-01-25 13:35:36|
|scheduled_date||In practice this is the same as the created date||2019-01-25 13:35:36|
|campaign_id||Reference to civicrm_campaign table - The only additional data this table holds is start_date from WCA. I feel like the reasons for using the civicrm_campaign table may no longer be valid - I can't recall or determine what they were but I think it was to do with additional requirements that didn't eventuate - ie. distinguishing between Major gifts mailings & normal ones||97713|
And we store additional fields in the table civicrm_mailing_stats - these fields are generally aggregate information as calculated by Acoustic
This table is provided by the extension Extended Mailing Stats . - it also adds the table civicrm_mailing_stats_performance which we don't use. There is double up between this & the civicrm_mailing table on some fields but that is because the schema was
determined by an external extension writer with slightly different needs.
|mailing_id||Link to civicrm_mailing table||4|
|mailing_name||Mailing name or SP ref||20190125_UnitedStates(US)_English(en)_TYCampaign_R2-5-FromGratefulInternet (1)|
|created_date||Scheduled Date||2019-01-25 13:35:36|
|start_date||Start date||2019-01-25 14:35:36|
|recipients||Number of mails sent for the mailing||4000|
|delivered||Number sent less number bounced||3950|
|opened_total||Total number of opens (to the extent email clients make that info available)||2500|
|opened_unique||Number of opens by unique recipients||2000|
|unsubscribed||Number of unsubscribes in response to the mailing||60|
|suppressed||Number of emails that WCA suppressed mailing to due to it's own internal listing of recipients who can receive our emails & have not opted out via them||200|
|blocked||Number of emails blocked by the recipient's provider. Providers such as AOL, gmail may block some or all of the emails based on whitelisting and blacklisting.||90|
|abuse_complaints||Number of users who identified our email as 'spam' or complained to WCA||3|
Remind me later contacts
Contacts who are in Silverpop but who do not have a contact_id from CiviCRM are considered to be 'remind me later contacts' - who signed up via silverpop. We retrieve these contacts and from a list at Acoustic based on the criteria of 'has a country and has no contact ID' add them to the group in CiviCRM called 'Silverpop imports' (group id is 310 ). We retrieve the following information about them
- created date
We retrieve information about mailing actions (sending, opening etc) for each contact. This is raw data and contains functional duplication - eg.there will be an 'OPEN' event each time a person opens an email, some mail clients do this over and over as part of some sort of polling.
In addition to importing this information to our database for the purposes of viewing / querying we take action to not email people for some types of events. There are 2 types of actions we take
Put on hold
The api action for this is
omnirecipient.process_onhold this puts the email address on hold - if we get a new email for the contact they will start getting emails again. This involves the following steps
- Set civicrm_email.on_hold to 1
The api action for this is
omnirecipient.unsubscribe - this unsubscribes the contact which is involves the following steps.
- add an Unsubscribe activity
- set civicrm_contact.is_opt_out to 1.
- We also search for other instances of that email address & set civicrm_email.is_bulk_mail to 0 for them. Setting this is_bulk_email doesn't really affect anything at the moment but we also do is for DS unsubscribes.
This information is stored in the civicrm_mailing_provider_data table with the following fields
|contact_identifier||Acoustic contact reference||This can be appended to this url https://engage4.silverpop.com/searchRecipient.do?action=edit&listId=9644238&recipientId= to find the contact in Acoustic|
|contact_id||CiviCRM contact reference||The CiviCRM contact ID is stored in Acoustic against the contact record. If the Acoustic contact does not have a contactId they will be imported into CiviCRM as a 'remind me later' contact - ie a contact who entered our system from an Acoustic form.|
|mailing_identifier||Acoustic Mailing reference||This is the internal Acoustic mailing reference. It consists of the prefix 'sp' and the Acoustic reference number. The 'sp' prefix stands for Silverpop and is a convention from the email team (possibly legacy). The reference number can be appended to this url https://engage4.silverpop.com/ux/#/sentMailing/ to view the mailing in Acoustic
|Contact email||Email that was used for the sending - contact's email may change but this should not.|
|recipient_action_datetime||Timestamp of action||Makes up unique key in combination with contact_identifier & mailing_identifier|
|is_civicrm_updated||Track whether we have performed an action (e.g unsubscribe) yet|
When a contact asks to be forgotten we upload this request to Acoustic. We need to do that once for each of the 'databases' Acoustic holds for us - the list of these is held in a setting within CiviCRM and hard-coded into civicrm.settings.php on production.
Under the hood the following happens
- When the forget me action is take on a contact a row is added to the civicrm_omnimail_job_progress table
- When the scheduled omnimail_recipient_process_forgetme job runs separate rows in the table are created for each Acoustic 'database' (these can be viewed through search kit - as in this search display or the API explorer
- A forget me request is sent for each database
- Next run it checks on the status of the forget me request (by querying Acoustic) - if the request has successfully been completed the row is removed
- A separate job checks the rows do not stall in the job_progress table and sends fail mail if they do.
The forget me request is technically the same as uploading an email in the UI to be subject to a GDPR erasure request as outlined here. Despite the terrifying message it will only deleted the email/s in the uploaded csv.
Running silverpop code locally
On docker builds
This is what works for me.... https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/tools/+/673176
Pre docker notes
The following notes were writting prior to our docker shift.
To test this locally, you'll need a settings file for silverpop_export. All of the python tools look for settings files in /etc/fundraising, overridden by settings in $HOME/.fundraising/.
So in one of those places, create a silverpop_export.yaml with contents like so:
logging: disable_existing_loggers: false version: 1 formatters: app_prefixed: # TODO: You need to replace this with each app's name, until we # figure out something better. format: "silverpop_export: %(message)s\n" handlers: console: class: logging.StreamHandler stream: ext://sys.stdout error: # Defaults to sys.stderr class: logging.StreamHandler level: ERROR syslog: class: logging.handlers.SysLogHandler level: DEBUG # The app prefix is required to trigger patterns on the other end # of rsyslogd. formatter: app_prefixed # TODO: Custom rsyslogd configurations will require `address` and # `socktype` keys here, for example: address: - localhost - 514 # Magic for socket.SOCK_STREAM, aka. the TCP protocol. socktype: 1 # Note that overriding the root logger is rude. root: # Pass through maximum logging, and let syslog sort it out. level: DEBUG handlers: - syslog - console - error # Directory which will stage the working files working_path: /tmp/ # For archival purposes, how many days of old runs should we keep? 0 means forever. days_to_keep_files: 1 # Login credentials for the silverpop transfer server sftp: host: 18.104.22.168 username: foo password: "abc123" host_key: remote_root: /upload/ log_civicrm_db: db: civicrm civicrm_db: db: civicrm drupal_db: db: drupal silverpop_db: host: localhost user: "silverpopuser" passwd: "pass1234" db: silverpop debug: true charset: "utf8"
Depending on your rsyslogd configuration, you may need to change the handlers/syslog/address key.
The silverpop user should have ALL rights in the silverpop db, and SELECT rights to the other dbs.
cd into your tools folder and test the export like so:
PYTHONPATH=`pwd` python3 silverpop_export/update.py
For most updates, you will be making changes in the silverpop_export/update_table.sql file. Make sure to coordinate with Caitlin Cogdill before deploying anything that will change the output format.